The bilinual project also rebuilds ebooks (modern HTML, PDF, and open EPUB formats) with better quality and readability, although that is not its primary goal. Take a look at one example here:
https://www.bilinual.com/book/18043sven/sv/en#line=68&lpp=23 https://www.bilinual.com/download/18043sven-sv-en.pdf https://www.bilinual.com/download/18043sven-sv-en.epub
https://dave.autonoma.ca/blog/2020/04/11/project-gutenberg-p...
fwiw, Gitenberg has been around at least since 2012.
The Girl from Alsace in Gitenberg: https://www.gitenberg.org/book/35926
The Girl from Alsace in Gutenberg: http://www.gutenberg.org/ebooks/35926
The numbers are even the same which seems suspicious. Hmmm.
Public domain obviously can be ascertained, but if CC0 hasn’t been granted we rely on dated reproductions: basically a photograph of the artwork in question in a book or journal with a copyright date of 1924 or earlier.
(that’s obviously specifically a US legal reading, but SE from a legal point of view is a US project)
I think that the GITenberg collection contains all of the books in PG. At this point, new repos are created automatically when Distributed Proofreaders creates a new book in PG. Originally, I didn't include around 400 PG books because their creators claimed copyright, and I didn't include Bruce Sterling's book because he wouldn't let me re-license it under Creative Commons rather than his pseudo-public-domain license.
Not much has been happening with GITenberg itself in the past few years. But luckily, a lot of the concepts and code are getting upstreamed into PG, which, in my opinion, is way, way better.
* One of them 404ed: https://www.bilinual.com/download/30117fren-fr-en.pdf
* The other was full of problems: https://www.bilinual.com/download/16210fren-fr-en.pdf
For example, many words don't have translations at all, and those that do are often incorrect. This feels like a very rough machine translation? For example:
> et c'est surtout dans les paroisses riveraines du Saint-Laurent
You translate this as
> and Ce east primarily in the · · some saint Laurence
While Google Translate gives
> and it is especially in the parishes bordering the St.Lawrence
If you're using machine translation, why not use a Google API that might give usable results at least? If that's not plausible, maybe you should try to get together a team of volunteers to manually translate these ebooks for language learners?
(I hope these suggestions are helpful, I'm not trying to be dismissive of your project.)
1- 404 issue: I implemented the PDF generation recently, and I noticed that WeasyPrint has issues with HTML files that have too many tags (our books have around 2*number_of_words tags in them). This is not a big issue and it will be fixed in the next iteration.
2- Using Google API: Google APIs and other translation tools are great for translating sentences. However, the problem with using parallel texts for language learning is our brain's laziness. After a few pages, our brain loses its patience for solving the translation problems (critical thinking!?) and actually learning the words and the structure of the sentences. The focus immediately goes toward the translated sentences in your native language rather than the original text.
Personally, I learn a word for life when I slow down, think about similar words and its root, and finally look it up in a dictionary. The process is valuable.
3- Team of volunteers: It is easier said than done. The functionality is present, but I prefer to improve the suggestion engine as much as possible before I involve volunteers. Would you be interested in joining?
I prefer https://www.deepl.com/translator to Google.
'hermanos' is translated 'brethren' - super-archaic.
p8, 'dedicado' in 'te has dedicado a pintar?' is translated 'hardcore'.
'sí' in 'Que sí, hombre' is translated 'do', as in do-re-mi, I guess.
p5, 11-12 have 'quieres'/'quiero' repeatedly translated as "with friends like those who needs enemies", which is just inexplicable. I can't imagine how that would happen.
Corrupted dictionary?
...and most of the trickiest words on a page aren't translated, maybe because they're not in your dictionary or because they have 'lo' or 'se' appended.
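If the appended pronouns are the cause, one possible workaround is to retry the lookup after stripping them. This is only a rough sketch and not how the site works, as far as I know: `ENCLITICS`, `candidates`, `lookup`, and the toy `dictionary` are all made-up names, and a naive suffix rule like this can over-strip ordinary words, so the unmodified word is always tried first.

```python
# Hypothetical sketch: retry a dictionary lookup after removing Spanish
# enclitic pronouns ("dedicarse" -> "dedicar", "dándoselo" -> "dando").
# Longer clitics come first so "nos" is tried before a shorter match.
ENCLITICS = ("nos", "los", "las", "les", "me", "te", "se", "lo", "la", "le")

# Attaching a clitic often adds a written accent to the verb; drop it again.
ACUTE = str.maketrans("áéíóú", "aeiou")


def candidates(word):
    """Yield the word itself, then forms with up to two clitics removed."""
    word = word.lower()
    yield word
    stem = word
    for _ in range(2):
        for clitic in ENCLITICS:
            # Keep at least three letters so short words are left alone.
            if stem.endswith(clitic) and len(stem) > len(clitic) + 2:
                stem = stem[: -len(clitic)]
                yield stem
                yield stem.translate(ACUTE)
                break
        else:
            return  # no clitic matched, nothing more to try


def lookup(word, dictionary):
    """Return the first translation found for any candidate form, else None."""
    for form in candidates(word):
        if form in dictionary:
            return dictionary[form]
    return None


# Toy dictionary, for illustration only.
dictionary = {"dedicar": "to dedicate", "dando": "giving"}
print(lookup("dedicarse", dictionary))
print(lookup("dándoselo", dictionary))
```

A proper fix would probably use a real lemmatizer rather than suffix rules, but even this would catch a lot of the untranslated verbs.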
"I can't imagine how that would happen." : Just as a hint, click on "translations" here:
https://en.wiktionary.org/wiki/with_friends_like_these_who_n...
The difference between falsely claiming something is public domain and falsely claiming to grant a license to it under CC0 is going to be pretty minimal (and is likely to result in little more than "please stop", blood and turnips and so on).