Google Books (or similar) all book scans – $200k bounty (2025) (software.annas-archive.gl)

177 points by Cider9986 3 hours ago | 74 comments

I live in a country where the selection of available books, especially in English, is very limited. Buying online from foreign markets comes with a long list of administrative hurdles and limits.

If it were not for Anna's Archive and Z-Library, I would've never been able to read the books that shaped who I am today, or keep my passion for learning alive.

Thanks, AA and ZLib! (Also, thank you to the authors whose books and knowledge I consumed without being able to pay them back.)

jvm___ 1 hour ago | |

https://send.djazz.se/

This is key for getting epubs to your Kobo.

ahmedfromtunis 51 minutes ago | | |

Thanks, but I don't use e-readers as they are not available here.

I've been using MoonReader for many years now and settled on pretty good parameters that make the reading experience very comfortable on both my phone and my tablet.

pull_my_finger 43 minutes ago | | |

I don't understand what this is doing. Can't you sideload any ebook onto a Kobo anyway? Never had an issue on my Clara.

christofosho 1 hour ago | | |

Calibre? https://calibre-ebook.com/

Brian_K_White 3 minutes ago | | |

I don't recall ever needing anything special on my Aura H2O. It's one of the reasons I chose Kobo in the first place. Just copy any file onto it.

If you mean stripping DRM, I used Calibre for that, but mostly I just avoid buying books with DRM where possible.

andrepd 16 minutes ago | | |

Handy, but a book lover with an ereader probably already uses Calibre :)

dr_dshiv 1 hour ago |

https://SourceLibrary.org has about 16,000 rare books translated, most for the first time. 50,000 books archived (will be translated when we have $$ for it). More tokens than English Wikipedia and about 0.75 petabytes.

Not sure if we will qualify for a bounty, but happy to share! Btw, we are looking for funding from small or large donors who want to help us translate the Renaissance…

wrsh07 41 minutes ago | |

Hey, this looks fascinating!

I can't quickly tell what all you have archived^, but I have some friends who are academic historians who might be interested in certain categories of work (and could help verify some esoteric languages) - is it possible to search by region or language?

Have you reached out to any types of historians WRT the project? It seems like some PhD students might be able to find some projects in this work etc

^ when I looked at the timeline https://sourcelibrary.org/timeline, I got an error

dr_dshiv 13 minutes ago | | |

Yes, this is designed with historians and librarians from the Embassy of the Free Mind (https://embassyofthefreemind.com) in Amsterdam, stewards of the collection of the Biblioteca Philosophica Hermetica.

Please share with historian friends. I’m not great at socials or fundraising but this was really designed to support humanists. It can give DOIs for the versions of the translated books, which means they can be quoted and cited in academic papers.

Tip: Try it in Claude or Claude Code (even better)! Just point it towards the source library. It can find quotes and evidence on any topic of interest. Or try the librarian, our source-grounded research agent: https://sourcelibrary.org/librarian

Thanks for the feedback, I’ll fix the timeline.

sgc 21 minutes ago | |

Curious as to what your budget was to get where you are today? That's a lot of tokens. I presume you are using Gemini Flash?

dr_dshiv 7 minutes ago | | |

All the models used are shown with each page of translation and each book has a whole data provenance treatment.

You can add it up!

trilogic 2 hours ago |

Who is behind Anna's Archive? There are a lot of English speakers involved in the team and forums! Anyway, as long as buying isn't owning, no issues here.

Cider9986 53 minutes ago | |

I think Anna is behind it.

https://redlib.catsarch.com/r/Annas_Archive/comments/1f6h74r...

https://reddit.com/r/Annas_Archive/comments/1f6h74r/im_curio...

DeepYogurt 1 hour ago |

Anyone afraid of being laid off at Google right now? Perhaps this is a backup :)

Cthulhu_ 1 hour ago | |

I think if you get caught exfiltrating data they'll sue you for much more than $200K.

imhoguy 1 hour ago | | |

I don't think anybody would do it purely for money. I would rather see someone who is terminally ill and decides to do some "good".

merpkz 1 hour ago | | |

Copy the data onto an extra-large-capacity microSD card and hide it in your Rubik's Cube; nobody will suspect a thing.

mmooss 10 minutes ago | | |

I'm sure they'd go after you, but hypothetically: What damages would they claim? They still have the data, which isn't their IP to begin with.

the_real_cher 1 hour ago | | |

If your money is in private crypto or offshore you have nothing to worry about.

hedora 2 hours ago |

I wonder how long it will be before they offer bounties for internet scrapes.

Cloudflare captchas have made the internet unusable for me, and I'm sure it will only get worse over time. I'd much rather just browse (or even torrent) a copy of archive.is or similar. The latter would be much better for privacy, and hey, I run ad blockers anyway.

rvnx 2 hours ago | |

https://x.com/CloudflareDev/status/2031488099725754821

Well, there is this little conflict of interest

aspect0545 2 hours ago | | |

https://xcancel.com/CloudflareDev/status/2031488099725754821

bix6 2 hours ago |

Piracy / copyright predictions?

The current situation feels untenable with renting. So many regular people I know have learned about VPN, NAS, etc.

codemog 2 hours ago | |

Hopefully the guillotines. Look up how much the authors and artists who create the actual work get paid.

0x3f 1 hour ago | | |

Quite a few textbook authors I know are paid well to be part of the whole scheme (kickbacks, forced yearly repurchase for the 'online' component of books, etc). So I think it varies a lot.

specproc 2 hours ago | |

It was never sustainable, just regulatory capture by large IP owners.

Spotify, Netflix, Amazon, etc. provided OK value for a while, but now that enshittification is biting, piracy is due a massive comeback.

hereme888 58 minutes ago |

The link sort of reads like people who have very easy access to the requested material. Almost like they're Google employees.

neilv 2 hours ago |

The US should just find a way to quietly share literature access with the Russians, rather than letting piracy be promoted and facilitated for US consumers as freedom-fighter "archiving".

Between all the piracy, and all the AI training and the purchase/visitor-circumventing AI services, the practice of writing and publishing genuinely good work is being wiped out.

We're killing the goose that lays the eggs, for selfish gain.

anyaya1 32 minutes ago |

Does Anna's Archive use a completely different "source repository" from LibGen?

takipsizad 21 minutes ago | |

Anna's Archive is practically a compilation of all possible sources (including LibGen, afaik).

stephenlf 17 minutes ago |

Anna’s Archive rocks.

wxw 2 hours ago |

Some more interesting bounties they offer: https://software.annas-archive.gl/AnnaArchivist/annas-archiv...

> Purchase all Library of Congress MARC datasets — $3,000 bounty

> English Wikipedia pages about relevant institutions — up to $100 per new page

> Internet Archive Digital Lending — $5000 per 1 million pdf files

> Text version of our full library — $20,000

...

Cider9986 48 minutes ago | |

Up to 500k for OPSEC failures is interesting, as well. It gives me hope that there are wealthy individuals contributing to sharing books, or many small donations.

https://software.annas-archive.gl/AnnaArchivist/annas-archiv...

FerritMans 2 hours ago |

So AA is a front for OpenAI?

flexagoon 48 minutes ago | |

No, but they openly make a lot of money from selling their library to AI companies. Fast enterprise access to Anna's Archive starts at $100,000.

650REDHAIR 1 hour ago | |

How did you come to that conclusion?

awakeasleep 1 hour ago | |

The bounty would be a bit higher with OpenAI money behind it.

OrangeDelonge 2 hours ago |

Curious as to how you would approach this. I have no experience in this area; is anyone on this forum willing to share their expertise?

0x3f 1 hour ago | |

If it works as AA seems to theorize, you'd need to:

  (a) Work out how Google Books exposes fragments of books, and see if there's a systematic way of using this to get whole books. For example, a naive approach might be to find any fragment of the book by searching for some exact phrase. Then you can search for an exact phrase from the start or end of the fragment it gave you, hoping it will show you the previous or next part of the book. You can then just loop that to get the whole book.

  (b) Once you have (a), you need a way of bypassing Google's bot detection/rate limiting. I don't know what the current state of the art is, but there may be a solution for sale out there. E.g. you pay to receive a cookie or browser state, and use that to fetch the URLs from (a). Or if you're good/already in the scene, you could do this part yourself.
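To make the loop in (a) concrete, here is a rough sketch of just the stitching logic, run against a toy in-memory string rather than any real service. Everything in it is an assumption for illustration: `make_toy_search` is a hypothetical stand-in for "search an exact phrase, get back a fragment around it", and `stitch`, `window`, and `probe` are made-up names. It also assumes every probe phrase occurs only once in the text, which a real book would not guarantee.

```python
from typing import Callable, Optional

# A "snippet search": exact phrase in, surrounding fragment (or None) out.
SnippetSearch = Callable[[str], Optional[str]]


def make_toy_search(text: str, window: int = 40) -> SnippetSearch:
    """Hypothetical stand-in: return up to `window` characters on each
    side of the first exact match of `phrase` in an in-memory string."""
    def search(phrase: str) -> Optional[str]:
        i = text.find(phrase)
        if i < 0:
            return None
        return text[max(0, i - window): i + len(phrase) + window]
    return search


def stitch(search: SnippetSearch, seed: str, probe: int = 30,
           max_steps: int = 1000) -> str:
    """Grow one fragment in both directions by re-searching its edges.

    Assumes each `probe`-length phrase is unique in the underlying text.
    """
    result = search(seed)
    if result is None:
        raise ValueError("seed phrase not found")

    # Extend forward: search for the current tail, keep whatever follows it.
    for _ in range(max_steps):
        tail = result[-probe:]
        frag = search(tail)
        if frag is None:
            break
        extra = frag[frag.find(tail) + len(tail):]
        if not extra:  # nothing new after the tail: reached the end
            break
        result += extra

    # Extend backward: search for the current head, keep whatever precedes it.
    for _ in range(max_steps):
        head = result[:probe]
        frag = search(head)
        if frag is None:
            break
        extra = frag[:frag.find(head)]
        if not extra:  # nothing new before the head: reached the start
            break
        result = extra + result

    return result


# Toy "book" whose 30-character phrases are all distinct.
book = "".join(f"Page {n:03d} begins here. " for n in range(30))
recovered = stitch(make_toy_search(book), seed="Page 015")
print(recovered == book)
```

The uniqueness assumption is the weak spot: if a probe phrase repeats elsewhere in the text, the search can land on the wrong occurrence and silently splice in the wrong passage, so a real attempt would need longer probes or some overlap check.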

takipsizad 47 minutes ago | | |

That approach will definitely work with the access Google currently provides; however, it's an extremely inconvenient way to scrape Google Books.

ThrowawayTestr 2 hours ago |

One of my hopes is that when the AI bubble bursts, some brave person will sneak out a copy of the last frontier model.

Aboutplants 2 hours ago | |

Not worried about that, you will only have to wait 3-6 months and get a Chinese model just as good.

sulam 1 hour ago | | |

That’s misunderstanding why these models are behind. A large part of why they’re behind is they aren’t able to do the reinforcement learning post-training steps that take a pre-trained model and turn it into a frontier model like GPT 5 or Opus. Instead they do their best to recreate these models using distillation.

Fundamentally, you can never distill your way to being the teacher, so these approaches will not advance the frontier.

[edit, after thinking about it I think my phrasing is unfair. It's not necessarily that they aren't able to do it, but they haven't yet shown that they are willing to do it.]

yorwba 2 hours ago | | |

Chinese companies giving away expensive models for free is a symptom of the AI bubble, too. It's not a law of nature that they'll always be able to scrounge up the money for yet another training run.

fastball 1 hour ago | |

If it's a bubble, why do you care about frontier models?

FpUser 1 hour ago | | |

The Internet was a bubble, and so was telecom, etc., at some point. Being a bubble does not mean that, once 90% of investments go down the drain, the remains are not useful.

thx67 1 hour ago | |

Prediction markets can solve this.

zuzululu 1 hour ago | |

Which will be very difficult to run unless you have a large budget to operate your own mini datacenter.

lelanthran 21 minutes ago | | |

In a crash the hardware will go for pennies on the dollar, if not for fractions of pennies on the dollar.

Lots of companies will pick them up for scrap metal prices and host them for fractions of what we are paying today.

That's the nature of bubbles.