Author of the npm module search-index here.
"1- Finding information is trivial"
The web already consists, for the most part, of marked-up text. If speed is not a constraint, then we can already search through the entire web on demand. However, given that we don't want to spend 5 years on every search we carry out, what we really need is a SEARCH INDEX.
Given that we want to avoid Big Brother-like entities such as Google, Microsoft and Amazon, and also given (although this is certainly debatable) that government should stay out of the business of search, what we need is a DECENTRALISED SEARCH INDEX.
To do this you are going to need AT THE VERY LEAST a gigantic reverse (inverted) index that contains every searchable token (word) on the web. That index should ideally include some kind of scoring so that the very best documents for, say, "banana" come at the top of the list for searches for "banana". (You also need a query pipeline and an indexing pipeline, but for the sake of simplicity, let's leave those out for now.)
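To make that concrete, here is a toy sketch of such an index in Python. It is not how search-index or any real engine does it: the tokenizer is a naive whitespace split, the score is just raw term frequency, and the names (tokenize, build_index, search) and the sample documents are made up for illustration.

```python
from collections import Counter, defaultdict

def tokenize(text):
    # Naive assumption: lowercase and split on whitespace.
    # A real indexing pipeline does far more than this.
    return text.lower().split()

def build_index(docs):
    # Maps token -> {doc_id: term frequency in that document}
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for token, count in Counter(tokenize(text)).items():
            index[token][doc_id] = count
    return index

def search(index, token):
    # Rank by raw term frequency, highest first; ties broken by doc id.
    postings = index.get(token.lower(), {})
    return sorted(postings, key=lambda doc_id: (-postings[doc_id], doc_id))

# Hypothetical sample documents
docs = {
    "a": "banana banana banana smoothie",
    "b": "apple and banana pie",
    "c": "apple crumble",
}
index = build_index(docs)
results = search(index, "banana")  # "a" ranks above "b"; "c" is not returned
```

The lookup itself is a single dictionary access, which is why an index turns a 5 year scan into something you can answer on demand.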
In theory a search index is very shardable. You can easily host an index that is in fact made up of lots of little indexes, so a READABLE DECENTRALISED SEARCH INDEX is feasible, with the caveat that relevancy would suffer: relevancy algorithms such as TF-IDF and PageRank generally rely on an awareness of the whole index, and not just an individual shard, in order to calculate a score.
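A rough illustration of why shards hurt relevancy, assuming one common smoothed IDF formula (there are several variants) and two hypothetical shards with invented document counts. Each shard computing IDF from its own statistics gets a different weight for the same token, and neither matches the weight computed over the whole index.

```python
import math

def idf(doc_count, doc_freq):
    # One common smoothed IDF variant; other engines use other formulas.
    return math.log((1 + doc_count) / (1 + doc_freq)) + 1

# Hypothetical shards: total documents, and how many of them contain each token.
shard_a = {"docs": 1000, "df": {"banana": 500}}  # a fruit-heavy shard
shard_b = {"docs": 1000, "df": {"banana": 2}}    # a shard that rarely mentions fruit

def local_idf(shard, token):
    # What a shard can compute knowing only about itself.
    return idf(shard["docs"], shard["df"].get(token, 0))

def global_idf(shards, token):
    # What you get if the statistics of every shard are pooled.
    total_docs = sum(s["docs"] for s in shards)
    total_df = sum(s["df"].get(token, 0) for s in shards)
    return idf(total_docs, total_df)

idf_a = local_idf(shard_a, "banana")
idf_b = local_idf(shard_b, "banana")
idf_all = global_idf([shard_a, shard_b], "banana")
# idf_a < idf_all < idf_b: the shards disagree on how "rare" banana is
```

Scores built on idf_a and idf_b are not directly comparable, so merging results from the two shards either needs shared statistics or accepts skewed ranking.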
Therefore a READABLE DECENTRALISED SEARCH INDEX WITH BAD RELEVANCY is certainly doable, although it would have Lycos-grade performance, circa 1999.
CHALLENGES:
1) Populating the search index will be problematic. Who does it, how they get incentivized/paid, and how they are kept honest are pretty tricky questions.
2) Indexing pipelines are very tricky and require a lot of work to do well. There is a whole industry built around feeding data into search indexes. That said, this is certainly an area that is improving all the time.
3) How the whole business of querying a distributed search index would actually work is an open question. You would need to query many shards, and then do a Map-Reduce operation that glues together the responses. It may be possible to do this on users' devices somehow, but that would create a lot of network traffic.
4) All of the nice, fancy schmancy latest Google functionality unrelated to pure text lookup would not be available.
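For challenge 3, the "query many shards, then glue the responses together" step could look something like the sketch below. The shards are plain in-memory dictionaries with invented scores, and query_shard and query_all are hypothetical names; in a real system each query_shard call would be a network request, which is where the traffic comes from.

```python
import heapq

def query_shard(shard, token, k):
    # "Map" step: each shard returns its own top-k (score, doc_id) pairs.
    postings = shard.get(token, {})
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in postings.items()))

def query_all(shards, token, k):
    # "Reduce" step: pool the partial results and keep the overall top-k.
    partials = [hit for shard in shards for hit in query_shard(shard, token, k)]
    return [doc_id for score, doc_id in heapq.nlargest(k, partials)]

# Hypothetical shards: token -> {doc_id: precomputed score}
shards = [
    {"banana": {"doc1": 0.9, "doc2": 0.4}},
    {"banana": {"doc3": 0.7}},
    {"apple": {"doc4": 0.8}},
]
top = query_all(shards, "banana", 2)  # doc1 first, then doc3
```

Note that this only ranks sensibly if the shards' scores are comparable, which is exactly the relevancy caveat above. Every query also touches every shard, so the cost grows with the number of shards.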
"2- You don't need services indexing billions of pages to find any relevant document"
You need to create some kind of index, but there is a tiny sliver of hope that this could be done in a decentralised way without the need for half a handful of giant corporations. Many entities could each be responsible for their own little piece of the index.