For
> Ask HN: Is there a search engine which
excludes the world's biggest websites?
> Discovering unknown paths of the web
seems almost impossible with google et
al.
> Are there any search engines which
exclude or at least penalize results from,
say, top 500 websites?
Let's back up a little and then try for an
answer:
Some points:
(1) To put it qualitatively, there is a
LOT of content on the Internet.
(2) In principle, and no doubt already in
practice, there are a LOT of different
searches people want to do. The search in
the OP is an example.
(3) Much like the subject index of an old
library card catalog, the most popular
search engines rely heavily on keywords
plus whatever else, e.g., page rank,
date, etc.
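To make (3) concrete, here is a toy sketch of "keywords first, then some other score" ranking, plus the kind of domain penalty the OP asks about. This is NOT the method behind the Web site described below; the corpus, the `AUTHORITY` scores (a stand-in for something like page rank), and the `TOP_DOMAINS` list are all hypothetical. The point is that the keyword index itself knows nothing about site size; penalizing big sites needs an extra, separately maintained signal.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical toy corpus: URL -> page text.
PAGES = {
    "https://bigsite.example/a": "vintage radio repair guide",
    "https://smallblog.example/b": "repair guide for vintage radio tubes",
    "https://bigsite.example/c": "radio news",
}

# Hypothetical per-page score, a stand-in for something like page rank.
AUTHORITY = {
    "https://bigsite.example/a": 0.9,
    "https://smallblog.example/b": 0.2,
    "https://bigsite.example/c": 0.8,
}

# Hypothetical list of "biggest" domains to penalize.
TOP_DOMAINS = {"bigsite.example"}


def build_index(pages):
    """Build an inverted index: word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query, penalty=0.0):
    """Return URLs containing every query word, best score first.

    The score is the page's AUTHORITY value, minus `penalty`
    if the page's host is in TOP_DOMAINS.
    """
    words = query.lower().split()
    if not words:
        return []
    # Keyword step: pages that match all query words.
    hits = set.intersection(*(index.get(w, set()) for w in words))

    def score(url):
        s = AUTHORITY.get(url, 0.0)
        if urlparse(url).hostname in TOP_DOMAINS:
            s -= penalty  # the extra signal the keyword index lacks
        return s

    return sorted(hits, key=score, reverse=True)
```

With `penalty=0.0` the big site's page ranks first for "vintage radio repair"; with `penalty=1.0` the small blog moves to the top, even though the keyword matches are identical in both cases.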
So, (1)-(3) together pose challenges that
so far have not been well met: In
particular, we can't expect the keywords,
etc., of (3) to do well on all, or nearly
all, of the searches in (2) across much of
the content in (1).
The search in the OP is one such unmet
challenge, and no doubt just one of many.
Long ago, Dad had a friend who worked at
Battelle, and IIRC they did a review of
information retrieval that concluded
that keyword search meets only a
fraction, ballpark maybe only 1/3rd, of
the need for effective searching. The
search in the OP is an example of what is
not covered: the library card catalog
never indexed the size of a book, or of a
Web site! :-)
Given this, my rough, ballpark estimate
has been that the currently popular
Internet search engines do well on only
about 1/3rd of the combination of content
on the Internet, searches people want to
do, and results they want to find.
So, I decided to see what could be done
for the other 2/3rds.
I started with some advanced pure math
that is not very well known or
appreciated; it looks like useless,
generalized abstract nonsense, but if you
calm down, stare at it, think about it,
..., you can see a path to a solution.
Although I never thought about the search
in the OP until now, in principle the
solution should also work for that
search. Put another way, the math is
abstract and general enough that in
practice it can translate to doing well on
something as varied as the 2/3rds.
Then for the computing, I did some
original applied math research.
Using TeX, I wrote it all up with theorems
and proofs.
So, the project is to be a Web site.
Although I've been programming for
decades, this was my first Web site. I
chose Windows and .NET and typed in
100,000 lines of text containing 24,000
statements of Visual Basic .NET
(apparently semantically equivalent to C#,
but with syntactic sugar I prefer).
The software appears to run as intended
and well enough for significant
production.
I was slowed down by one interruption
after another, none related to the work.
Still, roughly, ballpark, the Web site
should be good for the 2/3rds, and by a
wide margin the best so far, in
particular for the search in the OP.
So, for
> Ask HN: Is there a search engine which
excludes the world's biggest websites?
there's one coded, running, and on the
way to going live!
I intend to announce an alpha test here at
HN.