Show HN: No Trash Search

Show HN: No Trash Search (notrashsearch.github.io)

137 points by rickdeveloper 4 years ago | 96 comments

I built this website a couple of months ago because I was annoyed by how hard it was to find useful things on Google. As "Google no longer producing high quality search results in significant categories" [0] is currently #1 on the front page, I figured I'd share this project again. I hope it's useful to some people.

'No Trash Search' is very focussed on STEM and not "for daily use". It's surprisingly good when you're looking for certain kinds of information. Under the hood it's little more than a programmable search engine [1] with a whitelist of ~120 sites.

[0] https://news.ycombinator.com/item?id=29772136

[1] http://programmablesearchengine.google.com
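As a rough illustration of what a domain whitelist means, here is a minimal Python sketch that keeps only results whose host is a listed site or one of its subdomains. This is an assumption for illustration only: No Trash Search does the restriction inside Google's Programmable Search configuration, not in client code, and the domains in `WHITELIST` are hypothetical examples, not the actual ~120-site list.

```python
from urllib.parse import urlparse

# Hypothetical example domains; the real list lives in the
# Programmable Search Engine configuration, not in code.
WHITELIST = {"docs.python.org", "stackoverflow.com", "en.wikipedia.org", "github.com"}


def is_whitelisted(url: str, whitelist: set[str] = WHITELIST) -> bool:
    """True if the URL's host is a whitelisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in whitelist)


def filter_results(urls: list[str]) -> list[str]:
    """Keep only results from whitelisted sites, preserving their rank order."""
    return [u for u in urls if is_whitelisted(u)]
```

Matching on the host suffix (`.github.com`) rather than a plain substring keeps look-alike domains such as `notgithub.com` out.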

throwawayboise 4 years ago

> Under the hood it's little more than a programmable search engine [1] with a whitelist of ~120 sites

So back to what web search was in the 1990s, roughly: an index from a curated selection of sites.

rdiddly 4 years ago

120 sites is pretty hilarious and sad. "Here you go, the worthwhile part of the internet!"

BlueTemplar 4 years ago

While I can understand the appeal, restricting your search engine to only ~120 websites out of hundreds of millions (?) is basically giving up on the Web.

(BTW, any good search engines these days that aren't indirectly using Google or Bing?)

version_five 4 years ago

> restricting your search engine to only ~120 websites out of hundreds of millions (?) is basically giving up on the Web.

Sure - the web is now a cesspool optimized for advertising and attention. The traditional search engines made a lot more sense at the dawn of the internet when it was more about discovery. Now, for the most part, it's closer to an information retrieval tool, where a finite list of established sites have the bulk of what one is looking for. It only makes sense to have a tool that lets one navigate the established, legit internet, and not have to deal with all the crap.

That doesn't mean there is no use case for Google as it is, but some more focused competition is a no-brainer.

narrator 4 years ago

There's http://yandex.com . It's great if you want to search controversial subject matter and controversial results that Google wouldn't give you. The reverse image search is also amazing.

1vuio0pswjnm7 4 years ago

"(BTW, any good search engines these days that aren't indirectly using Google or Bing ?)"

The code for Gigablast is open-source, including the crawler.

I could be wrong, but I do not think search.marginalia.nu or wiby.me use Google or Bing.

The comment about "hundreds of millions" is interesting. Assume, hypothetically, that a search engine claimed to be searching millions of sites for a given query but in truth was only searching 120 sites that it had determined answered this query (i.e., were the most popular answer sources) for the majority of users. How would a user verify that the search engine's claim about searching millions of sites was true? What if the search engine only allowed the user to retrieve a maximum of about 230 results, no matter how many sites it claimed to search?

ColinHayhurst 4 years ago

Try Mojeek https://blog.mojeek.com/2021/03/to-track-or-not-to-track.htm... Disclosure: team member. Feedback, good or bad, appreciated.

fuckcensorship 4 years ago

Check out marginalia[1], made by another user on HN.

[1]: https://search.marginalia.nu/

blaerk 4 years ago

I think https://www.qwant.com/ uses its own index. I just started using it, so I can't really say much about it other than that it seems alright compared to DDG and Google(?)

dataflow 4 years ago

You might want to add cppreference.com to your list of programming sites.

lionkor 4 years ago

Seems to be in there now :)

SilasX 4 years ago

FYI, I think this is just the case where you should prefix the submission title with “Show HN:”. Can mods update it so it shows with the others? @dang?

https://news.ycombinator.com/show

https://news.ycombinator.com/showhn.html

pronoiac 4 years ago

I emailed this suggestion to the mods.

DantesKite 4 years ago

Hey I was looking for something like this. Thanks.

pacificmint 4 years ago

I need to buy a 3 hole punch, and when searching for reviews yesterday I had the same problem of lots of hits with affiliate links and low quality sites.

I searched for “3 hole punch review” [1] here, and the results have zero relevancy.

First one is a Chinese cell phone company, second a Wikipedia page for an episode of The Office, third a thesaurus page with synonyms for ‘colorful’, and fourth a link to the Wikipedia page for Yellow Submarine.

I can’t even imagine how you get there from “3 hole punch review”

[1] https://notrashsearch.github.io/?q=3+hole+punch+review

baal80spam 4 years ago

First test search, "python random", returned just what I would expect instead of the multitude of low-quality blogs that Google Search returns. +1 from me!

thekyle 4 years ago

When I search for "python random" on Google, I get: https://docs.python.org/3/library/random.html as the first result, which is what I would expect.

mda 4 years ago

Google easily finds correct results for "python random" no spam no bs.

mellavora 4 years ago

so your google search returns random python snippets?

oburb 4 years ago

try "python.org random"

etchalon 4 years ago

This is the approach I imagine Apple would take if they were to ever launch a Search Engine. A large corpus of handpicked sites.

quickthrower2 4 years ago

This is good, but also sad in a way. It means Jo Blogg's blog won't get discovered and they may have had some good information on the topic.

One way to improve is a "bring your own list" feature, and the ability to include vetted lists. Maybe some kind of web of trust: if your friends have whitelisted a site, it is whitelisted for you too. If you find a problem with that site, you let your friend know to remove it. If they don't respond, you can remove that friend from your trusted person list (maybe they got hacked?). Then maybe you can 'follow' a few lists of famous trusted people (e.g. paulg etc.) to build up a bigger slice of the internet you can search.

A spammer will want to come in then and create something that white lists their spam sites, but they need to convince you to add their list! And when you see the spam you can just unfollow them. They can't succeed.
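The web-of-trust idea above can be sketched in a few lines. This is a hypothetical Python model, not an existing feature of the site: each `User` has their own whitelist plus a list of trusted users, the effective whitelist is the union over one hop of trust, and unfollowing someone drops their sites.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical user model for a shared-whitelist search engine."""
    name: str
    whitelist: set[str] = field(default_factory=set)
    trusted: list["User"] = field(default_factory=list)

    def effective_whitelist(self) -> set[str]:
        """Own sites plus the sites of every directly trusted user (one hop)."""
        sites = set(self.whitelist)
        for friend in self.trusted:
            sites |= friend.whitelist
        return sites

    def unfollow(self, friend: "User") -> None:
        """Stop trusting a user; their sites leave the effective whitelist."""
        self.trusted = [f for f in self.trusted if f is not friend]
```

Limiting trust to one hop means a spammer has to be followed directly to get in, which matches the point that unfollowing them is enough. Following the trust graph transitively would widen coverage, but a single compromised friend could then pull in lists you never chose.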

rank0 4 years ago

The first two results for “Rust awesome” were two ads which took up my entire screen on mobile. Both travel related.

The GitHub repo was third and had to be scrolled to.

Seems pretty trash to me.

schleck8 4 years ago

If you are annoyed by Google, take a look at Kagi and Neeva; they are new takes on search engines.

- https://neeva.com

- https://kagi.com

skinkestek 4 years ago

Haven't tried Neeva but I can vouch for Kagi.

Also search.marginalia.nu puts a smile on my face almost every time I use it :-)

(I should try Neeva, I keep hearing good things about it.)

twofornone 4 years ago

Why does Neeva require an email sign-in?

schleck8 4 years ago

They have a waitlist, it's a beta

GlitchMr 4 years ago

Pretty nifty project. I'm curious which whitelist you are using; would it be possible to list the allowed domains somewhere publicly?

One suggestion that I have is to remove w3schools.com from the whitelist. MDN is a much better source for information about web development.

rickdeveloper 4 years ago

Here's a pastebin of the list I made yesterday [0] in a format that allows uploading to your own instance of programmable search. I have updated a few things since (like removing w3schools :)).

[0] https://pastebin.com/qLC0wQ0t

throw_me_up 4 years ago

Great job! How are the ads implemented and do they cover the costs? I'm thinking of building a similar search engine for a completely different domain. I am a bit concerned about paying for it though.

rickdeveloper 4 years ago

Thanks!

Ads are added automatically by Google. The whole thing is little more than a wrapper around the 'Programmable Search Element Control API' which is an HTML element you can just insert into any site, like an iframe. Unfortunately this is the only way to make Programmable Search available at scale as the API is restricted to either 10 sites or 10K queries / day, even when paid!

There is a paid version for the HTML plugin, but that would leak the API key and so it wouldn't work as a business.
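For comparison, the Custom Search JSON API is called over plain HTTPS with the API key passed as the `key` query parameter, alongside the engine ID `cx` and the query `q`, which is why making that call directly from a static page would expose the key. A minimal Python sketch that only builds such a request URL; the key and engine ID are placeholders, and actually sending the request (and paying for it) is left out.

```python
from urllib.parse import urlencode

# Placeholders: substitute your own API key and Programmable Search engine ID.
API_KEY = "YOUR_API_KEY"
ENGINE_ID = "YOUR_ENGINE_ID"
ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query: str, start: int = 1) -> str:
    """Build a Custom Search JSON API request URL for one page of results.

    The key ends up in the URL itself, so this must run server-side
    if the key is meant to stay secret.
    """
    params = {"key": API_KEY, "cx": ENGINE_ID, "q": query, "start": start}
    return f"{ENDPOINT}?{urlencode(params)}"
```

For example, `build_search_url("python random")` yields `https://www.googleapis.com/customsearch/v1?key=YOUR_API_KEY&cx=YOUR_ENGINE_ID&q=python+random&start=1`, with the key in plain sight.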

There is an option to get a share of the revenue generated by a search engine. Maybe it's time for me to figure out how that works.

I was thinking of making a hosted, ad free, customizable version where people upload their own keys. Not sure if people would like that.

As a side-note, it's super easy to remove ads with 1 line of CSS, but I wasn't sure how Google would feel about that so it's not in the online version. TamperMonkey is an extension that allows people to insert their own CSS on different websites. Hmm.

You can view all offerings in the docs [0].

[0] https://developers.google.com/custom-search/docs/overview#su...

throw_away 4 years ago

It would be cool if I could take my existing browser history, aggregate by domain, sort by frequency & then create the necessary xml for the programmable search. Maybe with a pick & choose UI so I could decide which sites I wanted.

Right now, looking at your allow-list config, it feels a bit custom to you, but if I had an easy way to limit search to the sites I myself know and trust, I could see how that would be useful.

I know I could probably pick it out of my browser's history UI & poke it into Google's Programmable Search UI, but that seems like a hassle and a half.
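A rough sketch of that workflow in Python: count visits per domain, keep the most frequent ones, and write them out as an annotations file. Both the input (a plain list of visited URLs, since exporting history differs by browser) and the XML shape (`Annotation` elements with an `about` URL pattern and a `Label` naming the engine) are assumptions based on Programmable Search's annotation format; check the current docs before uploading, and treat the label value as a placeholder.

```python
from collections import Counter
from urllib.parse import urlparse
import xml.etree.ElementTree as ET


def top_domains(history: list[str], n: int = 120) -> list[tuple[str, int]]:
    """Count visits per host and return the n most frequent (host, count) pairs."""
    hosts = (urlparse(u).hostname for u in history)
    counts = Counter(h.lower() for h in hosts if h)
    return counts.most_common(n)


def annotations_xml(domains: list[str], label: str) -> str:
    """Emit an annotations document (assumed format) covering each domain."""
    root = ET.Element("Annotations")
    for domain in domains:
        ann = ET.SubElement(root, "Annotation", about=f"{domain}/*")
        ET.SubElement(ann, "Label", name=label)
    return ET.tostring(root, encoding="unicode")
```

The pick & choose step fits between the two functions: review the `top_domains` output, drop anything unwanted, and pass the remaining hosts to `annotations_xml`.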

yanmaani 4 years ago

How about using the Bing API? Isn't that more open?

With caching, I think you might be able to reduce the load.
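A minimal sketch of that caching idea in Python, assuming a hypothetical `fetch(query)` function that wraps whichever search API is used: identical queries (after normalising case and whitespace) are answered from memory until a time-to-live expires, so only the first one counts against the API quota.

```python
import time
from typing import Callable


class TTLCache:
    """Cache query -> results for `ttl` seconds so repeats skip the API."""

    def __init__(self, fetch: Callable[[str], list[str]], ttl: float = 3600.0):
        self.fetch = fetch  # hypothetical wrapper around the real search API
        self.ttl = ttl
        self._store: dict[str, tuple[float, list[str]]] = {}

    def get(self, query: str) -> list[str]:
        key = " ".join(query.lower().split())  # normalise case and whitespace
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh entry: no API call
        results = self.fetch(key)
        self._store[key] = (now, results)
        return results
```

This sketch keeps every entry forever; a real deployment would also cap its size, for example with LRU eviction.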

Also, why is w3fools in the list? It's an awful site.

tyingq 4 years ago

It appears to be google's custom search that you use to embed search on your own site.

https://programmablesearchengine.google.com/about/

llaolleh 4 years ago

I agree with this approach. You can't crawl everything from the get-go, so focus on a very specific domain catered to a specific set of users.

razemio 4 years ago

Googled "Best smartphone 2021", which returned crappy results. Maybe I am missing what "significant categories" actually means?

version_five 4 years ago

I agree those results are not helpful.

The poster did say it was mostly for STEM subjects though...

More importantly though, I think "Best smartphone 2021" is really a search that has been conditioned on the crap google gives back now. At best you might expect to find a "best smartphones" listicle or something.

This is just a whitelisted search, so in my 5 min playing with it, it looks like popular or consumer queries are more likely to just provide reddit or wikipedia links, while more technical searches land on SO or documentation sites.

I think with a little tuning, this approach is great. Given the modern internet and all the crap there is, a manual whitelist of sites that are actually legit is always going to be superior to an algorithmic approach.

dzhiurgis 4 years ago

So many people have listed their search engines that I feel at this point we need an aggregator.

Supermancho 4 years ago

I did a search for "eternal crusade".

The blob of ads is still at the top of the results. This is not the "no trash search" I'm looking for.

UltimateFloofy 4 years ago

so you've used google programmable search to make googling better?

bruhhh 4 years ago

The GitHub repo is just a few HTML pages; I don't see any source code. I won't trust this search engine...

rickdeveloper 4 years ago

Glad you're interested in the source code!

As explained in my other comment, this website is a wrapper around google programmable search. The actual searching happens on Google servers, and I can see why people have problems with that. The code you see on the website is the same as the repo, though. It's actually hosted by GitHub! You can verify this by opening the web inspector in any browser or looking at the `.github.io` portion of the URL.

You can learn more about Programmable Search here: https://developers.google.com/custom-search/docs/overview. NoTrashSearch uses the 'Programmable Search Element Control API', which is documented here: https://developers.google.com/custom-search/docs/element and can be used with very little code!

version_five 4 years ago

I think your site is great, I've thought about something like this before but didn't realize how simply it could be implemented.

Stupid question though: where is the list of whitelisted sites? Is that something you set up separately with Google? I scanned through the code and expected to find a list somewhere, but obviously you do it in a different way.

vizzah 4 years ago

Nothing should be in the repo. There is no source. Check author's link [1].