1. What is so bad about Python specifically?
2. If you're worried about the root privileges required to modify the hosts file, you can use AppArmor to put the thing on a leash.
Modifying system hosts configuration requires privileged file system access.
The mindset here should be default deny.
The list of hosts to exclude comes from several sites here: https://github.com/anned20/begoneads/blob/2c90fcee221edf71f8...
The actual application of the hosts file is here: https://github.com/anned20/begoneads/blob/2c90fcee221edf71f8...
I missed something though. Is a simple domain name per line enough to send that content to /dev/null? I haven’t used that form in /etc/hosts.
My primary concern was that this technique could be used to send ad traffic to a site that returns 404 but gathers metrics on the web regardless.
To be honest, this code is over-engineered. It could be a single script with a handful of functions. At the same time, it’s missing functionality such as deduplicating entries from the different lists.
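On the format question and the deduplication point: a hosts(5) line is an address followed by one or more names, so a list with one bare domain per line needs an address prepended before it can go into /etc/hosts. Below is a hypothetical sketch, not begoneads' actual code. It assumes each upstream list is either in hosts format or has one bare domain per line, with `#` comments. It merges the lists, normalizes case and trailing dots, skips the usual localhost-style names, and emits each domain once as a `0.0.0.0` entry.

```python
import ipaddress
from typing import Iterable

# Names commonly present in upstream hosts files that must never be sinkholed.
SKIP = {"localhost", "localhost.localdomain", "local", "broadcasthost",
        "ip6-localhost", "ip6-loopback"}


def _is_ip(token: str) -> bool:
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def parse_line(line: str) -> list[str]:
    """Return the hostnames on one blocklist line (hosts format or bare domain)."""
    line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
    if not line:
        return []
    fields = line.split()
    # hosts format is "<address> name [alias ...]"; otherwise take the bare domain.
    names = fields[1:] if _is_ip(fields[0]) else fields[:1]
    return [n.lower().rstrip(".") for n in names]


def merge(lists: Iterable[Iterable[str]]) -> list[str]:
    """Merge several blocklists into sorted, deduplicated 0.0.0.0 entries."""
    domains: set[str] = set()
    for lines in lists:
        for line in lines:
            for name in parse_line(line):
                if name and name not in SKIP and not _is_ip(name):
                    domains.add(name)
    return [f"0.0.0.0 {d}" for d in sorted(domains)]
```

Usage would be something like `merge([lines_from_list_a, lines_from_list_b])`. Because a set collects the names, a domain that appears in several lists, or in different case, ends up as a single line.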
If the goal of the project is actual adoption, a native executable without external dependencies would have been a much better option.
Browsers cache DNS results and use outside DNS servers regardless of the hosts file. Chrome, and sometimes Safari, don't honor the hosts file 100% of the time. Every once in a while I google around to try to restore my control and tweak my browser settings, but I have yet to find anything that makes using hosts files bulletproof.
Does anyone know of an up-to-date list for blocking social networks?
Here's another one to toss on the pile (works, I use it, supports wildcards, *nix only): https://github.com/jakeogh/dnsgate
Hosts files will only affect the host (workstation/desktop/laptop etc) they're installed on.
Things like Pi-hole try to make it easy to apply the solution to every device on your network. Even in households these days, devices can number in the dozens, which makes it impractical to manage hosts files on all of them (this includes devices such as phones, whose hosts files are typically infeasible to modify).
Setup: a 62,448-line /etc/hosts file (63,370 actual '0.0.0.0' entries), resolving 'www.google.com' 100 times, on Debian GNU/Linux on a ThinkPad with spinning rust.
The short version has 32 lines, with 14 active entries, mostly defaults and local systems.
Short hosts:
$ for i in {1..100}; do time host www.google.com; done 2>&1| grep real | sed 's/^real[ ]*//; s/0m//; s/s$//' | mean
n: 100, sum: 2.209, min: 0.015, max: 0.052, mean: 0.022090, median: 0.02, sd: 0.007450
%-ile: 5: 0.016, 10: 0.016, 15: 0.016, 20: 0.016,
25: 0.0165, 30: 0.02, 35: 0.02, 40: 0.02, 45: 0.02,
55: 0.02, 60: 0.02, 65: 0.02, 70: 0.021, 75: 0.022,
80: 0.0245, 85: 0.029, 90: 0.033, 95: 0.0385
Big hosts:
$ for i in {1..100}; do time host www.google.com; done 2>&1| grep real | sed 's/^real[ ]*//; s/0m//; s/s$//' | mean
n: 100, sum: 2.517, min: 0.016, max: 0.063, mean: 0.025170, median: 0.023, sd: 0.009818
%-ile: 5: 0.016, 10: 0.016, 15: 0.016, 20: 0.016,
25: 0.017, 30: 0.0185, 35: 0.02, 40: 0.021, 45: 0.022,
55: 0.024, 60: 0.0255, 65: 0.0265, 70: 0.028, 75: 0.029,
80: 0.03, 85: 0.0325, 90: 0.0395, 95: 0.042
The difference of the means is 0.003080 s; call it 3 ms slower for the large hosts file. ("mean" is an awk script for computing univariate moments.)
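For anyone who wants to repeat the measurement without the custom `mean` awk script, here is a minimal Python stand-in. It assumes bash's `time` output format (`real\t0m0.022s`) and reports n, sum, min, max, mean, median and sd. The thread doesn't say whether the awk script uses the sample or the population standard deviation, so the sample version here is an assumption. Percentiles are left out.

```python
import re
import statistics

# Matches bash's `time` summary line, e.g. "real\t0m0.022s".
REAL_RE = re.compile(r"real\s+(\d+)m([\d.]+)s")


def parse_real_times(lines):
    """Extract wall-clock seconds from `time` output, ignoring all other lines."""
    out = []
    for line in lines:
        m = REAL_RE.fullmatch(line.strip())
        if m:
            out.append(int(m.group(1)) * 60 + float(m.group(2)))
    return out


def summarize(samples: list[float]) -> dict[str, float]:
    """Basic univariate summary; sd is the sample sd (needs at least 2 samples)."""
    return {
        "n": len(samples),
        "sum": sum(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "sd": statistics.stdev(samples),
    }
```

Feeding it the raw `for ... done 2>&1` output via `summarize(parse_real_times(sys.stdin))` makes the grep and sed steps unnecessary, because the regex skips the `host` lines. The 3 ms figure is simply 0.025170 - 0.022090 = 0.003080 s.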
As others have mentioned, the main benefit of a centralised LAN service is that all devices on the LAN are protected. The hosts file on this system (a laptop) is effective regardless of where I am. It also predates my configuring OpenWrt's adblock package about a month ago, though I'd had a hand-rolled dnsmasq configuration before that. The laptop's hosts file is almost certainly a few years out of date; that's another occupational hazard of such things.
The OpenWrt solution runs on the Knot Resolver (kresd) caching nameserver, and I haven't noticed any lag from it. The blocklist there is currently 231,627 hosts/domains (roughly doubled: specific + wildcard matches), from 0-29.com to zzzpooeaz-france.com.
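The "specific + wildcard" split matters because an /etc/hosts entry matches only the exact name, while a resolver-side domain rule can also cover every subdomain. Here is a minimal sketch of that difference. It illustrates the matching idea only and is not how kresd or OpenWrt's adblock package implement it. The function and parameter names are hypothetical.

```python
def is_blocked(name: str, exact: set[str], wildcard: set[str]) -> bool:
    """Exact rules match one name; wildcard rules also match every subdomain."""
    name = name.lower().rstrip(".")
    if name in exact:  # the only kind of rule an /etc/hosts entry can express
        return True
    labels = name.split(".")
    # Try "a.b.example.com", "b.example.com", "example.com", "com" in turn.
    return any(".".join(labels[i:]) in wildcard for i in range(len(labels)))
```

Matching whole labels, rather than doing a plain string `endswith`, keeps a rule for `tracker.net` from also catching `nottracker.net`.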
Another experience I had was that certain sites failed to work correctly. I didn’t do extensive testing, but when I disabled the hosts blocking the sites worked, and when I enabled it they broke. These were companies with which I was trying to do account-related business, so it wasn’t just that something didn’t render correctly: it actively prevented me from updating my accounts when I tried to submit requests.
I still like the approach and will continue to use it, but it hasn’t been frictionless.
Then I removed the hosts file, and it worked instantly.
Maybe for a static workstation it wouldn't be bad, but for a laptop or something that loses link frequently, it could be an issue.
It’s a ridiculous comparison. I am not a friend to intrusive ad-tech, but making a moral equivalence to slavery is to trivialize slavery. It’s like comparing parking tickets to the death penalty.
I side with the adblock solutions in this war.
</2 cents>
I'm glad Firefox is now blocking third-party trackers by default (not that I needed it for myself, but it's important for others to have this).
Yes, downloading hosts files from third-party sites is kind of sketchy. But using Python to do it is what you're worried about?
Your link was to an informative but still sloppily written article, and in this context your summary of the article isn't clarifying.
To write clearly, people need to stop throwing around the word "localhost", because at the level of n.n.n.n there are no names, only numerical addresses, and localhost is a name, one defined in a text file. 127.0.0.1 points not to localhost but to the local host, always; localhost (the name) points to 127.0.0.1 iff it is defined to (which should be all the time).
What I learned from the article is that a server on the local host listening on 0.0.0.0 will listen to everything it can hear, i.e. on every local interface. But the question in this context is: where will a packet sent to 0.0.0.0 go?
The point that 0.0.0.0 is not routable does not answer the question, because 127.0.0.1 is also not routable; however, 127.0.0.1 will arrive someplace: at the local host. The question is whether 0.0.0.0 will also arrive in "the pool o' packets", the place where packets arrive on the local host before their disposition is determined (a. routed out of the local TCP/IP pool o' packets, b. listened to within it, or c. dropped on the floor), because routing isn't only what routers do; it's what TCP/IP does within a local host. (By the way, the article also describes that, in the context of a route, 0.0.0.0 means the default route of last resort, which is also not the same as a packet destination address.)
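Python's standard `ipaddress` module gives a quick way to tell these meanings apart. 0.0.0.0 is the "unspecified" address, which a server binds to when it wants to listen on every interface. 127.0.0.1 is one of the loopback addresses, which cover all of 127.0.0.0/8. Used as a route, 0.0.0.0/0 is the default route. The sketch below only classifies addresses. It does not show where a packet sent to 0.0.0.0 actually goes, because that depends on the operating system's network stack.

```python
import ipaddress


def classify(addr: str) -> dict[str, bool]:
    """Report whether an address is the unspecified address or a loopback address."""
    ip = ipaddress.ip_address(addr)
    return {"unspecified": ip.is_unspecified, "loopback": ip.is_loopback}


# As a *route*, 0.0.0.0/0 is the default route: every IPv4 address falls inside it.
DEFAULT_ROUTE = ipaddress.ip_network("0.0.0.0/0")
```

For example, `classify("0.0.0.0")` reports unspecified but not loopback, and `classify("127.8.9.10")` reports loopback, because the whole /8 is reserved for it.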
That's not exactly what happened here.
OP repeated the beginning of a truism: "ads fund the development of the web while at the same time causing a whole host of severe problems for its users (individually and as a whole)."
OP left a hole where the italicized part of the truism should be.
OP asked HN to fill that hole.
On a site devoted to tech/software, it's either low effort or bad faith to ask others to fill a hole in such a well-known truism.
In light of this I offer up a countervailing law, "Loki's Law:"
"If you leave a Hitler-sized hole in your argument, expect it to be filled accordingly."
Even if he disagreed with Steven's solution and chose to reimplement it, it would have been interesting to understand his motivation. I'm not saying he shouldn't have, just that it would be nice to know what motivated his design and why he thinks it's better to redo ...
Blocking ads does not drive businesses to be more "honest". They'll just spend more on PR and influencers. And given how hostile this community is to ads, and perhaps even to marketing overall (how YC ever backed a marketing or ad startup is beyond me), companies already realize that a fawning TC article obtained through connections, favors and PR chicanery is going to be more effective than an ad campaign, even though the ad campaign is more honest, upfront and transparent about its agenda.
With the proper permissions, something like this should be OK, but I'd tread lightly, especially with something that dynamically updates your hosts file.
Naturally it's on the user to properly configure the permissions.
I'm not saying this isn't a worthy project, I'm just adding to the discussion on why people should be cautious when running scripts with root permissions.
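Here is one way to tread lightly, sketched under stated assumptions. First, rewrite only a block between hypothetical `# BEGIN`/`# END` marker lines. Second, refuse any entry that isn't a plain `0.0.0.0 name` sinkhole, so a tampered list can't point a real domain at someone else's server. Third, replace the file atomically, so a crash mid-write can't leave a truncated /etc/hosts. The marker names and the validation pattern are illustrative, not anything begoneads is known to use.

```python
import os
import re
import tempfile

# Hypothetical marker lines delimiting the tool-managed part of the file.
BEGIN = "# BEGIN hypothetical-blocklist"
END = "# END hypothetical-blocklist"
# Accept only sinkhole entries; anything mapping a name to a real IP is refused.
SAFE_ENTRY = re.compile(r"0\.0\.0\.0 [a-z0-9.-]+")


def render(original: str, entries: list[str]) -> str:
    """Return new file text with the managed block replaced (moved to the end)."""
    for entry in entries:
        if not SAFE_ENTRY.fullmatch(entry):
            raise ValueError(f"refusing suspicious entry: {entry!r}")
    lines = original.splitlines()
    if BEGIN in lines and END in lines and lines.index(BEGIN) < lines.index(END):
        lines = lines[:lines.index(BEGIN)] + lines[lines.index(END) + 1:]
    return "\n".join(lines + [BEGIN, *entries, END]) + "\n"


def write_atomically(path: str, text: str) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; hosts must stay world-readable
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise
```

Run as root, the update would be along the lines of `write_atomically("/etc/hosts", render(current_text, entries))`. Everything outside the markers is left as the user wrote it.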
I’m just guessing, but it might be easier to programmatically install a systemd timer (and make sure it runs) than to do the same for the old, conventional crond?
Yes, obviously cron would work. If systemd can do what this dev needs, what's the harm?
Along similar lines, I think I heard that 30% of detected malware was signed with a “trusted” authority last year.
* Not running if the network is down.
* Not running if the download path isn't available.
* Running if the machine was off during the scheduled time.
* Monitoring and retry logic.
* Logging to syslog.
* Resource constraints.
* Random wait.
It ends up being a lot of code factored out of the actual application.
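systemd timers handle some of the items in that list declaratively, for example `Persistent=` for runs missed while the machine was off and `RandomizedDelaySec=` for the random wait. The sketch below hand-rolls two of the others, the random wait and retry-with-logging, around a hypothetical `task` callable. It shows the kind of code that ends up living outside the application when the scheduler doesn't provide it. The delays and the logger name are arbitrary assumptions.

```python
import logging
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("hosts-updater")  # hypothetical logger name


def run_with_retries(
    task: Callable[[], T],
    attempts: int = 3,
    max_jitter: float = 300.0,
    backoff: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Wait a random interval, then run task, retrying on network-type errors."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    # Random wait so many machines don't hit the list servers at the same moment.
    sleep(random.uniform(0, max_jitter))
    last_exc: Optional[OSError] = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except OSError as exc:  # URLError, ConnectionError and timeouts are OSError subclasses
            last_exc = exc
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                sleep(backoff * attempt)  # linear backoff between attempts
    assert last_exc is not None
    raise last_exc
```

Routing these warnings to syslog takes one more standard-library line on Linux: `log.addHandler(logging.handlers.SysLogHandler(address="/dev/log"))`, after `import logging.handlers`. The injectable `sleep` parameter exists only so the retry logic can be exercised without actually waiting.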