Hey there, I made an open-source alternative to these services. Although they worked very well, I was not confident about what they do, so I made my own and open-sourced it. It is written in Golang and is fully customizable.
You mean like Bypass Paywall Clean?
https://gitlab.com/magnolia1234/bypass-paywalls-chrome-clean
javascript:location.href='https://archive.is/?run=1&url=%27+encodeURIComponent(documen...
It's a shame Google won't let this addon be in the store.
Edit: The Digital Millennium Copyright Act (DMCA) prohibits circumventing an effective technological means of control that restricts access to a copyrighted work. I guess that would apply here.
I remember some guy that wrote a WoW bot and got sued using the DMCA, with the argument that his bot was circumventing the anti-cheat and the anti-cheat could be seen as a 'mechanism protecting copyrighted material', because it was safeguarding access to the game servers, the servers were generating parts of the game world (such as sounds) dynamically, and those were under copyright... Wild stuff.
Or, looking at it the other way, if you put a small sticker that says "do not do X" and even one person follows that, isn't that therefore an "effective" method?
It doesn't if you're not in the US.
Chrome and Firefox extension for removing paywalls: https://github.com/iamadamdev/bypass-paywalls-chrome
If you want an alternative that only requests permissions for sites with paywalls, this one is better: https://gitlab.com/magnolia1234/bypass-paywalls-firefox-clea...
I tried a Bloomberg article which gave me a "suspicious activity from your IP, please fill out this captcha" page, only the captcha was broken and didn't load.
Then I tried a WSJ article which loaded basically the same couple of paragraphs that I could get for free, but did not load any of the rest of the content.
javascript:window.location.href="https://archive.is/latest/"+location.href
It will usually open up the archived version of the article without the paywall.
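For anyone who wants the same thing outside a bookmarklet, here is a minimal sketch of how the two archive.is URL shapes mentioned in this thread could be built. The function names are made up for illustration, and the host and URL patterns are simply copied from the bookmarklets above, so treat them as assumptions about how the service behaves.

```python
from urllib.parse import quote

# Host taken from the bookmarklets in this thread.
ARCHIVE_HOST = "https://archive.is"

# Characters that JavaScript's encodeURIComponent leaves unescaped, beyond
# the ones urllib's quote() already keeps (letters, digits, "_", ".", "-", "~").
_JS_UNESCAPED = "!*'()"


def latest_snapshot_url(page_url: str) -> str:
    """Mirror the '/latest/' bookmarklet: the page URL is appended as-is."""
    return ARCHIVE_HOST + "/latest/" + page_url


def submit_url(page_url: str) -> str:
    """Mirror the '?run=1&url=' bookmarklet: the page URL is percent-encoded."""
    return ARCHIVE_HOST + "/?run=1&url=" + quote(page_url, safe=_JS_UNESCAPED)
```

The first form asks for the most recent existing snapshot, the second asks the service to archive the page, at least as far as the bookmarklets suggest.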
Ladder applies custom rules to inject code. It basically modifies the origin website to remove the paywall. It rewrites (most of) the links and assets in the origin's HTML to avoid CORS errors by routing them through the local proxy.
Ladder uses Golang's fiber/fasthttp, which is significantly faster than Python (biased opinion).
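To make the link rewriting a bit more concrete, here is a rough sketch of the general idea in Python. It is not ladder's actual code (that is written in Go). The proxy address, the "proxy base + absolute URL" layout, and the regex-based approach are all assumptions for illustration; a real implementation would more likely use a proper HTML parser.

```python
import re
from urllib.parse import urljoin

# Hypothetical address of the locally running proxy.
PROXY_BASE = "http://localhost:8080/"

# Matches href="..." and src="..." attributes (double-quoted values only).
_ATTR_RE = re.compile(r'\b(href|src)="([^"]*)"', re.IGNORECASE)

# Values that should not be routed through the proxy.
_SKIP_PREFIXES = ("#", "data:", "javascript:", "mailto:")


def rewrite_links(html: str, origin_url: str) -> str:
    """Point links and assets at the proxy instead of the origin site."""

    def repl(match):
        attr, url = match.group(1), match.group(2)
        if url.startswith(_SKIP_PREFIXES):
            return match.group(0)  # leave anchors and inline data untouched
        absolute = urljoin(origin_url, url)  # resolve relative paths
        return attr + '="' + PROXY_BASE + absolute + '"'

    return _ATTR_RE.sub(repl, html)
```

Because every asset request then goes to the proxy's own origin, the browser no longer treats them as cross-origin requests, which is the CORS point made above.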
Several small features like basic auth ...
I have a feeling that this performance difference is practically imperceptible to regular humans. It's like optimizing CPU performance when the bottleneck is the database.
* I say "yet" because there could conceivably be ways to mitigate this, but afaik most would involve individual deals/contracts between every search engine & every subscription website - Google's monopoly simplifies this somewhat, but there's not much of an incentive from Google's perspective to facilitate this at any scale.
Is it actually illegal anywhere to bypass a paywall?
The obvious thing is to mock Googlebot, but site owners can check that the request isn't coming from a Google-published IP and see that it's a fake, right?
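A rough sketch of what such a server-side check could look like, assuming the site compares the connecting address (not a client-supplied header) against a list of crawler ranges. The function name is hypothetical and the single range below is only an illustrative sample; the real list is published by Google and changes over time.

```python
import ipaddress

# Illustrative sample only, NOT the authoritative list of Googlebot ranges.
SAMPLE_GOOGLEBOT_RANGES = [ipaddress.ip_network("66.249.64.0/19")]


def looks_like_real_googlebot(user_agent: str, remote_ip: str) -> bool:
    """Return True only if the UA claims Googlebot AND the IP is in a known range."""
    if "googlebot" not in user_agent.lower():
        return False
    ip = ipaddress.ip_address(remote_ip)
    return any(ip in network for network in SAMPLE_GOOGLEBOT_RANGES)
```

With a check like this, a spoofed user agent arriving from a residential or cloud address would fail, since the address of the TCP connection itself cannot be faked by setting a header.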
> https://github.com/kubero-dev/ladder#environment-variables
> USER_AGENT User agent to emulate Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
> X_FORWARDED_FOR IP forwarder address 66.249.66.1
> RULESET URL to a ruleset file https://raw.githubusercontent.com/kubero-dev/ladder/main/rul... or /path/to/my/rules.yaml
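As a sketch of how settings like these could end up on the outgoing request: the snippet below reads the two variables quoted above and turns them into request headers. Using the quoted values as fallbacks is an assumption, the function name is made up, and this is not ladder's own (Go) code.

```python
import os
from typing import Mapping, Optional

# Values copied from the README excerpt quoted above; treating them as
# fallbacks when the variable is unset is an assumption of this sketch.
FALLBACKS = {
    "USER_AGENT": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "X_FORWARDED_FOR": "66.249.66.1",
}


def build_upstream_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the headers sent to the origin site from environment settings."""
    env = os.environ if env is None else env
    return {
        "User-Agent": env.get("USER_AGENT", FALLBACKS["USER_AGENT"]),
        # Only a header: it does not change the address the origin actually sees.
        "X-Forwarded-For": env.get("X_FORWARDED_FOR", FALLBACKS["X_FORWARDED_FOR"]),
    }
```

Note that this only covers sites that trust the headers. A site that checks the real connecting IP would still see the proxy's address.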
Just because they can doesn't mean they will... Also, most "site owners" are (by this point) completely different people from the "site operators" (whom I take to be the 'engineers' who can indeed check these IP things).
But I feel a bit uneasy about letting someone inject code into sites I view.
> Remove CORS headers from responses, assets, and images ...
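For a sense of how small that particular change is, here is a minimal sketch of stripping CORS headers from a response. Which headers the project actually removes is not stated here, so the "anything starting with Access-Control-" rule and the function name are assumptions for illustration.

```python
# Assumption: "CORS headers" means the Access-Control-* response headers.
CORS_PREFIX = "access-control-"


def strip_cors_headers(headers: dict) -> dict:
    """Return a copy of the response headers without Access-Control-* entries."""
    return {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith(CORS_PREFIX)
    }
```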
I use txtify.it
The one downside was the lack of transparency. It was not clear which code was added or removed on the site you were looking at.
Wouldn't we always require a paid account to cache the HTML through (the SciHub model)?
Edit: Access to other projects & domains was apparently restored some time after: https://twitter.com/thmsmlr/status/1719480558932148272
> Freedom of information is an essential pillar of democracy
However, this reads like this tool saves democracy by letting you bypass a crappy pay wall on a site you visit once a year, and that whoever wants to get paid for their published content online is an enemy of democracy.
Maybe even better: magnolia outperforms archive.is on paywalls
Does anyone have any insight into how it would take Vercel hundreds of hours of support time? https://twitter.com/rauchg/status/1718680650067460138
Access to private property is an essential pillar of democracy and the safe proliferation of ideas. While property owners have legitimate financial interests, it is crucial to strike a balance between property and the public's right to access property. The proliferation of locks on doors raises concerns about the erosion of this fundamental freedom, and it is imperative for society to find innovative ways to preserve access to people's homes and workspaces without compromising the sustainability of property ownership. In a world where property should be shared and not commodified, locks should be critically examined to ensure that they do not undermine the principles of an open and informed society.
That said, I work in news media (and have been involved in building paywalls at different orgs - NYT and New Yorker). I know how money from these directly supports journalism: salaries and the costs associated with any story.
If you are skipping paywalls a lot, I would encourage you to pay for a subscription to at least one or two news sites you respect - bonus points if it's a small or medium local newsroom that benefits!
For me that has been: NYTimes, New Yorker, Wired, Teen Vogue, and my wife's hometown paper in Illinois.
Edit: apparently it is down now.
402: PAYMENT_REQUIRED Code: DEPLOYMENT_DISABLED ID: fra1::8wkv2-1699275385535-39dedae23d6a
It was just an effective way to get through substack/medium in my experience.
> Freedom of information is an essential pillar of democracy and informed decision-making. While media organizations have legitimate financial interests, it is crucial to strike a balance between profitability and the public's right to access information. The proliferation of paywalls raises concerns about the erosion of this fundamental freedom, and it is imperative for society to find innovative ways to preserve access to vital information without compromising the sustainability of journalism.
I think it’s quite reasonable that they blocked the account rather than the project. You wouldn’t have got that level of service from big tech.
> Hey Thomas. Your paywall-bypassing site broke our ToS and created hundreds of hours of support time spent on all the outreach from the impacted businesses.
> Our support team reached out to you on Oct 14th to let you know this was unsustainable and to try to work with you.
“Other than the completely unreasonable thing, they seemed pretty reasonable”.
I mean he's not being a complete asshole in the discussion here[1], but nuking the entire customer's account for a ToS violation on one specific product still isn't a reasonable move. Yes Google and Amazon do it routinely, but if you're not a trillion dollar monopoly and you care about your business' reputation, you shouldn't behave like those.
[1] CEOs behaving like assholes in Twitter discussions isn't supposed to be the norm.
I do wish there was a better way for me to share an account across multiple news sites that let me properly pay for good journalism without these issues. I do subscribe to a very local news source that seems to handle this a lot better, but they also don't paywall (most) of their primary content.
In the meantime I do find it strange that so many sites wish to gain the advantage of advertising that they have put up an article on the web, without actually providing that article. I have no issue with paid content, but when that content gets listed in search engine results and social media links like a web page, but clicking on it does not behave like a web page, it feels like something has broken from the idea of the linkable World Wide Web.
Instead I just don't pay anyone, turn back when I encounter a paywall and look for someone's summary if I'm really interested.
The problem creating such a service is that most media houses believe that their content is the best thing since sliced bread and thus they often don't want to partner. Even though most of their content isn't that unique. Of course, some publications do have unique content, e.g. nyt, bloomberg.
I could see artifact being an interesting company to tackle this though (https://artifact.news/). They are already sending traffic to news sites and only serving what the user wants. If they now let me bypass paywalls for $20 that would be nice.
Those rules still need to be built up (by me or the open-source community).
That is a looter's mentality, sorry to say. Paywall-jumping software is like robbing a disabled old veteran on public transport. These are the last blows to finish off what was once good journalism.
https://massivelyop.com/2020/02/28/lawful-neutral-cheating-c...
(a-10) For purposes of subsection (a), accessing a computer network is deemed to be with the authorization of a computer's owner if:
(1) the owner authorizes patrons, customers, or guests to access the computer network and the person accessing the computer network is an authorized patron, customer, or guest and complies with all terms or conditions for use of the computer network that are imposed by the owner;
Listed under spam policies:
https://developers.google.com/search/docs/essentials/spam-po...
"Cloaking refers to the practice of presenting different content to users and search engines with the intent to manipulate search rankings and mislead users"
The top of the page says sites that violate the policies may "rank lower or not appear in results at all".
I'd rather have a system that was just a cross-website web account.
javascript:location.href='https://archive.is/?run=1&url='+encodeURIComponent(document.location.href)
I’m just responding to your last sentence: why would you go out of your way to say it is reasonable to block the account rather than the project?
I can understand locking the account just as the “lazy default” but I would not call it in any way reasonable - but you did, so I’m curious.
If that is reasonable, what would you consider unreasonable?
(Because to me, the obviously reasonable thing to do would be to block the project and not his entire account.)
Just because it's their right to act this way doesn't mean it's not everyone else's exactly equal right to judge that and avoid them.
Also, the “offending part” has been well known to them for several years (they even claim it cost them a lot in support over the years), so it's not like they received a DMCA notice and had to take everything down in a hurry. They knew exactly what product they wanted to stop, because they stopped it for being too costly. The fact that it violated their TOS is just the legal justification for the closure, not its source.
After digging upwards, additional support seems like an option delivered too late, and too outside of 'proper' channels - if you want a sanitized rant I can probably deliver it tomorrow, but too-little too-late is where vercel has landed in the operations team.
Mischaracterization much? He was on vacation.
How many of us read every email for personal projects that comes in when we're halfway across the world and supposed to be relaxing?
> I’m sympathetic to vercel here. Honestly very reasonable to take down 12ft with no response in 2 weeks.
> Taking everything else is the weird part.
Archive is intentionally violating copyright, and needs to know which country you're coming from, so they can serve you content from a country that isn't yours. They need that information to protect the service and keep it running.
[0] https://news.ycombinator.com/item?id=38171524
[1] https://en.wikipedia.org/wiki/Archive.today#Cloudflare_DNS_a...
[2] https://www.reddit.com/r/DataHoarder/comments/12trawt/has_an...
https://x.com/archiveis/status/1018691421182791680?s=20
https://news.ycombinator.com/item?id=36971650
It's kind of an "everybody sucks" situation and there are no real winners.
Archive.[whatever] set up a server system to give you access from a country not your own, so that abusers have a harder time archiving illegal content and then instantly reporting it to get the entire archive taken down. The operator uses EDNS (the client subnet information) to do this, but CF doesn't pass that along since it's a privacy issue to them.
So archive.[whatever] doesn't work for CF DNS because he doesn't want to risk bad actors being able to take down the archive.
Sensible reasons on both sides, especially for a service like archive.[whatever], and the real losers in this situation are the users.
There's some issue with DNS over HTTPS, so you have to whitelist their sites in your settings, or turn off DNS over HTTPS (which I don't recommend).
To whitelist, on Firefox: Hamburger menu > settings > privacy and security > DNS over HTTPS > Manage exceptions > Add "archive.is", "archive.ph", and "archive.today"
In my case I added override rules in my opnsense router so that archive.is .ph .today .md are all resolved by a different nameserver.
Disabling DOH can appear to fix it only in the happenstance case that the fallback plain dns doesn't end up using cloudflare, or doesn't use it first.
After install go to "Filter Lists" > Import ... > and add the url of the "list"... which is actually from a different repo: https://gitlab.com/magnolia1234/bypass-paywalls-clean-filter...
Note: this apparently works for fewer sites than the linked extension.
It used to be in Mozilla's add-on store, but they removed it, so you have to install it via dev mode.
https://gitlab.com/magnolia1234/bypass-paywalls-clean-filters/-/raw/main/bpc-paywall-filter.txt
Still, this is great to know because it can then be used on Firefox mobile.
https://mullvad.net/en/help/dns-over-https-and-dns-over-tls/
No issues with any archive-sites.
Aside from not being censored at all, thereby enabling visiting sites which are blocked at DNS-level in some locations, there are several options for adblocking at DNS-level, too. Often eliminating the need for a Proxy or VPN to get access, with optional Adblock as a service.
For free.
It's nice.
There is no special reason not to use cloudflare dns in general though.
The problem is only between Cloudflare and archive.is (and its aliases), and it's hard to say if either side is wrong, except for the fact that either or both of them could figure out some special exception where they recognize each other's traffic if they cared to. Cloudflare is not censoring archive.is, for example, and is not doing anything wrong.
Then I tried Mullvad DNS: the speed was still there, the 'lawful' censoring was gone, the problems with archive sites ceased to exist, and I got somewhat configurable adblocking-as-a-service.
It's a seamless 'plugin'-solution, not degrading anything.
Triple-A!