Show HN: Ladder, open source alternative to 12ft.io and 1ft.io (github.com)

377 points by 2cpu1container 2 years ago | 150 comments

Hey there

I made an open source alternative to these services. Although they worked very well, I was not confident about what they do under the hood, so I made my own and open-sourced it.

It is written in Golang and is fully customizable.

ktpsns 2 years ago |

I get the feeling that these features should be part of a browser extension, the same way there are AdBlock extensions. I guess the reason it isn't one is the author's "personal preference", or is there some technical reason?

sva_ 2 years ago | |

> these features should be part of a browser extension

You mean like Bypass Paywalls Clean?

https://gitlab.com/magnolia1234/bypass-paywalls-chrome-clean

Beijinger 2 years ago | | |

It does not work so well anymore. Better to use a bookmarklet:

javascript:location.href='https://archive.is/?run=1&url='+encodeURIComponent(documen...

NelsonMinar 2 years ago | | |

This works quite well and probably covers 90% of my needs. For the other 10% I still use archive.today or 12ft (RIP).

It's a shame Google won't let this addon be in the store.

radlad 2 years ago | | |

Is there a Firefox version?

bilekas 2 years ago | |

I don't know for sure, but I would imagine more severe action is taken against circumventing paid material (content behind a paywall) than against blocking ads on free, ad-supported content.

Edit: The Digital Millennium Copyright Act (DMCA) prohibits circumventing an effective technological means of control that restricts access to a copyrighted work. I guess that would apply here.

mckirk 2 years ago | | |

Given how liberally the DMCA is applied, you definitely don't want to be on the wrong side of that.

I remember some guy that wrote a WoW bot and got sued using the DMCA, with the argument that his bot was circumventing the anti-cheat and the anti-cheat could be seen as a 'mechanism protecting copyrighted material', because it was safeguarding access to the game servers, the servers were generating parts of the game world (such as sounds) dynamically, and those were under copyright... Wild stuff.

nerdbert 2 years ago | | |

Isn't anything that can be circumvented ineffective?

Or, looking at it the other way, if you put a small sticker that says "do not do X" and even one person follows that, isn't that therefore an "effective" method?

Aaargh20318 2 years ago | | |

> The Digital Millennium Copyright Act (DMCA) prohibits circumventing an effective technological means of control that restricts access to a copyrighted work. I guess that would apply here.

It doesn't if you're not in the US.

nottheengineer 2 years ago | | |

Good old section 1201. The EFF has been fighting it for a while, but hasn't had much success unfortunately.

overtomanu 2 years ago | |

I know of the extension below for this purpose; there are probably many more if we search for them.

Chrome and Firefox extension for removing paywalls: https://github.com/iamadamdev/bypass-paywalls-chrome

user764743 2 years ago | | |

This extension asks for a lot of permissions it shouldn't need.

If you want an alternative that only requests permissions for sites with paywalls, this one is better: https://gitlab.com/magnolia1234/bypass-paywalls-firefox-clea...

nfriedly 2 years ago |

On the upside, the Docker image is fairly easy to get running. But on the downside, I'm zero for two at actually using it.

I tried a Bloomberg article which gave me a "suspicious activity from your IP, please fill out this captcha" page, only the captcha was broken and didn't load.

Then I tried a WSJ article which loaded basically the same couple of paragraphs that I could get for free, but did not load any of the rest of the content.

fyzix 2 years ago |

I'm very new to this kind of service, but do you have to write your own rulesets for each site you want to bypass? The repo doesn't seem to include much...

2cpu1container 2 years ago | |

Yes, the one I provide is still pretty empty. I plan to build one that can be used as a starting point or as a default.

KoftaBob 2 years ago |

Create a browser bookmark and set this as the URL of the bookmark:

javascript:window.location.href="https://archive.is/latest/"+location.href

It will usually open the archived version of the article without the paywall.

SigmundurM 2 years ago |

You mention 13ft as another open source inspiration. How is Ladder improving on what 13ft does?

2cpu1container 2 years ago | |

I did try 13ft, but it is missing several things.

Ladder applies custom rules to inject code. It basically modifies the origin website to remove the paywall. It rewrites (most of) the links and assets in the origin's HTML to avoid CORS errors by routing them through the local proxy.

Ladder uses Golang's fiber/fasthttp, which is significantly faster than Python (biased opinion).

It also has several small features like basic auth ...
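To illustrate the link rewriting described above, here is a minimal Go sketch. It is not Ladder's actual code: the helper name `rewriteLinks`, the proxy prefix `http://localhost:8080/`, and the regex-based matching are all assumptions made for illustration (a real implementation would more likely walk the parsed HTML).

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// attrRe matches href="..." and src="..." attributes with double-quoted values.
// A regex is a simplification: it misses single-quoted values, srcset, CSS url(), etc.
var attrRe = regexp.MustCompile(`(href|src)="([^"]*)"`)

// rewriteLinks resolves every href/src against pageURL and prefixes the result
// with proxyPrefix, so the browser fetches it through the local proxy.
// Hypothetical helper for illustration, not part of Ladder.
func rewriteLinks(html, pageURL, proxyPrefix string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	out := attrRe.ReplaceAllStringFunc(html, func(m string) string {
		parts := attrRe.FindStringSubmatch(m)
		ref, err := url.Parse(parts[2])
		if err != nil {
			return m // leave unparsable values untouched
		}
		abs := base.ResolveReference(ref)
		if abs.Scheme != "http" && abs.Scheme != "https" {
			return m // skip mailto:, javascript:, data: and similar
		}
		return fmt.Sprintf(`%s="%s%s"`, parts[1], proxyPrefix, abs.String())
	})
	return out, nil
}

func main() {
	page := `<a href="/news/1">story</a> <img src="https://cdn.example.com/a.png">`
	out, err := rewriteLinks(page, "https://example.com/index.html", "http://localhost:8080/")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because every asset then loads from the proxy's own origin, the browser no longer treats those requests as cross-origin, which is the CORS point made above.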

withinboredom 2 years ago | | |

> The ladder uses Golang's fiber/fasthttp, which is significantly faster than Python

I have a feeling that this performance difference is practically imperceptible to regular humans. It's like optimizing CPU performance when the bottleneck is the database.

oh_sigh 2 years ago | | |

If the paywall is implemented in client code, then usually just disabling javascript for the site is enough to let you view it. If it is implemented server side, then there usually isn't a way around it without an account.

pacifika 2 years ago |

Open source makes it easy for the cat in the cat-and-mouse game, right?

lucideer 2 years ago | |

There's no real cat & mouse game here (yet*) - sites don't do anything to mitigate this. Sites deliberately make their content available to robots to gain SEO traction: they're left with the choice of allowing this kind of bypass or hurting their own SEO.

* I say "yet" because there could conceivably be ways to mitigate this, but afaik most would involve individual deals/contracts between every search engine & every subscription website - Google's monopoly simplifies this somewhat, but there's not much of an incentive from Google's perspective to facilitate this at any scale.

tiagod 2 years ago | | |

Google publishes IP ranges for Googlebot. You can also reverse-lookup the request IP address: the resolved domain should in turn resolve to the original address.
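The reverse-then-forward lookup described above can be sketched in Go roughly as follows. This is a hedged illustration, not code from any of the tools discussed: the `resolver` type and `isGooglebot` name are made up here, and the accepted hostname suffixes (`.googlebot.com`, `.google.com`) are an assumption that should be checked against Google's current documentation.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// resolver abstracts the two DNS lookups so the check can be exercised offline.
type resolver struct {
	lookupAddr func(ip string) ([]string, error)   // reverse (PTR) lookup
	lookupHost func(host string) ([]string, error) // forward (A/AAAA) lookup
}

// systemResolver uses the operating system's DNS configuration.
var systemResolver = resolver{lookupAddr: net.LookupAddr, lookupHost: net.LookupHost}

// isGooglebot performs forward-confirmed reverse DNS: the PTR name must end in
// an accepted Google suffix (assumed list), and that name must resolve back to
// the same IP address.
func (r resolver) isGooglebot(ip string) bool {
	want := net.ParseIP(ip)
	if want == nil {
		return false
	}
	names, err := r.lookupAddr(ip)
	if err != nil {
		return false
	}
	for _, name := range names {
		host := strings.TrimSuffix(name, ".")
		if !strings.HasSuffix(host, ".googlebot.com") && !strings.HasSuffix(host, ".google.com") {
			continue
		}
		addrs, err := r.lookupHost(host)
		if err != nil {
			continue
		}
		for _, a := range addrs {
			if net.ParseIP(a).Equal(want) {
				return true
			}
		}
	}
	return false
}

func main() {
	// Needs network access. 66.249.66.1 is the X_FORWARDED_FOR value quoted
	// from Ladder's README elsewhere in this thread.
	fmt.Println(systemResolver.isGooglebot("66.249.66.1"))
}
```

A spoofed User-Agent alone fails this check, since the connecting IP address will not reverse-resolve to a Google hostname that points back to it.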

gumby 2 years ago |

The README says "The author does not endorse or encourage any unethical or illegal activity."

Is it actually illegal anywhere to bypass a paywall?

qingcharles 2 years ago | |

Certainly in Illinois it would be a crime to violate the TOS of a website. Misdemeanor for first offense, felony for second, IIRC.

quickthrower2 2 years ago | | |

Can’t be that simple. What if the TOS has ridiculous shit in it? Stuff about lifelong servitude to the webmaster's pet goldfish, for example?

2cpu1container 2 years ago | |

Not sure about the paywalls. But it might be used for "drive by attacks" or phishing.

janejeon 2 years ago |

Really dumb question: how do services like this work? As in, how do they bypass these paywalls?

The obvious thing is to mock Googlebot, but site owners can check that the request isn't coming from a Google-published IP and see that it's a fake, right?

Fnoord 2 years ago | |

Some possible clues:

> https://github.com/kubero-dev/ladder#environment-variables

> USER_AGENT | User agent to emulate | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

> X_FORWARDED_FOR | IP forwarder address | 66.249.66.1

> RULESET | URL to a ruleset file | https://raw.githubusercontent.com/kubero-dev/ladder/main/rul... or /path/to/my/rules.yaml
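As a rough sketch of what those two settings amount to on the wire, the Go snippet below builds a request carrying the quoted User-Agent and X-Forwarded-For values. It only shows the header mechanics and assumes nothing about how Ladder wires this up internally; `newCrawlerRequest` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/http"
)

// Values copied from the README excerpt quoted above.
const (
	userAgent    = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
	forwardedFor = "66.249.66.1"
)

// newCrawlerRequest builds a GET request that presents itself as Googlebot.
// Hypothetical helper for illustration, not taken from Ladder's source.
func newCrawlerRequest(target string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	req.Header.Set("X-Forwarded-For", forwardedFor)
	return req, nil
}

func main() {
	req, err := newCrawlerRequest("https://example.com/article")
	if err != nil {
		panic(err)
	}
	// Print the request instead of sending it, so the sketch runs offline.
	// http.DefaultClient.Do(req) would actually send it.
	fmt.Println(req.Method, req.URL)
	fmt.Println("User-Agent:", req.Header.Get("User-Agent"))
	fmt.Println("X-Forwarded-For:", req.Header.Get("X-Forwarded-For"))
}
```

Note that both values are just strings the client chooses; the TCP connection still comes from the proxy's real IP address.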

janejeon 2 years ago | | |

Oh wow... I'm surprised that's enough. When I was researching scraping protection bypass, you had to do some real crazy stuff with the browser instance + using residential IPs at a minimum...

ComputerGuru 2 years ago | | |

I don’t know of any off-the-shelf product that respects X_FORWARDED_FOR unless the current request IP originates from a whitelisted (or LAN) address.
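A minimal Go sketch of that server-side behaviour, under stated assumptions: the trusted ranges (loopback plus two private ranges) and the function names are invented for illustration, and taking the first header entry is a simplification compared with what production proxies do.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// trustedProxies lists networks whose X-Forwarded-For header is believed.
// These ranges are an assumption for the example.
var trustedProxies = mustParseCIDRs("127.0.0.0/8", "10.0.0.0/8", "192.168.0.0/16")

func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	nets := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		nets = append(nets, n)
	}
	return nets
}

// clientIP returns the address a request should be attributed to.
// The header is honoured only when the direct peer is a trusted proxy.
func clientIP(remoteAddr, xff string) string {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr // no port present
	}
	peer := net.ParseIP(host)
	if peer == nil {
		return host
	}
	for _, n := range trustedProxies {
		if !n.Contains(peer) {
			continue
		}
		// Simplification: use the first (left-most) entry of the header.
		first := strings.TrimSpace(strings.Split(xff, ",")[0])
		if net.ParseIP(first) != nil {
			return first
		}
		break
	}
	return host
}

// clientIPFromRequest is a thin adapter for net/http handlers.
func clientIPFromRequest(r *http.Request) string {
	return clientIP(r.RemoteAddr, r.Header.Get("X-Forwarded-For"))
}

func main() {
	// Direct connection from the internet: the spoofed header is ignored.
	fmt.Println(clientIP("203.0.113.7:51234", "66.249.66.1")) // 203.0.113.7
	// Connection relayed by a trusted proxy: the header is used.
	fmt.Println(clientIP("10.0.0.5:40000", "66.249.66.1, 10.0.0.5")) // 66.249.66.1
}
```

Under a policy like this, a spoofed X-Forwarded-For sent straight from an untrusted address has no effect on what the site believes the client IP to be.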

narinxas 2 years ago | |

> site owners can check that the request isn't coming from a Google-published IP and see that it's a fake, right?

Just because they can doesn't mean they will... Also, most "site owners" are (by this point) completely different people from the "site operators" (who I take to be the 'engineers' who can actually check these IP things).

calflegal 2 years ago | |

Related: If this is how they work, why doesn't Google offer a private service to allow publishers to have content indexed while still protected?

matsemann 2 years ago | | |

It used to be against guidelines to serve different content to Google vs what users would see. Not sure if that's still the case, but I don't think it's in Google's interest to give a result that the user actually can't access.

fader 2 years ago |

For folks like me who have no idea what 12ft.io or 1ft.io are, they appear to be services for bypassing paywalls on websites.

alberto_ol 2 years ago | |

Previous discussions of the service on HN:

https://hn.algolia.com/?q=12ft.io

2cpu1container 2 years ago | |

Those were paywall-bypassing tools. 12ft.io was shut down one week ago and 1ft.io still works.

But I'm a bit uneasy about letting someone inject code into sites I view.

szaboat 2 years ago |

Not relevant to the project, but I usually check for earlier versions of the paywalled pages in the Wayback Machine (~75% success). I felt bad using these services (paywall removers), and I just feel a bit better checking archive.org.

arendtio 2 years ago |

Sounds great, not just for paywalls, but for removing CORS as well:

> Remove CORS headers from responses, assets, and images ...

JustinGoldberg9 2 years ago |

I still miss outline.com

I use txtify.it

TanguyN 2 years ago |

I have noticed that on a lot of websites, if you stop the page loading at just the right moment (you have to be quick), the whole content will display without the paywall. And that's without any external tools. These kinds of tools seem, of course, much more convenient.

jwmoz 2 years ago |

12ft was really good!

2cpu1container 2 years ago | |

Indeed it was. Sad it's gone.

The one downside was the lack of transparency. It was not clear which code was added to or removed from the site you were looking at.

cooper_ganglia 2 years ago |

Really great and easy to use. I was trying to read an article that was on the front page of HN and couldn't due to paywall. Downloaded the binary and was reading it within 30 seconds. Awesome and very useful tool, thanks!

rounakdatta 2 years ago |

Given a very different paywall model for Substack, what exactly would work for bypassing their paywalls?

Wouldn't we always require a paid account to cache the HTML through (the SciHub model)?

zippytyro 2 years ago |

damn, thanks man

some1else 2 years ago |

Relevant: 12ft.io was banned by Vercel, taking down the developer's entire account with multiple other hosted projects & domains: https://twitter.com/thmsmlr/status/1718663563353755982

Edit: Access to other projects & domains was apparently restored some time after: https://twitter.com/thmsmlr/status/1719480558932148272

serial_dev 2 years ago |

First of all congrats on the project and thank you for open sourcing it.

> Freedom of information is an essential pillar of democracy

However, this reads like this tool saves democracy by letting you bypass a crappy paywall on a site you visit once a year, and that whoever wants to get paid for their published content online is an enemy of democracy.

UncleEntity 2 years ago | |

Ironically, the rest of the paragraph you quoted from gives their reasoning why they believe this tool is needed beyond "whoever wants to get paid for their published content online is an enemy of democracy". Double-plus democracy and all that...

karaterobot 2 years ago |

It seemed to me like 12ft.io was useful for a couple of months, but then stopped being useful as they agreed to blacklist more and more URLs. I thought everybody switched to archive.is, which (so far) works 100% of the time, even if it is sometimes a pain in the butt.

Axsuul 2 years ago | |

Is there an open source version of archive.is?

metadat 2 years ago | | |

The operator of archive.is must constantly re-up on hacked credentials for wsj and nyt. Given this is a critical aspect of the service, it is not really feasible/useful to open source it.

raxi 2 years ago | | |

Just webrecorder + magnolia and you'll get something similar.

Maybe even better: magnolia outperforms archive.is on paywalls

benatkin 2 years ago |

This reminds me of the thread when 12ft was taken down.

Does anyone have any insight into how it would take Vercel hundreds of hours of support time? https://twitter.com/rauchg/status/1718680650067460138

someotherperson 2 years ago | |

My assumption here is that affected websites sent multiple, persistent support tickets and engaged in back-and-forth communication, on top of updates to the client and the support team contacting engineering, legal, and management and holding meetings on how to deal with 12ft.

boplicity 2 years ago |

Slightly edited "Why":

Access to private property is an essential pillar of democracy and the safe proliferation of ideas. While property owners have legitimate financial interests, it is crucial to strike a balance between property and the public's right to access property. The proliferation of locks on doors raises concerns about the erosion of this fundamental freedom, and it is imperative for society to find innovative ways to preserve access to people's homes and workspaces without compromising the sustainability of property ownership. In a world where property should be shared and not commodified, locks should be critically examined to ensure that they do not undermine the principles of an open and informed society.

2cpu1container 2 years ago | |

Nice analogy. But with ladder, you're just wearing a Google shirt and get invited.

donohoe 2 years ago |

I use services like this because I often skip news site paywalls: I just can't afford so many subscriptions, nor would it be practical.

That said, I work in news media (and have been involved in building paywalls at different orgs - NYT and New Yorker). I know how the money from these directly supports journalism: salaries and the costs associated with any story.

If you are skipping paywalls a lot, I would encourage you to pay for a subscription to at least one or two news sites you respect - bonus points if it's a small or medium local newsroom that benefits!

For me that has been: NYTimes, New Yorker, Wired, Teen Vogue, and my wife's hometown paper in Illinois.

roydivision 2 years ago |

Is it just me or has 12ft become less and less effective? I rarely get through with it these days.

user_7832 2 years ago | |

Their policies have apparently… changed. They accept donations to not have your website bypassed. Archive.org is much better.

Edit: apparently it is down now.

402: PAYMENT_REQUIRED Code: DEPLOYMENT_DISABLED ID: fra1::8wkv2-1699275385535-39dedae23d6a

jdiff 2 years ago | | |

Is it donations they accept or legal threats?

i67vw3 2 years ago | | |

Archive.today never fails compared to Archive.org or various browser extensions.

For removing paywalls, my ranking is 12ft < Archive.org < Archive.today.

snarkyturtle 2 years ago | |

Before they went down it seemed that there were many big publishers who got the owner to disable it for their sites. Either that or the sites learned to actually not send their articles unless the user is logged in (and didn't care about googlebot not scanning it).

It was just an effective way to get through substack/medium in my experience.

ams92 2 years ago | |

I’ve rarely found it able to skip a paywall; I gave up after trying a few times.

j-a-a-p 2 years ago |

In the README there is a WHY paragraph:

> Freedom of information is an essential pillar of democracy and informed decision-making. While media organizations have legitimate financial interests, it is crucial to strike a balance between profitability and the public's right to access information. The proliferation of paywalls raises concerns about the erosion of this fundamental freedom, and it is imperative for society to find innovative ways to preserve access to vital information without compromising the sustainability of journalism.

j-a-a-p 2 years ago | |

For me this is grotesque. Democracy is in despair, and so is journalism. What exactly is this software doing to support journalism or democracy?

2cpu1container 2 years ago | | |

We live in a world where there is more misinformation and poor journalism every day, and less money in people's pockets to pay for good journalism. So this might start a more open discussion on how to finance journalism. And while that discussion is going on, people can inform themselves with good journalism, which supports democracy.