ReCAPTCHA, due to its lack of an opt-out, is effectively illegal in the EU.
Large sites like Amazon or CNN can afford to eat the bot traffic. Smaller sites can't.
I haven't encountered a captcha using Lemmy. There might be one on some servers for account creation.
I've used Amazon from the same IP address for years and I still regularly get the "you look like a bot, solve this" crap.
To say "it's worthless from a security perspective" is a pretty harsh and largely inaccurate representation. It's been tremendously useful to those who have used it. If it wasn't valuable, it wouldn't be so widely used.
Definitely agree with the whole "tons of free $$$ for Google", but that's kind of their business model, so yeah, Google is being Google. In other breaking news, water is still wet.
> More concretely, the current average value life-time of a cookie is €2.52 or $2.7 [58]. Given that there have been at least 329 billion reCAPTCHAv2 sessions, which created tracking cookies, that would put the estimated value of those cookies at $888 billion dollars.
The cited paper is https://www.sciencedirect.com/science/article/pii/S016781162... - but it doesn't deal with CAPTCHAs, just with the general economics of third-party cookies.
In practice, many of these cookies will have already been placed by other Google services on the site in question, with how ubiquitous Google's ad and analytics products are. And it's unclear whether Google uses the _GRECAPTCHA cookies for purposes other than the CAPTCHA itself (in the places where this isn't regulated).
But reCAPTCHA does give Google the ability to have scripts running that fundamentally can't be ad-blocked without breaking site functionality, and it's an effective foot in the door if Google ever wanted to use it more broadly. It's absolutely something to be aware of.
The researchers put the vast majority of this value to tracking cookies, and this revenue happens whether or not a manual challenge is completed.
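For reference, the $888 billion figure in the quoted passage appears to be a straight multiplication of the two numbers it gives. A minimal sketch of that arithmetic, assuming (as the paper seems to) that every session yields one cookie worth the full average lifetime value; the function name is made up for illustration:

```python
from decimal import Decimal

# Figures taken from the quoted passage.
SESSIONS = 329_000_000_000              # "at least 329 billion reCAPTCHAv2 sessions"
VALUE_PER_COOKIE_USD = Decimal("2.7")   # "$2.7" average lifetime value of a cookie


def total_cookie_value(sessions: int, value_per_cookie: Decimal) -> Decimal:
    """Multiply session count by per-cookie value.

    Assumes one new, fully valued cookie per session, which is the
    assumption being questioned in this thread.
    """
    return sessions * value_per_cookie


total = total_cookie_value(SESSIONS, VALUE_PER_COOKIE_USD)
print(f"${total / Decimal(10**9):.1f} billion")
```

Any cookie that Google would have placed anyway through its other services drops out of that product, which is why the one-cookie-per-session assumption matters so much.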
It's more than just your answers that are fed into ML and more than just what others have already said: there's also the way that your browser functions and the way you interact with it. Your IP address, browser, OS, screen size, input type, timezone and current time of day, how fast you select different images, etc. All of this gets fed into ML algorithms, and answers to the obvious images are used as corollaries to support/deny your ancillary information.
Frankly a lot of the images I get are... kinda easy? This isn't the classic book-reading recaptcha where you could see why the text had confused the OCR.
We'll have to have in-person attestation or make all services paid, perhaps.
How are you going to connect the physical person with an identity through in-person attestation? Many countries (several of them major English-speaking ones) don't have mandatory government IDs...
A commenter below suggests that government eIDs could be used. I bet this will be harder to implement and will have much worse conversion rates than (the already terrible) mandatory credit/debit cards... Not to mention the hell that we as non-US citizens will have to endure if anyone tries to impose any form of mandatory ID there... One can only take so much complaining about government overreach about something that is a basic necessity here in the EU...
Especially since, all of a sudden, a bot service running hundreds of thousands of requests will inadvertently have to compute cryptographic hashes at the cost of the user running the bots?
On the other side, an amount of work reasonable for a modern desktop will absolutely overwhelm an older cell phone.
But in the end it is not (effective) security for a website, is an antifeature for users and is profit for Google.
> A lifetime value of $888 billion for all of reCAPTCHAv2's tracking cookies produced between 2010 and 2023.
It should be a general standard of proof for any sort of sociological claim that you look at rates, not just examples, but it usually isn't.
There are a lot of things that can trivially cut down SPAM ranging from utterly unhelpful to just simply a bad idea. Like for example, you can deny all requests from IPs that appear to be Russian or Chinese: that will cut out a lot of malicious traffic. It will also cut some legitimate traffic, but maybe not much if your demographics are narrow. ReCAPTCHA also cuts some legitimate traffic.
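To make the IP-range idea concrete, here is a minimal sketch using Python's standard `ipaddress` module. The blocklist below is a placeholder built from RFC 5737 documentation ranges, not real country allocations, and `is_blocked` is a hypothetical helper; a real deployment would load ranges from a GeoIP dataset.

```python
import ipaddress

# Hypothetical blocklist. These are RFC 5737 documentation ranges used as
# placeholders; they do not correspond to any real country.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_blocked(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparseable addresses are rejected in this sketch.
        return True
    return any(addr in net for net in BLOCKED_NETWORKS)


print(is_blocked("203.0.113.7"))  # inside a blocked range
print(is_blocked("192.0.2.1"))    # outside every blocked range
```

The trade-off is exactly the one described: every legitimate visitor inside a blocked range is rejected along with the bots.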
The actual main reason why people deployed reCAPTCHA is because it was free and easy, effectiveness was just table stakes. The problem with CAPTCHAs prior to reCAPTCHA is simply that they really weren't very good; the stock CAPTCHAs in software packages like MediaWiki or phpBB were just rather unsophisticated, and as a double whammy, they were big targets for attack since developing a reliable solver for them would unlock bot access to a very large number of web properties.
Do you need reCAPTCHA to make life hard for bots, though? Well, no. Having a bespoke solution is enough for most websites on the Internet. Moreover, reCAPTCHA isn't necessarily the best choice even for something extremely high-volume. Case in point: last I checked, Google's own DDoS protection system still used a bespoke CAPTCHA that largely hasn't changed since the early 2010s; you can see what it looks like by searching for the Google "sorry" page.
I agree that reCAPTCHA is not "worthless" but its worth is definitely overstated. Automated services that solve CAPTCHAs charge less than a cent per solve. For reCAPTCHA to be very effective against direct adversaries rather than easily-thwarted random bots, the actual value of bypassing your CAPTCHA has to be pretty damn low. At that point, it's very reasonably possible that even hashcash would be enough to keep people from SPAMing.
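For anyone unfamiliar with hashcash, here is a minimal sketch of a hashcash-style proof of work. It is a simplification, not the actual Hashcash stamp format: it uses SHA-256 over a made-up `challenge:nonce` string, and all function names are hypothetical. The client searches for a nonce; the server checks it with a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def _digest(challenge: str, nonce: int) -> bytes:
    # The "challenge:nonce" layout is an assumption made for this sketch.
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, bits: int) -> int:
    """Client side: brute-force a nonce, about 2**bits hashes on average."""
    for nonce in count():
        if leading_zero_bits(_digest(challenge, nonce)) >= bits:
            return nonce


def verify(challenge: str, nonce: int, bits: int) -> bool:
    """Server side: one hash is enough to check the submitted nonce."""
    return leading_zero_bits(_digest(challenge, nonce)) >= bits


nonce = solve("comment-form-token", bits=12)
print(verify("comment-form-token", nonce, bits=12))
```

Each extra bit of difficulty doubles the expected work for the submitter, while verification stays at one hash.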
Reminds me of the advice around the deadbolt on your house - it won't stop a determined attacker, but it will deter less-determined ones.
(And while I don't have hard data on this, I suspect that bot authors who don't know how to properly set up rate limits also don't know how to set up a captcha-solving service bypass, so captchas are especially effective against them)
"What colour is snow?" is close, but you can't assume that everyone knows what snow is, let alone what colour it is. This includes both people with disabilities and people in parts of the world where there is no snow...
[1]: https://blog.cloudflare.com/introducing-cryptographic-attest...
[2]: https://github.com/mCaptcha/mCaptcha
[3]: https://blog.torproject.org/introducing-proof-of-work-defens...
PoW tasks are meant to work on a wide range of mobile phones, desktops, single-board computers, etc... you have vastly different compute budgets in every environment. For a PoW task that is usable on a five year old mobile phone, an adversary with a consumer RTX 50 series card (or potentially even an ASIC) can easily perform it many, many, many orders of magnitude faster.
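A rough sketch of that asymmetry, with the caveat that both hash rates below are made-up illustrative numbers, not measurements of any real phone or GPU. The only real assumption is that a difficulty of `d` bits takes about `2**d` hashes on average.

```python
def expected_solve_seconds(difficulty_bits: int, hashes_per_second: float) -> float:
    """Expected time to find a hash with `difficulty_bits` leading zero bits."""
    return (2 ** difficulty_bits) / hashes_per_second


def max_bits_for(hashes_per_second: float, budget_seconds: float) -> int:
    """Largest difficulty whose expected solve time fits in the budget."""
    bits = 0
    while expected_solve_seconds(bits + 1, hashes_per_second) <= budget_seconds:
        bits += 1
    return bits


# Hypothetical rates, chosen only to illustrate the gap.
PHONE_RATE = 50_000.0          # hashes/s, old phone running JavaScript
GPU_RATE = 5_000_000_000.0     # hashes/s, adversary with a GPU

bits = max_bits_for(PHONE_RATE, budget_seconds=2.0)
phone_time = expected_solve_seconds(bits, PHONE_RATE)
gpu_time = expected_solve_seconds(bits, GPU_RATE)

print(f"difficulty: {bits} bits")
print(f"phone: {phone_time:.3f} s, GPU: {gpu_time * 1e6:.1f} microseconds")
print(f"speed-up: {phone_time / gpu_time:,.0f}x")
```

Whatever the real rates are, the difficulty is capped by the slowest device you want to support, and the attacker's cost shrinks by the ratio of the two rates.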
Am I missing something?
Important to note, though, that as AI gets more accessible, the downsides of v3 start to weigh more.
Still in beta though.
What a time where people on a site called "Hacker News" ask such a question..
Of course reCAPTCHA is also still vulnerable to the use of a mechanical turk so even giving away your users' data won't save you.
For example, there is someone's personal blog, which is beset by comment spammers. The blog's owner is tired of deleting spammy comments, and does not want their comment section to look like a garbage bin, so they want some bot protection. The website's author is not that technical, so they do some googling and install reCaptcha (or Cloudflare) and this cuts bad comments down to 1/week, which is easy to clean manually.
So in that story, who should be re-evaluating what, and what answer do you expect?
(keep in mind the blog's author cannot host their own captcha service / AI bot detector, as they are not proficient enough to install all the required dependencies for such a complex task, nor is their VPS powerful enough to keep it running.)
An example of an inaccessible homebrew CAPTCHA that causes very poor UX can be found on the portal that provides access to the legal acts of the Bulgarian judicial system: https://legalacts.justice.bg/ . Try taking the legal system to court. I tried for this one, you can see for yourself how it went.
Once "we" come up with the answer, we are going to build based on the newly defined assumptions.
Here is what critically re-examining everything means: Do I really need to have a comment section? And if I do, shall I be the one responsible for managing its technical infrastructure? The answer could be to use an off-the-shelf solution in your particular story. When you pivot to the side of the off-the-shelf solution developers who actually need to do the spam filtering, then answers may differ, as will assumptions.
Edit: What are your thoughts on the mechanism Hacker News has used to reduce bot comments?
They don't get a lot of bots trying stolen credit cards, but mostly because they are pretty niche.
Of course the authors really wanted to write their conclusion, so they just ignored all the practical considerations. It's really a shame on the part of the paper's reviewers.
[0] On my contact page my email is protected via a custom cipher. If the bots execute JavaScript and wait 0.5s they can read it, but most don't. It's the dumbest PoW imaginable, but it works.
I recall a form of "CAPTCHA" that involved a text input which was hidden via CSS, but which bots would fill in anyway. Any text in the input caused the entire form to be rejected. I wonder if that style still works today.
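The server side of that honeypot trick is tiny. A minimal sketch, assuming the form arrives as a plain dict of field names to values; the field name `website` and the helper name are hypothetical, and the CSS that hides the input from humans is not shown.

```python
HONEYPOT_FIELD = "website"  # hypothetical name of the CSS-hidden input


def is_probable_bot(form: dict[str, str]) -> bool:
    """Reject any submission where the hidden field contains text.

    Humans never see the field, so they leave it empty; naive bots
    that fill in every input give themselves away.
    """
    return bool(form.get(HONEYPOT_FIELD, "").strip())


print(is_probable_bot({"comment": "Nice post", "website": ""}))
print(is_probable_bot({"comment": "Buy pills", "website": "http://spam.example"}))
```

One accessibility caveat in the spirit of this thread: the hidden input also needs to be hidden from screen readers and skipped by autofill, or real users may trip it.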
Nice one! I guess you mainly need to get above a certain novelty threshold, because all ML is based on what has already been seen/learned rather than actually outsmarting the defence.
I worked on Google's system for solving this. It's a pretty sophisticated analysis that decides whether to demand CAPTCHAs or not based on a lot of factors. The CAPTCHA was needed (at that time) because it's the only way to slow down the attacker without bothering the real user, who won't be shown one.
I also don't see how EU solves issues with CAPTCHAs. Anonymous CAPTCHAs are allowed in the EU.
One example is for a landscaper: What is the color of healthy grass?
The answer is "green" of course, but grass is common in our region. That question would not work in a culture or region unfamiliar with "lawn grass".
This has the added benefit that translators will be forced to come up with a translation that makes sense when your project gets to a point that it needs i18n.
Google will happily ask you to point out which squares contain fire hydrants. Is there a captcha that meets your standards?
However, I am far from arguing in favour of reCAPTCHA. It is also an example of shit CAPTCHA that also bans people. I am often one of these people.
No there is no example of a CAPTCHA-as-a-service that I know of that I would be fine to impose on my users.
Sorry, I don't follow, English is a second language to me, but how does this stand against my statement that 'many people don't know the concept of snow, let alone what colour it is'?
If a website is multilingual, it can offer language/region selection and add appropriate questions for each of them.
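A minimal sketch of what per-locale questions could look like. The locale keys, the second question, and the helper names are all illustrative assumptions; the grass question is the one mentioned earlier in the thread.

```python
# Hypothetical question bank: locale -> (question, accepted answers).
QUESTIONS = {
    "en-US": ("What is the color of healthy grass?", {"green"}),
    "de-AT": ("Welche Farbe hat das Edelweiß?", {"weiß", "weiss"}),
}
DEFAULT_LOCALE = "en-US"


def question_for(locale: str) -> str:
    """Return the question for a locale, falling back to the default."""
    return QUESTIONS.get(locale, QUESTIONS[DEFAULT_LOCALE])[0]


def check_answer(locale: str, answer: str) -> bool:
    """Compare answers case-insensitively, ignoring surrounding whitespace."""
    _, accepted = QUESTIONS.get(locale, QUESTIONS[DEFAULT_LOCALE])
    normalized = answer.strip().casefold()
    return normalized in {a.casefold() for a in accepted}


print(question_for("de-AT"))
print(check_answer("en-US", " Green "))
```

Note that each locale gets its own question rather than a translation of one shared question, and someone who knows the audience still has to write it.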
Have you tried registering your phone number in the DO-NOT-CALL list (or your local equivalent)?
Did it stop anything?
Another example of an inadvertently hard puzzle, this time due to a lack of knowledge as a consequence of being part of a different culture, would be asking US people what colour the edelweiss is. In my country children learn about it in first grade if not in kindergarten. Another: asking Europeans/US people what colour romduol is... I don't consider this discriminatory, I don't consider people in the US or Europe uneducated because they cannot solve such a simple puzzle... It is just poor/lazy/stupid design that fails the single requirement to block bots and only bots. And I get it: 'I would just google it'... But how many conversions will you lose if a considerable part of your users need to google something to go to the next step of your funnel? It's just inexcusably shit UX...
You would indeed be fine with the 'snow' question if your site must only be visited and used by fellow citizens of your country (where citizens implies similar education -- both cultural and scientific). You would indeed be fine if you can make sure the puzzle will be translated intelligently (including the solution) if your site may be used in a foreign country or by users speaking the language in your own country.
I usually cannot make any of these assumptions for any of the projects I work on. The site's audience is but a whim of the Product team, and I18n is outsourced to (once) translation agencies and now directly to an LLM... This can even be done (and frankly should be done) without the knowledge or input of the dev team. Also, neither translators nor LLMs can be expected to understand that they must come up with basically a new puzzle that will not be hard for people who use the specific language. And as a developer who does not speak the specific foreign language, while I can roughly validate their translation (if by any chance it passes by me for review and I go above and beyond what is expected of me and pass it through a translation service) and return it with feedback for fixes, I cannot rely on them abiding by the feedback, or know how long it would take... Those are a lot of unknowns to consider these assumptions reliable, and it seems much less effort to come up with a simpler puzzle that contains the answer in itself... Its effectiveness against spam will be exactly the same.
Also, you will definitely not be fine if your puzzle contains a concept foreign to a considerable part of people who can't, for example, see or hear. You would also not be fine if your puzzle's technical implementation makes it impossible for them to perceive. The latter part is very simple to get wrong. For example, one of the best ways to protect any site from blind people is to implement a hero image slideshow that steals the focus on each slide. Their screen readers' focus gets moved each second and they literally cannot perceive, let alone navigate, the site...
Finally, none of the peculiarities above excuse straight up going for reCAPTCHA. Even if you don't give a f about your users' data, EU users can and will get you in trouble with EU regulators exactly when you get to a scale at which CAPTCHA use is a necessity. There's a cultural difference for you.
Anonymous CAPTCHAs are fine so long as they're accessible for people with disabilities. I would not venture to say I know of one offered as a service...
* Don't have captcha.
* Make it illegal for bots to access.
* Block foreign IPs.
* Make it illegal to provide a proxy for foreign bots.
Countries unwilling to sign regulations that would have them lock up scammers/DDoS botters and other toxic e-criminals, or unwilling to enforce their own laws when they exist, should not be allowed to continue to pollute the internet at large. Block them until they learn their lesson.
In the west, you can and will lose your access to the internet even if you are not doing this sort of shit on purpose but have an infected computer. See for example: https://it.slashdot.org/story/05/04/13/0320249/major-aussie-...
We enforce the rules on our own citizens, why are we tolerating this level of criminal traffic from China, India, and, the worst of them all, Russia, the country through which we are very much fighting a proxy war right now in funding Ukraine?
We can send missiles to kill Russians but we can't cut them from the internet at large? Really?
Internet access is not a human right. Just like driving on the roads is not a human right and terrible drivers get their license revoked.
bullshit
In fact, all you are doing is slowing down legitimate clients with old equipment and doing nothing against adversaries.
I bet that requiring JS stops more spam than the PoW itself. Can anyone who tried it chime in?
Oh, I see, it's effective against 'someone [who] wants to hammer your site'. That is usually never the case with my sites. I do get a steady stream of spam, but it is gentle enough not to trigger any WAFs. The load comes from LLMs scraping the everliving shit out of my sites and fortunately they don't seem to bother with filling in forms...
For the most part bots wish to be hidden and sites wish to reveal them, and this plays out over repeat games on small and large scales. Can be near-constantly or intermittently.
The bot usually gets to make the first move against a backdrop that the anti-bot may or may not have a hand in.
The website says: "Fast mode - requires 2080 MiB of shared memory. Light mode - requires only 256 MiB of shared memory, but runs significantly slower"
If you want your website challenge to work on a cheap phone (slow CPU, little memory) when implemented in JavaScript, you'd have to tune the complexity way down. And when a modern PC with a fast CPU and tons of memory tries to solve it, it will probably take only a few milliseconds, making the challenge basically useless.