If Claude Fable stops helping you, you'll never know

If Claude Fable stops helping you, you'll never know(jonready.com)

566 points by mips_avatar 6 hours ago | 274 comments

Related: https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-helping-you/

asveikau 7 minutes ago |

Aw shucks. You might turn out to need to do your own work. That would turn out so horrible for you.

SwellJoe 4 hours ago |

The moat looks deep today but it's going to become more shallow every year.

Training a new model from scratch takes serious resources. Post-training/fine-tuning an existing model, dramatically less. The knowledge for the process was esoteric two years ago, now you can ask a current model (one of several) to walk you through it, while building the tools to do it as you go. Several of my recent weekend projects have been exactly that sort of thing, just so I understand it better. "Let's make a LoRA", "let's generate a corpus of training data for fine-tuning a model for X task", "how can I put my face in a text-to-image model?" stuff like that. All of this is do-able on kinda modest local hardware (a couple of old GPUs or a Strix Halo or DGX Spark or big Mac Studio), or for a few bucks or a few hundred bucks or a few thousand bucks of cloud compute, depending on scale.

Scale that up to corporate or startup scale, with the money that's been flowing into AI for the past couple/few years, and it's obviously there's going to be a lot of competition just as the top model makers need to start ringing the cash register. That's a lot of opportunities for people to look at their ballooning Claude usage costs and find other ways to do the same thing for drastically less money. $100/month or $200/month is a no-brainer for Claude Code with probably the best model for coding, but they're pushing more users to usage-based billing which becomes cost-prohibitive real fast.

So, they desperately need to continue to be among the only ways to solve the hardest problems, and they need the alternatives to cost a similar amount. They can count on OpenAI and Google to ratchet up prices, too. They probably can't count on everybody, especially the vendors in China with different economics, to do it. And, they can't count on companies to look at their own usage and not ask, "Can we train a smaller specialist model that does this one thing we're using the Anthropic API most heavily for?"

I'm hoping they just mean stuff like using Claude for distillation by e.g. Chinese model makers, and not "how do I fine-tune Gemma 4 to write more like me?" or whatever.

hedora 4 hours ago | |

What moat? There are multiple companies providing pareto-optimal frontier models, and it takes O(10) people to build one of these things.

The rest is capital intensive, and the price will approach the cost of production over time.

Thinking this is a profitable endeavor is equivalent to claiming coal plants have good margins because boilers are expensive.

SwellJoe 4 hours ago | | |

I think we agree?

What moat? You answered yourself: "capital intensive"

But, history says the supercomputer of today will fit in your pocket in a few years.

They've bought up all the RAM and GPUs, which pushes the capital requirements upward for everyone else. But, they can't corner the market forever, there are too many competing interests. AMD and Intel keep making new GPUs and APUs. The memory makers can't just sell to only AI companies forever, if they do Chinese manufacturers will move in and eventually eat them from below (as has happened many times before).

They have a moat today, and it's just that it's really expensive to train and host frontier models, especially at commercial scale. It used to be there was also some secret sauce to making it fast and efficient. But, secret sauce is being published daily by all sorts of researchers, folks are figuring out how to do more with less and it often finds its way into llama.cpp or vLLM or SGLang within days or weeks.

jatora 52 minutes ago | | |

Other models arent even close except for gpt 5.5. You're dead wrong on that. You read too many benchmarks and/or chinese propaganda. There hasn't been a serious contender in agentic SWE besides OAI and anthropic for a long time, and no chinese model has even reached opus 4.5 performance yet. The moat isnt insurmountable but it is very solid for at least a 12 month lead time. Which is such an insane amount of time in this landscape and industry. The moat is stretching, not shrinking, on agentic SWE. And that is literally the only moat that matters for RSI.

redox99 1 hour ago | | |

O(10) people?

iplaymyowngames 1 hour ago | |

> The moat looks deep today

Does it? What can this model do that I both want and cannot already do?

Anthropic made a nice little post saying how dangerous it is, because it is good enough to eat their own business. But I don't want to eat their business. They also said it was good at playing Slay the Spire, but I can't think of anything more insulting than have a machine do that in my place. That's MY comfort game, not something for a stupid Clanker to take away.

They did not provide any other use case.

Ferret7446 1 hour ago | |

The moat is not the model, it's the harness. I wager that's one of the main reasons why Google made Antigravity closed source.

dudisubekti 26 minutes ago | | |

But harness is relatively easy to code yourself?

They're just system prompt composer, with some tool functions that the LLM can invoke. I've vibe coded my own in just one day.

jsw97 2 hours ago |

Given the high rate of false positives people are reporting for the non-silent cybersecurity, biological, etc., safeguards, there is a strong likelihood that you will encounter silently nerfed behavior even if you are _not_ violating their TOS.

Ultimately this will be evident in the way customers / external benchmarkers experience Fable. Hopefully competition will drive future models toward a lower false positive rate. Until that happens, Mythos and Fable users seem likely to have pretty divergent experiences.

nsingh2 1 hour ago | |

It's such an obviously bad policy, it's mind-boggling that they thought this was a good idea. It just breeds paranoia and mistrust, especially when people are already a bit paranoid about silent model quantification for cost cutting reasons.

KennyBlanken 23 minutes ago | | |

Another "knob" is reducing the thinking time...

KennyBlanken 25 minutes ago | |

If a benchmark is affected the model owner will almost certainly tune it, so there will be a game of cat and mouse...

Honestly, wouldn't surprise me if the AI companies try to detect benchmarking. Most hardware companies do...

code_duck 1 hour ago |

This is the way tech companies have been dealing with perceived abuse for years, at least a decade. Instead of telling you what a problem is, they'll just say "something went wrong". Theoretically this is to prevent bad actors from learning the bounds and how to abuse a system. It is similar to shadow banning.

somesortofthing 5 hours ago |

This is a fun peek into the economic implications of RSI/ASI. Because it's so infinitely valuable that it basically destroys all markets, labs will eventually do stuff like stop releasing models completely and skipping out on contracted commitments because they'll have the power to just drive their competitors out of business before the legal battle gets expensive.

Cloud providers - at first smaller ones, then the hyperscalers - will follow suit, completely closing sales to anyone but the labs and demanding payment in equity/direct decision-making power rather than cash. There's no particular reason why the inference/training split has to be 80/20, and no amount of willingness to pay can help you in an event that turns your money worthless.

torben-friis 5 hours ago |

They have a silent nerfing system for their models and say so openly. The obvious question is how much it is being used already.

Competitor companies being nerfed?

Non Americans getting worse code?

Punishing and rewarding users to maximize engagement, like online games do affecting victories through matchmaking?

gck1 18 minutes ago | |

No big pockets and ask it to review your own codebase for security issues? You hacker. Ban.

Anthropic simply can't be allowed to succeed. This is the most E Corp shit I've seen since I've been alive.

notrealyme123 4 hours ago | |

This send chills down my spine. For now I will not use Fable in my research. The risks of being sabotaged by the model are not worth it.

canada_dry 1 hour ago | |

I re-subscribed to GPT's "PLUS" plan after ditching Anthropic for lack luster results... one of the first coding tasks I gave it resulted in a progress/thinking message that said something to the effect of (it vanished too quickly to get a screen shot unfortunately):

                   Evaluating client value

It took me aback. Note: the code had nothing to do with "client value".

Behind the scenes it is not hard to imagine OpenAI, Anthropic, et al simply minimizing processing for clients - like me - that are hopping from one to another to chase the just released SOTA model.

cyanydeez 4 hours ago | |

$$$$$$: no nerf $$$$: a little nerf $$$: more nerf $$: are you poor? $: be permanent underclass

Ifkaluva 5 hours ago |

I guess an uncharitable way to read this might be “the ML engineers/scientists want to automate all of the jobs except their own.”

afavour 5 hours ago | |

The charitable read is that their restrictions for "safety" (i.e. what's separating Fable from Mythos) makes this inevitable. If you could just make your own Mythos it would circumvent the protection.

Which kinda just highlights how weird this situation is.

cyanydeez 4 hours ago | | |

"Haves" and "Havenots" is how they should be calling, init

throwaway89864 5 hours ago | |

Insta-job security.

CrankyBear 5 hours ago |

"Claude can now be silently nerfed. Anthropic has decided it won't tell users when this happens." W T F!!

__natty__ 5 hours ago |

This makes Fable unusable for me. If I cannot tell whether I am paying for the whole service or just a partial one, because somehow their guardrails have decided my work silently broke their terms of service, then I prefer to go to older models or alternatives

maxall4 5 hours ago | |

As someone who works in bioinformatics, and, as such, does a great deal of machine learning, this makes Fable unusable for me as well.

flexagoon 4 hours ago | | |

Fable would be unusable for you in a more literal way, since it just directly refuses to answer any query even remotely related to biology

ivanmontillam 3 hours ago | | |

For sure Anthropic should be developing a model without these guardrails for your use case? Kinda like Mythos is only available to certain organizations.

varispeed 5 hours ago | |

I am sure they've been doing that with Opus. I am getting mixed results all the time.

mike-cardwell 4 hours ago |

I spend a lot of time telling Opus 4.8 to search for security bugs in the code it wrote, and it spends a lot of time finding them, and then fixing them. Fable wont let me fix the security issues that Opus 4.8 created.

HarHarVeryFunny 1 hour ago | |

Yeah, this breaks the notion that the technical debt you're accumulating with today's AI can be fixed by tommorrow's AI.

Tomorrows AI may either refuse, or silently mess up your code because Anthropic don't like what you're working on.

soraminazuki 1 hour ago | | |

Yup, you always have to consider the modus operandi of the tech industry when listening to the utopian dream that very same industry is espousing.

gck1 15 minutes ago | |

All you need is a little bit more money, a few millions will do, and you're on board with access to non-nerfed model. Sounds like a fair deal to me.

zoogeny 3 hours ago |

It is very difficult to see this move as anything other than Anthropic pulling the ladder up behind itself. They can dress it up in "safety" all they want, I find it hard to interpret this in a charitable way.

This reminds me of how dark-pattern common wisdom in Web 1.0 website development was to ban external links. Then how social apps prevented the export of data and actively worked to nerf significant interoperability through APIs.

But this is a tool, not just a data moat. Like a knife that degrades your ability to create knives. Or like a text editor that prevents you from implementing a text editor.

numpad0 5 hours ago |

I don't understand how businesses could trust cloud LLMs going forward with this ongoing "safety" paranoia. Building dependence on them doesn't feel like a sane strategic decision for users.

stale2002 3 minutes ago | |

Of course you can trust them.

Just do benchmarks yourself on the new model and decide if it is valuable for your usecase, even with the supposed nerfing.

Benchmarks are benchmarks. And you can ignore the data at your own risk.

forshaper 5 hours ago | |

Looking better and better for people to go after local solutions.

mcmcmc 4 hours ago | | |

Tell that to the GPU market

thinkingtoilet 4 hours ago | |

Because this effectively hinders 0% of people. I understand why people don't like it but day to day this is nothing. If you're using it for coding, it won't stop you. The pearl clenching here and over reacting is predictable and sad. If you are working for a large organization and you were going through the vendor procurement process, questions like Can this produce pornography? Can this tell my employees how to break the law? are normal and anyone wiht half a brain knows that this is the case. Before people jump on that, I understand people have access to the internet. Your question "how businesses could trust cloud LLMs going forward" is absurd and you know it. There is an extremely small set of edge cases that effect 0% of people day to day. You can trust them just fine.

gopher_space 3 hours ago | | |

This is software development, not sales. We rely on our tooling.

If I’m using a calculator to verify my math, I don’t want to use a second calculator to verify the first one.

cubefox 5 hours ago | |

It's not paranoia. Cyber attacks have gone up massively in the past few months even with the weaker models we had so far. And Claude Mythos 5 scores even higher than the unreleased Mythos Preview on ExploitBench. If you made this capability publicly available you would see another acceleration of cyber attacks.

extr 5 hours ago | | |

This isn't even about cyber attacks. This is just LLM development which is increasingly just called software development. And at least for cyber it says "Sorry I can't help with that"!

josh-wrale 10 minutes ago |

It strikes me that Karpathy's Auto Research loop might trigger this...

variety8675 5 hours ago |

It is absolutely fine to distill the IP of everyone else, but you'd be violating the TOS to distill ours :)

thot_experiment 5 hours ago |

It's a SaaS, when in the history of SaaS has it ever been a good idea to trust that the company won't ruin the product under you?

jkxyz 4 hours ago |

"To effectively contain a civilization’s development and disarm it across such a long span of time, there is only one way: kill its science." - Cixin Liu, The Three-Body Problem

This immediately made me think of the Sophons silently manipulating the sensors of particle accelerators to prevent humanity from developing advanced knowledge of particle physics.

delichon 4 hours ago | |

The level of oppression necessary to get software geeks to stop making progress on AI is similar to that necessary to get Ukrainian geeks to stop making progress on drones.

NewJazz 3 hours ago | | |

Even if our oppressors are ineffective, we must still resist them and not underestinate them.

xyzsparetimexyz 4 hours ago | | |

Not so sure those things are equivalent

sometimelurker 1 hour ago | | |

unless you could convince them that making smarter-than-human AI is bad. it would be nice if we all thought this. instead they should figure out how to make dumb models faster or more efficient, that's safe

mylifeandtimes 2 hours ago | |

and my mind went to the current US administration. Sigh. You made the better choice.

mips_avatar 5 hours ago |

I'm really uncomfortable with these changes, like everything Anthropic's doing as "frontier research" today will be regular product engineering in a year.

kingcauchy 3 hours ago |

The silently never telling you is so insidious on top of it being ridiculous given how they trained the model in the first place. We do distributed model training for embedder/reranker models and I'd deeply resonate that this article's message exactly for our company. We couldn't trust the model in the first place, but now the model is intentionally burning our money if we asked it the wrong question, on top of being deeply expensive in the first place. If we did find evidence of being incorrectly nerfed, we'd never be able to reach a human to let them know. Too many reverse incentives with Anthropic, maybe they're about AI security but that doesn't make them ethical to consumers (i.e. humans).

hatthew 41 minutes ago |

I work on "AI" stuff. Not LLMs, but large neural nets that include transformers and are as big as the smaller LLMs of today. Half the prompts I give fit their category of examples like "building pretraining pipelines, distributed training infrastructure, or ML accelerator design." I generally don't trust AI and have been very slow to trust and adopt it, but recently I've been warming up to it as part of my coding workflow.

Now with this, it makes me wonder if I should step back? Should I try to get used to a non-claude model/harness? Should I go back to less AI in my workflow? Either way, it makes me less inclined to pay for tokens from claude.

Artoooooor 4 hours ago |

It is as if Jetbrains told that "you can't use IntelliJ Idea to develop frontier IDE. We can introduce slight compilation errors if we detect you doing so".

kajman 3 hours ago | |

Chilling. They could break my Gradle and I would hardly notice.

vdfs 2 hours ago | |

It would be runtime errors

skeptic_ai 39 minutes ago | | |

That’s too easy. Would be nicer memory leaks, intertwined spaghetti code, time dead bombs, bugs based on time

HoldOnAMinute 2 hours ago | |

Modern-day Stuxnet

mips_avatar 4 hours ago | |

Thats exactly it

prmph 4 hours ago |

Wow, this is like saying:

> If you buy a car from us, you agree not use it driving to and from work that involves automotive R&D that might compete with our product. And if our (heavily spying) car detects you are violating this, it will slow down to 20mph and cannot be made to go any faster, until we are sure the violation has ceased.

> If you buy a laptop from us, you agree not to use it to study or acquire any knowledge that you may use to compete against us. If the laptop detects such a use, it degrades to one core and 4GB of memory, until the violation stops.

porphyra 3 hours ago | |

If your car slows down to 20mph you'd instantly know. If Claude silently switches to dumb mode, you might not even realize.

8note 2 hours ago | | |

we notice when anthropic thinks it's kept claude smart while degrading for capacity. we will definitely notice when they purposefully make it dumb

SauntSolaire 4 hours ago | |

Or "we'll ship our code as binary blobs so you can't reverse engineer it".. oh, wait

Paracompact 3 hours ago | | |

This impacts the functioning of the product, not the form of the product.

capevace 3 hours ago |

has dario (or sam tbh) ever been thoroughly asked about the hypocrisy of them claiming distillation to be „theft“ vs. them training on the copyright of others?

I’ve only seen him talk about one of those topics, but never together.

I just can’t see how you can talk yourself out of that hypocrisy, if BS answers are properly followed up on (journalism!)

FeteCommuniste 1 hour ago | |

Distilling the entirety of thousands of years of human intellectual output: totally cool.

Distilling the answers of one LLM: totally uncool.

anankaie 3 hours ago | |

Moreover, the Library of Congress ruled that LLM outputs are not copyrightable, so technically, Anthropic has even less claim here.

skeledrew 4 hours ago |

It was good while it lasted. Time for me to resume my migration to another provider. One that promotes an open ecosystem, even if I can't opt out of them using my data to train. Heck I'll actively GIVE them my data and do my part in promoting openness, tiny though it may be. DeepSeek and GLM looking damn fine for a start.

0xbadcafebee 42 minutes ago |

OpenAI already did this when it released its "super scary advanced" security model. They silently return an earlier model's results if they think you're redteaming/abusing with it. https://openai.com/index/scaling-trusted-access-for-cyber-de...

helsinkiandrew 1 hour ago |

> Startups train embedding models. They build rerankers. They finetune and host small llms.

Isn’t that prohibited without permission from Anthropic: https://support.claude.com/en/articles/12326764-can-i-use-my...

nsingh2 1 hour ago | |

This isn't about training on the output tokens from Anthropic models, it's just about using their models to build things like pretraining pipelines, etc. Even if you train on your own data.

From the phrasing, it might as well be that any ML or infra. related work that even incidentally looks like it could be used to train LLMs may trigger a silent nerf.

comboy 5 hours ago |

I'm fairly certain they were doing something similar already possibly with some quantizations and not for the good humanity but just trying to handle the increased usage. Not for API requests though, just subscription CLI usage.

tempestn 4 hours ago |

> If Claude gives me poor or incorrect advice while I’m working on an AI component, I have no way of knowing whether the model was confused, whether my problem is unsolvable, or if some invisible policy restriction quietly kicked in.

You should be able to know if your problem was solvable by using your own expertise and judgement, no? If you're relying on LLMs as a substitute for those, I wouldn't expect great results.

notrealyme123 4 hours ago | |

You come up with a hypothesis -> you let fable implement it -> fable sabotages your experiment -> you get evidence that hypothesis is not true.

It's that simple.

hedora 4 hours ago | | |

Or, worse:

- It says your safety hypothesis is true, you incorrectly ship, killing lots of people.

- It proposes dangerous experiments.

hedora 4 hours ago | |

No; once the LLM switches to this new saboteur mode, it’ll be very hard to detect.

Sabotage is an asymmetric weapon. The ratio of damage to effort is nearly unbounded, and any decent saboteur knows that the key trick is to make your output indistinguishable from incompetence.

They’re building state of the art offensive capabilities into a public model, then expecting to maintain control over when it decides to attack its human users.

The premise is laughable, and we’ve all seen how this movie ends.

HarHarVeryFunny 1 hour ago | | |

Great way for Anthropic to build trust with the military

gck1 1 hour ago |

Wait, so to get this straight, Anthropic knows:

1) LLMs are non-deterministic

2) This class of models has a particular tendency to "misbehave"

3) Their classifiers have a high rate of false positives

4) Millions of people give these models access to their machines

And they still decided to specifically train this model to sabotage work if it thinks the work may be in competition with Anthropic?

I think this has a name. I think it may be called malware.

creativeSlumber 1 hour ago | |

... that you pay to install on your machine.

atleastoptimal 4 hours ago |

There is a possibility this may not end at simply nerfing the model. The idea of manipulating the behavior of a model depending on the prompt given to it can extend to

1. Detecting if employees from competing companies are using it and sabatoge their work, even not LLM-training related

2. Direct users to outcomes that would justify higher compute spend. Deliberately coding a project to 95% completion but designed to be losing a critical step right before one's weekly rate limit is expended

3. Reduce the quality of writing when a person is writing an essay where the argument is against the interests of the model company, or steering the user using the model for brainstorming in a direction which causes them to waste time or abandon their train of reasoning

etc. etc. The possibilities are enormous. Many people use AI daily for their job, personal advice, companionship. A model company that steers the behavior of the model towards a deliberate outcome could develop a controlling interest in human behavior and productivity at large, even with subtle influence would compound enormously over its millions of users.

dimitri-vs 1 hour ago | |

Anthropic: were commiting to being ad free.

Also Anthropic: if you use our models in any way that might negatively impact our revenue we'll sabotage you.

Can I pick the ads please?

maxbond 1 hour ago | | |

The ad-supported alternative suffers from the same principle-agent problem. What's to stop an ad-supported model from declining to refer you to products that would be better for your use case but who's vendors haven't paid the model's provider?

Ultimately if you can't trust the provider it is game over and you don't have an alternative other than to move to self hosted and open source solutions.

matheusmoreira 2 hours ago | |

This is terrifying.

Avicebron 5 hours ago |

Can't you just switch the toggle that says "switch models when a message is flagged"? I turned mine off in case anything does get flagged I will know..

For now, I'm really not happy about this limited rollout and then turning off. That's probably the most egregious thing I think Anthropic has done recently

platinumrad 4 hours ago | |

This is a separate mechanism. The user is not notified about the flagging and rather than redirecting to a weaker model, the response is intentionally sabotaged.

It's user-hostile to the point of parody.

Avicebron 4 hours ago | | |

I stand corrected. That sucks. A lot.

djfergus 2 hours ago |

We need a benchmark that tests a models ability to do LLM research.

sneilan1 4 hours ago |

I am so happy that Anthropic has signaled the possibility that their UI moat for agentic AI is copyable by competitors. At least that's the way I read this. When companies try to lock something down it can be a signal of weakness.

If so, it's possible to built great user interfaces in Chatbots and more companies/people can have amazing agentic development workflows! We don't have to live in a world where only the market leader has the most enjoyable model.

7e 14 minutes ago |

Any market that Anthropic suddenly thinks is valuable will silently and suddenly be off limits to you. They will train their model on your prompts, and then become your competitor.

amdivia 2 hours ago |

Aren't there immense security risks when the model is allowed to deceive even if it was for "good"?

Reminds me of an excerpt from Edward Fredkin's "The intelligent machine" [1]

https://noor.imx.sh/2017/09/30/when-they-communicate-they-co...

altcognito 3 hours ago |

I suspect we'll get the same behavior from Codex, even if they don't openly say as much. Maybe they'll openly lie and say "noooo, we'd never do such a thing"

More efforts to get more data and processing power behind local models.

dhbradshaw 1 hour ago |

I think evals are the key here. If your fable system fails them, it's a bad system for your use case. If not, compare cost with other systems that also succeed.

throwawayffffas 4 hours ago |

> we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

Dig that moat son, we would want to automate our job away.

andrewchambers 4 hours ago |

So this is what 'alignment' looks like to them.

extr 5 hours ago |

I'm a big fan of Anthropic. Just check my post history. I've been accused of working there. But this is complete bullshit and they need to get real. Silent sandbagging is not acceptable, especially given they've shown with this release their safety filters have HUGE amounts of false positives.

zzleeper 4 hours ago | |

It's increasingly obvious that the only safeguard we got is open models and semi open ones like from China. Crazy world

lelanthran 4 hours ago |

I bet it's more a case of trying to cut down the competition so that there is not a large distillation just before they IPO.

Everything the large LLM providers do now, I view it through the lens of "how does this impact their IPO?"

idle_zealot 4 hours ago |

I currently have Fable set on cleaning up the work of smaller models to bring my code up to standards I'd feel comfortable developing on manually. Y'know, for when they decide I don't get to use it anymore.

davesque 1 hour ago |

And they probably don't enforce those restrictions within their own company would be my guess.

charlie90 1 hour ago |

Epic. I love the future where everyones dependent on AI and you can just get shadow banned from reality.

Anvoker 5 hours ago |

This kind of opacity is unacceptably user hostile. It's not okay to treat some amount of developers as acceptable casualties, without them even knowing, in order to help enforce a restriction that only serves Anthropic's interests. And if you want to tell me this is for managing the x-risk factor, I'm frankly unimpressed.

pablogancharov 5 hours ago |

“When you realize the goal is the path, the pursuit itself becomes the prize. Stones in the road are not obstacles blocking your path; they are the path”

now I understand distillation is much more important thank I thought

trilogic 5 hours ago |

https://huggingface.co/Trilogix1/Hugston-Nex-N2-Pro-gguf

Levitating 3 hours ago |

I don't know why anyone is surprised with this, it's their product it's going to behave on their terms. If anything it is surprising that they're admitting to it.

If these interventions create demand for a model with fewer safeguards surely a competitor will meet that demand.

jesse_dot_id 2 hours ago |

Will be funny when I can call the Office of Weights and Measures on Anthropic because they underweighted the model I was paying for and got pwned because the dumber one missed something.

agnosticmantis 3 hours ago |

Governments need to stop contracting these companies and instead invest in public, fully open source models.

These companies are owned and operated by the darkest of dark triads our species has managed to evolve. I doubt Dario is self-aware enough to realize the hypocrisy in all of this safety theater.

Personally I don't even mind that they are anticompetitive and power-hungry (same as it ever was), but it's the cringe-worthy hypocrisy that grinds my gears. This new brand of self-righteous paternal savior overlords is just unbearable.

noncoml 5 hours ago |

Disillusioned CEOs convincing themselves they have the mandate and right to define morality for everyone else. They get to decide what is right, wrong, permissible, or dangerous from the top, in the name of "safety". This is corporate nannying.

themaninthedark 5 hours ago | |

You just have to force behavior...

https://youtube.com/shorts/QmGGUnZNqv4?si=Q4CsGsYMvR02vay8

miroljub 5 hours ago | |

It's dangerous when personal moral and religious beliefs of company leadership leaks into the product itself and get force fed upon customers.

maipen 4 hours ago | | |

careful there cowboy, we are in the golden age of ai, regulation is still catching up.

You don't want to sell guns to people without some sort of background check. The amount of exploits found in the last few months have been pretty scary already.

This is just one more layer of caution, because it reveals how little we know how these llms work. They know how to make them, but they seem to be unable to properly restrain them.

sometimelurker 1 hour ago |

been thinking, and ngl, this has probably already been happening in their models. I'm sure the other labs probably do the same.

just self host at this point

wookmaster 2 hours ago |

Skeptical they’re even able to pull up a ladder there’s so many more models out there making great progress just behind them.

mrinterweb 5 hours ago |

It kind of sucks, but I get the silent change. If a user was trying to use the model for something untoward, having a rejected prompt would just give signal to train on how to eventually successfully bypass security measures.

antaviana 5 hours ago |

It seems we now have a new product category, HaaS, Hallucination as a Service.

ashley95 2 hours ago |

Has it finally come time that I have to be nice to Claude?

manoDev 3 hours ago |

Linux killed proprietary UNIX; open source models will kill proprietary AI.

_0ffh 3 hours ago |

No at least we know why they spent all that money on "safety research".

darkbatman 5 hours ago |

This is crazy and would be frustrating, I probably would just be using another model as authority and keep fable as reviewer only in this case.

tuggi 5 hours ago |

It’s very frustrating…

mips_avatar 5 hours ago | |

Like if you hired a different services company who decided to sabotage your business that would be fraud.

Guillaume86 5 hours ago | | |

The EU could/should probably legislate against this, it's bonkers...

hmokiguess 5 hours ago |

I'm sure someone is gonna be able to jailbreak, abliterate, or equivalent, on this input moderation attempt they have going on.

cayley_graph 4 hours ago |

Intentionally and silently sabotaging work done with Claude whenever Anthropic decides it is appropriate is unacceptable behavior, and comically tone deaf given the state of open models. Why on earth would I ever pay for a malicious product?

gowld 5 hours ago |

That's always been the case with corporate LLMs.

chroma_zone 4 hours ago | |

Minus the policy restrictions, this has always been true for all LLMs in general.

exabrial 4 hours ago |

New frontier in anti-competitive practices.

cute_boi 5 hours ago |

I tried today and it gave cybersecurity error on base64 implementation. It is so nerfed....

mips_avatar 5 hours ago | |

At least it gave an error! This whole silent nerfing idea is so wrong

dgudkov 1 hour ago |

"We collect everyone's data without paying a dime or respecting copyright, trained our models, but you can't train your models on our models that are trained on everyone's data collected without paying a dime or respecting copyright. We did a hard job stealing that all data and processing it, have some shame!"

jadar 50 minutes ago |

I’ve already had Fable disable itself during a normal /code-review skill invocation. What a joke.

m_krebs 4 hours ago |

this is probably overstating their abilities at present - I am experimenting with Fable on a completely benign personal application and I am constantly hitting the "cybersecurity and biology topics" guardrail

BoorishBears 4 hours ago |

"Anthropic says these safeguards only affect 0.03% of developers. Maybe that's true today."

I don't think it's true today. It's like when schools mention "average class size", where that average is dominated by classes with like 2 students instead of classes with 100.

Much more honest would be the percentage of developers who previously used their models for the model development tasks they're targeting, but it actually looks like they're saying 100% of them are affected based on the language around it "always having been prohibited".

So awful.

varispeed 5 hours ago |

That's what I observed with Opus. This is probably a lawsuit going to happen because you pay for tokens and you expect to get performance you pay for, instead you never know if the model suddenly become dumb and your whole session has to be started again.

CamperBob2 5 hours ago |

We’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building ... distributed training infrastructure ...)

What an interesting thing to call out as a threat. Hmm.

mohamedkoubaa 1 hour ago |

PSA: Treat these models like genius interns.

edot 3 hours ago |

Wow, this is horrible. Local LLMs are the future. Thanks, China! Seriously crazy that I’m saying that, but the American companies are being so anti-freedom they’re making the CCP look libertarian.

Also, Fable’s sensing is hypersensitive. Feels like they just have regex for phrases. No nuance. If I say I’m working on something using “GPUs to train” xyz then, will that trigger this sneaky silent screw-my-stuff-up mode?

derac 5 hours ago |

Is there some consumer protection law around this?

morpheos137 3 hours ago |

I wonder if this would qualify as illegal anticompetitive behavior?

lwhi 3 hours ago |

The part that disturbs me most, is that the model won't reveal you've reached the threshold.

It's literally been designed to gaslight its users in these cases.

kingcauchy 3 hours ago | |

"We won't use this product to spy or build weapons but you'll have to trust us, but we're also going to intentionally lie to you when you break our terms of service but trust us."

nharada 4 hours ago |

Imagine if Github said "if we detect you're building a competitor to Github, we will silently degrade the results of your CI actions so that tests sometimes randomly fail"

hsaliak 4 hours ago |

Big Monsanto energy

iLoveOncall 5 hours ago |

At this point you're criminally incompetent if you still feed your proprietary data and code to AI labs.

They legally can steal it all and now you can't use the product of this theft to improve your own systems.

hbarka 4 hours ago |

I think this is a bit hyperbolic. Fable will fall back to Opus.

MichaelNolan 4 hours ago | |

> Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through …

No it won’t fall back to Opus, they will purposely return dumbed down or tainted information with the goal of the end user not knowing the results have been impacted.

schrijver 3 hours ago | |

That’s a separate mechanism, and it will tell you so if it does (if the prompt is remated to cybersecurity, biology)

TZubiri 3 hours ago |

If I understand correctly, this is to protect against distillation Reverse Engineering like Deepseek vs OpenAI.

6510 3 hours ago |

Reads like, permanently shadow ban.

mystraline 5 hours ago |

I have never ever trusted "corporate ethics".

Theres no ethical framework. No axioms. Its a mixture of legal, political, and public-facing 'rules'. And what are the rules? Youre not permitted to know.

"We reserve the right to lie about the models we provide, silently downgrade you, and give you blatant misinformation cause you triggered our unstated rules... BUT we'll still use your token budget with lots of thinking and waste your money."

No, folks. Seriously, local LLMs are where its at. You can run the model YOU want, on your hardware, with no data exfiltration.

And with tools like Krasis that can synthesize nvidia ram and system ram as unified-ish memory, makes doing Local LLMs absolutely foable, now!

hedora 4 hours ago | |

The rules:

- Breaking fiduciary responsibility is (almost) the only way you go to jail.

- At acquisition/merger/bankruptcy, data, customers, employees (chattle) are assets to be sold off to pay debts. This takes explicit priority over contractual obligations (like “we don’t sell personal data”)

dofm 4 hours ago |

PRODUCT VIOLATION

https://www.youtube.com/watch?v=Tr3t1uZNbKo

DIRECTIVE 4: [Classified]

Any attempt to arrest a senior officer of OCP results in shutdown.

—

Putting aside my snark, is Anthropic actually anticipating some new expansion of ITAR? (Or a stipulation for the Trump administration taking/not taking a share?)

That is to say, do they expect to be told that they must have this mechanism, not just the terms?

greatgib 4 hours ago |

Imagine if code editors were created by greedy **** behaving as Anthropic, and it would not have been allowed to create other code editors using an existing code editor. Or even better, you couldn't use Bash, zsh, ... to create another cli prompt input tool like Claude Code...

mickdarling 4 hours ago |

No, this is their get out of jail free card if people start complaining about the model being dumb or forgetful or lying, they can just say, oh well, you must have been doing something that triggered its distillation prevention technique.

And, they can say that for anybody at any time, and you'll never know why, and there's no way to prove it.

Everyone needs a flight data recorder to prove... "here's what I was actually doing and why it was not distillation." And now you're having to prove your innocence instead of them having to prove you're guilty, and really at the end of the day, it's just the model being stupid that they're protecting themselves from.