Anthropic downgraded cache TTL on March 6th (github.com)
Why the FUD?
I've noticed an interesting shift in public opinion since Anthropic passed OpenAI in revenue.
>> Was there a change? Yes — March 6, intentional, part of ongoing cache optimization. You pinpointed the date correctly.
The entire issue lays out how and why it's a silent downgrade. Also silent because it just happened, without announcing.
I don't understand how this is FUD.
I mean, you are investing a lot (infrastructure and capital) into something that is essentially not yours. You claim credit for the offspring (the solution) simply because it resides in your workspace. You accept foreign code to make your project appear more successful and populated than you could manage alone. Your over-reliance on a surrogate for the heavy lifting leads to the loss of your own survival skills (coding and debugging). Last but not least, you handle the grunt work of territory defense (clients and environments) while the AI performs the actual act of creation (Displaced Agency).
https://redbeardlab.gitbook.io/acem/essays/ambient-developme...
I basically wrote a small GitHub app: I simply create a GitHub issue, the bot reads it, runs an LLM loop, and comes up with a PR (or a design).
Then I simply approve the PR (or the design).
I find it much calmer and much more productive
I canceled my subscription and switched to Codex, but it's not as good. I'm tired of Anthropic changing things all the time. I use Claude because it doesn't redirect you to a different model like OpenAI does. But now it seems like both companies are doing the same thing in different ways.
Anthropic, for now at least, just seems to change the quantization of the model.
So you'd need some adaptive algorithm to decide when to keep caching and when to purge the cache entirely, possibly on the client side. But if you give the client control, people will make it use as much cache as possible just to chase diminishing returns. So fine-grained control here isn't all that easy. Another possible option is to have a cache size budget per account and then purge it intelligently instead of relying only on TTL.
The hardware VM model is almost identical. Each session can go anywhere to start, but a live session can't just be routed anywhere without penalty.
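The per-account budget idea can be sketched in a few lines. This is only an illustration of the general technique (a size-bounded cache with least-recently-used eviction), not a description of how any provider actually manages its cache; the class name `AccountCache`, the token-based budget, and the string keys are all hypothetical.

```python
from collections import OrderedDict


class AccountCache:
    """Hypothetical per-account cache with a token budget and LRU eviction."""

    def __init__(self, budget_tokens):
        self.budget_tokens = budget_tokens
        self.used_tokens = 0
        self._entries = OrderedDict()  # key -> size in tokens, oldest first

    def touch(self, key):
        """Return True on a hit and mark the entry as most recently used."""
        if key in self._entries:
            self._entries.move_to_end(key)
            return True
        return False

    def put(self, key, size_tokens):
        """Insert an entry, evicting LRU entries until it fits. Returns evicted keys."""
        evicted = []
        if size_tokens > self.budget_tokens:
            return evicted  # larger than the whole budget: do not cache at all
        if key in self._entries:
            # Replacing an existing entry frees its old size first.
            self.used_tokens -= self._entries.pop(key)
        while self.used_tokens + size_tokens > self.budget_tokens:
            old_key, old_size = self._entries.popitem(last=False)
            self.used_tokens -= old_size
            evicted.append(old_key)
        self._entries[key] = size_tokens
        self.used_tokens += size_tokens
        return evicted
```

With a scheme like this, an entry survives as long as the account keeps using it and stays under budget, rather than expiring on a fixed timer; a TTL could still be layered on top as an upper bound.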
The very instant the AI suppliers lock in a dependency on their product, prices are going through the roof.
Looking at the table with February and April, I don't get it. What am I missing?
The cost and number of calls look pretty aligned on all rows
But more likely they are constrained on GPUs and can't get them fast enough.
(My guess having no understanding of how this industry actually works.)
They can't really revolutionize AI again so they make the product worse and worse and then offer you a "better" one
So I can't continue my claude code session I started yesterday.
Phase 1: $200/mo prosumer engineer tool
Phase 2: AI layoffs / "it's just AI washing"
Phase 3: $20,000/mo limited release model "too dangerous" to use
Phase 4: Accelerated layoffs / two person teams. Rehiring of certain personnel at lower costs.
Phase 5: "Our new model can decompile and rewrite any commercial software. We just wrote a new kernel after looking at Linux (bye, bye GPL!) We also decompiled the latest Zelda game, ported the engine to Rust, and made a new game with it. Source code has no value. Even compiled and obfuscated code is a breeze to clone."
Phase 6: $100k/mo model that replicates entire engineering teams, only large companies can afford it. Ordinary users can't buy. More layoffs.
Phase N: People can't afford computing anymore. Everything is thin clients and rented. It's become like the private railroad industry. End of the PC era. Like kids growing up on smartphones, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.
Anthropic used to be cool before they started gating access. Limiting Claw/OpenCode was strike one. Mythos is strike two.
Y'all should have started hating on their ethics when they started complaining about being distilled. For training they conducted on materials they did not own.
We need open weights companies now more than ever. Too bad China seems to be giving up on the idea.
"You wouldn't distill an Opus."
You will be backstabbed
You will be squeezed for all they can.
And you will be betrayed.
> Phase N: People can't afford computing anymore. Everything is thin clients and rented. It's become like the private railroad industry. End of the PC era. Like kids growing up on smartphones, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.
Thankfully none of them actually makes money and just runs on investment so there is a good chance bubble will drop and the price of PC equipment will... continue to rise as US gives up Taiwan to China
Anthropic is a private company but nevertheless, the sentiment is accurate and applies to all kinds of corporations.
I think it has something to do with mode collapse (although Claude certainly has its own "tells"), but I'm not sure.
It sounds trivial but even for Agentic, I found the writing style to be really important. When you give Claude a persona, it sounds like the thing. When you give GPT a persona, it sounds like GPT half-assedly pretending to be the thing.
---
Some other interesting points about Anthropic's models. I don't know if any of these relate to my LLM style question, but seems worth mentioning:
Claude models also use way less tokens for the same task (on ArtificialAnalysis, they are a clear outlier on this metric).
And there's a much stronger common sense, subjectively. (Not sure if we have a good way to actually measure that, though.) It takes context and common sense into account, to a much greater degree.
(Which ties in with their constitution. Understanding why things are wrong at a deeper level, rather than just surface level pattern matching.)
Opus is great but it should be bigger. You notice the difference between Sonnet and Opus, but with heavy use you notice Opus's limitations, too.
It all boils down to a brilliant but extremely expensive technology. Both to build and to run.
We've been sold a heavily subsidized product. The idea (from Sam): scale out and see what happens.
Those who care to read between the lines can see what's happening. A perfect storm of demand that attracts VCs who can't understand they are the real customers. Once they understand that, it will be too late.
Regarding open weight models: eventually we will, as humanity, benefit from the astronomical capital poured into developing a technology ahead of its time. In a few years this and even more will run on edge.
Written by open source developers, likely former openai and anthropic employees who got so much cash in the bank they don't need to worry about renting their knowledge.
I've been using GLM for over 6 months and pretty happy.
Releasing open weights has basically been a PR move; the moment those companies need to actually make money they will cut it out, as it reduces their client base.
They DO NOT want you to run AI. They want you to pay them to do it
The AI landscape in China is larger than just Qwen and Alibaba.
If your objective is to democratize AI, sure. But those fed up with it and the devastating effects it's having on students, for example, can opt to actively avoid paying for products with AI (I say this as someone who uses it every day, guilty). At some point large companies will see that they're bleeding money for something that most people don't seem to want, and cancel those $100k/mo deals. I've already experienced one AI-developer-turned company crash and burn.
Personally, I don't think this LLM-based AI generation will have any significant positive impacts. Time, energy (CO2) and money would have been far better spent elsewhere.
Like with the dot com bubble there will be a crash and then whatever shakes out of that will be the companies and products who invested in understanding the actual strengths and weaknesses of the tech, instead of just trying to slap an "AI" sticker on everything.
This one seems too far fetched. Training models is widespread. There will always be open weight models in some form, and if we assume there will be some advancements in architecture, I bet you could also run them on much leaner devices. Even today you can run models on Raspberry Pis. I don't see a reason this will stop being a thing, there will be plenty of ways to tinker.
However, keep in mind the masses don't care about tinkering and never have. People want a ChatGPT experience, not a pytorch experience. In essence this is true for all tech products, not just AI.
But I wrote public because it is at least possible for a privately held company to not do all evil (see Steam, though in the case of Steam it is more "do your thing and watch as competitors shoot themselves in the foot"...)
So, it makes a lot of sense to get people a "demo" and claim the paid product is better.
I think a lot of people have no idea how capable local models are atm.
z.ai did go public on the HK exchange. They are under pressures similar to other public companies.
I know that China models are increasingly being trained and run using Huawei chips instead of Nvidia. I know China has a surplus of electricity from renewables (wind, solar, hydro).
It's under a new license prohibiting any commercial use.
That license is more like a business source license than an open source license.
You know what else was under a license prohibiting commercial use? The training data.
The first one is just incredibly naive, the second might be true for some people, for some tasks, but it's not going to capture the majority who're chasing the latest and greatest to "keep up".
If China is forced to choose between giving the entire AI market to the US or releasing free models, they'll be releasing free models as long as it's necessary.
We are almost at that point now, where the harnesses and tools are more important drivers of functionality and performance than the model weights themselves. We'll get there.
I keep getting the sense that people feel like they have no idea if they are getting the product that they originally paid for, or something much weaker, and this sentiment seems to be constantly spreading. Like when I hear Anthropic mentioned in the past few weeks, it's almost always in some negative context.
- Banning OpenClaw users (within their rights, of course, but bad optics)
- Banning 3rd party harnesses in general (ditto)
(claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)
- Lowering reasoning effort (and then showing up here saying "we'll try to make sure the most valuable customers get the non-gimped experience" (paraphrasing slightly xD))
- Massively reduced usage (apparently a bug?) The other day I got 21x more usage spend on the same task for Claude vs Codex.
- Noticed a very sharp drop in response length in the Claude app. Asked Claude about it and it mentioned several things in the system prompt related to reduced reasoning effort, keeping responses as brief as possible, etc.
It's all circumstantial but everything points towards "desperately trying to cut costs".
I love Claude and I won't be switching any time soon (though with the usage limits I'm increasingly using Codex for coding), but it's getting hard to recommend it to friends lately. I told a friend "it was the best option, until about two weeks ago..." Now it's up in the air.
I have been wondering if it's more geared at reducing resource usage, given that at the moment there's a known constraint on AI datacenter expansion capability. Perhaps they are struggling to meet demand?
What they changed is that it now uses extra usage, which is charged at api rates
How often? Realistically, if you invoke it occasionally, for what's clearly an amount that's "reasonable personal use", then no you don't get nuked.
For example, after a question about a script that used taskset and CUDA_VISIBLE_DEVICES to handle NUMA domains on a parallel GLUE evaluation run:
>Round 2 tasks reuse GPU slots that Round 1 may still be occupying. Cola, SST-2, and QQP are small datasets and will likely finish before MNLI, but QNLI is sizable. If QNLI (GPU 3, slot 3) is still running when Round 2 launches, that's fine — they're on different slots. But more importantly, Round 2 uses slots 5/6/7 (GPUs 7/0/1), while Round 1 uses slots 0/1/2/3/4 (GPUs 4/5/2/3/6). So there's no GPU collision — you're good here actually. Disregard, this is clean.
Support consisted of AI bots saying you did something stupid, you did something wrong, you were abusing the system, followed by (only when I asked for it explicitly) claiming to file a ticket with a human who will contact you later (and it either didn't happen or their ticket system is /dev/null).
(By the way this is the 2nd time I've been "please hold" gaslit by support LLMs this exact same way, the other being with Square)
Generally I find codex and claude make a good team. I'm not a heavy user, but I am currently Claude Max 5x and ChatGPT Plus. Now that OpenAI has a $100 offering and I am finding myself using Claude less, I am considering switching to Claude Pro and ChatGPT Pro x5. The work hours restriction on Claude Max x5 really pisses me off.
I am not a heavy user. Historically I only break over 50% weekly one week a month and average about 30-40% of Max x5 over the entire month. I went Max because of the weekly limits and to access the better models and because I felt I was getting value. I need an occasional burst of usage, not 24/7 slow compute. But even for pay-as-you-go burst usage Anthropic's API prices are insane vs Max.
I have yet to ever hit a limit on codex so it's not on my mind. And lately it seems like Claude is likely to be having a service interruption anyway. A big part of subscribing to Claude Max was to get away from how the usage limits on Pro were causing me to architect my life around 5hr windows. And now Anthropic has brought that all back with this don't use it before 2pm bullshit. I want things ready to go when the muses strike. I'm honestly questioning whether Anthropic wants anyone who isn't employed as a software engineer to use their kit.
Anyway for the last month or so codex "just works" and Claude has been an invitation for annoyances. There was a time when codex was quite a bit behind claude-code. They have been roughly equal (different strength and weaknesses) since at least February (for me).
Very sad considering I got my whole company on Claude Code, only for them to ban me like this, with no customer support response.
100% this, I’ve posted the same sentiment here on HN. I hate the chilling effect of the bans and the lack of clarity on what is and is not allowed.
it's a bug only if they get a harsh public response, otherwise it becomes a feature
I've used it with a sub a lot. Concurrency of 40 writing descriptions of thousands of images, running for hours on sonnet.
I have a lot of complaints. I've cancelled my $200 subscription and when it runs out in a few days I'll have to find something else.
But claude -p is fine.
... Or it was 2 weeks ago. Who knows if they've silently throttled it by now?
1) Nobody should expect to use OpenClaw without API usage.
2) We have known for a long time that the plans are subsidized. It was not as big of a deal but now that demand has continued to explode at a multiple and tools like OpenClaw were creating a lot of usage from a small minority of customers, prices change.
Everything for me points more towards, we have made a service people really want to use and we are trying to balance a supply shortage (compute) with pricing. Nothing is stopping folks like yourself from simply paying the API rates. It is the simple no hassle way to get around any issue you are having, pay the API cost and you will have no limitations!
Claude seems to be getting nerfed every week since we've switched. I wonder how our EVP is feeling now.
It kind of reminds me of the joke where a plumber charges $500 for a 5 minute visit. When the client complains the plumber says it's $50 for labor and $450 for knowing how to fix the problem.
A friend’s company fired all EMs and have engineers reporting to product managers. They aren’t allowed to do refactors because the CTO believes the AI doesn’t need organized code.
There's 0 chance of him facing the consequences for it either.
Whether it's due to bugs or actual malice, it's not a good look. I genuinely can't tell if it's buggy, if it's been intentionally degraded, if it's placebo or if it's all just an elaborate OpenAI psyop.
I now have been using Codex and everything has been great (I still swap back and forth but generally to check things out.)
My theory is just that the models are great after release to get people switching, then they cut them back in capabilities slowly over time until the next major release to increase the hype cycle.
I think it's more likely they're trying to optimize the Claude Code prompts to reduce load on their system and have overcorrected at the cost of quality.
1: https://gist.github.com/roman01la/483d1db15043018096ac3babf5...
Duly runs, and finishes. "All merged into develop".
I do some other work, don't see any of this, double check myself, I'm working off of develop.
"Hey, where is this work?"
"It is in this branch and this worktree, as you would expect, you will need to merge into develop."
"I'm confused, I asked you to do that and you said it was done."
"You're right and I did say that but I didn't do it. Shall I do it now?"
There's like this really weird balancing act between managing usage, but making people burn more tokens...
I was using both Codex and Claude Code heavily on some projects this weekend.
In one project Codex was screwing everything up and in another one absolutely killing it. I’ve seen the same from Claude.
In the bad Codex example it had the wrong idea and kept trying to figure out how to accomplish the same thing no matter how many times I attempted to correct it. Undoing the recent changes where it went down the wrong path was the only way to get things back on track.
I wonder if context poisoning is a bigger problem than people realize.
I'm on the enterprise team plan so a decent amount of usage.
In March I could use Opus all day and it was getting great results.
Since the last week of March and into April, I've had sessions where I maxed out session usage under 2 hours and it got stuck in overthinking loops, multiple turns of realising the same thing, dozens of paragraphs of "But wait, actually I need to do x" with slight variations of the same realisation.
This is not the 'thinking effort' setting in Claude Code: I noticed this happening across multiple sessions with the same thinking effort settings. There was clearly some unpublished underlying change that made the model get stuck in thinking loops more often and for longer, without any escape hatch to stop and prompt the user for additional steering when it gets stuck.
It's pretty clear that OpenAI has consistently used bots on social networks to peddle their products. This could just be the next iteration, mass spreading lies about Anthropic to get people to flock back to their own products.
That would explain why a lot of users in the comments of those posts are claiming that they don't see any changes to limits.
It is not in the interests for Anthropic to screw its customer base. Running a frontier lab comes with tradeoffs between training, inference and other areas.
The ideal time to make your product worse is probably not at the same point that all of your competitor's customers are looking. Anthropic really, really fucked up here.
And beyond that, there's a ton of people who are just regular 9-5 Claude CLI users with an enterprise subscription who are getting punished with a worse model at the same price just as if we were Claw users. This kind of thing does not make one feel warm and fuzzy. I feel like I just got a boot to the teeth.
Sometimes Claude can be a little weird. I was asking it about some settings in Grafana. It gave me an answer that didn't work. I told it that. "Yeah, I didn't really check, I just guessed." Then I said, "please check" and it said "you should read the discussion forums and issue tracker". I said "YOU should read the discussion forums and issue tracker". It consumed 35k tokens and then told me the thing I wanted was a checkbox. It was! I am not sure this saved me time, Claude. I am not experienced enough to say that this is a deal breaker. While this is burned into my mind as an amusing anecdote, it doesn't ruin the service for me.
My coworkers have noticed a degradation and feel vindicated by some of the posts here that I link. A lot of them are using Cursor more now. I have not tried it yet because I kind of like the Claude flow and /effort max + "are you sure?" yield good results. For now. I'm always happy to switch if something is clearly better.
I’m pretty much using 90% Codex now, although since Claude is consistently faster at answering quick questions, I still keep it open for that and for code-reviewing codex/human work before commit.
So yeah... I'm not thrilled with that, because I had done a similar analysis in December and had plenty of logs to review.
The results I do have for the last month aren't great. If you're curious I did post the results on HN:
It looks like the spreadsheet-touchers over at Anthropic won out over the brand leaders, which is too bad as good will can be a trench if you don't abuse your customers.
Claude Code was able to implement something in one shot. It was decent for a proof of concept initial implementation. It's barely able to do work now with full specs and detailed plans.
ChatGPT is also being watered down.
It seems obvious that Anthropic and OpenAI aren't the solution to any problem.
The quality will be a bit behind frontier proprietary models. You have to pay for what you use, with no way to cover your expenses from peers underusing their subscriptions. But otherwise it should be a reasonable middle ground, with very little risk of the rug being pulled out from under you.
Quite interesting considering all the claims that Cursor was dead a few months ago.
The SI symbol for minutes is "min", not "M".
A compromise would be to use the OP notation "m".
If you run out of session quota too quickly and need to wait more than an hour to resume your work ... you pay an even bigger penalty just to resume your work -- a penalty you wouldn't have incurred if the session quota were not so restrictive in the first place, and which in turn causes you to burn through the next session quota even faster.
Seems like a vicious cycle that made the UX very poor. I remember Claude Code with Pro became virtually unusable in the middle of March, with session quota expiring within the first hour or less for me -- which was a wildly different experience from early March.
Seeing some things about how the effort selector isn't working as intended necessarily and the model is regressing in other ways: over-emphasizing how "difficult" a problem is to solve and choosing to avoid it because of the "time" it would take, but quoted in human effort, or suggesting the "easier" path forward even if it's a hack or kludge-filled solution.
[0] https://www.dwarkesh.com/i/187852154/004620-if-agi-is-immine...
It costs him more in ingredients alone than he charges. He even offers some pseudo unlimited buffet, combo sets, and happy hours.
He announced a new restaurant, apparently it will be even better, so good he's a bit worried. He makes sure to share his worries while he picks a few select enterprise for business parties and the likes.
In the meantime he cracks down on free buffet goers who happen to eat too much, and downgrades all ingredients without notice to finally hope to make a profit.
All the news I've heard about this company over the past few weeks makes it sound like they're really desperate.
However, for the basic turn-based conversation the cache (at 5 minutes) is almost always insufficient. By the time I read the LLM response, consider my next question, write it out, etc. I frequently miss the cache.
I imagine it is much more useful if you have a tool that has a common prefix (like a system instruction, tool specs or common set of context across many users).
If you can get it to work frequently enough the savings are quite worth it.
It would be interesting to graph the cost/savings of this approach based on context length, percent cached, etc.
The UI for this is a bit tricky, I could mark conversations as "active" and then do the ping/pong dance on only active conversations and up to some determined max cached (e.g. 1 hour).
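A rough way to reason about when the ping/pong approach pays off is a small cost model. Everything numeric here is an assumption to be replaced with real figures: the function names are made up for illustration, the multipliers (cache write at 1.25x the base input price, cache read at 0.1x) and the idea that a cache read refreshes the TTL should be checked against current pricing docs, and the output-token cost of each ping is ignored.

```python
import math


def cost_without_pings(context_tokens, gap_min, ttl_min,
                       base_price, write_mult, read_mult):
    """Cost of the next real request if nothing keeps the cache warm."""
    # A gap shorter than the TTL is a cache hit; otherwise the prefix is rewritten.
    mult = read_mult if gap_min < ttl_min else write_mult
    return context_tokens * base_price * mult


def cost_with_pings(context_tokens, gap_min, ping_interval_min,
                    base_price, read_mult):
    """Cost of keep-alive pings during the gap plus the next real request."""
    pings = math.floor(gap_min / ping_interval_min)
    # Every ping and the final request are billed as cache reads.
    return (pings + 1) * context_tokens * base_price * read_mult


# Hypothetical numbers: 100k-token context, $3 per million input tokens.
args = dict(context_tokens=100_000, base_price=3e-6, read_mult=0.1)
for gap in (10, 30, 60):
    cold = cost_without_pings(gap_min=gap, ttl_min=5, write_mult=1.25, **args)
    warm = cost_with_pings(gap_min=gap, ping_interval_min=4, **args)
    print(f"gap={gap:>2} min  no pings=${cold:.3f}  with pings=${warm:.3f}")
```

Under these assumed multipliers, pinging is cheaper only while the number of pings stays below 1.25 / 0.1 - 1 = 11.5, so at a 4-minute interval it stops paying off somewhere before the one-hour mark. Sweeping `context_tokens` and `gap_min` would give the graph suggested above.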
Edit: I may have conflated these two threads. https://news.ycombinator.com/item?id=47739260
https://www.anthropic.com/engineering/a-postmortem-of-three-...
Since caching is really something that can only be judged at scale, across many users, I can only assume that Anthropic looked at their infra load and impact and made a very intentional change.
It’s only making the news now because it’s affecting Max users as well ($100/$200 plans). I understand the need for change, but having zero communication about it is just wrong.
Meanwhile their 'best' competitor just announced they want to provide unreliable mass destruction guidance tools but they don't wanna feel said.
Honestly speaking, we are in the wrong whenever we do business with these sorts of people.
FWIW that's what most TOSes say for the majority of online services. Some even include arbitration clauses to prevent civil suits and class-action cases.
What judges say is that whatever is problematic should be dealt by customer support.
For example, provider X is faulty and causes damages to you or a third party. You contact the company and the company must have a procedure to give a formal answer when required.
If that is a breach of the contract, the company can, although not required by law, offer to fix the problem or at least give an explanation of why that is in the contract.
If you still feel that's a breach of the contract and the company is not willing to cooperate, then you can file it.
In other places, there are laws that cannot be undermined by forceful terms of service or contracts. For example, you have the right for law anywhere.
I more or less understand the whys of why US is like that, but it feels that the law is bendable.
So the trick is to always set to max, and then begin every task with “this is an extremely complex task, do not complete it without extensive deep thinking and research” or whatever.
You’re basically fighting a battle to make the model think more, against the defaults getting more and more nerfed to save costs.
A few days later it simply stopped working again, API authentication error. What must I do to have working, paid, premium service?
Screwing around with it today, it works 5x slower and times out all of the time. I'm paying more and getting waaaaay less. Why can't companies just raise prices like normal?
They do indeed get the product they originally paid for.
It's simply that they were suckers and didn't read the "fine" print of the product they bought.
The label says "more tokens than the lower tier".
For instance on exe.dev VMs with Shelley agent/harness and Opus 4.5/4.6, I haven't noticed any deterioration.
Any similar feedback perhaps from Opencode / GH Copilot subscription-provided Opus models?
The UX of codex is exceptionally nice however.
If you don't believe me you can search HN posts about Codex/Claude six months ago.
And to me, this lie is mostly a fight to see who bites the biggest chunk of the war death machine.
As others have said, anthropic is between a rock and a hard place, you can't scale compute as quickly, and the influx of new accounts has definitely made things tough for them: I think all the "how is claude this session 1/2/3/4" questions that keep coming up must be part of some a/b on just how far to quantize / lower thinking while still maintaining user satisfaction.
I heard a while back Claude refused to attempt a task for days, saying it would take weeks of work. Eventually the user convinced it to try, and it one-shotted it in 30 seconds.
Totally true, also tokens seem to burn through much faster. More parallelism could explain some of it but where I could work on 3-5 projects at once on the max plan a month ago, I can't even get one to completion now on the same Opus model before the 5h session locks me up..
Step 2: Panic.
Step 3: Destroy product.
The above was a successful prompt to get Claude to stop whining about effort, difficulty, and time.
Unfortunately abusive language well placed is an effective LLM motivator.
I never use it to answer questions like that; what I care about is consistent tool calling and following the prompt.
And an admittedly uncharitable TLDR on the response is: "yeah... but most users just ask one thing and barely use the product so they never need the cache. Also trust me bro".
Which sure, fine. I'm willing to bet is technically true. I'd also bet those users never previously came close to hitting their session limits given their usage because their usage is so low. But now people who were previously considered low to moderate users are hitting limits within minutes.
They may as well have just said "we've looked at the data and we're happy with this change because it's a performance improvement for people we make the most margin on. Sucks to be you".
I point it to example snippets and web documentation, but the code it generates won't work at all, not even close.
Opus4.6 is a tiny bit less wrong than Codex 5.4 xhigh, but still pretty useless.
So, after reading all the success stories here and everywhere, I'm wondering if I'm holding it wrong or if it just can't solve everything yet.
That sort of GPU code has a lot of concepts and machinery, it’s not just a syntax to express, and everything has to be just right or you will get a blank screen. I also use them differently than most examples; I use it for data viz (turning data into meshes) and most samples are about level of detail. So a double whammy.
But once I pointed either LLM at my own previous work — the code from months of my prior personal exploration and battles for understanding, then they both worked much better. Not great, but we could make progress.
I also needed to make more mini-harnesses / scaffolds for it to work through; in other words isolating its focus, kind of like test-driven development.
Xhigh can also perform worse than High - more frequent compaction, and "overthinking".
Such as:
Adding fine curl noise to a volumetric smoke shader
Fixing an issue with entity interpolation in an entity/snapshot netcode
Finding some rendering bugs related to lightmaps not loading in particular cases, where it had actually introduced the bug itself.
Just basic stuff.
(Don’t get mad at me, I’m a webshit developer)
What you're doing is more specialized and these models are useless there. It's not intelligence.
Another NFT/Crypto era is upon us so no you're not holding it wrong.
Obviously it cannot. But if you give the AI enough hints, clear spec, clear documentation and remove all distracting information, it can solve most problems.
I'm not accusing anyone of foul play and I don't have financial interests in either company, but it feels like "something" within Claude Code/Anthropic models is optimizing to make you spend more tokens instead of helping you complete the task.
All of the major models have been getting worse lately, not just Opus.
What they changed is that if you have OpenClaw run 'claude -p' for you, that gets your account banned or charged API rates, and if they think your usage of 'claude -p' is maybe OpenClaw, even if it's not, you get charged API rates or banned.
It seems so silly to me. They built a feature with one billing rate, and the feature is a bash command. If you have a bad program run the bash command, you get billed at a different rate, if you have a good script you wrote yourself run it, you're fine, but they have literally no legitimate way to tell the difference since either way it's just a command being run.
The justification going around is that OpenClaw usage is so heavy that it impacts the service for other people, but like OpenClaw was just using the "claude code max" plan, so if they can't handle the usage the plan promises, they should be changing the plan.
If they had instead said "Your claude code max plan, which has XX quota, will get charged API rates if you consistently use 50% of your quota. The quota is actually a lie, it's just the amount you can burst up to once or twice a week, but definitely not every day" and just banned everyone that used claude code a lot, I wouldn't be complaining as much, that'd be much more consistent.
https://news.ycombinator.com/item?id=47664442
Configuration and environment variables seem to have improved things somewhat but it still seems to be hit or miss.
I do use the browser version on occasion. I have no strong feelings one way or the other there. I like it better than Google search in many cases, but probably just search more often.
- pre file write -> block editing code files without a task and plan of work
- post tool use -> show next open checkbox in the task to the agent, like an instruction pointer
- post user message -> log all user messages for periodic review of intent alignment
These 3 hooks + plain md files make my claude harness.
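A minimal sketch of the "instruction pointer" idea from the hooks above, assuming the task lives in a plain Markdown file with `- [ ]` checkboxes. The function names and the sample plan are hypothetical, and nothing here is Claude Code's actual hook API; it only shows the logic a hook script could call.

```python
import re

# Matches a Markdown task-list item such as "- [ ] do thing" or "* [x] done".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")

def next_open_checkbox(markdown_text):
    """Return (line_number, task_text) of the first unchecked item, or None."""
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            return number, match.group(2).strip()
    return None

def may_edit_code(markdown_text):
    """Gate for the pre-write check: allow edits only while a task is open."""
    return next_open_checkbox(markdown_text) is not None

# Hypothetical task file contents, for illustration only.
plan = """# Task: fix lightmap loading
- [x] reproduce the bug
- [ ] add a regression test
- [ ] patch the loader
"""

print(next_open_checkbox(plan))  # (3, 'add a regression test')
print(may_edit_code(plan))       # True
```

The pre-write hook would refuse the edit when `may_edit_code` is false, and the post-tool-use hook would echo the result of `next_open_checkbox` back to the agent.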
I’d use it with Claude too if they hadn’t banned it…
I've also tried giving LLMs religion to much more limited success (haven't figured out the right way yet).
I'm manipulating a language model, not a person. "fuck you" translates into a vector in a really big space, and it has different results than being polite about it.
In that prompt I'm reinforcing a directive in five different ways
- idgaf about risk
- you coward
- waste some time
- just do it
- stop bitching
The instructions in this cluster are all related but point in slightly different directions; they are unambiguously strong, attention-grabbing, and direct, and the model does not argue or get confused about intent
In this particular instance this was the fifth time I had given a particular instruction only to have it subverted by the model that had decided "that's too hard I'm going to do something else instead" in four separate ways.
Abusive cursing did indeed work better than any other form of urgency or insistence.
Not sure how that's enforced though. I was in OpenClaw discord a while ago and enforcement seemed a bit random.
I'll try to find the source, I might have gotten the details mixed up.
Just tmux and use that.
If they drop -p, people will just vibe code, in 5 minutes, a way to type into it remotely, similar to their own built-in remote access tool. Seems like a losing game from Anthropic's side
I'll change my mind when I see otherwise.
And this isn't being positive about Anthropic support or their treatment of users, as I too have seen lots of people here getting billed by them for stuff they never paid for, blatant fraud. That's even worse than Google. I'm only talking about getting banned for usage.
1. Cancer patient banned mid-payment — https://news.ycombinator.com/item?id=46675740
2. Hobbyist coder, VPN trigger, forms into void for 10+ months — https://news.ycombinator.com/item?id=47286867
3. "Reinstated" but still locked out — two systems out of sync — https://news.ycombinator.com/item?id=46007408
4. Banned for testing vision API — https://news.ycombinator.com/item?id=39988137
5. Banned on first ever prompt ("What do you know about Hacker News?") — https://news.ycombinator.com/item?id=39698788
6. Mass banning wave, some banned before first use — https://news.ycombinator.com/item?id=39672765
7. Entire company banned without warning, thousands of users stranded — https://news.ycombinator.com/item?id=42210199
8. Forced new account (no email change support) → immediately banned — https://news.ycombinator.com/item?id=46339741
9. Banned for scaffolding a Claude.md file, support email never arrives — https://news.ycombinator.com/item?id=46723384
10. $81 billing overcharge, human promised, month of silence — https://news.ycombinator.com/item?id=47693679
How would anthropic increase future profits without satisfying customers?
The weakest signal to me is investor money, because when you think of it, investors are betting on a future that may or may not be there. Heck even trends aren't guaranteed, "past performance is no guarantee etc etc"
1. Build AGI
2. Use said AGI to tell us how to become profitable
3. Profit!
Anthropic seems to be going all in on enterprise sales. Which means they don't actually have to please customers, or it's what ThePrimeagen humorously calls a "yacht problem": a problem that only needs a solution after the IPO. For now all they have to do is convince corporate leadership that this is the future of work and sow enough FOMO to close those sales contracts, and their projected sales and stock valuation go through the roof.
Of course that value will collapse if they go without delivering on their promises long enough. That's why they call it a bubble. But by then, hopefully, Dario and the early investors will be long gone and even richer than they were to start. Their only competitor, OpenAI, is confronted with the same issues: the scalability problems won't go away, and addressing them doesn't drive stock valuation the way promising high rollers that AGI and total workforce automation are just around the corner does.
Demand is way up and compute supply is extremely limited because data center buildouts can't keep up with demand.
In the face of rising demand and insufficient compute their only practical options (other than refusing new business until demand can be met) are significantly raising the price of tokens (and more tightly limiting subscription options) or doing behind the scenes inference optimizations that are likely to make the model dumber.
It is very easy to believe that they took the route of inference optimizations that have reduced quality of the service and that that is where the perceived enshittification is coming from.
On the plus side, I got to see first-hand how Postgres handles deadlocks and read up on how to avoid them.
I'm not saying it's the hardest thing but I also wouldn't consider it trivial.
I don’t think they could have done that much better I’d say.
There is very poor clarity about what is and isn't allowed with the Claude SDK/claude -p. Are we allowed to use it to automate stuff? What kind of tasks is it permitted to be used for? What if you call your script 'OrangeClaw' and release that on GitHub? What if your script gets super popular, does it suddenly become against TOS?
If I'm understanding you correctly: they changed that policy, you can now use 3rd party software unofficially with the undocumented Claude Code endpoint, and their servers auto-detect this and charge you extra for it?
EDIT: Yeah, something like that?
> Starting April 4 at 12pm PT / 8pm BST, you’ll no longer be able to use your Claude subscription limits for third-party harnesses including OpenClaw. Instead, they’ll require extra usage.
https://news.ycombinator.com/item?id=47633568
This seems to mean that unauthorized usage of the sub endpoint is tolerated now (and billed as though it were the regular API). And possibly affects claude -p, though I don't know yet.
That’s correct. It’s more like a convenience technicality: you can use your sub account, but you’re paying extra. So it doesn’t really count towards your subscription in any way.
Subscribers can buy extra credits at a 30% discount, though, so it’s a decent amount cheaper than actual API, but still prohibitively expensive.
That's also how you can get the LLM to do stuff outside of the training data in a reasonably good way, by not just including the _what_ in the prompt, but also the _how_.
Although it seems that enterprise wasn’t included, so maybe not in your case.
https://support.claude.com/en/articles/14063676-claude-march...
Not only that, but the lack of transparency about what's happening, in clear and simple terms, directly from Anthropic is concerning.
I've already told my org's higher ups that in the current situation we're not close to getting our money's worth with these models.
In all seriousness though, I've observed the same thing with my own usage.
One of these is better.
It’s typically equivalent, sometimes better, sometimes behind. Better at following a well defined plan, less good at concept exploration and planning imo.
At 1m context it’s basically the same price.
Right now the only blocker for me is the lack of Linux support.
I'm in month 4 of development, working on it full-time.
(FWIW I have definitely noticed a cognitive decline with Claude / Opus 4.6 over the past month and a half or so, and unless I'm secretly working for them in my sleep, I'm definitely not an Anthropic employee.)
You definitely shouldn't trust me, as we're way beyond the point where you can trust ANYTHING on the internet that has a timestamp later than 2021 or so (and even then, of course people were already lying).
Personally I use Claude models through Bedrock because I work for Amazon, and I haven't noticed any decline. Instead it's always been pretty shit, and what people describe now as the model getting lost in infinite loops of talking to itself has happened since the very start for me.
In short, it looks like nothing has been nerfed, but sentiment has definitely been negative. I suspect some of the OpenClaw users have been taking out their frustrations.
Any idea what their test harness looks like? My experience comes primarily from Claude Code; this makes me wonder if recent CC updates could be more to blame than Opus 4.6 itself.
Can't wait, I need to buy some RAM for my local model server.
Maybe there’s some truth to that, but then why haven’t OpenAI made the same move? I believe the main reason is platform control. Anthropic can’t survive as a pipeline for tokens, they need to build and control a platform, which means aggressively locking out everybody else building a platform.
OpenAI has never shied away from burning mountains of cash to try and capture a little more market share. They paid a billion dollars for a vibe coded mess just for the opportunity to associate themselves with the hype.
No, I'm paying $200 a month for a premium product that I expect premium service for. It's the single most expensive IT expense I have. Taking advantage my foot.
Lol no they didn't. It wasn't even an acquihire. They just hired Peter.
Maybe they are paying him incredibly well, but not a billion dollars well.
Third-party harnesses are the exact opposite of stickiness!
Ditching Claude Code for a third party harness while using the Claude Code subscription means it's trivial to switch to a different model when you {run out of credits | find a cheaper token provider | find a better model}.
If you're paying normal API prices they'll happily let you use whatever harness you want.
Claude Code uses a bunch of best practices to maximize cache hit rate. Third party harnesses are hit or miss, so they often use a lot more tokens for the same task.
most of the users of those third party harnesses care just as much about hitting cache and getting more usage.
He demonstrates in the code that OpenCode aggressively trims context, by compacting on every turn, and pruning all tool calls from the context that occurred more than 40,000 tokens ago. Seems like it could be a good strategy to squeeze more out of the context window - but by editing the oldest context, it breaks the prompt cache for the entire conversation. There is effectively no caching happening at all.
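To make the cache-breaking point concrete, here is a toy model, assuming the cache can only be reused for the identical leading prefix of the previous request. The message strings and function names are made up for illustration; this is not OpenCode's or Anthropic's implementation.

```python
def shared_prefix_len(previous, current):
    """Number of leading messages that are identical in both requests."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count

def cache_hits(requests):
    """For each request after the first, how many leading messages are reusable."""
    return [shared_prefix_len(a, b) for a, b in zip(requests, requests[1:])]

system = "system prompt"
turns = [f"turn {i}" for i in range(1, 5)]

# Append-only: each request is the previous one plus one new message.
append_only = [[system] + turns[:n] for n in range(1, 5)]

# Pruning: from the third request on, the oldest turn is dropped.
pruned = [
    [system, "turn 1"],
    [system, "turn 1", "turn 2"],
    [system, "turn 2", "turn 3"],
    [system, "turn 3", "turn 4"],
]

print(cache_hits(append_only))  # [2, 3, 4]
print(cache_hits(pruned))       # [2, 1, 1]
```

In the append-only history the reusable prefix grows with every turn; once the oldest turn is edited out, only the system prompt still matches and everything after it has to be processed again.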
Another thing is branding: Their CLI might be the best right now, but tech debt says it won’t continue to be for very long.
By enforcing the CLI you enforce the brand value — you’re not just buying the engine.
1. openclaw like - using the LLM endpoint on subscription billing, different prompts than claude code
2. using claude cli with -p, in headless mode
The second runs through their code and prompts, just calls claude in non-interactive mode for subtasks. I feel especially put off by restricting the second kind. I need it to run judge agents to review plans and code.
I get that the traditional dev is allergic to the concept of reading between the lines and demands everything to be spelled out explicitly, but maybe you should just see it as something to learn because it's an incredibly useful life skill.
Are you willing to bet your account over whether you've read between the lines correctly? Anthropic aren't going to listen to appeals.
In a single prompt? From zero usage? That doesn't "just happen".
It can tell if your cron is running them every 10 minutes 24/7, because basic biology rules out you doing that for more than a day or so.
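A rough sketch of that kind of check, assuming all you have is a list of request timestamps. The thresholds (two days of activity, no pause of four hours) and the function names are hypothetical, and this is not a description of whatever classifier Anthropic actually runs.

```python
from datetime import datetime, timedelta

def longest_idle_gap(timestamps):
    """Longest pause between consecutive requests, as a timedelta."""
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return max(gaps, default=timedelta(0))

def looks_unattended(timestamps,
                     min_span=timedelta(days=2),
                     min_rest=timedelta(hours=4)):
    """True if activity spans at least min_span with no pause as long as min_rest."""
    if len(timestamps) < 2:
        return False
    span = max(timestamps) - min(timestamps)
    return span >= min_span and longest_idle_gap(timestamps) < min_rest

start = datetime(2026, 4, 1, 0, 0)

# A job firing every 10 minutes for three days straight.
cron_like = [start + timedelta(minutes=10 * i) for i in range(6 * 24 * 3)]

# Someone working 09:00 to 17:00, one request per hour, for three days.
human_like = [start + timedelta(days=d, hours=h)
              for d in range(3) for h in range(9, 18)]

print(looks_unattended(cron_like))   # True
print(looks_unattended(human_like))  # False
```

A heuristic like this only separates "never sleeps" from "sleeps"; it says nothing about which program issued the requests, which is the ambiguity the earlier comments are complaining about.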
https://x.com/steipete/status/2040811558427648357
"Anthropic now blocks first-party harness use too
claude -p --append-system-prompt 'A personal assistant running inside OpenClaw.' 'is clawd here?'
→ 400 Third-party apps now draw from your extra usage, not your plan limits.
So yeah: bring your own coin "
> This is not intentional, likely an overactive abuse classifier. Looking, and working on clarifying the policy going forward.
It only makes sense for them to get users to use their ecosystem, rather than other tools.
Yes, definitely, they’re gracefully failing to meet demand. They could also deny new customers, but it would probably be bad for business.
"We're sorry, what we were able to give you for $100/mo before now needs to be $200/mo (or more). We miscalculated/we were too generous/gave too much away for too little. It's a new technology, we are seeing a ton of demand, we are trying to run a business, hope you understand. If you don't want it, don't pay for it."
/s
Anthropic's thing has always been that they are perceived as slightly ahead of the competition, if they 2X their pricing then the competition that used to be "slightly worse" suddenly becomes an absolute bargain and guts their user base.
I’d like to see a lot more than entitled whining. I would like to see the fist of regulation slammed down on the back of these tech shenanigans where they know they’ll never be able to match the prices they’re starting with
In a bustling restaurant, an excited patron recognized the famous artist Picasso dining alone. Seizing the moment, the patron approached Picasso with a simple request. With a plain napkin and a big smile, he asked the artist for a drawing. He promised payment for his troubles. Picasso, ever the creator, didn’t hesitate. From his pocket, he produced a charcoal pencil and he brought to life a stunning sketch of a goat on the napkin—a clear mark of his unique style. Proudly, he presented it to the patron.
The artwork mesmerized the patron, who reached out to take it, only to be stopped by Picasso’s firm hand. “That will be $100,000,” Picasso declared.
Astonished, the patron balked at the sum. “But it took you just a few seconds to draw this!”
With a calm demeanor, Picasso took back the napkin, crumpled it, and tucked it away into his pocket, replying, “No, it has taken me a lifetime.”
Competition will prevent that from happening. When anyone can host open models and there is giant demand for LLMs, companies cannot easily raise token prices without sending a lot of traffic to their competitors.
They'd still need to pay the actual power costs.
So the price for fixing the problem is equal. Sounds like a great argument for AI.
Despite this I don't think engineers should feel threatened. As long as there is a need for a human in the loop, as today, there will still be engineering jobs. And if demand for engineering effort is elastic enough, there could easily be even more jobs tomorrow.
Rather than threatened, I think engineers should feel exposed. To danger, yes, but opportunity as well.
I have zero assurances that the above can't result in a ban. The usage pattern is not distinct from OpenClaw.
The GP has described a task which feels like a task very well within intended usage of CC, but can easily eat up the usage limit.
What should we read between the lines about this scenario?
Is it a bannable offense?
Yea there are other ways to do this, you can set up a separate harness sure to make it more efficient, but just the above will also work, it's just text you paste into your CC terminal, and it will absolutely cap the largest subscription plan available no problem.
Signup prices seem higher now than three months ago.
This is actually the least frustrating method because people who can't afford to pay are not as angry as people who paid and aren't getting served (like when sign-in emails don't arrive for hours or days), or people who have paid for a long time to suddenly see quality decrease.
But it might not be best for business: Having more users than you can handle might suck, but if you're popular enough, people are still gonna put up with it.
It will basically cut the hobbyist out and entrench large corporations that can pay the real costs.
If that happened and I was working for myself, I would just buy the beefiest computer I could finance and do everything locally.
You are more than able to pay for API rates.
If you are going to come and complain about a $200 subscription that gives you $400 worth of API tokens there is only so much room to complain. Only so many lemons can be squeezed. Hope that was helpful for you.
$200 is premium.
Don't confuse price with "premium service". It was not that long ago that folks would be spending $100-200 on their cable service bundle. You are buying a subsidized product when using the plan, and the more you spend the more tokens you get; it has nothing to do with being a premium service.
It’s not unreasonable for people to expect the most expensive subscription plan to be “premium”. That’s how it works everywhere else. They typically have better margins on the premium plans, and the monthly payment gives them reliable cash flow at that higher margin.
You’re right that that’s not true at Anthropic (or really most AI providers). You’re not even really buying tokens because you get billed whether you use it or not, the tokens don’t carry over like buying API tokens, and they get to dictate what an acceptable way to use those tokens is. They are cheaper though, assuming you actually use them. Which Anthropic et al would really prefer you didn’t.
The cheap plans are usually semi-unlimited the same way but not as powerful. This isn't simply a matter of buying more tokens.
> It was not that long ago that folks would be spending $100-200 on their cable service bundle.
Compared to OTA that's premium, but more relevantly if most cable buyers are getting a hypothetical $10 bundle then the $100 one is a premium bundle.
The original comment was they are paying for a premium service. No they are paying for more tokens. You lot going on and on arguing over some small hill.
Claude was the topic at hand and higher tiers buy you more tokens. I know some like Gemini bundle a ton of junk alongside the tokens but you really are still buying yourself more tokens. There is nothing premium in a $200 Claude account. You are buying more tokens, $100 is the same as $200 except token count. Hope that helps. ;)
But I was making an argument about the $10 plans, not the $100 plans.
Claude doesn't even go that low. Except the free plan which has a very reduced feature list.
Claude's $20 and $100 are pretty similar except tokens, that part is true. So they're a bit higher priced and more of the "it's just tokens" model. But the market as a whole is mostly selling a limited feature set down at lower price points. On average, getting up to the point where you have full access and are paying per-token is itself a premium jump.
It is not a premium service that deserves special care which was what the original commenter stated. It is a $200 account that buys you $400+ on tokens.
Hope that helps recenter this weird path we are following. :)
What? What I just said was my one and only point from the very beginning. The price is so much higher than the median that that makes it premium and deserving of some special care.
I understand your point of view here, and it's fine if you disagree with mine but it's weird if you don't at least understand my point by now. You saying my last comment is a tangent suggests you don't understand me. But it's a simple point and I'm not sure how to make it clearer.
Does that help recenter?