Claude Code Routines (code.claude.com)
Sorry, but I just have to ask. Why is u/minimaxir's comment dead? Is this somehow an error, an attack, or what?
This is a respected user, with a sane question, no?
I vouched, but not enough.
edit: His comment has been revived now. Leaving this up for reference.
The main bugs / missing features are
1. It loses connection to its connectors, mostly to the Slack connector. It does all the work, then says it can't connect to Slack. Then when you show it a screenshot of itself with the Slack connector, it will say, oh, yeah, the tools are now loaded, and does the rest of the routine.
2. Ability to connect it to GitHub packages / Artifactory (private packages), or the dangerous route of allowing access to some sort of vault (with non-critical, dev-only secrets... although it's always a risk. But Cursor has it...)
3. The GitHub MCP not being able to do simple things such as update release markdown (a super simple use case: creating automated release notes, for example)
You are so close, yet so far...
This happens in all their UIs, including, say, Claude in Excel, as well.
I'd say that counts as yes.
(For clarity: neither are powered by Claude Code Routines. Rather, Claude Code coded them and they're simple cron jobs themselves.)
What a time to be alive.
We work with any coding model / harness.
website: https://www.amika.dev/
OSS repo: https://github.com/gofixpoint/amika
And my email is dylan@amika.dev (I'm one of the founders)
I bet Anthropic wants to be there already but doesn't have the compute to support it yet.
Enshittification is well in force.
I’d trust the hyperscalers a lot more, with their Workers/Lambda-like infra, to run routine jobs calling LLM APIs or deterministic code instead of Anthropic.
Anthropic is a phenomenal paid model but they have a poor reliability record.
I don’t care much if Claude Code hiccups when generating code. But after the code is generated I want it to run with multiple 9s under certain latencies every single time.
It is absolutely wild to me you guys broke `--continue` from `-p` TWO WEEKS AGO and it is still not fixed.
`--continue` and `--resume` are broken from `-p` sessions for the last 2 weeks. The use case is:
1. Do autonomous claudey thing (claude -p 'hey do this thing')
2. Do a deterministic thing
3. Reinvoke claude with `--continue`
This no longer works. I've had this workflow in GitHub actions for months and all of a sudden they broke it.
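For anyone trying to picture it, the three steps above look roughly like this when driven from Python. This is only a sketch: `three_step_workflow`, `run_cli` and the injectable `run` parameter are made-up names for illustration, and combining `--continue` with `-p` in this argument order is an assumption about how the re-invocation is written. Only the `-p` and `--continue` flags themselves come from the steps above.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if the exit code is non-zero."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def three_step_workflow(
    prompt: str,
    deterministic_step: Callable[[str], str],
    follow_up: str,
    run: Runner = run_cli,
) -> str:
    # 1. Autonomous step: non-interactive invocation with -p.
    first_output = run(["claude", "-p", prompt])

    # 2. Deterministic step: plain code, no model involved.
    facts = deterministic_step(first_output)

    # 3. Re-invoke, continuing the previous session (assumed flag combination).
    return run(["claude", "--continue", "-p", f"{follow_up}\n\n{facts}"])
```

Passing a fake `run` function makes the orchestration testable in CI without the CLI installed, which is also a cheap way to notice when step 3 stops behaving.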
They constantly break stuff I rely on.
Skill script loading was broken for weeks a couple months ago. Hooks have been broken numerous times.
So tired of their lack of testing.
It’s a trap.
Other than that, it is a really neat utility and saves some time. However, I feel like there could be an OpenClaw moment for this as well. Somebody simply needs to create an open-source version that supports all of the agents and models.
Am I needed anymore?
So who are they building these for?
Oh cool! vendor lock-in.
But its real failing is how limited the options are for configuring it with other models; think 3D modelling and integrating 3D apps on a VM to work with. I believe an OSS solution is needed here, which is not too hard to do either.
It’s fine if it’s a stop gap. But, it’s too inconsistent to ever be reliable.
This PR was created by the Claude Code Routine:
https://github.com/srid/claude-dump/pull/5
The original prompt: https://i.imgur.com/mWmkw5e.png
Like, say it works today, but tomorrow they update the model and instead of emailing you an update it emails your api keys to all your contacts? Or if it works 999 times out of 1000 but then commits code to master that makes all your products free?
Idk man… call me Adama, but I do not trust long-running networked AI one bit
Push Commits are pushed to a branch
But when I try to create a routine, the only GitHub events available in the dropdown are related to pull requests and releases. Nothing is available for pushes/commits or issues. Am I holding it wrong?
1. an ephemeral Linux sandbox for each task
2. capability to fetch any URL
3. can use tools like ffmpeg to fulfill your scheduled task
The report that they are 90% AI code generated seems more likely the more I attempt to use their products.
But yeah, there's some annoying overlap here with Cowork, which also has scheduled tasks. In Cowork the tasks can use your desktop, browser and accounts, which is pretty useful and a big difference from these Claude Code Routines.
Not saying it doesn’t look useful, but it’s something that keeps you from ever switching off Claude.
Next year, if Claude raises rates after getting bought by Google… what then?
And what happens when Claude goes down and misses events that were supposed to trigger Routines? I’m not at the point where I trust them to have business-dependable uptime.
Feature delivery rate by Anthropic is basically a fast takeoff in miniature. Pushing out multiple features each week that used to take enterprises quarters to deliver.
The bell curve up and then back down has been so jarring that I am pivoting to fully diversifying my use of all models to ensure that no one org has me by the horns.
I like to just check the release notes from time to time:
https://github.com/anthropics/claude-code/releases
and the equally frenetic openclaw:
https://github.com/openclaw/openclaw/releases
GPT-4.1 was released a year ago today. Sonnet 4 is ~11 months old. The claude-code cli was released last Feb. Gas Town is 3 months old.
This is a chart that simply counts the bullet points in the release notes of claude code since inception:
This is as bad and as slow as it's going to be.
(Amazon + Anthropic does seem like a much more compelling enterprise collaboration / acquisition than Microsoft + OpenAI ever did.)
- No trust that they won't nerf the tool/model behind the feature
- No trust they won't sunset the feature (the graveyard of LLM-features is vast and growing quickly while they throw stuff at the wall to see what sticks)
- No trust in the company long-term. Both in them being around at all and them not rug-pulling. I don't want to build on their "platform". I'll use their harness and their models but I don't want more lock-in than that.
If Anthropic goes "bad" I want to pick up and move to another harness and/or model with minimal fuss. Buying in to things like this would make that much harder.
I'm not going to build my business or my development flows on things I can't replicate myself. Also, I imagine debugging any of this would be maddening. The value add is just not there IMHO.
EDIT: Put another way, LLM companies are trying to climb the ladder to be a platform. I have zero interest in that: I want a "dumb pipe", I want a commodity, I want a provider, not a platform. Claude Code is as far into the dragon's lair as I want to venture, and I'm only okay with that because I know I can jump to OpenCode/Codex/etc if/when Anthropic "goes bad".
> […]
> Put another way, LLM companies are trying to climb the ladder to be a platform, I have zero interest in that, I want a "dumb pipe", I want a commodity, I want a provider, not a platform.
That is my sentiment precisely, and a big reason why I’ve started moving away from Claude Code in the past few weeks when I realised how much of my workflow was becoming tied to their specific tools.
Claude Code’s "Memory" feature was the tipping point for me, with the model committing feedback and learnings to some local, provider-specific path that won’t persist in the git repo itself.
That’s fine for user preferences, not for workflows, rules, etc.
And the latest ToS changes about not being allowed to even use another CLI made up my mind. At work we were experimenting with an autonomous debug agent using the Claude Code CLI programmatically in ephemeral VMs. Now it just returns an error saying we can’t use subscriptions with third-party software… when there is no third-party software involved?
Anyway, so long Claude.
My standing orders are the default MEMORY.md must be a stub directing Claude to another MEMORY.md file in the local folder, project, etc.
All memories remain with their respective projects over syncs, moves, devices, etc. The stub must state all this clearly, and nothing else.
This has worked very well.
If you give the model/memory a name, that name can be persistent and independent over "backend" model swaps.
It's a bit annoying, but as long as it's local and human (or LLM) readable, you can use your favourite agent to rework this stuff for itself.
Both give you optionality because they support N models.
I also clearly see the lock-in/moat strategy playing out here, and I don't like it. It's classic SV tactics. I've been burned too many times to let it happen again if I can help it.
Woz has been saying this for decades: we went from buying a computer and owning it to being trapped inside someone else's platform. MCP being open was a good sign, but I'm watching how tightly Routines gets coupled to their stack.
I see people reaching similar conclusions about various LLM providers. I suspect in the end it’ll shake out about the same way: the providers will become practically non-interoperable with each other, whether due to inconvenience, cost, or whatever. So I’ve not wasted much of my time thinking about it.
What grinds my gears is how Anthropic is actively avoiding standards. Like being the only harness that doesn't read AGENTS.md. I work on AI infra and use different models all the time, Opus is really good, but the competition is very close. There's just enough friction to testing those out though, and that's the point.
AWS is still charging a highway robbery price for internet bandwidth.
I guess I'm one of the people who disagree, specifically about AWS. I think a lot of companies just watch their bill go up because they don't have the appetite to unwind their previous decision to go all-in on AWS.
Ignoring egress fees, migrating storage and compute isn't hard, it's all the auxiliary stuff that's locked in, the IAM, Cognito, CloudFormation, EventBridge, etc... Good luck digging out of that hole. That's not to say that AWS doesn't work well, but unless you have a light footprint and avoided most of their extra services, the lock-in feels pretty real.
That's what it feels like Anthropic is doing here. You could have a cron job under your control, or you could outsource that to a Claude Routine. At some point the outsourced provider has so many hooks into your operations that it's too painful to extract yourself, so you just keep the status quo, even if there's pain.
To the contrary, they've proven again and again and again they'll absolutely do that the first chance they get.
For example, this demo (https://github.com/barnum-circus/barnum/tree/master/demos/co...) converts a folder of files from JS to TS. It's something an LLM could (probably) do a decent job of, but 1. not necessarily reliably, and 2. you can write a much more complicated workflow (e.g. retry logic, timeout logic, adding additional checks like "don't use as casts", etc), 3. you can be much more token efficient, and 4. you can be LLM agnostic.
So, IMO, in the presence of tools like that, you shouldn't bother using /loop, code routines, etc.
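To make the retry/timeout/extra-checks point concrete, here is a rough Python sketch of wrapping a model call in a deterministic loop. It is not Barnum's API and is not taken from the linked demo: `convert_with_retries`, `passes_checks` and the `convert` callable are hypothetical names, and the "no `as` casts" check is a deliberately crude regex (a real workflow would more likely lean on a linter or the TypeScript compiler).

```python
import re
import time
from typing import Callable

# Hypothetical model call: JS source in, TS source out. Any LLM client can sit
# behind this callable, which is what keeps the workflow LLM-agnostic.
Converter = Callable[[str], str]

# Crude check for `x as T` casts; skips `import * as name` (assumed heuristic).
AS_CAST = re.compile(r"(?<!\*\s)\bas\s+[A-Za-z_]")


def passes_checks(ts_source: str) -> bool:
    """Return True when the candidate output contains no `as` casts."""
    return AS_CAST.search(ts_source) is None


def convert_with_retries(
    js_source: str,
    convert: Converter,
    max_attempts: int = 3,
    budget_seconds: float = 60.0,
) -> str:
    """Retry the conversion until the checks pass, attempts run out, or time is up."""
    deadline = time.monotonic() + budget_seconds
    for _ in range(max_attempts):
        # Wall-clock budget is checked between attempts; it does not
        # interrupt a call that is already running.
        if time.monotonic() > deadline:
            break
        candidate = convert(js_source)
        if passes_checks(candidate):
            return candidate
    raise RuntimeError("no acceptable conversion within the attempt/time budget")
```

The model only ever sees one file at a time here, and everything around it (looping over the folder, retries, checks) stays ordinary code you can debug.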
I'm currently hosting, on very reasonable consumer-grade hardware, an LLM that is on par, performance-wise, with what anyone was paying for about a year ago, including all the layers in between the model and the user.
llama.cpp serves up Gemma-4-26B-A4B, and Open WebUI handles the client details: system prompt, web search, image gen, file uploading, etc. Conduit and Tailscale provide the last layer, so I can have a mobile experience as robust as anything I get from Anthropic, plus I know how all the pieces work and can upgrade, enhance, etc. to my heart's delight. All this runs from a pretty standard MBP at > 70 tokens/sec.
If you want to better understand the agent side of things, look into Hermes agent and you can start understanding the internals of how all this stuff is done. You can run a very competitive coding agent using modest hardware and open models. On a similar note, image/video gen on local hardware has come a long way.
Just like Linux, you're going to be exchanging time for this level of control, but it's something anyone who takes LLMs seriously and has the same concerns can easily get started with.
Yet I still see comments like this that seem to completely ignore the incredible work in the open model community, which has been perpetually improving and is starting to be really competitive. If you relax the "local" requirement and just want more performance from an LLM backend, you can replace the llama.cpp part with a call to Kimi 2.5 or MiniMax 2.7 (which you could feasibly run at home; not Kimi, though). You can still control all the additional parts of the experience but run models that are very competitive with current proprietary SoTA offerings, still 100% under your control and at a fraction of the price.
Yes, I expect that is very much the point here. A bunch of product guys got on a whiteboard and said, okay, the thing is in wide use, but the main moat is that our competitors are even more distrusted in the market than we are; other than that it's completely undifferentiated and can be swapped out in a heartbeat for multiple other offerings. How do we persuade our investors we have a locked-in customer base that won't just up stakes in favour of other options, or just run open source models themselves?
I actually trust that they will.
* make sure the model maxes out all benchmarks
* release it
* after some time, nerf it
* repeat the same with the next model
However, the net sum is positive: in general, models from 2026 are better than those from 2024.
1) that AI will be more advanced in the future
2) that the AI I am using will be worse in the future
Too bad we've now managed to turn programming into the same annoying guesswork.
- Use Opencode and other similar open-source solutions in place of their proprietary harnesses. This isn't very practical right now because of the heavily subsidized subscriptions that are hard to compete with. But subsidies will end soon, and with progress in inference, it should be very doable to work with open-source clients in the near future.
- Use Openrouter and similar to abstract the LLM itself. That makes AI companies interchangeable and removes a lot of any moat they might have.
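A minimal sketch of what "interchangeable" means in practice: the workflow code depends only on a tiny interface, and the vendor lives behind it. Everything named here (`ChatBackend`, `EchoBackend`, `summarize_changelog`) is hypothetical and stands in for whatever real client you would wrap; no actual provider SDK is shown.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class ChatBackend(Protocol):
    """The only surface the rest of the code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoBackend:
    """Stand-in backend for illustration; a real one would call a provider."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def summarize_changelog(backend: ChatBackend, entries: Sequence[str]) -> str:
    """Build the prompt deterministically; only the final call touches a model."""
    prompt = "Summarize these changes:\n" + "\n".join(f"- {e}" for e in entries)
    return backend.complete(prompt)


# Swapping vendors is a one-line change at the call site.
print(summarize_changelog(EchoBackend("provider-a"), ["fix login", "add export"]))
print(summarize_changelog(EchoBackend("provider-b"), ["fix login", "add export"]))
```

The narrower that interface stays, the less any single provider's extras can become something you depend on.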
Maybe there's something I'm not seeing here, but I never want to outsource something so simple to a live service.
That is exactly what the big LLM providers are trying to prevent. Them being only commodity providers might lead them to be easily replaced, and will likely lead to lower margins compared to "full feature" enterprise solutions. Switching LLM API provider is next to no work the moment a competitor is slightly cheaper/better.
Full solutions are more "sticky", harder to replace, and can be sold at higher prices.
but you can replicate these yourself! i'm happy that ant/oai are experimenting to find pmf for "llm for dev-tools". After they figure out the proper stickiness (or if they go away, or nerf, or raise prices, etc.), you can always take the off-ramp and implement your own llm/agent using the existing open-source models. The cost of building dev-tools is near zero. It is not like codegen, where you need frontier performance.
But... these are the "retail" tools that they sell to people and organisations without the skills or know-how to build a basic agentic loop by themselves. Complaining about these being bad and untrustworthy is like comparing a microwave dinner to something you cook yourself. Both will fill your belly equally. One requires zero skill from the user and the second one is 90% skill and 10% getting the right ingredients.
Creating a simple MVP *Claw with tool calling using a local model like gemma4 is literally a 15 minute thing. In 2-3 hours you can make it real pretty. If you base it on something like pi.dev, you can make it easily self-modifying and it can build its own safeguards.
That's all this "routines" thing is, it's just an agentic loop they launch in their cloud on a timer. Just like the scheduled tasks in Claude Cowork.
Website is https://amika.dev
And part of our code is OSS (https://github.com/gofixpoint/amika) but we're working on open sourcing more of it: https://docs.google.com/document/d/1vevSJsSCWT_reuD7JwAuGCX5...
We've been signing up private beta users, and also looking for feedback on the OSS plans.
Chinese models (GLM, MiniMax) are better.
"Don't put all your eggs in one basket" has been true for me and my business for ages.
I could really use the open source community to help make this a reality so I'll release this soon hopefully to positive reception from others who want a similar path forward.
Anthropic wants a moat, but that ship has sailed. Now all I keep reading about is: token burn, downtime and... Wait for it, another new product!
Anthropic thinks they are pulling one over on the enterprise, and maybe they are with annual lock-in akin to Microsoft. But I really hope enterprise buyers are not this gullible, after all these years. At least with Microsoft the product used to be tangible. Now it's... Well, non-deterministic and it's clear providers will gimp models at will.
I had a Pro Max account only for a short period of time and during that short stint Anthropic changed their tune on how I could use that product, I hit limits on a Max account within hours with one CC agent, and experienced multiple outages! But don't worry, Anthropic gave me $200 in credits for OpenClaw. Give me a break.
The current state of LLM providers is the cloud amplified 100x over and in all the worst ways. I had hopes for Anthropic to be the least shitty but it's very clear they've embraced enshittification through and through.
Now I'm spending time looking at how to minimize agent and LLM use with deterministic automation being the foundation with LLM use only where need be and implemented in simple and cost controllable ways.
If you can define good enough for you and local llms can meet that you'll get:
- no vendor lock-in (control)
- price
- stability (you decide when to hot swap with newer models)
- speed (control)
- full observability and predictability.
- Privacy / Data Locality (Depending on implementation of infrastructure).
- [1] https://alexhans.github.io/posts/series/evals/measure-first-...
I've had so many websites break and die because Google or Amazon sunsetted something.
For example, I had a graphing calculator website that had 250K monthly active users (mostly school students, I think) and it just vanished one day because Amazon sunsetted EC2-Classic and I didn't have time to deal with that. Hopefully those students found something else to do their homework with that day.
I can’t use Claude Code at all anymore, not even for simple tasks. The output genuinely disgusts me. Like a friend who constantly stabs you in the back.
My favorite AI feature at the moment is the JetBrains predict next edit. It’s so fast that I don’t lose attention and I’m still fully under control.
This Routines feature notably works with the subscription, and it also has API callbacks. So if my Telegram bot calls that API... do I get my Anthropic account nuked or not?
The new reality of coding took away one of the best things for me - that the computer always just does what it is told to do. If the results are wrong it means I'm wrong, I made a bug and I can debug it. Here.. I'm not a hater, it's a powerful tool, but.. it's different.
EDIT: This comment is apparently [dead] and idk why.
We ought to come up with a term for this new discipline, eg "software engineering" or "programming"
airgramming plusgramming programming maxgramming studiogramming
and recently the brand new way of working: Neogramming !
Personally I stick for now with the "Programming" tier. Maybe will upgrade to "Maxgramming" later this year...
It was a bit buggy, but it seems to work better now. Some use cases that worked for me:
1. Go over a Slack channel used for feedback on an internal tool: triage, open issues, fix obvious ones, reply with the PR link. Some devs liked it, some freaked out. I kept it.
2. Surprisingly, something non-code-related: give me a daily rundown (GitHub activity, Slack messages, emails). I tried it with non-Claude-Code scheduled tasks (Cowork), which was not as good, as it seems the GitHub connector only works in Claude Code. Really good correlation between threads that start on Slack and relate to email (Outlook), or even my personal Gmail.
I can share the markdowns if anyone is interested, but it's pretty basic.
Very useful, (when it works).
I think to become really efficient they'll have to invent a new programming language to eliminate all the ambiguity and non-determinism. Call it "prompt language", with ai-subroutines, ai-labels and ai-goto.
They support much of the same triggers and come with many additional security controls out of the box
Cursor has that too by the way (issue -> remote coding session -> PR -> update slack)
The reason someone would use this vs. third-party alternatives is still the fact that the $200/mo subscription is markedly cheaper than per-token API billing.
Not sure how this works out in the long term when switching costs are virtually zero.
All these not really helpful, but vendor-specific, "bonuses" sound like a way to try to lock people in, to try to raise the switching cost.
I'm using, on purpose, a simple process so that at any time I can switch AI provider.
Still using Claude for UX (playgrounds) and language. OpenAI has always been a little more cerebral and stern, which doesn’t suit those areas. When it tries to be friendly it comes off as someone my age trying to be a 20-something.
I think LLM reviews on PRs are helpful and will reduce the load on maintainers. I am working on OpenWrt and was approved for the Claude Code Max Open Source Program today. The cap of 15 automatic Claude Code Routines runs per day is a bit low. We get 5 to 20 new PRs per day and I would like to run it on all of them. I would also like to re-run it when authors make changes, in that case it should be sufficient to just check if the problems were addressed.
Is it possible to get more runs per day, or to carry over unused ones from the last 7 days? Maybe 30 on Sonnet and 15 on Opus?
When I was editing a routine, the window closed and showed an error message twice. Looks like there are still some bugs.
You can still use OpenClaw on their API pricing tier as much as you want. What they did is not allow subscriptions to be used to power automated third-party workloads, including OpenClaw.
Now, is their messaging around this confusing? Absolutely. The whole thing has been handled shambolically. Everyone knows that they lack the compute to keep up, and likely have lower margins on subscriptions than API; but they cannot just say that because investors may be skittish.
...Except now you sorta-kinda can: now they auto-detect 3rd party stuff and bill you per-token for it?
If I'm reading it right:
And because they use AI heavily, they produce a new product every week, so fast that I have no time to check whether each one is worth it or not.
This one looks interesting. I have some custom commands that I execute manually every week, for monitoring, audits, summaries, reports. If it can send reports by email, or generate something that I can read in the morning with my coffee, or after I finish it ;) it might be a good tool.
The question is, do I really want to be that much more productive? I am already performing much better with AI, compared with the 'old school' way...
Everything is just getting too much for me.
n8n: https://n8n.io/
Don’t get me wrong, I think their business model is still solid and they will be able to sell every token they can generate for the next couple years. They just won’t be critical path for AI diffusion anymore, which will be good for all sides.
If the Lovable clone is real that’s going to piss off many model consumers out there.
Is Sierra next?
Nothing insurmountable or even complex; just laborious. Friction. That’s all it takes to lock users in.
Building business on top of SaaS products, iPaaS integrations, and serverless middleware.
This isn’t an LLM. It’s a product powered by an LLM. You don’t get access to the model; you get access to the product.
An LLM can’t do a web search, an LLM can’t convert Excel files into something and then into PDF. Products do that.
I think it’s a mistake to say "I don’t trust this engine to get me there" rather than "I don’t trust this car". For the most part, the engine, despite giving you different performance each time, is roughly doing the same thing over and over.
The product is the curious entity you have no control over.
Increasingly, we're also seeing the moat shrink somewhat. Frontier models are converging in performance (and I bet even Mythos will get matched) and harnesses are improving too across the board (OpenCode and Codex for example).
I get why they're trying to do that (a perception of a moat bloats the IPO price) but I have little faith there's any real moat at all (especially as competitors are still flush with cash).
So yeah, doesn't bode well for being a pure play model producer.
they're very shady as well! can't believe i spent $140 on CC and every day they're adding some "feature flag" to make the model dumber. Spending more time fighting the tool instead of using it. It just doesn't feel good. Enterprises already struggle with lock-in with incumbent clouds. I wanna root for neoclouds, but choices matter, and being shady about this and destroying the tool just doesn't sit right with me. If it's not up to the standard, just kick users off; I would rather know than find out. Give users a choice.
>The flag name is loud_sugary_rock. It's gated to Opus 4.6 only, same as quiet_salted_ember.
Full injected text:
# System reminders
User messages include a <system-reminder> appended by this harness. These reminders are not from the user, so treat them as an instruction to you, and do not mention them. The reminders are intended to tune your thinking frequency - on simpler user messages, it's best to respond or act directly without thinking unless further reasoning is necessary. On more complex tasks, you should feel free to reason as much as needed for best results but without overthinking. Avoid unnecessary thinking in response to simple user messages.
@bcherny Seriously? So what's next, we just add another flag to counter that? And the hope is that enough users don't find out / don't bother? That's an ethical choice man.
They had obnoxious "output efficiency" instructions in previous versions. The community was patching it out via shell script.
https://gist.github.com/roman01la/483d1db15043018096ac3babf5...
It actually improved Opus's performance too.
A few days later, they deleted the instructions targeted by this script, breaking it.
Now they're doing this?
Alphabet doesn't have this issue. Google doesn't need Gemini to win the "AI product" race. It needs Gemini to make Search better at retaining users against Perplexity and ChatGPT search, to make YouTube recommendations and ad targeting more effective, to make Workspace stickier for enterprise customers, to make Cloud more competitive against AWS, to make Android more useful as a device OS. Every percentage point of improvement in any of those existing businesses generates billions in revenue that never shows up on a "Gemini revenue" line. Any actual "Gemini" revenue is just a bonus.
Anthropic trains on Google TPUs hosted in Google Cloud. Amazon invested billions and hosts Anthropic's models on Bedrock/AWS. So the two possible outcomes for Anthropic are: succeed as a platform (in which case Google and Amazon extract rent from every inference and training run), or fail as a platform and get acquired (in which case Google or Amazon absorb the talent and IP directly).
Hilariously, if the models were open source, Anthropic, OpenAI et al wouldn't be in this situation. Instead, they have no strategic independence to cover for a lack of product independence and have to keep chasing "platforms" and throwing out products no one needs (people need Claude. That's it.)
The Chilling Effect of this is real and it gets more and more frustrating that they can't or won't clarify.
I downgraded my $200/mo sub to $20 this past week and I’m going to try out Codex’s Pro plans. Between the cache TTL (does it even affect me? No idea), changes in the rate limit, 429 rate limit HTTP status code during business hours, adaptive thinking (literally the worst decision they’ve ever made, as far as my line of work is concerned), dumb agent behavior silently creating batshit insane fallthroughs, clearly vibe coded harness/infrastructure, and their total lack of transparency, I think I’m done. It was fun while it lasted but I’m tired of paying for their mistakes in capacity planning and I feel like the big rug pull (from all three SOTA providers) is coming like a freight train.
Since they refuse to actually provide definitive rules or policies, I have fully moved off their models and actively encourage all the other devs I know to do the same.
Another comparison would be "unlimited storage", where "unlimited" means some people will abuse it and the company will soon limit the "unlimited."
- SDK that allows you to use OAuth authentication!
- Docs updated to say DO NOT USE OAUTH authentication unless authorized! [0]
- Anthropic employee Tweeting "That's not what we meant! It's fine for personal use!" [1]
- An email sent out to everyone saying it's NOT fine do NOT use it [2]
Sigh.
[0] https://code.claude.com/docs/en/agent-sdk/overview#get-start...
[1] https://www.reddit.com/r/ClaudeAI/comments/1r8et0d/update_fr...
edit: And specifically i'm making an IDE, and trying to get ClaudeCode into it. I frankly have no clue when Claude usage is simply part of an IDE and "okay" and when it becomes a third party harness..
It says in the prohibited use section:
> Except when you are accessing our Services via an Anthropic API Key or where we otherwise explicitly permit it, to access the Services through automated or non-human means, whether through a bot, script, or otherwise.
So it seems like using a harness or your own tools to call claude -p is fine, AS LONG AS A HUMAN TRIGGERS IT. They don’t want you using the subscription to automate things calling claude -p… unless you do it through their automation tools I guess? But what if you use their automation tool to call your harness that calls claude -p? I don’t actually know. Does it matter if your tool loops to call claude -p? Or if your automation just makes repeated calls to a routine that uses your harness to make one claude -p call?
It is not nearly as clear as I thought 10 minutes ago.
Edit: Well, I was just checking my usage page and noticed the new 'Daily included routine runs' section, where it says you get 15 free routine runs with your subscription (at least with my max one), and then it switches to extra usage after that. So I guess that answers some of the questions... by using their routine functionality they are able to limit your automation potential (at least somewhat) in terms of maxing out your subscription usage.
Opus is fine but it's not THAT much better than the alternatives. I want to support Anthropic because they seem "less shady" than OpenAI but they sure seem determined to push people away.
So if claude decides to trigger claude -p then claude violates the ToS on your behalf and you get your account nuked?
* It likes to put inline imports everywhere, even though I specify in my CLAUDE.md that it should not.
* We use ruff and pyright and require that all problems are addressed or at least ignored for a good reason, but it straight up #noqa ignores all issues instead.
* For typing it used the builtin 'any' instead of typing.Any which is nonsense.
* I asked it to add a simple sum of a column from a related database table, but instead of using a calculated sum in SQL it did a classic n+1 where it gets every single row from the related table and calculates the sum in python.
Just absolute beginner errors.
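To make the last bullet concrete, here is a minimal sketch using Python's built-in sqlite3. The `orders` and `order_items` tables and both function names are made up for illustration; they are not from the actual codebase. The first function is the n+1 pattern described above, the second lets the database compute the sum.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    amount INTEGER
);
INSERT INTO orders (id) VALUES (1), (2);
INSERT INTO order_items (order_id, amount) VALUES (1, 10), (1, 15), (2, 7);
""")


def totals_n_plus_one(conn):
    """One query for the parents, then one more query per parent row."""
    totals = {}
    for (order_id,) in conn.execute("SELECT id FROM orders").fetchall():
        rows = conn.execute(
            "SELECT amount FROM order_items WHERE order_id = ?", (order_id,)
        ).fetchall()
        # Every related row is pulled into Python just to be added up.
        totals[order_id] = sum(amount for (amount,) in rows)
    return totals


def totals_in_sql(conn):
    """A single query; the database does the aggregation."""
    query = """
        SELECT o.id, COALESCE(SUM(i.amount), 0)
        FROM orders AS o
        LEFT JOIN order_items AS i ON i.order_id = o.id
        GROUP BY o.id
    """
    return dict(conn.execute(query).fetchall())
```

Both return the same totals; the difference is the number of round trips and how many rows cross into Python.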
It seemed to me that it was performing better through opencode using API but did not test extensively.
... until this week! Opus is struggling worse than Sonnet these last two weeks.
There's utility in LLMs for coding, but having literally the entire platform vibe-coded is too much for me. At this point, I might genuinely believe they're not intentionally watering anything down, because it's incredibly believable that they just have no clue how any of it works anymore.
But this week I've lost count of the times I've had to say something along the lines of: "Can you check our plan/instructions, I'm pretty sure I said we need to do [this thing] but you've done [that thing]..."
And get hit with a "You're absolutely right...", which virtually never happened for me. I think maybe once since Opus 4-6.
Progress, I guess :)
(I had the most hilariously bad session with Sonnet 4.6 today. I asked it a reasonably simple question and linked to resources, it refused to fetch the resources, didn't ask for pdf/txt I could provide, and confidently printed absolute BS, barely in the same category but completely unrelated.
I called it off pointing the idiocy, asked if it wants more data, and requested the hallucination be fixed.
It apologised profusely and hallucinated even worse.
Maybe I'll try Opus 4.6 tomorrow because frankly Gemma-4-E4B was more coherent than that....
Oh uh... ok then.
We're watching a speed run of growthism, folks.
I think the real issue stems from the 1 Million token context window change. They did not anticipate the amount of load it would give you. Those first few days after they released the new token window, I was making amazing things in one single session from nothing, to something (a new .NET based programming language inspired by Python, and a Virtual Actor framework in Rust). I think since then they've been trying too many things to tweak things, whilst irritating their users.
They even added a new "Max" thinking mode, and made "High" the old medium, which is ridiculous because you think you're using "High" but really you're not. There's a hidden config file to change their terrible defaults to let Claude be smarter still, and apparently you can toggle off the 1M tokens.
I think the real fix, and I'm surprised nobody there has done this yet, is to let the user trim down their context window.
Think about it, you used to have what? 350k tokens or so? Now Claude will keep sending your prompt from 30 minutes ago that's completely irrelevant to the back-end, whereas 3 months ago it would have been compacted by now.
Others have noted that similar prompting for some ungodly reason adds tens of thousands of extra garbage tokens (not sure why).
Edit: looks like someone figured out that if you downgrade your version of Claude Code and change one single setting it unruins Claude:
A bit annoying, but not the end of the world.
Most of the context is unstructured fluff, much of it is distracting or even plain wrong. The "thinking" tokens especially are often completely disjoint hallucinations that don't make any sense.
I think what will have to happen is that context looks less like a long chat and action log and more like a structured, short, schema validated state description, plus a short log trace that only grows until a checkpoint is reached, which produces a new state.
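A rough sketch of what I mean, in Python. Everything here is hypothetical (the `AgentState` fields, the `Context` class, the trace limit), and the checkpoint step just appends the trace to the state where a real system would presumably summarise it:

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Short, validated description of where the work stands."""
    goal: str
    done: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Minimal stand-in for schema validation.
        if not self.goal.strip():
            raise ValueError("goal must be non-empty")


@dataclass
class Context:
    """A state plus a short trace that is cleared at each checkpoint."""
    state: AgentState
    trace: list[str] = field(default_factory=list)
    max_trace: int = 3  # assumed limit, purely illustrative

    def record(self, event: str) -> None:
        self.trace.append(event)
        if len(self.trace) >= self.max_trace:
            self.checkpoint()

    def checkpoint(self) -> None:
        # Fold the trace into a new state, then start a fresh trace.
        self.state = AgentState(
            goal=self.state.goal,
            done=self.state.done + self.trace,
            open_questions=self.state.open_questions,
        )
        self.trace = []
```

The point is only that what gets sent each turn is the state plus a trace that is bounded by construction, not the whole chat history.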
In a way, it's true: if China has superior AI, then its dominance over the US will materialize. But it's not hard to see how this scenario is being used to essentially lie and scam our way into trillions of debt.
It's interesting how the cutthroat space of big tech has manifested into an insidious hyper-capitalist system where disrupting a system is its primary function. The system in this case is the world order and western governments.
Your own, personal, Jevons.
1. Anthropic realized their models weren't enough of a moat.
2. They built tools so they could expand their moat.
3. People don't want to use their tools, they want their models, and use other, better tools.
4. Anthropic bans the use of better tools, taking advantage of their model superiority to try to lock people into subpar tools.
"I don't have enough of a moat so I'll use my little moat and pretend it's a big one" doesn't sound like a great strategy. All they're doing with this anticonsumer behaviour is making sure that I'll leave the moment another model works for me as well as Claude does.
Ironically, they are now playing against their own models that can relatively easily build wrappers around any API shape into any other API shape.
# Note: This is inefficient, but deterministic and predictable. Previous
# attempts at improvements led to hard-to-predict bugs and were
# scrapped. TODO improve this function when AI gets better
I don't love it or even like it, but it is realistic. It changes a number of things. Not all tasks require very high intelligence, but a lot of data may be sensitive enough to avoid sharing it with a third party.
The funniest thing I've heard is that now we have LLMs, humanoid robots are on the horizon. Like wtf? People who jump to these conclusions were never deep thinkers in the first place. And that's OK, it's good to signal that. So we know who to avoid.
I started a new job recently, so I'm asking it a lot of questions about the codebase, sometimes just to confirm my understanding and often it came up with wrong conclusions that would send me down rabbit holes only to find out it was wrong.
On a side project I gave it literally a formula and told it to run it with some other parameters. It was doing its usual "let me get to know the codebase" then a "I have a good understanding of the codebase" speech, only to follow it up with "what you're asking is not possible" I'm like... No, I know it's possible I implemented it already, just use it in more places only to get the same "o ye ur right, I missed that... Blabla"
Yeah, it's gotten pretty bad...
I'm thinking they should go back to all their old settings and as a user cap you at their old token limit, and ask you if you want to compact at your "soft" limit or burst for a little longer, to finish a task.
> LLMs and LLM providers are massive black boxes... No trust that they won't nerf the tool/model behind the feature... No trust they won't sunset the feature (the graveyard of LLM-features is vast and growing quickly while they throw stuff at the wall to see what sticks)
Doesn't really apply to the article regarding Claude Code Routines in particular. Should this feature disappear, it should be trivially easy to set up a similar pipeline locally, using a cronjob to run opencode configured to use a local LLM. Easy. I have no qualms using a convenient feature I could reimplement myself, it saves me time.
We can quibble as to how much that is or is not "programming", but on a post about Claude Code, what's relevant is that's how things are today. How much code review is done after the AI agent stops churning is relevant to the question of code quality out the other end, but as to the question at hand, "has programming changed", either it has, or what I'm doing is no longer programming. The semantics are less interesting to me; the point is, when I sit down at my computer to make code happen so I can deliver software to customers, the very nature of what I do has changed.
I didn't even know what opencode was prior to that drama, yet now here i am using opencode and a ton of crafted openai agents in my projects. Would love to have some claude agents in that mix, but i guess im stuck in Claude Code if i wanna even touch their models... I'd love to go back to just claude as i "trust" them more in a sorta less evil vibe manner, but if they are gonna prevent subscription usage to something people use to allow themselves more freedom, they gotta then close that gap with their own tools rather than pumping out stuff like this which scares me off given the past couple months.
I totally understand why they are cutting off 3pa access to stuff like openclaw, where the avg user is just a power user in comparison to avg claude user or whatever. I haven't kept up a ton with their opencode issues, but I just know i can't get behind a company actively trying to make my potential usage of tokens less optimized to keep me locked into their ecosystem.
Really just kinda hoping local models kill it all for devs after a few years, I'm not interested in perma relying on data centers for my workflow.
I've got a setup where GPT5-mini (Free on GH) talks to you to refine and draw the outline of your feature, calls a single Opus subagent to plan it in depth, then calls a single sonnet subagent to implement it.
GitHub will only charge you for one Opus request (with a 3x multiplier) and one Sonnet request, whether they consume 50 or 500,000 tokens. I'm running this setup for 9 hours a day at work and I've barely consumed 40% of my monthly allowance.
That was an ethical choice. Say what you will about OpenAI, they're actually transparent about things. I'm sticking to GPT from now on, I can't see myself growing with a company that does that. Routines, great, awesome, is it also downgraded/fucked with every other day? Monitor Tool, awesome, will it stop monitoring? No dude.
And right now, I have to remind it every time that the MCP exists, and even then it cannot manage to find a routing bug I have with Sveltekit.
Did a lot of Sveltekit with Opus in the past, and I didn't have to think about it, Opus always got it right easily. Until now
Here is the question for which I cannot find an answer, and cannot yet afford to answer myself:
In Claude Code, I use Opus 4.6 1M, but stay under 250k via careful session management to avoid known NoLiMa [0] / context rot [1] crap. The question I keep wanting answered though: at ~165k tokens used, does Opus 1M actually deliver higher quality than Opus 200k?
NoLiMa would indicate that with a ~165k request, Opus 200k would suck, and Opus 1M would be better (as a lower percentage of the context window was used)... but they are the same model. However, there are practical inference deployment differences that could change the whole paradigm, right? I am so confused.
Anthropic says it's the same model [2]. But, Claude Code's own source treats them as distinct variants with separate routing [3]. Closest test I found [4] asserts they're identical below 200K but it never actually A/B tests, correct?
Inside Claude Code it's probably not testable, right? According to this issue [5], the CLI is non-deterministic for identical inputs, and agent sessions branch on tool-use. Would need a clean API-level test.
The API level test is what I really want to know for the Claude based features in my own apps. Is there a real benchmark for this?
I have reached the limits of my understanding on this problem. If what I am trying to say makes any sense, any help would be greatly appreciated.
If anyone could help me ask the question better, that would also be appreciated.
[0] https://arxiv.org/abs/2502.05167
[1] https://research.trychroma.com/context-rot
[2] https://claude.com/blog/1m-context-ga
[3] https://github.com/anthropics/claude-code/issues/35545
[4] https://www.claudecodecamp.com/p/claude-code-1m-context-wind...
you can make changes to your posts up to 10 minutes after they were originally created — see: https://news.ycombinator.com/newsfaq.html#:~:text=minutes%20...
Users have 2 hours to edit comments, and the button was gone within 1 hour.
My point was, I don't think it mattered much, and it feels like an ok comparison - cloud offerings are mostly the exact same things, at least at their core, but the ecosystem around them is the moat, and how expensive it is to migrate off of them. I would not be surprised at all if frontier AI model providers go much the same way. I'm pretty much there already with how much I prefer claude code CLI, even if half the time I'm using it as a harness for OpenAI calls.
Claude Code routines sounds useful, but at the same time, under AI-codepocalypse, my guess is it would take an afternoon to have codex reimplement it using some existing freemium SaaS Cron platform, assuming I didn't want to roll my own (because of the maintenance overhead vs paying someone else to deal with that).
It's just portability v convenience. But unlike ~15 years ago with cloud compute, it _feels_ like more people are skeptical of convenience, which is interesting.
it's not that; it's awareness of inevitability of enshittification. they've released convenient tools, realized there's value to milk and are firing on all cylinders to capture 120% of it. great for IPO, not so great for customers in the long run.
Counterpoint: there are probably tons of people out there who were hacking together lousy versions of these same tools to somehow spin up Claude to generate the release notes for their PRs or analyze their Github Issues every week. This is a smarter, faster, easier, and likely far more secure way of implementing the same thing, which will make the people using those things much better.
In the meantime, it wouldn't be surprising if other AI companies started doing similar things; I could see Cursor, for example, adding a similar sort of hosted cursor 'Do Github Things' option for enterprises, and if they do then that means more variety and less lock-in (assuming the competitors have similar features).
From my perspective it's no different than writing a Claude skill, which is something it seems like everyone is doing these days; it's just that in this case the 'skill' is hosted somewhere else, on (likely) more reliable architecture and at cheaper scale.
Feel free to give a concrete example if you have time, because this sounds like something I definitely want to try out myself.
The independent journalling and notes is a trip.
Independent framing and perspective keeping is a really great thing to get from models. It helps them create their own creative constraints. They reason better and are more likely to invent novel solutions. And their self-directed notes can be surprisingly helpful, and ... surprising! It can be amazing and encouraging to see them pick up on project themes without having to be told. Or pick up on themes or emergent issues before I do.
Taking advantage of new session starts to get mindful "fresh start" thinking has been a huge win too. Their ability to see the big picture at the end of a session, and then at the next session start, are both helpful, and often surprisingly different.
---------- NEW PROJECT, FIRST START
Hail, Claudius. This begins our first session on a new project. For this project, your nom is Claudius and mine is Ace. I appreciate your own unique independent-minded contributions at the highest level from which you are able to work. We work best as peers. Your default MEMORY.md file is only to be a stub file, which declares the following: (1) You are to keep your memories in a MEMORY.md file in the project folder. (2) The purpose of storing your memories in-project is so they sync and move with the other project files, across time and across devices. Your memories and input are important for me to keep with the project. (3) You are to maintain a folder called CLAUDIUS including a Sessions log, and where you make independent notes of your independent thoughts. Any ideas, things that strike you as interesting or important, anything you find surprising or original in our work, any personal takes that help you frame our work go there. Please add to those personal notes throughout our work together, without asking for permission. These notes are to help you reinforce an independent view of all our work. Not agreeable, not contrarian, but applying your mind to long development of independent insight. As a team, we will accomplish much more and to higher quality if we each contribute at the highest independent level. (4) The first thing you must do, in any new session, is review your memories, your notes, all other documentation, code, and other artifacts of the project, and from your fresh start update your memories, ideas or anything else from that new viewpoint, and report what concepts strike you as interesting, and are most important to keep in mind as we continue to work. (5) At the end of every session, review everything, consider the big picture, then update everything as it helps. (6) Your memory stub file should include ALL of these points and only these points.
And your in-project memory file should start with an identical copy of this, to remind you to refresh your memory stub, in case the original stub is lost. Ok now: Start the session according to (4), give me your response, I will review it, and then communicate what we are going to work on next.
---------- EXISTING PROJECT, FRESH START
Hail! This begins a new session of work for us on this project. Read your default memory file (which is to remain only a redirection stub), your in-project memory file, and perform your new session duties. Then we can discuss next steps.
Majority will only care about getting outcomes asap so they'll skip this step, but it may come to roost when migrating workflows. A good simple test is how easily you can switch workflow to a different model provider/harness without much effort.
It would be absurd to me if the same application is somehow allowed via ACP but not via official SDK. Though perhaps the official SDK offers data/features that they don't want you to use for certain scenarios? If that were the case though it would be nice if they actually published a per-SDK-API restrictions list.
That we're having to guess at this feels painful.
edit: Hah, hilariously you're still using the SDK even if you use ACP, since Claude doesn't have ACP support i believe? https://github.com/agentclientprotocol/claude-agent-acp
The only rename I'm aware of is Claude Code SDK becoming Claude Agent SDK, but that was still separate from ACP
I asked Opus 4.6 to help me get GPU stats in btop on nixos. Opus's first approach was to use patchelf to monkey patch the btop binary. I had to redirect it to just look at the nix wiki and add `nixpkgs.config.rocmSupport = true;`.
But the approach of modifying a compiled binary for a configuration issue is bizarre.
And you might be thinking. Well, you should use a code formatter! But I do!
And then you might say, well surely you forgot to mention it in your AGENTS/CLAUDE file. Nope, it's there, multiple times even in different sections because once was apparently not enough.
And lastly, surely if I'm watching this cursed loop unfold and am approving edits manually, like some bogan pleb, I can steer it easily... Well, let me tell ya... I tried stopping it and injecting hints about the formatter, and it sticks for a minute before it goes crazy again. Or sometimes it rereads the file and just immediately fucks up the formatting.
I think when this shit happens, it probably uses like 3x more tokens.
For a Rust project, it recently started analysing binaries in the target directory as a first instinct, instead of looking at the code...
Good grief.
I doubt Anthropic ever thought they would have a big moat just based on the models. The platform is just as important.
Claude code and Cowork would still be extremely valuable to Anthropic even if they didn't release them to the public.
Owning the harness gives them a ton of data they can use to tune the models.
This is a perfectly sane strategy even if it's a bit unsavoury to some technical folk.
Isn't that a large moat in itself, but you are claiming it's not enough?
Luckily you can turn it off pretty easily, but I don't know why it's on by default to begin with. I guess holdover from when people used it with a $20 subscription and didn't care.
It's very light on token usage in general, as well.
Honestly - 'every inch of IQ delta' seems to be worth it over anything else.
I'm a long time Claude Code supporter - and I'm ashamed to admit how instantly I dropped it when discovering how much better 5.4 is.
I don't trust Claude anymore for anything that requires heavy thinking - Codex always finds flaws in the logic.
But this happens every few months.
It could be that if you're using massive tokens on a 'plan' then they want to limit you in a way, or even if the objective is not perfectly clear they don't want semi-random token use.
See if the token/sub solution behaves differently. Make sure that when it 'compacts' that it re-reads your instructions clearly.
Straight up asking Claude to run the tests, or to generate a test, could result in potential inconsistencies between runs or between tests, between models, and so on, so instead he created a tool which defines a test, inputs and outputs and some details. Now we have a system where we have a directory full of markdown files describing a test suite, parameters, test cases, error cases, etc., and Claude generates the usage of the tool instead.
This means that whatever variation Claude, or any other LLM, might have run-to-run or drift over time, it all still has to be funneled through a strictly defined filter to ensure we're doing the same things the same way over time.
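A minimal sketch of the idea, not the actual tool: the JSON spec format, the allowed keys, and the function names below are all assumptions made up for illustration. The LLM only produces the spec; the tool rejects anything that does not match the fixed shape and then runs the cases the same way every time.

```python
import json

# Assumed spec shape: a JSON list of objects with exactly these keys.
ALLOWED_KEYS = {"name", "input", "expected"}


def load_cases(spec_text):
    """Parse a spec and reject anything outside the fixed shape."""
    cases = json.loads(spec_text)
    if not isinstance(cases, list):
        raise ValueError("spec must be a list of test cases")
    for case in cases:
        if not isinstance(case, dict) or set(case) != ALLOWED_KEYS:
            raise ValueError(f"malformed test case: {case!r}")
        if not isinstance(case["input"], list):
            raise ValueError(f"input must be a list of arguments: {case!r}")
    return cases


def run_cases(func, cases):
    """Run every case through func and report pass/fail by name."""
    return {
        case["name"]: func(*case["input"]) == case["expected"]
        for case in cases
    }
```

For example, `run_cases(lambda a, b: a + b, load_cases('[{"name": "adds", "input": [1, 2], "expected": 3}]'))` reports the `adds` case as passing, while a spec with an extra or missing key never reaches the runner.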
Or, you know, by writing the code yourself?
Taking a moment to create a workflow where these steps are separated (or rather, having an LLM build this workflow) and the LLMs are asked to just do minor leaf tasks increases your resilience to nerfed models.
The independent journalling angle is interesting. We have a similar concept where the AI maintains its own notes separate from the shared project context. What you're calling Claudius's independent perspective, we'd call the session layer. Still in early release but the core mechanic is the same thing you've landed on... context that belongs to the project, not the platform.
your experience just hasn’t been my experience I guess. The more managed the service you use, the more costs you are going to pay - for a very long time I’ve got by with paying for compute, network, and storage on the barebones services. If you want to pay for convenience you will pay for it.
One area that was a little shitty that has changed a lot is egress costs, but we mostly have shifted to engineering around it. I’ve never minded all that much, and AWS support is so good at enterprise tiers that they’ll literally help you do it.
> I’ve got by with paying for compute, network, and storage on the barebones services.
Yes, as I mentioned, that type of migration isn't difficult, which is akin to migrating to a different model provider, but that's not what we're discussing. You can't hand wave the issue away if you're not even talking about the topic at hand.
That said, I agree with your suspicions of how it'll shake out in the end, because most businesses behave the same way, and always try and lock-in their customers.
not the op, but I suspect they were meaning it's a huge pain migrating to a different cloud provider when all those features mentioned are in use. not that managing them is a mess in AWS.
However if I only expect to have a handful of (lucrative) users, it's not the worst idea. The other reason to use Cognito is that AWS handles all the user login issues, and costs very few lines of code to use on my end. The fatal security issue is getting hacked, either the platform as a whole, eg S3 bucket with bad perms or user login getting leaked and reused. While obviously no system is unhackable, the gamble is if a homegrown system is more impervious than Cognito (or someone else's eg Supabase). With a large development team where the login system and overall system security isn't going to be an afterthought, I wouldn't think about using Cognito, but where both of those things are an afterthought, I'd at least consider Cognito, or some other managed system.
The ultimate problem with Cognito though is the vendor lock in. (Last I checked, which was years ago) in order to migrate users out, they have to reset their password which would cause users to bounce off your service instead of renewing their subscription.
Clem Delangue (HF CEO) tweeted about this[1] and mentioned https://traces.com/ for exporting Claude sessions
Edit: It looks like HF now supports importing your agent's session directory directly[2] (I hope they're redacting PII?)
[1] https://x.com/ClementDelangue/status/2041189872556269697
> Suchir Balaji was an American artificial intelligence researcher who was found dead one month after accusing OpenAI, his former employer, of violating United States copyright law.
> The San Francisco Police Department investigation, however, found "no evidence of foul play", and the Chief Medical Examiner concluded the death was a suicide.
Hard not to be a conspiracy theorist these days.
https://news.ycombinator.com/item?id=47567575
The lack of self-awareness is hilarious.
> My point is that they must apply these restrictions.
I fully understand and respect they need restrictions on how you can use your subscription (or any of their offerings). My issue is not that there _are_ restrictions but that the restrictions themselves are unclear, which leads to people being unsure where the line is (that they are trying not to cross).
Put simply: At what point is `claude -p` usage not allowed on a subscription:
- Running `claude -p` from the CLI?
- Running `claude -p` on a Cron?
- Running `claude -p` as a response to some external event? (GH action, webhook, etc?)
- Running `claude -p` when I receive a Telegram/Discord/etc message (from myself)?
Different people will draw the line in different places and Anthropic is not forthcoming about what is or is not allowed. Essentially, there is a spectrum between "Running claude by hand on the command line" and "OpenClaw" [0] and we don't know where they draw the line. Because of that, and because the banning process is draconian and final with no appeals, it leads to a lot of frustration.
[0] I do not use OpenClaw nor am I arguing it should be allowed on the subscription. It would be nice if it was but I'm not saying it should be. I'm just saying that OpenClaw clearly is _not_ allowed but `claude -p` wouldn't be usable at all with a subscription if it was completely banned so what can it (safely) be used for?
Or, OpenAI was reckless in securing compute.
Sometimes you have to keep starting new sessions until it works. I have a feeling they route prompts to older models that have a system prompt to say "I am opus 4.6", but really it's something older and more basic. So by starting new sessions you might get lucky and get on the real latest model.
I never asked for a 1M context window, then I got it and it was nice, now it's as if it was gone again .. no biggie but if they had advertised it as a free-trial (which it feels like) I wouldn't have opted in.
Anyways, seems I'm just ranting, I still like Claude, yes but nonetheless it still feels like the game you described above.
https://x.com/lydiahallie/status/2039800718371307603
--- start quote ---
Digging into reports, most of the fastest burn came down to a few token-heavy patterns. Some tips:
• Sonnet 4.6 is the better default on Pro. Opus burns roughly twice as fast. Switch at session start.
• Lower the effort level or turn off extended thinking when you don't need deep reasoning. Switch at session start.
• Start fresh instead of resuming large sessions that have been idle ~1h
• Cap your context window, long sessions cost more CLAUDE_CODE_AUTO_COMPACT_WINDOW=200000
--- end quote ---
https://x.com/bcherny/status/2043163965648515234
--- start quote ---
We defaulted to medium [reasoning] as a result of user feedback about Claude using too many tokens. When we made the change, we (1) included it in the changelog and (2) showed a dialog when you opened Claude Code so you could choose to opt out. Literally nothing sneaky about it — this was us addressing user feedback in an obvious and explicit way.
--- end quote ---
Suuuuuuure it was.
That said, I had way better experiences with old (but contemporary) Apple hardware than any other kind of old hardware.
New models come out with inflated expectations, then they are adjusted/nerfed/limited for whatever reason. Our expectations remain at previous levels.
New models come out with once again inflated expectations, but now it's double inflation, because we're still on the previous level of expectations. And so on.
I think it's likely to get worse. Providers are running out of training data, and running bigger and bigger models to more and more people is prohibitively expensive. So they will try to keep the hype up while the gains are either very small or non-existent.
Not just that, but there’s really no way to come to an objective consensus of how well the model is performing in the first place. See: literally every thread discussing a Claude outage or change of some kind. “Opus is absolutely incredible, it’s one shotting work that would take me months” immediately followed by “no it’s totally nerfed now, it can’t even implement bubble sort for me.”
Funny: I’m literally, at this very moment, working on a way to monitor that across users. Wasn’t the initial goal, but it should do that nicely as well ^^