Dynamic Workflows in Claude Code

Dynamic Workflows in Claude Code (claude.com)

161 points by mil22 1 day ago | 123 comments

inerte 2 hours ago |

I am getting so confused when to use what... agents, sub-agents, tasks, team mates, /goal, /loop, and now workflow. Each with different degrees of effort.

Don't make me think.

All these knobs are also exposed in ChatGPT, which I am more familiar with when chatting. Which one of the models? Do I go Instant, Thinking, Pro? Extended Pro? Oh no, maybe I need Deep Research.

Sometimes I think it's on purpose. I fear if I try a lowest knob, it will miss something. So turn everything up. And token usage goes up.

notatoad 2 hours ago | |

i think it is on purpose, but not for the cynical reason of burning more tokens. developers like knobs, it helps us feel like we're in control. even when we're not.

so the ai companies give us knobs and buttons and sliders to make us more comfortable.

esafak 1 hour ago | |

That's what product management is for.

SkyPuncher 1 day ago |

I don't really get this. At this point, my limiting factor is not how quickly Claude can self-trudge through code. It's whether Claude is going to do the task correctly or not.

I need more mechanisms for controlling long-running sessions and dynamically injecting my thoughts, corrections, and nudges rather than faster ways to burn through my tokens without knowing if the results are going to be correct.

wrs 1 day ago | |

I think the theoretical answer here is this:

"Agents address the problem from independent angles, other agents try to refute what they found, and the run keeps iterating until the answers converge."

So you will be supplying the "ground truth" (test suite, detailed spec, whatever) and empower an agent to use it to guide the other agents. Currently a lot of people do this sequentially in the form of multiple code-review passes by fresh agent sessions looking at the work of previous sessions.

Adversarial models are a longstanding technique in ML so it makes sense they would try to go this way.
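A minimal sketch of the propose/refute/converge loop described above, under loud assumptions: `run_until_converged`, `toy_proposer`, and the two refuters are hypothetical names, and plain functions stand in for LLM calls. It is not how Claude Code implements workflows; it only shows the control flow in which user-supplied ground-truth checks decide when the run stops.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types: a proposer sees all objections so far and returns a
# candidate; a refuter returns an objection string, or None if it has none.
Proposer = Callable[[List[str]], int]
Refuter = Callable[[int], Optional[str]]


def run_until_converged(
    proposers: List[Proposer],
    refuters: List[Refuter],
    max_rounds: int = 5,
) -> Tuple[Optional[int], int]:
    """Iterate until some candidate survives every refuter, or rounds run out."""
    objections: List[str] = []
    for round_no in range(1, max_rounds + 1):
        for propose in proposers:
            candidate = propose(objections)
            found = [o for o in (refute(candidate) for refute in refuters) if o]
            if not found:
                return candidate, round_no  # no refuter could object
            objections.extend(found)  # feed objections into the next attempt
    return None, max_rounds  # never converged: escalate to a human


def toy_proposer(objections: List[str]) -> int:
    # Stand-in for an agent: the guess shifts as objections accumulate.
    return 8 + len(objections)


def must_be_even(candidate: int) -> Optional[str]:
    # Stand-in for one ground-truth check (e.g. a test in the suite).
    return None if candidate % 2 == 0 else f"{candidate} is odd"


def must_exceed_ten(candidate: int) -> Optional[str]:
    # Stand-in for a second ground-truth check (e.g. a spec requirement).
    return None if candidate > 10 else f"{candidate} is not greater than 10"


result = run_until_converged([toy_proposer], [must_be_even, must_exceed_ten])
```

The refuters here encode the "ground truth"; if they only encoded agreement between agents, the loop could converge on a wrong answer, which is the objection raised further down the thread.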

vadansky 1 day ago | | |

I don't know, maybe I'm doing it wrong but I feel LLMs add a slop debt, and each agent pass just exacerbates it.

Like I had an LLM implement a spec and it said it was done... Except it had a ton of `casts` everywhere. Okay, my bad, I should have been clear "NO CASTS", so I use the LLM to remove the casts, except it just kept making things more and more complicated and ugly.

It took me taking a break and having a shower thought to realize all the ugliness is because one type should have been broken up into 2, which would remove a ton of generics and code. But Claude never suggested that, it was always "we need at least one cast here, or we need 1000 LOC of generic factories". I tried multiple new sessions with various prompts too.

Maybe one day soon LLMs could pay off their own slop debt but at least right now I don't trust them to write code unseen.

Edit: Maybe the correct action should have been to delete everything and make it rewrite everything from scratch with the clear "NO CASTS EVER" rule. But still, the point is that having an LLM clean up after an LLM doesn't seem to work well enough to just keep it in a loop and never look at what it does.

KronisLV 23 hours ago | | |

> Currently a lot of people do this sequentially in the form of multiple code-review passes by fresh agent sessions looking at the work of previous sessions.

Up until now I've used a review loop approach, where within a Claude Code session I just tell it to spawn three review sub-agents, each with context of what's going on and instructions to look over all of the changed code in search of serious/critical issues, but otherwise with a fresher look at things. It works really well for the most part (token usage aside): https://news.ycombinator.com/item?id=48277011

Garlef 23 hours ago | | |

Doesn't help if the wrong design is implemented correctly.

tsunamifury 1 day ago | | |

Ground truth is not consensus, it has to be graded against what actually works for the original goal. Plenty of scenarios with AI and Humans can result in consensus around incorrectness.

sfourdrinier 22 hours ago | |

Yes, that, accuracy, speed, and single computer-use.

I find those to be the limiting factors to speed.

I have extensive rules, I do extensive planning. Yet at implementation, the rules are not respected, errors are introduced, etc...

I spend more time fixing than writing code.

Then speed... Because of the fixes and bad code quality, even with frontier models, speed makes a very big difference. I (agents) spend hours daily doing reviews and fixes. A 5x speed boost would make me much more productive.

And when working super fast with agents, having only one computer is limiting. Even worktrees don't solve problems because I use things like convex, chrome use, etc... and they conflict with each other all the time.

Still many problems to solve. It's already evolved so much in the last two years.

eggplantemoji69 1 day ago | |

This is my experience. Quantity of output is not the issue right now. Quality is. But I’m not sure if this will ever be solved for, given LLMs are non-deterministic sophisticated autocomplete at their core.

Sure, ‘human in the loop’ and all that jazz, but I feel like my knowledge suffers even with this approach. I have to use llms w pinpoint focus to get decent results.

The original copilot completions behavior might be peak llm performance for coding, sans having an agent write boilerplate and such.

clickety_clack 21 hours ago | |

A more interactive Claude code would be great instead of 50 “here’s a tiny snapshot of a change shorn of the context you need to understand it. Yes or no?”s

root-parent 1 day ago | |

When this is all finished and done, these coding models will allow you to rewrite the linux kernel in rust, recode Kubernetes in assembly, and create your own web framework in 10 min.

But each prompt will cost your company 10 to 15 million dollars. An extra 20 million if you ask them to review the code and improve the comments.

chinathrow 1 day ago | |

I think for now it's better to convert tokens into code/library code and then work with that for deterministic results rather than relying on Claude being correct or not.

jascha_eng 1 day ago | |

yes I agree with this, more granular going back, letting me interrupt where it went off the rails, or even editing file reads myself etc would be lovely. Ingesting parts of other conversations would also be cool!

dude250711 1 day ago | |

I have heard of "token-maxxing" but I have not heard of "correctness-maxxing" or "quality-maxxing".

mirashii 1 day ago | | |

Not with those exact terms, but it is certainly being discussed. Wes McKinney said in a recent talk that with current coding agents there’s no longer an excuse for shipping suboptimal code that takes on tech debt. Writing tests has never been cheaper, writing custom fuzzers, linters, and other harnesses that serve as guardrails has never been cheaper. His take is that “we didn’t have enough engineering time to do it right” is no longer an excuse, and the only excuses left are that you don’t know any better or you have bad taste.

encoderer 1 day ago | |

The answer for me has been actually more tokens, and create even more layers of automated verification

Jarred 1 day ago | |

Dynamic workflows, in my experience, make Claude more effective at complex long-running tasks. They help precisely with getting Claude to do the task correctly.

It feels more like a bespoke build system for the specific task/project than prompting a freeform chat.

aloknnikhil 1 day ago | | |

As long as agents are fuzzy (which they will continue to be with the Transformers architecture), the need to validate will continue to exist. I cannot imagine merging code without at least 1 human review.

mil22 1 day ago |

Interesting to note, not sure if this was known publicly before today's blog post:

Rewriting Bun with dynamic workflows

An example of what dynamic workflows can unlock at scale is the recent rewrite of Bun. Jarred Sumner used dynamic workflows to port Bun from Zig to Rust with 99.8% of the existing test suite passing, roughly 750,000 lines of Rust, and eleven days from first commit to merge. One workflow mapped the right Rust lifetime for every struct field in the Zig codebase. The next wrote every .rs file as a behavior-identical port of its .zig counterpart, hundreds of agents working in parallel with two reviewers on each file. A fix loop then drove the build and test suite until both ran clean. After the port landed, an overnight workflow addressed unnecessary data copies and opened a PR for each for final review. While not yet in production, all of this was handled by dynamic workflows. Jarred will be writing about this more in the future.

SkyPuncher 1 day ago | |

I'm extremely skeptical that dynamic workflows had anything to do with this. I've been able to refactor one of the most complicated parts of our code base with similar results.

Mechanical refactors are relatively straightforward for agents.

jeswin 1 day ago | | |

> I've been able to refactor one of the most complicated parts of our code base with similar results. Mechanical refactors are relatively straightforward for agents.

A rewrite of bun in Rust is unlikely to be a trivial mechanical refactor. And if you are not sharing what the complicated parts were, or how big it is, how do we assess that the task was similar?

Unless you are intimately familiar with the bun codebase and you've already made that assessment.

dools 21 hours ago |

The #1 goal for Anthropic and others is to take the longest running process possible and make it entirely opaque to the developer. It's the only way they can build a moat for a commodity. I would highly recommend building your own multi-stage orchestration flows because then you'll get a much better idea of where you need to be in the loop, and where you can save money. Once entire organisations are functioning only as extensions of Anthropic, they'll put the prices up and squeeze the shit out of the market.

trjordan 1 day ago |

It feels like we're far past the point of where having AI do more faster is helpful.

It's telling that they used "rewrite Bun in Rust" as the proof point here. It's cool! But the vast majority of software engineering doesn't start with tens of thousands of tests, where making them pass is the whole job.

In my experience, AI still drifts from what I meant it to do on anything bigger than building a widget. My time is spent suspiciously reviewing output for changes the agent snuck in, or invariants it broke. I talked with a friend recently where the agent broke the test harness badly enough that none of the tests mattered for 3 weeks. They did pass, though, so CI never complained.
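One guard against the "harness broke, but CI stayed green" failure described above is to check how many tests actually ran, not only whether any failed. The sketch below is a hedged illustration: `SuiteResult`, `check_suite`, and the baseline/tolerance parameters are hypothetical names, and the counts would have to come from whatever report your test runner emits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SuiteResult:
    """Counts taken from a test runner's report (hypothetical shape)."""
    collected: int  # tests the runner discovered
    executed: int   # tests that actually ran (not skipped or deselected)
    failed: int     # tests that ran and failed


def check_suite(
    result: SuiteResult,
    baseline_executed: int,
    tolerance_pct: int = 10,
) -> List[str]:
    """Return a list of problems; an empty list means the run is trustworthy."""
    problems: List[str] = []
    if result.failed:
        problems.append(f"{result.failed} test(s) failed")
    if result.executed == 0:
        problems.append("no tests executed; a green run proves nothing")
    # Integer math avoids float rounding at the boundary.
    floor = baseline_executed * (100 - tolerance_pct) // 100
    if result.executed < floor:
        problems.append(
            f"only {result.executed} tests executed, expected at least {floor}"
        )
    return problems


healthy = check_suite(SuiteResult(collected=100, executed=95, failed=0), 100)
broken = check_suite(SuiteResult(collected=0, executed=0, failed=0), 100)
```

A broken harness reports zero failures, so only the executed-count checks catch it; the baseline is an assumption you would record from a known-good run and update deliberately.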

There's something at the intersection of context engineering, managing that sloppy pile of markdown plans, and good old-fashioned system understanding that's the real bottleneck.

bcherny 1 day ago |

A few of us from the Claude Code team will be hanging around if anyone has questions! Very excited for this launch -- dynamic workflows have been a game changer for engineering here at Anthropic. Can't wait to hear what you think.

vld_chk 1 day ago |

Quite a thing to use the Bun rewrite to Rust as an example of dynamic workflows, while it is now considered an anti-pattern that leads a team to stop supporting the tool due to the inability to properly understand and navigate 1M vibe-coded Rust lines.

Deukhoofd 1 day ago |

I'm going to be honest, this very much reads like an exciting new way to burn up as many tokens as possible. Large amounts of parallel agents that all have all their work double-checked by multiple other agents, and that keeps running for a longer period of time?

I feel like there are more efficient ways to tackle the issues given.

ithkuil 1 day ago | |

Possibly. But otoh one cannot complain that agents don't produce high quality code while at the same time not allowing them to thoroughly go through all the steps required to produce high quality code

afro88 1 day ago |

I tried this out yesterday - lucky enough to have access through EAP at work. The workflows that are generated are quite good - smart parallelisation and phasing. End results for larger chunks of work are also much better, which I attribute to more of the work having clean context windows (Opus 4.7 is unusable past 200k conversation length, and each subagent ends up using less than that IME). They also seem to have a validation phase hint in the workflow generator which also helps a lot. Speed is a bonus.

You can achieve a similar result manually prompting to use subagents, yes. But the TUI for in flight dynamic workflows is really nice - great visibility into exactly what's happening.

Honestly, for anything larger than a 1 shot PR, it's worth firing off a workflow for better automatic context management alone (more work done in the first 20% sweet spot).

ncphillips 1 day ago |

I just hit my Claude Max limit for the first time _ever_ thanks to workflows lol

Like 90 agents ran to do a code review of a fairly small package I have.

They're really looking for us to increase token usage aren't they?

tomjakubowski 1 day ago | |

This is a fundamental incentive issue with any company that does all of training models, building harnesses for them, and offering them as a service.

tra3 1 day ago |

I say this as someone who's found LLMs incredibly beneficial.

Is this a way to increase token burn?

I thought we covered this with Claude's C compiler. What changed?

mattas 1 day ago | |

My initial reaction was that this is tokenmaxxing disguised as a product.

sermakarevich 12 hours ago |

Not sure why Claude does not have an AskUserQuestion implementation that works for spawned sessions: subagents, teams, workflows. Without it, spawning hundreds of subagents and waiting for the final result without a single input feels a bit risky.

Here is the solution to it. Built on a SQLite DB and MCP, blocking until the question is answered, supporting all possible question types, with a CLI or web interface for answers, `ask_human_question` fills the gap in efficient subagent management.

https://news.ycombinator.com/item?id=48320233
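The blocking-question idea can be sketched with nothing but the standard library. This is not the linked project's code: the table layout and the names `open_db`, `post_question`, `answer_question`, and `wait_for_answer` are assumptions made for illustration, and the MCP wiring and CLI/web front ends are left out entirely.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical schema: one row per question, answer stays NULL until a human replies.
SCHEMA = """
CREATE TABLE IF NOT EXISTS questions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    agent TEXT NOT NULL,
    question TEXT NOT NULL,
    answer TEXT
)
"""


def open_db(path: str) -> sqlite3.Connection:
    """Open (or create) the shared question database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def post_question(conn: sqlite3.Connection, agent: str, question: str) -> int:
    """Called by a subagent: record a question and return its id."""
    cur = conn.execute(
        "INSERT INTO questions (agent, question) VALUES (?, ?)", (agent, question)
    )
    conn.commit()
    return cur.lastrowid


def answer_question(conn: sqlite3.Connection, question_id: int, answer: str) -> None:
    """Called by the human-facing side (CLI, web UI) to supply the answer."""
    conn.execute(
        "UPDATE questions SET answer = ? WHERE id = ?", (answer, question_id)
    )
    conn.commit()


def wait_for_answer(
    conn: sqlite3.Connection,
    question_id: int,
    timeout_s: float = 60.0,
    poll_s: float = 0.5,
) -> Optional[str]:
    """Block by polling until the question is answered; None on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        row = conn.execute(
            "SELECT answer FROM questions WHERE id = ?", (question_id,)
        ).fetchone()
        if row is not None and row[0] is not None:
            return row[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)
```

In real use the subagent and the human-facing tool would be separate processes, each with its own connection to the same database file; polling keeps the sketch simple at the cost of some latency.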

xcskier56 1 day ago |

Are these “features” just hooks to get people to burn more tokens faster?

I’m at the point where deciding what we should and should not do takes a lot more time than actually doing it. More agents just means running faster in potentially the wrong direction

cush 21 hours ago | |

They’re pure enterprise features - needed for massive legacy codebases with tens or hundreds of similar enough coding tasks - where there is a lived “cost” of not doing this type of work paid by every engineer working around it

aabdi 1 day ago |

wrote something similar for my own use/work stuff; seems everything is converging towards similar ideas.

IMO, this style of workflow/agentics is what all SWE will look like long term. Automate everything into a big pipe-y thing. How it's gonna be modelled is up in the air though. lots of different approaches:

mine: https://github.com/portpowered/you-agent-factory

https://github.com/ComposioHQ/agent-orchestrator

https://github.com/gastownhall/gastown

https://github.com/openai/symphony

vblanco 1 day ago |

I made my own knockoff of that for myself https://github.com/vblanco20-1/AgentLoom (not really usable, just a vibecoded prototype), based on the workflow files found in the Bun repo. I've been using it but pointed at deepseek flash to do some really large scale stuff. It's a fun way of using agents, and highly useful for tasks like code review to apply some rules, or to find vulnerability candidates. Funny enough, I used it in the same way claude does, vibecoding the workflow scripts and prompts themselves.

I did find it uses tokens like crazy: I migrated Pixel Dungeon (Java) to C# as an experiment, and it used almost 2 billion tokens. It was just 20 bucks due to deepseek flash, but I shudder thinking of how much money this uses when run on the real claude API pricing.
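The arithmetic behind those figures is easy to make explicit. Taking the comment's rough numbers (about 2 billion tokens for about $20), the effective blended rate is around one cent per million tokens. The sketch below computes that and projects the same volume at a different rate; `HYPOTHETICAL_RATE` is a placeholder for illustration, not a quoted price for any real API.

```python
def usd_per_million_tokens(total_cost_usd: float, total_tokens: int) -> float:
    """Effective blended price implied by a total bill and a token count."""
    return total_cost_usd / (total_tokens / 1_000_000)


def projected_cost_usd(total_tokens: int, usd_per_million: float) -> float:
    """Cost of the same token volume at a different per-million rate."""
    return total_tokens / 1_000_000 * usd_per_million


# Figures from the comment above, rounded: ~2 billion tokens for ~$20.
TOKENS_USED = 2_000_000_000
effective_rate = usd_per_million_tokens(20.0, TOKENS_USED)

# Placeholder rate for illustration only; substitute your provider's real
# blended input/output price before drawing conclusions.
HYPOTHETICAL_RATE = 5.0
projection = projected_cost_usd(TOKENS_USED, HYPOTHETICAL_RATE)
```

Because cost scales linearly with the rate, a provider charging N times more per token makes the same run N times more expensive, which is why token-heavy workflows are so sensitive to model choice.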

jorgeleo 1 day ago | |

Curious minds... why do that port?

vblanco 1 day ago | | |

just to test the tech. No real usage other than for the fun of it.

I did port stb_image from C to Jai, which I was able to fully verify and harden, and that one I'll give more use. I'm also using the same workflow system to perform agentic translation of a game I work with from English to various other languages; the results are far better than the commercial "human" translation services we tested. And I also use it to fix OCR issues on PDF books I'm OCR-ing for a data pipeline. This kind of workflow/wide agent swarm system is rather useful for many things where you want to "apply" the same prompts across a whole codebase or just in parallel.

buryat 1 day ago |

Not sure I understand how it's different from a team of sub-agents, what's the difference I'm curious?

bcherny 1 day ago | |

There are two main differences:

1. Support for 1-2 OOMs more agents, to do more work in parallel

2. A phased, semi-structured approach where work happens in steps
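To make the two points concrete, here is a rough sketch of a phased fan-out: every phase runs one agent per work item in parallel, and a phase starts only after the previous one has finished for all items. The names `run_phases`, `plan_agent`, `port_agent`, and `review_agent` are hypothetical, and plain functions stand in for agents; this illustrates the shape, not Claude Code's actual orchestration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical agent signature: (work item, that item's previous-phase output) -> output.
Agent = Callable[[str, str], str]


def run_phases(
    phases: Sequence[Tuple[str, Agent]],
    items: Sequence[str],
    max_parallel: int = 8,
) -> Dict[str, Dict[str, str]]:
    """Run each phase over all items in parallel; phases themselves run in order."""
    results: Dict[str, Dict[str, str]] = {}
    previous: Dict[str, str] = {item: "" for item in items}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        for name, agent in phases:
            inputs = [previous[item] for item in items]
            # pool.map preserves input order; list() waits for the whole phase.
            outputs = list(pool.map(agent, items, inputs))
            previous = dict(zip(items, outputs))
            results[name] = previous
    return results


def plan_agent(item: str, _previous: str) -> str:
    return f"plan:{item}"  # stand-in for an agent that analyses the file


def port_agent(item: str, previous: str) -> str:
    return f"port({previous})"  # stand-in for an agent that acts on the plan


def review_agent(item: str, previous: str) -> str:
    return f"reviewed({previous})"  # stand-in for a reviewing agent


results = run_phases(
    [("plan", plan_agent), ("port", port_agent), ("review", review_agent)],
    ["a.zig", "b.zig"],
)
```

Raising `max_parallel` is the "more agents" knob, while the ordered `phases` list is the semi-structured part: a barrier between phases gives a natural point to validate or checkpoint before more work is spent.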

Robdel12 1 day ago |

> Rewriting Bun with dynamic workflows

There ya go, the rewrite was for marketing.

CuriouslyC 1 day ago |

Anthropic is going to price themselves out of code, but still find a nice market providing service to senior management. Their long term play is virtual employees rather than tools for humans.

chandureddyvari 1 day ago |

I’m currently cobbling together sub agents with hooks; workflows look very promising for doing things more predictably.

Is this equivalent of DAGs for sub agents inside claude code? Can i pause and resume/retry workflows? How stateful are they?

I'd really appreciate it if someone from Claude Code could shed more light on the above. I’m trying to see if I can get langgraph equivalent DAGs here.

mohsen1 1 day ago |

I’m gonna try this one on tsz. So far Codex /goal has been great

https://tsz.dev

So far Codex /goal has been amazing but Claude Code /goal or even /loop does not work hard enough and gives up. I have observed it just claiming it’s “iterating” in a broken loop or simply giving up.

nebben64 21 hours ago |

How is this different, or how does it complement Agent Teams? When should I use which?

AndyNemmity 1 day ago |

I have had dynamic workflows in my agent for the past 9 months.

I am diffing Claude Code with them, I tend to agree with the analysis.

So far, versus my system, there are tradeoffs, but the dynamic workflows are over-tuned to use way more agents than I have ever found add value.

It used 8 to diff our systems. I would have used 4, for example.

Zopieux 20 hours ago |

Who, beyond Anthropic themselves, can afford such purposefully wasteful uses of LLMs?

"the model sucks a bit so we just have best-of-4 & adversarial reviewing agents; surely one more agent will do the trick"

brap 1 day ago |

>Claude dynamically writes orchestration scripts

So, is this like a skill the LLM should follow, or an actual "workflow" in the deterministic sense?

If it's the former, is it even reliable for long running tasks? If it's the latter, can users interact with it?

afro88 1 day ago | |

It's the latter. You can view it and see fine grained progress, but you can't interact with it. I hope that's coming next, because it would be useful to steer later phases or even agents.

vb-8448 1 day ago |

> Rewriting Bun with dynamic workflows

Are we sure this is a good "success story" example?

facundo_olano 20 hours ago |

Absolutely annoying to have it assume that I want to use this when I type workflow in the prompt. Like that's not already a thing in half of the software projects.

ajma 1 day ago |

> It’s important to note that dynamic workflows consume meaningfully more usage than a typical Claude Code session

2001zhaozhao 1 day ago |

We really need a way to scope and implement these multi-agent orchestration features that isn't locked in to one provider.

piyuv 1 day ago |

“We realized the tech is not as addictive as we’ve hoped so we won’t be able to raise token prices enough to be profitable, so here’s a way to make you consume a lot more tokens without even realizing”

isoprophlex 1 day ago |

This seems like it's an openclaw, anthropic edition. Something like ClaudeClaw?

seabass 22 hours ago |

Have to love their demo use case: React -> Solid migration

zli0823 1 day ago |

found a new way to burn your money quicker.

mkw5053 1 day ago |

Wow, almost like the good old days of /ultrathink are back. Feels simultaneously like just yesterday and a lifetime ago.

zli0823 1 day ago |

a completely new way to burn your money.

SilverElfin 1 day ago |

Cloudflare just launched a feature with this same name, just this month. Why would Anthropic choose the same exact name?

https://blog.cloudflare.com/dynamic-workflows/

Also, isn’t all of this already easy to do on any of the platforms (including Claude before this, and OpenAI too)?