  @spec LINT_COMMAND.ORPHAN_VERIFIES
  linter reports blocks that do not attach to a supported owned item.
Then
  #[test]
  // @verifies SPECIAL.LINT_COMMAND.ORPHAN_VERIFIES
  fn rejects_orphan_verifies_blocks() {
      let block = block_with_path("src/example.rs", &["@verifies EXPORT.ORPHAN"]);
      let parsed = parse_current(&block);
      assert!(parsed.verifies.is_empty());
      assert_eq!(parsed.diagnostics.len(), 1);
      assert!(
          parsed.diagnostics[0]
              .message
              .contains("@verifies must attach to the next supported item")
      );
  }
And then the CLI command “special specs” pulls your specs and all attached verification + test code so you (or your LLM) can analyze whether the (hopefully passing!) test actually supports the product claim.
There’s also a bunch of other code quality commands and source annotations in there for architectural design & analysis, fuzzy-checking for DRY opportunities, and general codebase health. But on the overall principle, this article is dead-on: when developing with LLMs, your source of truth should be in your code, or at least co-located with it.
Always was.
> Social media is making money off him
Yes, and he's been making a TON of money as well.
> and we're all either laughing or astonished
So social media, got it. Social media LOVES controversial assholes, they just keep making money hand over fist (look at the Paul brothers for example).
We had to migrate to a new customer who didn't want the Python middle layer; they wanted our software (Java) to read the output directly (don't argue about why this was necessary, it was just a requirement from a big customer). Since the Python script had grown over time, with many developers making small changes for new features, it was nearly impossible to understand exactly what it was doing.
I took only the parser code, gave it to Opus 4.7, and said "This python code parses an undocumented file format. All of our tests pass and this code is the spec for the file format. Read the python code and create me a BNF document that describes what it parses." I took the BNF, started a new session, and said "write me a parser in Java that parses the grammar in this BNF." It got it in one shot. Then all I needed to do was take what was parsed and pipe it into our existing software, which already consumed the structured format the Python script created.
Most of what I do is brownfield work and I really can't put our whole code base into an LLM because of IP issues, but for little things like this, it saves me weeks of work.
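To make the "BNF in, parser out" step concrete, here is a minimal sketch of a hand-written recursive-descent parser for a tiny grammar. The grammar, the `Record` type and the function names are all invented for illustration; the real file format is undocumented and the original parser was written in Java, not Rust.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical grammar, made up for this sketch (whitespace between records is skipped):
//   file   ::= { record }
//   record ::= key "=" value ";"
//   key    ::= letter { letter }
//   value  ::= digit { digit }

#[derive(Debug, PartialEq)]
pub struct Record {
    pub key: String,
    pub value: u32,
}

/// Consumes characters while `pred` holds and returns them as a String.
fn take_while(chars: &mut Peekable<Chars<'_>>, pred: impl Fn(char) -> bool) -> String {
    let mut out = String::new();
    while let Some(&c) = chars.peek() {
        if !pred(c) {
            break;
        }
        out.push(c);
        chars.next();
    }
    out
}

/// Consumes one expected character, or reports what was found instead.
fn expect(chars: &mut Peekable<Chars<'_>>, want: char) -> Result<(), String> {
    match chars.next() {
        Some(c) if c == want => Ok(()),
        other => Err(format!("expected {:?}, found {:?}", want, other)),
    }
}

/// Parses the whole input; each grammar rule maps to a few lines below.
pub fn parse_file(input: &str) -> Result<Vec<Record>, String> {
    let mut chars = input.chars().peekable();
    let mut records = Vec::new();
    loop {
        take_while(&mut chars, char::is_whitespace);
        if chars.peek().is_none() {
            return Ok(records);
        }
        // record ::= key "=" value ";"
        let key = take_while(&mut chars, |c| c.is_ascii_alphabetic());
        if key.is_empty() {
            return Err("expected a key".to_string());
        }
        expect(&mut chars, '=')?;
        let digits = take_while(&mut chars, |c| c.is_ascii_digit());
        let value = digits
            .parse::<u32>()
            .map_err(|e| format!("bad value {:?}: {}", digits, e))?;
        expect(&mut chars, ';')?;
        records.push(Record { key, value });
    }
}
```

Calling `parse_file("a=1; bc=22;")` yields two records, while `parse_file("a=1")` returns an error because the closing `;` is missing. The point is only that a grammar this explicit leaves very little room for interpretation, which is plausibly why the one-shot worked.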
In other words: specs can be as detailed as it gets, and this is why developers have a hard time when, as a senior, they face an NDA'd, regulated environment. It isn't software craftsmanship but data flow, hardware components, compliance at the lowest level (often including supply chains), and information architecture: a simple app needs to comply with specs that amount to thousands of pages.
Context window: circular reference. A year ago? Specsmaxing by really weeding out any redundant words. Today? Yawn, like with 8mb RAM vs 512 Gigabytes.
AI wants to be easy on us so what is a spec anyway then?
To put it this way: the spec for the spec is constantly evolving.
Last year’s prompts lead to extremely different results today no matter how maxed out.
The author was on point with his introduction: AI is a junior in many ways when it comes to any sort of efficiency and optimization.
This is my re-evaluation after years of experimenting with AI: beautiful, sophisticated code, but its performance and architecture are laughable at best.
AI is not trained on optimization. Not the slightest and juniors have no clue about algorithms and Big O.
In fact, Google used Big O as a basic entry-level interview question for a very long time. They have to, but the simple fact that, in my experience, 99% of devs have never heard of it or considered it speaks volumes.
AI cannot compensate for that (yet).
I went the opposite and my specs focus heavily on architecture and the obvious dumb performance drains noobs do.
Google was mocked about Big O. And yes, failing to understand that Big O can be neglected thankfully in 99% of cases is part of its logic.
AI bloats your code. And a year-long single-dev project gets pumped out in hours. In short: a home run for Big O, because it looks at results that change depending on the variables. A function in mathematical terms.
So I think the author did a funny and great job if you focus on Big O where needed. Everything else is not that important because it stays open to change and extension.
Big numbers need great architecture.
It screams loudly. And also think about leaks. Before AI I had virtually no memory leaks at all. Since AI, Node.js and React leak worse than IE 6 and 8 did. I mean it.
Big O reduces them significantly, so don’t work around the elephant in the room.
Architecture and optimization are brutally hard. Google blew my mind in this regard, but that is another story, about squeezing even milliseconds out of a build tool used by all. A single dev laughs at it but has failed the calculation as well as the abstraction.
There's a fair amount of talk right now about the value being in the verification layer -- once there's a hard verification loop, the agents can do amazing things without getting (permanently) sidetracked. I think what you're working on is halfway there -- in essence, you're probably relying on the LLM's notion of what a spec is and what it should be to the codebase.
What's not currently solved, and what I think is very interesting is how much automation can be added to the creation of verification. We all would unlock a lot more speed and productivity for even moderate gains on that side.
fyi language alone can’t define/describe requirements which is why UML existed.
You could deterministically process any UML diagram into a prose equivalent.
And in fact you couldn't do the other way around (any prose -> UML) because UML is less powerful than natural language and actually can't express everything that natural language can.
Can it also fully describe a composition by Bach or a Rembrandt's painting? In some weird, overly complex way it probably 'could', but it would be very painful. That's why we pick other forms of expression. We use other forms of expression to compact and optimise information delivery. Another benefit is that we cut out the noise. So yes UML cannot describe everything natural language can, but then again why should it - it was designed as a specific framework for designing relations between objects. Not more and not less. Similar for sequence diagrams or other forms of communicating ideas efficiently.
Otherwise, I like the idea of machine-readable specs.
A full-blown event model facilitates all communication, human (management, devs, ops) and agentic. But maybe I’m missing something; maybe the dashboard can serve this function. I didn’t dig into it too much.
> You’re absolutely right! Let me fix that.
Dear Claude,
I hope this email finds you well.
I am writing to ask if you could please do another task for me.
Start by running `npx @acai.sh/cli skill`.
This will teach you everything you need to know about our process for spec-driven development. Then, proceed to plan and implement the features specified in our spec files.
Love,
[your-name]
Honestly, I can no longer tell parody from reality. Whether in politics or AI.
1. Don’t write in YAML. It’s really hard for humans. Write in markdown and use a standard means to convert to lists / YAML.
2. Think beyond writing your own specs: how does this expand to teams of tens or more? The ticketing system you have (Jira? Bugzilla?) is not designed for discussion of the acceptance criteria. I think we are heading into a world of waterfall again, where we have discussions around the acceptance criteria. This is not a bad thing; it used to be called product management, and they would write an upfront spec.
If, in this new world, a tech user and a business user lead the writing of a new spec (like a PEP), the AI then implements it, it’s put into a UAT harness for larger review, and a daily cycle begins, we might have something.
Good luck
I want to try this features.yaml approach - there is something solid there, and the acai.sh tooling looks nifty.
I wanted to star the project to track its progress, but it feels a bit weird. Which repo should I track? Server? CLI? It looks like a collection of miscellaneous repos.
Don't we just love the hard fact conclusions based on sample size N=1 and hand-waving arguments?
First it was choice of editor: people were micro optimizing every aspect of their typing experience, editor wars where people would literally slaughter over suggesting another camp.
Editor wars v2: IDEs arrived and second editor war began.
Revenge of the note taking apps: Obsidian/Roam/Joplin/Apple Notes/Logseq. Just one plugin, just one more knowledge graph, bro, and I’ll have peak productivity. 10x is almost here.
AI: you’re witnessing it now.
Do people NOT have anything else in life? How are y’all finding time to do all of this shit? Are you doing it on company time? Do you have hobbies, do you learn foreign languages, travel, have kids or spouses, drive a car, other thousand “normie” things outside of staring at the freaking monitor or thinking about this shit 24/7? Did I miss the invention of a Time Machine?
Also, a lot of folks don't write code anymore, and barely have the time to read the volume of code that AI produces. This may just be one of the most profound changes in an industry, and some folks are excited about it and want to get better at building with it.
I think the person who wrote this post made a good faith effort to share his learnings while promoting his tool.
How are any of those things even remotely as interesting as arguing with people about an Emacs config?
People are people.
This industry is just getting more and more bonkers.
also, i wonder if people who did MDD (model driven development) have embedded AI in their methodology
It's why famously, programmers always say, the code is the documentation, because writing detailed docs is very tedious and nobody wants to do it.
Behaviour Driven Development or Spec Driven Development are, loosely, forms of Test Driven Development where you encode the specification into the code base. No impedance, full insight, formality through code.
I think people get really dogmatic about “test” projects, but with a touch of effort a unit test harness can be split up into integration tests, acceptance tests, and specification compliance tests. Pull the data out as human readable reports and you have a living, verifiable, specification.
Particularly using something comparable to back-ticks in F#, which let test names be defined with spaces and punctuation (ie “fulfills requirement 5.A.1, timeouts fail gracefully on mobile”), you can create specific layers of compiled, versioned, and verifiable specification baked into the codebase and available for PMs and testers and clients and approval committees.
One thing though, I loved the "AUTH-1" numbering and the Yaml breaks that into an Auth section, with "1." subsection which I don't like nearly as much, the codification AUTH-1 is more referenceable/searchable.
The tooling expects that you refer to each requirement by full ID e.g.
# admin.AUTH.1
Disagree on the bit about it "never going to work" though.
Failure-prone stochastic ML systems produce testable, auditable code... just like failure-prone human brains can produce testable, auditable code. And in fact, in both cases, changes to our process can reduce the number of failures that slip past testing and audit. Or can reap other rewards. Finding a better process is what I'm interested in right now.
You're missing the bigger picture here. Yeah, they produce code. But "producing" code was never the bottleneck. Yes, you can pop out a webapp within a couple of hours, but now you have no clue how it works, even if it's a language and framework you are competent in, because you skipped the part where you understand how the parts fit together architecturally. So you wrote an elaborate spec, but the LLM "decides" to do something else. Maybe it doesn't make that PK autoincrement, or it throws in those nice empty "catch" blocks it ingested from various beginner tutorials, which will be very "helpful" when your application silently deviates from the happy-path execution that you spec'ed the hell out of in your virulent spec-driven workflow. So it "kinda" works, it generates the code. It works the way your kid's toy car works: it "drives", but it cannot be driven to work, can it? So it does not work in the big picture. It's not a reliable enterprise-ready system. It's a toy, and should be treated like one.
I don't remember how much he made selling the "quickly-killed startup bought by now failed internet giant".
But I'm fairly sure how much money he made building something other people used is peanuts compared to what he made investing early in companies where others built things, instead of him.
> My point is, the spec must live somewhere, even if you don’t write it down. The spec is what you want the software to be. It often exists only in your head or in conversations. You and your team and your business will always care what the spec says, and that’s never going to change. So you’re better off writing it down now! And I think that a plain old list of acceptance criteria is a good place to start. (That’s really all that `feature.yaml` is.)
Checking the compiled artefact into the codebase without checking in its source code has always been a risky move!
The tradition of having a deck of punch cards evolved into assembly, then Pascal, Fortran, C, BASIC. The important part is a human-auditable directive, not an opaque, generated artifact, as the thing that matters.
have evolved and adapted. Photography, film cameras, polaroids, camcorders, digital cameras, smartphones, social media, Zoom/virtual attendees. Same with birthdays. Handwritten cards, to phone calls to e-cards, Facebook wall posts, video calls, shared photo albums and Sora (RIP) videos.
Specs are the end goal, not how the software looks at a moment in time.
We iterate feature by feature through this process, and occasionally circle back on the original product manual to identify drift.
After the original documentation is drafted, I have the agent write up placeholder files and define all of the interfaces we expect to need (we will end up adding a lot later, but that’s ok). Every file should reflect a clear separation of concerns and can only be reached into through its defined interface; all else is private. I end up with more individual files than I would by hand, but by constraining scope at file granularity, and defining an inviolate interface per file, I avoid the LLM tendency to take shortcuts that create unmaintainable code.
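For what "one defined interface per file, all else private" can look like, here is a small Rust sketch. The module, trait and function names are hypothetical and not taken from the commenter's project; an inline `mod` stands in for a separate file.

```rust
// Hypothetical module standing in for one file with a single concern.
mod rate_limiter {
    /// The defined interface: the only behaviour other files may rely on.
    pub trait RateLimiter {
        /// Returns true if the caller may proceed.
        fn try_acquire(&mut self) -> bool;
    }

    /// The type can be named from outside, but its field is private,
    /// so nothing outside this module can reach into its state.
    pub struct FixedBudget {
        remaining: u32,
    }

    /// Constructor exposed as part of the interface.
    pub fn with_budget(budget: u32) -> FixedBudget {
        FixedBudget { remaining: budget }
    }

    impl RateLimiter for FixedBudget {
        fn try_acquire(&mut self) -> bool {
            if self.remaining == 0 {
                return false;
            }
            self.remaining -= 1;
            true
        }
    }
}

use rate_limiter::RateLimiter;

/// A caller in "another file": it sees only the trait, never the internals.
pub fn drain(limiter: &mut impl RateLimiter, attempts: u32) -> u32 {
    let mut granted = 0;
    for _ in 0..attempts {
        if limiter.try_acquire() {
            granted += 1;
        }
    }
    granted
}
```

With `let mut l = rate_limiter::with_budget(2);`, `drain(&mut l, 5)` grants two requests and then refuses the rest. An agent that tries to shortcut by touching `remaining` from outside the module gets a compile error, which is the kind of guard rail the workflow above seems to depend on.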
I also open each new context with an onboarding process that briefly describes the logos and the ethos of the project, why the agent should be deeply invested in the success of the project, as well as learnings.md which the agent writes as it comes across notable gotchas or strong preferences of mine.
Needless to say, I use the one-million-token context, and it’s a token fire… but the results are solid and my productivity is 5-10x.
“Specsmaxxing” is basically the right response to this. When you can't rely on authorial memory, you have to put the intent somewhere durable. Specs become the source of truth by default if we continue down the road of AI generated code.
1: https://ossature.dev/blog/ai-generated-code-has-no-author/
It allows Claude to look back into the session where a change was made and see the decisions made, tradeoffs discussed and other history not captured by code, tests.
Like, if waterfall of a project can be done in 2 weeks, is it agile now?
Yes, you don’t know the nuances of all specs upfront and revision will be necessary. Turning the ship with arbitrary degrees of freedom outside of bullet points on a roadmap is not an efficient way to resolve that for many projects.
The most common form of what you'd call a "spec" is the acceptance criteria on a work ticket, which is an accretive spec i.e. a description of desired change -- "given what already exists, change it as follows". I.e. if you somehow layered and summarized and condensed all tickets that have been made since product started, you'd have your "spec".
But it's the devs who were doing that condensing via understanding each desired spec addition vs reality of existing codebase.
So the gap between what people are currently calling "specs" and what the code was already doing is not big and will not stay big, but for the fact that you're effectively adding another (quasi) compile step underneath - and in this case it's a non-deterministic one.
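The "layer and condense all tickets" idea can be sketched as a fold over ticket deltas. This is only an illustration under the assumption that each ticket either sets or removes a requirement by ID; the `Change` type and `condense` function are made up, and real tickets are of course far messier.

```rust
use std::collections::BTreeMap;

/// One ticket's acceptance criteria, modelled as a change to the running spec.
pub enum Change {
    /// Adds a requirement, or replaces the text of an existing one.
    Set { id: &'static str, text: &'static str },
    /// Drops a requirement that no longer applies.
    Remove { id: &'static str },
}

/// Replays tickets in order; whatever survives is the current spec.
pub fn condense(tickets: &[Change]) -> BTreeMap<&'static str, &'static str> {
    let mut spec = BTreeMap::new();
    for change in tickets {
        match change {
            Change::Set { id, text } => {
                spec.insert(*id, *text);
            }
            Change::Remove { id } => {
                spec.remove(id);
            }
        }
    }
    spec
}
```

Replaying "set AUTH-1, set AUTH-2, set AUTH-1 again, remove AUTH-2" leaves a single AUTH-1 entry with the latest text. That replay is the condensing devs have been doing in their heads against the existing codebase.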
I've been doing "specmaxxing" for a few months now. Unlike the author I don't use Yaml, I use a mix of Markdown and Gherkin. If you haven't encountered Gherkin before, it's not new and you might know it under the name Cucumber or BDD.
Gherkin is basically a structured form of English that can be fed into a unit testing framework to match against methods.
The nice thing about writing acceptance criteria this way is that they become executable and analyzable. You write some Gherkin and then ask the model to make the tests execute and pass. Now in a good IDE (IntelliJ has good support) you can run the acceptance criteria to ensure they pass, navigate from any specific acceptance criteria to the code which tests it (and from there to the code that implements it), you can generate reports, integrate it into CI and so on.
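As a rough picture of how Gherkin steps get matched to code, here is a hand-rolled sketch. It is not the Cucumber API (real frameworks bind steps with annotations or macros and also handle `Feature:` and `Scenario:` headers); the shopping-cart steps, the `World` struct and the function names are all invented for illustration.

```rust
/// Test-side state that the steps act on (a made-up shopping cart).
#[derive(Default)]
pub struct World {
    pub items: u32,
}

/// Matches one step line against the known step definitions and runs it.
fn run_step(world: &mut World, line: &str) -> Result<(), String> {
    let line = line.trim();
    // Strip the Gherkin keyword; the remaining text selects the definition.
    let text = ["Given ", "When ", "Then ", "And "]
        .iter()
        .find_map(|kw| line.strip_prefix(*kw))
        .ok_or_else(|| format!("not a step: {line}"))?;
    let words: Vec<&str> = text.split_whitespace().collect();
    match words.as_slice() {
        ["an", "empty", "cart"] => {
            world.items = 0;
            Ok(())
        }
        ["I", "add", n, "items"] => {
            let count: u32 = n.parse().map_err(|_| format!("bad count: {n}"))?;
            world.items += count;
            Ok(())
        }
        ["the", "cart", "holds", n, "items"] => {
            let expected: u32 = n.parse().map_err(|_| format!("bad count: {n}"))?;
            if world.items == expected {
                Ok(())
            } else {
                Err(format!("expected {expected} items, cart holds {}", world.items))
            }
        }
        _ => Err(format!("no step definition for: {text}")),
    }
}

/// Runs every non-empty line of a scenario against a fresh World.
pub fn run_scenario(scenario: &str) -> Result<(), String> {
    let mut world = World::default();
    for line in scenario.lines().filter(|l| !l.trim().is_empty()) {
        run_step(&mut world, line)?;
    }
    Ok(())
}
```

A scenario such as "Given an empty cart / When I add 2 items / And I add 3 items / Then the cart holds 5 items" passes, and an unknown step fails with "no step definition for: ...". That failure is the spot where you would ask the model to write the missing mapping code.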
And when writing out acceptance tests that are quite similar, the IDE will help you with features like auto-complete. But if you need something that isn't implemented in the test-side code yet, no big deal. Just write it anyway and the model will write the mapping code.
There's a variant of Gherkin specifically designed for writing UI tests for web apps that also looks quite interesting. And because it's an old ecosystem there's lots of tooling around it.
Another thing I've found works well is asking the models to review every spec simultaneously and find contradictions. I've built myself a tool that does this and highlights the problems as errors in IntelliJ, like compiler errors. So I can click a button in the toolbar and then navigate between paragraphs that contradict each other. It's like a word processor but for writing specs.
Once you're doing spec driven development, you don't need to write prompts anymore. Every prompt can just be "Update the code and tests to match the changes to the specs."
Did I miss something or is everyone back in 1970s, working in waterfall processes now?
You don't plan to follow the plan. You plan in order to understand the whole problem space. Obviously no plan survives contact with reality.
We do agile
Guess what? Every single one of them was doing waterfall.
Their agile included preplanning and pre-specifying the full spec and each task, before the project kicked off. We'd have meetings where we'd drill down into tasks, folks would write them down so detailed that there would be no other way than doing that. Agile would be claimed, but the start date, end date, end spec and number of developers was always concrete.
Sometimes, the end date was too late, so a panic would ensue. Most of the time, the date was too late because developers had "unknowns" which then had to be "drilled down and specced so they wouldn't be unknowns". Sometimes, nearly 50% of the workweek was spent on meetings.
A few times, a project was running late - so to make sure we were _really_ doing it agile, we'd have morning standups, evening standups, weekly plannings, retrospectives, and backlog refinement. It would waste time, and the "unknowns" aka "tickets to refine" were again, as always, dependent upon the PM/PO/CEO's wishes, which wouldn't get crystallized until it was _really last minute_.
One customer wanted us to do a 2 year agile plan on building their product. We had gigantic calls with 20+ people in them, out of which at least half had some kind of "Agile SCRUM Level 3 Black belt Jirajitsu" certificates.
To them, Agile was just a thing you say before you plan things. Agile was just an excuse to deal with a project being late by pinning it on Agile. Agile was just a cop-out for "PM didn't know what to do here so he didn't write anything down". Agile was a "we are modern and cool" sticker for a company.
And unfortunately, to most of them, agile was just a thing you say for the job, as their minds worked in waterfall mode, their obligations worked in waterfall mode, companies worked in waterfall mode, and if they failed their obligation to the waterfall, their job would go down one.
So while we were doing the Agile ceremonies, prancing around with our Scrum master hats, using the right words to fit into the Agile™ worldview - we were doing waterfall all along.
And after 15 years, I'm not even sure - did agile really ever exist?
The waterfall process I experienced went like this:
- Product folks created requirements.
- Architects produced detailed specs.
- Project managers created tickets based on specs.
- Lengthy estimation ensued.
- Then, finally, developers proceeded with implementation.
- QA tested it.
Each step above involved a lengthy review with something like 5-10 people. If the devs found an issue with the spec or, god forbid, the requirement, it triggered a massive cascade of work for everyone above. Things needed to be reviewed again, customers might need to be contacted, etc.
I think we can learn from that and optimize for change. Specs as living documents close to the code should be less cumbersome. But, just like anything else large corporations will probably fumble this like they did with “agile” (SAFe I am looking at you).
This is a long way to say specs aren’t bad. Specs that are difficult to change are though.
When rewriting the entire codebase is very quick and cheap, why bother iterating on small components?
distributed teams do well when proposals, decision, etc, are written down, and can be easily found and referenced
it doesn't mean docs are frozen in time and can't be patched like code
Unlike you, I wish for the LLM to do as much of the work as possible -- but "as possible" is doing a lot of work in that sentence. I'm still trying to get clear on exactly where I am needed and where Opus and iterations will get there eventually.
It has really challenged me to get clearer on what a requirement is vs a constraint (e.g., "you don't get to reinvent the database schema, we're building part of a larger system"). And I still battle with when and how to specify UI behaviours: so much UI is implicit, and it seems quite daunting to have to specify so much to get it working. I have new respect for whoever wrote the undoubtedly bajillion tests for Flutter and other UI toolkits.
1. Specifications that live outside the code. We have a lot of code for which "what should this do?" is a subjective answer, because "what was this written to do?" is either oral legend or lost in time. As future Claude sessions add new features, this is how Claude can remember what was intentional in the existing code and what were accidents of implementation. And they're useful for documenters, support, etc.
2. Specifications that stay up to date as code is written. No spec survives first contact with the enemy (implementation in the real world). "Huh, there are TWO statuses for Missing orders, but we wrote this assuming just one. How do we display them? Which are we setting or is it configurable?" etc. Implementer finds things the specifier got wrong about reality, things the specifier missed that need to be specified/decided, and testing finds what they both missed.
I have a colleague working on saving architecture decisions, and his description of it feels like a higher-abstraction version of my saving and maintaining requirements.
When it’s time to review, review both code and conversation. 200 “user written messages asking why and what”? Likely a good PR. 15 “yes, yeah, ok, whatever”? Well you might want to give that PR some love.
It feels to me that when we commit, we throw away half, if not most, of the work done by not recording it.
I am also stealing the idea of talking to LLMs as if it's an email. So funny, we need to be joymaxxing a bit more I think :)
You probably don't want people associating your work with abusing crystal meth and hitting yourself in the face with a hammer.
For anyone missing the reference, SNL has a pretty good explainer:
Second is that I'm doing a lot less "seat of my pants prompting" and doing more engineering and ideating, which was a big goal of mine. So I'm feeling less psychotic there too.
And sort of tangentially to that, I think a significant subset of devs actually are willing to just prompt their way to nirvana, day in and day out. I'm not. I think the spec will carry a lot of weight for a long time. Maybe they will get further than I give them credit for? Maybe the whole digital world becomes a single chat box?
Did cucumber go away? I never hear about it anymore.
> Nothing beats an organic, pasture-raised, hand-written spec.
Hah, I strongly empathize with the wording. I’ve been starting my design docs for fellow humans with “100% hand-written, organic content”, I might steal a part of yours.
Overall, cool idea. I don’t see myself using your SaaS, but the approach of tagging the requirements and constraints to make them easier to find sounds good.
One project you didn’t mention, which I think is also a cool perspective on this, is codespeak.dev, but I haven’t given it a go yet.
All in all, I feel like maintaining specs, and having agents translate spec diffs into code diffs is a promising area for the future. Good thing I enjoy writing!
If you are in this camp, consider educating yourself a bit on the V-model [1] and notice that this is not only used in the waterfall model, but that it is a way to decompose problems and verify that everything works properly.
This may not be required for a small hobby project, but if you start working at something with multiple companies in various technologies, it soon becomes extremely useful.
Also, I mainly pursue these tools so that I can have AI accelerate this process and broker an agreement after negotiating specs with the agent.
The one thing I like that OP brings is to tie specs and code together. The openspec flow does help a lot in keeping code synced with specs, but when a spec changes, AI needs to find the relevant code to change it. It's pretty easy to miss something in large codebase (especially when there is lots of legacy stuff).
Being able to search for numbered spec tags to find relevant bits of code makes it much more likely to find what needs to be changed (and probably with less token use too).
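A sketch of that tag lookup, assuming tags look like the `admin.AUTH.1` style mentioned elsewhere in the thread. The function names are hypothetical, and files are passed in as in-memory `(path, contents)` pairs so the example stays self-contained; a real tool would walk the repository instead.

```rust
/// True if `line` contains `tag` and the match is not immediately followed by
/// another letter or digit (so `admin.AUTH.1` does not hit `admin.AUTH.10`).
fn mentions(line: &str, tag: &str) -> bool {
    line.match_indices(tag).any(|(start, _)| {
        let next = line[start + tag.len()..].chars().next();
        !matches!(next, Some(c) if c.is_ascii_alphanumeric())
    })
}

/// Returns (path, 1-based line number) for every line that mentions `tag`.
pub fn find_tag<'a>(files: &[(&'a str, &str)], tag: &str) -> Vec<(&'a str, usize)> {
    let mut hits = Vec::new();
    for (path, contents) in files {
        for (idx, line) in contents.lines().enumerate() {
            if mentions(line, tag) {
                hits.push((*path, idx + 1));
            }
        }
    }
    hits
}
```

Given one file with `// admin.AUTH.1` on line 1 and `// admin.AUTH.10` on line 3, searching for `admin.AUTH.1` reports only line 1. The boundary check only looks at the character after the match, which is a simplification: a nested ID such as `admin.AUTH.1.2` would still count as a hit.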
I've stopped doing it entirely and just archive directly after implementation.
When you do the sync process, it just keeps drifting and drifting until you have duplication and contradictions across specs.
I agree that tying the specs and code together helps for that but it still seems like extra overhead, even if the value is better justified here.
That's how you end up with those cooking recipes that only "take 5 minutes". Sure, if you don't count buying all the ingredients, cleaning and preparing them, cleaning up the pots and pans (and probably the worktop, stove, etc), a lot of things can take 5 minutes. Even trivial stuff like scrambled eggs don't actually take 5 minutes when you take everything into account.
Reminds me of when I automated a manual service deployment that only "took 5 minutes". Sure, copying the binaries only took 5 minutes, but coordinating between various departments to deactivate the relevant monitoring bits, turn off the services, invalidate the caches, etc., actually took half a day with humans involved. Once automated and parallelized, the thing took about 10 minutes for a data center.
https://haskellforall.com/2026/03/a-sufficiently-detailed-sp...
This seems like the answer to that thought!
An executable spec like gherkin or hitchstory is config - it has no loops or conditionals. There are a number of rarely recognized benefits to this.
If you're genuinely confused, and haven't tried Opus for coding, then it's not surprising you're confused!
It is also okay for you to just not like the idea of LLMs for coding (but say that!).
I’ve used a couple of different skills libraries for this - most recently “super powers” which builds a detailed markdown plan and then uses TDD for most parts.
Not sure converting to YAML and running an app to track beats Linear tickets or a local cache or markdown if I’m honest but if it works for you and your process that’s great!
Mainly I don’t think everyone building their bespoke solution needs to try and create a product out of it. If it works for you, maybe good enough. Focus on your process before you worry about generalizing these days.
I have seen the same idea with processes, pipelines, lists, bullet points, jsons, yamls, trees, prioritization queues all for LLM context and instruction alignment. It's like the authors take the structure they are familiar with, and go 100% in on it until it provides value for them and then they think it's the best thing since sliced bread.
I would like, for once, to see some kind of exploration/ablation against other methods. Or even better, a tool that uses your data to figure out your personal bias and structure preference for writing specs, so that you can have a way of providing yourself value.
"Don't write prompts like that, do it like this! I swear it's better. Claude says so!"
Seriously, it's just not
Write your code like it's your spec and your software will be more stable, more maintainable, and clearer to read.
Code is not transient, it is your friggin spec itself
And if your code isn't structured like it's a spec, then your code is garbage from the perspective of LLM driven development
If the specification is written in such a strict format as YAML, I would expect it to be executable, something like this https://blog.fooqux.com/blog/executable-specification/
But as far as I understood, for acai that is not the case.
An LLM/agent can take 100 tokens worth of spec and turn it into 1000 tokens worth of code. We all understand this well enough in the most basic case: you type a one-line comment and the LLM will give you the correct 10-line function. But this scales: a 10-line comment into a 100-line function, a 100-line spec into a 1000-line object, etc. Depending on the work, you could get more or less than 10x compression. The disagreements are all largely about "at what level of resolution should devs be operating?"; my opinion is that the current sweet spot is roughly 1000 tokens of spec (~750-word doc/3 pages) producing 10000 tokens (~1000 LOC).
This compression allows your spec to be much denser and more readable than raw code. It can be shared, understood and modified by non-dev partners more easily, gives you flexibility into how it's implemented etc.
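The back-of-the-envelope numbers above can be written out as a tiny calculation. The conversion factors are simply read off the comment's own figures (1000 tokens for about 750 words, 10000 tokens for about 1000 lines); they are rough assumptions, not measured constants, and the names are made up.

```rust
/// Rough factors implied by the figures in the comment, not measured values.
const WORDS_PER_TOKEN: f64 = 0.75; // 1000 tokens is about 750 words
const TOKENS_PER_LINE: f64 = 10.0; // 10000 tokens is about 1000 lines of code

pub struct Estimate {
    pub spec_words: f64,
    pub code_tokens: f64,
    pub code_lines: f64,
}

/// Estimates output size for a spec of `spec_tokens` at a given expansion ratio.
pub fn estimate(spec_tokens: f64, expansion: f64) -> Estimate {
    let code_tokens = spec_tokens * expansion;
    Estimate {
        spec_words: spec_tokens * WORDS_PER_TOKEN,
        code_tokens,
        code_lines: code_tokens / TOKENS_PER_LINE,
    }
}
```

`estimate(1000.0, 10.0)` gives 750 words of spec, 10000 tokens of code and 1000 lines, matching the sweet spot described. Changing `expansion` shows how quickly the amount of code to review grows with a denser spec.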
This industry has become a parody of itself, and people are celebrating.
That would be easier to use than gazillions of .md files and skills.
Harder for brownfield projects of course (and it basically replaces Jira, which would cost me all my social capital if I tried), so I won’t be adopting it now. But I will be leaning into long-lived organic, grass-fed assets like conceptual-model.md that are critical for evolving the product forward but don’t strictly map to the types in the code.
I really enjoyed engaging with the article. Loved the humor sprinkled in, and the humility demonstrated throughout. Some great first-principles clear thinking here.
For anything large, the spec becomes increasingly more complicated. Look at software schedules in the old waterfall days of the 80s/90s: the spec / planning period was maybe 30-70% of the project.
Unless you’re working on pretty routine stuff, the real problem is that the customer (which might be you) almost never knows what they want. The spec will change the minute a customer gets something to play with.
This was the real value of agile in my mind: letting a customer change their mind as early as possible.
Very few devs are actually reviewing any generated code.
> Why not fast forward to that point and save 80% of the time
If you are saving 80% of time, you aren't actually reviewing the code.
is it possible that you might be in a job that’s not right for you?
it sounds like you want to cut out 80% of your job. if you want to cut out 80% of your job, maybe you’re doing the wrong job? y’know?
like, i read this comment and my mind goes to project/product manager who has real experience of coding. going from a spec (tickets / design docs / customer feedback notes / epics / stories / whatever) to a working implementation (team of engineers build it and you don’t have to use brain power). it sounds like you’re describing turning your job into the job of a PM.
we need more PMs like that in the industry. good PMs are few and far between, good PMs who know what it's like to code, even fewer. so maybe have a think about why you’re working this way? i don’t really care if you do or don’t, but future you might be glad if you took some time to think about it.
Code lacks the surrounding theory that situates the code in the world [1]. My specs contain the theory that the code lacks, which makes specs more valuable in the future. Specs are proprietary data. Data holds value in a post-AGI world, not code.
I am defining specs to be more than just an architectural spec, to me it's more like I'm writing a booklet about a subject, and I'm using it to teach the LLM via in-context learning. It might need a different word than "specs".
So isn’t that gambling, not engineering?
There is a massive difference between a spec, which defines what the product should do, and code, which defines exactly how it should do it. Moving from the former to the latter is not "the easy part". Anyone who genuinely believes that either works on easy and straightforward problems, or is some sort of programming god. Because translating specs to code can still be difficult and exhausting.
> There is a massive difference between a spec, which defines what the product should do, and code, which defines exactly how it should do it.
He states: The difficult part is figuring out the details so LLM doesn't save much time. You state: If LLM is able to correctly assume the details that saves you a lot of time.
Case 1: Part of the spec describes some basic feature based on a popular framework and industry standards, everything is trivial. You are right, he is wrong.
Case 2: Part of the spec describes some niche feature and/or uses some unpopular framework and/or requires deviation from industry standards and/or has cutting edge performance/latency requirements and/or uses a bunch of proprietary non-googlable data. You are wrong, he is right.
The more senior engineers are, the less time they spend on case 1. Those are easy and don't take much time; it is case 2 that is much more time consuming.
To me saying "the code is the spec" is like saying "the business wants it this way because that's how the code is written". Which is obviously backwards.
Does the business mandate we use a cache for this hot path? No, but the business set performance targets, and the cache was a sensible way to satisfy them. See the difference?
I believe that the 'musts' and 'must nots' deserve special attention, and need to be recorded well before I decide on the 'how'. Every team does this differently. I find that writing itemized, functional acceptance criteria is a practical way to marry the two domains. I also think the process matters a lot more now, because the temptation to let an agent ship it is increasing and the tedium of maintaining these specs is decreasing.
This seems confused. Specs are free to include as much or as little detail as they deem necessary. If a spec only wants to suggest vague performance goals and handwave the details, that's permitted. But if specs want to specify the exact means by which performance will be guaranteed, that's also permitted. And this isn't an anti-pattern, this is often very useful. For example, plenty of APIs in the real world specify algorithmic upper bounds for time and space consumption, which is useful in that they allow downstream consumers to have a greater understanding of what sort of performance their own systems will exhibit despite the API itself being a black box in other respects.
So the answer to the original question definitely isn't "no", it's "maybe, depending on the sort of guarantees we want to provide to our users".
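As an illustration of a spec that pins down an algorithmic upper bound while leaving the rest of the implementation open, here is a minimal sketch. The names `search_sorted` and `max_probes` are hypothetical; the only "specified" part is the probe limit, which for binary search over n items is floor(log2(n)) + 1.

```python
def max_probes(n: int) -> int:
    """Upper bound a spec could promise: floor(log2(n)) + 1 probes, 0 for n == 0."""
    return n.bit_length()


def search_sorted(items, target, counter=None):
    """Return the index of target in the sorted list items, or -1 if absent.

    If counter is a one-element list, counter[0] is incremented per probe so
    that a test can check the promised bound without knowing the internals.
    """
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        if counter is not None:
            counter[0] += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


if __name__ == "__main__":
    probes = [0]
    data = list(range(1000))
    print(search_sorted(data, 777, probes), probes[0], max_probes(len(data)))
```

A downstream consumer can rely on `max_probes` without reading `search_sorted`, which is the sense in which the bound belongs to the spec rather than to the implementation.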
You can't enforce a "do not do this" to an LLM. Just putting it in the context by saying "don't do this" makes it more likely that it will eventually do that.
You are confusing code with application code. The latter thing you describe is a test, which is expressible in code.
Not only is it backwards, it is a belief that is completely wrong and detached from reality. More often than not, implementations contrast with business requirements both in terms of bugs and gotchas.
Also, it's laughable how code is depicted as the realization of any spec when the whole software development sector is organized around processes that amount to improvising solutions in short iterations.
When you want to build a bridge you finalize all the blueprints and then someone goes and actually pours concrete, in software the blueprint is the code, and the code is also the bridge.
However there are different levels of abstraction for writing specs and code is just the most explicit form. With LLMs more of our time can be spent in those higher levels of abstraction and free us from work that is often repetitive and mundane.
I think the (distant) future of software engineering is not code writing but mostly requirements writing, and so it makes sense to build frameworks, “IDEs”, etc. around this new form of “programming”.
I don’t know if ACAI is the right one but the direction is interesting.
Construction has plans "as designed" and "as built".
What is different now is that maintainability and debugging design decisions were made w.r.t. human coders or teams in the past which is not necessarily the case anymore. Should we just specify the API and let agents figure the rest or do we still want to control the rest to ensure maintenance and security? A year ago I definitely thought so. Now it is more murky as the agents are faster browsers of codebases and can explore runtime effects faster than I can type and parse output. Strongest empirical observations depend on the runtime behavior so they have an edge there.
Having functional behavior and implementation details separate can be really useful, even though it is typically a pain to keep them in sync.
I'm not saying that you can just take any code out there and call it your spec.
I even pointed out in my last paragraph that code not written as a spec is garbage for LLM driven development - precisely because you end up with unintended implementation details becoming your spec.
There are a lot of ways to address this, but ultimately it is down to how you structure your code, where you place your comments, and what you write into your agents.md in the modules and what kind of QA agents you configure to go through your code before reviewing the code changes/spec adjustments yourself
A more heavy handed approach could also be domain driven design, where your domain/core package becomes the spec, and the unspecified parts get extracted to less specified modules with less explicit structuring.
The core of the misunderstanding is between new builds and making changes to existing builds (where most software dev work actually happens). Yes, you'll get a great headstart with a detailed spec for a new build. The issue is in the hundreds of changes that'll follow that.
Do people think that the desire to make shortcuts and do minimum effort changes is going to stop just because you've got a bit-more-natural-language-looking spec? And then with an AI underneath making probabilistic changes to code that's now basically a compile target - they really think the dev pace isn't going to collapse, but just faster and with a big ongoing inference bill?
LLMs do not form mental models. You are not going to get better results from an LLM vibe coding against spec diffs than from a dev prompting it from a position of understanding the codebase and the requested change.
From your comments it sounds like you are oblivious to the whole problem domain.
The whole point of these tools and frameworks is to provide a high level description of not only what features are already implemented in a project but also and more importantly what features you want to implement.
To put it in simpler terms, the point is to write down high level specs in a way that coding assistants can parse them and implement them.
I recommend you research spec-driven development, a whole set of frameworks designed to put together specs in a way coding agents can roll out plans to implement features in one-shot or few-shot prompts.
You know, what specs are used for.
I have never seen a non-trivial product that didn't have the spec modified during development due to finding rare business edge cases when actual implementation happened.
It's a fantasy to think that all you need to do is give an agent a spec; every team I have been on over the last 30 years has needed to refine the spec while developing.
That's the whole reason for Agile in the first place.
That approach just doesn't scale unless you don't care about your product whatsoever, hence my comment
I've successfully been working on various personal projects with highly stable functionality since I stopped that approach and instead focused more into leveraging my codebase to become the spec.
Very low maintenance and great to add features after the initial hurdle of structuring your codebase that it's less about implementation detail and more about spec. So a lot less "smart" code and less highly advanced abstraction, and a lot more declaratively structured - boring and potentially repetitive, but easier to place good AGENTS.md and doctexts on areas which have requirements which may be unclear just from looking at the code for example. Then running multiple "QA" agents over the diff before you look at it yourself for the final review
So yeah, sorry but your imagination is running wild
Compilable source files served as example implementations, complementary to the specs themselves. The idea that code is spec is just agile brainrot, like the idea that clean code means you never need to write documents or comments.
I use detailed specs to implement but I don't maintain those specs as the source of truth afterwards, the code is indeed the source of truth.
I've built a library (and products on top) that takes in requirements (programmatic or various spec formats) and forces an externally orchestrated implement -> review -> fix loop that doesn't stop until all requirements are met.
So I'll write a detailed spec then I'll have GPT 5.5 implementing and a mix of opus 4.7 / GPT 5.5 / DeepSeek v4 pro reviewing at every phase until it produces the quality I want.
I can let it run overnight or just during the day while I'm doing stuff that doesn't burn me out and that I actually enjoy.
So tldr spec first for me but not as the source of truth afterwards.
I'll be open sourcing and launching soon https://engine.build
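The implement -> review -> fix loop described in this comment can be sketched generically. This is not the commenter's library: every name below (`run_until_met`, the three callables, `max_rounds`) is hypothetical, and the round limit is an added safety valve that the comment does not mention.

```python
from typing import Callable, List


def run_until_met(
    requirements: List[str],
    implement: Callable[[List[str]], str],
    review: Callable[[str, List[str]], List[str]],
    fix: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> str:
    """Drive implement -> review -> fix until review reports nothing unmet."""
    artifact = implement(requirements)
    unmet = review(artifact, requirements)
    rounds = 0
    while unmet:
        if rounds == max_rounds:
            raise RuntimeError(f"still unmet after {rounds} fix rounds: {unmet}")
        artifact = fix(artifact, unmet)
        unmet = review(artifact, requirements)
        rounds += 1
    return artifact


def demo() -> str:
    """Toy stand-ins for the agents: the 'artifact' is just a list of IDs."""
    reqs = ["AUTH.1", "AUTH.2", "AUTH.3"]

    def implement(rs):
        return rs[0]  # first pass only covers one requirement

    def review(artifact, rs):
        return [r for r in rs if r not in artifact.split()]

    def fix(artifact, unmet):
        return artifact + " " + unmet[0]  # address one finding per round

    return run_until_met(reqs, implement, review, fix)


if __name__ == "__main__":
    print(demo())
```

Keeping the orchestration outside the model calls means the stop condition is decided by the reviewer's findings, not by the implementing agent declaring itself done.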
> "we never said this would work for everyone, it depends on the effort you put into it!"
it's perfect for extending the lifespan of a scam on someone because yes, obviously it's true that in general you can improve results by working harder, being more disciplined, whatever...
same here, functionally. of course there are better and worse ways of writing a spec, a prompt, etc. but i honestly think a lot of the focus on this is a way to divert attention from the overall ceiling of these tools in general.
in other words yes, it's not all bullshit, but there's a huge aspect of this "prompt/spec engineering" that in my view is a way to unconsciously buttress the "LLMs are going to 10x our GDP" mindset -- if those high expectations aren't panning out, there's always:
> "we never said this would work for everyone, it depends on the effort you put into the spec!"
I very explicitly stated that if your codebase isn't written like a spec you cannot effectively use it for LLM driven development.
You can still edit the code via LLM, it's just always going to be a highly involved process with very careful reviews unless you're happy with regressions.
1. Start with unmaintainable prompt.
Add auth to the admin route. It must only be accessible to team members. It must not be accessible to team members with the `quarantine` role. Unauthorized users should be redirected to the /login page, with query param `?error=unauthorized`. Requires fresh session, no older than 10 minutes.
2. Put it in a structured .md spec of some kind
    # Feat: Admin Route
    ## Auth requirements
    * [ ] Accessible to team members.
    * [ ] Not be accessible to members with `quarantine` role.
    * [ ] Redirects unauthorized users to /login
    * [ ] Redirect path includes `?error=unauthorized`
    * [ ] Requires fresh session, no older than 10 minutes.
But how do you cross reference the requirements? Give them stable IDs?
    # Feat: Admin Route
    ## Requirements
    AUTH.1: Accessible to team members.
    AUTH.2: Not be accessible to members with `quarantine` role.
    AUTH.3: Redirects unauthorized users to /login.
    AUTH.3.1: Redirect path includes `?error=unauthorized`.
    AUTH.4: Requires fresh session, no older than 10 minutes.
Might as well clean it up and make it machine readable at that point?
    feature:
      name: admin-route
    requirements:
      AUTH:
        1: Accessible to team members.
        2: Not be accessible to members with `quarantine` role.
        3: Redirects unauthorized users to /login.
        3-1: Redirect path includes `?error=unauthorized`.
        4: Requires fresh session, no older than 10 minutes.
For me this looks pretty machine readable.
## Requirements
AUTH.1: Accessible to team members.
AUTH.2: Not be accessible to members with `quarantine` role.
AUTH.3: Redirects unauthorized users to /login.
AUTH.3.1: Redirect path includes `?error=unauthorized`.
AUTH.4: Requires fresh session, no older than 10 minutes.
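To make the "machine readable" point concrete, here is a minimal sketch that extracts the `AUTH.n` lines from a Markdown spec and reports which IDs are not mentioned in a test file. The ID pattern and the helper names `parse_requirements` and `uncovered` are assumptions for illustration, not part of any tool mentioned in the thread.

```python
import re

# Assumed convention: an uppercase prefix, then one or more ".<number>" parts.
REQ_ID = r"[A-Z]+(?:\.\d+)+"
REQ_LINE = re.compile(rf"^(?P<id>{REQ_ID}):\s*(?P<text>.+)$")

SPEC = """\
## Requirements
AUTH.1: Accessible to team members.
AUTH.2: Not be accessible to members with `quarantine` role.
AUTH.3: Redirects unauthorized users to /login.
AUTH.3.1: Redirect path includes `?error=unauthorized`.
AUTH.4: Requires fresh session, no older than 10 minutes.
"""


def parse_requirements(markdown: str) -> dict:
    """Map requirement ID -> requirement text for every 'ID: text' line."""
    reqs = {}
    for line in markdown.splitlines():
        match = REQ_LINE.match(line.strip())
        if match:
            reqs[match.group("id")] = match.group("text")
    return reqs


def uncovered(reqs: dict, test_source: str) -> list:
    """Return requirement IDs that never appear in the given test source."""
    referenced = set(re.findall(REQ_ID, test_source))
    return sorted(set(reqs) - referenced)


if __name__ == "__main__":
    reqs = parse_requirements(SPEC)
    print(uncovered(reqs, "# covers AUTH.1 and AUTH.3.1"))
```
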
Yes, yaml is more "parseable". But I still need to use some format conventions inside yaml. So I could just use them inside markdown as well.
Specs are traditionally more forward-looking only because, by removing a lot of the implementation details that are required to write code, the specification can be written to be much broader in scope than code in an equivalent time period. But periodically we invent software that lets us automatically fill in more details of the software that now don't need to be specified by humans, and a level of specification that was previously ‘spec’ turns into ‘code’.
Your arguments create a false dichotomy. You look at it from a consumer perspective, while coding and its artifacts are usually done by suppliers. If you change camcorder to tv advertisement, the requirements shift. The human auditable directive and the outcome matter. Coca Cola probably has very high standards for their IP (the directive) and doesn't care about the outcome (AI slop ads). The result is disgruntled consumers.
If you don't care about the "opaque" generated artifact, then you are Coca Cola.
We are nowhere near this scenario tbh. Token cost is very high and is currently heavily subsidized by VC money to gain market share. Also this realistically only applies to small projects, small codebases and mostly greenfield ones. No way you can rewrite the whole codebase quickly and cheaply in any mid-sized+ projects
But even assuming token cost plummets, any non-trivial piece of software that is valuable enough to generate income for the company is also big, complex, interconnected enough that cannot be rewritten quickly even by AI, also for business reasons too. If a piece of code works, is stable and is tested, then rewriting it will always bring a high degree of risk and uncertainty that in a lot of business critical applications is just not worth it. A stable system can stay untouched for years besides minor dependencies updates.
Easy to forget waterfall in 1970s / 80s really meant teams working on their own for months and then realizing there is no way to assemble the whole product from the parts. Or that the industry has moved on and the product is obsolete.
Agile as "devs can do what they want" never really existed ;-) Managers always have to plan / T-Shirt size resources (time, devs) to some degree. For stuff that's really hard to break into tasks, the magic word is "the plan is to do a POC first".
Coming from someone who also doesn't like teams being asked to break their unknowns into 30 known tasks. It's a compromise... I agree with all your points on how Agile is abused / misunderstood. Yet I believe in the progress from continuous integration and regular demos to stakeholders as a sign we did change something....
> Agile as "devs can do what they want" never really existed ;-)
No real agile ever really exists in the end :)
But it's not devs not doing "what they want" that bothers me - it's the absurdly over-planned project estimates and timelines, with every detail of the project being specced out, not a lot of margin room for errors, invoking the name of "agile principles" as a way to deal with exactly things the PM's don't want to deal with in that moment.
I'd be fine with some degree of planning ahead, or starting with prototypes/PoC's, but such a huge part of the industry just chunks it into "same boat but we'll put agile stickers on the holes", and there is a whole industry of ceremonies around it, that it breaks the "core principles" of agile.
What a beautiful irony we have built :)
So, I "doodle" some text / ideas / planning with a calligraphy pen, and type in some code, occasionally, both mainly for the fun aspects. There are side benefits to both, too. Writing some plans slowly and "beautifully" drags them out and I get to think longer on them, so the sporadic "nice looking plans" are often more well thought. And doing the coding all by myself stops my brain from losing the ability. I was initially in the 100% AI-writes-all-code camp for a while and noticed I am getting notably slow in some personal coding skills. It is too early to treat specs as the new code and old languages as assembly (but I admit we might get there some day).
In other words, I think AI doing 90-99% of the coding, depending on the language verbosity and AI accuracy for the code at hand, is quite reasonable.
Professionally, I'm employed writing safety-critical avionics software. Superfluous amounts of cogent tooling putting guardrails on agents has enabled me to spend heaps more time to think deeply about how the software should work at a systemic level. The code by definition must be heavily criticized and battle-tested before it can go out the door to begin with. Albeit a beautiful part of coding, those sporadic bursts of creativity drive the code leaving my desk less and less, and I feel strongly that has made its quality paradoxically better since I'd spent much more time on broader implications and interactions.
My aim was to voice disagreement with the "code is spec" crowd, whom I think are using a different (and in my opinion tautological / useless / counterproductive) definition of spec. Probably because they are mad that I use trigger words like Vibe and Maxxing, and they assume I can't even read the code I'm shipping. I digress.
In your "time complexity is a downstream requirement" example, which is a great one, I think you would prefer to have well-maintained written documentation of that criterion that lives outside of the procedural code itself, would you not? How much attention that doc gets is a matter of process and preference, but I'm advocating it should get more (spec-first).
Structuring your codebase like it's your spec doesn't mean you're omitting the NFRs - they need to become part of your codebase too where applicable.
eg with your first example of response times - in javaland you can use annotations on the REST resources and RPC endpoints
or your second example with specific data that may not be cached via a combination of AGENTS.md in the directory of the module providing access to it, potentially doctexts on the data constructs in question and QA agents which run pre review and raise (and resolve) the inevitable violations. Which I may add will _also happen, and so much more frequently_ when you're doing "spec driven development" via markdown files.
The difference is just down to code being testable, and markdown/yml being purely "trust me, it's like this fr fs man"
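The comment refers to Java annotations on REST resources; as a rough analogue in Python, a decorator can carry the same kind of non-functional requirement next to the handler, where a test can check it. All names here (`endpoint`, `max_response_ms`, `HANDLERS`, `missing_budgets`) are hypothetical and only sketch the idea.

```python
from typing import Callable, Dict, Optional

# Registry of declared endpoints, filled in by the decorator below.
HANDLERS: Dict[str, Callable] = {}


def endpoint(path: str, max_response_ms: Optional[int] = None):
    """Register a handler and attach its response-time budget as metadata."""
    def decorate(func: Callable) -> Callable:
        func.max_response_ms = max_response_ms
        HANDLERS[path] = func
        return func
    return decorate


@endpoint("/orders", max_response_ms=200)
def list_orders():
    return []


@endpoint("/health")  # no budget declared; the check below flags this
def health():
    return "ok"


def missing_budgets() -> list:
    """Paths whose handlers do not declare a response-time budget."""
    return sorted(p for p, h in HANDLERS.items() if h.max_response_ms is None)


if __name__ == "__main__":
    print(missing_budgets())
```

Because the budget lives on the handler, a QA step or unit test can fail the build when a new endpoint omits it, which a Markdown file cannot do by itself.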
However you want to structure your spec, just structure your code like that - it works way better, because LLMs can handle source code so much better than markdown files.
It's not going to pass the "enterprise architect" review job application of the last decade, but your project will work, much much better and with less input required by the developer... And your development process is less "let's roll the dice" and more "this isn't the spec yet".
That is a huge mixed bag. It helps to have a high level overview of an algorithm. If you want to be very accurate about an algorithm, then that documentation is potentially harder to digest than the actual code. I personally don't even know how to express a sophisticated algo in natural language without falling back to pseudo code. Then you also have to verify a purely natural language defined algo with a very slow feedback loop (agentic PR cycle) against a test. Which seems to be a guarantee for myself to get sloppy because it's annoying.
There are a lot of reasons to theorize about algos in spoken or written natural language. For me it's questionable why you'd prefer the natural language to be the reference implementation when all you want is the code. That only makes sense if you never want to look at code.
- Does Claude leverage the trailers automatically, or is usage initiated by you?
- How often are you using the trailer lookups?
- Any idea how this relates to token usage? If you're frequently busting cache on old sessions, it might be cheaper to read a local doc.
Trailers hint is in my global CLAUDE.md so it knows: when debugging, saying something like "didn't we already discuss this in a previous session?" it will know what to look for.
I also have a manually invoked `/search-session-transcripts` that I can use to natural-language inspect previous session by day, project, session id etc. Claude often uses this skill to narrow down on parts of the conversation that are relevant to the current query.
> How often are you using the trailer lookups?
Mondays are usually the day I need to refer to previous sessions from the week before. Trailer lookups are also good for continuing buildout of adjacent features. They've also been excellent in incident post-mortems where the PR text and commit message aren't enough to gauge the "how" of decisions that led to issues.
> Any idea how this relates to token usage?
I tested this. Session-transcripts are append-only so `/clear` and `/compact` don't clear out old messages, they stay stable and accessible. I also don't clean out my `~/.claude/sessions` ever so there's a lot in there, but the info is valuable and cheap.
We usually use UUIDs for this type of object but we have to send those objects to the legacy system XYZ, which only supports IDs with up to sixteen characters and is case insensitive, so we generate sixteen character random alphanumeric strings with uppercase letters which provides 82 bits of entropy.
Could you go deeper? Sure. Why do we have to send those objects to XYZ? Why does the legacy system still exist? Why does it not support UUIDs? Why is there no secondary key specifically for that system? Why are we using UUIDs?
But most likely you do not have to spell all those out. The point of a why is to explain why something is not what one would expect, you explain on top of some common knowledge. Everyone involved might know what XYZ does and why some objects have to get send there. If not, that is probably written down elsewhere. Why is the system using UUIDs? Maybe written down in the design for the persistence layer.
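The entropy figure in the example can be checked directly: sixteen characters drawn uniformly from 26 uppercase letters plus 10 digits give 16 × log2(36) ≈ 82.7 bits, which matches the quoted 82. A minimal sketch follows; the function names are made up and the legacy system XYZ is not modeled.

```python
import math
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits  # 36 case-insensitive-safe symbols
ID_LENGTH = 16  # limit imposed by the legacy system in the example


def new_legacy_id() -> str:
    """Generate a random 16-character uppercase alphanumeric ID."""
    return "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))


def entropy_bits(alphabet_size: int = len(ALPHABET), length: int = ID_LENGTH) -> float:
    """Entropy of a uniformly random string: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(new_legacy_id(), round(entropy_bits(), 1))
```
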
> I use a mix of Markdown and Gherkin
Gherkin also has a Markdown based syntax that is not well known:
https://github.com/cucumber/gherkin/blob/main/MARKDOWN_WITH_...
I prefer that to the 'verbose' original syntax. MDG also renders nicely in code forges.
The general idea of "readable specification language" was an inspired one but it failed on execution - it has gnarly syntax, no typing and bad abstractions.
This results in poor tests which are hard to maintain and diverge between being either too repetitive to be useful or too vague to be useful.
The ecosystem is big but it's built on crumbling foundations which is why when most people used it most of them got frustrated and gave up on it.
Annoyingly there's a certain amount of gaslighting around it too ("it didn't work for you coz you weren't using it correctly") which is eleven different kinds of wrong.
My recursive-mode workflow handles all of that and more and gives you full traceability: https://recursive-mode.dev/introduction
People do that? Actual professionals?
Just because very few devs are qualified at doing their fucking job, it doesn't make someone trying to use AI properly wrong.
> If you are saving 80% of time, you aren't actually reviewing the code.
The idea is that if you spend time in specification ahead of time, reviewing and validating will be easier and less time consuming later.
I haven't tried it myself, but the idea rings true to me.
Hell, you probably couldn't even build a simple bike shed from plans without having to revise them while building, so I am skeptical that without writing you are going to pinpoint the problems in the spec.
Reading only gets you a short way towards learning.
Neither have I. This does not make the spec useless. I don't spec hoping that it will be the source of truth, I spec because planning more often than not allows me to spot inconsistencies and ambiguity ahead of time, not halfway through implementation.
> Hell, you probably couldn't even build a simple bike shed from plans without having to revise them while building, so I am skeptical that without writing you are going to pinpoint the problems in the spec.
I think you are using specification and design wrong.
It's not supposed to be a bible that implementation can't deviate from. It's a plan, not law. It's okay for the plan to be adjusted in contact with reality.
It's still useful to know ahead of time constraints, expected output, assumptions, premises, etc.
For the literal code:
• A healthy cocktail of /WX + /Wall, plus clang-tidy with very few suppressions
• An extremely opinionated mix of clang-format and LLM-generated bespoke formatting that AST-based tools can't express
• Hungarian notation; all stack locals pre-hoisted, declared in order of appearance, and separated from subsequent assignments
• Enforced dataflow: all memory accesses are bounded independent of branch resolution, with only data-oblivious indexing
• Functions have a single point of return
In a C89 workflow, this pushes agents to produce code where wrong business/domain decisions are unmistakably obvious, while eliminating the vast majority of bug classes before I ever read it.
So yeah, I'll reassert 80%, if not more.
I like to do spikes to understand problem spaces before planning. The planning is then usually effortless and just to get in sync with stakeholders.
But in that regard AI coding is really backwards. We don't necessarily need hard separation of planning and coding, but we need a deliberate separation of experimental/explorative coding and the code that is supposed to make it into prod. AI coding does all that in the same place, I don't even want to know how hard it is to "fix" AI code that started on behalf of a completely wrong premise. AIs certainly don't have a good measure when to refactor something completely messed up.
This is a very good point. The AI speedup some PMs fantasize about is skipping planning and instead generate code directly from end user discussions, POC-ing our way into shipping.
Another point of view is that LLM:s perform to an extent on the same level as outsourcing does. This interface requires a bit more contract mass than doing everything within single team.
Therefore, you must not use humans for any important work.
> Therefore, you must not use humans for any important work.
Counterpoint, all important work done in the history of humans was done by... humans, and yet I see no pink elephants created as a result of telling humans "do not think of pink elephants".
Sure. The thing is, the waterfall guys would tell you it's impossible to do it in 2 weeks because you need to have written down everything first. "Thousands of pages" were the terms they used.
Agile guys would point you to the Agile manifesto which would lead you to "working code over documentation" and "people over process".
A 2 week period to go from initial spec to product in a user's hands to capture feedback and make changes from there is much closer to agile than to waterfall. In fact it's more or less exactly some older versions of Scrum (which didn't permit deviating from the planned sprint user stories midway through the sprint, instead changes influenced the subsequent sprint).
- System/Segment Specification
- Software Development Plan
- Software Configuration Management Plan
- Software Quality Evaluation Plan
- Software Requirements Specification
- Interface Requirements Specification
- Software Standards and Procedures Manual
- Software Top Level Design Document
- Software Detailed Design Document
- Interface Design Document
- Data Base Design Document
- Software Product Specification
- Version Description Document
- Software Test Plan
- Software Test Description
- Software Test Procedure
- Software Test Report
- Computer System Operator's Manual
- Software User's Manual
- Computer System Diagnostic Manual
- Software Programmer's Manual
- Firmware Support Manual
- Operational Concept Document
- Computer Resources Integrated Support Document
- Configuration Management Plan
- Engineering Change Proposal
- Specification Change Notice
That said, we still do ad-hoc versions of many of these. For example, a system/segment specification today is an OpenAPI document between microservices. Most larger SaaS companies have the equivalent of a Software Configuration Management plan - Who can change terraform or a GHA, what are the standards that they conform to (linter, peer review standards).
The point I was trying to make is we should be diving back into the older methodologies and accumulated wisdom and re-evaluate some of the older dead ends with new context.
Have you seen any pink elephants?
My point is that without writing, you can't surface the type of problems you usually surface. The AI isn't going to surface those problems for you.
It's rare when reviewing that you think "Oh shit, this approach is totally wrong, we need to throw it away", while it's common when writing code to have that reaction.
If you aren't writing, you aren't having that reaction, and you aren't going to get it from reviewing code that has thousands of green "passed" lines in the testsuite.
That's not my experience. It is actually very common for specifications and design to be reviewed and improved.
I think there may be crossed wires here - specs and designs are reviewed, but I've never seen a code review result in a spec+design review, while I always see spec+design review happen during the "writing code" phase.
In short, reading code does not result in a spec+design review, writing code does. If you are not writing code and only reading it, it is unlikely you will trigger a spec+design review.
Yes, a government process meant to implement the waterfall approach.
If you look at Dr. Royce's paper which originated the concept, he was very explicit that it required upwards of thousands of pages of documentation to be written up front, if you were doing it "right".
By the time the required documentation had all been written, there should be essentially nothing left to do but to actually type out the punch cards as specified and turn them into a system of compiled programs.
Now, this appealed to government because it put documentation in place that was felt to be more viable for contracting processes, but ever since Dr. Brooks chaired a 1987 Defense Science Board study on the issues already facing the DoD trying to implement waterfall methods, they've been trying to restructure their software acquisition methods to pursue better outcomes rather than more concretely defined outputs.
Of course it's still a tremendous challenge for them even now, and it remains common to see defense acquisition projects that will say "Agile" to the right people even as they prescribe a full waterfall-style 'system engineering V' approach behind the scenes.
The ad-hoc responses common in the commercial space are usually more appropriate, believe it or not. They get process added when process is helpful, but not before it is helpful.
(and my link to the Royce paper isn't working anymore - I need to fix that!) - I am planning on a followup that takes the last 3 years of change into account.
Yes, that's why his paper essentially said "you're going to have to build two." One to figure out the mistakes you can't predict ahead of time, and the second for the real deal. Do your best to get through the first one as fast as you can, but still deliberate enough that there won't be any bugs left behind for the second one.
But a third or subsequent iteration was definitely a failure in his mind, and even building two (or one-and-a-half, depending on your framing) was simply a concession to the reality that actual implementation would run into unpredictable issues, for much the same reason computer science had already learned the halting problem was undecidable.
I have a book with his paper and to the extent he speaks of iteration as desirable, it is only iteration between succeeding steps of the overall 'waterfall'. E.g. in an ideal world you iterate between system requirements and their decomposition into software requirements (updating the system reqs as necessary to ensure the software reqs you're writing are accounted for). Likewise for system requirements to software analysis, and so on.
As you point out, he mentions that this concept is “risky and invites failure”, and goes on to allow for re-refinement and re-implementation of the software requirements and program design phases based on experience from the testing phase. But he goes on to emphasize: “However, I believe the illustrated approach [waterfall with reimplementation post-test] to be fundamentally sound”.
The rest of his paper then goes into the detail of these phases, and he specifically notes early on that there is a natural question, of how much documentation is enough? And he gives a very clear answer: “My own view is ‘quite a lot’; certainly more than most programmers, analysts or program designers are willing to do if left to their own devices.”
It's not an accident that the DoD software acquisition requirements based on waterfall as mentioned by the other comments were so numerous or onerous. As Dr. Royce puts it:
- “The first rule of managing software development is ruthless enforcement of documentation requirements”
- When asked to review software projects the first thing he does is review the documentation. If the documentation is seriously lacking his recommendation is to replace the whole project management and shift 100% of work to fixing documentation.
- “Management of software is simply impossible without a very high degree of documentation”
- If procuring a $5M hardware device he'd expect a 30 page spec to suffice. If procuring a $5M software system, he'd “... estimate a 1,500 page specification is about right.”
I wasn't pulling "thousands of pages" from thin air. It's right in his paper and he's extremely clear about this. It's not an off-hand remark, he goes on to justify why he thinks that mass of documentation is required.
I want to emphasize that he's writing from the problems he was facing in his era. Computer systems necessarily were room-sized installations, interactive computer time was incredibly expensive, but paper was cheap. There was no Internet to speak of to share powerful and efficient open-source libraries. There was no "continuous deployment" or "continuous integration".
The system had to work well pretty quickly after the subsystems were built, installed, integrated and tested or this newfangled computer system that cost millions in 1960s dollars to run per month would be nothing more than a money sink while the nerds tried to troubleshoot.
Nowadays we don't develop under those kinds of strictures and we've put tremendous investments into allowing real useful systems to be developed using the simpler processes that even back then were much easier to develop around, when they could be used (Dr. Royce's paper even leads off by describing the 'nice' process as he explains why you can't use it as system size grows). The voluminous test documentation he proposes is something we pretty much do write today, but we call it a test suite and we grow it along with the program, rather than writing it all months before coding.
I think there's a lot to be said for what a modern-day waterfall process might look like with the technologies and iteration speeds available to us now; the only problem is that I think it will still resemble agile more than it would resemble the process Dr. Royce described.
If people truly understood what waterfall is and how it's supposed to be carried out, they'd be less apt to recommend it. Nothing prevents a team from employing planning in an agile effort, but doing this doesn't turn it into a waterfall project and you shouldn't describe it as such.
If anything, teams that refuse to use agile (thinking it inherently means meetings, story points and not looking beyond 14 days) often end up choosing something even simpler, like cooking up a simple design doc of 4-6 pages before implementing it.
But that's still not waterfall, it's just another of the infinite renditions of agile methods that are out there, just without the consultancies issuing formal training certs.