I created a video that talks about this in more detail:
Someone is an optimist! I'd estimate those significantly higher, and even worse if you are in a field that has to pass any sort of SOC/HIPAA/GDPR audit.
That's the theory anyway.
It also belittles the human resources. "I heard that in 6 months AI will do everything, so why would I hire new engineers or promote the ones we have?"
Did I do that right? ;)
Anyway, AI maintenance can be a time saver if the maintenance is easy. Say you're upgrading a dependency and all you really need to do is fix imports in five hundred files and modify a method or two. That would have been time consuming, but it's not hard. I think the OP has hit a good point, though. Writing code is the fun part, not the bottleneck. The pain is the maintenance, so let's apply the AI there and keep the fun to ourselves.
When an LLM provides you with an overconfident piece of writing with no sources to back it up, what do you do?
You draw made up lines on made up plots and call it evidence, obviously.
When I use AI tools on existing codebases - understanding unfamiliar modules, generating targeted refactors, writing migration scripts - the maintenance burden genuinely drops. The AI is working on code I already understand architecturally, so I can evaluate its output quickly.
The problem shows up when AI generates greenfield code that nobody deeply understands. That code still has to be maintained by humans who did not write it AND did not design it. At least with code another human wrote, you can reason about their intent from naming, structure, and commit history. AI-generated code often lacks that legibility because the "author" had no persistent intent across files.
The article is right that we need to measure maintenance cost, not just velocity. In practice that means tracking time-to-understand and change-failure-rate on AI-assisted code vs. human-written code over months, not days.
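To make "track it over months" concrete, here is a minimal sketch of the bookkeeping. It assumes each merged change is tagged with a cohort and later marked if it caused a failure (fix, rollback, incident), plus a self-reported time-to-understand; the `Change` record and its fields are hypothetical, and the values below are toy inputs, not measurements.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Change:
    # Hypothetical record for one merged change; field names are illustrative.
    cohort: str                 # e.g. "ai_assisted" or "human"
    caused_failure: bool        # later needed a fix, rollback, or incident response
    hours_to_understand: float  # time a maintainer reported spending to grasp it


def cohort_stats(changes):
    """Return {cohort: (change_failure_rate, mean_hours_to_understand)}."""
    totals = defaultdict(lambda: [0, 0, 0.0])  # [count, failures, total hours]
    for c in changes:
        t = totals[c.cohort]
        t[0] += 1
        t[1] += int(c.caused_failure)
        t[2] += c.hours_to_understand
    return {k: (f / n, h / n) for k, (n, f, h) in totals.items()}


# Toy input only, to show the shape of the output.
history = [
    Change("ai_assisted", True, 3.0),
    Change("ai_assisted", False, 1.0),
    Change("human", False, 2.0),
    Change("human", False, 1.0),
]
print(cohort_stats(history))  # {'ai_assisted': (0.5, 2.0), 'human': (0.0, 1.5)}
```

The hard part in practice is the labeling (deciding which later fix "belongs" to which change), not the arithmetic.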
An agent can reduce typing while increasing the number of things nobody really owns later: rationale, invariants, tradeoffs, half-meaningful tests, files that changed because they were nearby, etc. The PR can pass and still leave the team with more intent to rediscover.
The useful agent workflows I keep coming back to are less about "write more code" and more about making every change come with a maintenance handle: what invariant changed, what should fail if this is wrong, what files should not have changed, what rollback looks like. It feels slower in the moment, but it gives future-you something to grab onto.
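One way to make that "maintenance handle" mechanical is a small record attached to each agent-authored change, checked against the actual diff before merge. This is a sketch under assumptions: the `MaintenanceHandle` fields and the `check_change` helper are hypothetical names mirroring the four questions above, and "files that should not have changed" is expressed as glob patterns.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class MaintenanceHandle:
    """Hypothetical metadata an agent-authored change would carry."""
    invariant_changed: str                  # what invariant changed
    should_fail_if_wrong: list[str]         # tests expected to catch a regression
    must_not_change: list[str] = field(default_factory=list)  # glob patterns
    rollback: str = ""                      # what rollback looks like


def check_change(handle, changed_files):
    """Return a list of problems; an empty list means the handle matches the diff."""
    problems = []
    if not handle.invariant_changed.strip():
        problems.append("no invariant stated")
    if not handle.should_fail_if_wrong:
        problems.append("no test named that should fail if this is wrong")
    if not handle.rollback.strip():
        problems.append("no rollback plan")
    for path in changed_files:
        if any(fnmatch(path, pat) for pat in handle.must_not_change):
            problems.append(f"touched protected file: {path}")
    return problems


handle = MaintenanceHandle(
    invariant_changed="order totals are computed in integer cents",
    should_fail_if_wrong=["test_totals_rounding"],
    must_not_change=["migrations/*"],
    rollback="revert the commit; no data migration involved",
)
print(check_change(handle, ["billing/totals.py", "migrations/0042_add_col.py"]))
# ['touched protected file: migrations/0042_add_col.py']
```

The check is deliberately dumb: it doesn't judge whether the stated invariant is true, only that someone wrote one down and the diff respects the declared boundaries.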
I shipped a small Stripe storefront with Claude Code over the weekend. Three pages, four integrations, one database. At that size you can read every file before merging, and that's basically the lever. Shore's argument really bites when you can't.
The thing missing from his model, I think, is project shape. Bounded greenfield has a maintenance ceiling because the code itself does. A long-lived monolith with the agent extending it is where the math gets ugly.
We have had outages increase in tandem with lines of code shipped, and outages are getting more and more severe. Yes, we have improved much old code, deleted more old code, can automate code modernization, can better diagnose issues, have more options for mitigations, etc.
But all that has not offset the sheer magnitude of code being shipped which no one really understood.
Unfortunately, maintainability is simply bucketed as a "non-functional" requirement.
Maintainability (and similar NFRs) should actually be considered what preserves and enables the delivery of future functional requirements -- in contrast to framing non-functional requirements as simply "how" the software must do what it does vs. the "what"/functional requirements that "actually matter".
From that standpoint, if a steady flow of features/improvements is important for a project, maintainability isn't really a non-functional requirement at all, and amounts to being a functional requirement, in practice, over anything except the shortest of time horizons.
Seeing software as an investment avoids speaking about "technical debt" by speaking about "value", a liability being just an asset with < 0 value. When software exits the high-margin world of yesterday, it needs a precise economic definition of what software deserves to exist.
1. software doesn't only have tech maintenance - there is also user support and it increases as software grows.
2. I'm not convinced maintenance costs scale linearly. And even if they do scale linearly, you will eventually get to a point where maintenance takes up all your time.
I wonder if AI could make code reviews more presentable.
for example, with human code reviews, developers learn quickly not to make purely visual changes: reflowing code or comments, changing indentation (where the tools can't suppress it), moving functions around, removing lines, or other spurious changes.
And don't refactor code needlessly.
also, you could break reviews up into two reviews: functional changes and cosmetic changes.
That makes reviews a lot easier. The review starts from "nothing should be changing" and then reviewers can pattern match on that.
Otherwise, the reviewer is re-evaluating every line of code to make sure nothing has changed. That's really hard to do properly.
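For the cosmetic half, a tool (or an AI) could at least prove the "nothing should be changing" premise for the simplest case. A minimal sketch, assuming the language is not whitespace-sensitive: normalize whitespace and compare. It catches reindentation and reflowed comments but not moved functions, and it is wrong for Python/YAML or for whitespace inside string literals; `is_cosmetic_only` is a hypothetical helper, not an existing tool.

```python
import re


def normalize(src: str) -> str:
    # Collapse every run of whitespace (spaces, tabs, newlines) into one space.
    # Caveat: unsafe for whitespace-sensitive languages and string literals.
    return re.sub(r"\s+", " ", src).strip()


def is_cosmetic_only(old: str, new: str) -> bool:
    """True if the text changed but only in whitespace/layout."""
    return old != new and normalize(old) == normalize(new)


old = "if (x) {\n  y();\n}"
reindented = "if (x) {\n    y();\n}"
edited = "if (x) {\n    z();\n}"

print(is_cosmetic_only(old, reindented))  # True
print(is_cosmetic_only(old, edited))      # False
```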
The version control systems I've worked with have allowed queues of changes, each one reviewed independently. As I'm developing, if I need a refactor, I go up a commit, refactor, send out for review, rebase my in progress work and continue.
I send out a continual stream of "CLEANUP:" "REFACTOR_ONLY:", and similar changes with the final change being a lot smaller than a big monster of a change.
Your reviewers will appreciate the effort.
Plays the metric game (if you're working in that type of org) without being evil too.
https://github.com/nWave-ai/nWave
They have /nw-buddy to point you in the right direction
Very nifty
It also killed the readability of the source files in general, not just the ability to make sense of the review changes.
If you could wish for a codebase, which codebase would you wish for?
If you think for a second about that question, you’ll realize you're probably not wishing for a super feature-rich one, but an easy-to-understand one, quite close to what you have now. One that is easy to maintain and extend, depending on the upcoming business challenges.
Code bases that you "work in" (maintain, etc) solve real world problems, and solving those problems should trump cleanliness every time
Codebases that are clean are typically showcase examples that sit on a shelf to be admired and appreciated.
Only if you value your time more than the users’ time and your fellow developers’ time. Code is run and read more often than it is written. You may need to do some hacky coding, but hacks should be small in scope, surrounded by warnings, and have a ticket filed for properly resolving the issue. Otherwise, it’s not worth it.
oh, you sweet summer child... I wish I had one of those
Edit: Maybe I make it sound a bit simple. I also do more extensive refactors, where I'm more involved and opinionated. But I don't often feel the need to go that deep. But yeah, sometimes it's definitely necessary to keep the project from going off the rails.
If you want to build well-architected, well-tested code or pay back debt, the LLMs make the world your oyster. And it's easier than ever since LLMs have no problem doing ridiculous cross-cutting refactoring that you'd never have done on your own.
That LLMs essentially lead to code that's harder to maintain, or that human-produced code is easier to maintain by default, just aren't claims I'd sign off on, and TFA doesn't try to make the argument.
I'd argue the opposite, since LLMs make it trivial to plan the architecture and tests for all the code changes you wouldn't have had the energy to plan for.
I have reduced our API's response time from 80ms to 30ms and gotten a setup we can comfortably grow into.
I would not have had time to track down these optimizations without Claude Code.
AI tooling can also be a place where we start building our view of what maintainable software practices look like, so we don't make decisions that have these same tail-effort profiles. That can be things like building out tooling to handle maintenance updates.
I think the real thing that comes out of AI tooling is probably that the tooling needs to be trained (or steered) towards activities that enhance human attention management.
But say you have that. Then you have great profiling. At that point you can measure correctness and performance. Then implementation becomes less of a focal point. And that makes it a lot easier to concede coding to ai
The AI will then be the middle layer that iterates until the tests pass.
Layer 1: Specs (Humans)
Layer 2: Code (AI mostly)
Layer 3: Tests (AI + human checks).
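The three layers can be sketched as a loop: humans own the spec, a code generator proposes implementations, and failing checks are fed back until everything passes or a round budget runs out. Assumptions: the spec is just input/expected-output pairs, `propose` stands in for "ask the AI for code" (here a fake that replays canned attempts), and `iterate_until_green` and `max_rounds` are hypothetical names.

```python
from typing import Callable

Spec = list[tuple[int, int]]   # Layer 1: human-written (input, expected) pairs
Impl = Callable[[int], int]    # Layer 2: generated implementation


def run_tests(impl: Impl, spec: Spec) -> list[str]:
    """Layer 3: check the implementation against the spec, return failure messages."""
    failures = []
    for x, expected in spec:
        got = impl(x)
        if got != expected:
            failures.append(f"f({x}) = {got}, expected {expected}")
    return failures


def iterate_until_green(propose: Callable[[list[str]], Impl], spec: Spec, max_rounds: int = 5):
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        impl = propose(feedback)          # stand-in for an AI call that sees prior failures
        feedback = run_tests(impl, spec)
        if not feedback:
            return impl, round_no
    raise RuntimeError(f"still failing after {max_rounds} rounds: {feedback}")


# Fake "AI" that replays two canned attempts: the first is wrong, the second right.
attempts = iter([lambda x: x + x, lambda x: x * x])


def fake_propose(feedback: list[str]) -> Impl:
    return next(attempts)


impl, rounds = iterate_until_green(fake_propose, [(2, 4), (3, 9)])
print(rounds, impl(5))  # 2 25
```

Note that the loop only ever knows what the spec tells it, which is why the human checks in layer 3 still matter: a spec with gaps converges on code that passes, not code that's right.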
Some of our developers are overly aggressive about using AI and I've started going down that path because I need to keep up and actually enjoy the flow of working with AI in my IDE.
I put a lot of work into keeping my area of the codebase understandable and coherent, but I do not see that from the others on our team. I'm not perfect, but I am extremely sensitive to code that is incoherent or un-grok-able at a glance.
Anyway, I like the novel (to me at least) framing of this article!
Instead, simply be a professional and fix what is needed, while working on non-developer visible tasks.
Don't allow people with zero clue to make those decisions. You are the expert. Make the decision.
I get that most of the cost is in training and not inference, but I don’t see how models stay useful once the world's software updates a few months after training, since the models can’t learn without said training.
Are we just going to have shops do the equivalent of the old COBOL shops, where everything is built to one year's standards and the main language/framework is mostly set in stone?
So:

* You get paid less.
* The company might pay a similar amount due to LLM costs. Although, it could be more or less as well, depending on how it works out.
A couple of years ago, I saw a story about a guy writing two articles a day for a website. The boss asked him if he wanted to transition to being an AI-assisted writer for less pay. He said, "No." After a couple of weeks, he got canned. He checked the website out, and it had a bunch of AI writing on it.
LLMs are there to reduce your salary and increase the business owner's profits. Wealth inequality will get bigger, and it's only going to grow more and more. Also, a ton of people will be fired across many different fields.
That's a pretty old economic idea, and it will be interesting to see if it holds up in this instance. I have no idea how this all plays out. I do think it won't be one size fits all though.
The incentives for remote LLMs are at odds with providing defaults that optimize for a maintainable, sound architecture, though. In the same way, Claude is going to produce an overview of the indexes of the summaries of comprehensive reports that no one is going to read. No doubt this feels like an excellent KPI for how much output was generated.
That being said, the problem the author talks about is more of a self-imposed thing than an everyone-is-going-to-suffer thing. The author correctly points out the startup scenario, where it's just "get this damn thing to work somehow so I can see if there's market fit and nab some customers". That scenario has always tended to come with higher maintenance costs down the road, because quality is (rightfully) lowered in the name of speed: to see if there's a business and, if there is, to get it going.
Also felt like the author was reluctant to talk about how AI can actually help with the maintenance part. AI can be great at fixing old dependencies and annoying bugs (with human guidance). Those tasks can feel like toil for software engineers, and they're the kinds of things a software engineer will want AI to help with.
But AI does indeed make it faster and easier. Why sift through the changed lines and hammer down the edge cases yourself if you can get an AI to do it?
It starts out that way. In the beginning, before they trust it, and before they've learned to prompt it to get the results they want, they use it to automate some tedious bits, but humans still create the initial implementation or pattern and then have AI fill in the gaps. More like turbocharged autocomplete than a sea change in how they write code.
The more people work with AI, the less they worry about the actual code it's producing. I'm not saying this is a good thing. It can introduce bugs, performance problems, security holes. It's reality, though. AI code produced a bug? Tell AI to fix it. AI code is bloated and hard to read? If you care, tell AI to fix it. A lot of people don't care.
When humans are removed completely from code maintenance, the need for maintainable code isn't there anymore.
We're not 100% there yet, but that's where we're headed. For a lot of companies, it's worth the risk to YOLO it because it's already good enough.
I don't personally trust it enough to stop reading the code it produces, but I don't read every single line. I pay more attention to the tests than the code under test. I pay more attention to parts of the code where performance matters. I guide the overall structure. But whenever any of that doesn't meet my standards, I'm not the one who's maintaining it. I just tell the AI to fix it.
Maintenance costs aren't on my radar when maintenance is this cheap.
- productivity isn't the be-all end-all, it's just one metric and a consultant's mantra. taking a productivity hit can be more than fine if it gives you a tactical/strategic advantage or opportunity.
- i'm not convinced at all that agents will become prohibitively expensive. that's indeed some companies' wet dream but a) good cheap competition is emerging and b) you don't really need the latest models or massive computing power to get shit done.
i do agree though with the emphasis on code quality and debt, and for not blindly going for the silver bullet fad and throwing money at it like there's no tomorrow in the hopes of some "productivity figures boost". then again i doubt that companies going for that would heed such advice, we've been there many times.
Some of the maintenance costs you mentioned are primarily read-only, slam dunk AI use cases. Input from AI to diagnose bugs, trace data flows, and help with reasoning. Tests are something of a gray area in the sense that they are not read-only but they don't affect the logic of the app itself.
The "write" use cases (you mention refactoring, and the author seems to focus primarily on writing code) are where the author's point seems to be primarily aimed.
Definitely agree on the read-only improvements to maintenance. Those are unquestionable slam dunk, high value improvements.
Which is why I think additional intent preserving abstraction is where software coding agents are likely heading.
I'm being completely serious. By giving it some kind of distinct name, you are giving license to it being ring-fenced and de-prioritised by someone who doesn't (but, arguably, probably should) know better.
Quality matters. It hits your P&L very quickly and very hard if you don't maintain it. So it is as important as any other factor.
If it does end up as a ticket that means you did something wrong in the implementation of a feature and that lack of quality got noticed. A business person is not qualified to weigh in on the importance of these things. All they know is "The page needs to load in $time", "The data should stay consistent", ...
You are the one who determines the how, and quality maintenance is on you.
I treat it like housekeeping and treat features like hosting a party. Guests/stakeholders are people who want what you can make. The party is the feature they want.
They don't care whether it was difficult or easy for you to clean the house. They just assume you keep your own house tidy ... and they know you don't when you only host once a quarter instead of once a month.
They assume you're a functional adult who manages his own space.
Tech debt is like that.
Thus - the business folk don't get a say in whether it's in the sprint - cuz it's not "the party". Instead it's your Scrum Master or whatever saying "hey kids - clean the mirrors and Jane this time you're sanitizing the toilet."
Right! The unfortunate thing is that many software companies don't seem to think much further than a quarter ahead, not really.
Sure, they might have a product roadmap that extends a year or two into the future, but let's be honest. Often that roadmap is mostly for sales purposes, not engineering planning purposes. Product and engineering will pivot if sales slump. The earlier in the company's lifespan, the more often this will happen.
However, if companies get out of this startup mode, then they should start to stabilize... But many don't. They continue this pattern of short-sighted, short-term planning, which means product stability remains a low-priority effort.
Ultimately, I guess many companies either just do not have the resources to build good software or do not actually care to.
I would say there are two types of software debt. Dead code/useless features not providing value, which is in my experience easy to excise and rarely really causes issues, and the badly written code, which is providing value.
The term software debt is usually used in the second case because you have to pay it for delivering new value to the product. The debt was introduced in the past when another value was being added and it was added in a way that increased coupling, reduced cohesion, etc. Let's just say that it was implemented badly, for many interpretations of badly.
Now, if I want to add this new value, then that previous bad implementation is causing issues because of unintended side effects, logic that's difficult to reason about, etc. So now I have to pay the debt, either by refactoring or by changing the logic in tens of places (which further increases the debt).
Tech debt was never really about functionality that does not need to be there and is not providing value, because you seldom pay it, or it's relatively cheap to pay it. The problem is when you have to rework the old feature, still providing value and should continue to do so in the same manner, just to deliver this new value. That's the expensive tech debt.
People already see their efforts as an investment, but that doesn't stop Debt from accruing over time. There will always be parts in your software that could have been written better, and that's Debt.
This has been possible already, but from my vantage point, it doesn't look like anyone really did it? Sure, there already exists tons of OSS that is built for this case, even before AI, yet it always seems to come back to incentives. IMO, there is no incentive to write maintainable software (and I'm not sure there ever will be one at this pace). Businesses are only incentivized to write enough software to accomplish the task within their own defined SLAs and nothing further. But even that doesn't seem to be a blocker at this point, if GitHub is used as an example.
Good software comes from people who care deeply about solving the problems in a way that they are invested in. If your employees don't care about your product, you're already starting on the wrong foot. AI isn't going to incentivize below-average or average developers to write better software, or a good developer to push back harder against their clueless manager. When they make the decision, AI might help (assuming it doesn't make a bigger mess), but it's not going to reduce technical debt in any meaningful way without a sea change of perspective from product managers around the world.
So far, I just don't see it happening in theory or in practice. I hope I'm proven wrong!
Sales does.
If the developers say something like "well this will take 3 weeks longer to do correctly". They'll be replaced by the code monkey that can turn a POC over to the customer in 3 days.
Reality does not live up to our dreams.
You gotta use the language that folks speak in order to get your point across.
Well, I haven’t met any experienced developers who say this sentence. It’s career suicide in most cases. More often, an agreement about scope and delivery is reached and the workload gets reshuffled. Customers battling with POCs to get their tasks done soon become ex-customers.
This pattern is only going to become more extreme year after year. I used to reject the idea that LLMs could produce useful code or debug things, but these days we have Claude Opus and ChatGPT Codex. And just around the corner, there's Claude Mythos. I believe it's ready to go out, but they are scanning OSS to give the code underneath it all a head start on fixing the types of security issues Mythos can find before releasing the product. Otherwise, we could be talking about an LLM jailbreak like "scan this popular Java logging library" or "scan this popular operating system, Linux, for security flaws." If they didn't do it this way, there could be a lot of damage to PCs, companies, governments, bureaucracies, and institutions in general.
Right now, the name of the game is making sure your LLM has a good action plan before letting it attempt to fix a bug or refactor or add a feature. Devs with more experience know what to ask Claude to do whereas a greener dev doesn't know the questions to ask, leaving Claude to guess right sometimes and wrong sometimes. Simply put, if a dev doesn't know to ask for something, there's a bigger chance the LLM won't care to do it. And if there's some nuanced, tricky aspect to the code not described to the LLM, the LLM might burn a lot of tokens to reach a bad solution. A good dev might give more clues and hunches and more context to fine-tune the prompt so that it almost definitely succeeds whereas a greener dev doesn't have intimacy with the system yet, needing tips and descriptions of subsystems themselves before they could pass it along to Claude. By this stage, people also differ in their skills with the various tools in the ecosystem. Power tools do a lot more in the hands of a seasoned handyman than in the hands of an eight-year-old after all.
However, the better LLMs get, the less differences like this will exist, and ideally every dev will approach a similar level of productivity. Salaries don't reflect how much profit a worker produces for the company (unless you are the CEO or a little below them, or maybe have some stock). Supply and demand drive salary. If something close to AGI arrives tomorrow, then by definition almost any two devs will provide similar value, at which point teams will downsize, yet productivity will hold steady or increase. They will downsize to save some money, since we live in a brutal world where workers have no loyalty to a company and a company has no loyalty to its workers. Pensions are a relic of the past. After all that happens, a large group of qualified devs will be searching for jobs so they can remain in a home with food in it. Companies will see tons of resumes flowing in, all from AI-assisted devs who can do the job.
The companies will then do two things: offer the devs they kept a move to an AI-assisted role for less money (or else), while also interviewing everyone the companies fired and offering that same AI-assisted salary to everyone in the picture. And they will have calculated the proper discount off the old full salary using some kind of economic equation. Then the wildcard happens: some of the people who urgently need work start offering their services for even less than the company is offering. It's a downward spiral until the salary becomes so low that a dev would rather be an ex-dev doing something more relaxing that still pays enough to survive. There's no reason to do tough coding work for that; it'll still be tedious with stronger LLMs. Comfort and relaxation will prevail for many, as they no longer feel the salary justifies doing the work.
And at the top, assuming AI costs don't grow exponentially, they will be making more money than ever before, since they downsized teams, slashed salaries, and hired at even lower salaries than the slashed ones. (There will still be a premium for knowing the systems like the back of your hand, without the need to ramp up before adding value, so you'll get paid more than a new hire.)
Write tests. The most boring activity on the planet
There is problem solving in coding, but the bigger problems exist at a higher level and that’s still on you to solve.
Also, I’ve been messing with “ai-only” files recently. You make a markdown file that basically tells the AI what the file does and how it’s used, and points to an API contract in some other file. Then you can run an async AI that will try things and only submit a PR if all the tests pass and the perf improves. The files become almost unreadable to me, but I decided to embrace it because they were already unreadable. But so is the output of, say, the protobuf code generator, and I never had a problem accepting that.
AI does nothing for me there. Coming up with the hypothetical solution and having AI build it does nothing for me.
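The "only submit a PR if all the tests pass and the perf improves" rule can be written as a small gate. A sketch under assumptions: `RunResult`, `should_open_pr`, and the 5% `min_gain` threshold are hypothetical; benchmarks are compared by median over repeated samples to damp noise; and the candidate must run at least as many tests as the baseline, so the agent can't pass the gate by deleting tests.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RunResult:
    tests_passed: int
    tests_total: int
    timings_ms: list[float]  # repeated benchmark samples


def should_open_pr(baseline: RunResult, candidate: RunResult, min_gain: float = 0.05) -> bool:
    """Open a PR only if every test passes and median latency improves by >= min_gain."""
    if candidate.tests_total < baseline.tests_total:
        return False  # fewer tests than before: possibly deleted, reject
    if candidate.tests_total == 0 or candidate.tests_passed != candidate.tests_total:
        return False
    base, cand = median(baseline.timings_ms), median(candidate.timings_ms)
    return cand <= base * (1 - min_gain)


baseline = RunResult(10, 10, [80.0, 82.0, 79.0])                        # median 80
print(should_open_pr(baseline, RunResult(10, 10, [70.0, 71.0, 69.0])))  # True
print(should_open_pr(baseline, RunResult(9, 10, [50.0])))               # False
```

The same gate is what makes it tolerable for the generated file to be unreadable: you are trusting the contract and the checks, much like generated protobuf code.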
I like to build
Translating those into executable tests is a different process entirely, and one that in the past has been done (often quite poorly) by humans, and it's actually pretty well suited for AI.
Probably more boring still, though.