For example, actually doing a walkthrough of how to set up these allegedly super powered workflows and concrete demonstrations.
I’m not an AI skeptic. Rather, I don’t want to miss out on any actual super powers.
And if I am working on an existing codebase then isn't a good commit often a negative sum between added and removed lines? I don't want to bloat my codebase but make it more polished and elegant. After reading that I wonder if what they have done could have been accomplished with a far smaller LoC budget.
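To make the "negative sum" idea concrete, here is a minimal sketch that totals added minus removed lines from text in the tab-separated format that `git diff --numstat` prints (added, deleted, path, with `-` for binary files). The function name and the sample rows are made up for illustration.

```python
def net_line_change(numstat_text: str) -> int:
    """Sum (added - deleted) over numstat-style rows, skipping binary files."""
    net = 0
    for row in numstat_text.strip().splitlines():
        added, deleted, _path = row.split("\t", 2)
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        net += int(added) - int(deleted)
    return net


# Hypothetical sample: one cleanup-heavy file, one small addition, one binary.
sample = "12\t40\tsrc/app.py\n3\t0\tREADME.md\n-\t-\tlogo.png\n"
print(net_line_change(sample))  # -25
```

A negative result means the change removed more lines than it added, which is the kind of commit described above.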
It's like the difference between doing stock price predictions with binary "up" or "down" histories and trying to figure out how to normalize actual price histories (basically impossible). The binary work gives a well-defined signal.
That's an insane level of throughput. What's a good baseline? Prior to agentic coding, what's the typical number of PRs engineers were expected to push? Maybe 2-10?
Do people feel the software has gotten better in the last 6 months? The number of engs is prob the same so we should expect maybe a 5x faster cycle in major software apps, but I don't see it. The AI apps do change very fast but given it's a very new field, I'd expect as much. But outside of that, I don't see it.
It's interesting this was submitted to HN over 15 times since it was published in February: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
But this is the only submission that's had any traction. Since the content is nearly the same for all submissions, it highlights how getting to the front page can be a bit random. (Though this is the only one that capitalized 'Leveraged' so maybe that's the secret)
Another tip is to condense the doc files into the minimal required. Sometimes I’ll end up with 5 to 6 floating around in various states of staleness. Condensing to 2-3 and removing completed tasks seems to help a lot
- Give Claude/Codex a way to verify its own work (browser, smoke tests, e2e tests, high-fidelity local environment)
- Keep all context (issue tracking, docs, ideas, plans, worklogs) in-repo (https://github.com/shepherdjerred/monorepo/tree/main/package...)
- Give Claude/Codex access to observability (Grafana, Prometheus, Tempo, PagerDuty)
- Have Claude/Codex follow good engineering guidelines like fail-fast, type safety, parse at boundaries
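As a rough illustration of the last bullet (fail-fast, type safety, parse at boundaries), here is a minimal sketch. The `Alert` type, its fields, and the 1 to 5 severity range are assumptions invented for the example, not something taken from the linked repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical typed value the rest of the code can trust."""
    name: str
    severity: int


def parse_alert(raw: dict) -> Alert:
    """Validate untrusted input once, at the boundary, and fail fast on bad data."""
    name = raw.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("alert 'name' must be a non-empty string")
    severity = raw.get("severity")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(severity, bool) or not isinstance(severity, int):
        raise ValueError("alert 'severity' must be an integer")
    if not 1 <= severity <= 5:
        raise ValueError("alert 'severity' must be between 1 and 5")
    return Alert(name=name, severity=severity)


print(parse_alert({"name": "disk-full", "severity": 3}))
```

Once input has passed through `parse_alert`, downstream code takes an `Alert` and never re-checks the raw dict, which gives an agent (or a human) one obvious place to look when bad data shows up.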
I haven't yet been able to achieve full autonomy due to cost and CI load on my homelab.
I've found this to be really helpful, e.g. "you did this last week, and now some other thing is happening" or "you tried this approach before to solve alert X but it didn't work" -- except it can discover this itself.
https://github.com/shepherdjerred/monorepo/tree/main/package...
I've also used it to store TODOs and plans. For example I might want to explore some idea and defer it for later, or some weekend have it execute on some tech debt I've put off. One last use case is asking "what did I work on in the last 2-3 weeks, is it healthy, and what additional quality checks can/should I do; is there any follow-up work?"
Many times those updates are not properly tested. For example, in one update the model selector got completely changed, then the next hotfix restored the original.
A. The code is absolute garbage and is speed for speed's sake B. They’re using an internal model that is a generation beyond GPT 5.5
I say this because we’ve attempted to do something similar using the latest gen Claude models and a significantly larger team. The code is probably along the lines of millions of LoC but is an absolute mess because of vibing. There’s a price you pay for speed.
Additionally it’s an internal tool, which is likely much more amenable to slop.
Just like .vimrc and .zshrc, the harness "code" itself can be easy and personal, provided that it's built on working, existing constructs such as tmux.
Forcing readers to wade through an unceasing string of LLM clichés demonstrates the opposite of the point you’re trying to make—that the consumers of your work are worse off because you exercised no human judgment in creating it.
This is such a common thing among software engineers nowadays that I was very surprised that OpenAI would open with that line as if it were mind blowing.
But then I saw it was published in February and OP is just reposting it to farm karma.
If you're a more senior person in tech, this post is effectively saying that a large portion of your skillset is about to become completely worthless. This goes beyond the skills involved in writing the code. Everything that you've learned over years about how to determine whether code is good or bad, and what practices make an engineering team effective is not just obsolete, it's fundamentally counter-productive because it assumes a slow, human-centric process that requires you to actually review and understand the code. Even your ability to mentor junior engineers is now obsolete, because all that experience you've built up is now worthless to them.
If this is the approach the industry takes, particularly when combined with a lack of interest in quality from the business (and let's face it, consumers have shown us that they're happy to pay for cheap crap), it's hard to see much of a future for software engineers. You don't need thousands of people with deep technical expertise, you need a handful of manager-types, who will focus on defining product and business requirements and configuring how the AI gets enough context to implement the requirements.
Maybe, if we're extremely lucky, there's so much demand for software that total employment doesn't fall off a cliff, but the nature of the work will change so much that many older, more expensive engineers will become unemployable. Those who remain will have to accept that the skills they spent decades developing are now worthless, that younger engineers no longer respect or listen to them, that the business no longer sees them as experts worthy of respect, but as old fogies who grew up in a different world.
Joe Biden liked to say that a job is more than just a paycheck, it's part of your identity and your sense of self-worth. We're all very used to a certain level of respect (and commensurate remuneration). If you don't think that's true, compare how a software engineer is treated to how a warehouse worker is treated. What happens when we lose that?
You end up with about 3 lines added per commit, which is not ridiculous when you consider that most would be edits rather than full additions.
Here, we have 1500 PRs and 1M LOC, which is about 650 added LOC per PR. Remember, not 650 lines total in the PR, but +650 balance after additions-removals.
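The division behind that per-PR figure is easy to check. This is a small sketch using only the two numbers quoted above; the function name is made up, and the exact quotient comes out nearer 667, in the same ballpark as the rounded figure in the comment.

```python
def net_loc_per_pr(net_loc: int, pr_count: int) -> float:
    """Average net lines (additions minus removals) per pull request."""
    if pr_count <= 0:
        raise ValueError("pr_count must be positive")
    return net_loc / pr_count


# Figures quoted in the comment: 1M net LOC across 1500 PRs.
print(round(net_loc_per_pr(1_000_000, 1_500)))  # 667
```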
Fun questions for attentive readers:
- What does a project growing at a rate of one full firefox-codebase worth of LOC per year look like, a decade down the line?
- What does the line count say about the verbosity of the tool, and what does it say about outcomes that the purpose of the project isn't clearly disclosed?
- Do we have reasons to care about LOC in a world where we don't write code manually? What happens to token usage numbers when the codebase is significantly larger?
- If it was confirmed that LLM usage blows up your line count, what's the implication for codebases that want to return to manual coding after months of usage? (Say, because the tool gets expensive).
For some reason most of the uses of "agents" are to build yet other AI products, it's turtles all the way down. Maybe that says more about the field of harnesses than it does about the power of "agents".
There is of course another sense in which the output quality is the only thing that matters. “Can I use agents to build a 1m line codebase that I want to maintain going forward?”
I take this as being exclusively a tech demo of the former. Quality (feature velocity, bugs, scalability) is not demonstrated.
Agents help a ton with the discovery, but the act of building a product needs a deeper level of thought and validation to make it actually better than what came before. So IMO what you see is people still learning what needs to be understood and crafted first hand to make a product better (including economics)
We’ll get there if more of us try
If you think about it, successful products rely on designing well-thought-out experiences and on customer discovery (see all the Forward-Deployed Engineer job listings at OpenAI), so the code velocity becomes somewhat irrelevant.
If you’re solving the right problem and you’ve got a good team then competitive advantage comes from somewhere OUTSIDE of code velocity.
The more important question I think is does faster code yield more value long-term? At the moment, it’s like yeah we do 3.5 pull requests per day.
I’m thinking, great, good for you. You could also combine three pull requests into one and then you’re doing 1 per day. This is quantitative data that doesn’t really mean anything tangible.
To what end and what would that even look like though? Enshittifying everything at maximum speed? The apps/platforms I use regularly - GitHub, Spotify, Google maps (just to name a few), have gotten noticeably shittier in recent times.
What if AI lets you create new versions of those tools, but without the enshittification?
I say that being in the "soaking" stage of using AI to rebuild a shitty software project in 70KLOC over about 2 weeks of spare time, so this may not be as theoretical as you might think.
This almost reeks of "I've never cleaned up our code base because there is too much code, and didn't even bother having agents/LLMs clean it up".
You almost never need a million lines of code - this includes your software, infra, testing and operational tools. You didn't ship the linux kernel in 3 weeks and you know it. The code is already spaghetti, and it achieves the basic functions OK, but it will get harder and harder to simplify, untangle and maintain.
I'm not convinced of that.
I watched a video of an architect using AI to create architectural drawings. It became very clear to me that he has a lot of skills and terminology that helped him produce something very specific, in a few minutes. I've been working on some home improvement stuff including a studio/shed and I've struggled to produce even something simple (currently trying to get a conversation packet on the roof trusses to take to the permit department to get started). Even with my high school architecture class.
After watching that I wonder how much of what I'm doing with AI that looks easy is because I have deep technical knowledge, plus 3 years of heavy work with AI.
It's just that creating great software isn't really the SV/VC/big tech business model or main goal.
I'm not sure I fully understand what you're saying here. Isn't the value of these tools almost entirely independent of their actual software? That is, we have many good open source, self-hostable forges (Forgejo, sr.ht, etc.), lots of great music player software (Jellyfin, Symphonium, etc.), and decent maps software (OsmAnd and Organic Maps). People use GitHub, Spotify, and Google Maps -- perhaps even _put up_ with their often bad/glitchy software -- because of network effects (all three) and content/licensing partnerships (Spotify/GMaps). That proprietary data isn't something AI can help you with, right?