Vibe coding and agentic engineering are getting closer than I'd like

Vibe coding and agentic engineering are getting closer than I'd like (simonwillison.net)

787 points by e12e 11 days ago | 885 comments

u8 11 days ago |

The disconnect for AI is that it is a jagged frontier and it only really shines when one of its jagged frontiers extends counter to one of your valleys.

If you've been writing Perl for 30 years, you might not want to learn JavaScript just to make a little fun idea in your head to show your wife. Vibe code that shit man. Who cares? Your wife does not care about LOC or those internal design decisions you made.

If you're trying to learn something new like an algorithm, protocol, or API write that shit by hand. You learn by doing, and when you know how the thing works and have that mental context, you will always be faster than an AI. Also, when did we stop liking to learn? Why is it a bad thing to know all the ins and outs of a programming language? To write and make all the decisions yourself? That shit is fun. I don't care if you disagree.

If you're at work and they really care about getting something out of the door, do whatever you think is best. If you just wanna ship vibed code and review PRs all day, all the power to you. If you wanna write it by hand, and use AI like a scalpel to write up boilerplate, review code, do PR audits, etc... go for it!

A hammer is a really great tool that has thousands of purpose-designed uses. I still prefer my key to get into my car. It's all tools, you are a person.

A lot of this stuff is coming top-down from people who do not have the experience you do. Wouldn't a smart employee use their expertise to advise the organization? If you work at a company where that would not be okay, maybe it's time to start looking for another firm.

latexr 11 days ago | |

> Also, when did we stop liking to learn?

I suspect it happened when we achieved a level of such constant stimulation (there is a pocket computer always on us with infinite effortless distraction) that we’re never bored and never engage the default mode network.

https://en.wikipedia.org/wiki/Default_mode_network

https://www.youtube.com/watch?v=orQKfIXMiA8

When you’re bored, your mind goes to places it wouldn’t otherwise go. Curiosity kicks in. Curiosity is a precursor to learning. Learning engages the brain and is fun. But it’s not fun all the time, some of it is challenging and frustrating (which is good, that’s the process that teaches you).

When you have the digital equivalent to infinite candy and the brain equivalent to a sweet tooth, it’s hard to resist the siren’s call. The consequence is the brain equivalent to a stomachache—depression and loss of meaning—but unfortunately it doesn’t hit you the same way so you don’t make the immediate connection to make yourself stop. When you think about it, it’s ridiculous from several angles: the candy is infinite, it’s never going to run out, so you don’t need to gorge! But then we justify ourselves as only a true addict would, that while the candy is infinite, the flavours are limited editions and always rotating, and what if I miss that really good one everyone is on?! Then you miss it, is the answer. No one will be talking about it in fifteen minutes anyway.

gchamonlive 10 days ago | | |

> it happened when we achieved a level of such constant stimulation (...) that we’re never bored and never engage the default mode network.

I don't know... I don't disagree, but I think this has been repeated so much that I believe everyone, at least everyone that is actively participating in HN discussions is aware of this.

So if we are aware of this and we consciously choose to keep engaging in dopaminergic activities, without having some time to be bored, I think it starts to become a choice. We can blame tech for starting this trend of stealing our attention, but once we become aware of this, we can only blame ourselves for perpetuating it.

theshrike79 10 days ago | | |

> When you’re bored, your mind goes to places it wouldn’t otherwise go. Curiosity kicks in. Curiosity is a precursor to learning. Learning engages the brain and is fun. But it’s not fun all the time, some of it is challenging and frustrating (which is good, that’s the process that teaches you).

And I love how I can go from a curious brainfart "hmm, could I do a movie catalogue app that uses a web page + phone camera + OpenAI API to identify physical DVDs by front/back cover instead of trying to find a reliable barcode database" to it actually working in maybe two hours of real time. Just paused the movie I was watching, typed the idea to Claude Code on mobile and kept watching.

After the movie went back to my computer, merged the changes and tested whether it worked. It mostly did. The UI/UX was horrible etc, but the basic idea was functional. It even got some of the movie extras correctly.

I didn't try to turn it into a product, didn't buy a domain for it or advertise it on Reddit or Show HN. But now I know it CAN be done. Curiosity sated.

AStrangeMorrow 11 days ago | | |

I still love learning, especially outside of tech. Been working in the ML field for over 8 years, and while I went into it because I liked the field, I did lose some interest in learning things, but mostly because of the sheer volume of publication and the rate of change. Learning stopped being something I enjoyed doing and went to something I had to do to keep up. And it just stopped having the same flavor.

madduci 10 days ago | | |

We also stopped learning when someone had the idea to put unrealistic deadlines on projects, and tackling tech debt became the activity management most hates and denies.

ditchfieldcaleb 11 days ago | |

I agree with you on everything you said here except:

> when you know how the thing works and have that mental context, you will always be faster than an AI

That's just plain false, honestly. No one can type at the speed AI can code, even factoring in the time you need to spend to properly write out the spec & design rules the AI needs to follow when implementing your app/feature/whatever. And that gap will only increase as LLMs get more intelligent.

notnullorvoid 11 days ago | | |

Some of us do actually have intimate knowledge in certain areas where guidance of an AI takes longer than doing it yourself. It's not about typing speed, it's that when you know something really really well the solution/code is already known to you or the very act of thinking about the problem makes the solution known to you in full. When that happens it's less text to write that solution than it is to write a sufficient description of the solution to AI (not even counting the back and forth required of reviewing the AI output and correcting it).

Turskarama 11 days ago | | |

In my experience AI can write _something_ from scratch, but often edge cases won't be handled until I go through and read the results or test it. Usually when I'm writing by hand I will naturally find the majority of edge cases as I go. By the time I've read through the results and fixed said edge cases, I usually would have been faster just doing it myself.

utopiah 11 days ago | | |

> No one can type at the speed AI can code

Don't we already have a weekly post nowadays explaining, again, that typing isn't the bottleneck?

JSR_FDED 11 days ago | | |

It should be “…you will always be faster than someone _without the knowledge_ using an AI”

charcircuit 11 days ago | | |

>No one can type at the speed AI can code

You can definitely be faster than frontier models. The number of tokens per second is not that high and they require a lot of tokens for thinking and navigating things.

leostarship 11 days ago | | |

as i understood it he's referring to the overall time it takes to build a complete finished piece of software, accounting for the refactoring and bug fixes and all that. because if you hadn't understood the tools you're using, you would be running into roadblocks, and that adds up

gaanbal 11 days ago | | |

if you've never had the experience of handing something off to someone else being more laborious and slower than doing it yourself due to having to set constraints and define success, then you simply haven't held a senior enough position to comment on this with any authority

jmull 11 days ago | | |

They probably mean faster to a higher-level goal rather than SLOC. Typing speed and SLOC have never been that useful for measuring productivity.

TeriyakiBomb 11 days ago | | |

Plenty of cars can get off the line faster than an F1 car. But around a track, an F1 is by far the fastest in the world.

Going fast isn’t the difficult bit.

draxil 11 days ago | | |

Except it's often faster to make the change yourself than explain it to an AI.

erfgh 11 days ago | | |

Where does this certainty that LLMs will get more intelligent stem from?

stephenr 11 days ago | | |

> LLMs get more intelligent

The Spicy Autocomplete koolaid club is out in force today I see.

We clearly have different ideas of what the word "intelligent" means.

allthetime 11 days ago | |

AI is just revealing the two types of people in this line of work. Those who don’t actually like software and just do it because it’s lucrative, and the actual nerds who care.

smugglerFlynn 11 days ago | | |

You are probably talking about people who just crunch out some half baked solutions for the sake of getting somewhere.

But there are other nerds who care, just not about the code quality, but about conversion, testing out business ideas quickly, getting to know their customers better.

There are nerds who care about business strategy.

There are nerds who care about accounting principles and clean financial reporting.

There are nerds who care about sales targets and partnerships.

There are many types of nerds out there. Don’t limit nerds to engineers, because “tech” world is not just an engineering world anymore. All these nerds you can team up with to build meaningful things, because they do care.

eli 11 days ago | | |

A much more charitable framing: people who enjoy the process vs people who enjoy the result.

(Though, granted, the results are a lot better if you craft it by hand)

tyyyy3 11 days ago | | |

Can we build a list of the actual nerds who care? Need it for my future recruitment needs lol.

techpression 11 days ago | | |

It goes for all professions really, people who do it for work and people who care. Apply to any profession, plumbers, doctors, carpenters, cleaners, etc etc. Most of us have experienced both types and I haven’t heard of anyone preferring the ”do it for work” over the ones who care. And like those other professions, in software we accept the worse of the two because finding people who care is both time consuming and often much more expensive.

Daishiman 11 days ago | | |

I care a lot about software and I use LLMs extensively. There are some things I deeply understand yet I don't care for doing anymore because I've done them for years and there's nothing to be gained from doing them manually.

XenophileJKO 11 days ago | | |

This is such a naive take. Most of the nerdiest and most "quality" oriented engineers are hard leaning in to agentic coding. I feel like the most impressive engineers I know have always leaned in to learning how to "sharpen the axe" and AI is really the biggest axe we have seen.

okdood64 11 days ago | | |

I take software engineering and production reliability very seriously. But coding is just a small part of my job. It's not really the meat and potatoes. I'll vibe code (responsibly) where I can.

hgoel 10 days ago | | |

Your category of "nerds who care" is actually "nerds who only want to be coders" and not "nerds who care about solving problems".

enraged_camel 11 days ago | | |

I care about solving problems for and delivering value to my users. The software is simply a means to that end. It needs to work well, but that does not mean every line of code requires an artisanal touch and high attention to detail.

michaelcampbell 10 days ago | | |

I think there's a continuum here, too. I've heard it said, in jest, mind, that LLMs square the dev. It turns a 1.5x dev into a 2.25x dev, but it also turns a 0.75x dev into a ~0.56x dev.

I think the exponent of 2 is probably too high, but it's not a bad approximation of a very messy reality.
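The arithmetic behind the quip, as a tiny sketch. The function name and the adjustable `exponent` parameter are assumptions added here so a softer exponent can be tried; the original joke only states the squared case.

```python
def amplified(skill: float, exponent: float = 2.0) -> float:
    """Return the 'amplified' multiplier for a dev with the given baseline.

    Hypothetical model of the joke: output = skill ** exponent.
    Values above 1.0 grow, values below 1.0 shrink, and 1.0 stays put.
    """
    return skill ** exponent


strong = amplified(1.5)    # 2.25
weak = amplified(0.75)     # 0.5625, i.e. roughly 0.56
softer = amplified(1.5, exponent=1.5)  # a smaller exponent gives a smaller boost
```

Any exponent above 1 keeps the same shape: people above the 1.0 baseline pull ahead and people below it fall further behind.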

There is also the division of people who value the thing being produced vs. valuing the actual production of that thing, whether or not it's used. I don't see one side here being "right", necessarily, but when a company is behind it one is certainly more valued, and I think not incorrectly.

munksbeer 10 days ago | | |

There are more types of people. I do it because it is lucrative, because it turns out I'm good at being a professional software engineer, but I also enjoy it more than other things I could be doing.

However, ultimately, I got into software because I was intellectually curious and programming was a tool I could use to explore that curiosity. When I stop working professionally, I will stop caring about the sorts of stuff I care about today and go back to using programming for what I love. A tool to explore.

avgDev 10 days ago | | |

I am a nerd who cared. Caring is not putting my food on the table though, delivering stuff is.

I still enjoy diving into documentation but AI has transformed how I work. I can quickly get code examples I can debug. I learn new things as sometimes AI generates approaches I haven't used before.

stephenr 11 days ago | | |

I've posited for a while now that the people who find spicy autocomplete to be exciting are the people who can't really do what it does.

I played with Image Playground last year some time. It was really fun. You know why? I can't draw, and I can't paint, to save my life. It's letting me do something I can't do well/at all on my own.

Using an LLM to do something I can do, with the caveat that it's pretty mediocre at the task, and needs to be constantly monitored to check it isn't doing stupid things? If I wanted that I'd just get an intern and watch them copy crappy examples from StackOverflow all day.

The same logic explains the use of LLMs to write emails/other long form text.

It makes accessible something that people otherwise cannot do well. Go look at submissions on community writing sites. The people who write because they're good at it, are adamant they don't use an LLM.

People use LLMs to do things they're otherwise not able to do. I will die on this hill.

cableshaft 10 days ago | | |

I've been a software nerd all my life (and there was a time where I worked 60 hours a week at a startup working hard to make mobile games), but there's just been so much extra crap associated with it (especially web development, and especially corporate web development, what currently pays my bills) over the years that it's worn me down and I'm happy to let A.I. churn through the hard or frustrating or endless amounts of boilerplate bits, and let me focus on other things.

Part of me still wishes we were making websites with just HTML, CSS, PHP, and a little Javascript here and there (before AJAX). I'm still not convinced all this extra SPA functionality is really needed for most corporate website needs (something like Google maps or real-time chatting, sure, other things not so much), but I do it because they insist.

I also really like game design, and I had a fairly simple game idea that I prototyped a physical version of and playtested a few times and thought, 'yeah, this is pretty fun'.

But I don't have the energy to code it in my spare time anymore. Was curious how close to a working MVP it could get with me writing up a specification yesterday with the help of ChatGPT (after I brainstormed a few aspects of the design), and dumped that spec into a new repo on GitHub, and about 20 minutes later, it had a fully functional game that worked exactly like my physical prototype.

It was still missing other features, like tutorials and stats and sharing abilities and the like, and I'd like to adjust the presentation some, and the computer opponent A.I. was a bit weak and could have been stronger, but it was fully functional and even looked pretty good, kind of like a Wordle presentation, which was what I was going for anyway.

Something that would have taken me probably 40 hours of dedicated work at least to get everything working and looking as nice as it did.

So yeah, it's kind of like 'well what's the point of me manually coding this anymore'.

What I really liked about software was solving puzzles, but now I can focus on the more interesting puzzle of what makes a good game design and 'how best to present this to players' instead of how to get five different libraries and/or APIs to play nice together and learn how it all works.

If coding hadn't become some labyrinthine monstrosity and instead got out of your way, I probably would want to keep coding more.

Some languages/frameworks get close to that, Lua/Love2D is pretty smooth except when it gets to you wanting to distribute it on platforms other than PC/Mac/Linux, or integrate with external libraries, or for me work with shaders since I'm still pretty weak with shaders.

But even then, it was hard to deny how much faster A.I. could code a feature and I've started getting more hands-off there as well.

That being said, work has gotten less fulfilling, since I'm not doing any actual design work really, just implementing features and making them look according to Figma specifications or fixing bugs, and it's even less so without the busywork of solving coding puzzles (now it's 'how to say this to the A.I. to get it to fix this right', which is still a puzzle but a much weaker one). I'm starting to get tempted to make a go of starting my own business so I can have more autonomy again.

pdntspa 11 days ago | | |

Why exactly does "actual nerds who care" stipulate writing code?

jesterson 11 days ago | |

> Why is it a bad thing to know all the ins and outs of a programming language? To write and make all the decisions yourself? That shit is fun.

It's not just fun (i agree it is), but it is also essential for creation.

What we have done with the 'AI' is to create a lot of ignorant morons who think they can create a lot of things without knowledge. This is not gonna end well.

zx8080 11 days ago | | |

> they can create a lot of things without knowledge. This is not gonna end well.

Who said "managers"

Finbel 11 days ago | |

>Also, when did we stop liking to learn? Why is it a bad thing to know all the ins and outs of a programming language?

I do not know the ins and outs of the assembly layer my high-level code ends up as. It's not because I don't like to learn, it's because I genuinely don't need to. At a certain level of AI performance, how will this be any different?

0xpgm 11 days ago | | |

However, curious programmers who develop in high level languages will dabble with assembly maybe for fun, and will be much better off for it than those who treat parts of the stack like a black box never to be opened.

californical 11 days ago | | |

Because you may not know the specifics of the assembly being generated, but you’ve likely learned a language built on top of assembly. And the compilers do some great tricks behind the scenes to generate efficient assembly, but those tricks are specifically coupled to semantics of the source language.

An LLM is not coupled to anything and can generate output that simply does not relate to the input. This doesn’t happen with compilers, and if it does, then it’s a specific bug to be addressed. An LLM can never guarantee certain output based on the input.

If I write x < 100, I know exactly how the compiler will treat that code every single time, and I know what < means and how it differs from <=

If I tell an LLM that “I want numbers up to 100.” Will that give me < or <= and will it be consistent every single time, even the ten thousandth program that I write?

The language is ambiguous where the code is specific
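A small sketch of that ambiguity, in Python. Both functions are reasonable readings of "numbers up to 100", and they differ by exactly one element; the function names are made up for illustration.

```python
def up_to_exclusive(limit: int) -> list:
    """Reading 1: x < limit, so the limit itself is left out."""
    return list(range(limit))


def up_to_inclusive(limit: int) -> list:
    """Reading 2: x <= limit, so the limit itself is included."""
    return list(range(limit + 1))


exclusive = up_to_exclusive(100)  # ends at 99
inclusive = up_to_inclusive(100)  # ends at 100
```

The code pins down one reading every time it runs; the English sentence does not say which one was meant.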

sdevonoes 11 days ago | | |

One difference is: to use a top notch compiler/assembler you don’t need to pay. They are open source and have a lot of support. To use the latest and greatest models (bc no one around likes to use non sota ones) you need to pay a premium price.

Multibillion-dollar companies are now the gateway for every line of code you need to write. That’s dystopian. It sucks.

jtr1 11 days ago | |

I have been building an iOS app that I had kicking around in my head for years but never had time to build. I have been a frontend UX engineer for the better part of a decade and went through a handful of tutorials on Swift. The project definitely sits in this uncanny valley for me. I have test suites for every aspect of the app and have the agent using TDD to avoid cheating - this has gotten me pretty far without having to look too close at the output other than general structure. As I'm reaching a more mature stage of the project though, I'm finding that I want to tweak a lot by hand in the code to get the details right without burning tokens.

throwaway219450 11 days ago | | |

The agents always do the best work IMO if you already know exactly what you want, but are too lazy to implement it. I like having the agent mock up a working solution before reimplementing it.

To split the difference, I now try to hand code as much as I can from the beginning, leave TODO comments for the agent to mop up and I'll ask it to complete the issue with reference to the current diff. It reduces the surface for agents to make stupid assumptions. If I can get it done fast on my own, win for me, if the agent finds issues or there's logic that needs checking, also a win. This way you stay sharp, but you have access to an oracle if you get stuck and it costs you fewer tokens.
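One way the "leave TODO comments for the agent" step could be mechanised, as a rough sketch. It assumes Python-style `#` comments and a hypothetical helper name `find_todos`; the workflow described above does not prescribe any particular tooling.

```python
import re

# Matches "# TODO", optionally followed by a colon, then the note itself.
TODO_PATTERN = re.compile(r"#\s*TODO\b[:\s]*(.*)")


def find_todos(source: str) -> list:
    """Return (line_number, note) pairs for every TODO comment in source."""
    todos = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match:
            todos.append((lineno, match.group(1).strip()))
    return todos


# Hand-written code with the gaps marked for the agent (illustrative only).
EXAMPLE = '''def mean(values):
    # TODO: handle empty input
    total = sum(values)
    return total / len(values)  # TODO validate range
'''

todo_list = find_todos(EXAMPLE)
```

The resulting list can be pasted into the request alongside the current diff, so the agent is asked to close specific, named gaps instead of guessing at intent.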

miki123211 10 days ago | |

In my view, AI is worst at crossing the rubicon from a 200-line script to a maintainable architecture of ~10kloc.

If you already have a decent architecture, adding a new feature is usually fine. If you have nothing and need it to write a 200-line script, that's usually fine. If you need it to figure out a maintainable architecture that will be easy to extend in the future... that's where the problems start.

esafak 10 days ago | | |

You need to be involved in the architecture.

kreneskyp 10 days ago | |

> If you're trying to learn something new like an algorithm, protocol, or API write that shit by hand. You learn by doing, and when you know how the thing works and have that mental context, you will always be faster than an AI. Also, when did we stop liking to learn?

I vibe engineer to learn. I am currently doing this with a project to build a Vector DB extension in postgres. Several aspects of this project are very new to me. I don't write any of the code. I have never written a single line of Rust. I do, however, spend a significant amount of time discussing architecture and design with the agents.

I started with well known algorithms (HNSW, IVF, DiskANN, TurboQuant, RabitQ, PQFastScan) and have since moved on to a novel implementation based on fairly recent research papers.

My primary goal is to learn. That is a success and ongoing. A stretch goal is to contribute novel ideas back to the community, which may be useful even if what I build isn't ever production ready.

IanCal 11 days ago | |

Fundamentally you need to start with "what am I trying to do?" and "given that goal, where is my time best spent?".

I made a checklist for my kids to stamp off items after they get back from school (sort bag, get changed, etc). I had two goals: 1) I was trying to solve a problem at home and would have pip installed a library that just straight up did this already, and 2) I wanted to check out what the Claude website's output was like at the time. My time was best spent poking at Claude a bit but mostly playing with my kids - so vibe coding it was.

Client test speedup issues: I'm trying to speed up tests for a client and spend as little time as possible doing so. I vibe coded some analysis and visualisation tools, mostly AI-written but with some review, guided multiple prototypes for timing, and let it just fix whatever. The actual solutions got more dedicated review.

Learning a new thing - goal is to learn that thing. AI there is good for doing a lot of the work around that. Maybe I'm focussing on, say, Z3. AI there can help with debugging, finding docs, setting up an environment and leave me to do the central part.

rufasterisco 11 days ago | |

Let’s see if someone can point me towards some resources on the following.

The problem is mixing vibe-coding and agentic-eng, and switching the brain between 2 different modes (fast-feedback gratification vs deep-focus gratification).

There’s no clear cut rule on what works. Different people, different brains, and especially amongst devs some optimized low-key neurodivergence.

And then there’s waiting mode, those N seconds/minutes that agents take to think and write.

What’s the right mix? Keep a main focused project and … what do you do in the meantime? Vibe code something else? Hn? Social media? Draw lines on a paper sheet? Wood carving? Exercise? Rewatch some old tv series?

I have experimented….

There are side activities that help you go back to the task at hand in the correct mental framework for it. Not just for productivity, but for efficiency and enhancing critical thinking on the main task. Or whatever you choose to optimize for. Can anyone point me towards some people talking about this?

dirtbag__dad 10 days ago | |

> Also, when did we stop liking to learn?

Says who? One of the most enriching things about coding with agents is I have them provide new information, tools, patterns, whatever as a follow up to every feature I work on. I’m learning a ton and it’s helping me build better with agents, too.

Xeronate 11 days ago | |

When I started spending 40-60 hours a week programming and wanted to spend my remaining time doing other things.

hirvi74 11 days ago | | |

I imagine my future will involve spending 40–60 hours a week using LLMs to do the work of multiple roles instead of just one, while wishing I could spend my remaining time doing other things.

maxsilver 10 days ago | |

> Also, when did we stop liking to learn?

When the economy got so bad for so many people, that every waking moment has to be either chasing fresh cash (or spent in recovery from cash-chasing, worrying about new cash), to the point they have to largely ignore their own long term goals or basic morals or principles.

You can blame all the new gadgets (phones/social media/tiktok/‘dopamine-things’), but it’s very much blaming the symptom, not the problem.

(It’s the meme. “Guys, this isn’t funny. Humans only do this when they’re very distressed”)

imrozim 11 days ago | |

100% agreed. I learn coding by building stuff and breaking it. When you let AI do everything you skip that pain and also skip the understanding.

dpoloncsak 10 days ago | |

Just here to say I love the line 'A hammer is a really great tool that has thousands of purpose-designed uses. I still prefer my key to get into my car.'

Been saying the 'Hammer is a great tool but you need to know when to use it, just like AI.' to coworkers, and i'm ̶s̶t̶e̶a̶l̶i̶n̶g̶ borrowing your quote instead, now

etothet 11 days ago |

Vibe Coding (and LLMs) did not create undisciplined engineering organizations or engineers. They exposed and accelerated them.

Plenty of engineers have loose (or no!) standards and practices over how they write code. Similarly, plenty of engineering teams have weak and loose standards over how code gets pushed to production. This concept isn't new, it's just a lot easier for individuals and teams who have never really adhered to any sort of standards in their SDLC to produce a lot more code and flesh out ideas.

zarzavat 11 days ago |

Perhaps I've missed a few weeks worth of progress, but I don't think that AIs have become more trustworthy, the errors are just more subtle.

If the code doesn't compile, that's easy to spot. If the code compiles but doesn't work, that's still somewhat easy to spot.

If the code compiles and works, but it does the wrong thing in some edge case, or has a security vulnerability, or introduces tech debt or dubious architectural decisions, that's harder to spot but doesn't reduce the review burden whatsoever.

If anything, "truthy" code is more mentally taxing to review than just obviously bad code.

jwpapi 11 days ago |

> I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it’s just going to do it right. It’s not going to mess that up. You have it add automated tests, you have it add documentation, you know it’s going to be good.

I feel like this is just not true. A JSON API endpoint also needs several decisions made.

- How should the endpoint be named

- What options do I offer

- How are the properties named

- How do I verify the response

- How do I handle errors

- What parts are common in the codebase and should be re-used.

- How will it potentially be changed in the future.

- How is the query running, is the query optimized.

…

If I know the answer to all these questions, wiring it together takes me LESS time than passing it to Claude Code.

If I don’t know the answer the fastest way to find the answer is to start writing the code.

Additionally, whilst writing it I usually realize additional edge cases, optimizations, better logging, observability and what else.

The author clearly stated the context for this quote is production code.

I don’t see any benefits in passing it to Claude Code. It’s not that I need 1000s of JSON API endpoints.
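To illustrate how many of the decisions listed above surface even in a tiny endpoint, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption made up for the example: the `movies` table, the `list_movies` handler name, the `limit` option, its cap of 100, and the `items`/`count`/`error` property names. It returns a status code and a JSON body instead of binding to any particular web framework.

```python
import json
import sqlite3

MAX_LIMIT = 100  # assumed cap; "what options do I offer" is a decision


def list_movies(conn, params):
    """Hypothetical handler: run one SQL query, return (status, json_body)."""
    # Decision: how to handle errors. Here bad input becomes a 400 with a message.
    try:
        limit = int(params.get("limit", 20))
    except (TypeError, ValueError):
        return 400, json.dumps({"error": "limit must be an integer"})
    if not 1 <= limit <= MAX_LIMIT:
        return 400, json.dumps({"error": f"limit must be between 1 and {MAX_LIMIT}"})

    # Decision: how the query runs. Parameterised, ordered, and bounded.
    rows = conn.execute(
        "SELECT id, title FROM movies ORDER BY id LIMIT ?", (limit,)
    ).fetchall()

    # Decision: how the properties are named.
    body = {"items": [{"id": r[0], "title": r[1]} for r in rows], "count": len(rows)}
    return 200, json.dumps(body)


def make_demo_db():
    """Build an in-memory database with three illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO movies (title) VALUES (?)",
        [("Alien",), ("Brazil",), ("Clue",)],
    )
    return conn
```

Each commented line is a place where a different, equally defensible choice exists, which is the point being made: the wiring is short, and the choices are the work.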

devin 11 days ago |

> If you can go from producing 200 lines of code a day to 2,000 lines of code a day, what else breaks? The entire software development lifecycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code. And now it doesn’t.

It is so embarrassing that LOC is being used as a metric for engineering output.

dataviz1000 11 days ago |

Have you noticed that the coding agents get really close to the solution on the first one shot and then require tons of work to get that last 10% or 5%?

If we shift the paradigm of how we approach a coding problem, the coding agents can close that gap. Ten years ago every 10 or 15 minutes I would stop coding and start refactoring, testing, and analyzing making sure everything is perfect before proceeding because a bug will corrupt any downstream code. The coding agents don't and can't do this. They keep that bug or malformed architecture as they continue.

The instinct is to get the coding agents to stop at these points. However, that is impossible for several reasons. Instead, because it is very cheap, we should find the first place the agent made a mistake and update the prompt. Instead of fixing it, delete all the code (because it is very cheap), and run from the top. Continue this iteration process until the prompt yields the perfect code.

Ah, but you say, that is a lot of work done by a human! That is the whole point. The humans are still needed. The process using the tool like this yields 10x speed at writing code.
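The loop described above can be sketched roughly as follows. The `generate` and `first_mistake` callables are placeholders for "run the coding agent" and "a human finds the first error"; `fake_agent` and `fake_review` are toy stand-ins invented here so the sketch runs, not a real agent API.

```python
def refine_prompt(prompt, generate, first_mistake, max_rounds=10):
    """Regenerate from scratch each round, fixing the prompt and not the code.

    Returns (final_prompt, final_code, rounds_used).
    """
    for round_no in range(1, max_rounds + 1):
        code = generate(prompt)        # previous output is thrown away
        mistake = first_mistake(code)  # the human review step
        if mistake is None:
            return prompt, code, round_no
        # Fold the correction into the prompt, then rerun from the top.
        prompt = f"{prompt}\nConstraint: {mistake}"
    raise RuntimeError("prompt did not converge within max_rounds")


def fake_agent(prompt):
    """Toy agent: only gets the bound right once the prompt spells it out."""
    if "inclusive" in prompt:
        return "print(list(range(1, 101)))"
    return "print(list(range(1, 100)))"


def fake_review(code):
    """Toy reviewer: reports the first mistake, or None if the code is fine."""
    return None if "101" in code else "the upper bound 100 is inclusive"


final_prompt, final_code, rounds = refine_prompt(
    "Print the numbers up to 100.", fake_agent, fake_review
)
```

The durable artifact in this scheme is the prompt, which accumulates every correction, while the generated code is treated as disposable.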

peterbell_nyc 11 days ago |

For me the distinction is the quality and rigor of your pipeline.

Vibe coding: one shot or few shot, smoke test the output, use it until it breaks (or doesn't). Ideal for lightweight PoC and low stakes individual, family or small team apps.

Agentic engineering:

- You care about a larger subset of concerns such as functional correctness, performance, infrastructure, resilience/availability, scalability and maintainability.

- You have a multi-step pipeline for managing the flow of work.

- Stages might be project intake, project selection, project specification, epic decomposition, story decomposition, coding, documentation and deployment.

- Each stage will have some combination of deterministic quality gates (tests must pass, performance must hit a benchmark) and adversarial reviews (business value of proposed project, comprehensiveness of spec, elegance of code, rigor and simplicity of ubiquitous language, etc).

And it's a slider. Sometimes I throw a ticket into my system because I don't want to have to do an interview and burn tokens on three rounds of adversarial reviews, estimating potential value and then detailed specification and adversarial reviews just to ship a feature.
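As a rough illustration of the deterministic quality gates mentioned above, here is a minimal Python sketch. The stage names, gate names, the `artifact` dictionary and the 200 ms latency budget are all hypothetical, chosen only to show the shape of the idea; adversarial reviews are not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Gate:
    name: str
    # Deterministic check: the same artifact always gets the same verdict.
    check: Callable[[dict], bool]


@dataclass
class Stage:
    name: str
    gates: list[Gate]


def run_pipeline(stages: list[Stage], artifact: dict) -> tuple[bool, list[tuple[str, str, bool]]]:
    """Run every gate in order and stop at the first failure."""
    log = []
    for stage in stages:
        for gate in stage.gates:
            ok = gate.check(artifact)
            log.append((stage.name, gate.name, ok))
            if not ok:
                return False, log
    return True, log


# Hypothetical stages and gates, for illustration only.
stages = [
    Stage("specification", [
        Gate("has_acceptance_criteria", lambda a: bool(a.get("acceptance_criteria"))),
    ]),
    Stage("coding", [
        Gate("tests_pass", lambda a: a.get("tests_failed", 1) == 0),
        Gate("p95_latency_under_budget", lambda a: a.get("p95_ms", float("inf")) <= 200),
    ]),
]

# Hypothetical artifact: tests pass, but the latency benchmark is missed.
artifact = {"acceptance_criteria": ["resizes PNG"], "tests_failed": 0, "p95_ms": 250}
passed, log = run_pipeline(stages, artifact)
for stage_name, gate_name, ok in log:
    print(f"{stage_name}/{gate_name}: {'pass' if ok else 'FAIL'}")
```

The "slider" then amounts to how many stages and gates you configure for a given piece of work.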

wek 11 days ago |

What an excellent article by a smart, humble, still-learning person!

Favorite quote: "There are a whole bunch of reasons I’m not scared that my career as a software engineer is over now that computers can write their own code, partly because these things are amplifiers of existing experience. If you know what you’re doing, you can run so much faster with them. [...]

I’m constantly reminded as I work with these tools how hard the thing that we do is. Producing software is a ferociously difficult thing to do. And you could give me all of the AI tools in the world and what we’re trying to achieve here is still really difficult. [...]"

alishayk 11 days ago | |

What do you do if you don't have that existing experience? How do you build it up?

sotix 11 days ago | | |

Build it up in your free time. It's extraordinarily valuable to build up those skills, and I'm not convinced that companies will allow time to slow down and build them.

mikestaas 11 days ago | | |

Break things, and then fix them. Repeat many times.

keeganpoppen 11 days ago | |

it’s sad that i had to triple-read this to determine you weren’t being sarcastic. sad for whom? i don’t know. but the amplifier take is exactly the right one.

wek 11 days ago | | |

I kind of felt the same way reading the article! It felt so unusual to encounter someone who is both smart and humble and willing to admit they were learning. And I was happy to encounter it and sad that I was so surprised by it.

rirze 10 days ago | | |

I didn't think it was sarcastic till I read your comment, upon which point, I got confused and read it twice to make sure it wasn't sarcastic.

Nevertheless, it is refreshing to see nuanced positive energy. I agree that AI is going to be the great multiplier.

underdeserver 11 days ago |

When I was in grad school I graded homework for first year math classes, and the thing about math homework is that the perfect homework takes almost no time to grade.

It's the bad, semi-coherent submissions that eat up your time, because you do want to award some points and tell students where they went wrong. It's the Anna Karenina principle applied to math.

Code review is the same thing. If you're sure Claude wrote your endpoint right, why not review it anyway? It's going to take you two minutes, and you're not going to wonder whether this time it missed a nuance.

scottyah 11 days ago | |

Typically in engineering you don't know what you're doing. If you're sure of what it should look like going in, you're more of a technician. I think most people coding have no idea what they're doing to a large extent- not many people can do the same rote work for years straight.

wg0 11 days ago |

Here's for the AI supremacists:

Let's assume AI is 10x more accurate than humans, produces 10x fewer bugs, and increases the speed by 1000x compared to a very capable software engineer.

Now imagine this: a car travels on a road that has 10x more bumps, but at a 1000x slower pace. Even though there are 10x more bumps, your ride will feel less bumpy because you're hitting them far less often.

Now imagine a road that has 10x fewer bumps, but you're traveling at 1000x the speed. Your ride would be a lot more bumpy.

That's the agentic coding for you. Your ride would be a lot more painful. There's lots of denial around that but as time progresses it'll be very hard to deny.
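To put numbers on the analogy, here is a back-of-the-envelope Python sketch. The baseline figures (200 lines per day, 100 bugs per 10,000 lines) are made-up assumptions; only the 10x and 1000x factors come from the scenario above.

```python
def bugs_per_day(lines_per_day: int, bugs_per_10k_lines: int) -> float:
    """Bugs introduced per day at a given output rate and defect density."""
    return lines_per_day * bugs_per_10k_lines / 10_000


# Assumed human baseline (illustrative numbers, not measurements).
human_lines_per_day = 200
human_bugs_per_10k = 100

# The scenario from the comment: 1000x the speed, 10x fewer bugs per line.
agent_lines_per_day = human_lines_per_day * 1000
agent_bugs_per_10k = human_bugs_per_10k // 10

human = bugs_per_day(human_lines_per_day, human_bugs_per_10k)
agent = bugs_per_day(agent_lines_per_day, agent_bugs_per_10k)
ratio = agent / human

print(f"human: {human} bugs/day, agent: {agent} bugs/day, ratio: {ratio}x")
```

Under these assumptions the bugs arriving per day go up by 1000 / 10 = 100x, whatever baseline you pick, which is the "bumpier ride" being described.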

Lastly - vibe coding is honest but agentic coding is snake oil [0], and these arguments about having harnesses with dozens of memory, agent and skill files, with pages and pages of rules sprinkled in them, are absolutely wrong as well. Such a paradigm assumes that LLMs are perfectly reliable, super accurate rule followers and that the only problem we have as an industry is not being able to specify enough rules clearly enough.

Such a belief could only be held by someone who hasn't worked with LLMs long enough, or by a totally non-technical person not knowledgeable enough to know how LLMs work. That a highly technical community holds on to such a wrong belief system is highly regrettable.

[0]. https://news.ycombinator.com/item?id=48018018

kelnos 11 days ago |

Yup, the normalization of deviance here is a real thing. I still review all the code the LLM generates (well, really, I have it generate very little code: I use it more for planning, design, rubber-ducking, and helping track down the causes of bugs), but as time goes on without obvious errors, it gets more and more tempting to assume the code is going to be fine, and not look at it too closely.

But resisting that impulse is just another part of being a professional. If your standards involve a certain level of test coverage, but your tests haven't flagged any issues in a long time, you might be tempted to write fewer tests as you continue to write more code. Being a professional means not giving in to that temptation. Keep to your quality standards.

Sure, standards are ultimately somewhat arbitrary, and experience can and should cause you to re-evaluate your standards sometimes to see if they need tweaking. But that should be done dispassionately, not in the middle of rushing to complete a task.

And hell, maybe someday the agents will get so good that our standards suggest that vibe coding is ok, and should be the norm. But you're still the one who's going to be responsible when something breaks.

keeda 11 days ago |

I think all coding will become vibe coding, but it will be no less an engineering discipline.

Note: I still review pretty much every line of code that I own, regardless of who generates it, and I see the problems with agents very clearly... but I can also see the trends.

My take: Instead of crafting code, engineering will shift to crafting bespoke, comprehensive validation mechanisms for the results of the agents' work such that it is technically (maybe even mathematically) provable as far as possible, and any non-provable validations can be reviewed quickly by a human. I would also bet the review mechanisms would be primarily visually, because that is the highest bandwidth input available to us.

By comprehensive validations I don't mean just tests, but multiple overlapping, interlocking levels of tests and metrics. Like, I don't just have an E2E test for the UI, I have an overlapping test for expected changes in the backend DB. And in some cases I generate so many test cases that I don't check for individual rows, I look at the distribution of data before and after the test. I have very few unit tests, but I do have performance tests! I color-code some validation results so that if something breaks I instantly know what it may be.
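A minimal sketch of the "look at the distribution of data before and after the test" idea, in plain Python. The `status` column, the row data and the 5% tolerance are hypothetical; a real suite would read rows from the backend DB and choose thresholds per project.

```python
from collections import Counter


def distribution(rows: list[dict], key: str) -> dict:
    """Share of rows per distinct value of `key`."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def max_shift(before: dict, after: dict) -> float:
    """Largest absolute change in share for any value seen in either snapshot."""
    keys = set(before) | set(after)
    return max(abs(before.get(k, 0.0) - after.get(k, 0.0)) for k in keys)


TOLERANCE = 0.05  # assumed threshold, tune per project

# Hypothetical snapshots of a table before and after a test run.
before_rows = [{"status": "paid"}] * 80 + [{"status": "refunded"}] * 20
after_rows = [{"status": "paid"}] * 160 + [{"status": "refunded"}] * 40
drifted_rows = [{"status": "paid"}] * 100 + [{"status": "refunded"}] * 100

baseline = distribution(before_rows, "status")
stable = max_shift(baseline, distribution(after_rows, "status")) <= TOLERANCE
drifted = max_shift(baseline, distribution(drifted_rows, "status")) <= TOLERANCE

print("after run within tolerance:", stable)
print("drifted run within tolerance:", drifted)
```

The point of checking shares rather than individual rows is that the check stays cheap to review even when the test generates thousands of rows.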

All of this is overkill to do manually but is a breeze with agents, and over time really enables moving fast without breaking things. I also notice I have to add very few new validations for new code changes these days, so once the upfront cost is paid, the dividends roll in for a long time.

Now, I had to think deeply about the most effective set of technical constraints that give me the most confidence while accounting for the foibles of the LLMs. And all of this is specific to my projects, not much can be generalized other than high-level principles like "multiple interlocking tests." Each project will need its own custom validation (note: not just "test") suites which are very specific to its architecture and technical details.

So this is still engineering, but it will be vibe coding in the sense that we almost never look at the code, we just look at the results.

turtlebits 11 days ago |

The scary part is that codebases are getting layers of AI complexity, that it's going to cost $$$ to have the latest model decipher and make changes as no human can understand the code anymore.

Pretty soon there is no code reuse and we're burning money reinventing the wheel over and over.

somewhereoutth 11 days ago | |

Prior to the advent of LLMs, I had this concept of the 'complexity horizon' - essentially a [hand built] software system will naturally tend to get more and more complex until no-one can understand it - until it meets the complexity horizon. And there it stays, being essentially unmaintainable.

With LLMs, you can race right for that horizon, go right through, and continue far beyond! But then of course you find yourself in a place without reason (the real hell), with all the horror and madness that that entails.

bossyTeacher 11 days ago | |

> The scary part is that codebases are getting layers of AI complexity, that it's going to cost $$$ to have the latest model decipher

Isn't this a bit like IDE-heavy languages such as old Java/C#? If you tried to make Android apps back in the early days, you HAD to use an IDE; writing the ridiculous amount of boilerplate needed to display a "Hello World" alert after clicking a button was soul destroying.

turtlebits 11 days ago | | |

At least a human can get involved. Complex codebases written by humans can be understood.

If the barrier is too high, code is refactored.

layer8 11 days ago | | |

The difference is that the complexity to achieve “Hello World” was the same for everyone, and more or less well-understood and documented. With AI, you get some different random spaghetti slop each time.

ewild 11 days ago | |

I genuinely think it's part of a psyop. If we bloat all codebases and eventually start printing the models on chips to reduce inference costs by 50-100x they'll take in massive profits from 5M line codebases instead of 350k

eddiewithzato 11 days ago | |

The models today will happily slop over a single 1k loc react index component on a brand new project.

They really are bad for creating a healthy codebase

ofrzeta 11 days ago |

"I want professionally managed software companies to use AI coding assistance to make more/better/cheaper software products that they sell to me for money.” (Simon Willison herein quotes Matthew Yglesias) - this is such a naive and sloppy take. What do you want? "better software"? not going to happen. "cheaper software"? not going to happen either. "more software"? for sure, but is it really what you want?

If I hire a plumber it's certainly not cheaper than doing it myself but when I am paying money I want to make sure it is better quality than what I am vibe plumbing myself.

ianm218 10 days ago | |

I definitely want more, higher quality software, maybe even 10X more. Even simple things like a personal assistant that can help manage my social life better don't really exist yet, nevermind that I want a medical team doing research on my behalf/ optimizing my insurance. Or a software team in the background building bespoke software for all my hobbies etc.

senordevnyc 10 days ago | |

I'm already getting and creating better software for cheaper. I have lots of software products that I use that are better now than a few years ago because of AI. And much of the software I use is free. What are you talking about exactly?

And on the creation side, I run a SaaS that's taking over a niche market because it replaces a human-powered process with an AI-powered one. Customers switch to me because they get better results more consistently, much faster, and much cheaper.

dev360 11 days ago |

> It’s not just the downstream stuff, it’s the upstream stuff as well. I saw a great talk by Jenny Wen, who’s the design leader at Anthropic, where she said we have all of these design processes that are based around the idea that you need to get the design right—because if you hand it off to the engineers and they spend three months building the wrong thing, that’s catastrophic.

This is spot on. I think the tooling is evolving so much, particularly on the design side, that it's not worth the "translation cost" to stay (or even be) on the Figma side anymore.

christophilus 11 days ago | |

If you hand something off to engineering and they spend three months building the wrong thing, you’ve got a dysfunctional organization.

gabriela_c 11 days ago |

Claude often does things in more detail, and even better, than I would, in the first pass. But I don't understand how anybody stands comments generated by an LLM?

It's seriously the thing that worries (and bothers) me the most. I almost never let unedited LLM comments pass. At a minimum.

Most of the time, I use my own vibe-coded tool to run multiple GitHub-PR-review-style reviews, and send them off to the agent to make the code look and work fine.

It also struggles with doing things the idiomatic way for huge codebases, or sometimes it's just plain wrong about why something works, even if it gets it right.

And I say this despite the fact that I don't really write much code by hand anymore, only the important ones (if even!) or the interesting ones.

Also, don't even get me started on AI-generated READMEs... I use Claude to refine my Markdown or automatically handle dark/light-mode, but I try to write everything myself, because I can't stand what it generates.

jazzypants 11 days ago | |

I find that the best thing about generating documentation with LLM's is that it gets me angry enough to rewrite it correctly.

"Ugh, no! Why would you say it like that? That's not even how it works! Now, I need to write a full paragraph instead of a short snippet to make sure that no future agents get confused in the same way."

mkozlows 11 days ago | |

The comments aren't an LLM thing, they're a Claude thing. Codex doesn't write those gross hyper-verbose comments.

user34283 11 days ago | | |

In my experience Codex barely writes any comments, despite my attempts to encourage it in the AGENTS.md.

GistNoesis 11 days ago |

The real paradigm shift is not here yet, but not very far away. I'm talking about the single unified codebase. Agents building a unique codebase for all your software needs.

Because most of the complexity in software comes from interfacing with external components, when you don't need to adapt to this you can write simpler and better code.

Rather than relying on an external library, you just write your own and have full control and can do quality control.

The Linux kernel is 30 000 000 LOC. At 100 tokens/s, let's say 1 LOC per second produced on a single 4090 GPU, one year of continuous running gives 3600 * 24 * 365 = 31 536 000 LOC, so everyone can have their own OS.

It's the "Apps" story all over again: there are millions of apps, but the average user only has 100 max and uses 10 daily at most.
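The arithmetic can be checked with a few lines of Python. The 100 tokens per line figure is implied by the comment's own assumption (100 tokens/s giving 1 LOC/s); the estimate counts raw generation only, with no allowance for review, retries or discarded output.

```python
SECONDS_PER_DAY = 3600 * 24
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365

tokens_per_second = 100   # from the comment
tokens_per_line = 100     # implied by "100 tokens/s, let's say 1 LOC per second"
kernel_lines = 30_000_000  # figure used in the comment

lines_per_second = tokens_per_second / tokens_per_line
lines_per_year = lines_per_second * SECONDS_PER_YEAR
days_needed = kernel_lines / lines_per_second / SECONDS_PER_DAY

print(f"lines per year: {lines_per_year:,.0f}")
print(f"days to emit {kernel_lines:,} lines: {days_needed:.1f}")
```

At that rate a single GPU emits a kernel's worth of lines in roughly 347 days, so the one-year estimate holds under the stated assumptions.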

Standardize data and services and you don't need that much software.

What will most likely happen is one company with a few million GPUs will rewrite a complete software ecosystem, and people will just use this and stop doing any software because anything can be produced on the fly. Then all compute can be spent on consistent quality.

ytoawwhra92 11 days ago | |

> Standardize data and services and you don't need that much software.

We've known this since close to the advent of computing and yet every generation has taken us further away from this goal. Largely driven by jealous resource-guarding, particularly when it comes to data. Why don't I have a generic media player app that can stream Netflix, Disney, Hulu, etc? Those brands want control over my experience. They will continue to want that control indefinitely. That basic human desire for control won't evaporate with a "single unified codebase".

deadbabe 11 days ago | |

Every happy OS will be the same. Every broken OS will be broken in its own way. What a nightmare.

jcgrillo 11 days ago |

> It used to be if you found a GitHub repository with a hundred commits and a good readme and automated tests and stuff, you could be pretty sure that the person writing that had put a lot of care and attention into that project.

I think this highlights a problem that has always existed under the surface, but it's being brought into the light by proliferation of vibeslop and openclaw and their ilk. Even in the beforetimes you could craft a 100.0% pure, correct looking github repo that had never stood the test of production. Even if you had a test suite that covers every branch and every instruction, without putting the code in production you aren't going to uncover all the things your test suite didn't--performance issues, security issues, unexpected user behavior, etc.

As an observer looking at this repo, I have no way to tell. It's got hundreds of tests, hundreds of commits, dozens of stars... how am I to know nobody has ever actually used it for anything?

I don't know how to solve this problem, but it seems like there's a pretty obvious tooling gap here. A very similar problem is something like "contributor reputation", i.e. the plague of drive-by AI generated PRs from people (or openclaws) you've never seen before. Stars and number of commits aren't good enough, we need more.

vmaurin 11 days ago |

> The entire software development lifecycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code. And now it doesn’t.

No, it was never designed around that. All methodologies of software dev don't focus too much on writing the code, but on everything else: requirement definition, quality, maintenance, speed of integrating feature, scaling the work, ...

Personally, with 20 years of experience, I have never seen a single company where writing the code was a bottleneck.

senordevnyc 10 days ago | |

requirement definition, quality, maintenance, speed of integrating feature, scaling the work

Literally every single one of these is much, much faster with AI than without. It's not even close.

bhagyeshsp 11 days ago |

> The thing that really helps me is thinking back to when I’ve worked at larger organizations where I’ve been an engineering manager. Other teams are building software that my team depends on.

> If another team hands over something and says, “hey, this is the image resize service, here’s how to use it to resize your images”... I’m not going to go and read every line of code that they wrote.

The distance of accountability of the output from its producer is an important metric. Who will be held accountable for which output: that's important to maintain and not feel the "guilt".

So, organizations would need to focus on better and more granular building incentives and punishment mechanisms for large-scale software projects.

drmajormccheese 11 days ago |

There are techniques for improving our confidence in our software: unit testing, integration testing, fuzz testing, property-based testing, static analysis, model checking, theorem proving, formal methods, etc. The LLM is not only a tool for generating lines of code. It can also generate lines of testing. The goal is that the tests are easier to audit by the humans than the code.

exographicskip 11 days ago | |

I've found that one of the areas I enjoyed least is what I spend a lot of time on now: testing!

Property-based testing in particular has uncovered a number of invariants in every code base I've introduced it to.

tbf depending on the agent/model a lot of the tests end up being thrown out so it's possible I _should_ handwrite more tests, but having better prompts and detailed plans seems to mitigate that somewhat
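For anyone who hasn't tried property-based testing, here is a small hand-rolled sketch using only the standard library. Libraries such as Hypothesis automate input generation and shrinking; the run-length encoder below is a made-up example, not code from any project mentioned here.

```python
import random


def rle_encode(s: str) -> list[tuple[str, int]]:
    """Run-length encode a string into (character, count) pairs."""
    out: list[tuple[str, int]] = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Inverse of rle_encode."""
    return "".join(ch * n for ch, n in pairs)


def check_properties(trials: int = 500, seed: int = 0) -> int:
    """Check invariants on many random inputs instead of a few hand-picked ones."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        s = "".join(rng.choice("ab") for _ in range(rng.randint(0, 20)))
        encoded = rle_encode(s)
        # Property 1: decoding the encoding returns the original input.
        assert rle_decode(encoded) == s, f"round trip failed for {s!r}"
        # Property 2: adjacent runs never share a character.
        assert all(a[0] != b[0] for a, b in zip(encoded, encoded[1:])), f"unmerged runs for {s!r}"
    return trials


print("trials passed:", check_properties())
```

Writing the properties down is where the invariants tend to surface: you have to say what must always hold, not just what one example returns.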

sfjailbird 11 days ago | |

How do we make sure the LLM generated code works? We'll have LLM generated tests! Wait a minute...

coldtea 11 days ago | |

>There are techniques for improving our confidence in our software: unit testing, integration testing, fuzz testing, property-based testing, static analysis, model checking, theorem proving, formal methods, etc. The LLM is not only a tool for generating lines of code. It can also generate lines of testing.

Which is the same issue of lack of understanding and care and accountability from the human operator, with extra steps and a false sense of security.

noduerme 11 days ago |

>> The entire software development lifecycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code.

Yeah. I'm not sure how other people work, but I almost never need to write formal tests because I essentially test locally as I write, one method at a time, and at that moment I have a complete mental map of everything that can potentially go wrong with a piece of code. I write and test constantly in tandem. I can write a test afterwards to prove what I already know, but I already know it. This is time consuming, anal, and obsessive-compulsive, and luckily that kind of work perfectly suits my personality. The end result is perfect before I commit it.

It is a lot of fun asking LLMs to write code around my code. Make 10 charts with chartjs in an html page that show something and put it behind a reverse proxy so the client can see it. Wow. Spot on, would've taken me an hour. I can even rely on Claude to somewhat honestly reason about things in personal projects.

But knowing every implementation decision makes a huge difference when anything real is at stake. "Guilt" wouldn't begin to describe the sense I'd have if my software did something because of a piece of code I hadn't personally reviewed and fully understood, at which point I probably should have just written it myself.

_doctor_love 11 days ago |

Repeat after me: most software spends the majority of its lifetime in the maintenance phase.

Repeat after me: it follows that most of the money the software makes occurs during the maintenance phase.

Repeat after me: our industry still does not understand this after almost 100 years of being in existence.

Alan Kay was 100% right when he said that the computer revolution hasn't occurred yet. For all of our current advancements all tools are more or less in the Stone Age.

My great hope is that AI will actually accelerate us to a point where the existing paradigm fully breaks beyond healing and we can finally do something new, different, and better.

So for now - squeee! - put a jetpack on your SDLC with AI and go to town!!! Move fast and break things (like, for real).

jFriedensreich 11 days ago | |

Most software has a few years lifetime and nearly no users. What you say is only true after reaching a certain milestone like product market fit. I think the idea is to reach that turning point as fast as possible and then rebuild the system from ground up with maintainability and quality focus.

_doctor_love 10 days ago | | |

doch

jwpapi 11 days ago | |

I hate code and I want as little of it as possible in my codebase.

_doctor_love 11 days ago | | |

The best code is no code. The second-best code is the code I delete.

My favorite JIRAs are the ones I prevent from being worked on in the first place because they were unnecessary.

The ideal prompt is the one I don't fire because it would be a waste.

In an application with an LLM component, the ideal amount of inference is zero.

Ultimately this seems to lead to "the ideal amount of computers in the world is none" but for the sake of my continued employment let's let that one go by. :)

inventor7777 11 days ago |

I agree somewhat, but I do still think there is a decently sized separation between true vibe coding (the typical "make me an app...fix this bug") and actual AI assisted development. I personally think that if you are a dev and you simply trust the AI's output, that is still vibe coding.

I am not a developer and have very basic code knowledge. I recently built a small and lightweight Docker container using Codex 5.5/5.4 that ingests logs with rsyslog and has a nice web UI and an organized log storage structure. I did not write any code manually.

Even without writing code, I still had to use common sense in order to get it in a place I was happy with. If I truly knew nothing, the AI would have made some very poor decisions. Examples: it would have kept everything in main.go, it would have hardcoded the timezone, the settings were all hardcoded in the Go code, the crash handling was non-existent, and a missing config would have prevented start. And that is on a ~3000 line app. I cannot imagine unleashing an AI on a large, complex codebase without some decent knowledge and reviewing.

_jss 11 days ago |

This is a timely observation and feels right to me. I needed to get a relatively simple batch download -> transform -> api endpoint stood up. I wrote a fairly detailed prompt but left a lot of implementation details out, including data sources.

Opus 4.7 built it about 90% the same way I would, but had way more convenience methods and step-validations included.

It's great, and really frees me up to think about harder problems.

exographicskip 11 days ago | |

This is my experience too. I'm primarily a python dev, but have been routinely using other backend languages (rust, go, etc) that I'm familiar with but not at the same level.

Just having ~13yrs experience heavily weighted in one language with some formal studying of others makes directing llms a lot simpler.

Learning syntax, primitives, package managers, testing, etc isn't that much of a lift compared to how I used to program.

Was helping a non-dev colleague who's using claude cowork/code to automate reporting the other day. They understand the business intelligence side well, but were struggling with basic diction to vibe code a pyautogui wrapper to pull up RDP and fill out a MS Access abstraction on a vendor DB.

Think we'll be fine for another 5-10 years as a profession

ok123456 11 days ago |

One-shot "vibe coding" is generally a mistake.

But using an agentic LLM to complete boilerplate is attractive simply because we've created a mountain of accidental and intentional complexity in building software. It's more of a regression to the mean of going back to the cognitive load we had when we simply built desktop applications.

dyauspitr 11 days ago | |

Tell it to make a plan. Ask it to do 3-5 steps at a time. “One shotting” works very well.

addedGone 11 days ago | | |

why in May 2026, it seems that people haven't discovered loops? people are ignorant, run 20 times the same task in a loop to verify and it's pristine.

lenerdenator 11 days ago |

> But I’m not reviewing that code. And now I’ve got that feeling of guilt: if I haven’t reviewed the code, is it really responsible for me to use this in production?

Answer: it wholly depends upon what management has dictated be the goal for GenAI use at the time.

There seems to be a trend of people outside of engineering organizations thinking that the "iron triangle" of software (and really, all) engineering no longer holds. Fast, cheap, good: now we can pick all three, and there's no limit to the first one in particular. They don't see why you can't crank out 10x productivity. They've been financially incentivized to think that way, and really, they can't lose if they look at it from an "engineer headcount" standpoint. The outcomes are:

1) The GenAI-augmented engineer cranks out 10x productivity without any quality consequences down the line, and keeps them from having to pay other people

2) The GenAI-augmented engineer cranks out 10x productivity with quality consequences down the line, at which point the engineer has given another exhibit in the case as to why they should no longer be employed at that organization. Let the lawyers and market inertia deal with the big issues that exist beyond the 90-day fiscal reporting period.

Either way, they have a route to the destination of not paying engineers, and that's the end goal.

If you don't like that way of running a software engineering organization, well, you're not alone, but if nothing else, you could use GenAI to make working for yourself less risky.

linuxhansl 11 days ago |

I guess it all depends on what you use it for.

I work on database optimizers and other database related stuff, and I can assure you Claude Code - with all the highest settings - does make mistakes. It will generate a test that does not actually test what it "thinks" it tests. It will confidently break stuff.

Do not get me wrong. It is still awesome! It takes much of the grunt work off me. It can game out design decisions even when that means refactoring a lot of code. If you point out a mistake, more often than not it can fix it itself.

It's just for a critical project I would never ship it without understanding every line of code - with the exception perhaps of some of the test code. Maybe in a year or two that will be different.

aenis 11 days ago |

It's just economics 101.

People have been running crappy code commercially for over half a century now. Not many companies successfully differentiate by running good code - it usually does not matter to the end consumer, other things are much more important. So now companies will pay less for code, and maybe it is a bit worse (though I personally can't believe AI can do worse than corporate software developers on average). Hobbyists will remain hobbyists, and precious few will be lucky enough to have someone pay them to handcraft stuff. Exactly what happened to woodworkers and other craftsmen.

kommunicate 11 days ago |

It's already the case that you get much better results out of LLMs by forcing agents using them to go through additional layers of planning, design & review.

The future is going to dynamically budget and route different parts of the SDLC through different models and subagents running on the cloud. Over time, more and more of that process will be owned by robots and a level of economic thinking will be incorporated into what is thought of today as "software engineering." At some point vibe coding _is_ coding and we're maybe closer to that point than popularly believed.

mrothroc 10 days ago |

The "blurring" framing makes Simon's tension sound intrinsic when it is actually structural. Vibe coding and agentic engineering aren't on a continuum. They're distinguished by the process.

Engineering is always about a defined process. We follow it to produce predictable artifacts that meet the specifications. Even though code is somewhat "squishy" in that it is an art just as much as a science, it still has to meet the spec.

This has always been true, even before agents started writing code for us. We've all dealt with spaghetti code because of undisciplined practices. That's exactly why we came up with the standard SDLC process: plan, design, code, test, deploy. Repeat.

The part people seem to forget about when looking at this is the space between the steps: the gates. We review the artifacts produced at each stage. If the reviewer does not approve, the engineer has to fix it until it passes. True for human coders, doubly true for agentic coders.

Agentic engineering still follows the process. Artifacts are now cheap to produce, which means we have to adjust it so we don't overwhelm the humans in the loop. For me, this means augmenting my review step with agentic reviewers to catch the dumb stuff. It only escalates to me when either a) it passes clean or b) there is something that genuinely needs my experience.

This is agentic engineering, not vibe coding.

redhale 11 days ago |

I want to agree, I do. But this point is plainly wrong in my observations:

> The enterprise version of that is I don’t want a CRM unless at least two other giant enterprises have successfully used that CRM for six months. [...] You want solutions that are proven to work before you take a risk on them.

Perhaps not for every category of software and every company. But in practice, any SaaS app that is just CRUD with some business logic + workflows is, imo, absolutely vulnerable to losing customers because people within their customers' orgs vibe coded a replacement.

They are perhaps even more at risk because would-be new customers don't ever even bother searching to find them as an option because they just vibe code a competitor in-house.

The vulnerability lies primarily in the fact that most of these SaaS apps we're talking about are _wrong_ to some meaningful degree. They don't fully fit how your company works, and they never did. There is something about them that you are forced to work around in some way. This is true because it is impossible to build a universally perfect product, to perfectly fit it to every business requirement of every user in every company.

But now it is relatively cheap to build the perfect version for your company in-house. Or maybe even just for YOU.

I think medium/long-term this will mean a redistribution of technical talent from SaaS companies to industry companies. Instead of paying millions for SaaS subscriptions, industry companies will spend fewer millions building precisely what they need in-house with the help of AI. Not every SaaS and not every company, but I already see this happening at my company right now.

slopinthebag 11 days ago |

I agree, I'm actually generating just over 20,000 lines of code each day at my company. Part of that was the mandate and leaderboards around token usage, but also they started using pull requests as an explicit metric. What I do is usually pull around 5 or so tickets at once, spin up 5 different agents on their own branch, have them work until completion, and then spin up two more agents to handle the merge request.

I'm not checking the code since the code doesn't really matter anymore anyways - I just have the agent write passing tests for the changes or additions I make, and so even if something breaks I can just point to the tests.

Some days, the tickets are completed much faster than I expect and I don't hit my daily token expenditure goal, so I have my own custom harness that actually hooks up an agent to TikTok, basically it splits up the reel into 1 second increments and then feeds those frames to the LLM for its own consumption. I can easily burn 10m tokens a day on this, and Claude seems to enjoy it.

Personally I want to thank you Simon for putting me onto this "vibe engineering" concept, I really didn't expect an archaeology major like myself to become a real engineer but thanks to AI now I can be! Truly gatekeeping in tech is now dead.

jFriedensreich 11 days ago | |

I nearly fell for it until the TikTok part, thanks for the amusing shitpost

nsoonhui 11 days ago |

This is my workflow which I find very productive with Agentic AI.

Disclaimer: I'm doing a CAD-like engineering desktop app, and I'm using VS 2026 Copilot, so YMMV.

When I get a Jira ticket, I will first diagnose the problem, and then ask AI to write a test case for it that will reproduce the problem, with guidance on what/how to do the test case (you will be surprised to know how many geometry, seemingly visual problems can be unit tested), and if necessary I provide clues (like which files to read, etc.) for AI to look at, and ask AI to just go and fix the test.

Often AI can do that; AI can make the test pass and make sure that adjacent tests also pass. If in doubt, I will check the output reasoning. I then verify that the fix is done properly via visual inspection (remember, this is a desktop app), and I ask for clarification if needed.

Then at night I'll let my automated test suites run... and oops! Regression found! Who broke it? AI or human? Who cares. I just tell AI that between these times one of the commits must have broken the code — can you please fix it for me? And AI can do that.

This works for small or medium feature implementation, trivial bugfixes, or even annoying geometrical problems that require me to dig out the needle in the haystack. So the productivity gain is very real. But I haven't tried it on a feature that requires weeks or months for implementation; maybe I should try it next time.

It's hard to describe the feeling. It's just that the AI is working like a very capable (junior?) programmer; both might not have full domain knowledge, but with strong test suites and senior guidance, both can go very far. And of course AI is cheaper and a lot more effective.
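
The "between these times one of the commits must have broken the code" step is a binary search over the commit range, the same idea `git bisect` automates. A minimal sketch, assuming the commits are ordered oldest to newest, the newest one is known to fail, and once the regression appears every later commit also fails. The function name and the `is_bad` callback (standing in for "check out this commit and run the failing test") are hypothetical.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits[-1] is bad and that badness is monotonic
    (good, good, ..., bad, bad), as in a typical regression hunt.
    """
    if not commits:
        raise ValueError("commit range is empty")
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid       # regression is at mid or earlier
        else:
            lo = mid + 1   # regression is strictly after mid
    return commits[lo]


# Example: the nightly suite passed at "a1" and fails at "e5".
history = ["a1", "b2", "c3", "d4", "e5"]
culprit = first_bad_commit(history, lambda c: history.index(c) >= 2)
```

With N commits in the window this needs about log2(N) test runs, which is why it stays cheap whether a human or an agent drives it.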

DonHopkins 11 days ago |

Instead of "vibe coding" by asking the AI to design and write code, I'm having it refine my own designs, and write code under strict supervision and guidance, that I carefully review and iterate on.

I took a rock carving course in school that really enlightened me about software engineering, and it still applies today, especially to AI. You can't just decide what you want to carve, hold the chisel in just the right spot, and whack it with a hammer just perfectly so all the rock you want falls away leaving a perfect statue behind.

"I saw the angel in the marble and carved until I set him free." -Michelangelo

It's a long drawn out iterative process of making millions of tiny little chips, and letting the statue inside find its way out, in its natural form, instead of trying to impose a pre-determined form onto it.

Vibe coding is hoping your first whack of the hammer is going to make a good statue, then not even looking at the statue before shipping it!

But AI assisted conscientious coding (or agentic engineering as Simon calls it) is the opposite of that, where you chip away quickly and relentlessly, but you still have to carefully control where you chisel and what you carve away, and have an idea in your mind what you want before you start.

ttariq 10 days ago |

I am not sure about agentic engineering getting close to vibe coding, but I certainly buy into building trust in your agents, similar to how you would trust another team / colleague within your organization (the image resizing example), and the best way to make sure that a team is working well is to make sure the right context is available to them at the right time and whenever they change the code base, they update that "context." In the case of human programming, this context is in the form of architecture docs, tickets, product spec, ADRs, messages, code review comments etc and lives in a host of different places. It is also difficult to get humans to fetch and update the context with discipline. However, with agents, it is much easier to get them to consume the right context and keep it updated as they make changes to the code base. I think that is the key to making agents more reliable and being able to have the trust in their decision making and output. All of this is, of course, on top of standard unit testing etc.

solomonb 11 days ago |

From the podcast episode they talk about the idea of using an LLM for training by disallowing the model to write code. I've been experimenting with exactly that in conjunction with a proof checker (Agda) to help me learn some cubical type theory and category theory.

I find the LLM as interactive tutor reviewing my work in a proof checker to be a really killer combo.

cess11 11 days ago |

"But I’m not reviewing that code. And now I’ve got that feeling of guilt: if I haven’t reviewed the code, is it really responsible for me to use this in production?"

"I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it’s just going to do it right. It’s not going to mess that up. You have it add automated tests, you have it add documentation, you know it’s going to be good."

This really is WordPress and early PHP all over again, but it's the seasoned folks rather than the amateurs that buy into it.

I believe these tools will be refined and locked down and eventually turn into RAD stuff used by certified enterprise consultants, much like SAP and Salesforce and IBM solutions and so on. From this I come to the conclusion that it is not a good idea to become dependent on them at this stage, which is corroborated by the pecuniary expense as well as excruciatingly fast change in available products.

saltyoldman 11 days ago |

For work I do agentic engineering, as the code that I submit for code review is hand reviewed by me. I know every line and file that I submit.

My side project is 80% vibe code. Every now and then I look and see all the bad stuff, then I scold Codex a bit and it refactors it for me. So I do see the author's point.

singpolyma3 11 days ago |

I think I'm just too opinionated to go there. If I see something that works fine, but isn't the way I'd do it, it doesn't matter if a human or an LLM wrote it I'm still in there making it match my vision.

rglover 11 days ago | |

This is the way. If you're a prick about quality and outcomes, whether you typed it with your digits or the robot spit it out is irrelevant.

What standard of result are you pursuing and are you willing to discipline yourself enough to achieve it?

AI can't make you un-lazy, no matter how many tokens you pay for.

suzzer99 11 days ago | |

100%. I don't think any senior programmer ever looks at another developer's code and says, "Oh yeah, that's just the way I'd do it."

cortesoft 11 days ago | | |

But I assume you don't go and change all your co-workers' code just because they didn't do it how you would have done it?

hirvi74 11 days ago | | |

I concur, and I think that is one of the most difficult aspects of reviewing another's code. It's difficult for me to sometimes differentiate between what is acceptable vs. what I would have done. I have to be very conscious to not impose my ideals.

ai_slop_hater 11 days ago | | |

So you are going to waste everyone's time getting another developer to write code the way you want? This resonates with me because at my company I get this all the time. At that point, you might as well close my PR and do it yourself, whatever way you want. I really like the advice from the book 0 2 1, to assign different areas of responsibility to people, so that there is no conflict.

jstummbillig 11 days ago | |

That's not how most organizations work, AI or not.

jf22 11 days ago | | |

What do you mean?

sevenzero 11 days ago |

>If you can go from producing 200 lines of code a day to 2,000 lines of code a day, what else breaks? The entire software development lifecycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code. And now it doesn’t.

How is producing more lines of code any good? How does quality assurance work with immeasurable code bloat? I want good software not slopware with 2000 different features. A good product does few things, but does these really well. There is no need to constantly add lines of code to a working product.

galkk 11 days ago |

Given the rapidly deteriorating quality of, at least, Claude Code output, agentic coding use may decrease. It is insane how bad the results of background agents are now: constant hallucinations, nonsensical outputs.

BowBun 11 days ago | |

The heavy users of Claude at my job disagree (me included), our work gets shipped and the quality has increased by all metrics. Are you talking about enterprise or consumer Claude subscriptions? I think they're serving drastically different quality depending on how much $ you fork up.

galkk 11 days ago | | |

I don't see much sense in using HN as a support thread, but here are quotes from my single Claude investigation session, and that happens in every Claude Code session that I have, especially with 4.7

* The first agent's claim that was 3.x-only was wrong

* is nice-to-have but doesn't target our exact case as cleanly as the agent claimed.

* The agent's "direct fix for yyy" is overstated.

* not 57% as the earlier agent claimed

etc etc etc

And I forgot how many times my session with claude starts: did you read my personal CLAUDE.md and use background agents for long running operations?

I use enterprise subscription, max effort, was with both 4.6 and 4.7.

And please refrain from comments like "you're using it wrong", as the drop in output quality is very clear and noticeable.

parasti 11 days ago |

As a web developer, I feel like this take is wildly optimistic. My remaining qualifications that still provide some sort of value are providing historical/business/architectural context to the agent and testing the agent's output. And that's only because 1) it's not all written down in Markdown and 2) the agent is massively nerfed by costs and Anthropic. The thing in the middle where I get a coffee and write code in a variety of languages, then pop open a debugger has been fully obsoleted.

kw3b 11 days ago |

Strong agree. Most orgs will stay tangled in the mess they hand-coded over the years, a few greenfield teams will pull ahead, but until some LLM-fuelled startup displaces a strong incumbent I'm skeptical that we're on the cusp of anything other than a K-shaped transition. I see already low quality software and orgs getting flushed to make room for some new ideas now that the barrier to entry is slightly lower (but far from free). I just wish the transition was done with more humanity.

jFriedensreich 11 days ago |

We still don't have the right sandbox and PR abstractions to make the merge of the two complete. Imagine merging a PR and knowing exactly that this code cannot ever possibly reach the internet, that it can only receive and send specific shapes of API requests from these specific services, that it has well defined resource limits, and that you have a specific, optimal UI to review these constraints. I can imagine not reviewing a larger number of PRs in that reality.
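
One way to picture the constraints described above is as a small declarative policy that ships with the PR and is reviewed instead of the code. This is only a sketch under stated assumptions: `SandboxPolicy`, `ObservedBehavior`, and `violations` are hypothetical names, and a real sandbox would have to enforce the policy at runtime rather than check observations after the fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical per-PR constraints a reviewer would approve."""
    allow_internet: bool
    allowed_services: frozenset[str]
    max_memory_mb: int


@dataclass(frozen=True)
class ObservedBehavior:
    """Hypothetical summary of what the code did in a test run."""
    contacted_services: frozenset[str]
    reached_internet: bool
    peak_memory_mb: int


def violations(policy: SandboxPolicy, observed: ObservedBehavior) -> list[str]:
    """List every way the observed behavior breaks the declared policy."""
    problems = []
    if observed.reached_internet and not policy.allow_internet:
        problems.append("internet access is not allowed")
    for service in sorted(observed.contacted_services - policy.allowed_services):
        problems.append(f"unexpected service: {service}")
    if observed.peak_memory_mb > policy.max_memory_mb:
        problems.append(
            f"memory {observed.peak_memory_mb} MB exceeds limit {policy.max_memory_mb} MB"
        )
    return problems
```

The review surface then shrinks from "every line of the diff" to "is this policy acceptable, and is the list of violations empty".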

cultofmetatron 11 days ago |

2 days ago, we updated a Stripe library which broke everything. With AI, I was able to one-shot wrapping all of the calls into a shared service, patch the broken API contract across the entire app, and get our signup and payment flows working again. A solid day and a half of work. This would have taken days of back-and-forth debugging previously. AI is not a panacea for everything, but it's doing valuable work right now.

bamboozled 11 days ago | |

What does this have to do with the article?

I'd say if you're a semi-competent developer, as probably many people reading the article and commenting already are, this comment adds nothing new to the discussion and would already be a very vanilla usage example of "AI".

I think the point is that while you can "do things" like extracting the stripe integrations out into their own service in ten minutes, you're not stepping into other problems, such as how do you handle failures, how do you scale the stripe service, how do you structure all your other micro services so they can communicate in a coherent way, basically you're speed running yourself into harder decisions when using AI.

cultofmetatron 11 days ago | | |

> basically you're speed running yourself into harder decisions when using AI.

on the contrary, I freed myself from the burden of having to find all the places in the code base where we used stripe and patched them in one go along with the tests to prevent regressions. That represents DAYS of work that I condensed into a few hours.

who cares if it can't know good structure and how to handle failures? I know how to do that. I have a skills file I created that tells stripe our policy for handling error failures, defaults for structures as well as guidelines for how we should deal with communications between different systems. Before i spent hours building this stuff out. now I just spend 20-30 min reviewing a pr to make sure it follows my directives and move onto other problems.

That said, I agree with you in principle. I hand coded an app, going from solo dev to now managing a team and getting ready for an imminent Series A. AI doesn't save you from scaling issues; you still need to have a clear idea of what you want from the AI and build processes that give it the context to do its job.

I call that job security :)

themafia 10 days ago |

> responsible use of AI to write code

You have no clue what went into the training data or how much of the output is covered by someone else's copyright. To pretend this is "responsible" is ridiculous.

Then you go on to use lines of code per day as a meaningful metric without any evidence that it has any consequence whatsoever.

Finally you don't mention profitability once.

What are we even doing here? Pretending? Why?

Amber-chen 11 days ago |

The distinction between 'vibe coding' and 'agentic engineering' is important. In my experience, the key difference is whether you're reviewing and understanding the code the agent produces. When I use coding agents for non-trivial tasks, I always review the diff before committing — that's the engineering part. The danger is when people skip that step and just trust the output.

sodapopcan 11 days ago | |

That's exactly what TFA is about.

MikeNotThePope 11 days ago |

The more I use AI, the more I find it’s great for anything trivial and uninspired. Need help with some predictable glue code? AI. Need help with something insightful and new to the world? Not AI. Need help with an important task that’s been done a 1000 times? AI with scrutiny. Need to invent something new to the world and core to your business? Probably not AI.

bluefirebrand 11 days ago | |

I'm struggling to imagine the sort of person who struggles with predictable glue code that I would trust with anything more important than that, with or without AI...

tempaccount5050 11 days ago | | |

It's not a struggle for me to walk 15 miles to work every day, I could easily do it. It just makes no sense when I have a car.

smallnix 10 days ago |

It doesn't matter if you specify system behavior in code, as an LLM conversation, agent instructions, or UML. In all cases you need to be able to translate business needs into very specific computer behavior. This isn't something a layperson can do. But it has democratized software development for everyone who can do that translation but can't write code.

imrozim 11 days ago |

Used to check every line for my project. Now I just check the tricky parts. Still don't know if that's OK or just lazy.

__alexs 11 days ago |

The current state of the technology is that you must read at least some of the code, but everyone keeps shipping tools that are focussed on churning out more and more stuff without giving you any affordances to really understand the output.

Claude Code in particular seems really uninterested in this aspect of the problem and I've stopped using it entirely because of this.

arian_ 11 days ago |

The gap between "vibe coding" and "agentic engineering" is the same gap between asking someone to do a task and being able to prove they did it correctly. One is vibes. The other is accountability. We keep building more powerful agents without building the audit infrastructure to verify what they actually did.

jatora 11 days ago | |

I think this sounds much more poignant than it is. It's actually pretty shallow. The same agents can audit the infrastructure lol

bobkb 10 days ago |

While those who are hands-on are realising the limits and issues with vibe/context engineering/agentic engineering/buzz-word-of-the-week, the businesses are pushing hard on the buzzwords. It’s high time we start looking at ways to live with the new reality and figure out ways to ensure software reliability.

tyyyy3 11 days ago |

Correct me if I’m wrong Simon, but weren’t you highly optimistic about llm’s and agentic-use of them?

I believe this is a common fault of not being able to zoom out and look at what trade offs are being made. There’s always trade-offs, the question is whether you can define them and then do the analysis to determine whether the result leaves you in a net benefit state.

simonw 11 days ago | |

I still am. I think setting up LLMs to call tools in a loop is a fascinating way to build interesting software that could not have existed before.

Coding agents are also upending how software development works, in a way that we are still very much figuring out.

I don't think anyone has a confident answer for how best to apply them yet, especially on larger production-ready projects.

p_stuart82 11 days ago | | |

I think you kind of answered this in the post though. "I want somebody to have used the thing" is dogfooding. and it's probably the only quality signal left that can't be generated in 30 minutes.

WhereIsTheTruth 10 days ago |

Keyboards and mice have always been a bottleneck, the average person only types around 50 words per minute

If you want to build a project, you can never shorten the actual time it takes to write it out, you are stuck at that 50 words per minute limit

LLMs, agents, call it how you want, they allow us to remove that bottleneck

kdnxownxkwkd 11 days ago |

An AI cannot be held accountable for mistakes, so an AI should not be doing your job for you. End of discussion.

skeledrew 11 days ago |

It makes sense that they merge over time; it's a mark of the progress being made. The ultimate end is to make them indistinguishable, where the purely vibe coded app will have the quality of the app that has been well engineered over significant time thanks to good user feedback.

_pdp_ 11 days ago |

About two years ago I was using the term "agentic engineer" to describe someone who builds AI agents - not a vibe coder.

Agentic Engineer does not make much sense to be applied to a developer.

It is weird and confusing to call a web designer that uses AI assisted coding tools "agentic engineer".

aryehof 11 days ago | |

Vanity titles never make much sense, and now even more people can call themselves “engineers”. I was always at a loss why many weren’t calling themselves “web engineers”. Hey Mom, I used Claude Code today at work so I’m an Agentic Engineer!

ianhxu 11 days ago |

In my own experience, good engineering practices are still not easy to achieve. As a software engineer with three years of experience, I've been doing solo dev for the past few months. Currently, there is still a lot of the harness to set up manually.

readgrounded 11 days ago |

I agree to some extent. I think that small apps, dashboards, service wrappers etc. you can vibe code.

But building software still requires domain knowledge, understanding data structures, architecture, which services to use. We probably have 2-5 years before that's fully automated.

rolymath 11 days ago |

Simon,

Just piggy backing on this post since I'm early:

Would love to see your take on how the AI and Django worlds will collide.

NikolaosC 11 days ago |

The "has someone actually used it" signal is the new code review. Tests, docs, commit count: all reproducible in 30 minutes. Daily usage for 2 weeks isn't. That's the only proof of work that survived the agent era.

hiroakiaizawa 11 days ago |

One thing I've started appreciating with LLM-assisted workflows is how important fixed evaluation protocols are.

Without pre-defined definitions and locked procedures, it's extremely easy to mistake iterative adaptation for genuine signal.
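
One way to "lock" a procedure is to fingerprint the evaluation protocol before any runs and refuse to score results if it has changed since. A minimal sketch, assuming the protocol can be expressed as a JSON-serializable dict; the function names and the example fields are hypothetical.

```python
import hashlib
import json


def fingerprint(protocol: dict) -> str:
    """Stable SHA-256 of a protocol, independent of key order."""
    canonical = json.dumps(protocol, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_locked(protocol: dict, locked_fingerprint: str) -> None:
    """Raise if the protocol differs from the one locked before the runs."""
    if fingerprint(protocol) != locked_fingerprint:
        raise ValueError("evaluation protocol changed after it was locked")


# Lock the protocol once, before looking at any results.
protocol = {"metric": "pass_rate", "threshold": 0.9, "dataset": "holdout-v1"}
locked = fingerprint(protocol)

# Later: a quietly relaxed threshold is detected instead of scored.
relaxed = dict(protocol, threshold=0.8)
```

Calling `check_locked(relaxed, locked)` raises, which turns "we adapted the evaluation until it passed" into a visible event rather than a silent one.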

ppqqrr 11 days ago |

the discourse around "code quality" has always attracted the least nuanced minds, ones who see the world and the phenomenon of life as nothing but territory to be divided up by the latest buzzwords. the worst ones insist that we narrow the discussion even further, to focus on the conflicts between these buzzwords. whenever i have to sit through such discussions, i try to meditate on the irony of mother nature weaving the most functionally brutal, ruthlessly redundant poetry that is the genetic code, only for the resulting creatures to deny themselves the power of the principles inherent in their own construction.

marnett 11 days ago | |

Say more!

gverrilla 11 days ago |

> I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it’s just going to do it right. It’s not going to mess that up.

> Claude Code does not have a professional reputation!

how come?

giulianob 11 days ago | |

That's a wild statement to me. Even with spending significant time making plans with Opus 4.7 and GPT 5.5 on xhigh, I still find lots of poor decisions made when it actually goes to implement it. I find the quality of PRs hasn't dramatically changed either way because the better engineers will spot the issues whereas others will find what the AI is doing acceptable.

causal 11 days ago |

As agents get better at code we trust them to produce more of it. There are still bugs to find, but the haystack gets bigger.

So the number of bugs to find remains constant but the amount of code to review scales with the capability of the agent.

lubujackson 11 days ago |

I think this is what people mean when they say LLMs are a higher level abstraction. We still need to consider edge cases and have tests. We still need to sweat the architecture and understand how the pieces fit together and have a mental map of the codebase. But within each bottom node of that architecture we don't sweat the details. Anything obvious gets caught right away. Most subtle/interaction-based issues occur at the architecture level. Anything that bypasses those filters is a weird bug that is no worse or different from a normal bug fix - an edge case that was hit in a real world scenario that gets flagged by a user or logged as an error.

There are certain codebases and pieces of code we definitely want every line to be reasoned and understood. But like his API endpoint example, no reason to fuss with the boilerplate.

This has definitely been my shift over the past few months, and the advantage is I can spend much more time and energy on getting the code architecture just right, which automatically prevents most of the subtle bugs that have people wringing their hands. The new bar is architecting code to be as well defined as an API endpoint->service structure so you can rely on LLMs to paint by numbers for new features/logic.

exographicskip 11 days ago | |

Good description of my thoughts on vibe coding / agentic engineering.

Spend a lot more time on architecting and testing than hand rolling most repos now.

Hats off to people who enjoy the minutiae of programming everything by hand, but turns out I enjoy the other aspects of software development more.

mohsen1 11 days ago |

I am experimenting with writing an entire TypeScript compiler[1] with an AI assistant. I've spent 4 months on it already. It might not be successful at the end of the day but my thinking is that if LLMs are going to write a lot of the code I better learn how this can and cannot work. I've learned a lot from this project already. I think we're still in charge of design and big ideas even if all of the code is written by AI

[1] https://github.com/mohsen1/tsz

Insanity 11 days ago | |

I'm also experimenting with it more and more. Now I'm trying to create a 2D side-scrolling shooter with it, running in the browser. When it was relatively small, it did a good job. As the codebase and docs/ files that I'm using get larger it starts hallucinating, especially when the context gets at about 50% usage (Codex w/ gpt5.5). As in, it'll literally forget to update parts of the code.

e.g, I change velocity of player to '200' and of bullets to '300', and it only updated the bullet velocity. Then told me the player was already 'at the correct value' even though it was set to 150. Things like that.. :)

mohsen1 11 days ago | | |

For me, unless there is a concrete way of proving work is correct you can't rely on AI coding. tsz has super strict tests around correctness, performance and architectural boundaries

copypaper 11 days ago | |

>25k commits in 4 months or about 1 commit every 7 minutes

How do you manage/orchestrate this? I'm genuinely curious.
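
The quoted rate checks out as back-of-the-envelope arithmetic, assuming "4 months" means roughly 120 days of round-the-clock activity (that day count is an assumption, not something stated in the thread):

```python
def minutes_per_commit(commits: int, days: float) -> float:
    """Average minutes between commits over a period of continuous activity."""
    return days * 24 * 60 / commits


# 25k commits over an assumed 120 days
rate = minutes_per_commit(25_000, 120)
print(round(rate, 1))  # 6.9
```

That is an average across every hour of every day, so the sustained rate during actual working sessions would have to be higher.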

mohsen1 11 days ago | | |

Multiple computers and each multiple Claude Code or Codex sessions. It had lots of ups and downs. Now I have a good enough test harness that makes it easier to iterate faster

gxs 11 days ago |

I grew up on construction sites with my dad. If I've done well in my career, it was from watching him operate - managing huge construction crews, how he figured out who to put on what tasks, handling surprises, setbacks, all that stuff

My dad (now retired) was always super practical about stuff. He'd tell me pretty nonchalantly things like "yeah we're dealing with xyz constraint, we may have to cut a corner over here, but that's ok", when I asked him about it he gave me a little spiel that you can be thoughtful about how you do things, including when you can cut a corner and more importantly, what corners are ok to cut.

I really took that to heart - especially the "be thoughtful about the corners you cut"

If an LLM has consistently one shotted certain tasks and they are rote/mechanical - not reviewing that code is probably ok.

Are you getting lazy and not reviewing stuff that should be reviewed even if a human wrote it? That's probably not ok

I can live with some basic code that broke because it used outdated syntax somewhere (provided the code isn't part of a mission critical application), but I can't live with it fucking up JWT signing etc

gverrilla 11 days ago |

"Code quality" was always a mirage imo. Logic is what matters. I've used the internet from the early days, and probably 99% of software I used always had serious bugs. Ultima Online was mentioned in HN recently: it was a real bug-and-exploit-fest. Banks, AAA games, companies like Uber with 1000's of engineers - they all had serious problems (and that's still true). It would be worse if some engineers didn't have that drive to code in high quality, but we gotta admit that was not ever enough. Even now with Claude Code, I see a lot of "specifications" that are far from specified enough - and people blame the LLM.

Sparkyte 11 days ago |

The problem with vibe coding getting closer is that agentic output has a very plasticky, samey feel unless you work with something that makes it unique or can pass a template through it.

scuff3d 10 days ago |

That's because "agentic engineering" by and large is a term made up to make people feel better about the fact that they're just vibe coding.

Havoc 11 days ago |

Never really bought that there was a clean distinction.

To me it’s a spectrum with varying levels of structure provided, review etc.

Basically oneshot vibes on one side, fully hand coded on other.

tim-projects 10 days ago |

> Here are some of my highlights, including my disturbing realization that vibe coding and agentic engineering have started to converge in my own work.

Nothing about this should be disturbing unless you want to dig your heels in, cross your arms and refuse to adapt.

AI is a massive opportunity. But if people focus on the issue of the 'change' they simply waste time they could (and should) be spending on integrating it correctly.

I believe that this form of resistance is far more stagnating and dangerous than any of the issues that come with the general onslaught of ai integration.

hirvi74 11 days ago |

I'd be lying if I said I was not worried about the future. I am not necessarily worried in the sense that there will be some grave, impending doom that awaits the future of humanity.

Rather, I just feel like I have to constantly remind myself of the impermanence of all things. Like snow, from water come to water gone.

Perhaps I put too much of my identity in being a programmer. Sure, LLMs cannot replace most of us in their current state, but what about 5 years, 10 years, ..., 50 years from now? I just cannot help but feel a sense of nihilism and existential dread.

Some might argue that we will always be needed, but I am not certain I want to be needed in such a way. Of course, no one is taking hand-coding away from me. I can hand-code all I want on my own time, but occupationally that may be difficult in the future. I have rambled enough, but all in all, I do not think I want to participate in this society anymore, but I do not know how to escape it either.

cortesoft 11 days ago | |

If you work in any new technology field, the chances that your job will exist in the same way 50 years from now are very small.

The job, as you have done it at least, was also not here 50 years before you started doing it.

Did you have any of the same feelings knowing that you were doing a job that has not existed in the world very long? That seems like a strange requirement for a meaningful job, that it should remain the same for 50+ years.

In truth, our world and what we do for our careers is entirely shaped by the time that we live in. Even people that ostensibly do the same thing people have done for centuries (farmer, teacher, etc) are very different today than 100 years ago.

mattlangston 11 days ago |

Software engineering is software engineering.

An ace software engineer is not an ace because of tooling.

It's not the plane, it's the pilot, or something like that.

bigger_fish 11 days ago |

Totally agree. The sales pitch is that anyone can use this stuff, but good output is only obtained via thorough understanding.

dyauspitr 11 days ago |

I still don’t get what agentic engineering is. Isn’t it all just asking the same LLM what you want it to do?

devoria 10 days ago |

The thing I've been thinking about: agentic engineering still gives you per-step verification.

criddell 11 days ago |

Agentic engineering? That reads to me a little like amateur oncologist. How are you defining engineering?

Can agentic engineers adhere to a similar code of ethics that a professional engineer is sworn to uphold?

https://www.nspe.org/career-growth/nspe-code-ethics-engineer...

vehemenz 11 days ago | |

The problem of calling what most of us do "engineering" predates LLMs by a good 15-20 years.

senko 11 days ago | |

> Can agentic engineers adhere to a similar code of ethics that a professional engineer is sworn to uphold?

Can software engineers?

rglover 11 days ago | |

Yes. I do "agentic engineering," primarily using Cline as it allows me to gas-and-brake the AI and review what it's doing on a granular level. So, think pair programming but my #2 is an LLM. I routinely reject turns when a given model goes off into space. I also routinely make hot edits to its changes before advancing, several times per day.

You can use these tools wisely without letting them run carelessly unverified.

resters 10 days ago |

Agentic engineering is when you go from vibes to trust. It's much like how one feels about a brand new, unproven, newly hired human team member vs a trusted team member one has worked with for years.

overgard 11 days ago |

I can't really say I agree with this, although I also hate the phrase "agentic engineering".

I'm working on a licensing system for a product I'm building. I've used Claude a little bit to help out with it, but it's also made a lot of very dumb decisions that would have large (security!) consequences if I didn't catch them. And a lot of them are braindead things, like I asked it to create a configurable limit on a certain resource for the trial version of the application. When I said configurable, I mostly meant: put the number in a constant so I can update it later. What Claude thought I asked was "make it so the user can modify the limits of the trial version in the settings panel" (which defeats the entire purpose of a free trial!). Another thing it messed up recently: I was setting up email-magic-link authentication. It defaulted to creating an account for anyone that typed in an email, which could allow a bad actor to either spam people with login requests (probably getting me kicked off Resend) or create a lot of bogus accounts.

These things do not think. You cannot outsource your thinking to them.
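
A minimal sketch of the two behaviours described above, not the commenter's actual code. Every name here (`TRIAL_MAX_PROJECTS`, `can_create_project`, `MagicLinkService`) and the limit value are hypothetical, and the in-memory class only stands in for a real account store and mail provider. It assumes Python 3.9+.

```python
from dataclasses import dataclass, field

# Hypothetical value: the trial limit lives in a code-level constant,
# not in any user-editable setting.
TRIAL_MAX_PROJECTS = 3


def can_create_project(is_trial: bool, current_count: int) -> bool:
    """Return True if the user may create another project."""
    if not is_trial:
        return True
    return current_count < TRIAL_MAX_PROJECTS


@dataclass
class MagicLinkService:
    """Toy in-memory stand-in for a magic-link login flow (assumption)."""

    known_emails: set[str] = field(default_factory=set)
    outbox: list[str] = field(default_factory=list)

    def request_login(self, email: str) -> str:
        normalized = email.strip().lower()
        # Only queue mail for existing accounts; never create one implicitly.
        if normalized in self.known_emails:
            self.outbox.append(normalized)
        # Same reply either way, so the response does not reveal
        # whether an address has an account.
        return "If that address has an account, a login link has been sent."
```

This leaves out per-address and per-IP rate limiting, which a real flow would likely also need to address the spam concern.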

tannerr_dev 10 days ago |

Was unaware they were separate or different in the first place

mentos 11 days ago |

Why is it one or the other and not one THEN the other?

a456463 10 days ago |

Hot take: most people are shit at writing code or logic. We are just going to see more of this vibe coding. This is exposing the bad coders more than anything else. Everything to do with preventing and stabilizing vibe code is what we already had to do on a longer timescale; now we have to do it a lot more and faster.

rotis 11 days ago |

> my disturbing realization that vibe coding and agentic engineering have started to converge in my own work.

>I firmly staked out my belief that “vibe coding” is a very different beast from responsible use of AI to write code, which I’ve since started to call agentic engineering

Disturbing? Really? I admit I don't do agentic and am going only by vibes, but for me agentic engineering is basically vibe coding in an automated loop with some ornamentals. They both stem from the same LLM root and positioning them as significantly different is weird and unconvincing to me. There may be merit to this article (I gave up after a few sentences), but I reject this specific premise.

coldtea 11 days ago | |

>They both stem from the same LLM root and positioning them as significantly different is weird and unconvincing to me.

It's the difference between caring and not caring.

rotis 11 days ago | | |

Caring about what? I could slap together an application and say I vibe coded it or I could equally claim I agentically engineered it. No one could tell the difference (if there is any) without seeing the code. The only thing you could say is that I used an LLM. And that is what is happening. Most of the code that is "engineered" we don't get to see. So who knows what is really going on there and what the actual result is?

wiseowise 11 days ago |

> I’m starting to treat the agents in the same way. And it still feels uncomfortable, because human beings are accountable for what they do. A team can build a reputation. I can say “I trust that team over there. They built good software in the past. They’re not going to build something rubbish because that affects their professional reputations.”

This is the most important part, and it's why slop isn't the same as code written by someone else. The model doesn't care, it just produces whatever it is asked to produce. It doesn't have pride, it doesn't have ego, it doesn't have artisanal qualities, it doesn't have ownership.

kensai 11 days ago |

No offense, but it feels to me the author writes this piece to convince himself. I am afraid he is right. But the bottom line is the same: vibe coding, agentic engineering, everything AI-related comes for our jobs.

Slash32 11 days ago |

Still thinking about LLMs

kushalpatil07 11 days ago |

Every time I do deep work and think of solutions to a complex problem, I always have the opportunity to ask Claude to implement a sub-par AI slop solution.

Do this enough times, and I will have forgotten how to think.

dev360 11 days ago | |

Or, you just explain the solution and save some typing and get the same thing. I find it refreshing to be able to just talk to Claude and have it generate the same thing I would have built. It gives me more time to articulate and solve complex problems, and less time with the mundane writing, test loops etc.

eddiewithzato 11 days ago | |

That’s why I like the term “mind virus” for AI. Humans always go for the shortest path.

Groxx 11 days ago |

I mean... yeah? Isn't it obvious that they're essentially the same thing, but one thinks they're in a higher class than the other?

Fast feedback loops and delegating tasks to sub-agents have been pretty common for vibers since well before they were canonicalized by agenteers. Same thing, different day, hardly even any difference in quality: they evolve together, though vibe tends to lead and agents follow and refine... which vibers then use too.

If you think of vibe coders as agentic alpha testers it makes a lot more sense.

QuantumNomad_ 11 days ago |

People in the future are going to wonder what the hell we were thinking, when 30 years down the line everything is a hot mess of billions of lines of LLM-generated code, almost none of which any human has read and which no one can maintain anymore, with or without LLMs. And the LLM-generated garbage will have drowned out all of the good quality code that ever existed, and no one will be able to find even human-written code on the internet anymore.

Makes me want to just give up programming forever and never use a computer again.

treespace8 11 days ago |

I feel like an outlier in all of this. But isn't this just more AI slop? How is this different from text generation or image generation?

Like many people I have used AI to generate crap I really don't care about. I need an image. Generate something like, whatever. Great, hey, a good looking image! Now that's done I can do something I find more interesting to do.

But it's slop. The image does not fit the context. It's just off. And you can tell that no one really cared.

This isn't good.

simonw 11 days ago | |

The difference is that coding agents can run the code that they produce, fix any bugs, build tests and generally demonstrate that it works.

You can't do that for images and text.

_rwo 11 days ago |

> But I’m not reviewing that code (...)

That's the spirit, I always say - _others_ will deal with AI slop during code review. Eventually they will get tired and start 'reviewing' this AI stuff with AI - so it's a win win. Right?

Fokamul 11 days ago |

Reminder, cybersecurity will be huge in the following years.

Companies are shipping things and nobody understands what they're shipping.

andy_ppp 11 days ago |

Honestly, I think the need for devs is total copium. The progress made in two years is astounding, and in two years' time these models will be better at programming than 99% of programmers. It’s incredible what they can do now. No, it’s not perfect, but imagine where we’ll be in 5 or 10 years.

bamboozled 11 days ago | |

All of those out-of-work radiologists would agree /s

fzzzy 11 days ago |

man i love this post

0gs 11 days ago |

huh. i honestly never thought they were all that different. didn't the same guy coin them both to refer to the same thing?

simonw 11 days ago | |

Not at all. Andrej Karpathy coined vibe coding as: https://twitter.com/karpathy/status/1886192184808149383

> where you fully give in to the vibes, embrace exponentials, and forget that the code even exists [...] It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.

So clearly we need a term for what happens when experienced, professional software engineers use LLM tooling as part of a responsible development process, taking full advantage of their existing expertise and with a goal to produce good, reliable software.

"Agentic engineering" is a good candidate for that.

dev360 11 days ago | | |

> as part of a responsible development process, taking full advantage of their existing expertise and with a goal to produce good, reliable software

It's shifted so much for me. I used to think that I had a solemn duty to read every line and understand it, or to write all the test cases. Then I started noticing that tools like CodeRabbit or Cursor would find things in my code that I would rarely find myself.

I think right now, it's shifted my perception of my role to one where I am responsible for "tilting" the agentic coding loop; ultimately the goal is ensuring the agent learns from its mistakes, self-organizes and embraces a spirit of Kaizen.

Btw thank you for your work on Django, last 20 years with it were life changing (I did .NET before).

0gs 10 days ago | | |

https://x.com/karpathy/status/2019137879310836075

jonahs197 11 days ago |

What the F is "agentic" really?

zuzululu 11 days ago |

Vibe coding is just coding now. Writing assembly used to be a thing too until higher and higher level languages were created. An LLM is like that except it compiles English to code. This scares a lot of professionals, understandably.

drfloyd51 11 days ago |

It is pure arrogance to expect that machines will never be able to code as well as a skilled human.

And AI generated code should be different than human code. AI has infinite memory for details. AI doesn’t need organizational patterns like classes. Potentially AI can write code that is more performant than any human.

Will it look like garbage? Sure. Will the code be more suited to the task? Yes.

tuom1s 11 days ago | |

What will happen when AI companies increase the price of tokens?

The code produced will only be understandable by AI. You could use locally hosted LLMs, but they won't be as performant as AI run by the big guys. And there is nothing stopping greedy companies from implementing some ridiculous pattern that only their model can reasonably work with.

So what will you do in a situation where you can't understand "your" codebase and you have to make changes or fix a bug?

pylua 11 days ago | | |

Eventually I would bet on AI using its own non-human-readable languages (brains?) to program in, to reduce overhead.

It will be a black box, and the code will be generated just in time by AI for each API request.

jnwatson 11 days ago | | |

What happens when the price of tokens goes to 0?

The open weight models are nipping at the heels of frontier models. The frontier labs have to make forward progress and keep tokens cheap in order to maintain market share.

Eventually, we'll have a Mythos-level model running on integrated hardware on every PC.

drfloyd51 10 days ago | | |

That is a pricing problem. And it is an absolute risk. That doesn’t change AI’s potential to be a better coder than 98-100% of everyone.

platevoltage 11 days ago | | |

I think this is going to happen sooner than most people think.

jazzypants 11 days ago | |

I find it hard to believe that code with unnecessary cruft and repetition is "more suited to the task". I've literally deleted hundreds of unnecessary or unused functions at this point. The only way I can agree is if "more suited" means, "it's wearing multiple suits for no reason".

vehemenz 11 days ago | |

I would only add one caveat to this:

Code that is organized well and operates coherently in the first place, by an LLM or not, will be easier to iterate on, by an LLM or not.

tyyyy3 11 days ago | |

Your post reeks of pure arrogance. You sound like the bozos at Anthropic who made an AI agent for finance and think this is somehow going to provide a huge productivity boost because all they do is a bunch of tick boxing and spreadsheet work.

No, just no.

xienze 11 days ago |

> And that feels about right to me. I can plumb my house if I watch enough YouTube videos on plumbing. I would rather hire a plumber.

I don't buy this argument at all. I think if we could pay $20/month to a service that would send over a junior plumber/carpenter/electrician who had an encyclopedic knowledge of the craft and did the right thing the majority of the time, and whom we could observe and direct, we'd all sign up for that in a heartbeat. Worst case, you have to hire an experienced, expensive person to fix the mess. Yes, I can hear everyone now, "worst case is they burn your house down." Sure, but as we're reminded _constantly_ when we read stories about AI agent catastrophes -- a human could wipe your prod database too. wHy ArE yOu HoLdInG iT tO a DiFfErEnT sTaNdArD???

The business side of the house is getting to live that scenario out right now as far as software goes. Sure you've got years of expertise that an LLM doesn't have _yet_. What makes you think it can't replace that part of your job as well?