Why are executives enamored with AI, but ICs aren't?(johnjwang.com) |
Narrator: there is not
But because time is money, I think all the benefits go to the dev. The exec still needs the dev regardless
It accomplished this not simply by eliminating my overpaid bullshit job as parasite attractor; but by putting an end to its pathetic semblance of a premise: building software to be used by, uh, someone? for, uh, something?
The various entities requesting the work (or, in later years, the layers of barely-sentient intermediaries between me and said entities) were hardly if ever clear on how exactly this was supposed to produce value; but now they're free, too! Free from having to even try to understand how answering that question is relevant, emdash - so in the end it worked out for them as well!
I am finally at liberty to do something worthwhile with my life, and while at this point I realize it'll take me some time to even remember what "worthwhile" even was (or whether such a thing still exists in your imaginary world of personalized sensory bubbles), I do sleep a rich REM sleep knowing society is now capable of digging its own grave without my assistance. Seriously, I was looking at my bank account and getting a little worried.
I am told that mine is a minority position: if you happen to be the kind of person who believes that more is better, no matter more of what, rest assured you and your eventual progeny will be quite safe - for a while, anyway - in your new role as AI trainer (or is it AI fodder, let's let the market decide!)
Well, turns out when we are all busy looking the part, it becomes impossible for anyone to actually play the part; but also nobody notices, so this is fine too!
Just one request on my part: if possible, do shut up while figuring out how to better turn yourself and our world into paperclips, alright? Besides the ones that you recognize as people, a whole bunch of other people do live on this here planetation - and I hear they find all the AI blather to be mighty annoying.
Ha! Apparently the author hasn't been asked "how long will it take to code this?" yet... And isn't a common developer complaint that management does not know how to evaluate them, and substitutes things like how quickly a task gets completed, with the result that some guy looks amazing while his coworkers get stuck with all his technical debt?
- You ask someone to do it
- You check their work and they made some mistakes, but it's good enough to use
- You ultimately don't know if they're doing the best at their job but you have regular performance check-ins to be safe
As ICs we can complain all we want about the quality of AI, but as far as your manager goes - you using AI is not that much different to them having an employee.
e: typo
It makes me think of an executive I once reported to who “increased velocity” by changing the utilization rate on a spreadsheet from 75% to 80%.
embedded/cloud/IoT --> AI --> quantum…
When the company originally known as C3 Energy changes their name to C3.quantum, you'll know we're on to the next buzzword.
Curious how you verify this behavior would be unique to the West?
I seriously doubt Satya Nadella is sitting down for hours a day to use Copilot to draft detailed documents. He's being fed fantastical stories by his lackeys telling him what he wants to hear.
I’m neither a developer nor an executive, but from my vantage point the software crisis has to do with the fact that software development presents an existential risk to any organization that engages in it. It seems to be utterly resilient to estimation, and projects can run late by months or even years with no good explanation except “it’s management’s fault.” This has been discussed at length. If I had a good answer, “I wouldn’t still be working here” as the saying goes. But half a century after The Mythical Man Month, it still reads like it was written yesterday, and “no silver bullets” seems to ring true.
In my view, the software crisis will be resilient. Throwing more code, or more code per day, at a late project will make it later. There will be a grace period while the pace of coding seems exciting, but then the reality will set in: “We haven’t shipped a product.” And it will be management’s fault.
Not even sure if determinism is a good axis to analyze this problem. Also smells extremely like concept creep - do you mean "moving up the abstraction stack" as "non determinism" too?
When you analyze this as "Management loves AI" and "workers hate it", it goes completely back to 'who owns the means of production?', and can be clearly seen within Marx's critique.
IC can refer to people leading, without direct reports, making 500k+ in comp.
How? Marx's critique doesn't land here at all.
Executives see this as a way to replace labor.
The labor sees themselves being replaced.
This is a story as old as the hills.
It doesn't help that the west has a clear bias wherein moving "up" is moving away from the work. Many executives often don't know what good looks like at the detail level, so they can't evaluate AI output quality.
I think another part of it is that AI tools demo really well, easily hiding how imperfect and limited they are when people see a contrived or cherry-picked example. Not a lot of people have a good intuition for this yet. Many people understand "a functional prototype is not a production app" but far fewer people understand "an AI that can be demonstrated to write functional code is not a software engineer" because this reality is rapidly evolving. In that rapidly evolving reality, people are seeing a lot of conflicting information, especially if you consider that a lot of that information is motivated (eg, "ai is bad because it's bad to fire engineers" which, frankly, will not be compelling to some executives out there). Whatever the new reality is going to be, we're not going to find out one step at a time. A lot of lessons are going to be learned the hard way.
Yes, and they work really well for small side projects that an exec probably used to try out the LLM.
But writing code in one clean discrete repo is (esp. at a large org) only a part of shipping something.
Over time, I think tooling will get better at the pieces surrounding writing the code though. But the human coordination / dependency pieces are still tricky to automate.
I'm (mildly) excited by LLMs because I love a new shiny tool that does appear to have quite some utility.
My analogy these days is a screwdriver. Let's ignore screw development for now.
The first screwdrivers, which we still use, are slotted and have a habit of slipping sideways and jumping (camming out). That's err before LLMs ... something ... something.
Fast forward and we have Philips and Pozi and electric drivers. Yes there were ratchet jobs, and I still have one but the cordless electric drilldriver is nearly as magical as the Dr Who sonic effort! That's your modern LLM that is.
Now a modern drilldriver can wrench your wrist if you are not careful and brace properly. A modern LLM will hallucinate like a nineties raver on ecstasy but if you listen carefully and phrase your prompts carefully and ignore the chomping teeth and keep them hydrated, you may get something remarkable out of the creature 8)
Now I only use Chat at the totally free level but I do run several on-prem models using ollama and llama.cpp (all compiled from source ... obviously).
I love a chat with the snappily named "Qwen3.5-35B-A3B-UD-Q4_K_XL" but I'm well aware that it is like an old school Black and Decker off of the noughties and not like my modern De Walt wrist knackerers. I've still managed to get it to assist me to getting PowerDNS running with DNSSEC and LUA and configuring LACP and port channel/trunking and that on several switch brands.
You?
I really think a lot of folks were conned by a smooth operator and a polished demo, so now everyone has to suffer through having this nebulous thing rammed down our throats regardless of its real utility because people with higher pay grades believe it has utility.
It feels like a lot of “AI is inevitable; you are failing to make this abundant future inevitable by your skepticism.”
Like what - the world's most advanced blowjob?
For the record, all your prompts are tracked and easily viewable by whoever oversees it at your company. Don't prompt more than you have to and certainly don't give it your best ideas. This is value at scale.
Yes, we have craftsmanship, but at the end of the day everything is ephemeral and impermanent and the world continues on without remembering us.
I think both the IC and executive are correct in superposition.
I think that the simple explanation for why executives are so hyped about AI is simply that they're not familiar with its severe current limitations. For example, Garry Tan seems to really believe he's generating 10KLOC of working code per day; if he'd been a working developer he would have known he isn't.
ICs dislike this because it raises expectations and puts the spotlight on delivery velocity. In a manufacturing analogy, it’s the same as adding robots that enable workers to pack twice as many pallets per day. You work the same hours, but you’re more tired, and the company pockets the profits.
Software Engineers are experiencing, many for the first time in their careers, what happens when they lose individual bargaining power. Their jobs are being redefined, and they have no say in the matter - especially in the US where “Union” is a forbidden word.
The more appropriate tools for ICs are torches and pitchforks.
Why is this supposed to be a good thing?
I _could_ do my job without AI, but it would take twice the time and I would feel miserable having to type out every single character like a caveman.
Just AI auto complete alone is a massive life changer. It reduces my typing at least in half, and is highly accurate to what I want to write.
Me (and my friends similarly) inspect code indirectly now - telling agents to write reports about certain aspects of the code and architecture etc.
ICs worry about doing their job (either doing it well because they care about their craft, or doing it good enough because they need to pay bills). AI doesn't really promise them anything. Maybe they automate some of their tasks away, but that just means they will take on more tasks. For practically any IC, there is no increase in wealth nor reduction in labor time. There is only a new quiet lingering threat that they might be laid off if an executive determines they're not needed anymore.
That's the difference in enthusiasm about AI.
Thoughts and idea as in "I will implement this in this structure, with these tradeoffs, and it will work with these 4 APIs and have no extra features and here's how I'm (or LLM with tools is) going to run it and test it".
Thoughts and idea not as in "build facebook" - a lot of people think AI can do that, it won't (but might pretend to) and it will just lead to failure.
My competitive edge did not diminish, it expanded.
Reality check: LLMs are available to everyone, dev or otherwise, so your 'competitive edge' is indeed diminished if you believe LLMs are all that.
I believe that it's pretty close to the article thesis, just more prosaic.
And yes, the AI works great for some programming tasks, just not for everything or completely unsupervised.
(Yeah, I know, there's lots of instances of execs who got paid huge amounts of money and delivered abysmal results...)
That said, the central point of the TFA is spot-on, though it could be made more generally, as it applies to engineering as well as management: uncertainty rises sharply the higher you climb the corporate and/or seniority ladder. In fact, the most important responsibility at higher levels is to take increasing ambiguity and transform it into much more deterministic roles and tasks that can be farmed out to many more people lower on the ladder.
The biggest impact of AI is that most deterministic tasks (and even some surprisingly ambiguous ones) are now spoken for. This happens to be the bread and butter of the junior levels, and is where most of the job displacement will happen.
I would say the most essential skill now is critical thinking, and the most essential personality trait is being comfortable with uncertainty (or as the LinkedInfluencers call it, "having a growth mindset.") Unfortunately, most of our current educational and training processes fail to adequately prepare us for this (see: "grade inflation") so at a minimum the fix needs to start there.
Developers use it for grokking a codebase, for implementing boilerplate, for debugging. They don't need juniors to do the grunt work anymore, they can build and throw away, the language and technology moats get smaller.
The value of low level managers, whose power came from having warm bodies to do the grunt work, diminishes.
The bean counters will be like when does it pay for itself. Will it? IDK, IDC.
I know there's an attempt to shift the development part from developers to other laypeople, but I think that's just going to frustrate everyone involved and probably settle back down into technical roles again. Well paid? Unclear.
Look, I know that we like poking fun at some people but generally I haven't seen execs saying this.
Executives do not need actively functional systems from AI to help with their own daily work. Nothing falls over if their report is not quite right. So they are seeing AI output that is more complete for their own purposes.
But also, AI is good enough to accelerate software engineering. To the degree that there are problems with the output, well, that's why they haven't fired all the engineers yet. And executives never really cared about code quality -- that is the engineers' problem.
What I'm trying to build for my small business client right now is not engineering but still requires some remaining employees. He's already automated a lot of it. But I'm trying to make a full version of his little call center that can run on one box like an H200, which we can rent for like $3.59/hr. If I remember correctly, that is approximately the cost of one of his Filipino employees.
Where we are headed is that the executives are themselves pretty quickly going to be targeted for replacement. Especially those that do not have firm upper class social status that puts them in the same social group as ownership.
But I will insist that executives are more driven by FOMO than a teenager.
If you are not, you either have a boring job or do not have any ideas that are worth prototyping asynchronously. Or haven't tried AI in the last ~3 months.
But I suppose it depends on what you consider fun. I genuinely know people who love to meticulously write many many unit tests. I think that's great as a craft, but you probably can not expect to get paid for it, similar to how you likely can not be profitable by selling handmade shirts now unless you are already independently wealthy and well known.
For non-technical people, the current meteoric rise of AI is due to the fact that AI is generally synonymous with "it can talk". It has never _really_ registered with the wider audience that image recognition, or various filters, or whatever classifiers they could have stumbled upon are AI as well. What we have, now, is AI in the truest sense. And executives are primarily non-technical.
As for the technical people, we know how it works, we know how it doesn't work, and we're not particularly amused.
For executives, that's writing code. For ICs, it's other stuff.
It’s like Marc Andreessen bloviating about how AI will replace everyone except him.
To be fair, some of this is understandable. At some level, you’re just going to see some things as a bullet point in a daily/monthly/quarterly report and possibly a 10 minute presentation. You’re implicitly assuming that the folks under you have condensed this information into something meaningful.
It's honestly insane that they think this.
They really don't understand that they're building something they cannot possibly control, if it turns out to be what they're envisioning.
On top of that, places like Amazon extol the virtues of only working on projects that can be completed with entirely fungible staffing and Google tries ever so hard to electroplate this steaming turd of an ideology with iron pyrite calling fungibles "generalists."
So along come AI coding agents, which I love as an IC because they excel at tedious work I'd rather not have to do in the first place, yet I get why others see them as a threat. But I really think they're no more of a threat than any other empty promise to cut costs with the silver bullet of the month and we just have to let the loudmouths insist otherwise until the industry figures out this isn't a magic black box. They never learn, do they? Maybe their jobs depend on never learning.
Meanwhile executives see the money related numbers go up.
In my systems programming job ICs have mostly avoided it because we don't have time to learn a new thing with questionable benefits. A lot of my team are really, really good programmers and like that aspect of the job. They don't want to turn any part of it over to a machine. Now if a machine could save us from ever dealing with Jira...
That said, I have begun using AI for some things and it is starting to be useful. It's still 50/50 though, with many hallucinations that waste time but some cases where it caught very simple bugs (syntax or copy/paste errors). I think the experience of, say, systems programmers is very different vs Python/web folks though. AI does a great job for my helper scripts in Python.
Management needs to take their own medicine though. They continue to refuse to leverage AI to do things it could actually be good at. I give a duplicate status to management 3x/week now. Why? AI could handle tracking and summarizing it just fine. It could also produce my monthly status for me.
I was explaining this to my wife, who asked why the CEO doesn't understand the limitations and the drawbacks the programmers are experiencing. And I said: he doesn't care, because he's looking at what other businesses are doing, what they're writing about in Bloomberg and WSJ, what "industry best practice" is, and where the money is going. Trillions of dollars are going into revolutionizing every industry with AI. If you're a CEO and you're not angling to capture a piece of that, then the board is going to have some serious questions about your capability to lead the company. Executives are often ignorant of the problems faced by line workers in a way perhaps best explained by a particular scene from Swordfish (2001): "He lives in a world beyond your world..." https://www.youtube.com/watch?v=jOV6YelKJ-A The complaints of a few programmers just don't matter when you have millions or billions of capital at your command, and business experts are saying you can tenfold your output with half the engineering workforce.
Right now there are only two choices for programmers: embrace generative AI fully and become proficient at it. Instead of surfacing problems with it, offer solutions: how can we use AI to make this better? Or have a very, very hard time working in the field.
I understand that developers feel their code is an art form and are pissed off that their life’s work is now a commodity; but, it’s time to either accept it and move on with what has happened, specialize as an actual artist, or potentially find yourself in a very rough spot.
E.g. when Jensen Huang said that you need to pair your $250k engineer with $250k of tokens.
It massively boosts my efficiency as just reading the code myself would take days.
And with LLMs also more context and token usage and cost.
The manager class sees worker units as fungible.
No, they are captured disproportionately by the haut bourgeois capitalists. The two groups overlap to an extent (when major capitalists are nominally employed by a firm they invest in, it is usually as an executive), but executives qua executives (that is, in their role as top level managerial employees) are not the main beneficiaries of increased productivity.
After that programmers fell into the situation you are describing - relatively high bargaining power and salaries. Hopefully now with the push for AI we will finally see another pro-labor organisation effort!
It isn't this. This is the executive's misinterpretation.
They just create even more slop currently, which will be the case until someone realizes they aren't needed to produce slop at all.
And plenty of prolific programmers are writing publicly about their AI use.
I find people tend to omit that on HN and folks dealing with different roles end up yelling at each other because those details are missing. Being an embedded sw engineer writing straight C/ASM is, for instance, quite different from being a frontend engineer. AI will perform quite differently in each case.
It's too easy to add bloat and complexity that can never go away, and with the tooling we now have, a significant portion of engineers are now an active risk to the projects they are working on.
I'm talking about adding code to a code base. After a certain size, it's mostly garbage. I subscribe to the old Unix principle and I have yet to see how LLMs have changed it; in fact it has become much more important now, as a coding agent will happily add any crap to any repository.
> But we're the accounting team?
> Doesn't matter. This is a SD-Native company now. We believe everyone can be more productive with an SD-based workflow.
Because that also sounds real bad!
The other 20% is writing: policies, SOPs, audits, grants, performance reviews, etc.
I could probably automate over half my job in n8n in a weekend… hmm… actually might try that.
- needs a digest of where chat activity is hottest. Maybe he lurks, occasionally he gets into conversations about what’s going on in another department.
- needs some warning if the Microsoft systems are under attack or strain. The Linux systems have not needed attention; the jargon is unfamiliar.
- occasionally brings up hypothetical radical changes in strategy. I think of these as multivariate tests. Maybe my reply, “Plenty of Kubernetes developers available right now,” might communicate that some small team would be ahead of us on some solved problem.
I’m surprised that:
- he has no concern that competition even exists. No awareness that our competition demos at conferences; why they’d choose to spend time that way.
- no interest in the big accounts we don’t have. If it would take a big lift, what would engineering need? If it would take a small lift, what non-engineering is blocking? No interest.
- person-to-person networking is effective at all. I just can’t imagine any value in two execs meeting without hours of preparation.
I’ve seen BI tooling around each of these. I wonder if a daily “facts of our department” slide to begin each meeting, if that would replace/augment 51% of visible exec.
Another 25% of the time the tests are wrong in some other way. Usually mocking something in a way that doesn't match reality.
And maybe 5% of the time Claude does some testing that requires a database, it will find some other database lying around and try to use that instead of what it's supposed to be doing.
And even if Claude writes a correct test, it will generally have it skip the test if a dependency isn't there--no matter how fervently I tell it not to.
If you're not looking at the code at all, you're building a house of cards. If you're not reading the tests, you're not even building; you're just covering the floor in a big sloppy pile of runny shit.
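One way to catch the skip-on-missing-dependency pattern mechanically is to make the test run fail whenever anything was skipped. This is only a sketch using the standard `unittest` module: `run_strict`, `SkippingTest`, `PassingTest` and the dependency name are hypothetical illustrations, not anything from the thread or from Claude's output.

```python
import importlib.util
import io
import unittest

# Hypothetical dependency name, chosen so that it should be missing everywhere.
MISSING = "no_such_dependency_for_demo"


class SkippingTest(unittest.TestCase):
    # The pattern complained about: quietly skip when the dependency is absent.
    @unittest.skipUnless(importlib.util.find_spec(MISSING), "dependency missing")
    def test_needs_dependency(self):
        self.fail("never reached while the dependency is missing")


class PassingTest(unittest.TestCase):
    def test_arithmetic(self):
        self.assertEqual(2 + 2, 4)


def run_strict(case: type[unittest.TestCase]) -> bool:
    """Run one TestCase class; succeed only if nothing failed AND nothing was skipped."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    # Send the runner's report to a throwaway buffer instead of stderr.
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.wasSuccessful() and not result.skipped
```

With a gate like this, `run_strict(SkippingTest)` reports failure even though plain `unittest` would call the run successful, so a silently skipped test shows up in CI instead of hiding there.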
You probably could have s/Claude/Human/ in your rant and been just as accurate. I don't know how many times I've flagged these issues in code reviews. And that's only assuming the human even bothered to write tests...
What I find is that when I ask AI to write tests it writes too many, and I agree with you that a lot of them are useless. But then I just tell it that, and it agrees with me and cleans it up. Much faster feedback loop and much better final result.
I feel like people that look at a poor result and stop there and conclude it's useless have made up their mind and don't want to see the better results that are right in front of them if they just spend an extra 5 seconds trying.
We’re not dealing with AGI here. Tests aren’t strictly necessary for humans. They are for AI. AI requires guardrails to keep from spinning out. That’s essentially the entire premise of the agentic workflow.
> Me (and my friends similarly) inspect code indirectly now - telling agents to write reports about certain aspects of the code and architecture etc.
Doesn't this take longer than reading the code?
I can see how some of this is part of the future (I remember this article talking about python modules having a big docstring at the top fully describing the public functions, and the author describing how they just update this doc, then regenerate the code fully, never reading it, and I find this quite convincing), but in the end I just want the most concise language for what I'm trying to express. If I need an edge case covered, I'd rather have a very simple test making that explicit than more verbose forms. Until we have formal specifications everywhere I guess.
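To make that concrete, here is roughly what I picture: a module whose docstring is the spec, plus one tiny test that pins down an edge case explicitly. Everything here (the `slugify` function, its rules, the test name) is a made-up example, not taken from the article I mentioned.

```python
"""Spec (hypothetical): slugify(title) -> str

- lowercases the title
- replaces each run of characters outside a-z and 0-9 with a single "-"
- strips leading and trailing "-"
- returns "" when the title contains no ASCII letters or digits
"""
import re


def slugify(title: str) -> str:
    # The body is the part that would be regenerated from the docstring above.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def test_only_punctuation_gives_empty_slug():
    # The edge case, stated as concisely as I know how to state it.
    assert slugify("!!!") == ""
```

The one-line test says more to me, faster, than a paragraph in a report about how punctuation-only input is handled.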
But maybe I'm just not picturing what you mean exactly by "reports".
I do read the tests (quickly, I admit) and so does OP:
> Architecture overview sure, and testing yes, but not reading the code directly any more.
Reading that again I may have misunderstood what they meant by "testing yes", though.
You can get Claude to write good tests but based on what I’m seeing at work that’s not what’s happening. They always look plausible even when they’re wrong, so people either don’t read them, skim them very quickly, or read the first few, assume the rest work, and commit.
I think Claude is great for testing because setting test data and infrastructure is such a boring slog. But it almost always takes a lot of back and forth and careful handholding to get it right.
I'm not being precious here or protective of my "art" or whatever. But I do find it sort of hilarious and obvious that someone on a data science team might not understand the aesthetic value of code, and I suspect anyone else who has worked on such a team/ with such a team can probably laugh about the same thing - we've uh... we've seen your code. We know you don't value aesthetic code lol. Single variable names, `df1`, `df2`, `df3`.
I'm not particularly uncomfortable at the moment because understanding computers, understanding how to solve problems, understanding how to map between problems and solutions, what will or won't meet a customer's expectations, etc, is still core to the job as it always has been. Code quality is still critical as well - anyone who's vibe-coded >15KLOC projects will know that models simply can not handle that scale unless you're diligent about how it should be structured.
My job has barely changed semantically, despite rapid adoption of AI.
Most people consider aesthetic values to align with these things.
https://degoes.net/articles/insufficiently-polymorphic
> My job has barely changed semantically, despite rapid adoption of AI.
it's coming... some places move slower than others but it's coming
lol this is not why people do "df1", "df2", etc, nor are those polymorphic names but okay.
> it's coming... some places move slower than others but it's coming
What is coming, exactly? Again, as said, I work at a company that has rapidly adopted AI, and I have been a long time user. My job was never about rapidly producing code so the ability to rapidly produce code is strictly just a boon.
Which parts of it exactly? I've considered for loops and if branches "commodities" for a while. The way you organize code, the design, is still pretty much open and not a solved problem, including by AI-based tools. Yes we can now deal with it at a higher level (e.g. in prompts, in English), but it's not something I can fully delegate to an agent and expect good results (although I keep trying, as tools improve).
LLM-based codegen in the hands of good engineers is a multiplier, but you still need a good engineer to begin with.
Like what about performance optimization or security analysis? Shouldn't AI be the CAD of coding tools? Idk.
Spec driven dev is good software engineering practice. It’s been cast aside in the name of “agile” (which has nothing to do with not doing docs - but that’s another discussion).
My problem is writing good specs takes time. Reviewing code and coaxing the codegen to use specific methods (async, critical sections, rwlocks, etc) is based on previous dev experience. The general perception with c suite is that neither is important now since “vibing” is what’s in.
If you look at the evolution of agent-written code you see that it may start out fine, but as you add more and more features, things go horribly wrong. Let's say the model runs into a wall. Sometimes the right thing to do is go back into the architecture and put a door in that spot; other times the right thing to do is ask why you hit that wall in the first place, maybe you've taken a wrong turn. The models seem to pick one or the other almost at random, and sometimes they just blast a hole through the wall. After enough features, it's clear there's no convergence, just like what happened in Anthropic's experiment. The agents ultimately can't fix one problem without breaking something else.
You can also see how they shoot themselves in the foot by adding layers upon layers of defensive coding that get so thick they themselves can't think through them. I once asked an agent to write a data structure that maintains an invariant in subroutine A and uses it in subroutine B. It wrote A fine, but B ignored the invariant and did a brute-force search over the data, the very thing the data structure was meant to avoid. As it was writing it the agent explained that it doesn't want to trust the invariant established in A because it might be buggy... Another thing you frequently see is that the code they write is so intent on success that it has a plan A, plan B, and plan C for everything. It tries to do something one way and adds contingencies for failure.
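To illustrate the kind of structure I mean (this is not the actual code from that session, just a minimal sketch with hypothetical names): `add` plays the role of subroutine A and keeps the list sorted, and `contains` plays the role of subroutine B and trusts that ordering to do a binary search instead of scanning everything.

```python
import bisect


class SortedBag:
    """Keeps its items in ascending order so lookups can use binary search."""

    def __init__(self) -> None:
        self._items: list[int] = []

    def add(self, value: int) -> None:
        # "Subroutine A": the only place that mutates the list, so it is the
        # only place that has to maintain the sorted-order invariant.
        bisect.insort(self._items, value)

    def contains(self, value: int) -> bool:
        # "Subroutine B": relies on the invariant, O(log n) instead of O(n).
        i = bisect.bisect_left(self._items, value)
        return i < len(self._items) and self._items[i] == value

    def invariant_holds(self) -> bool:
        # If you distrust A, verify the invariant in a test rather than
        # silently falling back to a linear scan inside B.
        return all(a <= b for a, b in zip(self._items, self._items[1:]))
```

The point is that distrust of A belongs in a check like `invariant_holds`, run from tests; once B re-derives everything by brute force, the data structure has no reason to exist.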
And so the code and the complexity compound until nothing and no one can save you. If you're lucky, your program is "finished" before that happens. My experience is mostly with gpt5.4 and 5.3-codex, although Anthropic's failed experiment shows that the Claude models suffer from similar problems. What does it say when a compiler expert that knows multiple compilers pretty much by heart, with access to thousands of tests, can't even write a C compiler? Most important software is more complex than a C compiler, isn't as well specified, and the models haven't trained on it.
I wish they could write working code; they just don't.[1] But man, can they debug (mostly because they're tenacious and tireless).
[1]: By which I don't mean they never do, but you really can't trust them to do it as you can a programmer. Knowing to code, like knowing to fly a plane, doesn't mean sometimes getting the right result. It means always getting the right result (within your capabilities that are usually known in advance in the case of humans).
> What does it say when a compiler expert that knows multiple compilers pretty much by heart, with access to thousands of tests, can't even write a C compiler?
It does not know compilers by heart. That's just not true. The point of the experiment was to see how big of a codebase it can handle without human intervention and now we know the limits. The limitation has always been context size.
> By which I don't mean they never do, but you really can't trust them to do it as you can a programmer. Knowing to code, like knowing to fly a plane, doesn't mean sometimes getting the right result. It means always getting the right result (within your capabilities that are usually known in advance in the case of humans).
Getting things right ~90% of the time still saves me a lot of time. In fact I would assume this is how autopilot also works in that it does 90% of a job and the pilot is required to supervise it.
The kind of code they write is the kind of code that will be unsalvageable after 10-50 changes. That's throwaway code, although it looks good. I don't think that's good enough for most places.
Of course, if you really take the time to slowly and carefully review what they write (which many people say they do, but the results don't look like it) you can keep the agents on course with a lot of babysitting and a lot of "revert everything you did in this last iteration".
> You have painted an awfully pessimistic picture that frankly does not mirror reality of many enterprises.
Why pessimistic? The agents are truly remarkable at debugging, and they're very good at reviews. They just can't really code. Interestingly, if you ask codex to review other codex-written code it will often show you just how bad it is, it's just that if you loop coding and review, the agents don't converge.
> It does not know compilers by heart. That's just not true.
It is true. The models can reproduce large swathes of their training material with pretty good accuracy.
> The point of the experiment was to see how big of a codebase it can handle without human intervention and now we know the limits.
What they produced was 100KLOC, which is 5-10x larger than some production C compilers, but even 100KLOC isn't a big codebase. And the amount of human intervention in that experiment was huge: humans wrote specs, thousands of tests, a reference implementation and trained the model on all of those. In most software, at least two or three of these four efforts are not realistic.
What they didn't have is close and careful supervision of every coding iteration. If you really do that - i.e. carefully read every line of plausible-looking code and you think about it - fine; if not, you're in for a nasty surprise when it's too late.
> The limitation has always been context size.
I don't buy it because human context size - especially in this case, where the model has been trained on everything - is smaller, and yet writing a C compiler isn't hard for a person to do.
> Getting things right ~90% of the time still saves me a lot of time.
They might get things right ~75% of the time when they write no more than a few hundred lines of code (unless we're talking a mechanical transformation). Anything beyond that is right closer to 10% of the time. The problem is that it works, at first, close to 90% of the time, but not in a way that will survive evolution for long. So if you're okay with code that works today but won't work a year from today, you might get away with it. I think some people are betting that the models a year from now will be able to fix the code written by today's models. Maybe they're right.
But the agents certainly save a lot of time on debugging and review. Coding - not so much, except in refactorings etc.
So, yes, I understand where you’re coming from. But that’s not what we do.