So we tried eliminating human code review by default with a new spec-test based process that has one critical rule: if code passes the tests, it ships.
Yes reliability is only as good as the specs, so that's where the human expertise is now: writing specs and overseeing the agent instead of typing code.
We developed a couple new open source tools Hot Sheet¹ (new IDE - agent command center + harness) and Glassbox² (agent feedback focused code review UX) to help with this.
Honestly, writing specs well for complicated things takes about as long as writing code. You still have to cover the edge cases, now in English instead of code. And it might even be more error-prone, since English is more ambiguous than code.
I'm curious if anyone else has tried this and found it doesn't save time. My bet is human review won't be a gate forever but human eng time shifts from code writing, to spec writing, and ultimately allows for faster development. And the AI will be better at finding bugs to prevent that eg Mythos.
[1] https://www.npmjs.com/package/hotsheet [2] https://www.npmjs.com/package/glassbox