I've been implementing the WebNN Web Standard on a personal Servo branch with AI coding assistance (main bans AI), and switched to GPT 5.4 this week. It introduced a race condition that compiled cleanly, passed tests, and would have been hard to catch if calling patterns had ever changed — and separately produced over-engineered architecture that took several iterations to unwind. Both episodes say something interesting about where agentic coding actually breaks down. Article includes the conversation transcripts.