Ask HN: Does anybody still FEEL improvements between latest LLMs for coding?

4 points by sspehr 1 day ago | 7 comments

Basically the title. For me, the latest generations of LLMs feel about equal in usefulness for coding. Does anybody have anecdotes to the contrary?

muzani 16 hours ago |

In the last 12 months?

Antigravity started the workflow where you give a list of things to do and it goes off and does those things without supervision, including drafting and testing edge cases. It can even spin up images. Fable is the latest form of this workflow.

Gemini 3 Pro is actually at a mid designer level. None of the others are even at a junior human design level.

Sonnet 4.5 was, for a brief moment, creative brilliance. But we're talking coding, not writing, right? They ditched it all for 4.6.

Opus 4.6 and 4.8 are extremely good for coding. I use them to reliably go through logs that are like 15k lines long. They can read my code, map out the logs that should appear, check the logs for what actually happened, and from there form hypotheses and set up the logging needed to validate them.

Codex/ChatGPT is probably second best in all of the above.

kapperchino 1 day ago |

Ngl, after Opus 4.5 I haven’t noticed too many improvements.

HN-user2345 7 hours ago |

Those who are advanced in the field and know what they're doing can tell the differences, but for those of us who aren't that good at programming, it's basically the same ngl.

purple-leafy 21 hours ago |

Yes. Fable is insanely capable. Probably comparable to the jump from ChatGPT 3 to ChatGPT 4 for me, maybe double?

mzubairtahir 19 hours ago |

Improvements are happening in other areas, like lower token usage, speed, cost, etc.

softwaredoug 1 day ago |

I still oscillate between "I'm totally cooked, I have no role here, the AI does everything" and "WTF why is this LLM so stupid today, WTF is it doing? This is garbage?"

A lot of that is because in the former case (AI does everything) I wasn't paying enough attention.

sergiotapia 1 day ago |

Not intelligence improvement, but the improvements to speed are tangible. In my case I've gotten so used to the speed of Composer 2.5 that using the latest Anthropic models frustrates me. They are so slow, and not really worth the wait times since Composer gets me what I need precisely, much faster. I think you'll see labs care a lot more about latency and tokens per second moving forward.