Ask HN: How are you getting reliable code-gen performance out of LLMs?

I'm particularly interested in hearing from people using LLM APIs, where the generated code is consumed programmatically. I've been using LLMs a lot lately to generate code, and the quality is a mixed bag. Sometimes the output runs straight out of the box or with a few manual tweaks; other times it just straight up won't compile. Keen to hear what workarounds others have used to deal with this (e.g. re-prompting, constraining generations, etc.).
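
For concreteness, here is a rough sketch of what I mean by re-prompting: check whether the output compiles, and if not, send the error back to the model and try again. Everything here is an assumption for illustration: `generate` is a hypothetical stand-in for whatever LLM API call you use, `fake_model` is a toy replacement so the snippet runs without network access, and the check uses Python's built-in `compile()`, which only catches syntax errors and does not run the code.

```python
from typing import Callable, Optional


def generate_with_retries(
    generate: Callable[[str], str],  # hypothetical: wraps your LLM API call
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Re-prompt until the output parses as Python, or give up and return None."""
    current_prompt = prompt
    for _ in range(max_attempts):
        code = generate(current_prompt)
        try:
            # Syntax check only: the generated code is parsed, never executed.
            compile(code, "<generated>", "exec")
            return code
        except SyntaxError as err:
            # Feed the compiler error back so the next attempt can correct it.
            current_prompt = (
                f"{prompt}\n\nYour previous attempt failed to compile:\n"
                f"{err.msg} (line {err.lineno})\n"
                "Please return corrected code only."
            )
    return None


def fake_model(prompt_text: str) -> str:
    """Toy stand-in for a model: broken on the first try, fixed after feedback."""
    if "failed to compile" in prompt_text:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b)\n    return a + b\n"  # missing colon


result = generate_with_retries(fake_model, "Write an add function.")
```

This only covers "won't compile"; it says nothing about whether the code is actually correct, which presumably needs tests or some other check on top.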