AutoMegaKernel: Compiling an LLM into a single CUDA kernel | Dark Hacker News