LLM inference engine from scratch in C++ – why output tokens cost 5x | Dark Hacker News