LLM from scratch, part 32k – Interventions: gradient accumulation | Dark Hacker News