Writing an LLM from scratch, part 16 – layer normalisation | Dark Hacker News