Language Modeling, Part 2: Training Dynamics | Dark Hacker News