From multi-head to latent attention: The evolution of attention mechanisms | Dark Hacker News