Understanding Multi-Head Latent Attention (From DeepSeek) | Dark Hacker News