Understanding Multi-Head Latent Attention (From DeepSeek)

Understanding Multi-Head Latent Attention (From DeepSeek) (shreyansh26.github.io)

2 points by shreyansh26 158 days ago | 1 comment

shreyansh26 158 days ago |

A short deep-dive on Multi-Head Latent Attention (MLA) (from DeepSeek): intuition + math, then a walk from MHA → GQA → MQA → MLA, with PyTorch code and the fusion/absorption optimizations for KV-cache efficiency.
