KV Sharing, MHC, and Compressed Attention | Dark Hacker News