Kimi introduces Attention Residuals: 1.25x compute performance at <2% overhead (arxiv.org)
9 points by nekofneko 87 days ago | 0 comments