thw20 | Dark Hacker News

1.	Simple, zero overhead way to compress model, KV cache via Low-Rank Decomposition (jeffreywong20.github.io) 1 point by thw20 51 days ago | 0 comments
2.	Towards understanding multiple attention sinks in LLMs (github.com) 1 point by thw20 111 days ago | 2 comments
3.	The Existence and Behavior of Secondary Attention Sinks (arxiv.org) 1 point by thw20 133 days ago | 0 comments