Towards understanding multiple attention sinks in LLMs (github.com) | 1 point by thw20 65 days ago | 2 comments