New DeepSeek paper: Natively Trainable Sparse Attention mechanism (twitter.com)
5 points by redlock 1 year ago | 1 comment