Native Sparse Attention: Hardware-Aligned and Natively Trainable (arxiv.org)
2 points by teepo 1 year ago | 0 comments
No comments yet