NSA: Hardware-Aligned and Natively Trainable Sparse Attention | Dark Hacker News