Scalable-Softmax Is Superior for Attention | Dark Hacker News