Usual implementation of attention transformers (SDPA) is kind of bad, actually | Dark Hacker News