FlashAttention-4: Algorithm and Kernel Pipelining Co-Design | Dark Hacker News