FP8 GEMM Optimization on AMD CDNA4 Architecture | Dark Hacker News