CUDA-l2: Surpassing cuBLAS performance for matrix multiplication through RL