I made a kernel 2.2x faster. It made my training loop 3x slower (kyrieblunders.bearblog.dev) | 4 points by vishal-padia 14 hours ago | 0 comments