Usual implementation of attention transformers (SDPA) is kind of bad, actually (gist.github.com)
1 point by teleforce 2 days ago | 0 comments
No comments yet