shreyansh26 | Dark Hacker News

1.	1 point by shreyansh26 67 days ago | discuss
2.	73 days ago | discuss
3.	1 point by shreyansh26 73 days ago | discuss
4.	1 point by shreyansh26 73 days ago | discuss
5.	1 point by shreyansh26 81 days ago | discuss
6.	1 point by shreyansh26 82 days ago | discuss
7.	1 point by shreyansh26 133 days ago | discuss
8.	Understanding Multi-Head Latent Attention (From DeepSeek) (shreyansh26.github.io) 2 points by shreyansh26 157 days ago | 1 comment
9.	1 point by shreyansh26 1 year ago | discuss
10.	Deriving the gradient for the backward pass of Layer Normalization (shreyansh26.github.io) 3 points by shreyansh26 1 year ago | 0 comments
11.	1 point by shreyansh26 1 year ago | discuss
12.	GTC'25 Notes: CUDA Techniques to Maximize Memory Bandwidth – Part 1 (shreyansh26.github.io) 1 point by shreyansh26 1 year ago | 0 comments
13.	FlashAttention in PyTorch (github.com) 2 points by shreyansh26 3 years ago | 1 comment
14.	Understanding FlashAttention (shreyansh26.github.io) 2 points by shreyansh26 3 years ago | 0 comments
15.	Ask HN: What are some good resources on Recommender Systems? 14 points by shreyansh26 3 years ago | 3 comments