| user: | mezark |
| created: | February 27, 2023 |
| karma: | 125 |
| 1. | 2 days ago | discuss |
| 2. | 2 days ago | discuss |
| 3. | |
| 4. | What happens when you run a CUDA kernel? (fergusfinn.com) |
| 5. | A running list of reasons to move to open source (whyopensource.ai) |
| 6. | MoE inference optimizations: 15% lower expert load by request reordering (blog.doubleword.ai) |
| 7. | |
| 8. | Tensor Network Attention (mainlymatmul.com) |
| 9. | Redundant Information in LLM Weights (fergusfinn.com) |
| 10. | Tans: Precomputing RANS (fergusfinn.com) |
| 11. | Also-RANS: Asymmetric Numeral Systems for Entropy Coding (fergusfinn.com) |
| 12. | 70x faster cold(ish) starts for SGLang (fergusfinn.com) |
| 13. | QueueSpec – drafting speculation tokens while a request queues (blog.doubleword.ai) |
| 14. | ZeroDP: Just-in-Time Weight Offloading over NVLink for Data Parallelism (mainlymatmul.com) |
| 15. | Parallel Primitives for Multi-Agent Workflows (fergusfinn.com) |
| 16. | |
| 17. | Should GPUs Make Free Trade Agreements? (doubleword.ai) |
| 18. | |
| 19. |