mezark | Dark Hacker News

user:	mezark
created:	February 27, 2023
karma:	125

1.	2 days ago | discuss
2.	2 days ago | discuss
3.	1 point by mezark 4 days ago | discuss
4.	What happens when you run a CUDA kernel? (fergusfinn.com) 293 points by mezark 5 days ago | 32 comments
5.	A running list of reasons to move to open source (whyopensource.ai) 6 points by mezark 12 days ago | 0 comments
6.	MoE inference optimizations: 15% lower expert load by request reordering (blog.doubleword.ai) 3 points by mezark 45 days ago | 0 comments
7.	1 point by mezark 53 days ago | discuss
8.	Tensor Network Attention (mainlymatmul.com) 2 points by mezark 58 days ago | 0 comments
9.	Redundant Information in LLM Weights (fergusfinn.com) 5 points by mezark 60 days ago | 0 comments
10.	Tans: Precomputing RANS (fergusfinn.com) 3 points by mezark 65 days ago | 0 comments
11.	Also-RANS: Asymmetric Numeral Systems for Entropy Coding (fergusfinn.com) 25 points by mezark 65 days ago | 0 comments
12.	70x faster cold(ish) starts for SGLang (fergusfinn.com) 4 points by mezark 71 days ago | 0 comments
13.	QueueSpec – drafting speculation tokens while a request queues (blog.doubleword.ai) 1 point by mezark 159 days ago | 0 comments
14.	ZeroDP: Just-in-Time Weight Offloading over NVLink for Data Parallelism (mainlymatmul.com) 1 point by mezark 166 days ago | 0 comments
15.	Parallel Primitives for Multi-Agent Workflows (fergusfinn.com) 1 point by mezark 171 days ago | 0 comments
16.	New fastest AI Model Gateway – 450x less overhead than LiteLLM (github.com) 2 points by mezark 256 days ago | 0 comments
17.	Should GPUs Make Free Trade Agreements? (doubleword.ai) 3 points by mezark 288 days ago | 1 comment
18.	Controlled generation of OS LLMs – without impacting latency (youtube.com) 7 points by mezark 2 years ago | 1 comment
19.	1 point by mezark 2 years ago | discuss