Q8 KV cache lets a 30B model fit 100K context on a 24 GB RTX 5090 (buraak.com)
3 points by bozdemir 14 days ago | 0 comments