GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU (theahmadosman.substack.com)
The memory cost of context is not equal across models.
Also, Hugging Face tells you how big the model is for the exact one you have in hand, so why the weird guesswork? Dynamic quants are not going to magically fit some formula.
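To make the point about context concrete, here is a rough sketch of the usual per-token KV cache estimate (2 tensors, keys and values, per layer per KV head). The two configs are made-up illustrations, not real models, and the function names are mine. It assumes a plain fp16 cache and ignores things like sliding-window attention, latent-attention schemes, or a quantized cache, all of which change the number.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache one token occupies: keys + values for every layer.

    Assumes a standard attention cache; bytes_per_elem=2 means fp16/bf16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_cache_gib(context_tokens, **cfg):
    """Total KV cache in GiB for a single sequence of context_tokens."""
    return context_tokens * kv_cache_bytes_per_token(**cfg) / 2**30


# Hypothetical configs: same depth and head size, different KV head counts.
model_a = dict(n_layers=32, n_kv_heads=32, head_dim=128)  # full multi-head attention
model_b = dict(n_layers=32, n_kv_heads=8, head_dim=128)   # grouped-query attention

for name, cfg in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: {kv_cache_gib(32768, **cfg):.1f} GiB at 32k tokens")
```

Same layer count and the same 32k context, yet one config needs four times the cache of the other, which is why a single "per 1k tokens" constant does not carry across models.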