GPU Memory for LLM Inference (Part 1) | Dark Hacker News