Real-time LLM Inference on Standard GPUs (3k tokens/s per request) (blog.kog.ai)
7 points by morgangiraud 35 days ago | 0 comments