LiteLLM Migrates to Rust(docs.litellm.ai) |
LiteLLM Migrates to Rust(docs.litellm.ai) |
Over the past year we've heard the same thing from our users and community, they want the fastest and litest AI gateway.
This change allows us to address two of the most common problems we hear from users latency spikes under load and memory leaks/OOM kills that take pods down
We believe a Rust hot path is faster and bounded in memory, so those whole classes of issues go away.
It will be a gradual, non-breaking change. The Python SDK and proxy stay exactly the same, under the hood they start calling the Rust binary through PyO3, one component at a time, each proven in production before the next. The sub-1ms figure is gateway overhead (what we add on top of the upstream call), and we're aiming for a sub-100MB binary. Happy to share benchmark methodology if folks want to poke at it.
The whole gateway will be running on Rust by December 1, 2026.
Full announcement: https://docs.litellm.ai/blog/litellm-rust-launch
How do you handle the competition like https://github.com/ENTERPILOT/GOModel and Bifrost? They already moved to more performant languages like Go.
What is the moat of litellm currently and why such a radical move?
Rust/Python hybrids are also quite well established, so it allows us to work off of a stable base - while ensuring we can deliver - low memory utilization - low request latency overhead
which is the primary goal here (be fast, lightweight and cheap to deploy).
2. LiteLLM has the most mature, and broadest range of unified api's x providers. This means you do not need to give developers raw LLM API keys, ever.
We see devs using/building agents that consume a lot of different API's - responses api, realtime, chat completions, messages - and no matter what they use we want them to be able to switch across providers without if/else statements in their code.
I can't comment on others, but that's our goal and what we work on doing everyday. So I would trust that we do it well.
Beyond that, we're also growing to become the single point of access for all AI resources. This makes it a lot easier when building agents, because you can give an agent 1 key, and it will have access to LLM's + MCP's (and in the future other resources like skills, api credentials, sandbox api's, etc.).