Hi HN, I built *LLM-Use*, an open-source intelligent router that helps reduce LLM API costs by automatically selecting the most appropriate model for each prompt. I created it after realizing I was using GPT-4 for everything, including simple prompts like "translate hello to Spanish", at $0.03 per call. Models like Mixtral can handle the same request for $0.0003.

### How it works:

- Uses NLP (spaCy + transformers) to analyze prompt complexity (a simplified sketch is at the end of this post)
- Routes each prompt to the optimal model (GPT-4, Claude, LLaMA, Mixtral, etc.)
- Uses semantic similarity scoring to preserve output quality (also sketched below)
- Falls back gracefully if a model fails or returns a poor result

### Key features:

- Real-time streaming support for all providers
- A/B testing with statistical significance
- Response caching (LRU + TTL)
- Circuit breakers for production stability
- FastAPI backend with Prometheus metrics

### Early results:

- Up to 80% cost reduction in my personal tests
- Output quality preserved (verified via internal A/B testing)

### Technical notes:

- 2000+ lines of Python
- Supports OpenAI, Anthropic, Google, Groq, Ollama
- Complexity scoring: lexical diversity, prompt length, semantic analysis
- Quality checks: relevance, coherence, grammar

Repo: [https://github.com/JustVugg/llm-use](https://github.com/JustVugg/llm-use)

Thanks! Happy to answer questions.
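
---

As promised above, here's roughly how the routing decision works. This is a simplified illustration, not the repo's actual code: the real scorer also uses spaCy and transformer embeddings, and the model names, capability ceilings, and per-call costs below are made up for the example.

```python
def complexity_score(prompt: str) -> float:
    """Rough complexity estimate in [0, 1] from lexical signals alone."""
    words = prompt.split()
    if not words:
        return 0.0
    length_signal = min(len(words) / 200, 1.0)  # longer prompts tend to be harder
    diversity = len({w.lower() for w in words}) / len(words)  # lexical diversity
    # Scale diversity by length so short prompts (which are always highly
    # diverse) are not misread as complex.
    return min(length_signal * (0.5 + 0.5 * diversity), 1.0)


# Illustrative model table, sorted cheapest-first:
# (name, highest complexity it handles well, rough USD per call)
MODELS = [
    ("mixtral-8x7b", 0.3, 0.0003),
    ("claude-3-sonnet", 0.7, 0.003),
    ("gpt-4", 1.0, 0.03),
]


def route(prompt: str) -> str:
    """Pick the cheapest model whose capability ceiling covers the prompt."""
    score = complexity_score(prompt)
    for name, ceiling, _cost in MODELS:
        if score <= ceiling:
            return name
    return MODELS[-1][0]  # safety net: strongest model


print(route("translate hello to Spanish"))  # -> mixtral-8x7b
```

The real router is the same in spirit: score the prompt first, then walk the model list cheapest-first.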
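
The quality check behind the graceful fallback, also simplified: embed the prompt and the response, and escalate to a stronger model when the response drifts too far from the prompt semantically. Here `call_model` is a stand-in for the real provider wrappers, and the `0.3` threshold is arbitrary for the example.

```python
from sentence_transformers import SentenceTransformer, util

_embedder = SentenceTransformer("all-MiniLM-L6-v2")


def relevance(prompt: str, response: str) -> float:
    """Cosine similarity between prompt and response embeddings."""
    emb = _embedder.encode([prompt, response], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()


def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real provider call (OpenAI/Anthropic/Groq/Ollama SDKs).
    return f"[{model}] Hola"


def answer_with_fallback(prompt: str, threshold: float = 0.3) -> str:
    """Try models cheapest-first; accept the first sufficiently relevant answer."""
    response = ""
    for model in ("mixtral-8x7b", "claude-3-sonnet", "gpt-4"):
        response = call_model(model, prompt)
        if relevance(prompt, response) >= threshold:
            return response
    return response  # everything scored low: return the strongest model's answer
```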
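
And the LRU + TTL response cache in miniature, keyed on (model, prompt) so a hit skips the API call entirely. This is a minimal sketch of the same idea using `cachetools.TTLCache` rather than the repo's own cache, and it reuses the `call_model` stand-in from the sketch above.

```python
import hashlib

from cachetools import TTLCache

# LRU eviction via maxsize, time-based expiry via ttl (seconds).
_cache = TTLCache(maxsize=1024, ttl=3600)


def _key(model: str, prompt: str) -> str:
    return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()


def cached_call(model: str, prompt: str) -> str:
    key = _key(model, prompt)
    try:
        return _cache[key]  # hit: no API call at all
    except KeyError:
        response = call_model(model, prompt)  # provider call as sketched above
        _cache[key] = response
        return response
```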