Hi HN, I built *LLM-Use*, an open-source intelligent router that helps reduce LLM API costs by automatically selecting the most appropriate model for each prompt. I created it after realizing I was using GPT-4 for everything, including simple prompts like "translate hello to Spanish", at $0.03 per call. Models like Mixtral can handle the same request for $0.0003.

### How it works:

- Uses NLP (spaCy + transformers) to analyze prompt complexity (a simplified sketch is at the end of this post)
- Routes each prompt to the optimal model (GPT-4, Claude, LLaMA, Mixtral, etc.)
- Uses semantic similarity scoring to preserve output quality (also sketched below)
- Falls back gracefully if a model fails or returns a poor result

### Key features:

- Real-time streaming support for all providers
- A/B testing with statistical significance
- Response caching (LRU + TTL)
- Circuit breakers for production stability
- FastAPI backend with Prometheus metrics

### Early results:

- Up to 80% cost reduction in my personal tests
- Output quality preserved (verified via internal A/B testing)

### Technical notes:

- 2000+ lines of Python
- Supports OpenAI, Anthropic, Google, Groq, Ollama
- Complexity scoring: lexical diversity, prompt length, semantic analysis
- Quality checks: relevance, coherence, grammar

Repo: [https://github.com/JustVugg/llm-use](https://github.com/JustVugg/llm-use)

Thanks! Happy to answer questions.
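
---

As promised above, here's roughly how the routing decision works. This is a simplified illustration, not the repo's actual code: the real scorer also uses spaCy and transformer embeddings, and the model names, capability ceilings, and per-call costs below are made up for the example.

```python
def complexity_score(prompt: str) -> float:
    """Rough complexity estimate in [0, 1] from lexical signals alone."""
    words = prompt.split()
    if not words:
        return 0.0
    length_signal = min(len(words) / 200, 1.0)  # longer prompts tend to be harder
    diversity = len({w.lower() for w in words}) / len(words)  # lexical diversity
    # Scale diversity by length so short prompts (which are always highly
    # diverse) are not misread as complex.
    return min(length_signal * (0.5 + 0.5 * diversity), 1.0)


# Illustrative model table, sorted cheapest-first:
# (name, highest complexity it handles well, rough USD per call)
MODELS = [
    ("mixtral-8x7b", 0.3, 0.0003),
    ("claude-3-sonnet", 0.7, 0.003),
    ("gpt-4", 1.0, 0.03),
]


def route(prompt: str) -> str:
    """Pick the cheapest model whose capability ceiling covers the prompt."""
    score = complexity_score(prompt)
    for name, ceiling, _cost in MODELS:
        if score <= ceiling:
            return name
    return MODELS[-1][0]  # safety net: strongest model


print(route("translate hello to Spanish"))  # -> mixtral-8x7b
```

The real router is the same in spirit: score the prompt first, then walk the model list cheapest-first.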
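
The quality check behind the graceful fallback, also simplified: embed the prompt and the response, and escalate to a stronger model when the response drifts too far from the prompt semantically. Here `call_model` is a stand-in for the real provider wrappers, and the `0.3` threshold is arbitrary for the example.

```python
from sentence_transformers import SentenceTransformer, util

_embedder = SentenceTransformer("all-MiniLM-L6-v2")


def relevance(prompt: str, response: str) -> float:
    """Cosine similarity between prompt and response embeddings."""
    emb = _embedder.encode([prompt, response], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()


def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real provider call (OpenAI/Anthropic/Groq/Ollama SDKs).
    return f"[{model}] Hola"


def answer_with_fallback(prompt: str, threshold: float = 0.3) -> str:
    """Try models cheapest-first; accept the first sufficiently relevant answer."""
    response = ""
    for model in ("mixtral-8x7b", "claude-3-sonnet", "gpt-4"):
        response = call_model(model, prompt)
        if relevance(prompt, response) >= threshold:
            return response
    return response  # everything scored low: return the strongest model's answer
```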
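
And the LRU + TTL response cache in miniature, keyed on (model, prompt) so a hit skips the API call entirely. This is a minimal sketch of the same idea using `cachetools.TTLCache` rather than the repo's own cache, and it reuses the `call_model` stand-in from the sketch above.

```python
import hashlib

from cachetools import TTLCache

# LRU eviction via maxsize, time-based expiry via ttl (seconds).
_cache = TTLCache(maxsize=1024, ttl=3600)


def _key(model: str, prompt: str) -> str:
    return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()


def cached_call(model: str, prompt: str) -> str:
    key = _key(model, prompt)
    try:
        return _cache[key]  # hit: no API call at all
    except KeyError:
        response = call_model(model, prompt)  # provider call as sketched above
        _cache[key] = response
        return response
```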