Evaluate Your Own RAG, Why Best Practices Failed Us(huggingface.co) |
Evaluate Your Own RAG, Why Best Practices Failed Us(huggingface.co) |
Manual search wasn't cutting it. So we built a RAG system to give our team instant access to critical technical knowledge.
What worked: - AWS Titan V2 crushed it (69.2% hit rate vs. 57.7% for Qwen, 39.1% for Mistral) - Chunk size? Barely mattered (2K to 40K—no significant difference) - Qdrant: Easy to use, solid performance, great for self-hosting - Mistral OCR: Unmatched, the only tool that parsed our equations correctly - Naive chunking beat context-aware (70.5% vs 63.8%) - Dense-only search outperformed hybrid search (69.2% vs 63.5%)
Hard lessons: - OpenSearch from AWS is ridiculously expensive for no reason and presented as the default option by AWS - Mistral Embed works well in English but not in French