Evaluate Your Own RAG, Why Best Practices Failed Us (huggingface.co)

2 points by couAUIA 255 days ago | 1 comment

couAUIA 255 days ago

At Jimmy, we're developing France's first Small Modular Reactor. Our engineers needed to quickly search and extract insights from thousands of complex scientific PDFs—nuclear research papers, regulatory documents, multilingual content filled with equations and diagrams.

Manual search wasn't cutting it. So we built a RAG system to give our team instant access to critical technical knowledge.

What worked:
- AWS Titan V2 crushed it (69.2% hit rate vs. 57.7% for Qwen, 39.1% for Mistral)
- Chunk size? Barely mattered (2K to 40K, no significant difference)
- Qdrant: easy to use, solid performance, great for self-hosting
- Mistral OCR: unmatched, the only tool that parsed our equations correctly
- Naive chunking beat context-aware chunking (70.5% vs. 63.8%)
- Dense-only search outperformed hybrid search (69.2% vs. 63.5%)
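
For anyone who wants to run a similar comparison on their own documents, here is a minimal sketch of how a hit rate like the ones above is commonly computed: the share of evaluation questions for which the expected chunk appears in the top-k retrieved results. The comment does not say which k or which exact definition was used, so `hit_rate_at_k`, `eval_set`, `toy_retriever`, and the default `k=5` are all assumptions for illustration, not the actual evaluation code.

```python
from typing import Callable, Dict, Sequence


def hit_rate_at_k(
    retrieve: Callable[[str], Sequence[str]],
    eval_set: Dict[str, str],
    k: int = 5,
) -> float:
    """Fraction of questions whose expected chunk ID is in the top-k results.

    `retrieve` is a hypothetical function wrapping any retriever (dense,
    hybrid, ...) that returns chunk IDs ranked from best to worst.
    `eval_set` maps each question to the ID of the chunk that answers it.
    """
    if not eval_set:
        raise ValueError("eval_set must contain at least one question")
    hits = 0
    for question, expected_id in eval_set.items():
        top_k = list(retrieve(question))[:k]
        if expected_id in top_k:
            hits += 1
    return hits / len(eval_set)


# Toy data standing in for a real evaluation set and a real retriever.
eval_set = {"q1": "doc-a", "q2": "doc-b", "q3": "doc-c", "q4": "doc-d"}
toy_results = {
    "q1": ["doc-a", "doc-x"],
    "q2": ["doc-x", "doc-b"],
    "q3": ["doc-y", "doc-z"],
    "q4": ["doc-x", "doc-y", "doc-d"],
}


def toy_retriever(question: str) -> Sequence[str]:
    # Looks up pre-baked rankings; a real version would query the vector store.
    return toy_results.get(question, [])


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"hit rate @ {k}: {hit_rate_at_k(toy_retriever, eval_set, k=k):.2f}")
```

Running the same `eval_set` through each embedding model, chunking strategy, or search mode and comparing the resulting numbers is what makes figures like 69.2% vs. 63.5% comparable: only the retriever changes, never the questions.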

Hard lessons:
- OpenSearch from AWS is ridiculously expensive for no reason, yet AWS presents it as the default option
- Mistral Embed works well in English but not in French