What is SOTA for retrieval in RAG systems now? Have there been significant improvements this year?

The simple flow we landed on in 2024 was:

1. Chunk and embed docs with an embedding model
2. Embed the query (maybe using an LLM to reformulate it first)
3. Retrieve N1 docs using cosine similarity
4. Narrow to N2 using a reranking model
5. Inject these docs into context to generate an answer

Have there been significant advancements? Has anyone seen improvements using graph structures like Neo4j for more sophisticated retrieval?
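For reference, here is a minimal sketch of the five-step flow in the question, using only the Python standard library. Everything model-related is a labeled stand-in: `embed` is a hashed bag-of-words vector rather than a real embedding model, `rerank` scores by token overlap rather than a trained reranking model, and the final step only builds a prompt instead of calling an LLM. The function names, `DIM`, the sample docs, and the prompt wording are all assumptions made for illustration, and the optional LLM query reformulation is left out.

```python
import hashlib
import math
import re
from collections import Counter

DIM = 256  # size of the toy embedding space (arbitrary choice)

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def chunk(doc, max_words=40):
    # Step 1a: split a document into fixed-size word windows.
    words = doc.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    # Steps 1b and 2: toy stand-in for an embedding model
    # (hashed bag of words, L2-normalized). Not a real model.
    vec = [0.0] * DIM
    for tok, count in Counter(tokenize(text)).items():
        idx = int(hashlib.md5(tok.encode()).hexdigest(), 16) % DIM
        vec[idx] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are unit length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, index, n1):
    # Step 3: keep the N1 chunks closest to the query by cosine similarity.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:n1]]

def rerank(query, candidates, n2):
    # Step 4: toy stand-in for a reranking model. Scores each chunk by the
    # fraction of query tokens it contains, then keeps the top N2.
    q_tokens = set(tokenize(query))
    def score(text):
        return len(q_tokens & set(tokenize(text))) / (len(q_tokens) or 1)
    return sorted(candidates, key=score, reverse=True)[:n2]

def build_prompt(query, chunks):
    # Step 5: inject the surviving chunks into the context for generation.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"

# Hypothetical sample corpus, for illustration only.
docs = [
    "Neo4j is a graph database that stores nodes and relationships.",
    "Cosine similarity compares the angle between two embedding vectors.",
    "A reranking model rescores retrieved candidates against the query.",
    "Bananas are rich in potassium.",
]

index = [(c, embed(c)) for d in docs for c in chunk(d)]
query = "How does a reranking model score candidates?"
candidates = retrieve(query, index, n1=3)
top = rerank(query, candidates, n2=1)
prompt = build_prompt(query, top)
print(prompt)
```

In a real system `embed` would call an embedding model, the sorted scan in `retrieve` would be a vector index lookup, and `rerank` would call a trained reranker. The two-stage shape (cheap wide recall to N1, then expensive narrow precision to N2) is the part this sketch is meant to show.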