The costs are higher, but we find that the RAG accuracy gains on question-answering tasks are worth it. Including LLM chunk re-ranking and contextual summaries in your RAG flow also makes the system more robust to changes in chunk size, parsing oddities, and embedding model shortcomings. This combination is one of the largest drivers of performance we could find.
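To make the idea concrete, here is a minimal sketch of what an LLM re-ranking and contextual-summary step might look like. It is not the implementation used here: the prompt wording, the 0-10 score scale, the `SCORE:` reply format, and the names `rerank`, `parse_reply`, and `ScoredChunk` are all assumptions for illustration, and `llm` stands in for whatever function sends a prompt to your model and returns its text reply.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ScoredChunk:
    text: str     # the original retrieved chunk
    score: int    # relevance score parsed from the LLM reply, clamped to 0-10
    summary: str  # summary of the chunk written with the question in mind


# Hypothetical prompt; the wording and reply format are assumptions.
PROMPT = (
    "Question: {question}\n\n"
    "Chunk: {chunk}\n\n"
    "Summarize the parts of the chunk that help answer the question, "
    "then rate its relevance from 0 to 10.\n"
    "Reply as: <summary>\\nSCORE: <integer>"
)


def parse_reply(reply: str) -> Tuple[str, int]:
    """Split a reply into (summary, score); malformed replies score 0."""
    summary, sep, tail = reply.rpartition("SCORE:")
    if not sep:
        # No score marker at all: keep the text, treat it as irrelevant.
        return reply.strip(), 0
    try:
        score = int(tail.strip().split()[0])
    except (ValueError, IndexError):
        score = 0
    return summary.strip(), max(0, min(10, score))


def rerank(
    question: str,
    chunks: List[str],
    llm: Callable[[str], str],
    keep: int = 3,
) -> List[ScoredChunk]:
    """Score and summarize every chunk with the LLM, then keep the best ones."""
    scored = []
    for chunk in chunks:
        reply = llm(PROMPT.format(question=question, chunk=chunk))
        summary, score = parse_reply(reply)
        scored.append(ScoredChunk(text=chunk, score=score, summary=summary))
    # Highest score first; ties keep their original retrieval order.
    scored.sort(key=lambda c: c.score, reverse=True)
    return scored[:keep]
```

In a flow like this, the summaries of the kept chunks (rather than the raw chunk text) would be passed to the answering model. Because each chunk costs one extra LLM call, this is also where the higher costs come from.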