I’ve written a post exploring the fundamentals of Information Retrieval and how they relate to modern RAG (Retrieval-Augmented Generation) systems.
It walks through:
• The CISI dataset used for experiments
• Sparse retrieval methods — TF-IDF and BM25, with their underlying mechanics
• Evaluation metrics — MRR, Precision@k, Recall@k, and NDCG
• Vector-based retrieval with embedding models
• ColBERT and late-interaction (MaxSim) methods
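For a quick feel of the mechanics, here is a minimal pure-Python sketch of three pieces from the list: BM25 scoring, ColBERT-style MaxSim, and MRR. It is not the code from the post. The lowercase whitespace tokenizer, the k1 = 1.5 and b = 0.75 defaults, the function names, and the toy documents are all hypothetical choices made for illustration. The MaxSim function assumes the token embeddings are already L2-normalised, as ColBERT's are, so a dot product acts as cosine similarity.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25.

    Hypothetical helper: naive lowercase whitespace tokenization, and
    k1/b set to commonly used defaults (an assumption, not a standard).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        dl = len(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF with a "+1" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency, normalised by document length.
            tf_part = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * dl / avgdl))
            score += idf * tf_part
        scores.append(score)
    return scores


def maxsim(query_vecs, doc_vecs):
    """ColBERT-style late interaction: for each query token vector, take its
    best dot-product match among the document token vectors, then sum.
    Assumes vectors are already L2-normalised."""
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs)
        for qv in query_vecs
    )


def mean_reciprocal_rank(rankings, relevant):
    """rankings[i] is the ranked list of doc ids for query i and
    relevant[i] is the set of relevant doc ids for that query.
    A query with no relevant doc in its ranking contributes 0."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(rankings)


if __name__ == "__main__":
    # Hypothetical toy corpus, not taken from CISI.
    docs = [
        "information retrieval ranks documents for a query",
        "retrieval augmented generation feeds retrieved passages to a language model",
        "cooking pasta requires boiling water",
    ]
    scores = bm25_scores("retrieval query", docs)
    ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    print(ranking)  # [0, 1, 2]
    print(mean_reciprocal_rank([ranking], [{0}]))  # 1.0
    print(maxsim([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]]))  # 1.5
```

The contrast is the point: BM25 only rewards exact term overlap, while MaxSim lets every query token find its closest document token in embedding space, and MRR only looks at where the first relevant document lands.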