I recently gave a talk with the Milvus Community showing a demo of how to transform PDFs with Feast using Docling for RAG.
The tutorial is available here: https://github.com/feast-dev/feast/tree/master/examples/rag-...
And the video is available here: https://www.youtube.com/watch?v=DPPtr9Q6_qE
The goal of having a feature store transform and retrieve your data for RAG is twofold: (1) configuring vector retrieval takes just a boolean in the code declaration, and (2) you can use existing tooling that data scientists and ML engineers are already familiar with.
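To make point (1) concrete, here is a minimal pure-Python sketch of the idea, not Feast's actual API. The `Field` and `ToyStore` classes, the `vector_index` flag name, and the sample rows are all hypothetical stand-ins; the linked tutorial shows the real declaration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical field declaration (illustrative only, not Feast's class)."""
    name: str
    vector_index: bool = False  # the single boolean that opts a field into vector search


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyStore:
    """In-memory stand-in for a feature store's online store."""

    def __init__(self, schema):
        self.schema = {f.name: f for f in schema}
        self.rows = []

    def write(self, row):
        self.rows.append(row)

    def retrieve(self, field_name, query, top_k=2):
        # Only fields declared with vector_index=True can be searched by similarity.
        if not self.schema[field_name].vector_index:
            raise ValueError(f"{field_name!r} was not declared with vector_index=True")
        ranked = sorted(
            self.rows, key=lambda r: cosine(r[field_name], query), reverse=True
        )
        return ranked[:top_k]


# Assumed sample data: three document chunks with 2-d toy embeddings.
store = ToyStore([Field("chunk_text"), Field("embedding", vector_index=True)])
store.write({"chunk_text": "tables in PDFs", "embedding": [1.0, 0.0]})
store.write({"chunk_text": "vector search", "embedding": [0.0, 1.0]})
store.write({"chunk_text": "table structure", "embedding": [0.9, 0.1]})

top = store.retrieve("embedding", query=[1.0, 0.0], top_k=2)
print([r["chunk_text"] for r in top])  # ['tables in PDFs', 'table structure']
```

The point of the design is that the schema declaration is the only place retrieval gets switched on; the write and read paths stay the same as for any other feature.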
I'd love any feedback or ideas on how we could make things better or easier. The Feast maintainers have quite a lot in the pipeline (batch transformations, support for Ray, computer vision and more).
Thanks a ton!
Closer to the bits, your docling-demo.ipynb has some alarming times in it:
> INFO:docling.document_converter:Finished converting document 2203.01017v2.pdf in 101.48 sec.
What is it doing to that PDF for almost 2 minutes?
I have this idea of taking my whole Zotero library full of PDFs and websites, along with my own notes and tags, and preparing it for RAG applications so I can query my documents, make connections between them, classify them by topic, etc., and have a way to write new notes and tags back to Zotero (this is all possible with their API). Do you think Feast is a good framework for that?