13 points by franciscojarceo 1 year ago

Hey folks!

I recently gave a talk with the Milvus Community showing a demo of how to transform PDFs with Feast using Docling for RAG.

The tutorial is available here: https://github.com/feast-dev/feast/tree/master/examples/rag-...

And the video is available here: https://www.youtube.com/watch?v=DPPtr9Q6_qE

The goal of having a feature store transform and retrieve your data for RAG is twofold: (1) vector retrieval becomes easy to configure, with just a boolean in the code declaration, and (2) you can use existing tooling that data scientists and ML engineers are already familiar with.
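To make point (1) concrete, here is a toy sketch of the idea in plain Python. It is not Feast's API: the `Field` and `ToyStore` classes, the `vector_index` flag name, and the sample rows are all hypothetical stand-ins, and the linked tutorial is the place to see the real declaration. The sketch only shows how a single boolean on a field declaration can decide whether similarity search is available for that field.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical field declaration (illustration only, not Feast's class)."""
    name: str
    vector_index: bool = False  # the single boolean that opts a field into vector retrieval


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyStore:
    """Hypothetical in-memory stand-in for a feature store's online store."""

    def __init__(self, fields):
        self.fields = {f.name: f for f in fields}
        self.rows = []

    def write(self, row):
        self.rows.append(row)

    def retrieve(self, field_name, query, top_k=1):
        # Only fields declared with vector_index=True can be searched by similarity.
        if not self.fields[field_name].vector_index:
            raise ValueError(f"{field_name} is not declared as a vector index")
        ranked = sorted(
            self.rows,
            key=lambda r: cosine(r[field_name], query),
            reverse=True,
        )
        return ranked[:top_k]


# Made-up document chunks with two-dimensional embeddings.
store = ToyStore([Field("chunk_text"), Field("embedding", vector_index=True)])
store.write({"chunk_text": "feature stores", "embedding": [1.0, 0.0]})
store.write({"chunk_text": "pdf parsing", "embedding": [0.0, 1.0]})

top = store.retrieve("embedding", [0.9, 0.1])
print(top[0]["chunk_text"])
```

Flipping `vector_index` to `False` on the `embedding` field makes `retrieve` raise, which is the whole point of the toy: the declaration, not extra retrieval code, controls whether the field is searchable.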

I'd love any feedback or ideas on how we could make things better or easier. The Feast maintainers have quite a lot in the pipeline (batch transformations, support for Ray, computer vision and more).

Thanks a ton!

mdaniel 1 year ago

Separately, because it's a pet peeve of mine, your project is advocating for running an AGPLv3 binary[1] seemingly needlessly; perhaps you meant https://github.com/minio/minio/blob/RELEASE.2021-04-22T15-44...

1: https://github.com/minio/minio/blob/RELEASE.2023-06-19T19-52...

mdaniel 1 year ago

Your video is an hour long and spends the first half(?) explaining what vectorization and RAG are. If you're trying to introduce folks to Feast (et al?), then you'd benefit from a much, much less chatty one.

Closer to the bits, your docling-demo.ipynb has some alarming times in it:

> INFO:docling.document_converter:Finished converting document 2203.01017v2.pdf in 101.48 sec.

What is it doing to that PDF for almost 2 minutes?

jdesfossez 1 year ago

That looks really interesting, thank you for sharing! I have to admit, I am not exactly sure what a feature store does yet, but I will read more and watch your presentation.

I have this idea of taking my whole Zotero library full of PDFs/websites, along with my own notes and tags, and preparing it for RAG applications, so I can query my documents, make connections between them, classify them by topic, etc., and have a way to add my new notes/tags back to Zotero (this is all possible with their API). Do you think Feast is a good framework for that?