Why AutoGPT engineers ditched vector databases (dariuszsemba.com)
The doc is littered with those paragraphs. Remove the fat! Get to the point! YOLO, that’s a freaking waste of life cycles!
But that never got anyone promoted.
Maybe they found search by plain old dot product faster
My opinion on this: eh, who cares? AutoGPT and similar are non-standard use cases for Vector DBs right now, and Vector DBs are useful for RAG.
It's a good idea to find a solution that enables starting simple and scaling up as needed without having to fully rewrite the code.
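To make that concrete, here is a minimal sketch of what "start simple, swap later" could look like. Everything in it (`MemoryStore`, `BruteForceStore`, `recall`) is a hypothetical name made up for illustration, not anything from AutoGPT: the calling code depends only on a small interface, so a list-backed store could later be replaced by a vector DB client without touching the callers.

```python
import math
from typing import List, Protocol, Sequence, Tuple


class MemoryStore(Protocol):
    """Hypothetical interface (an assumption, not AutoGPT's API)."""

    def add(self, text: str, embedding: Sequence[float]) -> None: ...

    def search(self, query: Sequence[float], k: int) -> List[str]: ...


class BruteForceStore:
    """Simplest possible backend: keep everything in a list, scan on every query."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, embedding: Sequence[float]) -> None:
        self._items.append((text, list(embedding)))

    def search(self, query: Sequence[float], k: int) -> List[str]:
        def cosine(a: Sequence[float], b: Sequence[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Rank every stored item by similarity to the query, best first.
        ranked = sorted(self._items, key=lambda item: cosine(query, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def recall(store: MemoryStore, query: Sequence[float], k: int = 1) -> List[str]:
    """Caller code only sees the interface, so the backend can be swapped later."""
    return store.search(query, k)
```

A store backed by pgvector, OpenSearch or a dedicated vector DB would only need to provide the same two methods; whether that is ever needed depends on how large the memory actually grows.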
Surely we’re going to see a fairly exponential increase in these requirements, though?
The cheaper the compute gets/scales the more sense it makes to hammer problems with more agents/tries so scale of needed agent memory also goes up.
I’d have gone with “it’s already implemented so just leave it be”
If a model on one computer could report memories to a centralized service that could be searched by new instances so work didn't need to be replicated, I'd fully expect that 3rd party service to be running a vector DB.
But in reality, the issues of trust and poisoning the well are too pertinent to see enough centralized consolidation of memory to justify it for a project like this.
I've seen some discussions around E2E encrypted LLM chains, and I could definitely see a 3rd party memory layer as part of that, though I'd suppose it would need to be a plug-in at the model provider and not at the client anyways.
I understand the article's argument. But I can imagine waiting so long for an LLM to respond that I would actually prefer to search a vector database over my "additional information layer" and find the relevant information myself. In that case the vector DB would serve two purposes, and that could change the calculation of whether it is worth the added complexity.
Not an expert here, just a question that came to mind - it might be based on wrong assumptions.
But a multi-modal database that also supports embeddings in hybrid mode or _in addition_ to standard retrieval techniques is both still very useful, and probably sufficient.
What that means to me is that it is yet another vote in favor of less optimized but far more versatile and robust solutions like: OpenSearch, Elastic, and PostgreSQL. [when I say 'less optimized' I'm only referring to their current vectordb plugins, not the rest of the machinery]
OpenSearch and PostgreSQL are phenomenal, robust, OSS tools and the only lingering downside seems to be that their vectordb implementations are still a bit less optimized for large collections - but that probably doesn't matter in practice.
This can cause RAG systems to fail to retrieve, fail to connect fragmented information, or fail to conclude the result of a process. My theory is that information needs to be digested by an LLM and augmented before indexing in a RAG system. Embeddings only search at the surface level. That is why I thought AutoGPT had difficulties, or at least one of the reasons.
Maybe we can have LLMs preprocess the material to expose the deeper layers of meaning. And we need to reindex everything when we discover we are interested in an aspect we didn't explicitly expose for retrieval. Study, then index, and sometimes study again.
When I worked for Dubai airport, I was tasked with building a vector similarity search that was highly optimised for query speed. In the end I held the vectors in memory (in a numpy array) and used scipy to compute cosine similarity. After tweaking and optimising I could get about 1.2 million vectors per second per core. Again, this is embarrassingly parallel, so if you have more vectors, chunk them to fit your hardware and you should get more or less linear scaling per core.
If you want a hand writing this let me know.
(Also there’s a lot of caveats here, for example they did not need to update the vectors, it was an extremely read-heavy usage pattern)
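For anyone curious what the in-memory approach looks like, here is a rough sketch, under assumptions. It is not the code from that project: it uses a plain NumPy matrix product on pre-normalised rows instead of scipy, and the function names (`normalize_rows`, `top_k`) are made up for illustration. It also assumes the read-heavy case, where the index is built once and never updated.

```python
import numpy as np


def normalize_rows(vectors: np.ndarray) -> np.ndarray:
    """Scale every row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def top_k(index: np.ndarray, query: np.ndarray, k: int = 5):
    """Exact top-k by cosine similarity; `index` must already be row-normalised.

    Returns (row indices, scores), best match first.
    """
    q = query / max(np.linalg.norm(query), 1e-12)
    scores = index @ q  # one dot product per stored vector
    k = min(k, scores.shape[0])
    # argpartition finds the k best in linear time; only those k get sorted.
    candidates = np.argpartition(-scores, k - 1)[:k]
    order = candidates[np.argsort(-scores[candidates])]
    return order, scores[order]
```

Usage would be something like `index = normalize_rows(embeddings)` once at load time, then `ids, scores = top_k(index, query_vector, k=10)` per query. To spread the work over several cores, the index could be split into chunks, `top_k` run on each, and the partial results merged; throughput will depend on dimensionality and hardware.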
After a lot of trial and error, I managed to keep it somewhat on track by using a CSV file, something like:
"An expert manipulate .csv files, read the first URL from the first line, 2nd column of 'raw.csv', pass the URL to browse_website the questions 'summarize the activities, highlight any investment, funding or patents mentioned', regardless of the results or failure, write the data quote delimited and in columns where appropriate to a new line of 'complete.csv', pass the URL to google and summarize the answer to the question 'does this organisation have a good reputation, from reliable sources', regardless of the results or failure, write the data quote delimited and in columns where appropriate to a new line of 'complete.csv', remove the 1 line from 'raw.csv' you have processed, repeat the process until 'raw.csv' has no more URL in it"
On the plus side, it was very quick to iterate; 'programming' in words is an exercise in linguistics, and its ability to scrape from any site was impressive. On the downside, it really struggled to stay on task, and even when things seemed to be working well, random behaviour was normal, so it might just decide to delete the csv as a short cut...
On Windows its ability to engage PowerShell was equally enlightening and terrifying... As an exercise in instructing an AI it was interesting, and I'd certainly try again if the requirements fitted.
I think it's a credit to the team that they explored options for vector storage then retreated in the name of complexity, it's a good reason.
You only want an approximate nearest neighbour method for millions of vectors and above. Even that is easy to do with off-the-shelf libraries for a local index. It only gets complicated when you want fast insertion and distributed access.
* number of languages, no?
--Khrushchev, satirically inquiring regarding the implausible number of labor-saving devices exhibited on the American side of the Kitchen Debate[1].