Qdrant, the Vector Search Database, raised $28M in a Series A round (qdrant.tech)

131 points by francoismassot 2 years ago | 167 comments

francoismassot 2 years ago |

Congrats to Qdrant's team, $28M for a Series A is really nice.

There are a lot of OSS vector search databases out there, we could probably list the main ones:

- Qdrant: https://github.com/qdrant/qdrant

- Weaviate: https://github.com/weaviate/weaviate

- Milvus: https://github.com/milvus-io/milvus

What else?

lmeyerov 2 years ago | |

It's funny taking a numbers view. The most popularly used might not even be these, but vector indexes in existing popular OSS DBs and storage systems people are already using. Afaict earliest would be faiss on disk and vectors in opensearch & elasticsearch, and I'd be curious how say databricks, pgvector and other big ones are getting picked up now that they are out. Most of these supported fast & large-scale indexes even early on (ivfpq, ...) by wrapping faiss and friends. ~All OSS DBs we use now, esp managed, have or are getting vector indexes.

Another one most similar to qdrant we track internally is lancedb. They are clever by supporting an embedded architecture, so an architectural reason to prefer over most existing OSS DBs. In our survey 2 years ago, we predicted specialized vector DBs having regular OSS DBs be the elephant in the room, and missed embedded as a fundamentally different category: https://gradientflow.com/the-vector-database-index/ .

(Good luck to qdrant! I'm happy they waited before raising, hopefully this means they can operate more healthily than otherwise and easier to maintain the discipline to do that!)

jillesvangurp 2 years ago | |

There are a few more. Pinecone comes to mind.

And then there are traditional databases and search products that are integrating vector search capabilities as well: Postgres, Elasticsearch, Opensearch, Solr.

They each have their limitations of course but the 28M round suggests a moat that I'm not seeing that clearly in terms of tech. What's so special about qdrant relative to their competition?

At least they are Apache licensed for now. So, that's nice. But that also means e.g. Apache Lucene could borrow some code from them to beef up their vector search capabilities. Which would benefit Elasticsearch, Opensearch, and Solr which all depend on Lucene.

Which raises the question what the point is of QDrant long term and why investors are betting on this as opposed to other things.

It seems to me that the main challenge with vector search is inference cost (at index and query time), not storing the vectors. A secondary concern is the vector comparisons at query time. A good way to cut down on that is to reduce the overall result set using traditional search or query mechanisms. In other words, you need

manishsharan 2 years ago | | |

I think there will be enough of a market to justify a few more dedicated VectorDB vendors.

From the enterprise perspective, which of these vendors provides the best combination of security, availability, performance and pricing will matter. When we run benchmarks on our (self-hosted) LLMs, we do not have a clear idea of where we have bottlenecks, and we end up assuming it's the GPU/memory. And our pilot implementation will never go into production as the security model is nearly nonexistent in our implementations; the execs AND QA are getting the same RAG outputs. It is all very new to us and our teams. If a vendor can outperform its competition in our tests and show a credible security model with segmentation of knowledge, that would be the choice.

lsaferite 2 years ago | | |

Is Pinecone OSS?

I ask because this was the statement from the PP

   There are a lot of OSS vector search databases out there, we could probably list the main ones
   ...
   What else?

alfalfasprout 2 years ago | |

Sure, but frankly it's historically very hard to build a business around a specialized database, especially if you have competitors that are even 80% as good but free.

The cases where I've seen this work are when the DB offers something way ahead of what their competitors offer. For example, KDB+ was historically unrivaled when it came to ultra high performance time series storage and Aerospike is very hard to beat for extremely high performance multi-node K/V.

Otherwise there's little to stop a larger company from offering the OSS competitor to your DB as a service for a lower cost and invest eng resources to close the gap.

mritchie712 2 years ago | |

We use pgvector which if you're already using postgres should be in the running for your use case.

I also like https://github.com/lancedb/lancedb

epistasis 2 years ago | |

It's fascinating to see the diversity of vector databases! I've chosen to prototype with two, ChromaDB, and LanceDB, based on the ease of using embeddings with them, and had not even heard of these others here. I'm also very excited to go through VectorHub's table of databases:

https://vdbs.superlinked.com

(discovered from sibling comment here: https://news.ycombinator.com/item?id=39103322)

rgbrgb 2 years ago | |

I've been looking at this one to embed in desktop apps https://github.com/unum-cloud/usearch

bumberg 2 years ago | |

Marqo.ai (https://github.com/marqo-ai/marqo) is doing some interesting stuff and is oss. We handle embedding generation as well as retrieval (full disclosure, I work for Marqo.ai)

shenli3514 2 years ago | |

Chroma looks good. https://github.com/chroma-core/chroma 10k+ stars, very easy to use, and can be used as an embedding database

tajd 2 years ago | |

This website for comparing vector database solutions might be handy https://vdbs.superlinked.com/

daveed 2 years ago | |

I think Activeloop(YC) is too: https://github.com/activeloopai/deeplake/

morgango 2 years ago | |

Elasticsearch: https://www.elastic.co/platform

andre-z 2 years ago | |

These are the major ones, correct.

lettergram 2 years ago |

Not to knock Qdrant, but generally the whole “vector search database” rush is insane.

I’ve been working with vectors for over a decade; particularly with embeddings used in AI. We’re talking projects from 100k to 100B+ records, used for AI applications

Postgres, particularly with pgvector and derivatives, can handle millions of records very rapidly, no problem. It’s very cheap, scales great, and is accurate.

I’m sure some of these open source solutions are improvements. That said, weigh vendor lock in, cost, risk and in the end it usually makes very little sense.

therealdrag0 2 years ago | |

What do you use for 100B records?

lettergram 2 years ago | | |

Idk if I can talk about my exact project. But I worked alongside these folks:

https://www.capitalone.com/tech/machine-learning/learning-em...

Running an R&D group in the department.

tw1984 2 years ago |

I don't think such a business model is going to last. There is no reason for AI giants like OpenAI to stick with such external "vector databases". There is not much technical stuff there. Unless you want to argue that "vector searching" is just some labor work when compared to AI, in that case, sure.

manishsharan 2 years ago | |

There are huge segments, e.g. banking, insurance, legal, which are wary of using OpenAI and would much rather host their own LLMs. I think these vector databases will find a ready market in this segment.

tw1984 2 years ago | | |

Tell me what makes you believe that those big techs are not going to suit those "banking, insurance, legal" orgs by providing them their own LLMs? For example, ever heard about GitHub Enterprise? You pay a stupid amount, and GitHub sets up everything almost identical to the public GitHub, just on your servers for your employees. Why won't big techs do the same here?

The high-profit-margin part of the LLM business is for big players only; they don't burn hundreds of billions to offer you opportunities to cut their profit by capitalizing on their core business.

Communism doesn't exist in high tech. People don't work their xxx off to pave ways for your free lunch for life.

nemothekid 2 years ago | |

Is the idea that big players will get tired of paying a license to a company like Qdrant and write their own database? I just don't really see why they would do that - if Qdrant is similar in complexity to any standard DB, it's like asking why doesn't Apple just rip out MySQL and write their own Apple DB.

I can see them replacing it if Qdrant isn't able to scale to their needs - that's why we ended up with Dynamo, Spanner, MyRocks. However, it's probably easier to just acquire the team - like Apple did with Foundation - and become project stewards than to try to invent a new datastore to save pennies.

spullara 2 years ago | | |

qdrant is trivial compared to a normal database.

mritchie712 2 years ago | |

OpenAI uses Qdrant for chatgpt and a few other products.

https://news.ycombinator.com/context?id=38611608

guappa 2 years ago | | |

And has enough money to fork it at any time.

CamperBob2 2 years ago | |

There is not much technical stuff there.

I know that, and you know that, but tell it to a jury of East Texas hayseeds.

clbrmbr 2 years ago |

What’s the moat here? Seems to be a risky investment when it’s such a crowded space and there are likely to be decent open source alternatives for those with small budgets and homegrown solutions for companies with bigger budgets and requirements.

jjackson5324 2 years ago | |

It's a series A. Thinking about the moat in a hot market like vector DBs is a great way to miss out on unicorns.

tw1984 2 years ago | | |

the OP argued that it is not a hot market - companies like OpenAI are going to eventually use their own, while small players are going to just use OpenAI's assistant APIs; they don't have to operate their own "vector database".

it is also worth mentioning that even if there is going to be a market called "vector databases", which is highly unlikely, you can't just write off all existing regular databases and pretend that they are not going to just walk in and take over.

all in all, there is no reason to believe it is a hot market. it is much better to ask is there going to be a market at all.

lyjackal 2 years ago | |

Qdrant is open source (with hosted offerings).

weinzierl 2 years ago |

A while ago I read in a thread here that they are used in OpenAI's products and at another popular company. I am not sure but vaguely remember X/Grok.

They are also a Rust shop.

Who says Germany has no cool startups.

EDIT: Yes, it was Grok.

sroecker 2 years ago | |

Yes, used by Grok: https://twitter.com/qdrant_engine/status/1721097971830260030 Oh wow, completely missed that they're German. Should have noticed their "Impressum"..

treprinum 2 years ago | |

Is OpenAI using it in their assistants API for retrieval? Answer performance of those is really bad and retrieval is slow compared to Pinecone.

simonw 2 years ago | | |

Yes, it's used for the RAG implementation - though we only know that due to information leaked in an error message I believe: https://twitter.com/altryne/status/1721989500291989585

mindvirus 2 years ago |

Congrats to them!

What have your experiences with vector databases been? I've been using https://weaviate.io/ which works great, but just for little tech demos, so I'm not really sure how to compare one versus another or even what to look for really.

danielbln 2 years ago | |

We're using Postgres with the pg_vector extension for basically all of our projects. We know and love Postgres, it has a big track record, the extension is supported on all major managed cloud offerings, no new tooling needed, pg_vector supports HNSW indices for performance as well.

Once in a while Supabase slips into a project, but that's basically just Postgres with some bells and whistles on top.

I got nothing bad to say about Pinecone, Weaviate, Chroma etc. but when it comes to DBs, I like to go with the devil I know.

andre-z 2 years ago | | |

You should use whatever works best for you unless you face some limitations. The issue is that vector databases are not databases but search engines. It is ACID vs BASE. A few thoughts on this https://qdrant.tech/articles/dedicated-service/

Fendyfd 2 years ago | |

There are multiple vectordb available in the market, open source ones include Milvus, Qdrant, Weaviate etc. Cloud services include Zilliz Cloud (managed milvus), Qdrant Cloud, Weaviate Cloud etc. Try using a benchmark tool to evaluate them. Here is an open-source option for your reference: VectorDBBench (https://github.com/zilliztech/VectorDBBench)

tw1984 2 years ago | | |

why would anyone use a "benchmark" tool from a vendor (zilliz here) to test the performance of its competitors?

tw1984 2 years ago |

We have to be honest - a "vector" database is low-tech stuff when compared to today's AI. You shouldn't be expecting to walk into the battle of AI, which is arguably the most important one in our lifetime, to dig a chunk of significant profit from major AI players' pockets by just having some low-tech stuff. They use external "vector databases" for now because they don't want to invest R&D resources in such non-key issues for now.

for now is the keyword here.

When the company grows to 10k or 30k people, there will be teams competing for visibility, and someone is going to build their in-house "vector database" to get his/her slice of the pie. Do you still believe that any major AI player is going to rely on some external vector databases?

hartator 2 years ago |

> For example, it can automatically map ‘frontend engineer’ to ‘web developer’

Small revolution indeed.

Ref: https://qdrant.tech/use-cases/

avereveard 2 years ago | |

I really don't understand that sample: the similarity capability is provided by the external embeddings model, not by Qdrant per se, unless they have some proprietary embeddings.
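To make that split concrete, here is a minimal sketch, assuming a made-up `embed` lookup with hand-picked 3-dimensional vectors (no real embedding model or Qdrant API is involved, and every name below is hypothetical). The "store" part only compares vectors; which phrases end up close to each other is decided entirely by the numbers the embedding step produces.

```python
import math

# Hypothetical toy embeddings standing in for an external embedding model.
# The values are invented for illustration only.
TOY_EMBEDDINGS = {
    "frontend engineer": [0.9, 0.8, 0.1],
    "web developer": [0.85, 0.9, 0.15],
    "pastry chef": [0.1, 0.05, 0.95],
}


def embed(text):
    """Stand-in for the embedding model: maps text to a vector."""
    return TOY_EMBEDDINGS[text]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query, stored_texts):
    """The 'vector store' part: pick the stored text whose vector is closest."""
    query_vector = embed(query)
    return max(stored_texts, key=lambda t: cosine(query_vector, embed(t)))


match = nearest("frontend engineer", ["web developer", "pastry chef"])
```

With these invented vectors, `match` is `"web developer"`; swap in different embeddings and the same `nearest` function would give a different answer, which is the point being made above.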

minimaxir 2 years ago | | |

Correct, but that one-pager is more aimed toward project managers than engineers. Marketing copy is weird like that.

anonzzzies 2 years ago |

The source code of this product is very readable. And good license, no AGPL or worse stuff.

treprinum 2 years ago | |

Why is AGPL bad? It would prevent Amazon from taking it from the founders and making money off it without giving anything back, like they did to dozens of other products.

JoachimS 2 years ago | | |

It also makes it less interesting for other potential customers to use it. Reducing the potential market is probably not what a VC funding a startup wants.

menaerus 2 years ago | |

IMHO it looks arcane to understand or to debug, and with most likely a lot of negative performance implications, due to its shared-ptr-in-disguise all-over-the-place design.

    $ git grep "Arc<" | wc -l
    451

It could be probably related to the fact that the main author of the codebase is coming from the Java/Scala world. Or perhaps it's the Rust safety guarantees.

nemothekid 2 years ago | | |

Qdrant is an async Rust project, so there will be lots of Arc. Rust's safety guarantees don't really let you share references across threads haphazardly.

anonzzzies 2 years ago |

Offtopic: Is there a good OSS mixed (vectors + traditional) that can be embedded in our own solution and allows storing indexes in a pluggable kv storage? Besides rolling one, I cannot really find anything. Rust or Go would be best.

softwaredoug 2 years ago |

Someone has to ask the question: How many vector DBs do we really need? How do the vector DB companies differentiate themselves? And why do we need a company at all when there are increasingly awesome open source options?

I genuinely ask - there are a lot of other problems in the RAG, fine tuning, AI/LLM, retrieval space to solve. And more and more, vector retrieval, while not 100% solved, is at least something where the community has a grasp on the tradeoffs. Solved to the point that squeezing a bit more recall out of vector retrieval isn't the problem anymore.

sanp 2 years ago | |

Agree but then the same argument applies to RDBMSs and multiple vendors seem to be doing OK in that space. I think it ultimately comes down to "stuff" (sales journey, price, support etc.) other than the technology itself. I am sure any RDBMS can meet most of the requirements of any given customers (in most cases) but we still see customers buying across vendors.

inertiatic 2 years ago | |

>Solved to the point that squeezing a bit more recall out of vector retrieval isn't the problem anymore.

I think this is a bit of a strawman. I don't think recall is the main point these systems are trying to sell us on, it's more about robustness and ease of use compared to building something inhouse or using a lower level library to build a system on top of it just for this small part of your overall project/product (be it RAG, search, whatever).

I guess Lucene-based solutions, while very mature overall in terms of engineering, lagged behind this functionality (out of caution, trying to build what's going to be long term useful) and are also perceived a bit too cumbersome. So these stores do make sense, I think. The core functionality is nothing too complex (at least HNSW), but hiding it behind a stable black box with just a few inputs and levers, has value for people that are likely to use these stores.

esafak 2 years ago | |

qdrant is open source. Being open source is not in opposition to running a company; it is part of their strategy.

There is still work to be done in vector databases. None of the products have perfected hybrid search yet, for example, and performance varies a lot between products; they are not fungible.
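For readers unfamiliar with the term, hybrid search means combining a keyword ranking with a vector ranking. One commonly used way to merge the two is reciprocal rank fusion (RRF); the sketch below is a generic illustration of that technique, not a description of how any product in this thread implements it. The function name, the document IDs, and the constant `k=60` (a conventional default for RRF) are assumptions for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list.

    Each document scores 1 / (k + rank) per list it appears in (rank starts
    at 1); scores are summed across lists and sorted in descending order.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Tie-break on the ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword index and a vector index.
keyword_hits = ["doc1", "doc2", "doc3"]
vector_hits = ["doc3", "doc1", "doc4"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists (`doc1`, `doc3`) end up ahead of those found by only one retriever. RRF ignores the raw scores of each retriever, which sidesteps the problem of keyword and vector scores living on different scales, but it also throws away information, which is one reason hybrid ranking is still an open tuning problem.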

shanghaikid 2 years ago |

Congratulations.

What do you think about Milvus? https://milvus.io/. The difference seems significant from the architecture perspective.

infecto 2 years ago |

I am excited to see how the vector search space plays out. Most of my work is not constrained by a low latency chat type user experience and I have not touched most of the vector search apis. I wonder what the difference is between competitors. The way I picture it is everyone is starting up their own Elasticsearch hosted solution and while there are some differences in functionality, the real bet is cost and scale.

ankit219 2 years ago | |

I think alpha lies in how good the embedding space is rather than which db you use to store and retrieve. A typical tradeoff between accuracy and performance, and here accuracy will be more important in many cases esp for businesses and enterprises. With that, and existing database providers introducing their own support for vectors, this space might be commoditized in near term.

Re embeddings, you would likely get better results if you train your own embeddings model. A popular approach is ColBERT, which anecdotally outperforms vector search in border cases [1]. Second is training an embedding model using initial layers of an LLM [2]. In ColBERT's case, once it's trained, you don't need a DB to store the vectors.

[1]: https://twitter.com/arjunkmrm/status/1744741903646773674 [2]: https://huggingface.co/intfloat/e5-mistral-7b-instruct

infecto 2 years ago | | |

I agree with you. I was ignoring the accuracy/performance tradeoff. Even in that space while there is certainly a lot of innovation left, there is already so much that is available commercially open source. If that holds true, you are really left with competing on price and scale in the long run.

redwood 2 years ago |

Anyone using Qdrant in prod?

inertiatic 2 years ago | |

I do, and it's very very rough around the edges to be honest. Lots of things broken, things are even breaking between releases suddenly in unexpected places. Or at least, I'm used to working with more robust data stores. If my work was more high stakes, I'd have already advocated for moving our vector search to something more robust. Thankfully it's not and I can just maintain what we're making with not too much stress, and enjoy seeing this OS project grow from a user perspective (haven't seen a data store go through this very initial phase in my career yet).

Support from the team is great however, and congrats to them for this round!

esafak 2 years ago | | |

Please elaborate. What would you have moved to, for example? This is valuable information.

crucio 2 years ago | |

We are, for a few projects. We've been using them for over a year and have been impressed. We have tens of millions of items in there with lots of daily inserts/deletions etc. There have been a couple of gotchas but generally it is quite predictable and scalable.

We use 768 dimensional vectors for our items with several other payload filters (e.g. language). Performance has been good and I think the qdrant team focus on the right features without creeping into other areas.
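A minimal sketch of the pattern described here, filtering on payload fields and then ranking the survivors by vector similarity. It uses a toy in-memory list rather than Qdrant's actual API; the `Item` class, the `search` function, and the 3-dimensional vectors are hypothetical stand-ins (the setup above uses 768 dimensions).

```python
import math
from dataclasses import dataclass, field


@dataclass
class Item:
    item_id: str
    vector: list
    payload: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(items, query, must=None, limit=3):
    """Keep items whose payload matches every key in `must`, then rank them."""
    must = must or {}
    candidates = [
        it for it in items
        if all(it.payload.get(key) == value for key, value in must.items())
    ]
    ranked = sorted(candidates, key=lambda it: cosine(it.vector, query), reverse=True)
    return [it.item_id for it in ranked[:limit]]


# Toy data: invented vectors with a language payload field.
ITEMS = [
    Item("a", [1.0, 0.0, 0.0], {"language": "en"}),
    Item("b", [0.9, 0.1, 0.0], {"language": "de"}),
    Item("c", [0.0, 1.0, 0.0], {"language": "en"}),
]

english_only = search(ITEMS, [1.0, 0.0, 0.0], must={"language": "en"})
unfiltered = search(ITEMS, [1.0, 0.0, 0.0])
```

Here the filter removes item `b` even though it is the second-closest vector overall. This brute-force version scans every item; a real engine would apply the filter inside an approximate index, which is where much of the engineering effort goes.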

swalsh 2 years ago | |

I built a little proof of concept that uses it in the RAG pipeline, it's been proving quite useful, so we're just starting the move to production.

It's probably going to stay, but I'm also evaluating Databricks' new vector store as we're using Databricks for all the analytics parts of the app already, and having them all on the same infrastructure is appealing.

andre-z 2 years ago | |

Many: https://testimonial.to/qdrant/all

https://techcrunch.com/2024/01/23/qdrant-open-source-vector-...

redwood 2 years ago | | |

I should have been clearer in my question: It would be great to hear directly from people who are using them about their successes and what their experience has been like

yujian 2 years ago |

Good on them, I know the crustaceans are out here happy about this raise for a Rust based Vector DB!

(now I'm gonna plug what I work on)

If you're interested in a more scalable vector database written in Go, check out Milvus (https://github.com/milvus-io/milvus)

andre-z 2 years ago | |

The open-source benchmarks show different results. Feel free to make a PR to improve. ;) https://qdrant.tech/benchmarks/

rvz 2 years ago |

Well deserved funding round for a company that underpins most of the AI hype happening all over the place and probably always overlooked by many analysts.

Let’s see what they can do in a year or more with that new capital.

braza 2 years ago |

Outside AI and LLMs, are there some solid use cases for those Vector Search Databases? Maybe I am not seeing something, but it’s hard to see it gaining traction outside tech companies.

esafak 2 years ago | |

Vector databases enable semantic and similarity search. What company does not need that?

beernet 2 years ago | | |

Companies that don't want/can build it by themselves, so the majority of enterprises. It's a nice series A by the numbers, at the same time, generating relevant revenue will very likely not happen (given the valuation at this round was probably around 200M€). It's hype all over but can't blame them, would do the same I guess.

ancorevard 2 years ago |

Honest question, how long before EU makes it unattainable for Qdrant to remain in Germany/EU?

wahnfrieden 2 years ago |

What’s the best vector db for text similarity that can run in browser front ends too?

yding 2 years ago |

Congrats! Amazing milestone.

spullara 2 years ago |

Honestly there is no reason, except huge scale, to have a separate vector DB. Every normal database and search engine now supports vector search.

_mh56 2 years ago |

I applied to Qdrant a while back and got this response:

"We are getting many applications for this position. Usually, a test task would help preselect suitable candidates. However, since we develop open-source software, we rely on contribution.

You can build an open-source Qdrant connector to another framework or library. The simplest one would be, for example, a Streamlit data connector. But other ideas are more than welcome!

No limitations and no deadline. As long as this job position is online, we accept submissions. After you are done, send us an email to career@qdrant.com with the link to the repo. We will review it and get back to you asap."

No interviews or conversation before this email. Hope they see and fix this.

Edit : No Pay.