Algolia Acquires Search.io (algolia.com)
This is an interesting acquisition from my perspective because we also just started working on adding vector search to Typesense about a month ago.
So you can now do nearest neighbor searches by bringing your vectors into Typesense. This lets you do things like similarity searches, recommendations, etc.
I’d love to have more beta testers use the feature and give us feedback. If you’d like to try it out, please send me an email: jasonb at typesense dOt org
In any case, congratulations Search.io / Sajari team!
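For readers new to the idea, here is a minimal sketch of what "nearest neighbor search over vectors you bring yourself" means: an exact, brute-force index that supports upserts and returns the top-k documents by cosine similarity. The class `FlatVectorIndex` and its methods are hypothetical illustrations, not Typesense's API. Production engines commonly use approximate indexes (e.g. HNSW) instead of scanning every vector.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FlatVectorIndex:
    """Hypothetical exact (brute-force) nearest-neighbor index keyed by document id."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, doc_id: str, vector: List[float]) -> None:
        # Insert a new vector, or overwrite the stored one if doc_id already exists.
        self._vectors[doc_id] = list(vector)

    def search(self, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        # Score every stored vector against the query and keep the k most similar.
        scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


index = FlatVectorIndex()
index.upsert("a", [1.0, 0.0])
index.upsert("b", [0.0, 1.0])
index.upsert("c", [0.7, 0.7])
print([doc_id for doc_id, _ in index.search([1.0, 0.1], k=2)])  # ['a', 'c']
```

Because `upsert` here just overwrites a dictionary entry, an update is visible to the very next search. Keeping that property with an approximate graph index is generally harder, which is the concern raised about real-time upserts later in the thread.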
More generally, I think it is great to see development in this area, from Algolia and Search.io to Typesense and others. Being able to have a customisable search that is really fast can make a big difference on a website.
I've heard about DataTables in various contexts over the years, so it's really cool to hear that you've integrated it with Typesense.
It feels similar to Render vs Vercel where clearly players like Vercel & Algolia make the big bucks from enterprise clients and thus make their services less accessible to small companies like mine.
Quick question with regards to vector search: do y'all intend to expose some basic embedding service on your platform? I think it'd be pretty powerful to add a basic word2vec embedding model so that users who want to play around with vector search can just send some text and Typesense would do the rest (convert text to embedding, index embedding, etc.).
good luck!
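The text-to-embedding step being asked about could, in one simple word2vec-style form, mean averaging the vectors of the words in a text. The sketch below assumes a hypothetical three-dimensional `WORD_VECTORS` table standing in for a trained word2vec model; `embed_text` is an illustrative helper, not an existing Typesense feature.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy word vectors standing in for a trained word2vec model.
WORD_VECTORS: Dict[str, List[float]] = {
    "car": [0.9, 0.1, 0.0],
    "auto": [0.85, 0.15, 0.0],
    "vehicle": [0.8, 0.2, 0.05],
    "apple": [0.05, 0.9, 0.1],
    "fruit": [0.0, 0.95, 0.05],
}


def embed_text(text: str) -> Optional[List[float]]:
    """Mean-pool the vectors of known words; None if no word has a vector."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


query = embed_text("car")
near = cosine_similarity(query, embed_text("Auto vehicle"))
far = cosine_similarity(query, embed_text("apple fruit"))
print(near > far)  # True
```

The resulting vectors could then be indexed like any user-supplied vector. Mean pooling is only a baseline; it ignores word order, and words missing from the vocabulary are simply dropped.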
"While the acquisition price was undisclosed, media reports suggest Algolia paid more than $100 million for Search.io"
https://www.businessnewsaustralia.com/articles/french-unicor...
"Search.io’s mission is to “enable every organization to build smart search and discovery solutions.” The company was founded in 2014 by Hamish Ogilvy and David Howden (originally named Sajari, and recently rebranded to Search.io). "
Contra the business news article: "Search.io was founded in 2020 by Hamish Ogilvy, who will remain with the merged company in the new role of vice president of artificial intelligence."
---
alright, can some non-marketing person explain, with practical use cases, why this "hybrid search" is so disruptive? i feel like the article is trying really hard to communicate how big a deal it is, but it falls flat for me because i only have pedestrian search knowledge
Anyone able to speculate how they achieved this? Or, for that matter, beyond good sales and marketing, what technically gave them an edge that the market actually needed?
Real-time upserts on hybrid and vector indexes are very unusual; please link to how you do this.
vespa.ai does it and it's open source
PR like this just feels like it was written by a college kid, or like they are compensating for technical inadequacy.
No thanks.
Where's the proof that "no other vendor offers this today"?
https://github.com/esteininger/vector-search
feel free to watch for updates :)
What is missing is Lucene's implementation, which helps power Solr/OpenSearch/Elasticsearch.
Vector search, though, isn't as good at handling typos and is not good at all at as-you-type searching. "Vehic" won't match on "auto", for example.
We believe each of these approaches is useful, and that they are useful together in a single search, rather than choosing which one to use ahead of time or through heuristics after the fact.
(I'm a Principal PM for Semantic Search and Search Ranking at Algolia.)
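One common, generic way to blend keyword and vector results in a single query is reciprocal rank fusion (RRF): each document scores the sum of 1/(k + rank) over every result list it appears in. This is named here purely for illustration and is not a description of how Algolia's hybrid ranking works; k = 60 is a commonly used default, and the document ids are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked id lists; documents ranked high in many lists win."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


keyword_results = ["d1", "d2", "d3"]  # e.g. from an inverted-index match
vector_results = ["d3", "d1", "d4"]   # e.g. from nearest-neighbor search
print(reciprocal_rank_fusion([keyword_results, vector_results]))  # ['d1', 'd3', 'd2', 'd4']
```

Here d1 and d3 appear in both lists, so they outrank d2 and d4, which each appear in only one. RRF uses only ranks, so it avoids having to calibrate keyword scores against cosine similarities.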
This is incorrect in the general case; it depends entirely on the model used to produce the word vectors and the text corpus the model is trained on.
For instance, the fastText model is trained on words but also on their parts (character n-grams), so it should produce word vectors that are close (in cosine distance) to the vectors of their typos and partials, even if the training corpus doesn't contain those typos and partially typed words verbatim.
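To see why subword training helps with partials like "vehic", it is enough to compare the character n-gram sets involved. fastText wraps each word in `<` and `>` and, for its unsupervised word-vector models, defaults to n-grams of length 3 to 6; it then learns a vector per n-gram. This sketch only measures raw n-gram overlap (a cosine over binary n-gram bags), so it illustrates the mechanism rather than reproducing the model.

```python
import math
from typing import Set


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> Set[str]:
    """fastText-style subwords: wrap the word in < >, collect n-grams of length minn..maxn."""
    wrapped = f"<{word}>"
    # Treat the whole wrapped word as one unit too (fastText keeps a whole-word vector).
    grams = {wrapped}
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    return grams


def subword_similarity(a: str, b: str) -> float:
    """Cosine similarity between binary bags of character n-grams."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / math.sqrt(len(ga) * len(gb))


print(subword_similarity("vehic", "vehicle"))  # well above zero: shared prefix n-grams
print(subword_similarity("vehic", "auto"))     # 0.0
```

The partial "vehic" shares n-grams such as `<ve`, `veh`, and `ehic` with "vehicle" but none with "auto". That is why a subword model can place a partial near its full word, while bridging a partial to a synonym still depends on what the model learned.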
A good search partner is hard to find. PageRank is fun and all, but I believe better methods exist these days.
(disclaimer: I work on Semantic Search at Lucidworks)
However, I'm sure there are still applications where you don't have access to a robust embedding for your domain but can apply other techniques to deal with that domain's noise.
For the first part, you can look into "embeddings" and "approximate nearest neighbor lookup" for the modern approaches. That said, inverted indexes are still very popular.
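As a reference point for the "inverted index" mentioned above, a minimal sketch maps each term to the set of documents containing it and answers a multi-term query by intersecting those sets. Whitespace tokenization and AND semantics are simplifying assumptions. Real engines add tokenization, typo tolerance, prefix matching, and scoring on top.

```python
from collections import defaultdict
from typing import Dict, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase whitespace-separated term to the ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # keep only docs that also contain this term
    return result


docs = {"d1": "red running shoes", "d2": "blue running jacket", "d3": "red jacket"}
index = build_inverted_index(docs)
print(search_all_terms(index, "running red"))  # {'d1'}
```

A lookup only touches the posting sets for the query's terms rather than every document, which is a large part of why inverted indexes remain popular.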
The second is generally called "learning to rank", so you can find a lot written on that topic. The biggest issue here, imho, is the training data you use to get examples of good rankings. The best algorithm trained on garbage will give you garbage.
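To make "learning to rank" concrete, here is a minimal pairwise sketch: a linear scoring function trained with perceptron-style updates on every pair of documents whose relevance labels differ. The features and labels are invented for illustration. In practice the labels come from exactly the training data discussed above (human judgments, click logs, and so on), and the learned ranking can only be as good as they are.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, graded relevance label)


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear relevance score: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise_ranker(examples: List[Example], epochs: int = 10, lr: float = 1.0) -> List[float]:
    """Perceptron updates on misordered pairs so better docs outscore worse ones."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for (x_i, y_i), (x_j, y_j) in combinations(examples, 2):
            if y_i == y_j:
                continue  # equal labels give no preference signal
            better, worse = (x_i, x_j) if y_i > y_j else (x_j, x_i)
            diff = [hi - lo for hi, lo in zip(better, worse)]
            if score(weights, diff) <= 0:  # pair is misordered or tied: nudge weights
                weights = [w + lr * d for w, d in zip(weights, diff)]
    return weights


# Hypothetical features per document: (title match, popularity).
docs: List[Example] = [([0.9, 0.1], 2), ([0.5, 0.5], 1), ([0.1, 0.9], 0)]
weights = train_pairwise_ranker(docs)
ranked = sorted(docs, key=lambda ex: score(weights, ex[0]), reverse=True)
print([label for _, label in ranked])  # [2, 1, 0]
```

Pairwise training only asks the model to order documents correctly relative to each other, which suits ranking better than predicting absolute scores. Noisy or biased preference pairs would push the weights in the wrong direction just as readily.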
And our CTO, Julien, wrote an "Inside the Engine" series on how our search engine works. It doesn't have the new "hybrid search" but it shows you the base of how we do search: https://www.algolia.com/blog/engineering/inside-the-algolia-...