Meilisearch 1.0 – Open-source search engine built in Rust

Meilisearch 1.0 – Open-source search engine built in Rust (blog.meilisearch.com)

448 points by tpayet 3 years ago | 180 comments

schappim 3 years ago |

We’ve used Meilisearch in production and it is the closest thing to self hosted Algolia you can get, which in itself is pretty amazing.

Unfortunately the performance of indexing (constantly changing records) wasn’t great and Meilisearch would fall behind on indexing records for hours.

Meilisearch has been amazingly great for projects where records don’t change all that much (eg docs, or even a customer database), but if you have for example a fast paced ecommerce system with 50k records constantly changing (eg product inventory), it falls over pretty quick. We had to transition over to Elastic for this aspect of our app.

The other issue we faced is their Rails gems falling out of step with the server, and when fixes came out, the Rails gem was incompatible for a while.

I really really hope 1.0 increases performance to the point where it becomes production ready, because the initial out of the box performance (before getting bogged down with indexing) was pretty amazing. Better than Elastic and on par with Algolia.

I recommend keeping Meilisearch on your radar. It is going to be great.

I wish the best for the Meili team and hope they succeed!

Kerollmops 3 years ago | |

Thank you very much for this amazing feedback, really appreciated.

We made a lot of improvements to the indexing part of the engine and can now auto-batch updates, which gave incredible improvements. We will continue to work on this in 2023. Can I know the version you were using?

nezirus 3 years ago | | |

My experience with indexing is similar. Up to, let's say, 1M docs it works fine, but after that it goes south. Even with auto-batch I had to manually prepare large bulk updates and wait for completion during inserts to not overload MS. (I am using the Rust client.)
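For readers who want to try the same workaround, here is a minimal Python sketch of "send one bulk batch, wait for its task to finish, then send the next". It assumes two hypothetical callables that you would write around your HTTP client or SDK: `send_batch` (for example wrapping `POST /indexes/{uid}/documents` and returning the task uid) and `get_status` (for example wrapping `GET /tasks/{taskUid}`). The batch size and poll interval are illustrative guesses, not recommended values.

```python
import time
from typing import Callable, Iterable, Iterator, List


def chunked(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def index_in_batches(
    docs: Iterable[dict],
    send_batch: Callable[[List[dict]], int],  # hypothetical: returns a task id
    get_status: Callable[[int], str],         # hypothetical: returns a task status
    batch_size: int = 10_000,                 # assumption, tune for your data
    poll_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Send one batch at a time and wait for its task before sending the next."""
    finished: List[int] = []
    for batch in chunked(docs, batch_size):
        task_id = send_batch(batch)
        while True:
            status = get_status(task_id)
            # Terminal statuses as named by the tasks API.
            if status in ("succeeded", "failed", "canceled"):
                break
            sleep(poll_seconds)
        if status != "succeeded":
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        finished.append(task_id)
    return finished
```

Because the network calls are injected, the same loop can be exercised with fakes before pointing it at a real server.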

Other than that, it is simply great. Ranking stuff is great, simple, I only need custom weights there, some additional functions (not just asc/desc) and it would be perfect.

message 3 years ago | |

Same here, ran into the same version inconsistencies with the Java library.

manigandham 3 years ago |

Congrats to the team, it's been interesting to watch the development of Meilisearch (and its close competitor Typesense). Algolia has really paved the way here but it's nice to see the open-source options with more configurations and better default UX.

There are also many search libraries if you want to embed search more deeply into your app. I have a list of modern search systems and libraries here: https://manigandham.com/post/search-systems-libraries

naiv 3 years ago | |

I will never understand who the target group of Algolia is besides a website where the number of records coincidentally is in the range of the number of queries. At least they got rid of the pricing per indexing transaction which made it even more absurd.

If Algolia would offer an instance based pricing on cpu, ram and storage they would be the clear winner imho.

manigandham 3 years ago | | |

Why do the number of records and searches have to be similar? The current pricing is simple - you pay per "search unit" which scales in both dimensions.

The vast majority of small/medium customers would rather pay-as-you-go than maintain a fixed cost instance, and it allows Algolia to efficiently pack them into a multitenant architecture instead of wasting resource overhead.

luhn 3 years ago | | |

A while back I priced out what Algolia would cost us and it ended up being thousands of dollars per month for something that was currently running a t3.micro Elasticsearch instance. Our usage was just worst-case when it came to their pricing dimensions: A large number of very small documents with a low search volume.

kacy 3 years ago |

We’ve been using Meilisearch for the last six months or so and have been delighted with its performance and usability. It uses a fraction of the resources of Elasticsearch, and the language support is extensive and very active.

That being said, our cluster is much smaller than other ones I’ve worked with in the past, so I can’t comment on its reliability at massive scale. I’ve also been very impressed with how active contributors are on GitHub and in their Discord. Everyone seems like good people, and it’s a project I’m excited to keep using.

tempest_ 3 years ago | |

This is the thing I find when people post "ElasticSearch Alternative".

80% of ElasticSearch's value add (wrt search anyway) is all the clustering and framework that allows you to span the search over tens or hundreds of machines "easily".

I think the same is true here. Probably the comparison should be with the underlying search libraries that ES sits on.

I suppose this comparison makes sense in a world where most people don't run their own servers much any more since the clustering etc would be a problem for the cloud offering and not the consumer.

Semaphor 3 years ago | | |

> 80% of ElasticSearch's value add (wrt search anyway) is all the clustering

Or configurability. I looked at this again now that 1.0 is out, but besides the .NET client still being in an alpha state, it’s also very zero-configuration. There seems to be no configurability regarding tokenization strategies, for example.

Now, I certainly see the appeal, I barely understand my own ES code and meilisearch replicates probably 70% of it with no configuration at all, that’s impressive, but it also means that switching would mean giving up on those 30%.

kmac_ 3 years ago | | |

Yeah, Elastic also brings advanced aggregates and filters, Kibana, nice UI where you can explore data and create dashboards easily and tons of bigger and smaller features. But in some areas both products are comparable.

trilobyte 3 years ago | | |

This was the sense I got as well, though I have only started playing w/ Meilisearch. Clustering was one of the top 3 features that let Elasticsearch take over the market so quickly. In the playing around w/ Meilisearch I've done, it seems more like a replacement for something like Sphinx so far.

tpayet 3 years ago | |

Thank you very much! I'll share your comment with the team <3

sandstrom 3 years ago |

Great news!

Been following along for a while and it's a great project. ElasticSearch needs some competition.

For us, there are two things missing before we could make the switch:

1. Multi-index search; Standard use-case is searching across e.g. users and companies. Common in many SaaS-applications, where you want a single search field with type-ahead for e.g. contacts/organisations/tasks/events.

2. Decay functions; Basically to gradually phase out results for things based on age, distance or something similar. ElasticSearch has pretty good support for these. https://www.elastic.co/guide/en/elasticsearch/reference/curr...
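To make point 2 concrete, here is a small Python sketch of the Gaussian and exponential decay shapes as described in the Elasticsearch documentation, plus a client-side re-rank that multiplies a relevance score by the decay of a document's age. The `score` and `age_days` fields and the 30-day scale are assumptions for illustration only; this is not something Meilisearch 1.0 is known to provide.

```python
import math


def gauss_decay(value, origin, scale, decay=0.5, offset=0.0):
    """Gaussian decay: 1.0 at the origin, `decay` at distance `scale`."""
    distance = max(0.0, abs(value - origin) - offset)
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


def exp_decay(value, origin, scale, decay=0.5, offset=0.0):
    """Exponential decay: 1.0 at the origin, `decay` at distance `scale`."""
    distance = max(0.0, abs(value - origin) - offset)
    lam = math.log(decay) / scale
    return math.exp(lam * distance)


def rerank_by_age(hits, scale_days=30.0):
    """Re-rank hits by score * decay(age).

    `hits` is a list of dicts with hypothetical "score" and "age_days"
    keys; a real engine may not expose a relevance score at all.
    """
    return sorted(
        hits,
        key=lambda h: h["score"] * exp_decay(h["age_days"], 0.0, scale_days),
        reverse=True,
    )
```

With a 30-day scale, a document that is 90 days old keeps one eighth of its score, so a fresh but weaker match can overtake it.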

ferdi05 3 years ago | |

Thanks for your feedback! Multi-index search is planned, coded, and will be integrated in v1.1 (scheduled for April). The decay function is really interesting; the team will reach out to you to learn more about this need :)

joking 3 years ago | |

Elastic already has a great competitor called Solr, which I prefer in multiple respects over Elastic by the way.

chimen 3 years ago |

Is Rust that important that you have to place "built in Rust" in the title? Is this like a cult following that we only bet on traffic and interest coming from other evangelists where Rust is the only feature that matters?

4 months ago: " Meilisearch, open-source alternative to Algolia in Rust lands a $15M Series A"

It's not the first time I've seen this; there are at least 2-3 daily submissions reaching the FP in this manner, so I'm curious: "built in Rust" = marketing these days?

tmikaeld 3 years ago |

My team tried to use Meilisearch for large datasets; unfortunately, it's impossible to plan the RAM usage. If you have very few searches, it consumes very little, but if you have a lot of search traffic, it may consume more than we could provision beforehand. This made it too unpredictable and too expensive, so we went with Manticore instead. I don't know if this has been addressed in 1.0, hopefully it has.

amateurdev0_07 3 years ago |

Thank you for making MeiliSearch. I use it for a personal project that gets a few hits a day, mostly from me and my friends.

https://pulpflakes.com/fmisearch/

It's a search over an index of fiction in the English language, first published in periodicals. Searchable by author, artist, magazine name and specific issue. Biggest index has about 200K documents, doc sizes are tiny.

Integrated with my WordPress site by handwritten PHP. Which was fun.

Performance is great. I didn't run into too many issues, and those I did I could resolve. What I remember:

1. The rules for text searches are too strict by default and if the order of words is different, will result in no matches. A, B will not return a result if B A is in the database.

2. Creating an index, uploading documents and changing settings required quite a bit of work. A week's worth of coding, almost. Would have loved to have a reasonably robust shell script that could take a JSON file with metadata on index and do the grunt work.

3. I have multiple types of documents, would have liked search to cover all of them so I don't have to change search type manually each time.

4. The default number of documents and max uploaded file size is too low. 200K and 200 MB or something. But it fails even on smaller file size.
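The kind of script wished for in point 2 could start from something like the Python sketch below: it reads a JSON description of an index and turns it into an ordered list of REST calls (create the index, apply settings, upload documents in batches). The spec layout (`uid`, `primaryKey`, `settings`, `documents`, `batchSize`) is made up for this example. The paths follow the documented index, settings, and documents routes as I understand them for v1.0, so check them against the API reference for your version. Actually sending the requests, and waiting on the task each one returns, is left out.

```python
import json


def build_plan(spec):
    """Turn an index description into an ordered list of (method, path, body)."""
    uid = spec["uid"]
    create_body = {"uid": uid}
    if "primaryKey" in spec:
        create_body["primaryKey"] = spec["primaryKey"]
    plan = [("POST", "/indexes", create_body)]

    settings = spec.get("settings")
    if settings:
        plan.append(("PATCH", f"/indexes/{uid}/settings", settings))

    docs = spec.get("documents", [])
    size = spec.get("batchSize", 1000)  # hypothetical spec field
    for start in range(0, len(docs), size):
        plan.append(("POST", f"/indexes/{uid}/documents", docs[start:start + size]))
    return plan


def load_plan(path):
    """Read a JSON spec file (hypothetical format) and build its plan."""
    with open(path, encoding="utf-8") as handle:
        return build_plan(json.load(handle))
```

Keeping the plan as plain data makes it easy to print and review before anything touches the server.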

The above sound like complaints. They're problems I ran into and others might. I love how productive Meilisearch made me. Thank you.

networked 3 years ago |

The most specific criticism I have read of Meilisearch is https://news.ycombinator.com/item?id=32940683. It has four points: (1) words beyond 65535 are silently ignored (this is documented in https://docs.meilisearch.com/learn/advanced/known_limitation... ); (2) the position of a matching word in a document non-optionally affects ranking; (3) to get the match information you must retrieve the entire attribute; (4) the meaning of PUT and POST is switched relative to RFC 7231.

Are points (2) through (4) true? Has any of the points been an issue for you in practice?

Kerollmops 3 years ago | |

What’s funny is that (1) doesn’t look like a real limit when you know that the first Harry Potter book is nearly 77000 words. The recommended way is to split your documents by paragraph to increase relevancy; this way you can see the exact part that matches.
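As an illustration of the paragraph-splitting advice, here is a rough Python sketch that turns one long text into several small documents, one per paragraph, and also cuts any paragraph that would exceed the word limit mentioned in the thread. The field names (`id`, `parent_id`, `position`, `text`) are arbitrary choices for this example, and whitespace inside a paragraph is normalized to single spaces.

```python
import re


def split_into_paragraph_docs(doc_id, text, max_words=65535):
    """Split `text` on blank lines into one small document per paragraph."""
    docs = []
    for paragraph in re.split(r"\n\s*\n", text):
        words = paragraph.split()
        # An empty paragraph yields no chunks; an oversized one yields several.
        for start in range(0, len(words), max_words):
            docs.append({
                "id": f"{doc_id}-{len(docs)}",  # hypothetical id scheme
                "parent_id": doc_id,
                "position": len(docs),
                "text": " ".join(words[start:start + max_words]),
            })
    return docs
```

Keeping `parent_id` and `position` on every chunk lets the application map a hit back to the original document and its place in it.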

About (2) we will work on exposing two new ranking rules to be able to control that.

For (3) I thought it was fixed.

We decided to implement (4) the PUT and POST this way after looking at how others were doing it.

networked 3 years ago | | |

Thanks for your reply. I agree about (1). I have checked the datasets I have set up search for, and they either have no or under 1% of documents with more than 65535 words. (This is without any processing to break up the documents into sections.)

mmachatschek 3 years ago |

This is awesome news! We've been using meilisearch in production for a few months now and we're more than happy with its reliability. Their work of the last few months really paid off, as the search speed and especially the indexing speed has increased a lot thanks to their efforts.

I'm excited to see all the things they'll build in the future.

tpayet 3 years ago | |

Thank you Markus <3

nop_slide 3 years ago |

This looks really cool and I might try the self hosted option out on my small website as an upgrade from Postgres’ full text search.

I was hoping the cloud version would be more appealing, granted there seems to be a generous free tier but the next option is $1200 a month?!

ren_engineer 3 years ago | |

their free tier looks like it has a "pay as you go" option once you exceed it that's identical to the paid option per 1K searches and 1K documents. You are basically paying for priority support, pretty common strategy and seems fair to me.

just noticed you don't get high availability on free tier which sucks, but I guess if search is mission critical to the point you need it, you would be willing to pay. Most of these database type companies start off targeting enterprise and then roll out self-serve solutions as they scale.

tpayet 3 years ago | |

Sorry, it might not be obvious, but you can go over the free tier and pay for the usage at 0.25$ for each 1000 searches/documents :)

nop_slide 3 years ago | | |

Ah yep sorry I missed that! Good to know. I just saw the next option was $1200 and my eyes became fixated on the number.

Maybe I will try out the cloud version then even though I expect my site would probably be well in the free tier limit, like I said it seems like a very generous tier.

koblas 3 years ago | |

Was excited to see a non-GC'ed search engine that looked solid. But not having the replicated, distributed version of it in the "free" tier makes it hard to really evaluate.

tpayet 3 years ago | | |

Feel free to reach out to quentin@meilisearch.com, we'll find a way for you to evaluate the pro plan!

leeoniya 3 years ago |

compared to https://typesense.org/ ?

curquiza 3 years ago | |

Meilisearch made a comparison

https://docs.meilisearch.com/learn/what_is_meilisearch/compa...

bduffany 3 years ago | |

typesense did their own comparison here:

https://typesense.org/typesense-vs-algolia-vs-elasticsearch-...

curquiza 3 years ago | | |

Unfortunately, the comparison with Meilisearch is not up to date in this link.

Also, we have to keep in mind that every comparison written by a company is always biased.

MrBuddyCasino 3 years ago | |

They do have a pretty good comparison table: https://typesense.org/typesense-vs-algolia-vs-elasticsearch-...

freewizard 3 years ago | |

they are very similar. I tested both intensively a few months ago, ended up w Typesense for performance reasons.

My test data set is 1.5M doc * 3-10 fields * 10-50 characters. Meilisearch has slightly better multi-language support, but typesense is much better on batch reindex speed and ram usage while a bit shy on supporting asian languages. The query speed is similar in light to medium load, I didn't stress test on query.

qdequelen 3 years ago | | |

You should try it again since we intensively improved the indexing performance. Most of our current users no longer have performance problems, even on hundreds of millions of documents.

rsstack 3 years ago |

Is there a way to run it in WASM, to get something like Lunr[1]? We prefer to do our (small-index, <2MB) search client-side for a bunch of reasons, currently using Lunr.js, but it's a bit annoying and the typeahead search is something I improvised and not really official.

[1] https://lunrjs.com/

sandstrom 3 years ago | |

You could have a look at https://github.com/lucaong/minisearch/

rsstack 3 years ago | | |

Wow, this might fit our needs much better! Thanks!

kristiandupont 3 years ago |

I am using the core (called "Milli") in a local indexer that I run on my repositories and Obsidian files. It works like a charm and I am very happy with it. Obviously that's a use case with very little traffic but just indexing my repositories folder is quite a bit of work and it does it surprisingly fast.

The only real thing I am missing is a typeahead feature.

dureuill 3 years ago | |

Hello from a Meilisearch team member,

wow your project looks very interesting. How do you handle things like the filesystem changing while your indexer is offline? Do you reindex from scratch at startup?

Regarding typeahead, is this what we call "query suggestions"[1]? At the moment, we think that this is something that frontends and SDK can provide rather than the engine, so that means you wouldn't find it at the Milli level. We think you could maybe build an ancillary suggestion index and make two queries instead of one when typing, so as to get both results and suggestions at once.

Here's a chat link[2] to our latest discussions on the topic; feel free to come and weigh in if you're interested!

[1]: https://roadmap.meilisearch.com/c/31-query-suggestions

[2]: https://discord.com/channels/1006923006964154428/10685073658...

kristiandupont 3 years ago | | |

Thank you! Yes, I reindex. I store the file timestamp along with the contents, so it's not quite as involved as it could seem but startup does take a bit. And, I don't have a good way of discovering deleted files at the moment. Not a big deal as it is, but something I will look into.
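One possible way to handle both changed and deleted files with the stored-timestamp approach is to keep a snapshot of path to modification time and diff it against a fresh scan at startup. The Python sketch below is only an illustration of that idea, not how the project above works; the function names are made up.

```python
import os


def scan(root):
    """Map every file path under `root` to its modification time in nanoseconds."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            snapshot[path] = os.stat(path).st_mtime_ns
    return snapshot


def diff_snapshot(indexed, current):
    """Compare the stored snapshot with a fresh one.

    Returns (to_index, to_delete): paths that are new or whose timestamp
    changed, and paths that were indexed but no longer exist on disk.
    """
    to_index = sorted(p for p, mtime in current.items() if indexed.get(p) != mtime)
    to_delete = sorted(p for p in indexed if p not in current)
    return to_index, to_delete
```

After applying the two lists to the index, the fresh snapshot replaces the stored one.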

And yes, query suggestions are exactly what I mean. Thank you for informing me, I guess I will have to look into how I can make it myself :-)

scop 3 years ago |

Congrats! Question for the team as I see a possible discrepancy on the website.

The "Comparisons" page says there is no limit for number of indices (https://docs.meilisearch.com/learn/what_is_meilisearch/compa...)

However, the "Limitations" page says there is a limit of ~180 indices (https://docs.meilisearch.com/learn/what_is_meilisearch/compa...)

Can you clarify what, if any, are the limitations of # indices?

ferdi05 3 years ago | |

Thanks! Indeed we now have a limit, but this limit depends on the OS you use. The limit is 200 on Linux. We found a way to remove this limit in the next version of Meilisearch (v1.1), which will be released in approximately two months.

I would like to know the use case for needing more than 200 indexes. We have handled multi-tenant with a single index and multi-tenant tokens. https://docs.meilisearch.com/learn/security/tenant_tokens.ht...
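For context on the single-index approach: as far as I understand the linked docs, a tenant token is a JWT signed with HS256 using an API key, whose payload carries `searchRules`, the `apiKeyUid`, and optionally `exp`. The Python sketch below builds such a token with only the standard library. The key, uid, and filter values are placeholders, and the filtered field would also need to be declared filterable in the index settings; treat the whole thing as an assumption to check against the docs, or use an official SDK helper where one exists.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_tenant_token(api_key, api_key_uid, search_rules, expires_at=None):
    """Build an HS256-signed JWT carrying per-tenant search rules."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"searchRules": search_rules, "apiKeyUid": api_key_uid}
    if expires_at is not None:
        payload["exp"] = int(expires_at)  # Unix timestamp in seconds
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        api_key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

A call such as `make_tenant_token(key, key_uid, {"*": {"filter": "tenant_id = 42"}})` would then restrict every search made with that token to one tenant's documents, which is what lets many customers share a single index.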

scop 3 years ago | | |

Multi-tenancy is indeed the use case. Our current solution involves keeping each customer's data in a separate index. I'll review the link. Thanks!

zX41ZdbW 3 years ago |

You can query Meilisearch directly from ClickHouse with the integrated table function: https://github.com/ClickHouse/ClickHouse/pull/33332

This feature was a student project, and I'm not sure if it will find its usage. If you are using Meilisearch with ClickHouse, or if you think this feature is worth something, please let me know.

survirtual 3 years ago |

This looks like an effective piece for a project I have. It would be significantly more effective if it was published on crates.io and could be instantiated within Rust, and was able to operate in memory (or have a filesystem passed to it, so that can be simulated)

I found this issue which tracks crates.io publication: https://github.com/meilisearch/meilisearch/issues/3367

Would be nice to see that made a priority. Having a powerful search engine that can be embedded in a larger application and made portable (like being able to deploy to WASM) would be extremely novel and valuable. Given Rust is already in use, I think it may not necessarily demand too much effort. When search becomes a focus for what I’m working on, perhaps I will make that happen if not already done yet.

Thanks for making this available to people.

jvans 3 years ago |

This looks very cool, nice work. Any plans to support ANN vector searches in the near future?

qdequelen 3 years ago | |

Yes, it's planned!

heybrendan 3 years ago |

How would one begin to use this when data is stored in MySQL, MariaDB, or PostgreSQL?

xarope 3 years ago |

I tried RTFM'ing, but can Meilisearch handle restricted documents (P&C) and integrated with LDAP/AD to pull security groups?

P.S. great to see your documentation search is powered by your own product (!)

wiradikusuma 3 years ago |

I see comparisons against other search engines, but how does it compare to RDBMS full text search, e.g. Postgres'? I know it's not apples-to-apples, but most people start with an RDBMS.

bayesian_horse 3 years ago | |

As far as I understand it a search engine like this is meant to perform well on "Human" queries that are hard to formalize.

SQL queries, asking for records based on something like field a has to contain b or something like that are easy to formalize and fulfill by an RDBMS. But the SQL queries get hairier and hairier when the query involves multiple fields or even multiple unrelated tables. Or free form text. And those queries are harder to index.

On top of all of that, Humans often want things sorted in an order that isn't straight-forward to express in SQL. What is "relevancy"? All of that can be done in SQL, but it's not what RDBMS engines shine at.

drcongo 3 years ago |

Congrats team. Meilisearch is an absolute joy to work with.

ferdi05 3 years ago | |

Thanks!

nynapalm 3 years ago | |

Thanks! :)

tpayet 3 years ago | |

Thanks :D

dawnerd 3 years ago |

Very early adopter of meilisearch and it’s pretty great. It was bumpy while the team found their footing, but overall I'm very impressed with it.

qdequelen 3 years ago | |

Thanks for your feedback!

hnaccountme 3 years ago |

Anyone else having deja vu of when Java did this sort of "X built with Java"?

fyzix 3 years ago |

They need to structure their pricing page better. A quick glance had me thinking that $1200 was the minimum for production use. But the free tier is actually pay as you go.

msvan 3 years ago |

How does Meilisearch compare to ElasticSearch from an operational point of view? I've experienced ElasticSearch to be quite painful to maintain, requiring lots of manual tweaking to balance shards and careful design of indices.

paraboul 3 years ago | |

I've been using Meilisearch in production for quite some time now, and TBH it has been one of the easiest services to maintain (I mean, it's just a single statically linked binary) and with close to zero configuration.

AlexAltea 3 years ago | | |

Is it really "just a single statically linked binary"?

I'd love to use Meilisearch as you describe, but their so-called SDKs are just for clients, so you still need the Meilisearch server listening on localhost.

I would love to see something like SQLite based on Meilisearch (i.e. a fully self-contained search library like https://github.com/mchaput/whoosh). Do you know if such a thing exists?

tpayet 3 years ago | |

That's the point! We don't aim to compete with Elastic on everything (logs, analytics, etc). We are doing search for front-end users with a strong focus on relevancy, speed & developer experience. You can read a bit more in our documentation https://docs.meilisearch.com/learn/what_is_meilisearch/compa...

sidmitra 3 years ago | | |

A quick question, are there any limits around the number of separate indexes we can have with meilisearch? I'm thinking at least, say, 20-30K separate indexes to start with.

My use case is that I want to start creating some indexes that are "per-user" and some "per-company" where a company (customer) might have many users. This is to do some sort of double tenant isolation. I will create different keys that have permission to specific indexes and deliver those to the user somehow. My current solution does hacky things with Elasticsearch like adding query filters by user/company-id attributes in the background automatically. But since meilisearch would be customer facing, I need stronger guarantees around permissions per index.

I tried this out a year ago on Meilisearch locally, but haven't stress tested it by creating thousands of them like production.

Or is there a better way to do this? This is also a reason why memory-only systems like Typesense didn't make sense to me. I'm fine with taking a performance hit by going to disk to pull the right index. Not every index will be used all the time. I might also look at sharding/partitioning features if present.

cies 3 years ago |

I think multi-lingual stemming is the point where I see this as a real ES competitor. Still they've come a long way, and burning too much RAM on ES is not the way fwd either.

drifteaur 3 years ago | |

I've had a great experience with Meilisearch, it was very easy to set up.

But I'm not sure what's behind the claim that "it supports all languages", aside from handling unicode? Does it support stemming at all? Does it have customized stop words per language?

qdequelen 3 years ago | | |

To answer your question precisely, we handle all the space-separated languages and have specific tokenizers for Chinese, Japanese, Korean, Thai, and Hebrew. We plan to add more languages in the future.

ckok 3 years ago |

Does it have any kind of master/slave or replication abilities? Couldn't find anything in the docs.

tpayet 3 years ago | |

Hello! No we don't yet, we are considering it though.

slig 3 years ago |

Is there a way to somehow find related documents to a specific document?

snowpid 3 years ago |

Can you rewrite it in Rust?

curquiza 3 years ago | |

We will think about it :D

muhammadusman 3 years ago |

how does this compare to Typesense? I'd like to see which one uses fewer resources for similar performance

qdequelen 3 years ago | |

Hey muhammadusman, I'm Meilisearch's CEO. We have a complete comparison table. Note that it represents our point of view. https://docs.meilisearch.com/learn/what_is_meilisearch/compa...

Meilisearch and Typesense are really different regarding resource consumption and performance. I would say that where Typesense would have better indexing performance (Meilisearch has recently improved indexing speed), Meilisearch will guarantee much faster search performance while keeping impressive relevancy. Regarding consumption, as Typesense is entirely in RAM and Meilisearch is using memory mapping, Meilisearch would take more disk space but less RAM.

garbagecoder 3 years ago |

What language you write a program in is not a feature, definitely not a headline one.

pkolaczk 3 years ago | |

Languages are not tools. Languages are materials. If you buy a house you are quite likely interested in what materials were used to build it. There are different features you'd expect from a wooden house vs concrete house.

timeon 3 years ago | |

Maybe not for you. But for my RSS filter it definitely is.

groestl 3 years ago | |

Well, choice of language carries _a lot_ of implicit information for which you'd need many more words.