Sonic: Fast, lightweight and schemaless search back end in Rust

Sonic: Fast, lightweight and schemaless search back end in Rust (github.com)

238 points by louis-paul 7 years ago | 39 comments

Always nice to see more alternatives to Elasticsearch. That project could be so much better with some proper planning and focus.

There's also Toshi: https://github.com/toshi-search/Toshi which is built on top of Tantivy: https://github.com/tantivy-search/tantivy

And for C++, there's Xapiand: https://github.com/Kronuz/Xapiand

And for Go, there's Blast: https://github.com/mosuka/blast built on Bleve: https://github.com/blevesearch/bleve

mrec 7 years ago |

Huh. When this was initially posted it had a weird and commercially restrictive license, but it looks like that's been reverted, possibly after (polite) discussion on /r/rust. It's MPL 2 now.

https://github.com/valeriansaliou/sonic/issues/52#issuecomme...

javitury 7 years ago |

Performance figures are awesome. Also language support is great.

> Sonic only keeps the N most recently pushed results for a given word

This index discards old entries. This is fine for messages, in which aging items lose relevance. Yet the developer uses it for a help desk, which I think should give equal importance to all items.
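To make the trade-off concrete, here is a minimal sketch of a per-word bounded index. It is not Sonic's actual implementation; the `BoundedIndex` class, the `retain` parameter, and the whitespace tokenizer are all assumptions made up for illustration.

```python
from collections import defaultdict, deque


class BoundedIndex:
    """Toy inverted index that keeps at most `retain` object IDs per word.

    Hypothetical illustration only; this is not how Sonic is implemented.
    """

    def __init__(self, retain=3):
        self.retain = retain
        # Each word maps to a bounded deque; appending to a full deque
        # silently drops the oldest ID on the left.
        self.postings = defaultdict(lambda: deque(maxlen=retain))

    def push(self, object_id, text):
        # Naive whitespace tokenizer (an assumption, not Sonic's lexer).
        for word in set(text.lower().split()):
            ids = self.postings[word]
            if object_id in ids:
                ids.remove(object_id)  # re-pushing refreshes recency
            ids.append(object_id)

    def query(self, word):
        # Most recently pushed IDs first; unknown words return an empty list.
        return list(reversed(self.postings.get(word.lower(), ())))


index = BoundedIndex(retain=2)
index.push("ticket:1", "reset password")
index.push("ticket:2", "password expired")
index.push("ticket:3", "change password")
print(index.query("password"))  # "ticket:1" has been evicted for this word
```

With `retain=2`, the oldest ticket is no longer findable under "password" even though it is still findable under "reset", which is the behavior that may or may not suit a help desk.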

In this area I would say the main competitor is Groonga. It can be integrated into PostgreSQL with the PGroonga extension, and it indexes all of the data. However, it consumes way more RAM.

mellab 7 years ago |

Wow. I’ve spent the last three weeks building a custom search solution in Kotlin - in my case I’m using tokenizers from Lucene and using a radix trie as an index. I actually looked at using Bleve (another rust search lib) initially but it didn’t have the right language support

Glancing over this, it looks like a nearly perfect fit for my use case. I just wish I had seen this a couple of weeks earlier!
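For readers unfamiliar with the idea, a rough sketch of a trie used as a search index follows. It uses a plain (uncompressed) trie rather than a true radix trie, a whitespace tokenizer rather than Lucene's, and Python rather than Kotlin; the `TrieIndex` name and its methods are hypothetical.

```python
class TrieIndex:
    """Toy prefix index: each node stores the IDs of documents whose
    token ends exactly at that node. A radix trie would additionally
    merge single-child chains into one edge; that is omitted here."""

    def __init__(self):
        self.root = self._new_node()

    @staticmethod
    def _new_node():
        return {"children": {}, "ids": set()}

    def add(self, doc_id, text):
        # Naive whitespace tokenizer, standing in for a real analyzer.
        for token in text.lower().split():
            node = self.root
            for ch in token:
                node = node["children"].setdefault(ch, self._new_node())
            node["ids"].add(doc_id)

    def search_prefix(self, prefix):
        # Walk down to the node matching the prefix, if there is one.
        node = self.root
        for ch in prefix.lower():
            node = node["children"].get(ch)
            if node is None:
                return set()
        # Collect the IDs of every token below that node.
        found, stack = set(), [node]
        while stack:
            current = stack.pop()
            found |= current["ids"]
            stack.extend(current["children"].values())
        return found


index = TrieIndex()
index.add(1, "Search engine")
index.add(2, "sea shore")
print(index.search_prefix("sea"))  # matches both "search" and "sea"
```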

networkimprov 7 years ago | |

Bleve, from Couchbase, is in Go :-)

https://github.com/blevesearch/bleve

O_H_E 7 years ago | |

And that's why I believe that the search/discovery problem is not solved yet by Google.

StavrosK 7 years ago | | |

Far from it, search sucks nowadays. Just think about how hard it is to find stuff from Twitter, reddit etc, even though that's where most of the content is.

And if you don't know what you want exactly, you can't find anything.

manigandham 7 years ago | | |

Google is a public service to search the internet. This product and thread is more about adding search functionality to other applications and private data.

FridgeSeal 7 years ago |

Talk about perfect timing!

I was looking for something just like this for a project in my team. We had been using this setup where a huge chunk of the data was being stored in triplicate: some of it in ES, some more of it in another database and finally the whole dataset in our data warehouse.

Hopefully I can use this to only provide the index + full text capability and just use the warehouse itself as the main db because the query performance is similar enough and the warehouse is criminally underused for what we pay for it.

winrid 7 years ago | |

What's preventing you from using ES for everything? Slow writes?

FridgeSeal 7 years ago | | |

Write speed is fine, it’s more the fact that the dataset is reasonably large, and to run an instance with enough capacity and nodes (even with spill to disk), is silly expensive.

dannycastonguay 7 years ago | |

Curious to know if you considered https://www.algolia.com/ as well?

mellab 7 years ago | | |

Algolia is a beautiful product but it’s expensive. At just 1 mil items you’re already paying 500 a month

marmaduke 7 years ago |

This looks like a breath of fresh air. Elasticsearch won’t even start if it can’t preallocate 2 GB

Xylakant 7 years ago | |

That’s just false. The default config might set the JVM heap to 2 GB (though I’m fairly certain it’s 1 GB), but ES will start up with half a GB of heap with no issue.

marmaduke 7 years ago | | |

It was the default behavior on my install. The documented recommendation is half the system memory, and the logs don’t provide useful info when the heap allocation fails.

Perhaps the upstream default is 1 GB, but this was not the default on my install, and there is much confusion about setting these values correctly.

https://stackoverflow.com/a/40333263

Not everyone has time to dig into it; of course if you did that’s good for you

sidcool 7 years ago |

I would like to understand the code for the project. What approach would help?

d33 7 years ago | |

Personally, I would advise finding an itch to scratch - something you'd like to see improved. Then try to understand the code from this perspective - where would you put the functionality, which pieces would it be connected to? Try to follow the lead, and make notes as you read the code. You'll eventually get a feel for the infrastructure, going from the details to the big picture - this is my default way of navigating projects.

If at any moment you feel lost, look through related issues and merge requests, as well as Git history. Perhaps you'll see how things get changed in the project, patterns intrinsic to it.

Also, keep in mind that you can always try to contact the community/author or invite someone to try figuring out your goal with you - once you collaborate, you'll get a solution tailored to the way you think. It does engage other people, but it also makes coding social and (at least to me) more satisfying.

Let me know what you think of this approach!

StavrosK 7 years ago |

This looks pretty cool! I'd like to try it, but it looks like I'll have to wait for the Python client library first.

MosheZada 7 years ago | |

I just wrote a basic one https://github.com/moshe/asonic

StavrosK 7 years ago | | |

That looks great, thank you! I'll try it out and send you any feedback I have. Also, you might want to link to Sonic in the README, for people unfamiliar with it.

marmaduke 7 years ago | |

The protocol description makes it look fairly trivial to script:

https://github.com/valeriansaliou/sonic/blob/master/PROTOCOL...
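As a rough idea of what "scripting it" could involve, the sketch below only builds and parses text lines and does no networking. The command shapes (`PUSH`, `QUERY ... LIMIT(n)`, `EVENT QUERY <id> ...`) and the quote escaping are my assumptions from reading the protocol document, and all function names are hypothetical; check everything against the linked file before relying on it.

```python
def quote(text):
    # Assumed escaping: backslash-escape quotes, flatten newlines,
    # since the protocol is line-based.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", " ")
    return f'"{escaped}"'


def push_command(collection, bucket, object_id, text):
    # Assumed shape: PUSH <collection> <bucket> <object> "<text>"
    return f"PUSH {collection} {bucket} {object_id} {quote(text)}"


def query_command(collection, bucket, terms, limit=10):
    # Assumed shape: QUERY <collection> <bucket> "<terms>" LIMIT(<count>)
    return f"QUERY {collection} {bucket} {quote(terms)} LIMIT({limit})"


def parse_event(line):
    # Assumed shape: EVENT QUERY <marker> <object> <object> ...
    parts = line.split()
    if len(parts) < 3 or parts[0] != "EVENT":
        raise ValueError(f"not an event line: {line!r}")
    return parts[2], parts[3:]


print(push_command("messages", "user:1", "conversation:7", "hello world"))
print(query_command("messages", "user:1", "hello", limit=5))
print(parse_event("EVENT QUERY abc123 conversation:7 conversation:9"))
```

A real client would send these lines over a TCP socket after the initial handshake, which is left out here.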

deadwisdom 7 years ago | |

Wanna build it with me?

arcticbull 7 years ago | | |

The Rust Python bindings are actually pretty good, I hacked together a project to let you deploy Rust micro services in Lambda via Python module bindings a while back.

StavrosK 7 years ago | | |

Normally I would, but I don't have an immediate use for this, which kills my motivation.

21stio 7 years ago | |

A gRPC endpoint would be nice.