Elasticsearch 1.0.0 released (elasticsearch.org)

471 points by dakrone 12 years ago | 136 comments

RyanZAG 12 years ago |

Elasticsearch is really awesome for searching, but what most people don't realize is that it makes a better MongoDB than MongoDB while giving you that searching too.

bilbo0s 12 years ago | |

This. A THOUSAND TIMES "This".

The one drawback ES had in the bad old days was that backup and restore was a nightmare... ESPECIALLY on AWS. The new system they introduced was so simple I was concerned about updating to it because I was SURE something would go south.

But it all just worked.

I still have the Couch to ES replication running because I'm anal like that... but really... yeah... you can do without Couchbase, Mongo et al... ES will probably do everything you need PLUS everything you can't do in the others.

diminish 12 years ago | | |

As a proud user of Elasticsearch since the early days, I'm happy to see so much progress. Never mind the *search part of the name; it's really a database for all practical purposes, especially for web data.

rjzzleep 12 years ago | | |

to be fair, the main selling point of mongodb is that developers can access it more easily. i haven't really touched mongodb in over a year and then only for playing, but have you tried the elasticsearch filter query syntax? have you compared mongodb's syntax?

also, i have the exact opposite nitpick. people want to use it to do everything, mail indexers, file system indexers. what's the matter with web developer folks? why is it that when the next database comes around they want to use it for everything?

AznHisoka 12 years ago | | |

Just curious, if I'm using say version 0.92, how would I go about backing up my ElasticSearch instance, besides creating a replica on a server and then "freezing" it by disconnecting the server?

kainosnoema 12 years ago | |

I'm surprised so many people miss this. Out of the box, Elasticsearch is a distributed NoSQL store with better write consistency (and arguably performance) than MongoDB offers in its default configuration. The major missing feature was backup snapshots and restores, which 1.0 delivers, along with aggregations that more than rival MongoDB's. The team has intentionally avoided marketing themselves as a NoSQL store (was told this directly by an employee), but they're aware of the potential and have customers using it as such.

nkoren 12 years ago | | |

It's easy to miss. On the front page, the word "store" only occurs once, buried three page-scrolls down in the body text. Otherwise it very much gives the impression of being some kind of analytics dashboard for third-party datastores. And I didn't notice that until after I'd visited the website, clicked through a few links trying to figure out what the fuss was about, then given up and decided to read the comments here.

gibrown 12 years ago | | |

While I agree with the sentiment, I think Shay (lead ES developer) has explicitly said that he does not consider ES to be a data store... yet. I think this is mostly due to maturity.

I help run a large ES cluster (with canonical data in MySQL), and I consider this cautious attitude by the ES developers to be a good thing.

camus2 12 years ago | | |

did not know all that stuff, could Elasticsearch be the holy grail of document stores ?

sandGorgon 12 years ago | |

I had a live production logistics system running on top of Elasticsearch 0.6 (as a NoSQL database) back in 2012. This powered one of India's largest ecommerce systems (at that time).

Elasticsearch is brilliant as a NoSQL store, and if you were already using Elasticsearch as a search system, you don't need to introduce yet another component into your stack.

axefrog 12 years ago | |

What limitations should one be aware of that would make ElasticSearch not a viable candidate where something like MongoDB would be a better fit?

RyanZAG 12 years ago | | |

When running a search, ES by default will not show items that have been indexed in the last 1 second. Directly getting an item by its ID doesn't have that limit though, and you can optionally force a refresh so that a search shows all items.

Other than that (which is just performance tuning, really), ES matches mongodb feature for feature, and obviously has a lot of extra power from its search heritage such as facets and percolate.

So I can't actually think of any limitations, and it's why I said ES makes a better MongoDB than MongoDB.
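A toy sketch of the visibility behaviour described above, for anyone who finds it easier to read as code. This is not how Elasticsearch is implemented; `ToyIndex` and its methods are hypothetical names made up for illustration. It only models the idea that a write is retrievable by ID at once, while search sees it after the next refresh.

```python
class ToyIndex:
    """Toy model of near-real-time search (hypothetical, not ES internals)."""

    def __init__(self):
        self._store = {}       # every indexed document, keyed by ID
        self._searchable = {}  # snapshot that search is allowed to see

    def index(self, doc_id, doc):
        # The write is stored, but it is not yet visible to search.
        self._store[doc_id] = doc

    def get(self, doc_id):
        # A lookup by ID reads the store directly, so it sees the write at once.
        return self._store.get(doc_id)

    def refresh(self):
        # A refresh publishes everything written so far to search.
        self._searchable = dict(self._store)

    def search(self, term):
        # Search only consults the last refreshed snapshot.
        return sorted(
            doc_id
            for doc_id, doc in self._searchable.items()
            if term in doc["title"].lower().split()
        )


idx = ToyIndex()
idx.index("1", {"title": "Elasticsearch 1.0.0 released"})
print(idx.get("1"))            # found immediately by ID
print(idx.search("released"))  # empty until a refresh happens
idx.refresh()
print(idx.search("released"))  # now includes document "1"
```

In the real system the refresh happens periodically on its own (every second by default, as noted above), so the gap is normally short.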

alisson 12 years ago | | |

On ElasticSearch you have to update the whole document, no commands to manipulate them. You don't have commands like: $set, $addToSet, $pop, etc..

You need to have a good understanding of how tokenizers and analyzers work to be able to create good results for your data. I have difficulties matching documents with the exact title being searched for. On MongoDB that just works, on ElasticSearch you need to configure it.

ElasticSearch has some advantages and MongoDB others. I think they are great together. One for storage and the other for searching.

brasetvik 12 years ago | | |

I can't comment much on MongoDB, but I've written a bit about things to keep in mind when considering Elasticsearch as a NoSQL store here: https://www.found.no/foundation/elasticsearch-as-nosql/

sjs382 12 years ago | | |

I'm not sure if ElasticSearch does anything like this, but I make use of MongoDB's GeoJSON queries, namely the $geoIntersects operator.

http://docs.mongodb.org/manual/applications/geospatial-index...

abhirama 12 years ago | | |

When I played around with it, I could not figure out a way to get the exact count of events in the datastore when the data was distributed in replicas. In fact, there was a ticket open for this, but I'm not able to fish it out now.

ddorian43 12 years ago | | |

presharding

You create a number of shards for each index(database) that you can't later expand.
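A rough sketch of why the shard count is fixed, assuming documents are routed by hashing the ID modulo the number of primary shards. The `crc32` hash here is a stand-in picked for the sketch, not the hash Elasticsearch uses, and `shard_for` is a hypothetical helper.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    # Stand-in routing: hash the document ID, then take it modulo the shard count.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]

# Count how many documents would be looked up on a different shard
# if an index created with 5 shards were suddenly treated as having 6.
moved = sum(1 for d in doc_ids if shard_for(d, 5) != shard_for(d, 6))
print(f"{moved} of {len(doc_ids)} documents would land on a different shard")
```

Since most IDs map to a different shard once the modulus changes, growing an index means reindexing into a new one, which is why people pick the shard count up front.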

ddorian43 12 years ago | | |

also changing indexed-fields on the go

mtrn 12 years ago | |

True. I evaluated Mongo, Couch and a couple of similar solutions, but ES being a search engine from the start really convinced me, that it can be a viable database for loosely structured data.

g9yuayon 12 years ago | |

I don't know much about MongoDB, but it's true that Elasticsearch is a great NoSQL db with support of boolean search. Netflix has a number of use cases that use Elasticsearch as such NoSQL db: http://www.slideshare.net/g9yuayon/elasticsearch-in-netflix

ErrantX 12 years ago | |

Definitely! We are using it in production for storing monitoring data (via sensu, if anyone is interested). It's fantastic because you can shove data into the index with a ttl of 1 year. And have a x month archival strategy for cold storage.

Its search capabilities and scalability are fantastic - we're throwing GBs of data into it weekly and it just soaks it up.

tracker1 12 years ago | |

I would suggest that everyone who is considering one look at both... When I looked into both, about a year and a half ago, I found that geospatial searches worked better in MongoDB at the time, and shaping my data to fit was more awkward with ElasticSearch.

That said, it's definitely worth looking into both, depending on what your needs are.

obastemur 12 years ago | |

"most people don't realize is that it makes a better MongoDB than MongoDB "

(IMHO) Unfortunately, for most people old habits die hard. Indeed a nice project and a great release.

m0th87 12 years ago |

It was two weeks ago, and our startup was on the precipice of a major launch. We had completely rewritten our online publication site, which drives the bulk of our traffic. The product had to be shipped on-time - we had press releases, eager investors and a launch party dependent on it.

A few days before launch, things were not looking good. As admins manipulated articles in preparation for the launch, the servers kept crashing.

In a time-constrained major launch like this, a lot of nasty little hacks build up in the codebase. Our search system for admins was a complete mess. It was a custom solution that worked fine when admins managed a handful of database records, but now that they were managing thousands of articles, it was not scaling at all.

At the 11th hour, we dropped elasticsearch into our infrastructure. It worked like a charm. The servers stopped crapping out, and we launched on time.

Elasticsearch mostly "just works", and we didn't have to worry about complex schema definitions, working with giant complex XML files (hello Solr), or build anything on top to interface between the index and the queries themselves (Lucene). Thanks elasticsearch, you saved us!

mavelikara 12 years ago |

ES seems to have the ability to run analytic queries. I have read about people using it as an OLAP solution [1], although I have not yet read anyone describe their experience. In that respect, how do ES's analytics capabilities compare against:

1) Dremel clones [2] like Impala & Presto (for near real-time, ad hoc analytic queries over large datasets)

2) Lambda Architecture [3] systems (where queries are known up-front, but need to run against a large dataset)

Does anyone here have experience with ES in such use cases, beyond the free-text searching ES is well known for?

[1]: https://groups.google.com/forum/#!topic/elasticsearch/iTy9IY...

[2]: http://static.googleusercontent.com/media/research.google.co...

[3]: http://jameskinley.tumblr.com/post/37398560534/the-lambda-ar...

zcrar70 12 years ago | |

I would also be interested in this.

Argorak 12 years ago |

Beyond the technology, Elasticsearch has a very mature, active and helpful community with users groups all over the world. We're well connected.

Pick your favourite users group here: http://elasticsearch.meetup.com/

Full disclosure: I started and run the Berlin UG. We set ourselves apart by always providing a small introduction into ES for those that are completely new and would have a hard time following the main talk.

shurane 12 years ago | |

Intros to ES and other technologies are useful.

I don't see many tutorials covering usage of ES here: http://www.elasticsearch.org/tutorials/

Could you maybe provide a link to yours?

Argorak 12 years ago | | |

The introduction is in person, at the users group.

Yep, tutorials is a huge problem, but there are people working on that.

bryanh 12 years ago |

The thing that worried me the most about Elasticsearch was how fragile it got around the limits of its performance. Run out of memory because of a nasty query? Boom, data corrupted. I hope you weren't using it as your primary persistence layer...

Otherwise, we love ES. The other comment about it being a better Mongo than Mongo rings true. With the backup/restore API and some of the circuit breakers, I'm hopeful that my fears will be abated.

polyfractal 12 years ago | |

FWIW, this is a place ES devs are spending a lot of time thinking about. For example, 1.0 introduces a new "Circuit Breaker" [1] feature which will help prevent over-eager facets from blowing out the heap. It's just one part of a very large effort to make ES handle exceptional events more gracefully (in particular, memory related).

Another example is disk-based doc values [2], which are essentially pre-computed field data structures that are stored on disk. This moves Field Data off heap and allows the OS to manage memory evictions, to help minimize GCs and OOM blowouts.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

[2] http://www.elasticsearch.org/blog/disk-based-field-data-a-k-...

nzadrozny 12 years ago | |

Ditto open file handles, which is easy to push when aggressively over-sharding. Not an uncommon mistake for the enthusiastic newbie.

Having supported Solr/ES/Lucene in production for 4+ years now (websolr.com / bonsai.io) I would be pretty hesitant to trust Lucene in general as a primary data store. Beautiful for secondary indexing, but otherwise, Why Not Postgres?™ ;)

RyanZAG 12 years ago | | |

Complexity. Having two copies of the data means more dev time, more resources required to shift the data around, etc. Having just 1 data store that can also handle all your searching is like the holy grail. As you say, not sure if Solr/ES/Lucene are there yet - but they're definitely very very close. There is no theoretical barrier either - it just comes down to closing bugs, and the ES/Lucene team are very good at closing bugs.

EDIT: I don't think MongoDB is there yet either. There are definite benefits and drawbacks between Postgres and ES, tipping heavily towards Postgres for structured heavy write data. But for ES and MongoDB? I think MongoDB falls a bit short there.

wikyd 12 years ago | | |

I think the CSS for bonsai.io is not loading.

sandstrom 12 years ago |

This gem is from the 'breaking changes' list:

  “Geo queries used to use miles as the default unit. And we 
  all know what happened at NASA because of that decision. The
  new default unit is meters.”

I like this release already.

roryokane 12 years ago | |

Link to that page: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

lflux 12 years ago |

> Easy to read, console-based insight into what is happening in your cluster. Particularly useful to the sysadmin when the alarm goes off at 3am and JSON is too difficult to read.

It's these little details I love, when a project actually cares about operations and not just "well here's the API"

I've been using ElasticSearch only for Logstash, but I've been blown away so far by how easy it is to deal with.

axionike 12 years ago |

ES has performed very well for us as the backbone for the solution we deployed for a large government-sector customer. Had some GC issues initially, and were worried about user concurrency, especially since we were not restricting queries (i.e. users can do full-scale wildcard searches against the entire data set of 1BN+ records). But ES continues to shine.

Congrats to the ElasticSearch team, and all the supporters around it. Once I get back into more of a coding role, I'll definitely be contributing back to the ES project.

room271 12 years ago | |

This may require a bit more lengthy answer than makes sense here, but I'm curious about what was causing your GC issues and how you fixed them (we have GC issues at the moment).

polyfractal 12 years ago | | |

Not the OP, but GC issues in Elasticsearch basically boil down to memory pressure (obviously), which is usually caused by facets. Facets eat a lot of memory, especially if you are faceting high-cardinality fields - think fields like "tags" or any analyzed field. High cardinality, analyzed strings is the easiest way to blow out the heap.

There are other reasons, but that is like 90% of GC issues. To solve it, you need to make sure your faceted fields are configured well (usually not_analyzed) and assess how much memory is available. You may be able to index and even full-text search ten billion docs on a single machine, but faceting it may just be too much to ask for a single node.

Omitting norms, disabling bloom filters on old indices and enabling doc values are other ways to help alleviate field-data pressure.

Other GC culprits can be: too large bulk requests, unbounded threadpool queues, or something like parent/child/scripts/filter cache keys eating all your memory. Also don't go above 30gb heaps, the JVM becomes unhappy :)
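For the "configure faceted fields as not_analyzed" advice above, a minimal sketch of what such a mapping could look like in the 1.0-era mapping format. The type name `article`, the field name `tags`, and the helper `facet_friendly_mapping` are hypothetical; only the `"index": "not_analyzed"` setting is the point.

```python
import json


def facet_friendly_mapping(doc_type, field):
    # Store the field as a single un-tokenized term so that faceting works on
    # whole values instead of on every token an analyzer would produce.
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    field: {"type": "string", "index": "not_analyzed"}
                }
            }
        }
    }


mapping = facet_friendly_mapping("article", "tags")
print(json.dumps(mapping, indent=2))
```

The printed JSON is the kind of body you would send when creating the index; whether it is enough depends on field cardinality and available heap, as described above.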

NDizzle 12 years ago |

I also took a few days a few weeks ago to set up Elasticsearch after my MySQL full-text search fell apart.

What I'm doing is slamming the full text output of OCRed PDFs into a MyISAM table, the entire document in a text field.

What I'm afraid I'm not doing right is creating the web interface to search Elasticsearch. I'm using filters with the query string syntax [1] in the search box, pointing directly at that fulltext column. I'm also using the highlight functionality so that I can specify how many highlight blurbs to return with the result. The query string syntax works great with the OCR'd text, because most of it is near-garbage (as most OCR is), so you can search for something like "net sales"~50 to find those two terms within 50 words of each other. I think the results were something like:

  net sales         15,000 results
  "net sales"       120 results
  "net sales"~50    550 results

Can anyone point me at a good web based search implementation using elasticsearch that explains how they're doing it?

What I have works pretty good, I just want to... check my work, I guess.

[1]: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...
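A minimal sketch of the request body a setup like the one described above might send, assuming the OCR text lives in a field called `fulltext` (a guess based on the comment) and that three highlight fragments are wanted. `ocr_search_body` is a hypothetical helper; it only builds the JSON and does not talk to a cluster.

```python
import json


def ocr_search_body(user_query, field="fulltext", fragments=3):
    # Pass the search box contents straight to a query_string query, and ask
    # for a fixed number of highlighted snippets from the same field.
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": user_query,
            }
        },
        "highlight": {
            "fields": {
                field: {"number_of_fragments": fragments}
            }
        },
    }


# Proximity search: both terms within 50 positions of each other.
body = ocr_search_body('"net sales"~50')
print(json.dumps(body, indent=2))
```

One thing worth checking in a setup like this is what happens when a user types something the query string parser rejects, since the box contents go to the parser unmodified.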

nzadrozny 12 years ago | |

I host and support websolr.com and bonsai.io and have seen a lot of search implementations.

The main thing for good stability and performance is to be very good at batching your updates. You don't want to sling a ton of highly-parallel single-document updates at Lucene, lest you thrash the JVM and start garbage collecting like crazy.

From there, on the query side, you'll want to get a good working knowledge of the different tokenization and analysis options. There are a lot of subtle and interesting combinations to be had in there that influence performance and relevance of your search results.
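The batching advice above can be sketched roughly as follows: group documents into chunks and build one newline-delimited bulk body per chunk, instead of sending one request per document. The index name `articles`, type name `article`, the batch size of 1000, and both helper functions are assumptions for illustration; sending the bodies over HTTP is left out.

```python
import json
from itertools import islice


def batches(iterable, size):
    # Yield successive lists of at most `size` items.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(docs, index, doc_type):
    # The bulk format is one action line followed by one source line per
    # document, newline-delimited, with a trailing newline at the end.
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


docs = [{"id": i, "title": f"Article {i}"} for i in range(2500)]
bodies = [bulk_body(chunk, "articles", "article") for chunk in batches(docs, 1000)]
print(len(bodies), "bulk requests instead of", len(docs), "single-document requests")
```

The right batch size depends on document size and cluster capacity, so treat 1000 as a starting point to measure from rather than a recommendation.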

xutopia 12 years ago |

I love when something I've been using in production for what seems like years just announces now that they've reached 1.0.

brickcap 12 years ago | |

Well does it not make you feel glad that you took the risk? After all version is just a number :)

dabeeeenster 12 years ago |

ES is a fantastic project. Thank you thank you thank you for your contribution; truly standing on the shoulders...

jonhmchan 12 years ago |

Congrats to the team - absolutely love elasticsearch. Having a lot of fun with it here at Stack Overflow.

pron 12 years ago |

What does Elasticsearch add on top of Lucene?

lobster_johnson 12 years ago | |

A lot. Lucene is basically the inverted indexes, providing on-disk structures and a mechanism to query, as well as assorted bits like tokenization.

ES adds distribution (multimaster-replicated cluster of nodes connected via a gossip protocol), sharding, defines a document model and schema (the mapping of arbitrary JSON documents to index structures), faceting, aggregation (ie., roll-up-type calculations), various types of scoring (eg., geographic distance), ETL ("rivers"), backup/restore, performance metrics, a plugin system (eg., for indexing different file formats) and a bunch of other things -- and of course a REST-based API on top of the whole thing.
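For readers unfamiliar with the term, here is a toy inverted index in a few lines. It is only an illustration of the data structure (term to set of document IDs), with a naive whitespace tokenizer; it says nothing about how Lucene stores or queries its indexes. All function names are made up for the sketch.

```python
from collections import defaultdict


def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    # Map each term to the set of document IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    # Return the IDs of documents containing every term in the query.
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "Elasticsearch 1.0.0 released",
    2: "Lucene inverted index",
    3: "Elasticsearch builds on Lucene",
}
index = build_inverted_index(docs)
print(search_all(index, "elasticsearch lucene"))
```

Everything listed above (distribution, sharding, mappings, the REST API) is layered on top of structures like this one.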

buckbova 12 years ago |

I didn't know what this was and looking at this link it was tough to tell.

The github lays it out well.

https://github.com/elasticsearch/elasticsearch

alecco 12 years ago |

Why is it awesome? Why "it just works"? Is it just a mongodb-kind document store over Hadoop+Lucene?

What makes it so special to have hundreds of votes and tweets all around within 2 hours?

I don't understand. A DB engine engineer.

gibrown 12 years ago | |

There are a lot of features thoughtfully combined that make ES great. Top of my list would be:

1. It handles human written language. Any language. The same technology that lets it handle strings written in human language provides a lot of flexibility in handling strings in other applications, particularly when handling logs.

2. Non-string data it also handles very fast and cleanly (numbers, dates, geo).

3. Lucene has an inverted index that has been optimized over many years. ES scales that pretty seamlessly across many servers. All decisions in the project seem to be made around whether a feature can scale to 100s of nodes.

The devs have also been really smart to focus on the "out of box experience". Very well thought out defaults.

More on our experience with ES at scale: http://gibrown.wordpress.com/2014/01/09/scaling-elasticsearc...

buckbova 12 years ago | | |

Is this accurate for Elasticsearch, since it is built on Lucene?

https://lucene.apache.org/core/

"index size roughly 20-30% the size of text indexed"

That seems excessive for an index.

ddorian43 12 years ago | |

distributed/full-text-search(many-many-options)/highlighter/compressed/geo-queries/searching on multiple indexes(databases)|types(tables)/distributed-aggregation/distributed faceting/very-fast-in-memory-suggester/inverse-query(percolator)where you register queries(like rows), and then test documents if they match queries

and many other stuff

philfreo 12 years ago |

We wrote a tutorial about how we wrote our search for Close.io using elasticsearch and pyparsing:

"Sales data search: Writing a query parser / AST using pyparsing + elasticsearch"

Part 1: http://blog.close.io/sales-data-search-writing-a-query-parse...

Part 2: http://blog.close.io/sales-data-search-writing-a-query-parse...

karterk 12 years ago |

Elasticsearch mostly "just works". The latest version of Solr has made clustering easier (requires managing Zookeeper), but before that, it was either ES or nightmare.

Lucene is one of those projects which hardly has any real competition. That's surprising given how many real world software projects have a search requirement. While Lucene is excellent, it's not without flaws and competition is always great.

m0th87 12 years ago | |

FWIW, Elasticsearch builds on Lucene. It's just working at a much higher level of abstraction.

dclara 12 years ago | |

I agree with you, almost every website needs a search server on the backend for people to search their document base, especially for enterprise intranet. Maybe enterprises are using commercial products, such as SharePoint. How about the rest of the small businesses and websites? Maybe the learning curve is steep for every website to adopt so far.

swah 12 years ago | |

Hmm, could that be because they have to compete with free?

malaporte 12 years ago | | |

Lucene does have competition, mostly in the commercial world. I know, since I work for one of those companies :p

Solr, ElasticSearch, etc. are mostly concerned about the index/search features, and they do quite a good job there. But this still leaves a huge amount of space for commercial offerings, as core search is only a part of the problem. I'm thinking about connectivity with complex enterprise systems, support for the specific security models of those systems, integration in other systems, etc. Believe me, those problems are not easy to solve.

So, even if we have an index that can most probably match Lucene feature for feature and quite a lot of things besides, we typically won't go after deals where simple search is the only requirement. Instead we focus on larger deals with more complex requirements. And we're doing quite well, thank you :)

Zilog 12 years ago |

Too bad they have yet to address the split brain issue.

chriscareycode 12 years ago | |

I haven't had a split brain on my 15 node cluster in over 6 months even though the cluster is split among multiple data centers which do drop connectivity from time to time. When the setting was wrong, it happened constantly. Tune it properly and it won't happen. n/2+1
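The "n/2+1" rule above, spelled out as a small sketch. It assumes `n` is the number of master-eligible nodes and that the result is what you would put in the minimum master nodes setting; the function name is hypothetical.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    # A strict majority: integer division by two, plus one.
    return master_eligible_nodes // 2 + 1


for n in (3, 5, 15):
    quorum = minimum_master_nodes(n)
    # Two disjoint partitions cannot both contain a strict majority,
    # so at most one side of a network split can elect a master.
    print(f"{n} nodes -> quorum of {quorum}")
```

For the 15 node cluster mentioned above this gives 8, so a data center that ends up with 7 or fewer nodes after a network drop cannot form its own cluster.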

r00fus 12 years ago | |

Link for the curious: http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-br...

AznHisoka 12 years ago | |

True, that's a valid issue. For me it's not a problem, as I end up indexing the same document multiple times over the course of 2-3 days.

hungryblank 12 years ago |

At Contentful in Berlin (Germany) we're looking for an elasticsearch/lucene expert, if you're excited by this tool and want to work full time with it get in touch.

https://groups.google.com/d/msg/elasticsearch/Rb7Lei4gaaE/7I...

capkutay 12 years ago |

I was vetting ES for a business-critical search platform and had some concerns about write/read performance and how the Lucene indexes are handled on disk. I read that it doesn't really perform as well as Splunk... Instead of ES, I'm considering a solution using HBase to shard Lucene indexes on HDFS.

gane5h 12 years ago |

Really impressed with the pace of innovation in the last few months: cat api, aggregations, snapshots. The unfortunate side effect is that books and stack overflow posts written before 1.0 are outdated.

Disclaimer: I’m the founder of a hosted Search As A Service and we use ES in a few critical parts of our infrastructure.

mtrn 12 years ago |

Elasticsearch is a really great piece of software because it makes the simple easy and the complicated possible.

vhost- 12 years ago |

I'd be curious to see how well Elastic Search holds up to Endeca. I'm currently stuck maintaining some Endeca instances and it's a nightmare. I wish I could go back to ES.

At my last place of work, ES was beautiful and required little work to get a very fast, workable search in place.

quicksilver03 12 years ago | |

FYI, at my shop we use Oracle Commerce (ATG) and we've seen Oracle's salespeople pushing Endeca to all current and new customers.

For our current project we went with ElasticSearch and we're quite happy. One of the contributing factors was that one of our most experienced guys was unable to get the damn thing installed, even with the help of one Endeca consultant.

pyotrgalois 12 years ago |

Great news. In every new project that we create (in general REST JSON APIs made with nodejs, erlang or rails that are consumed by iOS and android clients) we always finish using postgresql, redis and elasticsearch. Great tools.

kailuowang 12 years ago |

Congratulations to the team. This is a great library that we really appreciate.

willcodeforfoo 12 years ago |

Congrats! Elasticsearch is one of my favorite recent pieces of technology.

rartichoke 12 years ago |

ES is one of the few techs that I seriously love.

The rails support for it is amazing too. The guy creating the rails integration lib is really talented and active.

elchief 12 years ago |

Anybody know if elasticsearch does multiword synonyms properly? (Solr doesn't). Thx

skarnik 12 years ago |

congrats to the team!

dreamdu5t 12 years ago |

We recently switched from using MixPanel + Crittercism + Sphinx to using qbox.io (hosted elasticsearch) and Kibana to do all our analytics, crash reporting, and search.

I can't recommend qbox.io enough! Point-and-click scaling of managed elasticsearch clusters + Kibana == bliss.