Full text search in your Data: How you can do better than Elasticsearch (blog.algolia.com)
This is a false statement:
http://www.elasticsearch.org/guide/reference/query-dsl/custo...
This, combined with scripts, gives you unlimited possibilities to alter your score based on whatever you please. The syntax is perhaps a bit wonky in the current version, but awesomeness is on the way:
https://github.com/elasticsearch/elasticsearch/issues/3423
"Unfortunately Elasticsearch fuzzy matching does not work out of the box, is complex to customize, and does not provide the ability to highlight prefixes."
There are more ways to catch typos than fuzzy matching and Levenshtein distance: ngrams, for instance. Elasticsearch allows you to do both, but yes, it's true that you have to know your way around analyzers/tokenizers and mappings a little to get the best results in Elasticsearch. If you use the ngrams approach, highlighting also works a lot better.
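To make the ngrams idea concrete, here is a minimal pure-Python sketch. It is not how Elasticsearch scores ngram tokens internally (that goes through analyzers and the usual term scoring); it only shows why overlapping character trigrams still match when a query contains a transposition typo. The function names, the padding character, and the sample titles are all made up for illustration.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, padded string."""
    pad = "_" * (n - 1)  # padding so that word boundaries produce n-grams too
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the n-gram sets of two strings (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical documents and a query with two letters swapped.
titles = ["elasticsearch", "elastic band", "plastic surgery"]
query = "elasticsaerch"

# Rank titles by trigram overlap with the misspelled query.
ranked = sorted(titles, key=lambda t: ngram_similarity(query, t), reverse=True)
print(ranked)
```

The misspelled query still shares most of its trigrams with the intended title, so that title ranks first without any edit-distance computation.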
"This sorting configuration might seem pretty explicit, but it is in fact quite dangerous as it conflicts with the boost on fields. To better understand the problem, let’s look at the query ‘the rains’:"
It's true that sorting trumps boosting, but since this rests on the assumption that you cannot alter _score, the whole section seems contrived.
In the instant search section they use Elasticsearch's query_string query to search for `world w*`. This is indeed a very slow approach, since it generates a wildcard query in the background; they probably should have written the query as a phrase prefix query.
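For comparison, the two request bodies might look roughly like the following, written as Python dicts so they can be serialized to JSON. This is a sketch: the `title` field name and the `max_expansions` value are assumptions, and the exact syntax can differ between Elasticsearch versions.

```python
import json

# What the article appears to do: a query_string query with a trailing wildcard.
wildcard_query = {
    "query": {
        "query_string": {
            "default_field": "title",  # hypothetical field name
            "query": "world w*",
        }
    }
}

# The alternative suggested above: match the terms as a phrase and treat
# the last term as a prefix.
phrase_prefix_query = {
    "query": {
        "match_phrase_prefix": {
            "title": {  # hypothetical field name
                "query": "world w",
                "max_expansions": 10,  # assumed cap on prefix expansions
            }
        }
    }
}

print(json.dumps(phrase_prefix_query, indent=2))
```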
1) You are right that it is possible to mix both popularity and relevance, but you need to use boosts and store everything in the float _score. This is dangerous and has side effects (for example, you run a real risk of eventually ranking a hit with typos above an exact one). It is really difficult to control ranking with boosts.
2) The ngrams approach is indeed an alternative. But it also has major drawbacks in terms of relevance, mainly for the proximity between terms.
3) A phrase query is a good way to improve performance, but it breaks the user experience if the terms are not close together (those hits are missing from the search results). It's better to let proximity do its job and influence the ranking.
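Points 1 and 3 can be illustrated with a small Python sketch. It is neither engine's actual ranking code: the `Hit` fields, the weights in `blended_score`, and the sample titles for the query 'the rains' are invented for the example. It contrasts a single float that blends relevance with a popularity boost against criteria compared one after another, and shows what a strict phrase filter would drop.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    typos: int       # typos needed for the hit to match the query
    proximity: int   # distance between matched terms (1 = adjacent)
    popularity: int  # e.g. a view count


def blended_score(hit, popularity_boost=0.01):
    """Boost-style ranking: everything folded into one float (arbitrary weights)."""
    relevance = 10.0 - 3.0 * hit.typos - 1.0 * hit.proximity
    return relevance + popularity_boost * hit.popularity


def tie_break_key(hit):
    """Criteria compared in order: typos, then proximity, then popularity."""
    return (hit.typos, hit.proximity, -hit.popularity)


hits = [
    Hit("The Rains Came", typos=0, proximity=1, popularity=50),
    Hit("The Trains", typos=1, proximity=1, popularity=900),
    Hit("The Day the Rains Stopped", typos=0, proximity=3, popularity=200),
]

by_score = sorted(hits, key=blended_score, reverse=True)
by_criteria = sorted(hits, key=tie_break_key)
phrase_only = [h for h in hits if h.proximity == 1]  # strict phrase matching

print([h.title for h in by_score])
print([h.title for h in by_criteria])
print([h.title for h in phrase_only])
```

With these made-up numbers the popular hit with a typo wins under the blended score, while the ordered criteria keep both exact matches ahead of it. The strict phrase filter removes the hit whose terms are far apart instead of merely ranking it lower.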