Hello, while looking at Lucene core, I was amazed to see that the ASCII Folding filter is implemented as a huge switch/case statement, which is then compiled into a big lookup table and a lot of branches. Since this single filter is critical for many companies using Solr, Elasticsearch or Tantivy, I wanted to explore other ways to implement it. I have not yet benchmarked the branchless implementation. I expect it to be slower on English or Latin inputs and faster on Eastern languages. Next time, I might try to implement it using SIMD instructions. Also note that this is an experiment and that it has not yet been evaluated against the unit tests provided by Lucene.
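
To make the idea concrete, here is a minimal sketch of a table-driven fold that selects its result with a bit mask instead of a switch/case. It is not the implementation discussed above and not Lucene's code: the class name `AsciiFoldSketch`, the `fold` method, and the 256-entry table are all hypothetical, and the table only covers a handful of one-to-one mappings in the Latin-1 range. Multi-character expansions (such as a ligature folding to two letters) are left out entirely, and whether the JIT actually emits branch-free machine code for the loop body has not been checked.

```java
final class AsciiFoldSketch {
    // Hypothetical table: identity for U+0000..U+00FF, then a few overrides.
    private static final char[] TABLE = new char[256];

    static {
        for (int i = 0; i < 256; i++) TABLE[i] = (char) i;
        fill(0xC0, 0xC5, 'A'); fill(0xE0, 0xE5, 'a'); // A/a with diacritics
        fill(0xC8, 0xCB, 'E'); fill(0xE8, 0xEB, 'e');
        fill(0xCC, 0xCF, 'I'); fill(0xEC, 0xEF, 'i');
        fill(0xD2, 0xD6, 'O'); fill(0xF2, 0xF6, 'o');
        fill(0xD9, 0xDC, 'U'); fill(0xF9, 0xFC, 'u');
        TABLE[0xC7] = 'C'; TABLE[0xE7] = 'c';         // C/c with cedilla
        TABLE[0xD1] = 'N'; TABLE[0xF1] = 'n';         // N/n with tilde
        TABLE[0xD8] = 'O'; TABLE[0xF8] = 'o';         // O/o with stroke
        TABLE[0xDD] = 'Y'; TABLE[0xFD] = 'y'; TABLE[0xFF] = 'y';
    }

    private static void fill(int from, int to, char ascii) {
        for (int i = from; i <= to; i++) TABLE[i] = ascii;
    }

    // Folds each UTF-16 code unit independently (one char in, one char out).
    static String fold(String input) {
        char[] out = new char[input.length()];
        for (int i = 0; i < out.length; i++) {
            int c = input.charAt(i);
            // mask is all ones when c < 256, all zeros otherwise.
            int mask = (c - 256) >> 31;
            // The index is always in range thanks to the & 0xFF.
            int folded = TABLE[c & 0xFF];
            // Pick the table entry for c < 256, the original char otherwise.
            out[i] = (char) ((folded & mask) | (c & ~mask));
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file pure ASCII.
        System.out.println(fold("Cr\u00e8me br\u00fbl\u00e9e")); // prints "Creme brulee"
    }
}
```

Characters at or above U+0100 pass through unchanged in this sketch. A fuller version would need a larger (or multi-level) table and a way to emit more than one output character per input character.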