ANN Vector Search with SQL-Powered LSH and Random Projections (clickhouse.com)
Pinecone (https://www.pinecone.io/learn/series/faiss/locality-sensitiv...) seems to promote it as a viable approach.
We definitely don't consider this to be the final solution for ANN, and hence are investing in other graph-based techniques - https://clickhouse.com/docs/en/engines/table-engines/mergetr...
Typically, LSH is for low dimensions - you split the space into cells by the hash value, use the hash value as an index, and look up one to a few hash values for every key (so you have some overlap between neighboring cells).
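To make the "hash value as an index" idea concrete, here is a minimal sketch in plain Python. It assumes sign-of-random-projection hashing and probes the query's own cell plus every cell whose key differs by one bit; the names make_planes, lsh_key, build_index and candidates are made up for illustration and do not come from the article or any library.

```python
import random
from collections import defaultdict


def make_planes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(vec, planes):
    """Pack the sign of each projection into an integer cell key."""
    key = 0
    for i, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) >= 0.0:
            key |= 1 << i
    return key


def build_index(vectors, planes):
    """Map each cell key to the ids of the vectors that fall into it."""
    index = defaultdict(list)
    for row_id, vec in enumerate(vectors):
        index[lsh_key(vec, planes)].append(row_id)
    return index


def candidates(query, planes, index):
    """Direct lookups: the query's cell plus all cells one bit away."""
    key = lsh_key(query, planes)
    probes = [key] + [key ^ (1 << i) for i in range(len(planes))]
    found = []
    for k in probes:
        found.extend(index.get(k, []))
    return found
```

The point is that only a handful of keys are looked up per query; nothing is scanned.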
In contrast, in the article, we use the value not as a hash but to construct a metric that approximates the original metric space. Then, we do a full scan, filtering by the approximate distance. It speeds up the search because much less data is scanned, but it is not like using the hash for direct lookups.
You can name it differently, e.g., "quantization into bit-vectors".
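A rough Python sketch of that scan-and-filter scheme, as opposed to bucket lookups: every vector is quantized into a bit-vector, the Hamming distance between bit-vectors stands in for the original distance, and the whole code column is scanned with a threshold. The function names, the max_hamming threshold and the final exact re-ranking by cosine distance are assumptions made for illustration, not details taken from the article.

```python
import math
import random


def make_planes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def to_bits(vec, planes):
    """Quantize a vector into a bit-vector, stored as an int."""
    code = 0
    for i, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) >= 0.0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Approximate distance: number of differing bits."""
    return bin(a ^ b).count("1")


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def search(query, vectors, codes, planes, max_hamming, k):
    """Full scan over the small codes, then exact re-rank of survivors."""
    q = to_bits(query, planes)
    # Every code is read, but codes are far smaller than the vectors.
    survivors = [i for i, c in enumerate(codes) if hamming(q, c) <= max_hamming]
    # Assumed extra step: order the survivors by the exact distance.
    survivors.sort(key=lambda i: cosine_distance(query, vectors[i]))
    return survivors[:k]
```

With 128 bits per vector, for example, the scan reads 16 bytes per row instead of the full float vector, which is where the speedup comes from.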