Rendezvous Hashing Explained(randorithms.com) |
Rendezvous Hashing Explained(randorithms.com) |
[1]: https://dgryski.medium.com/consistent-hashing-algorithmic-tr...
Btw, you might want to add weights, they are very satisfying, once you understand them.
The way weighted rendezvous hashing works is really beautiful, and easy enough to derive semi-formally to explain in a short blog post.
Based on my experience implementing the skeleton-based variant, it does add a fair bit of additional complexity compared to plain rendez-vous hashing but IMO, it's still much simpler to implement than ring-based consistent hashing (with optimizations).
The main downside of skeleton-based rendez-vous hashing compared to ring-based consistent hashing is that when a server (aka 'site') leaves the cluster, you need to iterate over all the keys in order to do the re-mapping. Skeleton-based rendez-vous hashing minimizes the number of keys which will need to be moved when a server leaves the cluster (at least as well as consistent hashing does), but you still need to go through every key in order to check whether or not it has moved; there are no shortcuts to check only a subset of keys. With consistent hashing, if a server leaves the cluster, you only need to check keys which fall within the affected 'sector' of the ring so you can completely ignore all the other keys.
That said, I find that skeleton-based rendez-vous hashing is simple to implement and maintain, it's fast and it spreads keys pretty evenly (especially in the long run). Consistent hashing can become uneven over time if multiple nearby severs/sites leave the ring; that's why some solutions make use of 'virtual sites/servers' which never leave the ring (but this adds complexity and overhead).
Did the author mean to say "use the key as a hash seed?" Otherwise, I don't see how using server ID as a seed makes the hash dependent on the key.
“Use the server id as the hash seed” -> apart from the encryption, seed the result by first hashing the unique id then updating with the actual content before finalizing the hash, depending on the hash algorithm in question akin to keyed_hash(unique_id, actual_payload)
It doesn't make sense: how do you move the keys from an offline server? The pictures in the post suggest that keys are not redundantly stored on other servers.
The default for Tahoe-LAFS (a distributed filesystem that uses a form of Rendezvous Hashing) uses erasure coding to split each file into (by default) 10 segments of which any 3 are necessary to reconstruct the file. Those 10 segments are then stored across the first 10 servers in the list. [https://tahoe-lafs.readthedocs.io/en/latest/architecture.htm...]
That way, even when servers go away with your data (whether due to crash or even network partitions(!)), you still have a decent chance to locate your data.