Age-Partitioned Bloom Filters(arxiv.org) |
Age-Partitioned Bloom Filters(arxiv.org) |
For those that don't know, bloom filters are really good at determining when something is not in a set. So if says "not in set" that's 100% accurate and super fast.
We used this to our advantage at reddit. When you load a comments page, we had to determine if you've voted on anything. So with the bloom filter in front, the query for "have they voted on anything here" was very quick if the answer was no.
But we still had to make the query. So another hack we did was to put a value on your user object to track the last time you interacted with the website. If the comments page was made after your last interaction, then we didn't even have to do the query.
With this, we wouldn't have to do the two step process -- it sounds like this would be super fast when it was time boxed. Maybe...
We used it in front of a Radix Tree that at places had too much depth. The filter took something like 2MiB at 7% false-positive rate vs 100+ MiB for the Radix Tree. We later bitmap-compressed the Radix Tree [0] to under 5MiB at the cost of making the lookups very very slow (P99 at 4ms, P50 at 1ms compared to P99 at 400ns before, iirc) but added a TinyLFU-based [1] lean-HAMT [2] cache of 512KiB in the front to speed lookups of popular items and a Robinhood cache [3] of 256KiB to take care of the long-tail KVs.
[0] https://news.ycombinator.com/item?id=2348619
[1] http://highscalability.com/blog/2016/1/25/design-of-a-modern...
[2] https://blog.acolyer.org/2015/11/27/hamt
[3] https://blog.acolyer.org/2018/10/26/robinhood-tail-latency-a...
Agreed. It allows for false positives, but disallows false negatives.
Here’s the Go implementation: https://github.com/nehbit/aether/blob/master/aether-core/aet...
I used mine for authentication rejection.
Not sure if your app is big enough to care, but it might be you want to normalize the response times, so server load is reduced but auth time is fixed.
I don't work on that anymore though. I might do something similar in the future, but being able to detect when admins are snoozing is low on my list of concerns; if it's designed properly, it'll be cheaper to hire PIs to just look and see when they're snoozing.
As long as the "count" doesn't overflow to the max size (in this case: 255), you can always "remove" items from a Counting Bloom Filter. In effect, a traditional 1-bit bloom filter is simply a counting-bloom filter that "overflows" on the first hit.
If people really need deletion on their bloom filter, do realize it is possible! (as long as you select a proper bit-size).