The opposite of a bloom filter(somethingsimilar.com) |
The opposite of a bloom filter(somethingsimilar.com) |
It is pretty easy (an exercise) to implement the "opposite of a Bloom filter" if you start from a summary of the complete set of events and support deletion, rather than starting from the empty set and supporting addition.
What makes everything seem hard is the (often unstated) requirement that you start from an empty set and support addition, which is roughly as hard as implementing a Bloom filter that starts from the complete set and supports deletion. Neither of the links make this requirement explicit (though, it is implicit in their "motivation" sections).
The article doesn't mention the false negative rate. Unlike a Bloom filter, it'll depend on order when the input includes repeated elements. But in general, required memory will increase quadratically in the number of items stored at a constant false negative rate (because of the "birthday paradox").
So it isn't the opposite of a Bloom filter. But what is?
I'm afraid that is not how it works. A Bloom filter can tell whether an item may be in the set (false positive) but can definitely tell an item is NOT in the set (no false negative).
These are logically equivalent.
At first glance I do disagree that his/her solution will be as efficient in terms of size of the set and as performant as bloom filters or cuckoo filters. But I would have to benchmark it to be sure.
Item is not a reference to the physical data structure in memory. If you treat it as a black box where you put items in and then ask it if it has seen the 'item' then the wording is more clear.
No one is claiming that "if the hash matches the object must be there". They're saying "if the object is there, the hash will match". This is logically equivalent to saying "if the hash does not match, the object is not there".
“contains implies report” is different than “report implies contains”. You are (correctly) arguing the against latter statement, which GP does not make.
I think what's going on here is that you're reading "if" as meaning "whether". In fairness, this is common English usage: "I'll tell you tomorrow if I'm going" usually means "I'll tell you tomorrow whether I'm going".
That's not what the OP means. The correct reading is perhaps clarified with a comma:
> A Bloom filter is guaranteed to report correctly, if it contains the item.
or perhaps better
> If it contains the item, a Bloom filter is guaranteed to report correctly.
IMO it is better to present the Bloom filter by asking/stating what one can be certain of given each return value of the Bloom filter.
If it contains the item -> reports that it contains the item (correctly)
If it doesn't contain the item -> may report that it contains the item (incorrectly)