Cuckoo Filter: Approximate Membership That Supports Deletion

A Cuckoo Filter is a probabilistic data structure for approximate set membership. Like a Bloom filter it answers “is this element possibly in the set?” with a tunable false-positive rate and zero false negatives — but unlike a standard Bloom filter it also supports deleting elements, is more space-efficient at low false-positive rates, and is more cache-friendly on lookups.

Cuckoo Filter vs. Bloom Filter Link to heading

DimensionBloom FilterCuckoo Filter
False positivesYesYes
False negativesNoNo
DeletionNot supported (standard)Supported
Lookup accessk hashes, k random probesAt most 2 buckets, cache-friendly
Space (at low fpr)LargerUsually smaller
Behavior when fullNever “full”, fpr just risesInsert can fail (needs load-factor headroom)

Rule of thumb: when the target false-positive rate is below ~3%, a Cuckoo Filter is usually more space-efficient than a Bloom filter — and native deletion support is its biggest selling point.

Core Idea Link to heading

A Cuckoo Filter does not store the elements themselves. It stores a small fingerprint of each element — a short hash (e.g. 8–16 bits). The structure is an array of buckets, each holding a few slots (typically 4 slots per bucket).

The key trick is partial-key cuckoo hashing. Every element has two candidate buckets:

i1 = hash(x)
i2 = i1 XOR hash(fingerprint(x))

Because i2 = i1 XOR hash(fp), you can compute “the other candidate bucket” from either bucket index and the fingerprint alone. This is what lets the filter relocate or delete entries without needing the original element x — the fingerprint stored in the bucket is enough. It is also why deletion and fingerprint-only storage are possible at the same time.

The clearest way to build intuition is to watch it happen. Type a few keys below and add them: each key is hashed to a fingerprint, flies into one of its two candidate buckets, and once both are full you’ll see the signature cuckoo behavior — an existing fingerprint gets kicked out and relocates to its own alternate bucket, sometimes cascading several hops before everything settles.

Interactive Cuckoo Filter
m = 8 buckets · b = 4 slots · 8-bit fingerprint
Type a key and hit Add. Watch it hash to a fingerprint, fly into one of its two candidate buckets, and trigger a cascade of evictions when both are full.
stored fingerprint candidate bucket (i1 / i2) being kicked out

Operations Link to heading

  • Lookup: compute i1 and i2, then check whether either bucket holds a matching fingerprint. A match means “possibly present”; no match means “definitely absent”. At most two memory accesses.
  • Delete: find the matching fingerprint in i1/i2 and remove it. This is only safe for elements that were actually inserted — deleting an element that was never inserted but happens to share a fingerprint will corrupt a real element’s membership.
  • Insert:
    1. Compute the fingerprint and the two candidate buckets; if either has a free slot, place it there.
    2. If both are full, pick one candidate bucket at random and kick out an existing fingerprint, taking its slot.
    3. The evicted fingerprint is relocated to its other candidate bucket (computed via the XOR formula above).
    4. If that bucket is also full, the eviction cascades — like a cuckoo taking over a nest — until something fits or a maximum kick count (MaxKicks, e.g. 500) is reached.
    5. If MaxKicks is exceeded and the element still does not fit, the insert is declared a failure / table full, and the filter must be resized or rebuilt.

False Positives and Parameters Link to heading

  • The false-positive rate is approximately 2b / 2^f, where b is the number of slots per bucket and f is the fingerprint length in bits. Longer fingerprints and fewer slots per bucket lower the fpr.
  • A typical configuration is 4 slots per bucket with an 8-bit fingerprint, reaching a load factor of about 95% at roughly a 3% false-positive rate. Lengthen the fingerprint for a lower fpr.
  • Leave load-factor headroom: a near-full table causes frequent evictions and eventually insert failures. In practice, keep occupancy below ~95%.

Properties and Pitfalls Link to heading

  • Limited duplicate inserts: a given (bucket, fingerprint) pair holds at most 2b copies; inserting the same element more than 2b times triggers an insert failure. Counting scenarios should use a Counting variant.
  • Delete with care: only delete elements that were actually inserted. Deleting a never-inserted element whose fingerprint collides with a real one breaks the real element’s membership (introduces a false negative).
  • Not easily resizable: like a Bloom filter, growing the structure generally means rebuilding it.
  • Variants: Counting Cuckoo Filter (supports counts), Vacuum Filter (better space/scaling characteristics).

In Practice (RedisBloom) Link to heading

  • CF.RESERVE key capacity to create, CF.ADD key item to add, CF.EXISTS key item to query, CF.DEL key item to delete.
  • CF.ADDNX adds only if absent; CF.COUNT estimates how many times an element appears.
  • A good fit for dedup / cache-penetration-prevention scenarios that need deletion. For append-only use cases, a Bloom filter is often simpler.

When to Choose It Link to heading

  • You need to delete elements and can tolerate a very low false-positive rate → prefer a Cuckoo Filter.
  • You want to be more memory-efficient at a low false-positive rate (<3%) → a Cuckoo Filter usually wins.
  • Append-only, and you want the simplest possible implementation → a Bloom filter is enough.

Further Reading Link to heading