Explained: Bloom Filters

Codelooru Bloom Filters
Codelooru Bloom Filters

Bloom Filters

Modern systems often need to answer one simple question extremely fast:

“Have we seen this item before?”

But storing millions (or billions) of items in memory is expensive. Bloom filters solve this problem using very little memory.

They trade perfect accuracy for efficiency:

  • ✅ No false negatives
  • ⚠️ Possible false positives

What Is a Bloom Filter?

A Bloom filter is a probabilistic data structure used to test whether an element is in a set.

It can answer:

  • Definitely not present
  • Maybe present

How It Works

A Bloom filter consists of:

  • A bit array
  • Multiple hash functions
Initial bit array:

[0 0 0 0 0 0 0 0 0 0]
---

Step 1: Adding an Element

Insert "apple":

hash1("apple") → 2
hash2("apple") → 5
hash3("apple") → 7

Set those positions to 1:

[0 0 1 0 0 1 0 1 0 0]

Step 2: Adding Another Element

Insert "banana":

hash1("banana") → 3
hash2("banana") → 5
hash3("banana") → 8

Updated bit array:

[0 0 1 1 0 1 0 1 1 0]

Notice how some bits overlap (like position 5). This is expected.

Codelooru: Bloom Filter Adding Element Diagram

Step 3: Checking Membership

Now let’s check elements.

Case 1: Maybe Present

Check "apple"

hash → 2, 5, 7
bit[2] = 1
bit[5] = 1
bit[7] = 1

All bits are 1 → Maybe present

It might be present, or those bits were set by other elements.

---

Case 2: Definitely NOT Present (Important)

Check "orange"

hash → 1, 4, 9
bit[1] = 0
bit[4] = 0
bit[9] = 0

At least one bit is 0 → Definitely NOT present

This is the most valuable property of Bloom filters.

Codelooru: Bloom Filter Checking Element Present Diagram

Example Implementation (Python)

Here is a simple implementation of a Bloom filter in Python:


import mmh3
from bitarray import bitarray

class BloomFilter:
    def __init__(self, size, hash_count):
        self.size = size
        self.hash_count = hash_count
        self.bits = bitarray(size)
        self.bits.setall(0)

    def add(self, item):
        for i in range(self.hash_count):
            index = mmh3.hash(item, i) % self.size
            self.bits[index] = 1

    def contains(self, item):
        for i in range(self.hash_count):
            index = mmh3.hash(item, i) % self.size
            if self.bits[index] == 0:
                return False
        return True

Usage


bf = BloomFilter(1000, 3)

bf.add("apple")
bf.add("banana")

print(bf.contains("apple"))   # True (maybe present)
print(bf.contains("grape"))   # False OR True (possible false positive)

Explanation:

  • add() → sets bits using hash functions
  • contains() → checks if all bits are set
  • If any bit is 0 → definitely not present
---

Why This Matters

Bloom filters are powerful because they can eliminate unnecessary work.

Example: Database

Query: user123
Bloom filter → bit = 0

Result: Skip disk read completely 🚀

---

Simple Rule

Any bit = 0  → Definitely NOT present
All bits = 1 → Maybe present
---

Where Bloom Filters Are Used

  • Databases (Cassandra, HBase)
  • Caching systems (Redis)
  • Web crawlers
  • Distributed systems
---

Summary

Bloom filters are a simple yet powerful idea:

  • Very fast
  • Memory efficient
  • Perfect for large-scale systems

If you're building systems at scale, Bloom filters are a must-know tool.



×