Bloom Filters
Modern systems often need to answer one simple question extremely fast:
“Have we seen this item before?”
But storing millions (or billions) of items in memory is expensive. Bloom filters solve this problem using very little memory.
They trade perfect accuracy for efficiency:
- ✅ No false negatives
- ⚠️ Possible false positives
What Is a Bloom Filter?
A Bloom filter is a probabilistic data structure used to test whether an element is in a set.
It can answer:
- Definitely not present
- Maybe present
How It Works
A Bloom filter consists of:
- A bit array
- Multiple hash functions
Initial bit array: [0 0 0 0 0 0 0 0 0 0]---
Step 1: Adding an Element
Insert "apple":
hash1("apple") → 2
hash2("apple") → 5
hash3("apple") → 7
Set those positions to 1:
[0 0 1 0 0 1 0 1 0 0]
Step 2: Adding Another Element
Insert "banana":
hash1("banana") → 3
hash2("banana") → 5
hash3("banana") → 8
Updated bit array:
[0 0 1 1 0 1 0 1 1 0]
Notice how some bits overlap (like position 5). This is expected.
Step 3: Checking Membership
Now let’s check elements.
Case 1: Maybe Present
Check "apple" hash → 2, 5, 7
bit[2] = 1 bit[5] = 1 bit[7] = 1
All bits are 1 → Maybe present
It might be present, or those bits were set by other elements.
---Case 2: Definitely NOT Present (Important)
Check "orange" hash → 1, 4, 9
bit[1] = 0 bit[4] = 0 bit[9] = 0
At least one bit is 0 → Definitely NOT present
This is the most valuable property of Bloom filters.
Example Implementation (Python)
Here is a simple implementation of a Bloom filter in Python:
import mmh3
from bitarray import bitarray
class BloomFilter:
def __init__(self, size, hash_count):
self.size = size
self.hash_count = hash_count
self.bits = bitarray(size)
self.bits.setall(0)
def add(self, item):
for i in range(self.hash_count):
index = mmh3.hash(item, i) % self.size
self.bits[index] = 1
def contains(self, item):
for i in range(self.hash_count):
index = mmh3.hash(item, i) % self.size
if self.bits[index] == 0:
return False
return True
Usage
bf = BloomFilter(1000, 3)
bf.add("apple")
bf.add("banana")
print(bf.contains("apple")) # True (maybe present)
print(bf.contains("grape")) # False OR True (possible false positive)
Explanation:
- add() → sets bits using hash functions
- contains() → checks if all bits are set
- If any bit is 0 → definitely not present
Why This Matters
Bloom filters are powerful because they can eliminate unnecessary work.
Example: Database
Query: user123 Bloom filter → bit = 0
Result: Skip disk read completely 🚀
---Simple Rule
Any bit = 0 → Definitely NOT present All bits = 1 → Maybe present---
Where Bloom Filters Are Used
- Databases (Cassandra, HBase)
- Caching systems (Redis)
- Web crawlers
- Distributed systems
Summary
Bloom filters are a simple yet powerful idea:
- Very fast
- Memory efficient
- Perfect for large-scale systems
If you're building systems at scale, Bloom filters are a must-know tool.