If you’ve ever implemented a cache with TTLs (time-to-live), you’ve likely faced a tricky balance: how do you efficiently expire entries without slowing down writes or wasting memory?
Redis, the in-memory data store famous for its sub-millisecond latency, has an elegant solution. It doesn’t rely on complex data structures like heaps or sorted lists to manage expirations. Instead, it uses a little randomness and a lot of clever engineering.
Let’s unpack how it works.
The Naïve Approach
Imagine you’re designing your own cache. You want every entry to expire after some duration. The straightforward solution is:
- Keep a sorted list (or min-heap) of keys by expiration time.
- Periodically check the earliest ones and delete any that are past due.
This works… but it’s not cheap. Each insertion into a heap costs O(log n), and with millions of keys, that adds up.
Redis was designed for millions of writes per second; an O(log n) write path simply wouldn’t fly. So how does Redis avoid this?
Redis’ Two-Pronged Approach
Redis uses two complementary mechanisms to handle expirations efficiently:
1. Lazy Expiration
Every time a key is accessed (say, via GET or HGET), Redis checks whether it’s expired.
If it has, Redis deletes it immediately and pretends the key never existed. That’s free cleanup, but only if the key is ever accessed again.
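In essence, lazy expiration is just a check on the read path. Here is a minimal Python sketch of the idea (the `LazyCache` class and `ttl_ms` parameter are illustrative, not Redis internals):

```python
import time

class LazyCache:
    """Minimal sketch of lazy (on-access) expiration.
    Illustrative only, not Redis' actual implementation."""

    def __init__(self):
        self.data = {}     # key -> value
        self.expires = {}  # key -> absolute expiry time in ms (TTL'd keys only)

    def set(self, key, value, ttl_ms=None):
        self.data[key] = value
        if ttl_ms is not None:
            self.expires[key] = time.time() * 1000 + ttl_ms

    def get(self, key):
        # Every read checks the expiry first: this is lazy expiration.
        exp = self.expires.get(key)
        if exp is not None and time.time() * 1000 >= exp:
            # Expired: delete it and pretend it never existed.
            del self.data[key]
            del self.expires[key]
            return None
        return self.data.get(key)
```

No timers, no background work: the cost of the check is paid only by reads that actually touch the key.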
But what about keys that expire and are never touched? That’s where the second mechanism kicks in.
2. Active Expiration (the “Random Cleanup Loop”)
Internally, Redis keeps a separate dictionary called expires, which maps keys to their expiration times:
expires["foo"] = 1730665000000 // Unix time in ms
Only keys with a TTL are in this dictionary.
About 10 times per second, Redis runs a periodic cleanup task (activeExpireCycle) that does this:
- Sample 20 random keys from the expires dictionary.
- If a sampled key’s TTL has passed, delete it.
- If more than 25% of the sampled keys were expired, run another round, up to a CPU time limit.
Here’s a simplified version of the logic:
```
for each db in redis:
    repeat:
        expired_count = 0
        for i in 1..20:
            key = random_key(db.expires)
            if key.is_expired():
                delete(key)
                expired_count++
        if expired_count < 5:  // fewer than 25% of the sample
            break
```
That’s it. No global heap. No O(log n) operations. Just random sampling and adaptive cleanup.
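To make the loop concrete, here is a runnable Python sketch of the same adaptive cycle. The function name and the `delete` callback are my own; the real activeExpireCycle is C code and also enforces a per-cycle CPU-time budget, which is omitted here:

```python
import random

def active_expire_cycle(expires, delete, now_ms, samples=20, threshold=0.25):
    """Adaptive cleanup sketch: keep sampling while many keys are expired.

    expires: dict of key -> expiry time in ms (only TTL'd keys appear here)
    delete:  callback that removes a key from the main store
    """
    while expires:
        batch = random.sample(list(expires), min(samples, len(expires)))
        expired = 0
        for key in batch:
            if expires[key] <= now_ms:
                del expires[key]   # drop the TTL entry...
                delete(key)        # ...and the key itself
                expired += 1
        # Stop once fewer than 25% of the sampled keys were expired.
        if expired < len(batch) * threshold:
            break
```

Because the loop repeats only while at least a quarter of each sample is expired, its running time adapts to the backlog of expired keys rather than to the total number of keys.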
How Does Redis Pick “Random” Keys?
This is where the design really shines. Redis’ hash tables support O(1) random sampling: internally, random_key picks a random bucket in the hash table and returns the first key it finds there. That means Redis can fetch random keys without scanning the entire dictionary or maintaining a separate index. Do this 20 times and you get 20 random TTL’d keys, fast.
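The trick can be illustrated with a toy chained hash table. This is a sketch of the concept behind random sampling from buckets, not Redis’ actual code, and the class and method names are made up:

```python
import random

class ToyDict:
    """Toy chained hash table with cheap random key sampling."""

    def __init__(self, nbuckets=16):
        self.buckets = [[] for _ in range(nbuckets)]

    def put(self, key, value):
        chain = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))

    def random_key(self):
        # Probe random buckets until a non-empty one turns up, then pick an
        # entry from its chain. Assumes the table is non-empty. The result is
        # only approximately uniform (long chains are slightly under-sampled),
        # which is good enough for expiration sampling.
        while True:
            chain = random.choice(self.buckets)
            if chain:
                return random.choice(chain)[0]
```

No scan of the whole dictionary, no auxiliary index: each sample costs a couple of random probes.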
Why Randomness Works
At first glance, this sounds too hand-wavy. What if Redis just keeps picking unexpired keys?
Here’s the probabilistic magic:
If a large portion of keys are expired (say 30–40%), the chance of picking an expired key in 20 samples is very high. When that happens, Redis immediately runs another round. It keeps looping until fewer than 25% of sampled keys are expired.
Statistically, the system self-regulates:
Redis automatically cleans more aggressively when there’s a backlog of expired keys, and relaxes when things are tidy. It’s a feedback loop powered by probability.
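Under a simple independence assumption, the chance that a round triggers another one is a binomial tail, which you can compute directly. This model is mine, not from the Redis source:

```python
from math import comb

def p_another_round(p, n=20, k=5):
    """Probability that at least k of n sampled keys are expired (>= 25% of 20),
    assuming each sampled key is independently expired with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

With 40% of keys expired, the loop almost certainly runs again (probability around 0.95); at 10%, it almost certainly stops. That is exactly the self-regulating behavior described above.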
The Payoff
This design yields several powerful benefits:
| Aspect | Traditional Approach | Redis’ Approach |
|---|---|---|
| Write complexity | O(log n) | O(1) |
| Expiration precision | Exact | Approximate |
| Memory cleanup | Deterministic | Probabilistic but effective |
| Throughput | Moderate | Extremely high |
Tuning the Behavior
A few internal settings control this behavior:
| Config | Description | Default |
|---|---|---|
| `hz` | Frequency of background tasks | 10 (10 times/sec) |
| `ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP` | Keys sampled per round | 20 |
| `lazyfree-lazy-expire` | Whether deletion happens asynchronously | no |
These can be tuned for large deployments where millions of keys expire frequently, but defaults work beautifully for most workloads.
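Of these, `hz` and `lazyfree-lazy-expire` are regular redis.conf directives (the per-round sampling count is a compile-time constant in the Redis source). A sketch of what tuning might look like; the values shown are the defaults except for `lazyfree-lazy-expire`, which is flipped on as an example:

```
# redis.conf (sketch)
hz 10                      # run background tasks 10 times per second
lazyfree-lazy-expire yes   # free expired keys in a background thread
```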
A Lesson in Systems Design
Redis’ key expiration strategy is a case study in probabilistic system design: using randomness to balance performance and correctness.
Rather than maintaining a perfectly accurate expiration queue, Redis accepts approximate cleanup in exchange for blazing speed.
And because of statistical feedback, it still converges to the right state.
It’s a reminder that sometimes the smartest engineering move isn’t to be perfect; it’s to be good enough, fast.
Final Thought
Redis shows that randomness is not the enemy of determinism; when wielded carefully, it’s a tool for building fast, self-correcting systems.
So the next time you’re designing a cache or scheduler, ask yourself:
Do I really need a perfect global order… or can a bit of randomness do the job?
