Caching · Intermediate · Interactive · 10 min exploration

Thundering Herd Simulator

Understanding how cache expiration can overwhelm databases and how to prevent cascade failures

thundering-herd · cache-stampede · request-coalescing · performance · reliability

How this simulation works

Use the interactive controls below to adjust system parameters and observe how they affect performance metrics in real-time. The charts update instantly to show the impact of your changes, helping you understand system trade-offs and optimal configurations.

Simulation Controls

  • Concurrent Users: 200 users (number of simultaneous users making requests)
  • Cache TTL: 300 s (time-to-live for cached data)
  • Request Rate per User: 10 req/min (requests per minute from each user)
  • Database Capacity: 50 req/s (maximum concurrent requests the database can handle)
  • Cache Hit Rate: 95% (percentage of requests served from cache when healthy)
  • Mitigation Strategy (how to prevent thundering herd problems)
  • Database Query Time (time for the database to respond to each query)
  • Cache Latency: 5 ms (time for the cache to respond to requests)


Performance Metrics

Real-time performance metrics based on your configuration

  • Database Load (%): percentage of database capacity currently used
  • Average Response Latency (ms): average time to serve user requests
  • Error Rate (%): percentage of requests failing due to overload
  • System Throughput (req/s): successfully processed requests per second
  • Mitigation Effectiveness (%): how much the strategy reduces database load
  • User Experience Score (1-100): combined metric of latency and reliability

Configuration Summary

Current Settings

  • Concurrent Users: 200 users
  • Cache TTL: 300 s
  • Request Rate per User: 10 req/min
  • Database Capacity: 50 req/s


Optimization Tips

Experiment with different parameter combinations to understand the trade-offs. Notice how changing one parameter affects multiple metrics simultaneously.

Thundering Herd Simulator

This simulation demonstrates one of the most dangerous cache failure modes: the thundering herd. When a popular cache entry expires, hundreds of requests can simultaneously hit your database, causing cascade failures. Learn how to prevent this disaster in our "Caches Lie: Consistency Isn't Free" post.

The Anatomy of a Thundering Herd

What Triggers the Stampede?

```
T=0:   Cache entry for popular data expires
T=1:   Request #1 detects cache miss → hits database
T=2:   Request #2 detects cache miss → hits database
T=3:   Request #3 detects cache miss → hits database
...
T=50:  Request #200 detects cache miss → hits database
T=100: Database becomes overwhelmed, starts timing out
T=150: Timeouts cause more retries → even more load
T=200: Cascade failure: database goes down completely
```

The Perfect Storm Conditions

  1. Popular data (high request rate)
  2. Synchronized expiration (TTL-based caching)
  3. Slow database queries (complex computation)
  4. No coordination between requests
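
To make the failure mode concrete, here is a minimal sketch (not the simulator's actual code) of an unprotected cache: every coroutine that sees the miss goes straight to the database, so one expired key turns into 200 queries.

```python
import asyncio

DB_LATENCY = 0.1  # hypothetical: a 100 ms database query
cache: dict[str, str] = {}
db_calls = 0


async def fetch_from_database(key: str) -> str:
    global db_calls
    db_calls += 1
    await asyncio.sleep(DB_LATENCY)  # simulate the slow query
    return f"value-for-{key}"


async def naive_get(key: str) -> str:
    # No coordination: every cache miss goes straight to the database
    if key not in cache:
        cache[key] = await fetch_from_database(key)
    return cache[key]


async def main():
    # 200 users ask for the same key right after its cache entry expires
    await asyncio.gather(*(naive_get("hot-post") for _ in range(200)))
    print(f"database calls for one key: {db_calls}")  # prints 200, not 1


asyncio.run(main())
```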

Mitigation Strategies Deep Dive

🤝 Request Coalescing

Principle: Only allow one request per cache key to hit the database

```python
import asyncio
from typing import Any, Dict


class CoalescingCache:
    def __init__(self):
        self._cache: Dict[str, Any] = {}
        self._pending: Dict[str, asyncio.Future] = {}

    async def get(self, key: str) -> Any:
        # Check cache first
        if key in self._cache:
            return self._cache[key]

        # Check if already fetching this key
        if key in self._pending:
            # Wait for the ongoing request instead of creating a new one
            return await self._pending[key]

        # Start a new fetch and let later callers wait on it
        future = asyncio.create_task(self._fetch_from_database(key))
        self._pending[key] = future
        try:
            value = await future
            self._cache[key] = value
            return value
        finally:
            # Clean up the pending request
            del self._pending[key]

    async def _fetch_from_database(self, key: str) -> Any:
        # Placeholder (not in the original post): swap in your real query
        await asyncio.sleep(0.1)
        return f"value-for-{key}"
```

✅ Pros: Eliminates duplicate database calls
❌ Cons: Requests wait for a single slow request
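
A quick, hypothetical usage check, reusing the class and the stub `_fetch_from_database` above: 200 concurrent readers should trigger exactly one database call.

```python
async def demo():
    cache = CoalescingCache()
    results = await asyncio.gather(*(cache.get("hot-post") for _ in range(200)))
    # All 200 callers share the result of a single coalesced fetch
    assert all(r == results[0] for r in results)

asyncio.run(demo())
```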

🎲 Probabilistic Early Expiration

Principle: Randomly refresh cache before TTL expires

```python
import asyncio
import random
import time
from typing import Any


class ProbabilisticCache:
    def get(self, key: str) -> Any:
        entry = self._cache.get(key)
        if not entry:
            return self._fetch_and_cache(key)

        # Check whether we should refresh early
        age = time.time() - entry.created_at
        ttl = entry.ttl

        # Quadratic curve: refresh probability climbs toward 1 as expiry approaches
        refresh_probability = (age / ttl) ** 2
        if random.random() < refresh_probability:
            # Refresh in the background (assumes a running event loop)
            # and return the current value immediately
            asyncio.create_task(self._refresh_key(key))

        return entry.value
```

✅ Pros: Spreads refresh load over time
❌ Cons: Some unnecessary database calls
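
To see how the quadratic curve behaves, here is a small worked example using the simulation's 300-second TTL; the numbers follow directly from `(age / ttl) ** 2`:

```python
ttl = 300  # seconds, matching the simulation default
for age in (60, 150, 240, 290):
    p = (age / ttl) ** 2
    print(f"age={age:3d}s  refresh probability={p:.2f}")
# age= 60s  refresh probability=0.04
# age=150s  refresh probability=0.25
# age=240s  refresh probability=0.64
# age=290s  refresh probability=0.93
```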

🔄 Background Refresh

Principle: Proactively refresh popular cache entries

```python
from datetime import datetime, timedelta
from typing import Any

# BackgroundScheduler is provided by the APScheduler package
from apscheduler.schedulers.background import BackgroundScheduler


class BackgroundRefreshCache:
    def __init__(self):
        self._cache = {}
        self._scheduler = BackgroundScheduler()
        self._scheduler.start()

    def set(self, key: str, value: Any, ttl: int):
        # CacheEntry is your cache record type (value + TTL metadata)
        self._cache[key] = CacheEntry(value, ttl)

        # Schedule a refresh at 80% of the TTL, before the entry can expire
        refresh_time = ttl * 0.8
        self._scheduler.add_job(
            self._refresh_key,
            'date',
            run_date=datetime.now() + timedelta(seconds=refresh_time),
            args=[key],
        )

    def _refresh_key(self, key: str):
        if key in self._cache:
            new_value = self._fetch_from_database(key)
            self.set(key, new_value, self._default_ttl)
```

✅ Pros: Zero user-facing impact, prevents all stampedes
❌ Cons: Continuous background load, complexity
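
A hypothetical usage sketch (`load_top_products` is a stand-in for your own loader): a 300-second entry is refreshed at the 240-second mark, so readers never see a cold miss.

```python
cache = BackgroundRefreshCache()
cache.set("top-products", load_top_products(), ttl=300)
# A refresh job fires at t = 240 s (80% of the TTL), re-fetches the value,
# and re-schedules itself, so the entry stays warm for readers.
```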

⚡ Circuit Breaker Pattern

Principle: Fail fast when database is overloaded

```python
import time
from typing import Any


class CircuitBreakerCache:
    def __init__(self, failure_threshold=5, timeout=60):
        self._cache = {}
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = 'CLOSED'  # CLOSED, OPEN, HALF_OPEN

    def get(self, key: str) -> Any:
        if self.state == 'OPEN':
            if time.time() - self.last_failure_time < self.timeout:
                # Still cooling down: serve stale data or raise an error
                return self._get_stale_or_error(key)
            else:
                self.state = 'HALF_OPEN'

        try:
            if key not in self._cache:
                value = self._fetch_from_database(key)
                self._cache[key] = value

            # Reset failure count on success
            if self.state == 'HALF_OPEN':
                self.state = 'CLOSED'
                self.failure_count = 0

            return self._cache[key]

        except DatabaseTimeoutException:  # your database driver's timeout error
            self.failure_count += 1
            self.last_failure_time = time.time()

            if self.failure_count >= self.failure_threshold:
                self.state = 'OPEN'

            return self._get_stale_or_error(key)
```

✅ Pros: Protects database during outages
❌ Cons: May serve stale/error responses
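
A short walkthrough of the state machine, using a hypothetical low threshold so the transitions are easy to trace:

```python
breaker = CircuitBreakerCache(failure_threshold=2, timeout=30)
# 1. Two consecutive DatabaseTimeoutExceptions -> failure_count hits 2,
#    state flips to 'OPEN'.
# 2. For the next 30 s, get() never touches the database; it serves
#    stale data (or an error) via _get_stale_or_error().
# 3. After the cooldown, the first get() runs as a 'HALF_OPEN' trial.
# 4. Success closes the breaker and resets the count; another timeout
#    re-opens it for a fresh 30 s window.
```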

Performance Impact Analysis

Database Load Patterns

No Mitigation:

```
Normal Load: ████ 20%
Herd Event:  ████████████████████ 400% (OVERLOADED!)
```

Request Coalescing:

```
Normal Load: ████ 20%
Herd Event:  ████ 20% (same as normal)
```

Background Refresh:

```
Normal Load: █████ 25% (background jobs)
Herd Event:  █████ 25% (no spikes!)
```

Latency Characteristics

| Strategy | Normal | During Herd | Recovery Time |
|----------|--------|-------------|---------------|
| None | 5ms | 2000ms+ | 5+ minutes |
| Coalescing | 5ms | 150ms | 30 seconds |
| Early Expiry | 6ms | 50ms | Immediate |
| Background | 5ms | 5ms | N/A |
| Circuit Breaker | 5ms | 10ms* | 60 seconds |

*May serve stale data

Real-World Scenarios

Social Media Hot Posts

  • Problem: Viral post cache expires, millions hit the database
  • TTL: 5 minutes
  • Users: 10,000 concurrent
  • Strategy: Request coalescing + probabilistic early expiry
  • Result: 99.9% database load reduction

E-commerce Flash Sales

  • Problem: Product details cache expires during a sale
  • TTL: 1 minute (fast-changing inventory)
  • Users: 5,000 concurrent
  • Strategy: Background refresh + circuit breaker
  • Result: Zero customer-facing errors

Financial Market Data

  • Problem: Stock price cache expires during market open
  • TTL: 10 seconds (real-time requirements)
  • Users: 1,000 concurrent
  • Strategy: Probabilistic early expiry only
  • Result: 50% load reduction, acceptable staleness

Interactive Experiments

Experiment 1: Witness the Stampede

  1. Set 200 concurrent users with no mitigation
  2. Use 5-minute TTL and 100ms database latency
  3. Watch database load spike to 200%+ and error rates climb
  4. Observe how user experience score plummets

Experiment 2: Request Coalescing Magic

  1. Keep the same settings as Experiment 1
  2. Switch to request coalescing
  3. Watch database load drop to normal levels
  4. Notice slightly higher latency (requests wait for each other)

Experiment 3: Background Refresh Perfection

  1. Use background refresh strategy
  2. Increase concurrent users to 500
  3. Observe how database load stays steady
  4. See perfect user experience scores

Experiment 4: Circuit Breaker Protection

  1. Set 1000 concurrent users (extreme load)
  2. Use circuit breaker pattern
  3. Watch how it limits database load even when overwhelmed
  4. Note the trade-off: lower error rates but potential stale data

Experiment 5: TTL Impact

  1. Choose your favorite mitigation strategy
  2. Compare 1-minute vs 1-hour TTL
  3. Observe how longer TTLs reduce herd frequency
  4. Consider the staleness trade-off

Production Implementation Guide

Monitoring Dashboards

```python
import time

# Assumes a metrics client that provides Counter, Gauge, and Histogram
# (Prometheus-style), plus an alert() helper on this class.


class ThunderingHerdMetrics:
    def __init__(self):
        # Key metrics to track
        self.cache_miss_rate = Histogram('cache_miss_rate')
        self.database_connection_pool = Gauge('db_pool_usage')
        self.request_coalescing_hits = Counter('coalescing_hits')
        self.early_refresh_rate = Histogram('early_refresh_rate')

    def track_cache_miss_burst(self, key: str, miss_count: int):
        if miss_count > 10:  # potential thundering herd
            self.alert('thundering_herd_detected', {
                'key': key,
                'miss_count': miss_count,
                'timestamp': time.time(),
            })
```

Alerting Strategy

  • Database connection pool > 80% → Warning
  • Cache miss rate > 20% → Investigation needed
  • Average response time > 500ms → Alert
  • Error rate > 1% → Critical alert
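
As a sketch, the thresholds above can be expressed as a plain evaluation function (names here are illustrative, not tied to any particular monitoring stack):

```python
def evaluate_alerts(db_pool_usage: float, cache_miss_rate: float,
                    avg_response_ms: float, error_rate: float) -> list[tuple[str, str]]:
    """Map raw metrics to the alert levels listed above."""
    alerts = []
    if db_pool_usage > 0.80:
        alerts.append(("WARNING", "database connection pool above 80%"))
    if cache_miss_rate > 0.20:
        alerts.append(("INVESTIGATE", "cache miss rate above 20%"))
    if avg_response_ms > 500:
        alerts.append(("ALERT", "average response time above 500 ms"))
    if error_rate > 0.01:
        alerts.append(("CRITICAL", "error rate above 1%"))
    return alerts
```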

Configuration Guidelines

```yaml
# Production cache configuration
cache:
  default_ttl: 300   # 5 minutes
  max_ttl: 3600      # 1 hour

  # Thundering herd protection
  coalescing:
    enabled: true
    timeout: 5000    # 5 seconds max wait

  probabilistic_refresh:
    enabled: true
    beta: 1.0        # Refresh probability curve

  background_refresh:
    enabled: true
    refresh_ahead_ratio: 0.8   # Refresh at 80% of TTL

  circuit_breaker:
    failure_threshold: 5
    timeout: 60
    half_open_max_calls: 3
```

The Bottom Line

Thundering herds are preventable disasters. The simulation shows that:

  1. Without mitigation: System becomes unusable during peak loads
  2. With basic coalescing: 90%+ improvement in stability
  3. With background refresh: Near-perfect performance
  4. Circuit breakers: Essential for graceful degradation

Key insight: The best strategy often combines multiple techniques. For example:

  • Request coalescing for immediate protection
  • Probabilistic early expiry for load spreading
  • Circuit breakers for worst-case scenarios
  • Background refresh for critical, high-traffic data
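
As one concrete illustration (a sketch of my own, not code from the post), coalescing and a circuit breaker compose naturally: the coalescer deduplicates concurrent misses, while the breaker guards the single fetch path behind it.

```python
import asyncio
import time


class LayeredCache:
    """Coalescing for duplicate misses + a breaker for database failures."""

    def __init__(self, failure_threshold: int = 5, cooldown: int = 60):
        self._cache, self._pending = {}, {}
        self._failures, self._opened_at = 0, None
        self._threshold, self._cooldown = failure_threshold, cooldown

    async def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Circuit breaker: fail fast while the database is cooling down
        if self._opened_at and time.time() - self._opened_at < self._cooldown:
            raise RuntimeError("circuit open: database shielded")  # or serve stale
        # Request coalescing: piggyback on any in-flight fetch for this key
        if key in self._pending:
            return await self._pending[key]
        task = asyncio.create_task(self._fetch(key))
        self._pending[key] = task
        try:
            value = await task
            self._cache[key] = value
            self._failures, self._opened_at = 0, None  # success closes breaker
            return value
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = time.time()  # trip the breaker
            raise
        finally:
            del self._pending[key]

    async def _fetch(self, key):
        raise NotImplementedError  # replace with your real database call
```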

Choose your strategy based on your consistency requirements, load patterns, and failure tolerance.

Published by Anirudh Sharma


Explore More Interactive Content

Ready to dive deeper? Check out our system blueprints for implementation guides or explore more simulations.