Caching · Intermediate · Interactive · 10 min exploration

Thundering Herd Simulator

Understanding how cache expiration can overwhelm databases and how to prevent cascade failures

thundering-herd · cache-stampede · request-coalescing · performance · reliability

How this simulation works

Use the interactive controls below to adjust system parameters and observe how they affect performance metrics in real-time. The charts update instantly to show the impact of your changes, helping you understand system trade-offs and optimal configurations.

Simulation Controls

  • Concurrent Users: 200 users (number of simultaneous users making requests)
  • Cache TTL: 300 s (time-to-live for cached data)
  • Request Rate per User: 10 req/min (requests per minute from each user)
  • Database Capacity: 50 req/s (maximum concurrent requests the database can handle)
  • Cache Hit Rate: 95% (percentage of requests served from cache when healthy)
  • Mitigation Strategy (how to prevent thundering herd problems)
  • Database Query Time (time for the database to respond to each query)
  • Cache Latency: 5 ms (time for the cache to respond to requests)


Performance Metrics

Real-time performance metrics based on your configuration

  • Database Load (%): percentage of database capacity currently used
  • Average Response Latency (ms): average time to serve user requests
  • Error Rate (%): percentage of requests failing due to overload
  • System Throughput (req/s): successfully processed requests per second
  • Mitigation Effectiveness (%): how much the strategy reduces database load
  • User Experience Score (1-100): combined metric of latency and reliability

Configuration Summary

Current Settings

  • Concurrent Users: 200 users
  • Cache TTL: 300 s
  • Request Rate per User: 10 req/min
  • Database Capacity: 50 req/s


Optimization Tips

Experiment with different parameter combinations to understand the trade-offs. Notice how changing one parameter affects multiple metrics simultaneously.

Thundering Herd Simulator

This simulation demonstrates one of the most dangerous cache failure modes: the thundering herd. When a popular cache entry expires, hundreds of requests can simultaneously hit your database, causing cascade failures. Learn how to prevent this disaster in our "Caches Lie: Consistency Isn't Free" post.

The Anatomy of a Thundering Herd

What Triggers the Stampede?

```
T=0:   Cache entry for popular data expires
T=1:   Request #1 detects cache miss → hits database
T=2:   Request #2 detects cache miss → hits database
T=3:   Request #3 detects cache miss → hits database
...
T=50:  Request #200 detects cache miss → hits database
T=100: Database becomes overwhelmed, starts timing out
T=150: Timeouts cause more retries → even more load
T=200: Cascade failure: database goes down completely
```

The Perfect Storm Conditions

  1. Popular data (high request rate)
  2. Synchronized expiration (TTL-based caching)
  3. Slow database queries (complex computation)
  4. No coordination between requests
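
To make the failure mode concrete, here is a minimal sketch (not the simulator's actual code) of an unprotected cache: every coroutine that sees the miss goes straight to the database, so one expired key turns into 200 queries.

```python
import asyncio

DB_LATENCY = 0.1  # hypothetical: a 100 ms database query
cache: dict[str, str] = {}
db_calls = 0


async def fetch_from_database(key: str) -> str:
    global db_calls
    db_calls += 1
    await asyncio.sleep(DB_LATENCY)  # simulate the slow query
    return f"value-for-{key}"


async def naive_get(key: str) -> str:
    # No coordination: every cache miss goes straight to the database
    if key not in cache:
        cache[key] = await fetch_from_database(key)
    return cache[key]


async def main():
    # 200 users ask for the same key right after its cache entry expires
    await asyncio.gather(*(naive_get("hot-post") for _ in range(200)))
    print(f"database calls for one key: {db_calls}")  # prints 200, not 1


asyncio.run(main())
```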

Mitigation Strategies Deep Dive

🤝 Request Coalescing

Principle: Only allow one request per cache key to hit the database

```python
import asyncio
from typing import Any, Dict


class CoalescingCache:
    def __init__(self):
        self._cache: Dict[str, Any] = {}
        self._pending: Dict[str, asyncio.Future] = {}

    async def get(self, key: str) -> Any:
        # Check cache first
        if key in self._cache:
            return self._cache[key]

        # Check if already fetching this key
        if key in self._pending:
            # Wait for the ongoing request instead of creating a new one
            return await self._pending[key]

        # Start a new fetch and let later callers wait on it
        future = asyncio.create_task(self._fetch_from_database(key))
        self._pending[key] = future
        try:
            value = await future
            self._cache[key] = value
            return value
        finally:
            # Clean up the pending request
            del self._pending[key]

    async def _fetch_from_database(self, key: str) -> Any:
        # Placeholder (not in the original post): swap in your real query
        await asyncio.sleep(0.1)
        return f"value-for-{key}"
```

✅ Pros: Eliminates duplicate database calls
❌ Cons: Requests wait for a single slow request
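
A quick, hypothetical usage check, reusing the class and the stub `_fetch_from_database` above: 200 concurrent readers should trigger exactly one database call.

```python
async def demo():
    cache = CoalescingCache()
    results = await asyncio.gather(*(cache.get("hot-post") for _ in range(200)))
    # All 200 callers share the result of a single coalesced fetch
    assert all(r == results[0] for r in results)

asyncio.run(demo())
```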

🎲 Probabilistic Early Expiration

Principle: Randomly refresh cache before TTL expires

```python
import asyncio
import random
import time
from typing import Any


class ProbabilisticCache:
    def get(self, key: str) -> Any:
        entry = self._cache.get(key)
        if not entry:
            return self._fetch_and_cache(key)

        # Check whether we should refresh early
        age = time.time() - entry.created_at
        ttl = entry.ttl

        # Quadratic curve: refresh probability climbs toward 1 as expiry approaches
        refresh_probability = (age / ttl) ** 2
        if random.random() < refresh_probability:
            # Refresh in the background (assumes a running event loop)
            # and return the current value immediately
            asyncio.create_task(self._refresh_key(key))

        return entry.value
```

✅ Pros: Spreads refresh load over time
❌ Cons: Some unnecessary database calls
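
To see how the quadratic curve behaves, here is a small worked example using the simulation's 300-second TTL; the numbers follow directly from `(age / ttl) ** 2`:

```python
ttl = 300  # seconds, matching the simulation default
for age in (60, 150, 240, 290):
    p = (age / ttl) ** 2
    print(f"age={age:3d}s  refresh probability={p:.2f}")
# age= 60s  refresh probability=0.04
# age=150s  refresh probability=0.25
# age=240s  refresh probability=0.64
# age=290s  refresh probability=0.93
```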

🔄 Background Refresh

Principle: Proactively refresh popular cache entries

```python
from datetime import datetime, timedelta
from typing import Any

# BackgroundScheduler is provided by the APScheduler package
from apscheduler.schedulers.background import BackgroundScheduler


class BackgroundRefreshCache:
    def __init__(self):
        self._cache = {}
        self._scheduler = BackgroundScheduler()
        self._scheduler.start()

    def set(self, key: str, value: Any, ttl: int):
        # CacheEntry is your cache record type (value + TTL metadata)
        self._cache[key] = CacheEntry(value, ttl)

        # Schedule a refresh at 80% of the TTL, before the entry can expire
        refresh_time = ttl * 0.8
        self._scheduler.add_job(
            self._refresh_key,
            'date',
            run_date=datetime.now() + timedelta(seconds=refresh_time),
            args=[key],
        )

    def _refresh_key(self, key: str):
        if key in self._cache:
            new_value = self._fetch_from_database(key)
            self.set(key, new_value, self._default_ttl)
```

✅ Pros: Zero user-facing impact, prevents all stampedes
❌ Cons: Continuous background load, complexity
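
A hypothetical usage sketch (`load_top_products` is a stand-in for your own loader): a 300-second entry is refreshed at the 240-second mark, so readers never see a cold miss.

```python
cache = BackgroundRefreshCache()
cache.set("top-products", load_top_products(), ttl=300)
# A refresh job fires at t = 240 s (80% of the TTL), re-fetches the value,
# and re-schedules itself, so the entry stays warm for readers.
```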

⚡ Circuit Breaker Pattern

Principle: Fail fast when database is overloaded

```python
import time
from typing import Any


class CircuitBreakerCache:
    def __init__(self, failure_threshold=5, timeout=60):
        self._cache = {}
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = 'CLOSED'  # CLOSED, OPEN, HALF_OPEN

    def get(self, key: str) -> Any:
        if self.state == 'OPEN':
            if time.time() - self.last_failure_time < self.timeout:
                # Still cooling down: serve stale data or raise an error
                return self._get_stale_or_error(key)
            else:
                self.state = 'HALF_OPEN'

        try:
            if key not in self._cache:
                value = self._fetch_from_database(key)
                self._cache[key] = value

            # Reset failure count on success
            if self.state == 'HALF_OPEN':
                self.state = 'CLOSED'
                self.failure_count = 0

            return self._cache[key]

        except DatabaseTimeoutException:  # your database driver's timeout error
            self.failure_count += 1
            self.last_failure_time = time.time()

            if self.failure_count >= self.failure_threshold:
                self.state = 'OPEN'

            return self._get_stale_or_error(key)
```

✅ Pros: Protects database during outages
❌ Cons: May serve stale/error responses
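
A short walkthrough of the state machine, using a hypothetical low threshold so the transitions are easy to trace:

```python
breaker = CircuitBreakerCache(failure_threshold=2, timeout=30)
# 1. Two consecutive DatabaseTimeoutExceptions -> failure_count hits 2,
#    state flips to 'OPEN'.
# 2. For the next 30 s, get() never touches the database; it serves
#    stale data (or an error) via _get_stale_or_error().
# 3. After the cooldown, the first get() runs as a 'HALF_OPEN' trial.
# 4. Success closes the breaker and resets the count; another timeout
#    re-opens it for a fresh 30 s window.
```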

Performance Impact Analysis

Database Load Patterns

No Mitigation:

```
Normal Load: ████ 20%
Herd Event:  ████████████████████ 400% (OVERLOADED!)
```

Request Coalescing:

```
Normal Load: ████ 20%
Herd Event:  ████ 20% (same as normal)
```

Background Refresh:

```
Normal Load: █████ 25% (background jobs)
Herd Event:  █████ 25% (no spikes!)
```

Latency Characteristics

| Strategy | Normal | During Herd | Recovery Time |
|----------|--------|-------------|---------------|
| None | 5ms | 2000ms+ | 5+ minutes |
| Coalescing | 5ms | 150ms | 30 seconds |
| Early Expiry | 6ms | 50ms | Immediate |
| Background | 5ms | 5ms | N/A |
| Circuit Breaker | 5ms | 10ms* | 60 seconds |

*May serve stale data

Real-World Scenarios

Social Media Hot Posts

  • Problem: Viral post cache expires, millions hit the database
  • TTL: 5 minutes
  • Users: 10,000 concurrent
  • Strategy: Request coalescing + probabilistic early expiry
  • Result: 99.9% database load reduction

E-commerce Flash Sales

  • Problem: Product details cache expires during a sale
  • TTL: 1 minute (fast-changing inventory)
  • Users: 5,000 concurrent
  • Strategy: Background refresh + circuit breaker
  • Result: Zero customer-facing errors

Financial Market Data

  • Problem: Stock price cache expires during market open
  • TTL: 10 seconds (real-time requirements)
  • Users: 1,000 concurrent
  • Strategy: Probabilistic early expiry only
  • Result: 50% load reduction, acceptable staleness

Interactive Experiments

Experiment 1: Witness the Stampede

  1. Set 200 concurrent users with no mitigation
  2. Use 5-minute TTL and 100ms database latency
  3. Watch database load spike to 200%+ and error rates climb
  4. Observe how user experience score plummets

Experiment 2: Request Coalescing Magic

  1. Keep the same settings as Experiment 1
  2. Switch to request coalescing
  3. Watch database load drop to normal levels
  4. Notice slightly higher latency (requests wait for each other)

Experiment 3: Background Refresh Perfection

  1. Use background refresh strategy
  2. Increase concurrent users to 500
  3. Observe how database load stays steady
  4. See perfect user experience scores

Experiment 4: Circuit Breaker Protection

  1. Set 1000 concurrent users (extreme load)
  2. Use circuit breaker pattern
  3. Watch how it limits database load even when overwhelmed
  4. Note the trade-off: lower error rates but potential stale data

Experiment 5: TTL Impact

  1. Choose your favorite mitigation strategy
  2. Compare 1-minute vs 1-hour TTL
  3. Observe how longer TTLs reduce herd frequency
  4. Consider the staleness trade-off

Production Implementation Guide

Monitoring Dashboards

```python
import time

# Assumes a metrics client that provides Counter, Gauge, and Histogram
# (Prometheus-style), plus an alert() helper on this class.


class ThunderingHerdMetrics:
    def __init__(self):
        # Key metrics to track
        self.cache_miss_rate = Histogram('cache_miss_rate')
        self.database_connection_pool = Gauge('db_pool_usage')
        self.request_coalescing_hits = Counter('coalescing_hits')
        self.early_refresh_rate = Histogram('early_refresh_rate')

    def track_cache_miss_burst(self, key: str, miss_count: int):
        if miss_count > 10:  # potential thundering herd
            self.alert('thundering_herd_detected', {
                'key': key,
                'miss_count': miss_count,
                'timestamp': time.time(),
            })
```

Alerting Strategy

  • Database connection pool > 80% → Warning
  • Cache miss rate > 20% → Investigation needed
  • Average response time > 500ms → Alert
  • Error rate > 1% → Critical alert
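
As a sketch, the thresholds above can be expressed as a plain evaluation function (names here are illustrative, not tied to any particular monitoring stack):

```python
def evaluate_alerts(db_pool_usage: float, cache_miss_rate: float,
                    avg_response_ms: float, error_rate: float) -> list[tuple[str, str]]:
    """Map raw metrics to the alert levels listed above."""
    alerts = []
    if db_pool_usage > 0.80:
        alerts.append(("WARNING", "database connection pool above 80%"))
    if cache_miss_rate > 0.20:
        alerts.append(("INVESTIGATE", "cache miss rate above 20%"))
    if avg_response_ms > 500:
        alerts.append(("ALERT", "average response time above 500 ms"))
    if error_rate > 0.01:
        alerts.append(("CRITICAL", "error rate above 1%"))
    return alerts
```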

Configuration Guidelines

```yaml
# Production cache configuration
cache:
  default_ttl: 300   # 5 minutes
  max_ttl: 3600      # 1 hour

  # Thundering herd protection
  coalescing:
    enabled: true
    timeout: 5000    # 5 seconds max wait

  probabilistic_refresh:
    enabled: true
    beta: 1.0        # Refresh probability curve

  background_refresh:
    enabled: true
    refresh_ahead_ratio: 0.8   # Refresh at 80% of TTL

  circuit_breaker:
    failure_threshold: 5
    timeout: 60
    half_open_max_calls: 3
```

The Bottom Line

Thundering herds are preventable disasters. The simulation shows that:

  1. Without mitigation: System becomes unusable during peak loads
  2. With basic coalescing: 90%+ improvement in stability
  3. With background refresh: Near-perfect performance
  4. Circuit breakers: Essential for graceful degradation

Key insight: The best strategy often combines multiple techniques. For example:

  • Request coalescing for immediate protection
  • Probabilistic early expiry for load spreading
  • Circuit breakers for worst-case scenarios
  • Background refresh for critical, high-traffic data
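
As one concrete illustration (a sketch of my own, not code from the post), coalescing and a circuit breaker compose naturally: the coalescer deduplicates concurrent misses, while the breaker guards the single fetch path behind it.

```python
import asyncio
import time


class LayeredCache:
    """Coalescing for duplicate misses + a breaker for database failures."""

    def __init__(self, failure_threshold: int = 5, cooldown: int = 60):
        self._cache, self._pending = {}, {}
        self._failures, self._opened_at = 0, None
        self._threshold, self._cooldown = failure_threshold, cooldown

    async def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Circuit breaker: fail fast while the database is cooling down
        if self._opened_at and time.time() - self._opened_at < self._cooldown:
            raise RuntimeError("circuit open: database shielded")  # or serve stale
        # Request coalescing: piggyback on any in-flight fetch for this key
        if key in self._pending:
            return await self._pending[key]
        task = asyncio.create_task(self._fetch(key))
        self._pending[key] = task
        try:
            value = await task
            self._cache[key] = value
            self._failures, self._opened_at = 0, None  # success closes breaker
            return value
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = time.time()  # trip the breaker
            raise
        finally:
            del self._pending[key]

    async def _fetch(self, key):
        raise NotImplementedError  # replace with your real database call
```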

Choose your strategy based on your consistency requirements, load patterns, and failure tolerance.

Published by Anirudh Sharma


Explore More Interactive Content

Ready to dive deeper? Check out our system blueprints for implementation guides or explore more simulations.