Multi-tier Cache Hierarchy
Understanding how cache hierarchies balance speed, capacity, and cost in modern distributed systems
How this simulation works
Use the interactive controls below to adjust system parameters and observe how they affect performance metrics in real time. The charts update instantly to show the impact of your changes, helping you understand the trade-offs and find well-balanced configurations.
Simulation Controls
- L1 Cache Size: application-level cache (fastest, smallest)
- L2 Cache Size: local Redis cache (fast, medium size)
- L3 Cache Size: distributed cache cluster (slower, largest)
- Access Pattern: how requests are distributed across the data
- Request Rate: total requests per second across all tiers
- Dataset Size: total number of unique items that could be requested
- Promotion Strategy: how data moves up the cache hierarchy
- Eviction Policy: how items are removed when a cache is full
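For reference, those knobs boil down to a handful of parameters. A minimal sketch of how they might be grouped; the field names and defaults here are illustrative, not the simulator's internals:

```python
from dataclasses import dataclass

@dataclass
class SimulationConfig:
    # Tier capacities, in items
    l1_size: int = 1_000
    l2_size: int = 50_000
    l3_size: int = 1_000_000
    # Workload shape
    access_pattern: str = "zipfian"   # "uniform", "zipfian", "temporal", "working_set"
    request_rate: int = 1_000         # requests per second across all tiers
    dataset_size: int = 10_000_000    # unique items that can be requested
    # Behavior
    promotion_strategy: str = "lazy"  # "none", "lazy", "eager", "adaptive"
    eviction_policy: str = "lru"      # "lru", "lfu", "w-tinylfu"
```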
Performance Metrics
Real-time performance metrics based on your configuration
- Overall Hit Rate: percentage of requests served from any cache tier
- Average Latency: weighted average response time across all tiers
- L1 Hit Rate: percentage of requests served from the L1 cache
- L2 Hit Rate: percentage of L1 misses served from the L2 cache
- L3 Hit Rate: percentage of L2 misses served from the L3 cache
- Memory Efficiency: how well cache space is utilized across tiers
- Network Traffic: data transferred between cache tiers (MB/s)
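The two headline numbers follow directly from the per-tier hit rates. A back-of-the-envelope sketch, assuming the tier latencies quoted later (~1ms/5ms/20ms), a hypothetical 100ms origin penalty, and counting only the latency of the tier that finally serves each request:

```python
def summarize(l1_hit, l2_hit, l3_hit,
              tier_latency_ms=(1.0, 5.0, 20.0), origin_latency_ms=100.0):
    """l2_hit and l3_hit are conditional: the share of upstream misses that tier absorbs."""
    # Fraction of all requests served by each tier
    p1 = l1_hit
    p2 = (1 - l1_hit) * l2_hit
    p3 = (1 - l1_hit) * (1 - l2_hit) * l3_hit
    p_origin = 1 - (p1 + p2 + p3)

    overall_hit_rate = p1 + p2 + p3
    avg_latency_ms = (p1 * tier_latency_ms[0] + p2 * tier_latency_ms[1]
                      + p3 * tier_latency_ms[2] + p_origin * origin_latency_ms)
    return overall_hit_rate, avg_latency_ms

# Example: L1 serves 40%, L2 catches 60% of L1 misses, L3 catches 50% of the rest
print(summarize(0.40, 0.60, 0.50))  # -> 0.88 overall hit rate, ~16.6ms average latency
```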
Optimization Tips
Experiment with different parameter combinations to understand the trade-offs. Notice how changing one parameter affects multiple metrics simultaneously.
Multi-tier Cache Hierarchy
This simulation models the layered caching approach used by major tech companies to achieve both ultra-low latency and massive scale. Explore how L1/L2/L3 cache tiers work together, based on concepts from our Caches Lie: Consistency Isn't Free post.
The Memory Hierarchy Pyramid
🚀 L1 Cache (Application Level)
Fastest • Smallest • Most Expensive
```python
# In-process cache using an OrderedDict for LRU behavior
from collections import OrderedDict

class L1Cache:
    def __init__(self, size=1000):
        self._cache = OrderedDict()
        self._max_size = size

    def get(self, key):
        if key in self._cache:
            # Move to end (LRU behavior)
            self._cache.move_to_end(key)
            return self._cache[key]
        return None

    def set(self, key, value):
        if key in self._cache:
            self._cache.move_to_end(key)
        elif len(self._cache) >= self._max_size:
            self._cache.popitem(last=False)  # Remove oldest
        self._cache[key] = value
```
Characteristics:
- Latency: ~1ms (memory access)
- Capacity: 100-100K items
- Location: Same process as application
- Volatility: Lost on restart
⚡ L2 Cache (Local Network)
Fast • Medium • Shared
```python
# Redis or Memcached on the same server/rack
class L2Cache:
    def __init__(self, redis_client):
        self.redis = redis_client

    async def get(self, key):
        value = await self.redis.get(key)
        if value:
            # Promote to L1 cache
            l1_cache.set(key, value)
        return value

    async def set(self, key, value, ttl=300):
        await self.redis.setex(key, ttl, value)
```
Characteristics:
- Latency: ~5ms (local network)
- Capacity: 10K-1M items
- Location: Same rack/data center
- Durability: Configurable persistence
🌐 L3 Cache (Distributed)
Slower • Largest • Global
```python
# Distributed cache cluster (Redis Cluster, Hazelcast)
class L3Cache:
    def __init__(self, cluster_client):
        self.cluster = cluster_client

    async def get(self, key):
        value = await self.cluster.get(key)
        if value:
            # Promote to higher tiers
            await l2_cache.set(key, value)
            l1_cache.set(key, value)
        return value

    async def set(self, key, value, ttl=3600):
        await self.cluster.setex(key, ttl, value)
```
Characteristics:
- Latency: ~20ms (cross-region possible)
- Capacity: 1M-100M items
- Location: Distributed across regions
- Availability: High availability via replication
Data Movement Strategies
🦥 Lazy Promotion (Demand-Driven)
Most common approach - move data up on access
```python
async def get_with_lazy_promotion(key):
    # Try L1 first
    value = l1_cache.get(key)
    if value:
        return value

    # Try L2
    value = await l2_cache.get(key)
    if value:
        l1_cache.set(key, value)  # Promote to L1
        return value

    # Try L3
    value = await l3_cache.get(key)
    if value:
        l1_cache.set(key, value)        # Promote to L1
        await l2_cache.set(key, value)  # Promote to L2
        return value

    # Database fallback
    value = await database.get(key)
    if value:
        # Populate all tiers
        l1_cache.set(key, value)
        await l2_cache.set(key, value)
        await l3_cache.set(key, value)
    return value
```
✅ Pros: Simple, no wasted promotions
❌ Cons: First access always pays full latency
🚀 Eager Promotion (Proactive)
Push popular data up the hierarchy
```python
from collections import defaultdict

class EagerPromotion:
    def __init__(self):
        self.access_counter = defaultdict(int)
        self.promotion_threshold = 5

    async def track_access(self, key, tier):
        self.access_counter[key] += 1
        if self.access_counter[key] >= self.promotion_threshold:
            await self.promote_to_higher_tiers(key, tier)

    async def promote_to_higher_tiers(self, key, current_tier):
        value = await self.get_from_tier(key, current_tier)
        if current_tier == 'L3':
            await l2_cache.set(key, value)
            l1_cache.set(key, value)
        elif current_tier == 'L2':
            l1_cache.set(key, value)
```
✅ Pros: Future accesses faster, good for hot data
❌ Cons: Network overhead, may promote cold data
🧠 Adaptive Promotion (ML-Based)
Machine learning predicts what to promote
```python
class AdaptivePromotion:
    def __init__(self):
        self.ml_model = AccessPatternPredictor()
        self.features = FeatureExtractor()

    async def should_promote(self, key, access_pattern):
        features = self.features.extract(key, access_pattern)
        promotion_probability = self.ml_model.predict(features)
        return promotion_probability > 0.7  # 70% threshold

    async def handle_access(self, key, tier):
        pattern = self.get_access_pattern(key)
        if await self.should_promote(key, pattern):
            await self.promote_eagerly(key, tier)
        else:
            await self.promote_lazily(key, tier)
```
✅ Pros: Near-optimal promotion decisions, adapts to workload
❌ Cons: Complex implementation, ML infrastructure needed
Cache Sizing Strategy
The 10x Rule of Thumb
Each tier should be roughly 10x larger than the one above:
```
Typical Sizing:
  L1: 1,000 items   (1KB each   = 1MB RAM)
  L2: 10,000 items  (10KB each  = 100MB RAM)
  L3: 100,000 items (100KB each = 10GB RAM)

Large Scale:
  L1: 100,000 items    (10MB RAM)
  L2: 1,000,000 items  (1GB RAM)
  L3: 10,000,000 items (100GB RAM cluster)
```
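As a quick sanity check, the rule is trivial to encode; a throwaway helper that reproduces the item counts above from an L1 budget:

```python
def size_tiers(l1_items, growth_factor=10):
    """Apply the 10x rule: each tier holds roughly 10x more items than the tier above it."""
    l2_items = l1_items * growth_factor
    l3_items = l2_items * growth_factor
    return l1_items, l2_items, l3_items

print(size_tiers(1_000))    # (1000, 10000, 100000)       -> "Typical Sizing"
print(size_tiers(100_000))  # (100000, 1000000, 10000000) -> "Large Scale"
```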
Working Set Analysis
```python
class WorkingSetAnalyzer:
    def analyze_access_pattern(self, access_log):
        # Unique items accessed in different time windows
        windows = [60, 300, 900, 3600]  # 1min, 5min, 15min, 1hour
        working_sets = {}

        for window in windows:
            unique_keys = self.get_unique_keys_in_window(access_log, window)
            working_sets[window] = len(unique_keys)

        return {
            'l1_size_recommendation': working_sets[60],    # 1-minute working set
            'l2_size_recommendation': working_sets[900],   # 15-minute working set
            'l3_size_recommendation': working_sets[3600],  # 1-hour working set
        }
```
Eviction Policies Compared
LRU vs LFU vs ARC vs W-TinyLFU
| Policy | Best For | Memory Overhead | Performance | Complexity |
|--------|----------|-----------------|-------------|------------|
| LRU | Temporal locality | Low | Good | Simple |
| LFU | Frequency-based access | Medium | Good for Zipfian | Medium |
| ARC | Mixed workloads | High | Excellent | Complex |
| W-TinyLFU | Large caches | Low | Excellent | Medium |
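The trade-offs in the table are easier to see next to concrete bookkeeping. An LRU sketch appears in the L1 example above; for contrast, here is a deliberately minimal LFU cache (evict the least frequently used key), a sketch rather than a production-ready implementation:

```python
from collections import Counter

class LFUCache:
    def __init__(self, max_size):
        self._max_size = max_size
        self._values = {}
        self._counts = Counter()  # Extra memory overhead: one counter per key

    def get(self, key):
        if key in self._values:
            self._counts[key] += 1  # Frequency updated on every hit
            return self._values[key]
        return None

    def set(self, key, value):
        if key not in self._values and len(self._values) >= self._max_size:
            # Evict the least frequently used key (O(n) scan; real LFU uses a smarter structure)
            victim, _ = min(self._counts.items(), key=lambda kv: kv[1])
            del self._values[victim]
            del self._counts[victim]
        self._values[key] = value
        self._counts[key] += 1
```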
W-TinyLFU Implementation
```python
class WindowTinyLFU:
    def __init__(self, window_size, main_size):
        self.window_cache = LRUCache(window_size)
        self.main_cache = SLRUCache(main_size)    # Segmented LRU
        self.frequency_sketch = CountMinSketch()  # Approximate access counts
        self.doorkeeper = BloomFilter()           # Admission filter for one-hit wonders

    def get(self, key):
        # Try window cache first
        if key in self.window_cache:
            return self.window_cache.get(key)

        # Try main cache
        value = self.main_cache.get(key)
        if value:
            self.frequency_sketch.increment(key)
        return value

    def put(self, key, value):
        # New items go to window cache
        if key not in self.window_cache and key not in self.main_cache:
            evicted = self.window_cache.put(key, value)
            if evicted:
                # Window cache is full, consider promotion to main cache
                self.consider_promotion(evicted, key, value)
```
Real-World Architecture Examples
Netflix Cache Architecture
```
Content Delivery:
  L1: Application JVM heap (popular titles)
  L2: Local SSD cache (regional content)
  L3: Regional cache clusters (full catalog)
  Fallback: Origin servers + CDN

Hit Rates:
  L1: 40% (1ms latency)
  L2: 35% (5ms latency)
  L3: 20% (50ms latency)
  Origin: 5% (200ms latency)
```
Facebook Social Graph
```
Friend Connections:
  L1: TAO leaf cache (active users)
  L2: TAO aggregator cache (regional)
  L3: MySQL read replicas
  Fallback: Master database

Data Flow:
  - Writes go to the master, invalidate all tiers
  - Reads try each tier in sequence
  - Popular connections cached at all levels
```
Amazon Product Catalog
```
Product Information:
  L1: Application-level (best sellers)
  L2: ElastiCache clusters (category data)
  L3: DynamoDB DAX (full catalog)
  Fallback: DynamoDB tables

Optimization:
  - ML predicts which products to pre-cache
  - Seasonal adjustments for cache sizes
  - Geographic distribution of L3 caches
```
Interactive Experiments
Experiment 1: Cache Size Impact
- Start with small L1 cache (100 items)
- Use Zipfian access pattern (80/20 rule)
- Gradually increase L1 to 10K items
- Watch L1 hit rate improvements vs. memory cost
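To try this outside the simulator, a rough standalone sketch that replays a rank-weighted Zipfian workload (roughly the 80/20 shape) against a plain LRU cache and sweeps the L1 size:

```python
import random
from collections import OrderedDict

def zipfian_keys(n_items, n_requests, s=1.0, seed=42):
    """Rank-weighted Zipfian workload: the item at rank k is drawn with weight 1/k^s."""
    rng = random.Random(seed)
    weights = [1 / (rank ** s) for rank in range(1, n_items + 1)]
    return rng.choices(range(n_items), weights=weights, k=n_requests)

def l1_hit_rate(cache_size, requests):
    """Replay the workload against an LRU cache and report the hit rate."""
    cache, hits = OrderedDict(), 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)
    return hits / len(requests)

requests = zipfian_keys(n_items=100_000, n_requests=50_000)
for size in (100, 1_000, 10_000):
    print(f"L1 size {size}: hit rate {l1_hit_rate(size, requests):.2f}")
```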
Experiment 2: Access Pattern Sensitivity
- Set moderate cache sizes (1K/50K/1M)
- Switch between access patterns:
- Uniform → Random access, low hit rates
- Zipfian → High hit rates even with small caches
- Temporal → Excellent L1 performance
- Working Set → Predictable, stable performance
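For contrast with the Zipfian generator above, here are equally rough sketches of two of the other patterns; a working-set pattern can be approximated by drawing uniformly from a small subset that rotates over time:

```python
import random

def uniform_keys(n_items, n_requests, seed=42):
    # Every item equally likely: hit rate roughly tracks cache_size / n_items
    rng = random.Random(seed)
    return [rng.randrange(n_items) for _ in range(n_requests)]

def temporal_keys(n_items, n_requests, reuse_prob=0.8, window=256, seed=42):
    # With probability reuse_prob, re-request something seen recently
    rng = random.Random(seed)
    recent, out = [], []
    for _ in range(n_requests):
        if recent and rng.random() < reuse_prob:
            key = rng.choice(recent)
        else:
            key = rng.randrange(n_items)
        out.append(key)
        recent.append(key)
        if len(recent) > window:
            recent.pop(0)
    return out
```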
Experiment 3: Promotion Strategy Comparison
- Use high request rate (5000 req/s)
- Compare promotion strategies:
- None → Static tiers, poor performance
- Lazy → Good baseline
- Eager → Better hit rates, more network traffic
- Adaptive → Best overall performance
Experiment 4: Eviction Policy Impact
- Set medium cache sizes to see eviction effects
- Use mixed workload pattern
- Compare eviction policies:
- LRU → Simple, works well for temporal locality
- LFU → Better for frequency-based access
- W-TinyLFU → Best overall performance
Experiment 5: Scale Testing
- Use massive dataset (1B items)
- Set high request rate (10K req/s)
- Find optimal cache sizes for your budget
- Observe network traffic implications
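A back-of-the-envelope helper for the budget question; the per-item size and the fraction of the dataset held in each tier are assumptions chosen only to illustrate the arithmetic:

```python
def cache_budget(dataset_items, item_bytes, tier_fractions=(0.0001, 0.001, 0.01)):
    """Rough memory needed per tier if each tier holds a fixed fraction of the dataset."""
    gib = 1024 ** 3
    for name, fraction in zip(("L1", "L2", "L3"), tier_fractions):
        items = int(dataset_items * fraction)
        print(f"{name}: {items:,} items ≈ {items * item_bytes / gib:.1f} GiB")

# 1B items at ~1KB each: even a 1% L3 tier is on the order of 10 GiB
cache_budget(1_000_000_000, item_bytes=1024)
```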
Production Deployment Guide
Monitoring Multi-tier Caches
```python
class CacheHierarchyMonitor:
    def track_tier_performance(self):
        metrics = {
            'l1_hit_rate': self.calculate_hit_rate('L1'),
            'l2_hit_rate': self.calculate_hit_rate('L2'),
            'l3_hit_rate': self.calculate_hit_rate('L3'),
            'overall_hit_rate': self.calculate_overall_hit_rate(),
            'average_latency': self.calculate_weighted_latency(),
            'memory_efficiency': self.calculate_memory_efficiency(),
            'promotion_rate': self.track_data_promotions(),
            'network_overhead': self.measure_inter_tier_traffic(),
        }
        return metrics

    def detect_anomalies(self, metrics):
        alerts = []
        if metrics['l1_hit_rate'] < 30:
            alerts.append('L1_HIT_RATE_LOW')
        if metrics['overall_hit_rate'] < 90:
            alerts.append('OVERALL_PERFORMANCE_DEGRADED')
        if metrics['network_overhead'] > 100:  # MB/s
            alerts.append('EXCESSIVE_CACHE_TRAFFIC')
        return alerts
```
Configuration Best Practices
```yaml
# Production cache hierarchy configuration
cache_hierarchy:
  l1:
    type: "in-memory"
    size: "100MB"
    eviction: "lru"
    ttl: "5m"
  l2:
    type: "redis"
    size: "10GB"
    eviction: "allkeys-lru"
    ttl: "1h"
    cluster: false
  l3:
    type: "redis-cluster"
    size: "1TB"
    eviction: "allkeys-lfu"
    ttl: "24h"
    nodes: 6

promotion:
  strategy: "adaptive"
  threshold: 3
  ml_model: "access_pattern_predictor_v2"

monitoring:
  hit_rate_alert_threshold: 85
  latency_alert_threshold: "50ms"
  memory_usage_alert_threshold: 90
```
The Hierarchy Sweet Spot
The simulation reveals key insights about multi-tier caching:
- Small L1 caches can achieve high hit rates with the right access pattern
- Promotion strategies significantly impact performance vs. network cost
- Eviction policies matter more as cache pressure increases
- Memory efficiency often trumps raw hit rates in production
Golden Rule: Design your hierarchy based on your specific access patterns, not generic recommendations. The best configuration varies dramatically based on workload characteristics.
This layered approach allows systems to serve millions of requests per second while maintaining both cost efficiency and excellent user experience.
Published by Anirudh Sharma