Caching • Advanced • Interactive • 12 min exploration

Multi-tier Cache Hierarchy

Understanding how cache hierarchies balance speed, capacity, and cost in modern distributed systems

cache-hierarchy • multi-tier • performance • architecture • memory-hierarchy

How this simulation works

Use the interactive controls below to adjust system parameters and observe how they affect performance metrics in real-time. The charts update instantly to show the impact of your changes, helping you understand system trade-offs and optimal configurations.

Simulation Controls

  • L1 Cache Size: application-level cache (fastest, smallest)

  • L2 Cache Size: local Redis cache (fast, medium size)

  • L3 Cache Size: distributed cache cluster (slower, largest)

  • Access Pattern: how requests are distributed across the data set

  • Request Rate (1000 req/s): total requests per second across all tiers

  • Data Set Size: total unique items that could be requested

  • Promotion Strategy: how data moves up the cache hierarchy

  • Eviction Policy: how items are removed when a cache is full

Performance Metrics

Real-time performance metrics based on your configuration:

| Metric | Value | Description |
|--------|-------|-------------|
| Overall Hit Rate | 94.9% | Percentage of requests served from any cache tier |
| Average Latency | 17.3ms | Weighted average response time across all tiers |
| L1 Hit Rate | 6.0% | Percentage of requests served from L1 cache |
| L2 Hit Rate | 40.0% | Percentage of L1 misses served from L2 cache |
| L3 Hit Rate | 91.0% | Percentage of L2 misses served from L3 cache |
| Memory Efficiency | 100.0% | How well cache space is utilized across tiers |
| Network Traffic | 1.62 MB/s | Data transferred between cache tiers |
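
The two headline numbers follow directly from the per-tier hit rates, since the L2 and L3 rates are conditional on missing the tiers above them. Here is a minimal sketch of the arithmetic, using the tier latencies quoted later in this post (~1ms L1, ~5ms L2, ~20ms L3) and an assumed 100ms database fallback; the fallback latency is an illustrative assumption, not a simulator setting.

```python
# Deriving the aggregate metrics from the conditional per-tier hit rates.
l1_hit, l2_hit, l3_hit = 0.06, 0.40, 0.91
latency_ms = {"l1": 1, "l2": 5, "l3": 20, "db": 100}  # db latency is an assumption

p_l1 = l1_hit                                      # served by L1
p_l2 = (1 - l1_hit) * l2_hit                       # missed L1, served by L2
p_l3 = (1 - l1_hit) * (1 - l2_hit) * l3_hit        # missed L1+L2, served by L3
p_db = (1 - l1_hit) * (1 - l2_hit) * (1 - l3_hit)  # missed every tier

overall_hit_rate = p_l1 + p_l2 + p_l3              # ≈ 0.949
avg_latency = (p_l1 * latency_ms["l1"] + p_l2 * latency_ms["l2"]
               + p_l3 * latency_ms["l3"] + p_db * latency_ms["db"])  # ≈ 17.3 ms

print(f"overall hit rate: {overall_hit_rate:.1%}, average latency: {avg_latency:.1f} ms")
```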

Configuration Summary

Current Settings

L1 Cache Size: 1,000
L2 Cache Size: 50,000
L3 Cache Size: 1,000,000
Access Pattern: Zipfian

Key Insights

Overall Hit Rate: 94.9%
Average Latency: 17.3ms
L1 Hit Rate: 6.0%
L2 Hit Rate: 40.0%

Optimization Tips

Experiment with different parameter combinations to understand the trade-offs. Notice how changing one parameter affects multiple metrics simultaneously.

Multi-tier Cache Hierarchy

This simulation models the layered caching approach used by major tech companies to achieve both ultra-low latency and massive scale. Explore how L1/L2/L3 cache tiers work together, based on concepts from our "Caches Lie: Consistency Isn't Free" post.

The Memory Hierarchy Pyramid

🚀 L1 Cache (Application Level)

Fastest • Smallest • Most Expensive

```python
# In-process cache using a Python dict with LRU eviction
from collections import OrderedDict

class L1Cache:
    def __init__(self, size=1000):
        self._cache = OrderedDict()
        self._max_size = size

    def get(self, key):
        if key in self._cache:
            # Move to end (LRU behavior)
            self._cache.move_to_end(key)
            return self._cache[key]
        return None

    def set(self, key, value):
        if key in self._cache:
            self._cache.move_to_end(key)
        elif len(self._cache) >= self._max_size:
            self._cache.popitem(last=False)  # Remove oldest
        self._cache[key] = value
```

Characteristics:

  • Latency: ~1ms (memory access)
  • Capacity: 100-100K items
  • Location: Same process as application
  • Volatility: Lost on restart

⚡ L2 Cache (Local Network)

Fast • Medium • Shared

```python
# Redis or Memcached on the same server/rack
class L2Cache:
    def __init__(self, redis_client):
        self.redis = redis_client

    async def get(self, key):
        value = await self.redis.get(key)
        if value:
            # Promote to L1 cache
            l1_cache.set(key, value)
        return value

    async def set(self, key, value, ttl=300):
        await self.redis.setex(key, ttl, value)
```

Characteristics:

  • Latency: ~5ms (local network)
  • Capacity: 10K-1M items
  • Location: Same rack/data center
  • Durability: Configurable persistence

🌐 L3 Cache (Distributed)

Slower • Largest • Global

```python
# Distributed cache cluster (Redis Cluster, Hazelcast)
class L3Cache:
    def __init__(self, cluster_client):
        self.cluster = cluster_client

    async def get(self, key):
        value = await self.cluster.get(key)
        if value:
            # Promote to higher tiers
            await l2_cache.set(key, value)
            l1_cache.set(key, value)
        return value

    async def set(self, key, value, ttl=3600):
        await self.cluster.setex(key, ttl, value)
```

Characteristics:

  • Latency: ~20ms (cross-region possible)
  • Capacity: 1M-100M items
  • Location: Distributed across regions
  • Availability: High availability via replication

Data Movement Strategies

🦥 Lazy Promotion (Demand-Driven)

Most common approach - move data up on access

```python
async def get_with_lazy_promotion(key):
    # Try L1 first
    value = l1_cache.get(key)
    if value:
        return value

    # Try L2
    value = await l2_cache.get(key)
    if value:
        l1_cache.set(key, value)  # Promote to L1
        return value

    # Try L3
    value = await l3_cache.get(key)
    if value:
        l1_cache.set(key, value)        # Promote to L1
        await l2_cache.set(key, value)  # Promote to L2
        return value

    # Database fallback
    value = await database.get(key)
    if value:
        # Populate all tiers
        l1_cache.set(key, value)
        await l2_cache.set(key, value)
        await l3_cache.set(key, value)
    return value
```

✅ Pros: Simple, no wasted promotions
❌ Cons: First access always pays full latency

🚀 Eager Promotion (Proactive)

Push popular data up the hierarchy

```python
from collections import defaultdict

class EagerPromotion:
    def __init__(self):
        self.access_counter = defaultdict(int)
        self.promotion_threshold = 5

    async def track_access(self, key, tier):
        self.access_counter[key] += 1
        if self.access_counter[key] >= self.promotion_threshold:
            await self.promote_to_higher_tiers(key, tier)

    async def promote_to_higher_tiers(self, key, current_tier):
        value = await self.get_from_tier(key, current_tier)
        if current_tier == 'L3':
            await l2_cache.set(key, value)
            l1_cache.set(key, value)
        elif current_tier == 'L2':
            l1_cache.set(key, value)
```

✅ Pros: Future accesses faster, good for hot data
❌ Cons: Network overhead, may promote cold data

🧠 Adaptive Promotion (ML-Based)

Machine learning predicts what to promote

```python
class AdaptivePromotion:
    def __init__(self):
        self.ml_model = AccessPatternPredictor()
        self.features = FeatureExtractor()

    async def should_promote(self, key, access_pattern):
        features = self.features.extract(key, access_pattern)
        promotion_probability = self.ml_model.predict(features)
        return promotion_probability > 0.7  # 70% threshold

    async def handle_access(self, key, tier):
        pattern = self.get_access_pattern(key)
        if await self.should_promote(key, pattern):
            await self.promote_eagerly(key, tier)
        else:
            await self.promote_lazily(key, tier)
```

✅ Pros: Optimal promotion decisions, adapts to workload
❌ Cons: Complex implementation, ML infrastructure needed

Cache Sizing Strategy

The 10x Rule of Thumb

Each tier should be roughly 10x larger than the one above:

```
Typical Sizing:
  L1: 1,000 items    (1KB each   = 1MB RAM)
  L2: 10,000 items   (10KB each  = 100MB RAM)
  L3: 100,000 items  (100KB each = 10GB RAM)

Large Scale:
  L1: 100,000 items     (10MB RAM)
  L2: 1,000,000 items   (1GB RAM)
  L3: 10,000,000 items  (100GB RAM cluster)
```
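
A quick back-of-the-envelope helper makes the memory math explicit. This is a sketch: the per-item sizes and the 10x growth factor are the illustrative figures above, not measured values.

```python
def hierarchy_footprint(l1_items=1_000, item_kb=(1, 10, 100), growth=10):
    """Print the approximate RAM footprint of each tier under the 10x rule."""
    items = l1_items
    for tier, size_kb in zip(("L1", "L2", "L3"), item_kb):
        mb = items * size_kb / 1024
        print(f"{tier}: {items:>9,} items x {size_kb:>3}KB  ~= {mb:,.0f} MB")
        items *= growth  # the next tier holds roughly 10x more items

hierarchy_footprint()
# L1:     1,000 items x   1KB  ~= 1 MB
# L2:    10,000 items x  10KB  ~= 98 MB
# L3:   100,000 items x 100KB  ~= 9,766 MB
```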

Working Set Analysis

```python
class WorkingSetAnalyzer:
    def analyze_access_pattern(self, access_log):
        # Unique items accessed in different time windows
        windows = [60, 300, 900, 3600]  # 1min, 5min, 15min, 1hour
        working_sets = {}

        for window in windows:
            unique_keys = self.get_unique_keys_in_window(access_log, window)
            working_sets[window] = len(unique_keys)

        return {
            'l1_size_recommendation': working_sets[60],    # 1-minute working set
            'l2_size_recommendation': working_sets[900],   # 15-minute working set
            'l3_size_recommendation': working_sets[3600],  # 1-hour working set
        }
```

Eviction Policies Compared

LRU vs LFU vs ARC vs W-TinyLFU

| Policy | Best For | Memory Overhead | Performance | Complexity |
|--------|----------|-----------------|-------------|------------|
| LRU | Temporal locality | Low | Good | Simple |
| LFU | Frequency-based access | Medium | Good for Zipfian | Medium |
| ARC | Mixed workloads | High | Excellent | Complex |
| W-TinyLFU | Large caches | Low | Excellent | Medium |

W-TinyLFU Implementation

```python
class WindowTinyLFU:
    def __init__(self, window_size, main_size):
        self.window_cache = LRUCache(window_size)
        self.main_cache = SLRUCache(main_size)  # Segmented LRU
        self.frequency_sketch = CountMinSketch()
        self.doorkeeper = BloomFilter()

    def get(self, key):
        # Try window cache first
        if key in self.window_cache:
            return self.window_cache.get(key)

        # Try main cache
        value = self.main_cache.get(key)
        if value:
            self.frequency_sketch.increment(key)
        return value

    def put(self, key, value):
        # New items go to window cache
        if key not in self.window_cache and key not in self.main_cache:
            evicted = self.window_cache.put(key, value)
            if evicted:
                # Window cache is full, consider promotion to main cache
                self.consider_promotion(evicted, key, value)
```

Real-World Architecture Examples

Netflix Cache Architecture

```
Content Delivery:
  L1: Application JVM heap (popular titles)
  L2: Local SSD cache (regional content)
  L3: Regional cache clusters (full catalog)
  Fallback: Origin servers + CDN

Hit Rates:
  L1: 40% (1ms latency)
  L2: 35% (5ms latency)
  L3: 20% (50ms latency)
  Origin: 5% (200ms latency)
```

Facebook Social Graph

```
Friend Connections:
  L1: TAO leaf cache (active users)
  L2: TAO aggregator cache (regional)
  L3: MySQL read replicas
  Fallback: Master database

Data Flow:
  - Writes go to master, invalidate all tiers
  - Reads try each tier in sequence
  - Popular connections cached at all levels
```
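
The "writes go to master, invalidate all tiers" flow can be sketched roughly as below. This illustrates the pattern, not TAO's actual API: it assumes each tier from the earlier examples also exposes a delete() method, and master_db is a hypothetical handle to the source-of-truth database.

```python
async def write_and_invalidate(key, value):
    # 1. Persist to the source of truth first
    await master_db.write(key, value)

    # 2. Invalidate every cache tier so stale copies stop being served.
    #    The next read misses everywhere and repopulates the hierarchy
    #    with the fresh value through the normal lazy-promotion path.
    l1_cache.delete(key)
    await l2_cache.delete(key)
    await l3_cache.delete(key)
```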

Amazon Product Catalog

```
Product Information:
  L1: Application-level (best sellers)
  L2: ElastiCache clusters (category data)
  L3: DynamoDB DAX (full catalog)
  Fallback: DynamoDB tables

Optimization:
  - ML predicts which products to pre-cache
  - Seasonal adjustments for cache sizes
  - Geographic distribution of L3 caches
```

Interactive Experiments

Experiment 1: Cache Size Impact

  1. Start with small L1 cache (100 items)
  2. Use Zipfian access pattern (80/20 rule)
  3. Gradually increase L1 to 10K items
  4. Watch L1 hit rate improvements vs. memory cost (see the sketch below)
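
Here is a minimal standalone sketch of Experiment 1 for readers who want to reproduce it outside the simulator. The Zipf exponent, random seed, and key-space size are illustrative assumptions, not the simulator's internal parameters.

```python
import numpy as np
from collections import OrderedDict

def lru_hit_rate(cache_size, n_items=1_000_000, n_requests=100_000, zipf_a=1.2):
    """Measure the hit rate of a single LRU tier under a Zipfian workload."""
    rng = np.random.default_rng(42)
    keys = rng.zipf(zipf_a, n_requests) % n_items  # heavily skewed key popularity
    cache, hits = OrderedDict(), 0
    for key in keys:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # refresh recency on a hit
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / n_requests

for size in (100, 1_000, 10_000):
    print(f"L1 size {size:>6}: hit rate {lru_hit_rate(size):.1%}")
```

Exact numbers depend heavily on the skew exponent and the key-space size, which is why the simulator's L1 hit rate can look quite different under its own settings; the shape of the curve as you grow the cache is what matters.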

Experiment 2: Access Pattern Sensitivity

  1. Set moderate cache sizes (1K/50K/1M)
  2. Switch between access patterns:
    • Uniform → Random access, low hit rates
    • Zipfian → High hit rates even with small caches
    • Temporal → Excellent L1 performance
    • Working Set → Predictable, stable performance

Experiment 3: Promotion Strategy Comparison

  1. Use high request rate (5000 req/s)
  2. Compare promotion strategies:
    • None → Static tiers, poor performance
    • Lazy → Good baseline
    • Eager → Better hit rates, more network traffic
    • Adaptive → Best overall performance

Experiment 4: Eviction Policy Impact

  1. Set medium cache sizes to see eviction effects
  2. Use mixed workload pattern
  3. Compare eviction policies:
    • LRU → Simple, works well for temporal locality
    • LFU → Better for frequency-based access
    • W-TinyLFU → Best overall performance

Experiment 5: Scale Testing

  1. Use massive dataset (1B items)
  2. Set high request rate (10K req/s)
  3. Find optimal cache sizes for your budget
  4. Observe network traffic implications

Production Deployment Guide

Monitoring Multi-tier Caches

```python
class CacheHierarchyMonitor:
    def track_tier_performance(self):
        metrics = {
            'l1_hit_rate': self.calculate_hit_rate('L1'),
            'l2_hit_rate': self.calculate_hit_rate('L2'),
            'l3_hit_rate': self.calculate_hit_rate('L3'),
            'overall_hit_rate': self.calculate_overall_hit_rate(),
            'average_latency': self.calculate_weighted_latency(),
            'memory_efficiency': self.calculate_memory_efficiency(),
            'promotion_rate': self.track_data_promotions(),
            'network_overhead': self.measure_inter_tier_traffic(),
        }
        return metrics

    def detect_anomalies(self, metrics):
        alerts = []
        if metrics['l1_hit_rate'] < 30:
            alerts.append('L1_HIT_RATE_LOW')
        if metrics['overall_hit_rate'] < 90:
            alerts.append('OVERALL_PERFORMANCE_DEGRADED')
        if metrics['network_overhead'] > 100:  # MB/s
            alerts.append('EXCESSIVE_CACHE_TRAFFIC')
        return alerts
```

Configuration Best Practices

```yaml
# Production cache hierarchy configuration
cache_hierarchy:
  l1:
    type: "in-memory"
    size: "100MB"
    eviction: "lru"
    ttl: "5m"
  l2:
    type: "redis"
    size: "10GB"
    eviction: "allkeys-lru"
    ttl: "1h"
    cluster: false
  l3:
    type: "redis-cluster"
    size: "1TB"
    eviction: "allkeys-lfu"
    ttl: "24h"
    nodes: 6

promotion:
  strategy: "adaptive"
  threshold: 3
  ml_model: "access_pattern_predictor_v2"

monitoring:
  hit_rate_alert_threshold: 85
  latency_alert_threshold: "50ms"
  memory_usage_alert_threshold: 90
```

The Hierarchy Sweet Spot

The simulation reveals key insights about multi-tier caching:

  1. Small L1 caches can achieve high hit rates with the right access pattern
  2. Promotion strategies significantly impact performance vs. network cost
  3. Eviction policies matter more as cache pressure increases
  4. Memory efficiency often trumps raw hit rates in production

Golden Rule: Design your hierarchy based on your specific access patterns, not generic recommendations. The best configuration varies dramatically based on workload characteristics.

This layered approach allows systems to serve millions of requests per second while maintaining both cost efficiency and excellent user experience.

Published by Anirudh Sharma

