Multi-tier Cache Hierarchy
Understanding how cache hierarchies balance speed, capacity, and cost in modern distributed systems
How this simulation works
Use the interactive controls below to adjust system parameters and observe how they affect performance metrics in real time. The charts update instantly to show the impact of your changes, helping you understand the trade-offs and find well-balanced configurations.
Simulation Controls
- L1 Cache Size: application-level cache (fastest, smallest)
- L2 Cache Size: local Redis cache (fast, medium size)
- L3 Cache Size: distributed cache cluster (slower, largest)
- Access Pattern: how requests are distributed across the data
- Request Rate: total requests per second across all tiers
- Dataset Size: total number of unique items that could be requested
- Promotion Strategy: how data moves up the cache hierarchy
- Eviction Policy: how items are removed when a cache is full
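For reference, those knobs boil down to a handful of parameters. A minimal sketch of how they might be grouped; the field names and defaults here are illustrative, not the simulator's internals:

```python
from dataclasses import dataclass

@dataclass
class SimulationConfig:
    # Tier capacities, in items
    l1_size: int = 1_000
    l2_size: int = 50_000
    l3_size: int = 1_000_000
    # Workload shape
    access_pattern: str = "zipfian"   # "uniform", "zipfian", "temporal", "working_set"
    request_rate: int = 1_000         # requests per second across all tiers
    dataset_size: int = 10_000_000    # unique items that can be requested
    # Behavior
    promotion_strategy: str = "lazy"  # "none", "lazy", "eager", "adaptive"
    eviction_policy: str = "lru"      # "lru", "lfu", "w-tinylfu"
```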
Performance Metrics
Real-time performance metrics based on your configuration
- Overall Hit Rate: percentage of requests served from any cache tier
- Average Latency: weighted average response time across all tiers
- L1 Hit Rate: percentage of requests served from the L1 cache
- L2 Hit Rate: percentage of L1 misses served from the L2 cache
- L3 Hit Rate: percentage of L2 misses served from the L3 cache
- Memory Efficiency: how well cache space is utilized across tiers
- Network Traffic: data transferred between cache tiers (MB/s)
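The two headline numbers follow directly from the per-tier hit rates. A back-of-the-envelope sketch, assuming the tier latencies quoted later (~1ms/5ms/20ms), a hypothetical 100ms origin penalty, and counting only the latency of the tier that finally serves each request:

```python
def summarize(l1_hit, l2_hit, l3_hit,
              tier_latency_ms=(1.0, 5.0, 20.0), origin_latency_ms=100.0):
    """l2_hit and l3_hit are conditional: the share of upstream misses that tier absorbs."""
    # Fraction of all requests served by each tier
    p1 = l1_hit
    p2 = (1 - l1_hit) * l2_hit
    p3 = (1 - l1_hit) * (1 - l2_hit) * l3_hit
    p_origin = 1 - (p1 + p2 + p3)

    overall_hit_rate = p1 + p2 + p3
    avg_latency_ms = (p1 * tier_latency_ms[0] + p2 * tier_latency_ms[1]
                      + p3 * tier_latency_ms[2] + p_origin * origin_latency_ms)
    return overall_hit_rate, avg_latency_ms

# Example: L1 serves 40%, L2 catches 60% of L1 misses, L3 catches 50% of the rest
print(summarize(0.40, 0.60, 0.50))  # -> 0.88 overall hit rate, ~16.6ms average latency
```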
Optimization Tips
Experiment with different parameter combinations to understand the trade-offs. Notice how changing one parameter affects multiple metrics simultaneously.
Multi-tier Cache Hierarchy
This simulation models the layered caching approach used by major tech companies to achieve both ultra-low latency and massive scale. Explore how L1/L2/L3 cache tiers work together, based on concepts from our Caches Lie: Consistency Isn't Free post.
The Memory Hierarchy Pyramid
🚀 L1 Cache (Application Level)
Fastest • Smallest • Most Expensive
```python
# In-process cache using an OrderedDict for LRU behavior
from collections import OrderedDict

class L1Cache:
    def __init__(self, size=1000):
        self._cache = OrderedDict()
        self._max_size = size

    def get(self, key):
        if key in self._cache:
            # Move to end (LRU behavior)
            self._cache.move_to_end(key)
            return self._cache[key]
        return None

    def set(self, key, value):
        if key in self._cache:
            self._cache.move_to_end(key)
        elif len(self._cache) >= self._max_size:
            self._cache.popitem(last=False)  # Remove oldest
        self._cache[key] = value
```
Characteristics:
- Latency: ~1ms (memory access)
- Capacity: 100-100K items
- Location: Same process as application
- Volatility: Lost on restart
⚡ L2 Cache (Local Network)
Fast • Medium • Shared
```python
# Redis or Memcached on the same server/rack
class L2Cache:
    def __init__(self, redis_client):
        self.redis = redis_client

    async def get(self, key):
        value = await self.redis.get(key)
        if value:
            # Promote to L1 cache
            l1_cache.set(key, value)
        return value

    async def set(self, key, value, ttl=300):
        await self.redis.setex(key, ttl, value)
```
Characteristics:
- Latency: ~5ms (local network)
- Capacity: 10K-1M items
- Location: Same rack/data center
- Durability: Configurable persistence
🌐 L3 Cache (Distributed)
Slower • Largest • Global
```python
# Distributed cache cluster (Redis Cluster, Hazelcast)
class L3Cache:
    def __init__(self, cluster_client):
        self.cluster = cluster_client

    async def get(self, key):
        value = await self.cluster.get(key)
        if value:
            # Promote to higher tiers
            await l2_cache.set(key, value)
            l1_cache.set(key, value)
        return value

    async def set(self, key, value, ttl=3600):
        await self.cluster.setex(key, ttl, value)
```
Characteristics:
- Latency: ~20ms (cross-region possible)
- Capacity: 1M-100M items
- Location: Distributed across regions
- Availability: High availability via replication
Data Movement Strategies
🦥 Lazy Promotion (Demand-Driven)
Most common approach - move data up on access
```python
async def get_with_lazy_promotion(key):
    # Try L1 first
    value = l1_cache.get(key)
    if value:
        return value

    # Try L2
    value = await l2_cache.get(key)
    if value:
        l1_cache.set(key, value)  # Promote to L1
        return value

    # Try L3
    value = await l3_cache.get(key)
    if value:
        l1_cache.set(key, value)        # Promote to L1
        await l2_cache.set(key, value)  # Promote to L2
        return value

    # Database fallback
    value = await database.get(key)
    if value:
        # Populate all tiers
        l1_cache.set(key, value)
        await l2_cache.set(key, value)
        await l3_cache.set(key, value)
    return value
```
✅ Pros: Simple, no wasted promotions
❌ Cons: First access always pays full latency
🚀 Eager Promotion (Proactive)
Push popular data up the hierarchy
```python
from collections import defaultdict

class EagerPromotion:
    def __init__(self):
        self.access_counter = defaultdict(int)
        self.promotion_threshold = 5

    async def track_access(self, key, tier):
        self.access_counter[key] += 1
        if self.access_counter[key] >= self.promotion_threshold:
            await self.promote_to_higher_tiers(key, tier)

    async def promote_to_higher_tiers(self, key, current_tier):
        value = await self.get_from_tier(key, current_tier)
        if current_tier == 'L3':
            await l2_cache.set(key, value)
            l1_cache.set(key, value)
        elif current_tier == 'L2':
            l1_cache.set(key, value)
```
✅ Pros: Future accesses faster, good for hot data
❌ Cons: Network overhead, may promote cold data
🧠 Adaptive Promotion (ML-Based)
Machine learning predicts what to promote
```python
class AdaptivePromotion:
    def __init__(self):
        self.ml_model = AccessPatternPredictor()
        self.features = FeatureExtractor()

    async def should_promote(self, key, access_pattern):
        features = self.features.extract(key, access_pattern)
        promotion_probability = self.ml_model.predict(features)
        return promotion_probability > 0.7  # 70% threshold

    async def handle_access(self, key, tier):
        pattern = self.get_access_pattern(key)
        if await self.should_promote(key, pattern):
            await self.promote_eagerly(key, tier)
        else:
            await self.promote_lazily(key, tier)
```
✅ Pros: Near-optimal promotion decisions, adapts to workload
❌ Cons: Complex implementation, ML infrastructure needed
Cache Sizing Strategy
The 10x Rule of Thumb
Each tier should be roughly 10x larger than the one above:
```
Typical Sizing:
  L1: 1,000 items   (1KB each   = 1MB RAM)
  L2: 10,000 items  (10KB each  = 100MB RAM)
  L3: 100,000 items (100KB each = 10GB RAM)

Large Scale:
  L1: 100,000 items    (10MB RAM)
  L2: 1,000,000 items  (1GB RAM)
  L3: 10,000,000 items (100GB RAM cluster)
```
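As a quick sanity check, the rule is trivial to encode; a throwaway helper that reproduces the item counts above from an L1 budget:

```python
def size_tiers(l1_items, growth_factor=10):
    """Apply the 10x rule: each tier holds roughly 10x more items than the tier above it."""
    l2_items = l1_items * growth_factor
    l3_items = l2_items * growth_factor
    return l1_items, l2_items, l3_items

print(size_tiers(1_000))    # (1000, 10000, 100000)       -> "Typical Sizing"
print(size_tiers(100_000))  # (100000, 1000000, 10000000) -> "Large Scale"
```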
Working Set Analysis
```python
class WorkingSetAnalyzer:
    def analyze_access_pattern(self, access_log):
        # Unique items accessed in different time windows
        windows = [60, 300, 900, 3600]  # 1min, 5min, 15min, 1hour
        working_sets = {}

        for window in windows:
            unique_keys = self.get_unique_keys_in_window(access_log, window)
            working_sets[window] = len(unique_keys)

        return {
            'l1_size_recommendation': working_sets[60],    # 1-minute working set
            'l2_size_recommendation': working_sets[900],   # 15-minute working set
            'l3_size_recommendation': working_sets[3600],  # 1-hour working set
        }
```
Eviction Policies Compared
LRU vs LFU vs ARC vs W-TinyLFU
| Policy | Best For | Memory Overhead | Performance | Complexity |
|--------|----------|-----------------|-------------|------------|
| LRU | Temporal locality | Low | Good | Simple |
| LFU | Frequency-based access | Medium | Good for Zipfian | Medium |
| ARC | Mixed workloads | High | Excellent | Complex |
| W-TinyLFU | Large caches | Low | Excellent | Medium |
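The trade-offs in the table are easier to see next to concrete bookkeeping. An LRU sketch appears in the L1 example above; for contrast, here is a deliberately minimal LFU cache (evict the least frequently used key), a sketch rather than a production-ready implementation:

```python
from collections import Counter

class LFUCache:
    def __init__(self, max_size):
        self._max_size = max_size
        self._values = {}
        self._counts = Counter()  # Extra memory overhead: one counter per key

    def get(self, key):
        if key in self._values:
            self._counts[key] += 1  # Frequency updated on every hit
            return self._values[key]
        return None

    def set(self, key, value):
        if key not in self._values and len(self._values) >= self._max_size:
            # Evict the least frequently used key (O(n) scan; real LFU uses a smarter structure)
            victim, _ = min(self._counts.items(), key=lambda kv: kv[1])
            del self._values[victim]
            del self._counts[victim]
        self._values[key] = value
        self._counts[key] += 1
```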
W-TinyLFU Implementation
```python
class WindowTinyLFU:
    def __init__(self, window_size, main_size):
        self.window_cache = LRUCache(window_size)
        self.main_cache = SLRUCache(main_size)    # Segmented LRU
        self.frequency_sketch = CountMinSketch()  # Approximate access counts
        self.doorkeeper = BloomFilter()           # Admission filter for one-hit wonders

    def get(self, key):
        # Try window cache first
        if key in self.window_cache:
            return self.window_cache.get(key)

        # Try main cache
        value = self.main_cache.get(key)
        if value:
            self.frequency_sketch.increment(key)
        return value

    def put(self, key, value):
        # New items go to window cache
        if key not in self.window_cache and key not in self.main_cache:
            evicted = self.window_cache.put(key, value)
            if evicted:
                # Window cache is full, consider promotion to main cache
                self.consider_promotion(evicted, key, value)
```
Real-World Architecture Examples
Netflix Cache Architecture
```
Content Delivery:
  L1: Application JVM heap (popular titles)
  L2: Local SSD cache (regional content)
  L3: Regional cache clusters (full catalog)
  Fallback: Origin servers + CDN

Hit Rates:
  L1: 40% (1ms latency)
  L2: 35% (5ms latency)
  L3: 20% (50ms latency)
  Origin: 5% (200ms latency)
```
Facebook Social Graph
```
Friend Connections:
  L1: TAO leaf cache (active users)
  L2: TAO aggregator cache (regional)
  L3: MySQL read replicas
  Fallback: Master database

Data Flow:
  - Writes go to the master, invalidate all tiers
  - Reads try each tier in sequence
  - Popular connections cached at all levels
```
Amazon Product Catalog
```
Product Information:
  L1: Application-level (best sellers)
  L2: ElastiCache clusters (category data)
  L3: DynamoDB DAX (full catalog)
  Fallback: DynamoDB tables

Optimization:
  - ML predicts which products to pre-cache
  - Seasonal adjustments for cache sizes
  - Geographic distribution of L3 caches
```
Interactive Experiments
Experiment 1: Cache Size Impact
- Start with small L1 cache (100 items)
- Use Zipfian access pattern (80/20 rule)
- Gradually increase L1 to 10K items
- Watch L1 hit rate improvements vs. memory cost
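To try this outside the simulator, a rough standalone sketch that replays a rank-weighted Zipfian workload (roughly the 80/20 shape) against a plain LRU cache and sweeps the L1 size:

```python
import random
from collections import OrderedDict

def zipfian_keys(n_items, n_requests, s=1.0, seed=42):
    """Rank-weighted Zipfian workload: the item at rank k is drawn with weight 1/k^s."""
    rng = random.Random(seed)
    weights = [1 / (rank ** s) for rank in range(1, n_items + 1)]
    return rng.choices(range(n_items), weights=weights, k=n_requests)

def l1_hit_rate(cache_size, requests):
    """Replay the workload against an LRU cache and report the hit rate."""
    cache, hits = OrderedDict(), 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)
    return hits / len(requests)

requests = zipfian_keys(n_items=100_000, n_requests=50_000)
for size in (100, 1_000, 10_000):
    print(f"L1 size {size}: hit rate {l1_hit_rate(size, requests):.2f}")
```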
Experiment 2: Access Pattern Sensitivity
- Set moderate cache sizes (1K/50K/1M)
- Switch between access patterns:
- Uniform → Random access, low hit rates
- Zipfian → High hit rates even with small caches
- Temporal → Excellent L1 performance
- Working Set → Predictable, stable performance
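For contrast with the Zipfian generator above, here are equally rough sketches of two of the other patterns; a working-set pattern can be approximated by drawing uniformly from a small subset that rotates over time:

```python
import random

def uniform_keys(n_items, n_requests, seed=42):
    # Every item equally likely: hit rate roughly tracks cache_size / n_items
    rng = random.Random(seed)
    return [rng.randrange(n_items) for _ in range(n_requests)]

def temporal_keys(n_items, n_requests, reuse_prob=0.8, window=256, seed=42):
    # With probability reuse_prob, re-request something seen recently
    rng = random.Random(seed)
    recent, out = [], []
    for _ in range(n_requests):
        if recent and rng.random() < reuse_prob:
            key = rng.choice(recent)
        else:
            key = rng.randrange(n_items)
        out.append(key)
        recent.append(key)
        if len(recent) > window:
            recent.pop(0)
    return out
```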
Experiment 3: Promotion Strategy Comparison
- Use high request rate (5000 req/s)
- Compare promotion strategies:
- None → Static tiers, poor performance
- Lazy → Good baseline
- Eager → Better hit rates, more network traffic
- Adaptive → Best overall performance
Experiment 4: Eviction Policy Impact
- Set medium cache sizes to see eviction effects
- Use mixed workload pattern
- Compare eviction policies:
- LRU → Simple, works well for temporal locality
- LFU → Better for frequency-based access
- W-TinyLFU → Best overall performance
Experiment 5: Scale Testing
- Use massive dataset (1B items)
- Set high request rate (10K req/s)
- Find optimal cache sizes for your budget
- Observe network traffic implications
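A back-of-the-envelope helper for the budget question; the per-item size and the fraction of the dataset held in each tier are assumptions chosen only to illustrate the arithmetic:

```python
def cache_budget(dataset_items, item_bytes, tier_fractions=(0.0001, 0.001, 0.01)):
    """Rough memory needed per tier if each tier holds a fixed fraction of the dataset."""
    gib = 1024 ** 3
    for name, fraction in zip(("L1", "L2", "L3"), tier_fractions):
        items = int(dataset_items * fraction)
        print(f"{name}: {items:,} items ≈ {items * item_bytes / gib:.1f} GiB")

# 1B items at ~1KB each: even a 1% L3 tier is on the order of 10 GiB
cache_budget(1_000_000_000, item_bytes=1024)
```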
Production Deployment Guide
Monitoring Multi-tier Caches
```python
class CacheHierarchyMonitor:
    def track_tier_performance(self):
        metrics = {
            'l1_hit_rate': self.calculate_hit_rate('L1'),
            'l2_hit_rate': self.calculate_hit_rate('L2'),
            'l3_hit_rate': self.calculate_hit_rate('L3'),
            'overall_hit_rate': self.calculate_overall_hit_rate(),
            'average_latency': self.calculate_weighted_latency(),
            'memory_efficiency': self.calculate_memory_efficiency(),
            'promotion_rate': self.track_data_promotions(),
            'network_overhead': self.measure_inter_tier_traffic(),
        }
        return metrics

    def detect_anomalies(self, metrics):
        alerts = []
        if metrics['l1_hit_rate'] < 30:
            alerts.append('L1_HIT_RATE_LOW')
        if metrics['overall_hit_rate'] < 90:
            alerts.append('OVERALL_PERFORMANCE_DEGRADED')
        if metrics['network_overhead'] > 100:  # MB/s
            alerts.append('EXCESSIVE_CACHE_TRAFFIC')
        return alerts
```
Configuration Best Practices
```yaml
# Production cache hierarchy configuration
cache_hierarchy:
  l1:
    type: "in-memory"
    size: "100MB"
    eviction: "lru"
    ttl: "5m"
  l2:
    type: "redis"
    size: "10GB"
    eviction: "allkeys-lru"
    ttl: "1h"
    cluster: false
  l3:
    type: "redis-cluster"
    size: "1TB"
    eviction: "allkeys-lfu"
    ttl: "24h"
    nodes: 6

promotion:
  strategy: "adaptive"
  threshold: 3
  ml_model: "access_pattern_predictor_v2"

monitoring:
  hit_rate_alert_threshold: 85
  latency_alert_threshold: "50ms"
  memory_usage_alert_threshold: 90
```
The Hierarchy Sweet Spot
The simulation reveals key insights about multi-tier caching:
- Small L1 caches can achieve high hit rates with the right access pattern
- Promotion strategies significantly impact performance vs. network cost
- Eviction policies matter more as cache pressure increases
- Memory efficiency often trumps raw hit rates in production
Golden Rule: Design your hierarchy based on your specific access patterns, not generic recommendations. The best configuration varies dramatically based on workload characteristics.
This layered approach allows systems to serve millions of requests per second while maintaining both cost efficiency and excellent user experience.
Published by Anirudh Sharma