Caching: The Oldest Trick

This chapter covers where to cache (CDN, edge, application, database layer), invalidation, TTLs, stampedes, and the Redis patterns you'll use most.

Why Caching Works

Access patterns in most systems are wildly unequal. A small fraction of keys (popular users, trending posts, frequently-read config) gets the bulk of the traffic. The rule of thumb: 80% of the traffic hits 20% of the data. Actual distributions are often steeper.

A cache exploits this. Keep the hot 20% in fast memory; let the cold 80% stay in slower storage. Hit rates of 95% or higher on well-designed caches are routine.

Caching is the fastest, cheapest, most dangerous lever in system design. Dangerous because stale data causes bugs that are hard to reason about. More on that below.

Where to Cache

Caching happens at every layer, and stacking several caches vertically is normal.

Browser cache       HTTP headers; per-user
  ↓
CDN / edge cache    Shared across users in a region
  ↓
Reverse proxy       NGINX cache, Varnish
  ↓
Application cache   In-process (per instance)
  ↓
Distributed cache   Redis, Memcached (shared across instances)
  ↓
Database buffer     Hot pages in DB memory
  ↓
Disk

Each layer is faster and smaller than the one below it. You want as many requests as possible served from the highest layer that can answer them.

Cache Patterns

Five patterns for how the application interacts with the cache.

Cache-Aside (Lazy Loading)

The most common. Your code checks the cache first; on miss, it reads the source and populates the cache.

def get_user(user_id):
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    redis.setex(f"user:{user_id}", 300, json.dumps(user))
    return user

Pros: simple, cache only holds data that's been asked for, cache can die without data loss.

Cons: every cold read hits the database. First user to ask pays the latency tax.

Read-Through

The cache itself knows how to fetch on miss. Your code only talks to the cache.

user = cache.get(f"user:{user_id}")   # cache handles miss internally

Most caches don't do this natively; you build it with a library. Cleaner code, but the cache has to know the data model.
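A read-through wrapper is easy to sketch. This is a hypothetical example, not a real library's API: DictCache is an in-memory stand-in for a cache client, and loader is whatever function fetches from the source of truth.

```python
import json

class DictCache:
    """In-memory stand-in for a cache client with a get/setex interface."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def setex(self, key, ttl, value):
        self.store[key] = value          # TTL ignored in this sketch

class ReadThroughCache:
    """Callers only talk to the cache; misses are handled internally."""
    def __init__(self, cache, loader, ttl=300):
        self.cache = cache
        self.loader = loader             # knows how to fetch from the source
        self.ttl = ttl

    def get(self, key):
        cached = self.cache.get(key)
        if cached is not None:
            return json.loads(cached)
        value = self.loader(key)         # the miss never reaches the caller
        self.cache.setex(key, self.ttl, json.dumps(value))
        return value
```

The trade-off from the text is visible here: the wrapper is clean to call, but it has to know how to load every kind of key it serves.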

Write-Through

Writes go to the cache and the source of truth, synchronously.

def update_user(user_id, data):
    db.update("UPDATE users SET ... WHERE id = %s", user_id, data)
    redis.setex(f"user:{user_id}", 300, json.dumps(data))

Pros: cache is always fresh.

Cons: writes are slower (two systems to hit), and the cache holds data that may never be read.

Write-Behind (Write-Back)

Writes go to the cache; the cache flushes to the source asynchronously.

Pros: very fast writes.

Cons: durability risk. If the cache dies before flushing, data is lost. Rarely worth it for anything that matters.
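A minimal sketch of the mechanics, using an in-memory queue and dicts in place of a real cache and database (all names here are illustrative):

```python
import queue
import threading

CACHE = {}                      # stands in for the cache
DB = {}                         # stands in for the source of truth
pending = queue.Queue()         # writes awaiting flush

def write_behind(key, value):
    CACHE[key] = value          # fast path: only the cache is touched inline
    pending.put((key, value))   # durability is deferred

def flusher():
    while True:
        key, value = pending.get()
        DB[key] = value         # a crash before this line loses the write
        pending.task_done()

threading.Thread(target=flusher, daemon=True).start()
```

The durability risk from the text lives in the queue: anything sitting in pending when the process dies never reaches the source.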

Refresh-Ahead

The cache refreshes hot keys before they expire. Keeps miss rates low for predictable hot keys.

Useful but tricky to tune. Most teams never implement it; the simpler patterns cover 95% of needs.

Eviction

When the cache is full, something has to go. Common policies:

LRU    Least Recently Used      - default almost everywhere. Works well in practice.
LFU    Least Frequently Used    - good when access is very skewed.
FIFO   First In, First Out      - simple, rarely best.
TTL    Time To Live             - expire based on age, not eviction pressure.
ARC    Adaptive Replacement     - LRU + LFU hybrid.

Redis's default policy is actually noeviction (writes fail when memory is full); for a cache, set maxmemory-policy to allkeys-lru. Beyond that, most workloads don't need to think about it.

TTLs and Invalidation

There are only two hard things in Computer Science: cache invalidation and naming things.

(Phil Karlton)

Invalidation is the nightmare. The cache believes something that's no longer true, and nothing in the system tells it otherwise.

Three strategies, roughly in order of preference.

TTL-Based

Set a time-to-live on every cached entry. After TTL, the entry is gone and the next read refreshes.

redis.setex("user:42", 300, user_data)   # expires in 5 minutes

Pros: simple, bounded staleness, no coordination.

Cons: data is stale for up to the TTL. Picking the TTL is a guess: short TTLs reduce staleness but lower hit rates; long TTLs do the opposite.

Rule of thumb: start with a TTL that matches the maximum staleness users can tolerate. Often 30 seconds to 5 minutes for hot data.

Event-Based (Write-Through Invalidation)

When the source changes, invalidate the cache.

def update_user(user_id, data):
    db.update("UPDATE users SET ... WHERE id = %s", user_id, data)
    redis.delete(f"user:{user_id}")     # or update it

Pros: cache is fresh.

Cons: coordination between writer and cache. Easy to miss a code path. If an admin tool updates the DB directly, the cache doesn't know.

A good pattern: emit events from the database (change data capture, CDC) and invalidate caches from the event stream. Every write automatically invalidates.
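The consumer side of that pattern is small. This sketch assumes CDC events arrive as dicts with table and id fields, which is an assumption about the event format, not a standard:

```python
CACHE = {"user:42": '{"name": "Ada"}'}   # illustrative cache contents

def on_change_event(event, cache=CACHE):
    """Invalidate from the change stream. Because events come from the
    database itself, writes from admin tools and migrations are covered
    too -- no application code path can forget to invalidate."""
    if event["table"] == "users":
        cache.pop(f"user:{event['id']}", None)
```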

Version-Based (Cache Busting)

Include a version in the key. When the source changes, bump the version.

# Cache key: "user:42:v7"
# When user 42 is updated, server increments the version counter to v8.
# Old "v7" entries age out under LRU.

Useful for static content (bump version on deploy; old assets age out). Less common for dynamic data.

Cache Stampede (Thundering Herd)

A cache entry expires. 1000 requests for that key arrive in the next second. All 1000 miss the cache, all 1000 hit the database. The database melts.

Three common solutions.

Probabilistic Early Expiration

Before TTL, requests probabilistically refresh the entry. One request refreshes "early"; the rest keep using the cached value until the actual expiration. Spreads the refresh load.
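One well-known formulation of this idea (sometimes called "XFetch") makes the refresh probability rise as expiry approaches, scaled by how expensive the recompute is. A sketch; beta is a tuning knob:

```python
import math
import random
import time

def should_refresh_early(expiry_ts, compute_seconds, beta=1.0):
    """Return True if this request should refresh the entry before TTL.

    expiry_ts       absolute expiration timestamp of the cached entry
    compute_seconds roughly how long a recompute takes
    beta            > 1 refreshes earlier, < 1 later
    """
    # -log(random()) is an exponentially distributed random draw, so the
    # refresh probability ramps up smoothly as expiry_ts approaches.
    return time.time() - compute_seconds * beta * math.log(random.random()) >= expiry_ts
```

Expensive entries (large compute_seconds) refresh earlier, which is exactly when a stampede would hurt most.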

Request Coalescing

The first request to miss holds a lock, fetches, populates the cache; other misses wait on the lock. Only one DB hit per cold key.

def get_with_lock(key):
    value = cache.get(key)
    if value:
        return value
    # Try to acquire a short lock; the TTL frees it if the fetcher crashes.
    # (SETNX takes no expiry; SET with nx and ex is the atomic form.)
    if cache.set(f"{key}:lock", "1", nx=True, ex=5):
        value = fetch_from_db(key)
        cache.setex(key, 300, value)
        cache.delete(f"{key}:lock")
        return value
    # Someone else is fetching; wait briefly and retry
    time.sleep(0.05)
    return cache.get(key)

Middleware libraries do this for you: dogpile.cache in Python, Caffeine in Java.

Stale-While-Revalidate

On miss, return the stale cached value (even if expired) and kick off a background refresh. User gets a fast response; the cache gets updated asynchronously.

HTTP has a built-in header for this: Cache-Control: max-age=60, stale-while-revalidate=300.
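Application-side, the same idea looks like this sketch (dict cache, background thread, illustrative names):

```python
import threading
import time

CACHE = {}      # key -> (value, expiry_timestamp)
TTL = 60

def get_swr(key, fetch):
    """Serve from cache even when stale; refresh in the background."""
    entry = CACHE.get(key)
    if entry is None:
        value = fetch(key)                    # true cold miss: fetch inline
        CACHE[key] = (value, time.time() + TTL)
        return value
    value, expiry = entry
    if time.time() >= expiry:
        def refresh():
            CACHE[key] = (fetch(key), time.time() + TTL)
        threading.Thread(target=refresh, daemon=True).start()
    return value                              # stale, but returned immediately
```

A production version would also coalesce concurrent refreshes; as written, several stale reads can each start a refresh thread.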

Hot Keys

A single key (a celebrity's profile, a trending product) gets hit harder than the cache shard holding it can handle. Adding cache nodes doesn't help; the key is on one shard.

Fixes:

  • Per-instance local cache on top of the distributed cache. Each app server holds a tiny LRU cache of very hot keys; hit rates there are near 100%.
  • Key splitting: use hot_key:0, hot_key:1, ..., hot_key:N; the writer updates all of them; the reader picks one copy at random. Spreads read load across shards.
  • Promote to a CDN if the value is cacheable publicly.
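Key splitting in miniature, over a dict standing in for the distributed cache (REPLICAS and the function names are illustrative):

```python
import random

REPLICAS = 8
store = {}       # stands in for the distributed cache

def write_hot(key, value):
    # The writer pays N writes so every copy stays fresh.
    for i in range(REPLICAS):
        store[f"{key}:{i}"] = value

def read_hot(key):
    # Each reader picks one copy at random, spreading load across shards.
    return store.get(f"{key}:{random.randrange(REPLICAS)}")
```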

Redis Patterns Worth Knowing

Redis is more than a key-value cache. Common patterns:

Atomic Counter

INCR api:user:42:requests

One round trip, atomic increment. No read-modify-write race.

Rate Limit (Fixed Window)

INCR rate:ip:1.2.3.4:minute:202604191015
EXPIRE rate:ip:1.2.3.4:minute:202604191015 60

If the returned value exceeds the limit, reject. Expire the key after the window.
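The same window logic in Python, with a dict standing in for Redis (in real Redis, run INCR and EXPIRE in a pipeline or Lua script so a crash can't leave the key without a TTL):

```python
import time

counters = {}    # stands in for Redis

def allow_request(ip, limit=100, now=None):
    """Fixed-window limiter: one counter per (ip, minute)."""
    now = time.time() if now is None else now
    window = int(now // 60)                    # the minute encodes the window
    key = f"rate:ip:{ip}:minute:{window}"
    counters[key] = counters.get(key, 0) + 1   # INCR
    return counters[key] <= limit
```

Fixed windows allow up to twice the limit across a window boundary; sliding-window counters fix that at the cost of more state.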

Leaderboard

ZADD leaderboard 1500 "ada"
ZADD leaderboard 1800 "grace"
ZREVRANGE leaderboard 0 9 WITHSCORES    # top 10

A sorted set keeps members ordered by score, with O(log N) inserts and updates. Leaderboards, recency-weighted feeds, priority queues.

Session Store

HSET session:abc123 user_id 42 email ada@example.com
EXPIRE session:abc123 3600

Hashes are efficient for small objects. Expire for auto-logout.

Pub/Sub

PUBLISH notifications "user 42 updated"
SUBSCRIBE notifications

Fire-and-forget messaging. Good for real-time notifications, cache invalidation fan-out. Not durable; if a subscriber is offline, it misses the message. Chapter 6 covers durable messaging.

What NOT to Cache

Some things don't belong in a cache.

  • Personally identifiable data with legal retention rules. The cache is now PII storage you may not want.
  • Data that changes on every read. No hit rate, pure overhead.
  • Tiny computations. Caching a SELECT that already takes 1 ms saves a fraction of a millisecond at best, because the cache round trip isn't free; not worth the invalidation risk.
  • Sensitive data in a shared cache. If user:42:billing sits in Redis without encryption, anyone with Redis access sees it.

Common Pitfalls

Cache as source of truth. It isn't. The cache can die; it will. Always have a source of truth elsewhere.

Infinite TTLs. The cache slowly fills with keys nobody's read in months. Set a TTL even for "permanent" entries; worst case you re-fetch.

Same cache for session and application data. An eviction storm kicks sessions out; users get logged out. Separate caches per concern.

Per-user keys with long TTLs. Each unique user adds a key. Millions of rarely-read entries fill memory. Use aggressive TTLs on per-user caches.

Caching errors. If a cache-aside function hits a DB error and caches the error response, every subsequent request returns the stale error. Don't cache failures.
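The fix is structural: only populate the cache after a successful fetch. A sketch with a dict cache and illustrative names:

```python
CACHE = {}

def fetch_user(user_id, db_fetch):
    """Cache-aside read that never caches a failure."""
    key = f"user:{user_id}"
    if key in CACHE:
        return CACHE[key]
    user = db_fetch(user_id)   # may raise -- and then nothing is cached
    if user is not None:       # don't cache "not found" unless you mean to
        CACHE[key] = user
    return user
```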

Invalidation via cron. "We delete cache keys every 10 minutes with a cron job." That's an invalidation problem masquerading as a TTL strategy. Fix the invalidation.

Next Steps

Continue to 05-databases.md for the layer caches sit in front of.