Backend caching strategies — beyond simple key-value
Most engineers' first cache implementation is GET key → miss → query DB → SET key → return. That works for a week. Then you hit a cache stampede during a traffic spike, stale data in production, or a cache invalidation bug that corrupts your user counts. Caching has patterns just like everything else in software engineering. Here are the ones that matter in production.
Cache-aside (lazy loading)
The application manages the cache explicitly. On cache miss, the application fetches from the database and populates the cache.
# Cache-aside pattern
import redis
import json
from typing import Optional
r = redis.Redis()
def get_user(user_id: str, ttl: int = 300) -> Optional[dict]:
    # 1. Try cache
    cached = r.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # 2. Cache miss: fetch from DB
    user = db.query("SELECT * FROM users WHERE id = %s", (user_id,)).fetchone()
    if not user:
        return None
    # 3. Populate cache
    r.setex(f"user:{user_id}", ttl, json.dumps(user))
    return user
Pros: Only frequently accessed data ends up in cache. Resilient to cache outages (app falls back to DB).
Cons: Cache miss = extra latency. Risk of stale data between DB write and cache expiry.
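The usual way to shrink that staleness window is to delete the key on every write instead of waiting for the TTL. A minimal standalone sketch of cache-aside plus delete-on-write; an in-memory dict and a `db_rows` dict stand in for Redis and the users table, and the function names here are illustrative:

```python
import json

cache: dict[str, str] = {}                     # stands in for Redis
db_rows = {"42": {"id": "42", "name": "Ada"}}  # stands in for the users table

def get_user_cached(user_id: str):
    key = f"user:{user_id}"
    if key in cache:
        return json.loads(cache[key])          # cache hit
    user = db_rows.get(user_id)                # cache miss: read the DB
    if user is not None:
        cache[key] = json.dumps(user)          # populate on the way out
    return user

def update_user_and_evict(user_id: str, updates: dict):
    db_rows[user_id].update(updates)           # 1. write the database first
    cache.pop(f"user:{user_id}", None)         # 2. then evict the stale key
```

Deleting the key rather than re-setting it avoids concurrent writers racing to SET stale values; the next read simply repopulates.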
Write-through
On every DB write, immediately update the cache. Cache always reflects the latest data.
# Write-through pattern
def update_user(user_id: str, updates: dict) -> dict:
    # 1. Update database
    user = db.execute(
        "UPDATE users SET name=%s, email=%s WHERE id=%s RETURNING *",
        (updates["name"], updates["email"], user_id),
    ).fetchone()
    # 2. Update cache immediately
    r.setex(f"user:{user_id}", 300, json.dumps(user))
    return user
Pros: Cache is always consistent with DB. Reads are always fast.
Cons: Write latency includes cache write. Cache may be populated with data that is never read again.
Write-behind (write-back)
Writes go to the cache first. The cache asynchronously persists to the database. Lower write latency, higher complexity.
# Write-behind: dangerous, use carefully
import logging
import threading
from queue import Queue

logger = logging.getLogger(__name__)
write_queue: Queue = Queue()

def update_user_fast(user_id: str, updates: dict):
    # Write to cache immediately (synchronous, fast)
    r.setex(f"user:{user_id}", 300, json.dumps(updates))
    # Enqueue async DB write
    write_queue.put((user_id, updates))

def db_writer_thread():
    """Background thread that flushes writes to DB."""
    while True:
        user_id, updates = write_queue.get()
        try:
            db.execute("UPDATE users SET name=%s WHERE id=%s", (updates["name"], user_id))
            db.commit()
        except Exception as e:
            # If the DB write fails, data is in cache but not persisted.
            # You MUST handle this: re-queue, alert, or persist to disk.
            logger.error("DB write failed: %s", e)
            write_queue.put((user_id, updates))  # naive re-queue; add backoff in real code

threading.Thread(target=db_writer_thread, daemon=True).start()
Use only when write performance is critical and you can tolerate losing data if the cache fails before the DB write completes. Good fits: session data and real-time analytics aggregations.
Cache stampede prevention
The thundering herd: a cache key expires, 1000 requests simultaneously get a miss, all query the database at once. The DB collapses. Prevention:
# Option 1: Mutex lock on cache miss
import threading

_locks: dict[str, threading.Lock] = {}

def get_user_with_lock(user_id: str) -> dict:
    cache_key = f"user:{user_id}"
    # Fast path: cache hit
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    # Slow path: acquire per-key lock.
    # Note: _locks grows unbounded and only serializes threads in THIS process.
    lock = _locks.setdefault(cache_key, threading.Lock())
    with lock:
        # Re-check after acquiring the lock (another thread may have populated it)
        cached = r.get(cache_key)
        if cached:
            return json.loads(cached)
        # We hold the lock, so we are the one to populate the cache
        user = db.query("SELECT * FROM users WHERE id = %s", (user_id,)).fetchone()
        r.setex(cache_key, 300, json.dumps(user))
        return user
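A `threading.Lock` only serializes one process; with many app servers behind a load balancer, every server can still stampede the DB at once. A common extension is a best-effort distributed lock built on an atomic SET NX EX. Here is a hedged sketch (names like `with_dist_lock` are illustrative, and a tiny fake client stands in for Redis so it runs standalone; with redis-py, `client.set(key, token, nx=True, ex=timeout)` has the same semantics):

```python
import time
import uuid

class FakeRedis:
    """Minimal in-memory stand-in implementing SET NX / GET / DELETE."""
    def __init__(self):
        self.store = {}
    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.store:
            return None          # lock already held
        self.store[key] = value
        return True
    def get(self, key):
        return self.store.get(key)
    def delete(self, key):
        self.store.pop(key, None)

def with_dist_lock(client, key, populate, timeout=10, wait=0.05, retries=100):
    """Run populate() under a best-effort cross-process lock."""
    token = str(uuid.uuid4())    # unique token so we only release our own lock
    lock_key = f"lock:{key}"
    for _ in range(retries):
        if client.set(lock_key, token, nx=True, ex=timeout):
            try:
                return populate()
            finally:
                if client.get(lock_key) == token:  # don't delete someone else's lock
                    client.delete(lock_key)
        time.sleep(wait)         # someone else is populating; wait and retry
    raise TimeoutError(f"could not acquire {lock_key}")
```

Caveat: the get-then-delete release is not atomic; production deployments typically use a Lua script or redis-py's built-in `Lock` helper for a safe release.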
# Option 2: Probabilistic early expiry (XFetch algorithm)
import math
import random
import time

def get_with_early_expiry(key: str, ttl: int, fetch_fn, beta: float = 1.0):
    """Stochastically refresh the cache before it expires."""
    result = r.get(key)
    if result:
        data = json.loads(result)
        remaining_ttl = r.ttl(key)
        # Fetch time is stored as metadata alongside the cached value
        fetch_time = data.get("_fetch_time", 1.0)
        # XFetch: refresh early with probability that rises as expiry nears.
        # log(random.random()) is negative, so the left side dips below zero
        # more often the smaller remaining_ttl gets.
        if remaining_ttl + beta * fetch_time * math.log(random.random()) >= 0:
            return data
        # Otherwise fall through and refresh early
    start = time.monotonic()
    value = fetch_fn()
    elapsed = time.monotonic() - start
    value["_fetch_time"] = elapsed  # store fetch time for the next XFetch decision
    r.setex(key, ttl, json.dumps(value))
    return value
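To build intuition for the XFetch rule (refresh when remaining_ttl + beta * fetch_time * log(U) < 0, with U uniform on (0, 1)): solving for U gives a refresh probability of exp(-remaining_ttl / (beta * fetch_time)), which climbs toward 1 as the key approaches expiry. A small sketch to see the numbers (the function name is illustrative):

```python
import math

def xfetch_refresh_prob(remaining_ttl: float, fetch_time: float, beta: float = 1.0) -> float:
    """P(remaining_ttl + beta * fetch_time * log(U) < 0) for U ~ Uniform(0, 1)."""
    return math.exp(-remaining_ttl / (beta * fetch_time))
```

With a 1-second fetch time and beta=1, the refresh probability is about 0.005% at 10 s remaining but about 37% at 1 s remaining, so refreshes spread out instead of piling up at expiry. Raising beta shifts refreshes earlier.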
Cache invalidation patterns
# Pattern 1: Tag-based invalidation
def get_user_orders(user_id: str) -> list:
    key = f"orders:{user_id}"
    cached = r.get(key)
    if cached:
        return json.loads(cached)
    orders = db.query("SELECT * FROM orders WHERE user_id = %s", (user_id,)).fetchall()
    r.setex(key, 300, json.dumps(orders))
    # Tag the key so it can be bulk-invalidated later
    r.sadd(f"tag:user:{user_id}", key)
    return orders

def invalidate_user_cache(user_id: str):
    """Invalidate all cache keys tagged with this user."""
    tag_key = f"tag:user:{user_id}"
    keys = r.smembers(tag_key)
    if keys:
        r.delete(*keys)
    r.delete(tag_key)

# Call on any user data change
def update_order(order_id: str, user_id: str, updates: dict):
    db.execute("UPDATE orders SET ... WHERE id = %s", (order_id,))
    invalidate_user_cache(user_id)  # invalidate all user-related cache keys
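A second common pattern (a sketch of my own, not from the tag example above) is versioned keys: instead of tracking and deleting every tagged key, bake a per-user version number into the key and bump it to invalidate. Old entries are simply never read again and age out via TTL. Standalone sketch with a dict standing in for Redis; the function names are illustrative:

```python
cache: dict = {}  # stands in for Redis

def _user_version(user_id: str) -> int:
    # In Redis this would be a GET on a version key, defaulting to 1
    return cache.setdefault(f"ver:user:{user_id}", 1)

def orders_key(user_id: str) -> str:
    # The version is baked into the key, so bumping it orphans old entries
    return f"orders:v{_user_version(user_id)}:{user_id}"

def invalidate_user(user_id: str) -> None:
    # One O(1) write invalidates everything; orphans expire via TTL
    cache[f"ver:user:{user_id}"] = _user_version(user_id) + 1
```

Trade-off versus tag sets: invalidation is a single write instead of an O(N) delete, but every read pays an extra version lookup, and orphaned entries occupy memory until their TTL fires.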
The two rules of caching
- Cache the output of expensive operations, not cheap ones. If a DB query takes 1ms and your cache adds 0.5ms overhead, you are not winning.
- Have a plan for invalidation before you cache anything. "We will figure out invalidation later" always means stale data in production. Decide on the invalidation strategy (TTL, explicit delete, tag-based) before you write the first cache line.
In my experience, cache stampedes cause more production incidents than stale data does. Protect your database with a mutex or probabilistic refresh, especially for your most popular cache keys.