Backend caching strategies — beyond simple key-value
Most engineers' first cache implementation is GET key → miss → query DB → SET key → return. That works for a week. Then you hit a cache stampede during a traffic spike, stale data in production, or a cache invalidation bug that corrupts your user counts. Caching has patterns just like everything else in software engineering. Here are the ones that matter in production.
Cache-aside (lazy loading)
The application manages the cache explicitly. On cache miss, the application fetches from the database and populates the cache.
# Cache-aside pattern
import redis
import json
from typing import Optional
r = redis.Redis()
def get_user(user_id: str, ttl: int = 300) -> Optional[dict]:
    # 1. Try cache
    cached = r.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # 2. Cache miss: fetch from DB
    user = db.query("SELECT * FROM users WHERE id = %s", (user_id,)).fetchone()
    if not user:
        return None
    # 3. Populate cache
    r.setex(f"user:{user_id}", ttl, json.dumps(user))
    return user
Pros: Only frequently accessed data ends up in cache. Resilient to cache outages (app falls back to DB).
Cons: Cache miss = extra latency. Risk of stale data between DB write and cache expiry.
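The usual way to shrink that staleness window is to delete the key on every write instead of waiting for the TTL. A minimal standalone sketch of cache-aside plus delete-on-write; an in-memory dict and a `db_rows` dict stand in for Redis and the users table, and the function names here are illustrative:

```python
import json

cache: dict[str, str] = {}                     # stands in for Redis
db_rows = {"42": {"id": "42", "name": "Ada"}}  # stands in for the users table

def get_user_cached(user_id: str):
    key = f"user:{user_id}"
    if key in cache:
        return json.loads(cache[key])          # cache hit
    user = db_rows.get(user_id)                # cache miss: read the DB
    if user is not None:
        cache[key] = json.dumps(user)          # populate on the way out
    return user

def update_user_and_evict(user_id: str, updates: dict):
    db_rows[user_id].update(updates)           # 1. write the database first
    cache.pop(f"user:{user_id}", None)         # 2. then evict the stale key
```

Deleting the key rather than re-setting it avoids concurrent writers racing to SET stale values; the next read simply repopulates.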
Write-through
On every DB write, immediately update the cache. Cache always reflects the latest data.
# Write-through pattern
def update_user(user_id: str, updates: dict) -> dict:
    # 1. Update database
    user = db.execute(
        "UPDATE users SET name=%s, email=%s WHERE id=%s RETURNING *",
        (updates["name"], updates["email"], user_id),
    ).fetchone()
    # 2. Update cache immediately
    r.setex(f"user:{user_id}", 300, json.dumps(user))
    return user
Pros: Cache is always consistent with DB. Reads are always fast.
Cons: Write latency includes cache write. Cache may be populated with data that is never read again.
Write-behind (write-back)
Writes go to the cache first. The cache asynchronously persists to the database. Lower write latency, higher complexity.
# Write-behind: dangerous, use carefully
import logging
import threading
from queue import Queue

logger = logging.getLogger(__name__)
write_queue: Queue = Queue()

def update_user_fast(user_id: str, updates: dict):
    # Write to cache immediately (synchronous, fast)
    r.setex(f"user:{user_id}", 300, json.dumps(updates))
    # Enqueue async DB write
    write_queue.put((user_id, updates))

def db_writer_thread():
    """Background thread that flushes writes to DB."""
    while True:
        user_id, updates = write_queue.get()
        try:
            db.execute("UPDATE users SET name=%s WHERE id=%s", (updates["name"], user_id))
            db.commit()
        except Exception as e:
            # If the DB write fails, data is in cache but not persisted.
            # You MUST handle this: re-queue, alert, or persist to disk.
            logger.error("DB write failed: %s", e)
            write_queue.put((user_id, updates))  # naive re-queue; add backoff in real code

threading.Thread(target=db_writer_thread, daemon=True).start()
Use only when write performance is critical and you can tolerate losing data if the cache fails before the DB write completes. Good fits: session data and real-time analytics aggregations.
Cache stampede prevention
The thundering herd: a cache key expires, 1000 requests simultaneously get a miss, all query the database at once. The DB collapses. Prevention:
# Option 1: Mutex lock on cache miss
import threading

_locks: dict[str, threading.Lock] = {}

def get_user_with_lock(user_id: str) -> dict:
    cache_key = f"user:{user_id}"
    # Fast path: cache hit
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    # Slow path: acquire per-key lock.
    # Note: _locks grows unbounded and only serializes threads in THIS process.
    lock = _locks.setdefault(cache_key, threading.Lock())
    with lock:
        # Re-check after acquiring the lock (another thread may have populated it)
        cached = r.get(cache_key)
        if cached:
            return json.loads(cached)
        # We hold the lock, so we are the one to populate the cache
        user = db.query("SELECT * FROM users WHERE id = %s", (user_id,)).fetchone()
        r.setex(cache_key, 300, json.dumps(user))
        return user
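A `threading.Lock` only serializes one process; with many app servers behind a load balancer, every server can still stampede the DB at once. A common extension is a best-effort distributed lock built on an atomic SET NX EX. Here is a hedged sketch (names like `with_dist_lock` are illustrative, and a tiny fake client stands in for Redis so it runs standalone; with redis-py, `client.set(key, token, nx=True, ex=timeout)` has the same semantics):

```python
import time
import uuid

class FakeRedis:
    """Minimal in-memory stand-in implementing SET NX / GET / DELETE."""
    def __init__(self):
        self.store = {}
    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.store:
            return None          # lock already held
        self.store[key] = value
        return True
    def get(self, key):
        return self.store.get(key)
    def delete(self, key):
        self.store.pop(key, None)

def with_dist_lock(client, key, populate, timeout=10, wait=0.05, retries=100):
    """Run populate() under a best-effort cross-process lock."""
    token = str(uuid.uuid4())    # unique token so we only release our own lock
    lock_key = f"lock:{key}"
    for _ in range(retries):
        if client.set(lock_key, token, nx=True, ex=timeout):
            try:
                return populate()
            finally:
                if client.get(lock_key) == token:  # don't delete someone else's lock
                    client.delete(lock_key)
        time.sleep(wait)         # someone else is populating; wait and retry
    raise TimeoutError(f"could not acquire {lock_key}")
```

Caveat: the get-then-delete release is not atomic; production deployments typically use a Lua script or redis-py's built-in `Lock` helper for a safe release.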
# Option 2: Probabilistic early expiry (XFetch algorithm)
import math
import random
import time

def get_with_early_expiry(key: str, ttl: int, fetch_fn, beta: float = 1.0):
    """Stochastically refresh the cache before it expires."""
    result = r.get(key)
    if result:
        data = json.loads(result)
        remaining_ttl = r.ttl(key)
        # Fetch time is stored as metadata alongside the cached value
        fetch_time = data.get("_fetch_time", 1.0)
        # XFetch: refresh early with probability that rises as expiry nears.
        # log(random.random()) is negative, so the left side dips below zero
        # more often the smaller remaining_ttl gets.
        if remaining_ttl + beta * fetch_time * math.log(random.random()) >= 0:
            return data
        # Otherwise fall through and refresh early
    start = time.monotonic()
    value = fetch_fn()
    elapsed = time.monotonic() - start
    value["_fetch_time"] = elapsed  # store fetch time for the next XFetch decision
    r.setex(key, ttl, json.dumps(value))
    return value
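To build intuition for the XFetch rule (refresh when remaining_ttl + beta * fetch_time * log(U) < 0, with U uniform on (0, 1)): solving for U gives a refresh probability of exp(-remaining_ttl / (beta * fetch_time)), which climbs toward 1 as the key approaches expiry. A small sketch to see the numbers (the function name is illustrative):

```python
import math

def xfetch_refresh_prob(remaining_ttl: float, fetch_time: float, beta: float = 1.0) -> float:
    """P(remaining_ttl + beta * fetch_time * log(U) < 0) for U ~ Uniform(0, 1)."""
    return math.exp(-remaining_ttl / (beta * fetch_time))
```

With a 1-second fetch time and beta=1, the refresh probability is about 0.005% at 10 s remaining but about 37% at 1 s remaining, so refreshes spread out instead of piling up at expiry. Raising beta shifts refreshes earlier.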
Cache invalidation patterns
# Pattern 1: Tag-based invalidation
def get_user_orders(user_id: str) -> list:
    key = f"orders:{user_id}"
    cached = r.get(key)
    if cached:
        return json.loads(cached)
    orders = db.query("SELECT * FROM orders WHERE user_id = %s", (user_id,)).fetchall()
    r.setex(key, 300, json.dumps(orders))
    # Tag the key so it can be bulk-invalidated later
    r.sadd(f"tag:user:{user_id}", key)
    return orders

def invalidate_user_cache(user_id: str):
    """Invalidate all cache keys tagged with this user."""
    tag_key = f"tag:user:{user_id}"
    keys = r.smembers(tag_key)
    if keys:
        r.delete(*keys)
    r.delete(tag_key)

# Call on any user data change
def update_order(order_id: str, user_id: str, updates: dict):
    db.execute("UPDATE orders SET ... WHERE id = %s", (order_id,))
    invalidate_user_cache(user_id)  # invalidate all user-related cache keys
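A second common pattern (a sketch of my own, not from the tag example above) is versioned keys: instead of tracking and deleting every tagged key, bake a per-user version number into the key and bump it to invalidate. Old entries are simply never read again and age out via TTL. Standalone sketch with a dict standing in for Redis; the function names are illustrative:

```python
cache: dict = {}  # stands in for Redis

def _user_version(user_id: str) -> int:
    # In Redis this would be a GET on a version key, defaulting to 1
    return cache.setdefault(f"ver:user:{user_id}", 1)

def orders_key(user_id: str) -> str:
    # The version is baked into the key, so bumping it orphans old entries
    return f"orders:v{_user_version(user_id)}:{user_id}"

def invalidate_user(user_id: str) -> None:
    # One O(1) write invalidates everything; orphans expire via TTL
    cache[f"ver:user:{user_id}"] = _user_version(user_id) + 1
```

Trade-off versus tag sets: invalidation is a single write instead of an O(N) delete, but every read pays an extra version lookup, and orphaned entries occupy memory until their TTL fires.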
The two rules of caching
- Cache the output of expensive operations, not cheap ones. If a DB query takes 1ms and your cache adds 0.5ms overhead, you are not winning.
- Have a plan for invalidation before you cache anything. "We will figure out invalidation later" always means stale data in production. Decide on the invalidation strategy (TTL, explicit delete, tag-based) before you write the first cache line.
In my experience, cache stampedes cause more production incidents than stale data does. Protect your database with a mutex or probabilistic refresh, especially for your most popular cache keys.