FastAPI Production

FastAPI 105: Caching Strategies & Redis Patterns, Cache-Aside, Invalidation & Thundering Herd

April 3, 2026 · 13 min read · Part 05 / 18

In Part 4 we built rate limiting that actually holds across workers: token buckets in Redis, sliding windows, atomic pipelines. Part 5 is about the layer that makes your API fast, not just safe. Redis sits between your app and your database, absorbing read load so Postgres doesn't have to. Caching also comes with its own traps, including the thundering herd (500 requests slamming the DB the millisecond a TTL expires) and the slow-motion memory leak of keys with no expiry.

Why caching is not optional at scale

A Postgres query scanning an indexed table might take 5 to 20ms. Under moderate load, that's fine. At 500 req/s on the same endpoint you're doing 500 x 20ms = 10 seconds of query work per second, on a database that can realistically handle 300 to 500 concurrent connections before it starts queueing. You either cache, or you scale horizontally and spend money instead of milliseconds.

Redis reads from RAM. A simple GET takes about 0.1ms, roughly 50 to 200x faster than the 5 to 20ms indexed query above. The economics are obvious. The implementation is where everyone gets it wrong, myself included.
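The arithmetic is worth running against your own numbers. A back-of-envelope sketch: the function name is made up for illustration, the 500 req/s and 20ms figures are the ones from this section, and the 95% hit rate is a hypothetical value, not a measurement.

```python
def db_busy_seconds(requests_per_s: int, query_ms: float, hit_ratio: float = 0.0) -> float:
    """Seconds of database work generated per wall-clock second.

    Only cache misses reach the database, so the load scales with (1 - hit_ratio).
    """
    misses_per_s = requests_per_s * (1.0 - hit_ratio)
    return misses_per_s * query_ms / 1000


no_cache = db_busy_seconds(500, 20)                    # the 500 x 20ms case above
with_cache = db_busy_seconds(500, 20, hit_ratio=0.95)  # hypothetical 95% hit rate

print(f"no cache:   {no_cache:.2f}s of query work per second")
print(f"95% cached: {with_cache:.2f}s of query work per second")
```

The same number doubles as the average count of queries in flight, which is the figure to compare against your connection pool size.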

The three caching patterns

1. Cache-aside (lazy loading)

The app checks the cache. On a miss, it queries the database, writes the result to cache, and returns. The cache never talks to the database directly. Your application owns that logic. This is the pattern I reach for 90% of the time.

Request
  |
  +- redis.get(key) HIT: return cached value
  |
  +- MISS:
       |
       +- query Postgres
       +- redis.setex(key, TTL, value)
       +- return value

import json
import redis.asyncio as redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

async def get_user_profile(user_id: int) -> dict | None:
    key = f"user:{user_id}:profile"

    # 1. Check cache
    cached = await r.get(key)
    if cached:
        return json.loads(cached)

    # 2. Cache miss: query DB
    user = await db.fetch_one(
        "SELECT id, name, email, role FROM users WHERE id = $1",
        user_id
    )
    if not user:
        return None

    # 3. Write to cache with TTL (300 seconds = 5 minutes)
    await r.setex(key, 300, json.dumps(dict(user)))
    return dict(user)

When to use it: read-heavy endpoints where data changes infrequently. User profiles, product details, configuration values. The majority of caching you will build is this pattern.

Trade-off: there's a window of staleness up to the TTL. A user updates their name, and for the next 5 minutes the cache serves the old one. For most use cases that's fine. For anything user-visible and immediately important (think "my own profile right after I saved it"), add explicit invalidation on write.
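Since most endpoints end up repeating the same check, load, store sequence, it can help to pull it into one helper. A minimal sketch, assuming a client that exposes async `get` and `setex` the way `redis.asyncio.Redis` does; `cache_aside` and `loader` are hypothetical names, not part of any library.

```python
import json
from typing import Any, Awaitable, Callable, Optional


async def cache_aside(
    cache: Any,                                   # e.g. a redis.asyncio.Redis client
    key: str,
    ttl_seconds: int,
    loader: Callable[[], Awaitable[Optional[dict]]],
) -> Optional[dict]:
    """Return the cached value for key, loading and caching it on a miss."""
    cached = await cache.get(key)
    if cached is not None:
        return json.loads(cached)

    value = await loader()                        # the database query lives here
    if value is None:
        return None                               # row does not exist; nothing is cached

    await cache.setex(key, ttl_seconds, json.dumps(value))
    return value
```

A call site would look something like `await cache_aside(r, f"user:{user_id}:profile", 300, lambda: load_profile(user_id))`, where `load_profile` is whatever coroutine runs your query.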

2. Write-through

Every write goes to the cache and the database, synchronously. The cache stays warm, and the stale-read problem largely goes away because writes keep it current.

async def update_user_profile(user_id: int, data: dict):
    # Write to DB and read back the same columns the read path caches
    profile = await db.fetch_one(
        "UPDATE users SET name = $1, email = $2 WHERE id = $3 "
        "RETURNING id, name, email, role",
        data["name"], data["email"], user_id
    )

    # Write to cache immediately (keep it warm); skip if the user doesn't exist
    if profile:
        key = f"user:{user_id}:profile"
        await r.setex(key, 300, json.dumps(dict(profile)))

Cost: every write does two hops (DB and cache), so write latency goes up. If writes are frequent and reads are sparse, you're paying cache write cost for data that may never get read before it expires. This pattern pays off when reads heavily outnumber writes.
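The order of the two hops matters: writing the database first means a failed database write never leaves a value in the cache that the database doesn't have. A minimal sketch of that ordering, assuming a client with an async `setex`; `write_through` and `write_to_db` are hypothetical names.

```python
import json
from typing import Any, Awaitable, Callable


async def write_through(
    cache: Any,                                   # e.g. a redis.asyncio.Redis client
    key: str,
    ttl_seconds: int,
    write_to_db: Callable[[], Awaitable[dict]],
) -> dict:
    """Persist to the database first, then refresh the cache with the saved row."""
    row = await write_to_db()                     # if this raises, the cache is untouched
    await cache.setex(key, ttl_seconds, json.dumps(row))
    return row
```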

3. Write-behind (write-back)

Write to cache, return success, flush to the database asynchronously in batches. The fastest write latency of the three. You are just writing to RAM.

Write -> cache ok -> return OK to client
              |
         (async worker)
              |
         flush to DB

The risk: if your Redis instance crashes between the write and the flush, that data is gone permanently. The client got a 200 OK. The database never saw it. This isn't theoretical. Redis keeps its data in memory, and unless you've configured AOF persistence with appendfsync always, you can lose recent writes on a crash.

Write-behind is rare in transactional web APIs. It shows up in analytics pipelines and write-heavy batch systems where losing a few events is acceptable. For order creation, user data, or anything financial, I'd avoid it.
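A minimal sketch of the flow in the diagram, under loud assumptions: plain Python structures stand in for Redis so the example runs on its own, and `WriteBehindBuffer` and `flush_batch` are hypothetical names, not part of any library.

```python
import asyncio
from typing import Awaitable, Callable


class WriteBehindBuffer:
    """In-process stand-in for the cache plus its queue of pending writes."""

    def __init__(
        self,
        flush_batch: Callable[[list], Awaitable[None]],
        batch_size: int = 100,
    ):
        self.cache: dict = {}
        self.pending: list = []
        self.flush_batch = flush_batch
        self.batch_size = batch_size

    def write(self, key: str, value: dict) -> None:
        # Fast path: update the cache, queue the write, return immediately
        self.cache[key] = value
        self.pending.append({"key": key, "value": value})

    async def flush(self) -> int:
        """Drain queued writes to the database in batches. Returns rows flushed."""
        flushed = 0
        while self.pending:
            batch = self.pending[: self.batch_size]
            await self.flush_batch(batch)          # e.g. one multi-row INSERT
            del self.pending[: len(batch)]         # drop only after the DB accepted it
            flushed += len(batch)
        return flushed
```

Anything still sitting in `pending` when the process dies is exactly the data loss described above.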

Cache invalidation: the genuinely hard part

Phil Karlton's line about there being only two hard things in computer science, cache invalidation and naming things, isn't a joke. Invalidation is where stale data and subtle race conditions live. You have three tools.

TTL-only

Set a TTL. Accept that data can be stale for up to that window. Simple. Works wherever eventual consistency is fine: public product catalogs, aggregate counts, leaderboards.

await r.setex(key, 300, value)  # expires in 5 minutes, no matter what

Write-invalidate (explicit delete)

On any mutation, delete the cache key. The next read rebuilds it. Precise, and the staleness window is near-zero. The catch is that you have to remember to invalidate everywhere writes happen. Forget one code path and you have a silent staleness bug that nobody catches until someone complains.

async def update_user(user_id: int, data: dict):
    await db.execute("UPDATE users SET name = $1 WHERE id = $2", data["name"], user_id)
    await r.delete(f"user:{user_id}:profile")  # cache gone, next read rebuilds
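One way to make the "forgot a code path" bug harder to write is to route every mutation through a single helper that takes the stale keys as an argument. A sketch, assuming a client with an async `delete` that accepts several keys, as redis-py's does; `mutate_and_invalidate` is a hypothetical name.

```python
from typing import Any, Awaitable, Callable, Iterable


async def mutate_and_invalidate(
    cache: Any,                                   # e.g. a redis.asyncio.Redis client
    keys: Iterable[str],
    mutate: Callable[[], Awaitable[Any]],
) -> Any:
    """Run a database write, then delete every cache key it makes stale."""
    result = await mutate()                       # if this raises, the cache is left alone
    stale = list(keys)
    if stale:
        await cache.delete(*stale)                # one DEL for all keys
    return result
```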

Versioned keys

Include a version number in the key. On update, increment the version. No explicit delete is needed: old keys simply become unreachable and expire naturally via their TTL.

# Key: user:42:v3:profile
# After update: user:42:v4:profile
# v3 key orphans and expires on its own TTL

async def get_user_versioned(user_id: int):
    # Default to "0": INCR on a missing key yields 1, so the very first
    # update has to move readers off the default version
    version = await r.get(f"user:{user_id}:version") or "0"
    key = f"user:{user_id}:v{version}:profile"
    return await r.get(key)  # raw JSON string, or None on a miss

async def update_user_versioned(user_id: int, data: dict):
    await db.execute("UPDATE users ...", ...)
    await r.incr(f"user:{user_id}:version")  # bump version, old key becomes unreachable
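For completeness, here is what a full read path with versioned keys might look like, miss handling included. A sketch under assumptions: the client exposes async `get` and `setex` and returns strings (as with `decode_responses=True`), and `get_versioned` and `loader` are hypothetical names.

```python
import json
from typing import Any, Awaitable, Callable


async def get_versioned(
    cache: Any,                                   # e.g. a redis.asyncio.Redis client
    user_id: int,
    loader: Callable[[int], Awaitable[dict]],
    ttl_seconds: int = 300,
) -> dict:
    """Cache-aside read where the key embeds the user's current version."""
    # "0" is the default because INCR on a missing key produces 1
    version = await cache.get(f"user:{user_id}:version") or "0"
    key = f"user:{user_id}:v{version}:profile"

    cached = await cache.get(key)
    if cached is not None:
        return json.loads(cached)

    value = await loader(user_id)
    # The TTL is what eventually reclaims keys left behind by older versions
    await cache.setex(key, ttl_seconds, json.dumps(value))
    return value
```

The cost of this pattern is one extra GET per read for the version key, and orphaned keys that occupy memory until their TTL runs out.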

The thundering herd problem

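A popular cache key expires at 2 AM. You have 12 Uvicorn workers and 200 requests/second hitting that endpoint. Every request that arrives before the key is repopulated gets a cache MISS and goes to Postgres. If the rebuild takes 50ms, that's 200 × 0.05 = 10 duplicate queries. If the query slows down under the pile-up and takes a full second, or the traffic arrives as a burst, it's all 200, which is the case the timeline below shows. Your database goes from idle to 100% CPU in under a second.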

T=0: key expires
T=0.001: 200 simultaneous requests → MISS
T=0.001: 200 simultaneous DB queries fired
T=0.3:   DB CPU: 100%, connections queuing
T=1.0:   DB starts timing out → 500 errors
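How big the herd gets depends on the arrival rate and on how long the first rebuild takes. A quick sketch of that arithmetic, using the 200 requests/second from this section; the function name is made up for illustration.

```python
def requests_missing_cache(requests_per_s: int, rebuild_ms: int) -> float:
    """Requests that arrive, and miss, before the first rebuild repopulates the key."""
    return requests_per_s * rebuild_ms / 1000


fast_rebuild = requests_missing_cache(200, 50)     # rebuild finishes in 50ms
slow_rebuild = requests_missing_cache(200, 1000)   # rebuild drags on for a full second

print(f"50ms rebuild: {fast_rebuild:.0f} duplicate queries")
print(f"1s rebuild:   {slow_rebuild:.0f} duplicate queries")
```

The feedback loop is the nasty part: more duplicate queries make each query slower, which widens the window and lets more requests in.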

An in-process mutex isn't enough here. Each worker has its own mutex, so it only protects within a single process. 12 workers still means 12 simultaneous cache misses and 12 simultaneous DB queries.
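To see what a per-process lock does and doesn't buy you, here is a minimal single-flight sketch built on asyncio.Lock. The names (`get_single_flight`, `_locks`, `_local`) are hypothetical, and a plain dict stands in for the shared cache so the example runs on its own.

```python
import asyncio
from typing import Any, Awaitable, Callable

_locks: dict = {}   # one asyncio.Lock per cache key, private to this process
_local: dict = {}   # stand-in for the shared cache, to keep the sketch runnable


async def get_single_flight(key: str, rebuild: Callable[[], Awaitable[Any]]) -> Any:
    """Coalesce concurrent misses for key within this process only."""
    if key in _local:
        return _local[key]

    lock = _locks.setdefault(key, asyncio.Lock())
    async with lock:
        # Re-check: another coroutine may have rebuilt the value while we waited
        if key not in _local:
            _local[key] = await rebuild()
        return _local[key]
```

Within one worker, 50 concurrent misses collapse into a single rebuild. Run 12 workers and each has its own `_locks` dict, so the database still sees up to 12 queries.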

The fix is a Redis-level distributed lock:

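import asyncio
import uuid

LOCK_TTL = 5      # seconds; should exceed the slowest expected rebuild
WAIT_STEP = 0.05  # seconds between polls while another worker rebuilds
MAX_WAITS = 20    # stop waiting after about 1 second

# Delete the lock only if we still own it (compare token and delete, atomically)
RELEASE_LOCK = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""

async def get_with_stampede_protection(key: str, rebuild_fn):
    # 1. Try cache first
    value = await r.get(key)
    if value:
        return json.loads(value)

    lock_key = f"lock:{key}"
    token = uuid.uuid4().hex  # identifies this worker's hold on the lock

    # 2. Try to acquire distributed lock (NX = only set if not exists)
    acquired = await r.set(lock_key, token, nx=True, ex=LOCK_TTL)

    if acquired:
        # We won the lock, rebuild the cache
        try:
            value = await rebuild_fn()
            await r.setex(key, 300, json.dumps(value))
            return value
        finally:
            # A plain DELETE could remove a lock another worker took after ours expired
            await r.eval(RELEASE_LOCK, 1, lock_key, token)

    # 3. Someone else is rebuilding: poll the cache instead of hitting the DB
    for _ in range(MAX_WAITS):
        await asyncio.sleep(WAIT_STEP)
        value = await r.get(key)
        if value:
            return json.loads(value)

    # Still nothing after about 1 second, fall back to the DB
    return await rebuild_fn()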

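Only one worker across all processes rebuilds the cache while it holds the lock. The rest wait briefly and read the now-warm key, falling back to the database only if the rebuild outlasts their wait. In the common case Postgres sees one query instead of 200.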

Handling Redis downtime gracefully

Cache is an optimisation. Your API has to work without it. If Redis goes down, you fall through to the database (slower, but correct). The common mistake is letting a Redis exception propagate as a 500 to the client.

async def get_user_safe(user_id: int) -> dict | None:
    key = f"user:{user_id}:profile"

    try:
        cached = await r.get(key)
        if cached:
            return json.loads(cached)
    except redis.RedisError:
        # Redis is down, fall through to DB
        pass

    # Cache miss or Redis down, query DB directly
    user = await db.fetch_one(
        "SELECT id, name, email, role FROM users WHERE id = $1", user_id
    )
    if not user:
        return None  # unknown user, nothing to cache

    try:
        await r.setex(key, 300, json.dumps(dict(user)))
    except redis.RedisError:
        pass  # Can't write to cache, that's fine, just serve from DB

    return dict(user)
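Repeating those try/except blocks in every endpoint gets old quickly, so a pair of thin wrappers can carry the policy instead. A sketch with hypothetical names; the `errors` parameter defaults to `Exception` only so the example runs without Redis installed, and in real code you would pass something narrower such as `(redis.RedisError,)`.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("cache")


async def cache_get_safe(
    cache: Any, key: str, errors: tuple = (Exception,)
) -> Optional[str]:
    """GET that treats any cache failure as a miss."""
    try:
        return await cache.get(key)
    except errors:
        logger.warning("cache read failed for %s, falling back to DB", key)
        return None


async def cache_set_safe(
    cache: Any, key: str, ttl_seconds: int, value: str, errors: tuple = (Exception,)
) -> bool:
    """SETEX that reports failure instead of raising."""
    try:
        await cache.setex(key, ttl_seconds, value)
        return True
    except errors:
        logger.warning("cache write failed for %s", key)
        return False
```

Logging the failure matters: a cache that is silently down looks, from the outside, like a database that suddenly got slow.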

Common mistakes engineers make

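No TTL on cache writes. Keys accumulate forever. Redis hits maxmemory and, depending on maxmemory-policy, either rejects writes (the default, noeviction) or starts evicting keys, and you get errors or random cache misses that look like intermittent bugs. Always set a TTL, with setex or set(..., ex=...).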
Caching entire DB rows when only two fields are needed. You end up invalidating on any field change, not just the fields you actually care about. Cache exactly what the endpoint returns, nothing more.
Caching mutable aggregates without write-invalidate. total_orders = 42 cached for 5 minutes. A new order arrives. For 5 minutes you're serving wrong counts. Either short TTL or explicit invalidation on write.
In-process mutex for stampede protection. Works within one process only. 12 Uvicorn workers means 12 simultaneous DB queries anyway. Use Redis-level distributed locks.
Not wrapping Redis calls in try/except. Redis is a network call. It can fail. If it does and you don't catch it, your whole endpoint returns 500. Cache failures should be invisible to the user.

Part 5 done. Next: Part 6: testing and reliability. What to actually test, contract tests, chaos patterns, and how to write tests that catch production bugs instead of just turning CI green.
