Staff Prep 26: Load Balancing Strategies — L4 vs L7, Health Checks & Consistent Hashing

April 4, 2026 · 9 min read · Part 04 / 06

Back to Part 25: Message Queues. Load balancers distribute traffic across backend instances. The layer they operate at, TCP (L4) or HTTP (L7), decides what routing tricks you can actually pull off. When backends scale up or down, consistent hashing is what keeps your cache from evaporating in a single deploy. I reach for an L7 balancer by default because the operational flexibility is worth the parsing overhead nine times out of ten.

L4 vs L7 load balancing

An L4 balancer works at the transport layer: it sees TCP (or UDP) connections and packet headers, not the payload. It routes on IP addresses and ports. That makes it fast (no HTTP parsing) but blind to anything at the application layer: it cannot route by URL path, HTTP header or method.
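To make "blind to the application layer" concrete, here is a minimal sketch of an L4 routing decision, assuming the balancer hashes the connection's 5-tuple (similar in spirit to the flow hashing L4 balancers such as NLB use). `FlowTuple` and `pick_backend_l4` are illustrative names. The interesting part is what is missing from the inputs: there is no path, header or method to route on.

```python
import hashlib
from typing import NamedTuple

class FlowTuple(NamedTuple):
    """Everything an L4 balancer can see about a connection: no URL, no headers."""
    protocol: str   # "tcp" or "udp"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

def pick_backend_l4(flow: FlowTuple, backends: list[str]) -> str:
    # Hash the 5-tuple so every packet of one connection lands on the same backend
    # (as long as the backend list does not change).
    key = "|".join(str(field) for field in flow).encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return backends[digest % len(backends)]

backends = ["10.0.1.10:5432", "10.0.1.11:5432", "10.0.1.12:5432"]
flow = FlowTuple("tcp", "203.0.113.7", 51544, "10.0.0.5", 5432)
print(pick_backend_l4(flow, backends))
```

Because the hash covers the source port, two connections from the same client can land on different backends. An L4 balancer that wants client affinity hashes only the source IP instead, like the `IPHashBalancer` further down.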

An L7 balancer terminates the client's connection (usually including TLS) and parses the HTTP request. It can route by URL path, headers, cookies and hostname, and some proxies can even inspect the request body. You pay the cost of parsing, and you get real routing in exchange.

text
L4 Load Balancer (e.g. AWS NLB, HAProxy in TCP mode):
  - Routes by: IP, TCP/UDP port
  - Can do: round-robin, least connections, hash by source IP or 5-tuple (NLB uses a flow hash), TCP health checks
  - Cannot do: route by URL path, inspect headers, inject headers
  - Use case: TCP load balancing (databases, game servers, raw TCP services)

L7 Load Balancer (AWS ALB, nginx, HAProxy):
  - Routes by: URL path, headers, cookies, hostname
  - Can do: TLS termination, path-based routing, header injection, HTTP health checks (expected status code on a path)
  - Use case: HTTP/HTTPS services, API gateways, microservices routing

Example L7 routing rules (ALB):
/api/v1/orders → order-service target group
/api/v1/users  → user-service target group
/static/*      → S3 bucket (redirect rule)
Host: admin.myapp.com → admin-service target group
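The ALB rules above boil down to first-match evaluation over fields an L7 balancer has already parsed. A minimal sketch, assuming rules are checked in priority order and that unmatched requests fall through to a default target; `Rule`, `route_l7` and the `web-frontend` default are hypothetical names, and real ALB conditions (wildcards, multiple values, query strings) are richer than a plain prefix match.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    """One ALB-style listener rule; None means 'match anything' for that field."""
    target: str
    host: Optional[str] = None
    path_prefix: Optional[str] = None

# Evaluated top to bottom; the first matching rule wins (hypothetical rule set).
RULES = [
    Rule("admin-service", host="admin.myapp.com"),
    Rule("order-service", path_prefix="/api/v1/orders"),
    Rule("user-service", path_prefix="/api/v1/users"),
    Rule("static-redirect", path_prefix="/static/"),
]

def route_l7(host: str, path: str, default: str = "web-frontend") -> str:
    # Host and path are only available because the balancer parsed the HTTP request.
    for rule in RULES:
        if rule.host is not None and rule.host != host.lower():
            continue
        if rule.path_prefix is not None and not path.startswith(rule.path_prefix):
            continue
        return rule.target
    return default

print(route_l7("myapp.com", "/api/v1/orders/123"))  # order-service
print(route_l7("admin.myapp.com", "/dashboard"))    # admin-service
print(route_l7("myapp.com", "/about"))              # web-frontend
```

Rule order matters: the host rule sits first, so `admin.myapp.com/api/v1/orders` goes to the admin service, not the order service.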

Load balancing algorithms

python
import hashlib
import threading

# Round-robin: simplest, equal distribution (works best when requests cost about the same)
class RoundRobinBalancer:
    def __init__(self, backends: list[str]):
        self.backends = backends
        self.index = 0
        self.lock = threading.Lock()  # next_backend may be called from many threads

    def next_backend(self) -> str:
        with self.lock:
            backend = self.backends[self.index]
            self.index = (self.index + 1) % len(self.backends)
        return backend

# Least connections: routes to backend with fewest active requests
class LeastConnectionsBalancer:
    def __init__(self, backends: list[str]):
        self.connections = {b: 0 for b in backends}
        self.lock = threading.Lock()

    def next_backend(self) -> str:
        with self.lock:
            backend = min(self.connections, key=self.connections.get)
            self.connections[backend] += 1  # count the request as in flight
        return backend

    def release(self, backend: str) -> None:
        # Call when the request finishes, or the counts only ever go up
        with self.lock:
            self.connections[backend] -= 1

# IP hash: same client always routes to same backend (poor man's sticky session).
# Uses modulo, so changing len(backends) reshuffles most clients (see the next section).
class IPHashBalancer:
    def __init__(self, backends: list[str]):
        self.backends = backends

    def next_backend(self, client_ip: str) -> str:
        hash_val = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
        return self.backends[hash_val % len(self.backends)]
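Least connections only works if every pick is paired with a release, including when the request handler raises. A minimal sketch, assuming a single-process balancer, that ties the counter to a context manager; `LeastConnections` and `in_flight` are illustrative names, and ties go to whichever backend comes first in the list.

```python
import threading
from contextlib import contextmanager

class LeastConnections:
    def __init__(self, backends: list[str]):
        self.active = {b: 0 for b in backends}  # in-flight requests per backend
        self.lock = threading.Lock()

    @contextmanager
    def in_flight(self):
        # Pick the least-loaded backend and count the request as active.
        with self.lock:
            backend = min(self.active, key=self.active.get)
            self.active[backend] += 1
        try:
            yield backend
        finally:
            # Runs even if the request raises, so the count never leaks.
            with self.lock:
                self.active[backend] -= 1

lb = LeastConnections(["app-1", "app-2", "app-3"])
with lb.in_flight() as first:        # app-1 (all idle, first in order)
    with lb.in_flight() as second:   # app-2 (app-1 is busy)
        print(first, second, lb.active)
print(lb.active)  # back to all zeros
```

Production balancers (nginx's `least_conn`, HAProxy's `balance leastconn`) keep this count internally; the sketch only shows why the release has to be unconditional.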

Consistent hashing: minimal disruption on scaling

Simple modulo hashing (hash(key) % N) falls apart when N changes. Going from N to N+1 backends remaps about N/(N+1) of all keys, and going from N to N-1 remaps about (N-1)/N: most of the cache either way. For caches that is a mass miss event, and your database gets to enjoy the stampede. Consistent hashing limits the damage. Adding a node moves only about 1/(N+1) of keys (the share the new node takes over), and removing a node moves only the ~1/N of keys that node owned.
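The modulo numbers are easy to verify by brute force. A minimal sketch, assuming hash values are uniformly distributed: one full period of lcm(old_n, new_n) consecutive hash values then gives the exact long-run fraction of keys that change servers. `modulo_remap_fraction` is an illustrative helper.

```python
from math import lcm

def modulo_remap_fraction(old_n: int, new_n: int) -> float:
    """Fraction of keys whose server changes when hash % old_n becomes hash % new_n."""
    period = lcm(old_n, new_n)  # the pattern of (h % old_n, h % new_n) repeats after this
    moved = sum(1 for h in range(period) if h % old_n != h % new_n)
    return moved / period

print(modulo_remap_fraction(3, 4))   # add a 4th server  -> 0.75
print(modulo_remap_fraction(4, 3))   # lose one of four  -> 0.75
print(modulo_remap_fraction(9, 10))  # add a 10th server -> 0.9
```

The bigger the fleet, the worse modulo gets: each scaling step moves a larger share of the keys.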

python
import bisect
import hashlib

class ConsistentHash:
    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = {}     # hash_position -> server_name
        self.sorted_keys = []

    def add_server(self, server: str):
        for i in range(self.vnodes):
            key = f"{server}:{i}"
            hash_pos = int(hashlib.md5(key.encode()).hexdigest(), 16)
            self.ring[hash_pos] = server
            bisect.insort(self.sorted_keys, hash_pos)

    def remove_server(self, server: str):
        for i in range(self.vnodes):
            key = f"{server}:{i}"
            hash_pos = int(hashlib.md5(key.encode()).hexdigest(), 16)
            del self.ring[hash_pos]
            self.sorted_keys.remove(hash_pos)

    def get_server(self, key: str) -> str:
        if not self.ring:
            raise ValueError("No servers in ring")
        hash_pos = int(hashlib.md5(key.encode()).hexdigest(), 16)
        # Find the first vnode position >= hash_pos (clockwise on the ring)
        idx = bisect.bisect_left(self.sorted_keys, hash_pos)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around
        return self.ring[self.sorted_keys[idx]]

# Usage
ring = ConsistentHash(vnodes=150)  # more vnodes = more even distribution
ring.add_server("cache-1")
ring.add_server("cache-2")
ring.add_server("cache-3")

server = ring.get_server("user:42:profile")  # same server on every call while ring membership is unchanged

# Add a 4th server: only ~25% of keys remap (vs ~75% with modulo hashing)
ring.add_server("cache-4")
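To check the ~25% claim, here is a self-contained sketch that rebuilds the ring as a sorted list of positions plus a position-to-server dict (the same md5-of-`server:i` scheme as `ConsistentHash` above) and counts which keys change servers when `cache-4` joins. `build_ring` and `lookup` are illustrative helpers, not part of the class.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(servers: list[str], vnodes: int = 150) -> tuple[list[int], dict[int, str]]:
    # Place `vnodes` points per server on the ring, like ConsistentHash.add_server.
    ring = {_hash(f"{s}:{i}"): s for s in servers for i in range(vnodes)}
    return sorted(ring), ring

def lookup(positions: list[int], ring: dict[int, str], key: str) -> str:
    # First vnode clockwise from the key's position, wrapping past the end.
    idx = bisect.bisect_left(positions, _hash(key)) % len(positions)
    return ring[positions[idx]]

keys = [f"user:{i}:profile" for i in range(10_000)]
before = build_ring(["cache-1", "cache-2", "cache-3"])
after = build_ring(["cache-1", "cache-2", "cache-3", "cache-4"])

moved = [k for k in keys if lookup(*before, k) != lookup(*after, k)]
print(f"{len(moved) / len(keys):.1%} of keys moved")  # roughly 25%
print({lookup(*after, k) for k in moved})             # {'cache-4'}
```

Every key that moves lands on the new server; nothing shuffles between the existing three. That is the property that keeps a scale-out from turning into a cache stampede.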

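Health checks: passive vs active

An active health check is the load balancer probing each backend on a schedule (for example, GET /health every few seconds) and pulling it from rotation after a run of failed probes. A passive health check watches real traffic instead: if requests to a backend keep failing or timing out, the balancer stops sending it traffic for a while (nginx's max_fails/fail_timeout and Envoy's outlier detection work this way). The two complement each other. Active checks catch a backend that is down before real users hit it; passive checks catch one that passes its probe but fails real requests.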
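On the balancer's side, an active check is a probe loop plus some hysteresis. A minimal sketch, assuming a pluggable `probe` callable (standing in for an HTTP GET to /health that must return 2xx) and made-up thresholds of 2 consecutive failures to eject and 3 consecutive successes to re-admit. ALB exposes similar healthy/unhealthy threshold settings, but `ActiveHealthChecker` and its defaults here are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ActiveHealthChecker:
    """LB-side view of active checks (hypothetical thresholds, not any vendor's defaults)."""
    probe: Callable[[str], bool]   # e.g. "GET /health returned 2xx within the timeout"
    unhealthy_threshold: int = 2   # consecutive failures before ejecting
    healthy_threshold: int = 3     # consecutive successes before re-admitting
    healthy: dict[str, bool] = field(default_factory=dict)
    streak: dict[str, int] = field(default_factory=dict)  # >0 successes, <0 failures

    def run_once(self, backends: list[str]) -> None:
        # One probe round; a real balancer runs this on a timer.
        for b in backends:
            ok = self.probe(b)
            prev = self.streak.get(b, 0)
            self.streak[b] = max(prev, 0) + 1 if ok else min(prev, 0) - 1
            was_healthy = self.healthy.get(b, True)
            if was_healthy and self.streak[b] <= -self.unhealthy_threshold:
                self.healthy[b] = False
            elif not was_healthy and self.streak[b] >= self.healthy_threshold:
                self.healthy[b] = True
            else:
                self.healthy[b] = was_healthy

    def in_rotation(self, backends: list[str]) -> list[str]:
        return [b for b in backends if self.healthy.get(b, True)]

status = {"app-1": True, "app-2": False}  # app-2's /health returns 503
checker = ActiveHealthChecker(probe=lambda b: status[b])
backends = ["app-1", "app-2"]
checker.run_once(backends)
print(checker.in_rotation(backends))  # ['app-1', 'app-2'] (one failure is not enough)
checker.run_once(backends)
print(checker.in_rotation(backends))  # ['app-1']
```

The asymmetry is deliberate: eject quickly so users stop hitting a dead backend, re-admit slowly so a flapping one does not bounce in and out of rotation.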

python
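import shutil

from fastapi import Depends, FastAPI
from fastapi.responses import JSONResponse
from sqlalchemy import text

# Your app's own dependency providers (hypothetical module path):
# get_db yields an AsyncSession, get_redis a redis.asyncio.Redis client.
from myapp.dependencies import get_db, get_redis

app = FastAPI()

async def check_db(db) -> bool:
    try:
        await db.execute(text("SELECT 1"))
        return True
    except Exception:
        return False

async def check_redis(redis) -> bool:
    try:
        return bool(await redis.ping())
    except Exception:
        return False

def check_disk_space(path: str = "/", min_free_ratio: float = 0.10) -> bool:
    # 10% free is an arbitrary example threshold
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_ratio

# Health check endpoint (for load balancer probes)
@app.get("/health")
async def health_check(db=Depends(get_db)):
    if not await check_db(db):
        # 503 signals the LB to route away from this instance
        return JSONResponse({"status": "degraded", "db": False}, status_code=503)
    return {"status": "ok", "db": True}

# Deep health check: tests all dependencies
@app.get("/health/deep")
async def deep_health(db=Depends(get_db), redis=Depends(get_redis)):
    checks = {
        "db": await check_db(db),
        "redis": await check_redis(redis),
        "disk": check_disk_space(),
    }
    all_ok = all(checks.values())
    return JSONResponse(
        {"status": "ok" if all_ok else "degraded", "checks": checks},
        status_code=200 if all_ok else 503,
    )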
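The endpoints above serve the active side. For the passive side, here is a minimal sketch of what the balancer tracks from real traffic, in the spirit of nginx's `max_fails`/`fail_timeout` or Envoy's outlier detection but simplified, with made-up defaults. `PassiveHealthTracker`, `record` and `available` are illustrative names; the optional `now` argument exists only to make the example deterministic.

```python
import time
from typing import Optional

class PassiveHealthTracker:
    """Ejects a backend after `max_fails` failed real requests within `window` seconds."""

    def __init__(self, max_fails: int = 3, window: float = 10.0, eject_for: float = 30.0):
        self.max_fails = max_fails
        self.window = window
        self.eject_for = eject_for
        self.failures: dict[str, list[float]] = {}     # recent failure timestamps
        self.ejected_until: dict[str, float] = {}

    def record(self, backend: str, ok: bool, now: Optional[float] = None) -> None:
        # Called after every proxied request (connection error, timeout or 5xx -> ok=False).
        now = time.monotonic() if now is None else now
        if ok:
            return
        recent = [t for t in self.failures.get(backend, []) if now - t < self.window]
        recent.append(now)
        self.failures[backend] = recent
        if len(recent) >= self.max_fails:
            self.ejected_until[backend] = now + self.eject_for
            self.failures[backend] = []

    def available(self, backends: list[str], now: Optional[float] = None) -> list[str]:
        now = time.monotonic() if now is None else now
        return [b for b in backends if self.ejected_until.get(b, 0.0) <= now]

tracker = PassiveHealthTracker(max_fails=3, window=10, eject_for=30)
for t in (100.0, 101.0, 102.0):
    tracker.record("app-2", ok=False, now=t)  # three failed requests in a row
print(tracker.available(["app-1", "app-2"], now=103.0))  # ['app-1']
print(tracker.available(["app-1", "app-2"], now=133.0))  # ['app-1', 'app-2']
```

The tracker never calls `/health`: it learns from real requests, which is exactly why it catches a backend that passes its probe but fails real traffic.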

Sticky sessions: when and why to avoid them

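Sticky sessions route a user's requests to the same backend, usually via a cookie the load balancer sets (or, at L4, a hash of the client IP). This lets each backend keep session state in memory. The problems:

  • Uneven load distribution (heavy users and long-lived sessions pile up on whichever backend they first landed on)
  • When a backend fails, all its "sticky" users lose their session
  • Makes auto-scaling harder (new backends only receive new sessions, so scaling out relieves hot backends slowly, and scaling in drops sessions)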

Prefer stateless backends with session state in Redis. Then any backend can serve any request.
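A minimal sketch of the stateless alternative, assuming a redis-py client (`redis.Redis`), whose `get` and `setex` methods are the only calls used. `SessionStore`, the `session:` key prefix and the 30-minute TTL are illustrative choices; `FakeRedis` exists only so the example runs without a Redis server.

```python
import json
import secrets
from typing import Optional, Protocol

class KeyValueClient(Protocol):
    """The two redis-py calls this sketch relies on (redis.Redis provides both)."""
    def get(self, name: str) -> Optional[bytes]: ...
    def setex(self, name: str, time: int, value: str) -> object: ...

class SessionStore:
    def __init__(self, client: KeyValueClient, ttl_seconds: int = 1800):
        self.client = client
        self.ttl = ttl_seconds  # 30-minute idle timeout: an example value

    def create(self, data: dict) -> str:
        session_id = secrets.token_urlsafe(32)  # goes in the session cookie
        self.save(session_id, data)
        return session_id

    def save(self, session_id: str, data: dict) -> None:
        # SETEX writes the value and refreshes the TTL in one command
        self.client.setex(f"session:{session_id}", self.ttl, json.dumps(data))

    def load(self, session_id: str) -> Optional[dict]:
        raw = self.client.get(f"session:{session_id}")
        return None if raw is None else json.loads(raw)

class FakeRedis:
    """In-memory stand-in for redis.Redis (ignores TTLs)."""
    def __init__(self):
        self.data: dict[str, bytes] = {}

    def get(self, name):
        return self.data.get(name)

    def setex(self, name, time, value):
        self.data[name] = value.encode()
        return True

shared = FakeRedis()  # in production: one redis.Redis(host=...) shared by every backend
backend_a = SessionStore(shared)
backend_b = SessionStore(shared)
sid = backend_a.create({"user_id": 42, "cart": ["sku-1"]})
print(backend_b.load(sid))  # {'user_id': 42, 'cart': ['sku-1']}
```

The session written through one backend is readable through the other, so the load balancer needs no stickiness at all.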

Quiz: test your understanding

Before moving on, answer these in your head (or out loud):

  1. What can an L7 load balancer do that an L4 cannot? Give three concrete routing decisions only possible at L7.
  2. You have 4 cache servers. With modulo hashing, one server fails. What fraction of cache keys need to be remapped? With consistent hashing?
  3. Your health check returns 200 OK even when the database is down. What happens? How do you fix the health check?
  4. An L7 load balancer sees a request with Authorization: Bearer EXPIRED_TOKEN. Can it reject this request before it reaches your service? What does it need for that?
  5. Your app uses sticky sessions backed by in-memory session storage. A backend pod is killed during a rolling deployment. What happens to users on that pod? How do you design around this?

Next up: Part 27: CAP Theorem & Distributed Systems. The real version of CAP, eventual consistency, conflict resolution, and CRDTs.
