Rate limiting algorithms — token bucket vs leaky bucket vs sliding window
Every rate limiter answers the same question: is this request allowed, or should it be rejected? The answer depends on the algorithm. Choose the wrong one and you get bursty traffic that overwhelms your backend, or smooth traffic that causes user-visible throttling at the exact wrong moment. Here are the five main algorithms, how they work, and when to use each.
Fixed window counter
The simplest algorithm: count requests in fixed time windows. Allow up to N requests per window.
# Fixed window: allow 100 req/minute
import time
import redis
r = redis.Redis()
def is_allowed_fixed_window(user_id: str, limit: int = 100) -> bool:
    window = int(time.time() // 60)  # current minute
    key = f"rate:{user_id}:{window}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 60)  # expire after 1 minute
    return count <= limit
Problem: The boundary attack. A user can send 100 requests at 00:59 and 100 more at 01:00 — 200 requests in 2 seconds, even though the limit is 100/minute.
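The attack works because the two bursts land in different window keys, and each key starts with a fresh counter. A quick sketch of the key scheme from the function above (the helper name is illustrative):

```python
def window_key(user_id: str, t: float) -> str:
    """Window key for a 60-second fixed window, given epoch time t."""
    return f"rate:{user_id}:{int(t // 60)}"

# A burst at t=59s and another at t=61s fall into different windows,
# so each burst gets its own fresh budget of 100 requests.
assert window_key("alice", 59.0) == "rate:alice:0"
assert window_key("alice", 61.0) == "rate:alice:1"
```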
Use when: Simplicity matters more than precision. Internal dashboards, admin APIs.
Sliding window log
Stores the timestamp of every request. Counts requests in the last N seconds — a true sliding window.
# Sliding window log: allow 100 req/minute
def is_allowed_sliding_log(user_id: str, limit: int = 100, window: int = 60) -> bool:
    key = f"rate:log:{user_id}"
    now = time.time()
    window_start = now - window
    pipe = r.pipeline()
    # Remove timestamps older than the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count requests still inside the window
    pipe.zcard(key)
    # Add the current timestamp. Note: this runs even when the request
    # is ultimately rejected, so rejected requests also count against
    # the limit (a common, deliberately strict choice).
    pipe.zadd(key, {str(now): now})
    # Set expiry so idle users don't leak keys
    pipe.expire(key, window)
    _, count, _, _ = pipe.execute()
    return count < limit
Pros: Perfectly accurate. No boundary attack vulnerability.
Cons: High memory usage, since every request timestamp is stored. At 1000 req/min per user, that's 1000 timestamps held in Redis per user.
Use when: Accuracy is critical and traffic volume is low. Financial APIs, payment processing.
Sliding window counter (hybrid)
Combines the memory efficiency of fixed windows with the accuracy of sliding windows. Approximates the sliding window count using the current window count plus a weighted portion of the previous window count.
# Sliding window counter approximation
def is_allowed_sliding_counter(user_id: str, limit: int = 100, window: int = 60) -> bool:
    now = time.time()
    current_window = int(now // window)
    prev_window = current_window - 1
    current_key = f"rate:{user_id}:{current_window}"
    prev_key = f"rate:{user_id}:{prev_window}"
    pipe = r.pipeline()
    pipe.get(current_key)
    pipe.get(prev_key)
    current_count, prev_count = pipe.execute()
    current_count = int(current_count or 0)
    prev_count = int(prev_count or 0)
    # How far into the current window are we? (0.0 to 1.0)
    elapsed_fraction = (now % window) / window
    # Approximate requests in the last 'window' seconds: the previous
    # window contributes its count weighted by how much of it still
    # overlaps the sliding window
    approximate_count = (prev_count * (1 - elapsed_fraction)) + current_count
    if approximate_count >= limit:
        return False
    # Increment the current window. Check-then-increment is not atomic,
    # so under heavy concurrency a few extra requests can slip through;
    # for strict limits, move the check into a Lua script.
    pipe = r.pipeline()
    pipe.incr(current_key)
    pipe.expire(current_key, window * 2)
    pipe.execute()
    return True
Pros: Memory-efficient (only 2 counters per user) and reasonably accurate: in Cloudflare's analysis of 400 million requests, only 0.003% were incorrectly allowed or throttled.
Cons: An approximation. It assumes requests in the previous window were evenly distributed, so it slightly over- or under-counts bursty traffic.
Use when: High traffic APIs where sliding log is too memory-intensive. Most web APIs.
Token bucket
Each user has a bucket that fills with tokens at a fixed rate. Each request consumes one token. Allows bursts up to the bucket capacity.
# Token bucket: 10 req/sec, burst up to 50
def is_allowed_token_bucket(
    user_id: str,
    rate: float = 10.0,   # tokens per second
    capacity: int = 50,   # max burst
) -> bool:
    key = f"rate:bucket:{user_id}"
    now = time.time()
    # Lua script for atomic check-and-update
    lua_script = """
    local key = KEYS[1]
    local rate = tonumber(ARGV[1])
    local capacity = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local data = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(data[1]) or capacity
    local last_refill = tonumber(data[2]) or now
    -- Refill tokens based on elapsed time, capped at capacity
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * rate)
    if tokens < 1 then
        return 0 -- rejected
    end
    tokens = tokens - 1
    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)
    return 1 -- allowed
    """
    result = r.eval(lua_script, 1, key, rate, capacity, now)
    return bool(result)
Pros: Allows controlled bursts. Users can accumulate tokens when idle and use them in a burst — natural for human usage patterns.
Cons: Two parameters to tune (rate + capacity). Complex to implement atomically.
Use when: You want to allow reasonable bursts. Mobile app APIs, where users open the app and make several calls quickly.
Leaky bucket
Requests enter a queue (the bucket). They are processed at a fixed rate regardless of input rate. Excess requests that overflow the queue are rejected.
Difference from token bucket: Token bucket allows bursts up to the capacity. Leaky bucket processes at a constant rate regardless — output is always smooth.
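In practice, leaky buckets are usually implemented not as a literal queue but as a "water level" counter that drains at the fixed rate (nginx's limit_req module takes this approach). A minimal in-process sketch, with illustrative names, assuming single-threaded access:

```python
import time

class LeakyBucket:
    """Counter-based leaky bucket: 'water' drains at `rate` per second."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # drain rate (requests per second)
        self.capacity = capacity  # maximum queue depth
        self.water = 0.0          # current queue depth
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket for the time elapsed since the last check
        self.water = max(0.0, self.water - (now - self.last_leak) * self.rate)
        self.last_leak = now
        if self.water + 1.0 > self.capacity:
            return False  # bucket would overflow: reject
        self.water += 1.0
        return True
```

With rate=5 and capacity=10, a burst of 15 near-simultaneous requests admits the first 10 and rejects the other 5; the admitted work then flows downstream at 5/sec no matter how fast it arrived.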
Use when: You need to protect a downstream service that cannot handle bursts. Database write queues, email sending queues.
Decision guide
- User-facing API, allow occasional bursts: Token bucket
- High accuracy required, low traffic: Sliding window log
- High accuracy required, high traffic: Sliding window counter
- Protect downstream from any burst: Leaky bucket
- Simple use case, internal: Fixed window counter
The sliding window counter is the right default for most production APIs. It is accurate, memory-efficient, and implementable in a single Redis script.