Rate limiting algorithms — token bucket vs leaky bucket vs sliding window
Every rate limiter answers the same question: is this request allowed, or should it be rejected? The answer depends on the algorithm. Choose the wrong one and you get bursty traffic that overwhelms your backend, or smooth traffic that causes user-visible throttling at the exact wrong moment. Here are the five main algorithms, how they work, and when to use each.
Fixed window counter
The simplest algorithm: count requests in fixed time windows. Allow up to N requests per window.
# Fixed window: allow 100 req/minute
import time
import redis
r = redis.Redis()
def is_allowed_fixed_window(user_id: str, limit: int = 100) -> bool:
    window = int(time.time() // 60)  # current minute
    key = f"rate:{user_id}:{window}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 60)  # expire after 1 minute
    return count <= limit
Problem: The boundary attack. A user can send 100 requests at 00:59 and 100 more at 01:00 — 200 requests in 2 seconds, even though the limit is 100/minute.
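The attack works because the two bursts land in different window keys, and each key starts with a fresh counter. A quick sketch of the key scheme from the function above (the helper name is illustrative):

```python
def window_key(user_id: str, t: float) -> str:
    """Window key for a 60-second fixed window, given epoch time t."""
    return f"rate:{user_id}:{int(t // 60)}"

# A burst at t=59s and another at t=61s fall into different windows,
# so each burst gets its own fresh budget of 100 requests.
assert window_key("alice", 59.0) == "rate:alice:0"
assert window_key("alice", 61.0) == "rate:alice:1"
```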
Use when: Simplicity matters more than precision. Internal dashboards, admin APIs.
Sliding window log
Stores the timestamp of every request. Counts requests in the last N seconds — a true sliding window.
# Sliding window log: allow 100 req/minute
def is_allowed_sliding_log(user_id: str, limit: int = 100, window: int = 60) -> bool:
    key = f"rate:log:{user_id}"
    now = time.time()
    window_start = now - window
    pipe = r.pipeline()
    # Remove timestamps older than the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count requests still inside the window
    pipe.zcard(key)
    # Add the current timestamp. Note: this runs even when the request
    # is ultimately rejected, so rejected requests also count against
    # the limit (a common, deliberately strict choice).
    pipe.zadd(key, {str(now): now})
    # Set expiry so idle users don't leak keys
    pipe.expire(key, window)
    _, count, _, _ = pipe.execute()
    return count < limit
Pros: Perfectly accurate. No boundary attack vulnerability.
Cons: High memory usage, since every request timestamp is stored. At 1000 req/min per user, that's 1000 timestamps held in Redis per user.
Use when: Accuracy is critical and traffic volume is low. Financial APIs, payment processing.
Sliding window counter (hybrid)
Combines the memory efficiency of fixed windows with the accuracy of sliding windows. Approximates the sliding window count using the current window count plus a weighted portion of the previous window count.
# Sliding window counter approximation
def is_allowed_sliding_counter(user_id: str, limit: int = 100, window: int = 60) -> bool:
    now = time.time()
    current_window = int(now // window)
    prev_window = current_window - 1
    current_key = f"rate:{user_id}:{current_window}"
    prev_key = f"rate:{user_id}:{prev_window}"
    pipe = r.pipeline()
    pipe.get(current_key)
    pipe.get(prev_key)
    current_count, prev_count = pipe.execute()
    current_count = int(current_count or 0)
    prev_count = int(prev_count or 0)
    # How far into the current window are we? (0.0 to 1.0)
    elapsed_fraction = (now % window) / window
    # Approximate requests in the last 'window' seconds: the previous
    # window contributes its count weighted by how much of it still
    # overlaps the sliding window
    approximate_count = (prev_count * (1 - elapsed_fraction)) + current_count
    if approximate_count >= limit:
        return False
    # Increment the current window. Check-then-increment is not atomic,
    # so under heavy concurrency a few extra requests can slip through;
    # for strict limits, move the check into a Lua script.
    pipe = r.pipeline()
    pipe.incr(current_key)
    pipe.expire(current_key, window * 2)
    pipe.execute()
    return True
Pros: Memory-efficient (only 2 counters per user) and reasonably accurate: in Cloudflare's analysis of 400 million requests, only 0.003% were incorrectly allowed or throttled.
Cons: An approximation. It assumes requests in the previous window were evenly distributed, so it slightly over- or under-counts bursty traffic.
Use when: High traffic APIs where sliding log is too memory-intensive. Most web APIs.
Token bucket
Each user has a bucket that fills with tokens at a fixed rate. Each request consumes one token. Allows bursts up to the bucket capacity.
# Token bucket: 10 req/sec, burst up to 50
def is_allowed_token_bucket(
    user_id: str,
    rate: float = 10.0,   # tokens per second
    capacity: int = 50,   # max burst
) -> bool:
    key = f"rate:bucket:{user_id}"
    now = time.time()
    # Lua script for atomic check-and-update
    lua_script = """
    local key = KEYS[1]
    local rate = tonumber(ARGV[1])
    local capacity = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local data = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(data[1]) or capacity
    local last_refill = tonumber(data[2]) or now
    -- Refill tokens based on elapsed time, capped at capacity
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * rate)
    if tokens < 1 then
        return 0 -- rejected
    end
    tokens = tokens - 1
    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)
    return 1 -- allowed
    """
    result = r.eval(lua_script, 1, key, rate, capacity, now)
    return bool(result)
Pros: Allows controlled bursts. Users can accumulate tokens when idle and use them in a burst — natural for human usage patterns.
Cons: Two parameters to tune (rate + capacity). Complex to implement atomically.
Use when: You want to allow reasonable bursts. Mobile app APIs, where users open the app and make several calls quickly.
Leaky bucket
Requests enter a queue (the bucket). They are processed at a fixed rate regardless of input rate. Excess requests that overflow the queue are rejected.
Difference from token bucket: Token bucket allows bursts up to the capacity. Leaky bucket processes at a constant rate regardless — output is always smooth.
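In practice, leaky buckets are usually implemented not as a literal queue but as a "water level" counter that drains at the fixed rate (nginx's limit_req module takes this approach). A minimal in-process sketch, with illustrative names, assuming single-threaded access:

```python
import time

class LeakyBucket:
    """Counter-based leaky bucket: 'water' drains at `rate` per second."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # drain rate (requests per second)
        self.capacity = capacity  # maximum queue depth
        self.water = 0.0          # current queue depth
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket for the time elapsed since the last check
        self.water = max(0.0, self.water - (now - self.last_leak) * self.rate)
        self.last_leak = now
        if self.water + 1.0 > self.capacity:
            return False  # bucket would overflow: reject
        self.water += 1.0
        return True
```

With rate=5 and capacity=10, a burst of 15 near-simultaneous requests admits the first 10 and rejects the other 5; the admitted work then flows downstream at 5/sec no matter how fast it arrived.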
Use when: You need to protect a downstream service that cannot handle bursts. Database write queues, email sending queues.
Decision guide
- User-facing API, allow occasional bursts: Token bucket
- High accuracy required, low traffic: Sliding window log
- High accuracy required, high traffic: Sliding window counter
- Protect downstream from any burst: Leaky bucket
- Simple use case, internal: Fixed window counter
The sliding window counter is the right default for most production APIs. It is accurate, memory-efficient, and implementable in a single Redis script.