Staff Prep 05: Scaling Postgres Writes — Batching, UPSERT & Partitioning
Back to Part 04: Scaling Reads. Reads scale horizontally with replicas. Writes do not — they all go to the primary. So when your primary is write-saturated, you have fewer levers: batch more, write less, and partition to distribute I/O. This post covers every write-side technique worth knowing at the Staff level.
The cost of single-row inserts
Every INSERT in Postgres involves: acquiring a row lock, writing to the WAL, writing to the heap, updating all indexes, and flushing WAL to disk (on commit). For a single row insert followed by an immediate commit, you pay that flush cost — typically 1–5ms per commit on a fast SSD.
At 1,000 events/second, single-row inserts mean 1,000 WAL flushes/second. At roughly 1 ms per flush, that is the ceiling — at 1,001 events/second, you start queueing.
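The arithmetic above is worth internalizing. A back-of-envelope sketch (the flush latency is an assumed figure, not a measurement):

```python
# Commit throughput is bounded by WAL flush latency; batching amortizes
# the flush across many rows per commit.
def max_rows_per_second(flush_ms: float, rows_per_commit: int) -> int:
    commits_per_second = 1000 / flush_ms
    return int(commits_per_second * rows_per_commit)

# Single-row commits on a disk with a 2 ms flush:
print(max_rows_per_second(2.0, 1))    # 500
# Batching 100 rows per commit on the same disk:
print(max_rows_per_second(2.0, 100))  # 50000
```

The same hardware goes from 500 rows/second to 50,000 purely by changing how often you pay the flush.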
Batch inserts: the correct implementation
import asyncpg

# Slow: 1000 round trips, 1000 commits
async def insert_events_slow(conn: asyncpg.Connection, events: list[dict]):
    for event in events:
        await conn.execute(
            "INSERT INTO events (user_id, type, payload) VALUES ($1, $2, $3)",
            event["user_id"], event["type"], event["payload"]
        )
# Fast: one batched round trip, one implicit transaction
async def insert_events_fast(conn: asyncpg.Connection, events: list[dict]):
    # executemany pipelines a single prepared statement over many parameter sets
    await conn.executemany(
        "INSERT INTO events (user_id, type, payload) VALUES ($1, $2, $3)",
        [(e["user_id"], e["type"], e["payload"]) for e in events]
    )
# Fastest: COPY protocol (asyncpg uses the binary format)
async def insert_events_copy(conn: asyncpg.Connection, events: list[dict]):
    await conn.copy_records_to_table(
        "events",
        records=[(e["user_id"], e["type"], e["payload"]) for e in events],
        columns=["user_id", "type", "payload"]
    )
# COPY streams rows past per-statement parse/plan overhead, but it still
# fires triggers and checks foreign keys — it is not a constraint bypass
Benchmark results on a typical cloud Postgres (db.t3.medium):
- Single-row inserts: ~500 rows/second
- executemany with 1000 rows: ~50,000 rows/second
- COPY with 1000 rows: ~200,000 rows/second
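For very large event lists, it also helps to cap how many records a single executemany or COPY call carries, so memory use and transaction size stay bounded. A minimal sketch with a hypothetical chunked helper:

```python
def chunked(items: list, size: int):
    """Yield successive fixed-size slices of items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Usage sketch (insert_events_copy as defined above):
#   for batch in chunked(events, 10_000):
#       await insert_events_copy(conn, batch)
print(list(chunked([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```

The right chunk size is workload-dependent; the point is only that one call should never be handed an unbounded list.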
UPSERT semantics: INSERT ON CONFLICT
UPSERT (INSERT with ON CONFLICT) is deceptively subtle. The wrong implementation causes silent lost updates and phantom conflicts.
-- Table with a unique constraint on (user_id, date)
CREATE TABLE daily_stats (
    user_id BIGINT NOT NULL,
    date DATE NOT NULL,
    event_count INT NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, date)
);
-- Correct UPSERT: increment counter atomically
INSERT INTO daily_stats (user_id, date, event_count)
VALUES (42, '2026-04-04', 1)
ON CONFLICT (user_id, date)
DO UPDATE SET
event_count = daily_stats.event_count + EXCLUDED.event_count;
-- EXCLUDED refers to the row that was proposed but conflicted
-- Wrong: SET event_count = EXCLUDED.event_count
-- This would overwrite the existing count with the new value
-- instead of incrementing it — silent data loss
-- DO NOTHING variant: idempotent insert
INSERT INTO processed_events (event_id, processed_at)
VALUES ('abc123', NOW())
ON CONFLICT (event_id) DO NOTHING;
-- Safe to call multiple times — only inserts once
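One more batching caveat: a single INSERT ... ON CONFLICT DO UPDATE statement fails with "cannot affect row a second time" if two rows in the batch hit the same key. Pre-aggregate in the application first — a sketch assuming a simplified event shape with user_id and date fields:

```python
from collections import Counter

def aggregate_daily_counts(events: list[dict]) -> list[tuple]:
    # Collapse duplicates so each (user_id, date) appears at most once
    # per batch; Postgres rejects a batched UPSERT that would update
    # the same row twice in one statement.
    counts = Counter((e["user_id"], e["date"]) for e in events)
    return sorted((uid, day, n) for (uid, day), n in counts.items())

rows = aggregate_daily_counts([
    {"user_id": 42, "date": "2026-04-04"},
    {"user_id": 42, "date": "2026-04-04"},
    {"user_id": 7,  "date": "2026-04-04"},
])
print(rows)  # [(7, '2026-04-04', 1), (42, '2026-04-04', 2)]
```

Each tuple then becomes one parameter set for the increment UPSERT shown above.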
Deferred constraints and bulk writes
When bulk-inserting rows that reference each other (parent-child in the same batch), foreign key constraints block you. Deferred constraints let you insert the batch first and check constraints at commit time.
-- Define the FK as deferrable
ALTER TABLE order_items
ADD CONSTRAINT fk_order_items_order
FOREIGN KEY (order_id) REFERENCES orders(id)
DEFERRABLE INITIALLY DEFERRED;
-- Now you can insert items before the parent order in the same transaction
BEGIN;
SET CONSTRAINTS fk_order_items_order DEFERRED;
INSERT INTO order_items (order_id, product_id, qty) VALUES (999, 1, 2); -- order 999 doesn't exist yet
INSERT INTO orders (id, user_id, total) VALUES (999, 42, 29.99); -- now it does
COMMIT; -- FK check happens here — passes
Write buffering with background workers
For high-throughput event streams (clickstream, metrics, audit logs), writing synchronously to Postgres on every request is unnecessary. Buffer writes in memory or Redis, flush in batches.
import asyncio
import json

import redis.asyncio as redis

r = redis.Redis(host="redis", port=6379)

# API: push event to Redis list (non-blocking, sub-millisecond)
async def record_event(user_id: int, event_type: str, payload: dict):
    event = {"user_id": user_id, "type": event_type, "payload": payload}
    await r.rpush("events:buffer", json.dumps(event))

# Background worker: flush buffer to Postgres every 5 seconds
async def flush_events_worker(db_pool):
    while True:
        await asyncio.sleep(5)
        # Atomically pop up to 1000 events (pipeline wraps both commands
        # in MULTI/EXEC, so no event is read twice or skipped)
        pipe = r.pipeline()
        pipe.lrange("events:buffer", 0, 999)
        pipe.ltrim("events:buffer", 1000, -1)
        results = await pipe.execute()
        events = [json.loads(e) for e in results[0]]
        if events:
            async with db_pool.acquire() as conn:
                await conn.copy_records_to_table(
                    "events",
                    records=[(e["user_id"], e["type"], json.dumps(e["payload"]))
                             for e in events],
                    columns=["user_id", "type", "payload"]
                )
Trade-off: if the process crashes between the Redis push and the Postgres flush, those buffered events are lost. Acceptable for analytics events; unacceptable for order creation. Know your data loss tolerance before applying this pattern.
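The buffering mechanics themselves are independent of Redis. A stripped-down in-process sketch — a hypothetical WriteBuffer class with only the size-triggered flush; a production version would add the time-based flush and concurrency control:

```python
class WriteBuffer:
    """Accumulate events; hand full batches to a flush callback."""

    def __init__(self, flush_fn, max_size: int = 1000):
        self._flush_fn = flush_fn
        self._max_size = max_size
        self._buf: list = []

    def add(self, event) -> None:
        self._buf.append(event)
        if len(self._buf) >= self._max_size:
            self.flush()

    def flush(self) -> None:
        if self._buf:
            batch, self._buf = self._buf, []
            self._flush_fn(batch)  # e.g. a COPY into Postgres

batches = []
buf = WriteBuffer(batches.append, max_size=3)
for i in range(7):
    buf.add(i)
buf.flush()  # drain the remainder
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The same crash-safety caveat applies: anything sitting in _buf when the process dies is gone.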
Partitioning for write distribution
Table partitioning does not directly distribute writes across servers — it distributes them across physical storage segments. The benefit: inserts go into the current partition only, autovacuum works on smaller tables, and old partitions can be dropped (not deleted) — a near-instant operation.
-- Range partition by month (common for time-series data)
CREATE TABLE events (
    id BIGSERIAL,
    user_id BIGINT NOT NULL,
    type TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE events_2026_04 PARTITION OF events
FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');
CREATE TABLE events_2026_05 PARTITION OF events
FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');
-- Drop old data in O(1) — no DELETE scan needed
DROP TABLE events_2024_01; -- instant, regardless of row count
-- Automate with pg_partman extension or a cron job
-- Attach a new partition before the current one fills up
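The cron-job route boils down to a date computation plus a DDL string. A sketch with a hypothetical partition_ddl helper, reusing the events table and monthly naming scheme from above:

```python
from datetime import date

def partition_ddl(month_start: date) -> str:
    # The first day of the following month is the exclusive upper bound.
    if month_start.month == 12:
        upper = date(month_start.year + 1, 1, 1)
    else:
        upper = date(month_start.year, month_start.month + 1, 1)
    name = f"events_{month_start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF events "
        f"FOR VALUES FROM ('{month_start}') TO ('{upper}')"
    )

print(partition_ddl(date(2026, 6, 1)))
# CREATE TABLE IF NOT EXISTS events_2026_06 PARTITION OF events
#   FOR VALUES FROM ('2026-06-01') TO ('2026-07-01')
```

Run it from a scheduled job a month or two ahead of time; IF NOT EXISTS makes re-runs harmless. pg_partman does the same bookkeeping with retention policies included.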
Quiz: test your understanding
Before moving on, answer these in your head (or out loud):
- Why are single-row inserts with individual commits expensive in Postgres? What is the specific bottleneck?
- In ON CONFLICT DO UPDATE SET count = EXCLUDED.count, what does EXCLUDED refer to? What is wrong with this approach for incrementing a counter?
- You need to insert 10,000 rows per second into an events table. Walk through your options from slowest to fastest, with trade-offs for each.
- What is the difference between deleting 5 million old events with DELETE WHERE created_at < '2024-01-01' vs dropping a partition? Why does one cause table bloat and the other does not?
- Your write-buffering worker crashes. The Redis buffer has 50,000 events. What happens? How would you make this crash-safe?
Next up — Part 06: Transactions & Concurrency. Isolation levels, deadlocks, FOR UPDATE, and the race conditions that haunt production systems.