ArchitectureStaff

Staff Prep 06: Transactions & Concurrency — Isolation, Deadlocks & Race Conditions

April 4, 202610 min readPART 10 / 19

Back to Part 05: Scaling Writes. Concurrency is where systems fail silently. Two users click "buy" at the same instant. A counter gets decremented twice when it should only move once. A reservation gets double-booked. I've seen every one of these ship to production, usually from code that looked fine in review. Here's how Postgres isolation levels and locking actually prevent them.

The four isolation levels and what they prevent

SQL defines four isolation levels. Each prevents a different class of concurrency anomaly:

Dirty read. Reading uncommitted data from another transaction. Postgres never allows this, even at Read Uncommitted (the level basically doesn't exist here).
Non-repeatable read. Reading the same row twice in one transaction and getting different values because another transaction committed in between.
Phantom read. Running the same range query twice and getting different rows because somebody inserted or deleted inside that range.
Write skew. Two transactions each read overlapping data, make decisions based on it, and write non-overlapping data, violating an invariant. It's not in the SQL standard, but Postgres's Serializable level catches it.

sql

-- Set isolation level for a transaction
BEGIN ISOLATION LEVEL READ COMMITTED;     -- Postgres default
BEGIN ISOLATION LEVEL REPEATABLE READ;    -- Prevents non-repeatable reads and phantoms
BEGIN ISOLATION LEVEL SERIALIZABLE;       -- Prevents write skew; slowest

-- Or at session level
SET default_transaction_isolation = 'repeatable read';

Write skew: the subtle bug

Write skew is the hardest concurrency bug to spot, and the one most people have never heard of. The classic example is an on-call system where at least one doctor must always be on duty.

sql

-- Doctor A and Doctor B are both on call
-- Both want to go off-call simultaneously

-- Transaction A (Doctor A goes off-call):
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM doctors WHERE on_call = true;  -- returns 2
-- Thinks: "2 doctors on call, safe to remove myself"
UPDATE doctors SET on_call = false WHERE id = 1;
COMMIT;

-- Transaction B (Doctor B goes off-call), concurrent with A:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM doctors WHERE on_call = true;  -- also returns 2 (snapshot)
-- Thinks: "2 doctors on call, safe to remove myself"
UPDATE doctors SET on_call = false WHERE id = 2;
COMMIT;

-- Result: 0 doctors on call. Both transactions "succeeded". Invariant violated.

-- Fix: use SERIALIZABLE isolation (detects the conflict, one transaction will retry)
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call = true;
-- Postgres tracks the read set; if another serializable transaction modifies it,
-- one will get: ERROR: could not serialize access due to read/write dependencies

FOR UPDATE: explicit row locking

When you read a row and then update it, the gap between SELECT and UPDATE is a race condition waiting to happen. Another transaction can slip in and modify the row between your read and your write. FOR UPDATE locks the row at read time. This is the tool most engineers reach for too late, usually after a post-mortem.

sql

-- Race condition without locking
BEGIN;
SELECT balance FROM accounts WHERE id = 42;  -- reads 100
-- Another transaction runs: UPDATE accounts SET balance = 50 WHERE id = 42
UPDATE accounts SET balance = balance - 30 WHERE id = 42;  -- sets to 70, not 20!
COMMIT;

-- Safe: lock the row at read time
BEGIN;
SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;  -- acquires lock, reads 100
-- Other transaction's UPDATE will BLOCK here until we COMMIT
UPDATE accounts SET balance = balance - 30 WHERE id = 42;  -- correctly sets to 70
COMMIT;

-- FOR UPDATE SKIP LOCKED: skip rows already locked by other transactions
-- Useful for job queues where workers pick up work without blocking each other
SELECT * FROM jobs WHERE status = 'pending' ORDER BY created_at LIMIT 1
FOR UPDATE SKIP LOCKED;
-- Each worker gets a different job row, no lock contention

Deadlocks: how they happen and how to prevent them

A deadlock happens when two transactions each hold a lock the other needs. Postgres detects the cycle and kills one of them with an error. Your app has to retry. If you're not handling retries cleanly, you're already broken; you just haven't hit the right traffic pattern yet.

sql

-- Deadlock scenario
-- Transaction A locks account 1, then tries to lock account 2
-- Transaction B locks account 2, then tries to lock account 1

-- Transaction A:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- locks row 1
-- ... some work ...
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- waits for B to release lock 2

-- Transaction B (concurrent):
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;   -- locks row 2
-- ... some work ...
UPDATE accounts SET balance = balance + 50 WHERE id = 1;   -- waits for A to release lock 1
-- DEADLOCK DETECTED -- Postgres kills one of these transactions

-- Prevention: always acquire locks in a consistent order
-- If all transactions lock accounts in ascending id order, deadlocks are impossible
BEGIN;
-- Always lock lower id first
UPDATE accounts SET balance = balance - 100 WHERE id = LEAST(1, 2);
UPDATE accounts SET balance = balance + 100 WHERE id = GREATEST(1, 2);

Advisory locks: application-level coordination

Advisory locks let your application use Postgres as a distributed mutex without locking actual rows. I use these for making sure only one background worker runs a specific job, or for coordinating cron jobs across multiple app servers. They're underrated.

sql

-- Session-level advisory lock (held until released or session ends)
SELECT pg_try_advisory_lock(12345);  -- returns true if acquired, false if already held

-- Safe pattern: try to acquire, do work, release
DO $$
BEGIN
    IF pg_try_advisory_lock(42) THEN
        -- Do exclusive work here
        RAISE NOTICE 'Lock acquired, running exclusive job';
        PERFORM pg_advisory_unlock(42);
    ELSE
        RAISE NOTICE 'Another worker has the lock, skipping';
    END IF;
END;
$$;

-- Transaction-level advisory lock (released automatically at COMMIT/ROLLBACK)
SELECT pg_try_advisory_xact_lock(42);
-- Safer: no need to remember to release

Optimistic vs pessimistic concurrency

Pessimistic locking with FOR UPDATE grabs the row immediately and makes other transactions wait. It's the right call when contention is high and the critical section is short.

Optimistic concurrency reads without locking, then checks at write time that the row hasn't changed. If it has, you retry. I reach for this when conflicts are rare and the critical section is long, like a document editor where a user might sit on a form for 30 seconds. Holding a row lock that long is a bad idea.

sql

-- Optimistic concurrency with a version column
CREATE TABLE documents (
    id          BIGSERIAL PRIMARY KEY,
    content     TEXT,
    version     INT NOT NULL DEFAULT 1,
    updated_at  TIMESTAMPTZ DEFAULT NOW()
);

-- Read the document and its version
SELECT id, content, version FROM documents WHERE id = 5;
-- User edits content for 30 seconds...

-- Write: only update if version has not changed
UPDATE documents
SET content = 'new content', version = version + 1
WHERE id = 5 AND version = 1;  -- optimistic check

-- If 0 rows updated: someone else modified it, retry or show conflict to user
GET DIAGNOSTICS rows_updated = ROW_COUNT;
IF rows_updated = 0 THEN
    RAISE EXCEPTION 'Conflict: document was modified by another user';
END IF;

Quiz: test your understanding

Before moving on, answer these in your head (or out loud):

What is write skew and why does Repeatable Read not prevent it? What isolation level does?
You have a ticket reservation system. Two users try to buy the last ticket simultaneously. Write the safe SQL that prevents double-booking.
Walk through exactly how a deadlock occurs with two transactions each updating two accounts. What is the prevention strategy?
When would you choose optimistic concurrency over FOR UPDATE? When would you choose pessimistic?
What does FOR UPDATE SKIP LOCKED do and why is it useful for job queue implementations?

Next up: Part 07: Zero-Downtime Migrations. Expand-contract, concurrent index builds, and how to add a NOT NULL column to a 500M-row table without taking the site down.

← PREV

Postgres Internals 104: Indexes