Reading Postgres EXPLAIN ANALYZE: the guide I wish I had three years ago
I stared at EXPLAIN ANALYZE output for a year without really understanding it. I knew "Seq Scan bad, Index Scan good" but nothing else. Then a senior DBA walked me through a real slow query for 30 minutes and everything clicked. Here is what I learned — the three numbers that matter, the nodes to look for, and the patterns that signal specific performance problems.
How to run EXPLAIN ANALYZE
-- Basic: plan only (the query is not executed)
-- (use a concrete value here: plain EXPLAIN cannot bind $1-style parameters)
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- Full analysis: executes the query, shows actual vs estimated
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 42;
-- EXPLAIN ANALYZE really executes the statement, so for writes
-- wrap it in a transaction and roll back
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE orders SET status = 'completed' WHERE id = 42;
ROLLBACK;
The three numbers that matter
Seq Scan on orders
  (cost=0.00..45231.00 rows=892000 width=156)         <- estimated: cost (in Postgres cost units) and rows Postgres predicted
  (actual time=0.042..1823.291 rows=892000 loops=1)   <- actual: time in ms and rows actually returned
1. Actual vs estimated rows: when actual rows differ from estimated rows by 10x or more (in either direction), Postgres planned the query from bad row estimates, usually stale statistics. Fix: ANALYZE tablename; to refresh them.
2. actual time: the real wall-clock time in milliseconds. Times are cumulative (a parent node's time includes its children's), so look for the innermost node with the highest time: that's your bottleneck.
3. loops: If loops > 1, actual time is per-loop. Multiply: actual time × loops = total time for that node.
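To make the arithmetic above concrete, here is a small Python sketch (not a Postgres API, just the two calculations described in points 1 and 3, with numbers borrowed from the plans later in this post):

```python
def node_total_ms(per_loop_ms: float, loops: int) -> float:
    """actual time is reported per loop, so total = per-loop time x loops."""
    return per_loop_ms * loops

def estimate_error(estimated_rows: int, actual_rows: int) -> float:
    """How far off the planner's row estimate was, as a ratio >= 1,
    in whichever direction the mismatch goes."""
    hi, lo = max(estimated_rows, actual_rows), min(estimated_rows, actual_rows)
    return hi / max(lo, 1)

# An index scan at 1.2 ms per loop, run 10,000 times:
print(node_total_ms(1.2, 10000))   # 12000 ms, i.e. 12 seconds

# Planner guessed 10 rows, actually got 89,432: off by ~8900x, run ANALYZE
print(estimate_error(10, 89432))
```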
Reading a real slow query plan
Hash Join (cost=18234.56..89234.12 rows=456 width=200)
(actual time=1823.45..2134.23 rows=12 loops=1)
Hash Cond: (orders.user_id = users.id)
Buffers: shared hit=234 read=89432 <- IO: 89K pages read from disk
-> Seq Scan on orders <- PROBLEM: full table scan
(cost=0.00..54321.00 rows=892000 width=100)
(actual time=0.05..1234.56 rows=892000 loops=1)
Filter: (status = 'completed')
Rows Removed by Filter: 123456
-> Hash (cost=1234.56..1234.56 rows=10000 width=100)
(actual time=45.23..45.23 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 1234kB
-> Seq Scan on users
(cost=0.00..1234.56 rows=10000 width=100)
(actual time=0.05..34.23 rows=10000 loops=1)
What this tells me:
- The orders table scan is the bottleneck (1234ms of the 2134ms total)
- 892K rows come out of the scan but only 12 survive the join; nearly everything scanned was wasted work, which points at a missing index
- 89,432 disk pages read — IO-bound, not CPU-bound
- Fix:
CREATE INDEX ON orders (user_id) WHERE status = 'completed';
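The "IO-bound, not CPU-bound" call from the Buffers line can be made mechanical. A quick sketch (plain Python, not a Postgres API) using the shared hit/read counts from the plan above:

```python
def cache_hit_ratio(shared_hit: int, shared_read: int) -> float:
    """Fraction of buffer accesses served from shared_buffers rather than disk."""
    return shared_hit / (shared_hit + shared_read)

# From the plan above: Buffers: shared hit=234 read=89432
ratio = cache_hit_ratio(234, 89432)
print(f"{ratio:.2%} of pages came from cache")  # well under 1%: IO-bound
```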
The patterns that indicate specific problems
Pattern 1: Seq Scan + high rows removed by filter
Seq Scan on orders
Rows Removed by Filter: 891988 <- scanning 892K to get 12 rows
Fix: Add an index on the filtered column(s)
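To turn the filter-ratio eyeball check into a number, a tiny sketch (plain Python, figures taken from the pattern above; the ~99% threshold is my rule of thumb, not an official one):

```python
def filter_waste(rows_removed: int, rows_returned: int) -> float:
    """Fraction of scanned rows the filter discarded (0.0 to 1.0)."""
    scanned = rows_removed + rows_returned
    return rows_removed / scanned

# From Pattern 1: 891,988 rows removed by the filter, 12 returned
ratio = filter_waste(891988, 12)
print(f"{ratio:.5%} of scanned rows were discarded")
# Anything above roughly 99% suggests an index on the filtered column(s)
```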
Pattern 2: Massive rows estimate mismatch
(cost=... rows=10 ...) <- Postgres estimated 10 rows
(actual time=... rows=89432 ...) <- Actually got 89K rows
Fix: ANALYZE tablename; or, if the misestimate comes from correlated columns, build extended statistics with CREATE STATISTICS
Pattern 3: Nested Loop Join with many loops
Nested Loop
-> Seq Scan on users (rows=10000)
-> Index Scan on orders <- Runs 10000 times!
(actual time=0.05..1.2 loops=10000)
Total: 10000 * 1.2ms = 12 seconds
Fix: raise work_mem (or correct the row estimates with ANALYZE) so the planner can switch to a hash join; a nested loop only pays off when the outer side is small
Pattern 4: Hash Batches > 1 (spilling to disk)
Hash (Batches: 8 Memory Usage: 4096kB) <- Batches > 1 = spilling to disk
Fix: SET work_mem = '64MB'; per session (or SET LOCAL inside a transaction) just for the heavy query; work_mem is allocated per sort or hash operation, so raising it globally can exhaust memory
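As a rough sizing heuristic (my rule of thumb, not an official Postgres formula): a hash that split into N batches needed roughly N times the memory it was given, so size work_mem from the Batches and Memory Usage figures with some headroom:

```python
def suggested_work_mem_kb(batches: int, memory_usage_kb: int,
                          headroom: float = 2.0) -> int:
    """Heuristic: a hash split into N batches needed ~N x the memory it got;
    add headroom so it fits comfortably in a single batch."""
    return int(batches * memory_usage_kb * headroom)

# From Pattern 4: Batches: 8, Memory Usage: 4096kB
print(suggested_work_mem_kb(8, 4096))  # 65536 kB, i.e. 64MB
```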
Quick diagnostic checklist
- Find the innermost node with the highest actual time
- Check if it is a Seq Scan — if so, check the filter ratio
- Check if rows estimate matches actual rows — big mismatch = stale stats
- Check shared read in BUFFERS — high read count = IO bound (needs an index or more cache)
- Check Hash Batches — anything > 1 means work_mem is too low for this query
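The first checklist step can even be scripted. Below is a rough sketch (a regex over the common text-format plan lines; a hypothetical helper I wrote for illustration, not a real tool) that pulls per-node total times out of an EXPLAIN ANALYZE dump, using the hash-join plan from earlier:

```python
import re

# Matches lines like "...  (cost=...) (actual time=0.05..1234.56 rows=892000 loops=1)"
NODE = re.compile(
    r"(?:->\s*)?(\S.*?)\s+\(cost=.*actual time=[\d.]+\.\.([\d.]+) rows=\d+ loops=(\d+)\)"
)

def node_times(plan_text: str) -> list[tuple[str, float]]:
    """(node name, total ms) for each plan node.  actual time is per loop,
    so multiply by loops.  Caveat: a parent's time includes its children's,
    so scan the sorted list top-down for the deepest expensive node."""
    out = []
    for line in plan_text.splitlines():
        m = NODE.search(line)
        if m:
            out.append((m.group(1), float(m.group(2)) * int(m.group(3))))
    return out

plan = """\
Hash Join  (cost=18234.56..89234.12 rows=456 width=200) (actual time=1823.45..2134.23 rows=12 loops=1)
  ->  Seq Scan on orders  (cost=0.00..54321.00 rows=892000 width=100) (actual time=0.05..1234.56 rows=892000 loops=1)
  ->  Hash  (cost=1234.56..1234.56 rows=10000 width=100) (actual time=45.23..45.23 rows=10000 loops=1)
        ->  Seq Scan on users  (cost=0.00..1234.56 rows=10000 width=100) (actual time=0.05..34.23 rows=10000 loops=1)
"""

for name, ms in sorted(node_times(plan), key=lambda t: -t[1]):
    print(f"{ms:10.2f} ms  {name}")
```

Here the root Hash Join shows the largest time because it includes its children; the Seq Scan on orders right below it is the deepest big contributor, matching the walkthrough earlier.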
The online tool explain.dalibo.com renders EXPLAIN output as a visual tree with color-coding. Paste your output there and the bottleneck becomes visually obvious.