Staff Prep 02: Reading EXPLAIN ANALYZE Like a Query Planner

April 4, 2026 · 10 min read · Part 02 / 19

Back to Part 01: Schema Design. Today we go inside the query planner. EXPLAIN ANALYZE is the most powerful diagnostic tool in your Postgres arsenal. Most engineers know it exists. Staff engineers can read it fluently — they spot sequential scans, bad estimates, and nested loop anti-patterns without hesitation. That is what we are building today.

EXPLAIN vs EXPLAIN ANALYZE: the difference

EXPLAIN shows the plan the planner intends to execute — estimated costs, estimated row counts, node types. It does not actually run the query. Use it when the query is too expensive to run just for diagnostics.

EXPLAIN ANALYZE actually executes the query and shows both the plan and the reality — actual rows returned, actual time spent per node. The gap between estimated and actual is where performance bugs hide.

```sql
-- Basic usage
EXPLAIN SELECT * FROM orders WHERE user_id = 42;

-- With actual execution stats
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;

-- Full detail: buffers, WAL, planner settings
-- (WAL requires Postgres 13+, SETTINGS requires 12+)
EXPLAIN (ANALYZE, BUFFERS, WAL, SETTINGS, FORMAT TEXT)
SELECT o.id, o.amount, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.created_at > NOW() - INTERVAL '30 days';
```

Reading the output: top-down, inside-out

EXPLAIN output is a tree. The deepest (most indented) nodes run first. Results bubble up. You read it bottom-up for execution order, but you read it top-down for the overall plan structure. Here is a real example:

```text
Hash Join  (cost=2841.00..9428.50 rows=15000 width=72)
            (actual time=45.231..180.447 rows=14823 loops=1)
  Hash Cond: (orders.user_id = users.id)
  ->  Seq Scan on orders  (cost=0.00..5500.00 rows=180000 width=40)
                           (actual time=0.018..55.432 rows=180000 loops=1)
        Filter: (created_at > (now() - '30 days'::interval))
        Rows Removed by Filter: 20000
  ->  Hash  (cost=1500.00..1500.00 rows=50000 width=32)
             (actual time=30.124..30.124 rows=50000 loops=1)
        Buckets: 65536  Batches: 1  Memory Usage: 3200kB
        ->  Seq Scan on users  (cost=0.00..1500.00 rows=50000 width=32)
                                (actual time=0.009..18.203 rows=50000 loops=1)

Planning Time: 1.234 ms
Execution Time: 182.891 ms
```

Key things to read here:

  • cost=startup..total — startup is the estimated cost incurred before the first row can be returned; total is the cost to return all rows. Units are arbitrary planner cost units, not milliseconds.
  • rows= estimated vs actual — here rows=15000 estimated against rows=14823 actual. Close. When these diverge by 10× or more, the planner is working from bad statistics.
  • Seq Scan — full table scan. Not always bad: on a small table, a seq scan beats an index scan. On a 10M-row table with a 0.1% selectivity filter, it is a problem.
  • Hash Join vs Nested Loop vs Merge Join — each is optimal in a different scenario. Nested Loop is good for small inner sets; Hash Join for large unsorted sets; Merge Join for pre-sorted large sets.
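You can see these strategies side by side by temporarily forbidding one and re-planning the same query. A minimal sketch, reusing the orders/users schema from above (enable_hashjoin is a real planner setting; which fallback plan you get depends on your data and indexes):

```sql
-- Plan with the planner's default choice (likely Hash Join here)
EXPLAIN SELECT o.id, u.name
FROM orders o JOIN users u ON u.id = o.user_id;

-- Forbid hash joins for this session and compare
SET enable_hashjoin = off;
EXPLAIN SELECT o.id, u.name
FROM orders o JOIN users u ON u.id = o.user_id;
-- The planner falls back to Merge Join or Nested Loop;
-- compare the estimated costs to see why it preferred the hash.

RESET enable_hashjoin;
```

Comparing the total cost of the default plan against the forced alternative is the quickest way to build intuition for why the planner chose what it chose.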

The four node types you must know

Sequential scan (seq scan)

Reads every row in the table. Expected on small tables or when selectivity is low (fetching >10–20% of rows). A red flag when fetching a tiny fraction of a large table.
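When you suspect a seq scan is the wrong choice, ask the planner what the alternative would have cost. A sketch, assuming an index on user_id exists (note that enable_seqscan does not truly disable seq scans; it inflates their cost so another path wins whenever one is available):

```sql
SET enable_seqscan = off;
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- If the index plan's estimated cost is lower than the original
-- seq scan's, the planner was likely misled by stale statistics.
RESET enable_seqscan;
```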

Index scan

Uses a B-tree index to find rows. Each matching row requires a heap fetch (the actual table row). Fast for high-selectivity queries. If you are fetching many rows, each heap fetch is a random I/O — can be slower than a seq scan past ~5–10% of the table.

Index only scan

Satisfied entirely from the index — no heap fetch needed. Possible only when every column the query references is in the index (a covering index) and the pages involved are marked all-visible in the visibility map; otherwise Postgres must still visit the heap to check row visibility, which is why freshly vacuumed tables benefit most. The fastest scan type for read-heavy queries.

```sql
-- This query can use an Index Only Scan if the index covers all columns
CREATE INDEX idx_orders_user_amount
    ON orders (user_id, amount)
    INCLUDE (created_at);  -- covering: no heap fetch needed

EXPLAIN SELECT user_id, amount, created_at
FROM orders
WHERE user_id = 42;
-- -> Index Only Scan using idx_orders_user_amount
```

Bitmap heap scan

Used when many rows match the index condition. Postgres first builds a bitmap of matching heap pages (Bitmap Index Scan), then fetches those pages in order (Bitmap Heap Scan). Reduces random I/O by sorting the page fetches. Common for medium-selectivity queries.
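A typical trigger is an OR across two indexed columns, which Postgres answers by combining two bitmaps with a BitmapOr. A sketch, assuming separate indexes on status and user_id exist (the index names here are hypothetical):

```sql
EXPLAIN SELECT * FROM orders
WHERE status = 'pending' OR user_id = 42;

-- Typical plan shape:
-- Bitmap Heap Scan on orders
--   Recheck Cond: ((status = 'pending') OR (user_id = 42))
--   ->  BitmapOr
--         ->  Bitmap Index Scan on idx_orders_status
--         ->  Bitmap Index Scan on idx_orders_user_id
```

The Recheck Cond line is part of the mechanism: the bitmap may degrade to page granularity under memory pressure, so the condition is re-applied to each fetched row.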

Diagnosing bad row estimates

The planner uses table statistics (pg_statistic) to estimate row counts. Stale stats lead to bad estimates, which lead to wrong join strategies and wrong index choices.

```sql
-- Check when stats were last updated
SELECT schemaname, relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';   -- pg_stat_user_tables uses relname, not tablename

-- Force a stats update if stale
ANALYZE orders;

-- Increase the stats target for a column with a skewed or high-cardinality distribution
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;
```

The default statistics target is 100. For columns used in WHERE clauses with skewed or high-cardinality distributions (like status or category_id), bumping it to 500 gives the planner larger most-common-value lists and finer histograms, and therefore more accurate estimates — at the price of slightly slower ANALYZE and planning.
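To see what the planner actually knows about a column, query the pg_stats view directly:

```sql
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';
-- n_distinct > 0 is an absolute count; n_distinct < 0 is a negated
-- fraction of the row count (-1 means every value is distinct).
-- The MCV list length scales with the statistics target.
```

If a value you filter on frequently is missing from most_common_vals, the planner is estimating it from the histogram alone, which is a common source of bad row counts.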

Reading actual vs estimated: spotting the problem

```text
-- Bad: estimates wildly off
Nested Loop  (cost=0.00..5.00 rows=1 width=40)
              (actual time=0.012..8450.234 rows=50000 loops=1)
  -- Planner thought 1 row, got 50,000
  -- Chose Nested Loop (good for small sets), terrible for 50k rows
  -- Should have used Hash Join instead

-- Root cause: missing statistics on a join column, or
-- skewed distribution (99% of rows have user_id = NULL)
```

When estimated rows are 1 and actual rows are 50,000, the planner chose Nested Loop (optimal for tiny sets) but got hammered with a large set. The fix: update statistics, add a partial index, or, as a last resort, steer the planner at session level with SET enable_nestloop = off (Postgres has no true query hints).
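If you do reach for enable_nestloop = off, scope it to a single transaction with SET LOCAL so it cannot leak into other queries on the same connection. A sketch, reusing the orders/users schema from earlier:

```sql
BEGIN;
SET LOCAL enable_nestloop = off;   -- reverts automatically at COMMIT or ROLLBACK
EXPLAIN ANALYZE
SELECT o.id, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending';
ROLLBACK;
```

This keeps the diagnostic blast radius to one transaction, which matters when the session belongs to a connection pool.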

The BUFFERS option: finding I/O bottlenecks

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = 42;

-- Output includes:
-- Buffers: shared hit=245 read=1820
-- hit = read from shared_buffers (memory) — effectively free
-- read = read from disk (or the OS page cache) — expensive
-- High read count = your working set exceeds shared_buffers
```

If shared hit is low and read is high, you have two options: increase shared_buffers (RAM permitting) or reduce the data scanned (better indexes).
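To find which tables suffer most, the same hit/read split is available cumulatively per table in pg_statio_user_tables. A sketch:

```sql
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(heap_blks_hit::numeric
             / NULLIF(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;
-- A hit_ratio well below ~0.99 on a hot table suggests its working set
-- does not fit in shared_buffers.
```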

Practical workflow: optimising a slow query

```sql
-- Step 1: Run EXPLAIN ANALYZE with BUFFERS on the slow query
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.amount, u.name, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 50;

-- Step 2: Look for:
-- a) Seq Scans on large tables
-- b) Estimated rows << actual rows
-- c) High Buffers: read count
-- d) Nested Loop with large actual row counts

-- Step 3: Create targeted indexes
CREATE INDEX CONCURRENTLY idx_orders_pending_created
    ON orders (created_at DESC)
    WHERE status = 'pending';   -- partial index: the predicate pins status,
                                -- so created_at alone is the key

-- Step 4: Re-run EXPLAIN ANALYZE and verify the plan changed
```
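There is an implicit step 5: after the index has been live under a representative workload, confirm it is actually being scanned, and that older indexes have not become dead weight. pg_stat_user_indexes tracks this per index:

```sql
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan;
-- idx_scan = 0 after a representative period means the index costs
-- write overhead on every INSERT/UPDATE for no read benefit,
-- making it a candidate for DROP INDEX.
```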

Quiz: test your understanding

Before moving on, answer these in your head (or out loud):

  1. What is the difference between EXPLAIN and EXPLAIN ANALYZE? When would you use one over the other?
  2. You see Seq Scan on orders (rows=1000000) in a query that filters by user_id = 42. What could cause Postgres to choose a seq scan over an index scan?
  3. The planner estimates 5 rows but actually returns 80,000. What are the likely causes and how do you fix them?
  4. What is a Bitmap Heap Scan and when does Postgres prefer it over a plain Index Scan?
  5. What does Buffers: shared hit=10 read=5000 tell you about your database's memory configuration?

Next up — Part 03: Indexing Strategy. B-tree internals, composite index ordering, partial indexes, and when NOT to index.
