Staff Prep 02: Reading EXPLAIN ANALYZE Like a Query Planner

April 4, 2026 · 10 min read · Part 02 / 19

Back to Part 01: Schema Design. Today we go inside the query planner. EXPLAIN ANALYZE is the most powerful diagnostic tool in your Postgres arsenal. Most engineers know it exists. Staff engineers can read it fluently — they spot sequential scans, bad estimates, and nested loop anti-patterns without hesitation. That is what we are building today.

EXPLAIN vs EXPLAIN ANALYZE: the difference

EXPLAIN shows the plan the planner intends to execute — estimated costs, estimated row counts, node types. It does not actually run the query. Use it when the query is too expensive to run just for diagnostics.

EXPLAIN ANALYZE actually executes the query and shows both the plan and the reality — actual rows returned, actual time spent per node. The gap between estimated and actual is where performance bugs hide.

```sql
-- Basic usage
EXPLAIN SELECT * FROM orders WHERE user_id = 42;

-- With actual execution stats
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;

-- Full detail: buffers, WAL, planner settings
-- (WAL requires Postgres 13+, SETTINGS requires 12+)
EXPLAIN (ANALYZE, BUFFERS, WAL, SETTINGS, FORMAT TEXT)
SELECT o.id, o.amount, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.created_at > NOW() - INTERVAL '30 days';
```

Reading the output: top-down, inside-out

EXPLAIN output is a tree. The deepest (most indented) nodes run first. Results bubble up. You read it bottom-up for execution order, but you read it top-down for the overall plan structure. Here is a real example:

```text
Hash Join  (cost=2841.00..9428.50 rows=15000 width=72)
            (actual time=45.231..180.447 rows=14823 loops=1)
  Hash Cond: (orders.user_id = users.id)
  ->  Seq Scan on orders  (cost=0.00..5500.00 rows=180000 width=40)
                           (actual time=0.018..55.432 rows=180000 loops=1)
        Filter: (created_at > (now() - '30 days'::interval))
        Rows Removed by Filter: 20000
  ->  Hash  (cost=1500.00..1500.00 rows=50000 width=32)
             (actual time=30.124..30.124 rows=50000 loops=1)
        Buckets: 65536  Batches: 1  Memory Usage: 3200kB
        ->  Seq Scan on users  (cost=0.00..1500.00 rows=50000 width=32)
                                (actual time=0.009..18.203 rows=50000 loops=1)

Planning Time: 1.234 ms
Execution Time: 182.891 ms
```

Key things to read here:

  • cost=startup..total — startup is the estimated cost incurred before the first row can be returned; total is the cost to return all rows. Units are arbitrary planner cost units, not milliseconds.
  • rows= estimated vs actual — here rows=15000 estimated against rows=14823 actual. Close. When these diverge by 10× or more, the planner is working from bad statistics.
  • Seq Scan — full table scan. Not always bad: on a small table, a seq scan beats an index scan. On a 10M-row table with a 0.1% selectivity filter, it is a problem.
  • Hash Join vs Nested Loop vs Merge Join — each is optimal in a different scenario. Nested Loop is good for small inner sets; Hash Join for large unsorted sets; Merge Join for pre-sorted large sets.
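You can see these strategies side by side by temporarily forbidding one and re-planning the same query. A minimal sketch, reusing the orders/users schema from above (enable_hashjoin is a real planner setting; which fallback plan you get depends on your data and indexes):

```sql
-- Plan with the planner's default choice (likely Hash Join here)
EXPLAIN SELECT o.id, u.name
FROM orders o JOIN users u ON u.id = o.user_id;

-- Forbid hash joins for this session and compare
SET enable_hashjoin = off;
EXPLAIN SELECT o.id, u.name
FROM orders o JOIN users u ON u.id = o.user_id;
-- The planner falls back to Merge Join or Nested Loop;
-- compare the estimated costs to see why it preferred the hash.

RESET enable_hashjoin;
```

Comparing the total cost of the default plan against the forced alternative is the quickest way to build intuition for why the planner chose what it chose.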

The four node types you must know

Sequential scan (seq scan)

Reads every row in the table. Expected on small tables or when selectivity is low (fetching >10–20% of rows). A red flag when fetching a tiny fraction of a large table.
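When you suspect a seq scan is the wrong choice, ask the planner what the alternative would have cost. A sketch, assuming an index on user_id exists (note that enable_seqscan does not truly disable seq scans; it inflates their cost so another path wins whenever one is available):

```sql
SET enable_seqscan = off;
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- If the index plan's estimated cost is lower than the original
-- seq scan's, the planner was likely misled by stale statistics.
RESET enable_seqscan;
```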

Index scan

Uses a B-tree index to find rows. Each matching row requires a heap fetch (the actual table row). Fast for high-selectivity queries. If you are fetching many rows, each heap fetch is a random I/O — can be slower than a seq scan past ~5–10% of the table.

Index only scan

Satisfied entirely from the index — no heap fetch needed. Possible only when every column the query references is in the index (a covering index) and the pages involved are marked all-visible in the visibility map; otherwise Postgres must still visit the heap to check row visibility, which is why freshly vacuumed tables benefit most. The fastest scan type for read-heavy queries.

```sql
-- This query can use an Index Only Scan if the index covers all columns
CREATE INDEX idx_orders_user_amount
    ON orders (user_id, amount)
    INCLUDE (created_at);  -- covering: no heap fetch needed

EXPLAIN SELECT user_id, amount, created_at
FROM orders
WHERE user_id = 42;
-- -> Index Only Scan using idx_orders_user_amount
```

Bitmap heap scan

Used when many rows match the index condition. Postgres first builds a bitmap of matching heap pages (Bitmap Index Scan), then fetches those pages in order (Bitmap Heap Scan). Reduces random I/O by sorting the page fetches. Common for medium-selectivity queries.
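A typical trigger is an OR across two indexed columns, which Postgres answers by combining two bitmaps with a BitmapOr. A sketch, assuming separate indexes on status and user_id exist (the index names here are hypothetical):

```sql
EXPLAIN SELECT * FROM orders
WHERE status = 'pending' OR user_id = 42;

-- Typical plan shape:
-- Bitmap Heap Scan on orders
--   Recheck Cond: ((status = 'pending') OR (user_id = 42))
--   ->  BitmapOr
--         ->  Bitmap Index Scan on idx_orders_status
--         ->  Bitmap Index Scan on idx_orders_user_id
```

The Recheck Cond line is part of the mechanism: the bitmap may degrade to page granularity under memory pressure, so the condition is re-applied to each fetched row.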

Diagnosing bad row estimates

The planner uses table statistics (pg_statistic) to estimate row counts. Stale stats lead to bad estimates, which lead to wrong join strategies and wrong index choices.

```sql
-- Check when stats were last updated
SELECT schemaname, relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';   -- pg_stat_user_tables uses relname, not tablename

-- Force a stats update if stale
ANALYZE orders;

-- Increase the stats target for a column with a skewed or high-cardinality distribution
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;
```

The default statistics target is 100. For columns used in WHERE clauses with skewed or high-cardinality distributions (like status or category_id), bumping it to 500 gives the planner larger most-common-value lists and finer histograms, and therefore more accurate estimates — at the price of slightly slower ANALYZE and planning.
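To see what the planner actually knows about a column, query the pg_stats view directly:

```sql
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';
-- n_distinct > 0 is an absolute count; n_distinct < 0 is a negated
-- fraction of the row count (-1 means every value is distinct).
-- The MCV list length scales with the statistics target.
```

If a value you filter on frequently is missing from most_common_vals, the planner is estimating it from the histogram alone, which is a common source of bad row counts.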

Reading actual vs estimated: spotting the problem

```text
-- Bad: estimates wildly off
Nested Loop  (cost=0.00..5.00 rows=1 width=40)
              (actual time=0.012..8450.234 rows=50000 loops=1)
  -- Planner thought 1 row, got 50,000
  -- Chose Nested Loop (good for small sets), terrible for 50k rows
  -- Should have used Hash Join instead

-- Root cause: missing statistics on a join column, or
-- skewed distribution (99% of rows have user_id = NULL)
```

When estimated rows are 1 and actual rows are 50,000, the planner chose Nested Loop (optimal for tiny sets) but got hammered with a large set. The fix: update statistics, add a partial index, or, as a last resort, steer the planner at session level with SET enable_nestloop = off (Postgres has no true query hints).
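If you do reach for enable_nestloop = off, scope it to a single transaction with SET LOCAL so it cannot leak into other queries on the same connection. A sketch, reusing the orders/users schema from earlier:

```sql
BEGIN;
SET LOCAL enable_nestloop = off;   -- reverts automatically at COMMIT or ROLLBACK
EXPLAIN ANALYZE
SELECT o.id, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending';
ROLLBACK;
```

This keeps the diagnostic blast radius to one transaction, which matters when the session belongs to a connection pool.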

The BUFFERS option: finding I/O bottlenecks

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = 42;

-- Output includes:
-- Buffers: shared hit=245 read=1820
-- hit = read from shared_buffers (memory) — effectively free
-- read = read from disk (or the OS page cache) — expensive
-- High read count = your working set exceeds shared_buffers
```

If shared hit is low and read is high, you have two options: increase shared_buffers (RAM permitting) or reduce the data scanned (better indexes).
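To find which tables suffer most, the same hit/read split is available cumulatively per table in pg_statio_user_tables. A sketch:

```sql
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(heap_blks_hit::numeric
             / NULLIF(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;
-- A hit_ratio well below ~0.99 on a hot table suggests its working set
-- does not fit in shared_buffers.
```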

Practical workflow: optimising a slow query

```sql
-- Step 1: Run EXPLAIN ANALYZE with BUFFERS on the slow query
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.amount, u.name, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 50;

-- Step 2: Look for:
-- a) Seq Scans on large tables
-- b) Estimated rows << actual rows
-- c) High Buffers: read count
-- d) Nested Loop with large actual row counts

-- Step 3: Create targeted indexes
CREATE INDEX CONCURRENTLY idx_orders_pending_created
    ON orders (created_at DESC)
    WHERE status = 'pending';   -- partial index: the predicate pins status,
                                -- so created_at alone is the key

-- Step 4: Re-run EXPLAIN ANALYZE and verify the plan changed
```
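There is an implicit step 5: after the index has been live under a representative workload, confirm it is actually being scanned, and that older indexes have not become dead weight. pg_stat_user_indexes tracks this per index:

```sql
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan;
-- idx_scan = 0 after a representative period means the index costs
-- write overhead on every INSERT/UPDATE for no read benefit,
-- making it a candidate for DROP INDEX.
```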

Quiz: test your understanding

Before moving on, answer these in your head (or out loud):

  1. What is the difference between EXPLAIN and EXPLAIN ANALYZE? When would you use one over the other?
  2. You see Seq Scan on orders (rows=1000000) in a query that filters by user_id = 42. What could cause Postgres to choose a seq scan over an index scan?
  3. The planner estimates 5 rows but actually returns 80,000. What are the likely causes and how do you fix them?
  4. What is a Bitmap Heap Scan and when does Postgres prefer it over a plain Index Scan?
  5. What does Buffers: shared hit=10 read=5000 tell you about your database's memory configuration?

Next up — Part 03: Indexing Strategy. B-tree internals, composite index ordering, partial indexes, and when NOT to index.
