Postgres Internals 104: Indexes
Most engineers know indexes exist. Fewer understand why Postgres ignores a perfectly valid index, why composite column order matters more than the index itself, or when a partial index beats a full one by an order of magnitude. This is Part 4 of the Postgres Internals series.
How an index works
An index is a separate data structure Postgres maintains alongside your table. The default type — a B-tree — stores sorted keys with pointers back to heap tuples (the actual rows).
-- Without an index: Postgres reads every page in the table
-- With a B-tree index: Postgres walks a sorted tree in O(log n)
-- [M]
-- / -- [D,G] [P,T]
-- / | / | -- rows... rows...
-- WHERE user_id = 5
-- → walk tree: M > 5? no → go left → D < 5 < G → found leaf → fetch row
-- One lookup instead of scanning 10 million rows.
Every insert, update, and delete on an indexed table also updates the index. That's the tradeoff — read speed costs write overhead and extra disk space. Index everything and your writes slow to a crawl. Index nothing and your reads scan the entire table every time.
Composite index column order
This is the most common indexing mistake. The order of columns in a composite index determines which queries benefit from it.
CREATE INDEX idx_user_created ON orders(user_id, created_at);
-- This index HELPS:
SELECT * FROM orders WHERE user_id = 5;
SELECT * FROM orders WHERE user_id = 5 AND created_at > '2024-01-01';
SELECT * FROM orders WHERE user_id = 5 ORDER BY created_at DESC;
-- This index does NOT help:
SELECT * FROM orders WHERE created_at > '2024-01-01';
-- Postgres can't use the index because user_id (the first column) has no filter.
-- It's like looking up someone in a phone book sorted by first name
-- when you only know their last name.
The rule: put equality columns first, then range or sort columns.
For the query WHERE user_id = 5 AND created_at > '2024-01-01',
user_id is the equality filter and created_at is the range —
so (user_id, created_at) is correct. Reversing it gives you a
mostly-useless index.
Index types
-- B-tree (default): equality, range, ORDER BY, LIKE 'prefix%'
CREATE INDEX ON users(email);
CREATE INDEX ON orders(user_id, created_at);
-- Hash: equality only — rarely faster than B-tree, not WAL-logged pre-Postgres 10
CREATE INDEX ON sessions USING HASH (token);
-- GIN: arrays, JSONB, full-text search
CREATE INDEX ON products USING GIN (tags); -- WHERE 'electronics' = ANY(tags)
CREATE INDEX ON articles USING GIN (to_tsvector('english', body)); -- full-text
-- GiST: geometric types, ranges, nearest-neighbour
CREATE INDEX ON locations USING GIST (coordinates); -- WHERE ST_DWithin(...)
-- BRIN: huge tables with naturally ordered data (time-series, append-only logs)
-- Tiny index — stores min/max per block range instead of per row
CREATE INDEX ON events USING BRIN (created_at); -- works when rows are inserted in order
In practice: B-tree covers 95% of use cases. Add GIN when you query JSONB or arrays. Add BRIN for append-only time-series tables where you want a tiny index with minimal write overhead.
Partial indexes
A partial index only covers rows that match a condition. Smaller, faster to build, faster to scan.
-- Full index: covers all 10M users including 9.9M deleted ones
CREATE INDEX idx_users_email ON users(email);
-- Partial index: covers only the 100K active users you actually query
CREATE INDEX idx_active_users_email ON users(email)
WHERE deleted_at IS NULL;
-- Partial index for pending orders only (most queries filter on 'pending')
CREATE INDEX idx_pending_orders ON orders(created_at)
WHERE status = 'pending';
The partial index is 100x smaller if 99% of your rows are deleted or completed.
It fits in memory. It gets used. The full index on the same column might not fit
in work_mem and gets skipped in favour of a seq scan.
Covering indexes
An index-only scan returns data directly from the index without touching the heap (the actual table). For this to work, every column the query needs must be in the index.
-- Query: fetch status and total for a user's orders
SELECT status, total FROM orders WHERE user_id = 42;
-- Standard index: Postgres finds the row location from the index,
-- then goes back to the heap to fetch status and total
CREATE INDEX idx_orders_user ON orders(user_id);
-- Covering index: status and total are stored in the index itself
-- Postgres never touches the heap at all
CREATE INDEX idx_orders_user_covering ON orders(user_id) INCLUDE (status, total);
-- Verify: EXPLAIN output shows "Index Only Scan" instead of "Index Scan"
EXPLAIN ANALYZE SELECT status, total FROM orders WHERE user_id = 42;
INCLUDE columns are stored in the leaf nodes but not used for sorting or filtering.
This keeps the index tree smaller while still enabling index-only scans for common projections.
When Postgres ignores your index
You added the index. The query is still slow. EXPLAIN ANALYZE shows a Seq Scan.
Here's why:
-- 1. Too many rows match — seq scan is cheaper than random I/O
-- Index is useful when < ~5-10% of rows match
-- If 8M of 10M rows match created_at > '2024-01-01', Postgres skips the index
SELECT * FROM orders WHERE created_at > '2024-01-01'; -- 80% match → seq scan
-- 2. Wrong data type — implicit cast disables the index
-- Index is on integer user_id, but query passes a string
SELECT * FROM users WHERE user_id = '42'; -- cast: index not used
SELECT * FROM users WHERE user_id = 42; -- correct: index used
-- 3. Function call on the indexed column
SELECT * FROM users WHERE lower(email) = 'test@example.com'; -- index on email not used
-- Fix: create a functional index
CREATE INDEX ON users (lower(email));
-- 4. LIKE with leading wildcard
SELECT * FROM products WHERE name LIKE '%phone%'; -- B-tree can't help
-- Fix: use GIN with pg_trgm for substring search
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON products USING GIN (name gin_trgm_ops);
Diagnosing with EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 42 AND created_at > '2024-01-01'
ORDER BY created_at DESC
LIMIT 20;
-- What to look for:
-- "Index Scan using idx_user_created" → index used ✓
-- "Seq Scan on orders" → full table scan, index missing or skipped
-- "Rows Removed by Filter: 480000" → index found rows but many didn't match — bad selectivity
-- "Actual Rows: 20 Loops: 1" → 20 rows returned in one pass ✓
-- "Actual time: 0.043..0.891 ms" → sub-millisecond ✓
-- "Buffers: shared hit=14" → all from cache, no disk reads ✓
The gap between estimated rows and actual rows tells you whether Postgres
has stale statistics. If estimates are wildly off, run ANALYZE orders
to update them. A planner making decisions on stale stats will pick the wrong plan every time.
Quiz
Q1. You have a table events(id, user_id, event_type, created_at)
with 50M rows. Your most common query is:
SELECT * FROM events
WHERE user_id = ? AND event_type = 'click'
ORDER BY created_at DESC LIMIT 20;
What index would you create? In what column order, and why?
💡 Show Answer
CREATE INDEX idx_events_user_type_created
ON events(user_id, event_type, created_at DESC);
Column order rationale:
- user_id first — equality filter, highest selectivity. Narrows to one user's events immediately.
- event_type second — equality filter on the already-narrowed set. Further reduces rows before the sort.
- created_at DESC last — range/sort column always goes last. The
DESCmatches theORDER BYso Postgres can return rows in order directly from the index without a sort step.
Result: Postgres does an Index Scan, reads exactly the rows for that user and event type in the right order, and stops at 20. No heap sort, no full table scan.
Q2. You create INDEX idx_orders_date ON orders(created_at)
but EXPLAIN ANALYZE still shows a Seq Scan for
WHERE created_at > '2024-01-01'. The table has 10M rows and 8M match.
Why isn't Postgres using your index?
💡 Show Answer
Because 80% of the table matches. Postgres's query planner compares the cost of an index scan vs a sequential scan. An index scan on 8M rows means 8M random I/O operations to fetch heap pages — each one a separate disk seek. A sequential scan reads pages in order, which is much faster per-row at that scale.
The planner correctly decides the index is slower here. Indexes are only beneficial when they're selective — typically when less than 5–10% of rows match. If you need to query large date ranges frequently, either partition by date (so the partition itself acts as the filter) or reconsider whether an index is the right tool at all.
Q3. Spot the mistake:
CREATE INDEX idx_users_search ON users(email, name, phone, city, country);
Your only queries are WHERE email = ? lookups. What's wrong with this index?
💡 Show Answer
The extra columns (name, phone, city, country) are useless for the query and make the index significantly larger — more disk, more memory, slower writes.
For WHERE email = ? lookups, only email needs to be in the index key:
CREATE INDEX idx_users_email ON users(email);
If you also need to return name without touching the heap, use INCLUDE rather than adding it to the key:
CREATE INDEX idx_users_email ON users(email) INCLUDE (name);
INCLUDE columns don't affect the tree structure or sort order — they're just carried in the leaf nodes for index-only scans. Key columns affect the B-tree structure and should only contain what you filter or sort on.
Q4. You have a messages table where 99% of rows have
deleted_at IS NOT NULL (soft-deleted). You only ever query non-deleted messages.
What index optimisation should you use?
💡 Show Answer
A partial index with a WHERE clause that matches your query pattern:
CREATE INDEX idx_messages_active ON messages(created_at)
WHERE deleted_at IS NULL;
A full index on created_at across all rows would be 100x larger than necessary — 99% of its entries are for deleted rows you never query. The partial index only indexes the 1% of live rows, fits in memory, and gets used reliably.
Important: the partial index is only used when the query's WHERE clause matches the index condition. SELECT * FROM messages WHERE deleted_at IS NULL AND created_at > now() - interval '7 days' will use it. SELECT * FROM messages WHERE created_at > now() - interval '7 days' will not — no guarantee from the query that deleted_at IS NULL.
Q5. What's the difference between these two indexes, and when would you prefer the second one?
CREATE INDEX idx_a ON orders(user_id, status);
CREATE INDEX idx_b ON orders(user_id, status) INCLUDE (total, created_at);
💡 Show Answer
idx_a stores only user_id and status in the B-tree. Queries that need total or created_at must fetch those columns from the heap (the actual table rows) — an extra I/O step called a heap fetch.
idx_b stores user_id and status in the tree, and carries total and created_at in the leaf nodes. Postgres can answer the query entirely from the index without touching the heap at all — an index-only scan.
-- This query benefits from idx_b — no heap access needed
SELECT total, created_at FROM orders WHERE user_id = 42 AND status = 'completed';
-- EXPLAIN shows: "Index Only Scan using idx_b"
Prefer idx_b when the same columns appear repeatedly in your SELECT list alongside the filter columns, and when avoiding heap fetches matters (high-traffic read paths, latency-sensitive APIs). The tradeoff: idx_b is slightly larger on disk because of the carried columns, and updates to total or created_at must update the index leaf nodes.
Part 4 done. Next up — Part 5: Scaling Reads. Read replicas, PgBouncer connection pooling, Redis caching — and the failure modes each layer introduces.