Staff Prep 03: Postgres Indexing Strategy — What, When, and in What Order
Back to Part 02: EXPLAIN ANALYZE. Today: indexes. Specifically, the decisions that separate a schema that performs from one that collapses under load. The mechanics behind B-trees, the rule for composite index column ordering, partial indexes, and covering indexes with INCLUDE — the most underused Postgres feature in production systems.
B-tree internals: what you actually need to know
Postgres's default index type is a B-tree (balanced tree). Every index entry is stored in sorted order.
A query like WHERE user_id = 42 does a binary search from the root — O(log n) — instead
of a full table scan O(n). For a table with 10 million rows, that is the difference between 23 page
reads and 10 million.
The critical property: B-tree indexes support equality (=), range (<,
>, BETWEEN), and prefix matches (LIKE 'abc%').
They do NOT support: suffix matches (LIKE '%abc'), arbitrary expressions
(unless you index the expression), or case-insensitive searches (unless you index
LOWER(column)).
-- B-tree supports these efficiently
SELECT * FROM users WHERE email = 'user@example.com'; -- equality
SELECT * FROM orders WHERE created_at > '2026-01-01'; -- range
SELECT * FROM products WHERE name LIKE 'Apple%'; -- prefix
-- B-tree does NOT help with these
SELECT * FROM products WHERE name LIKE '%phone'; -- suffix match
SELECT * FROM users WHERE LOWER(email) = 'user@example.com'; -- without expression index
-- Fix: expression index for case-insensitive email lookup
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'user@example.com'; -- now uses index
Composite index column ordering: the critical rule
This is the most common indexing mistake. In a composite index (a, b, c),
Postgres can use the index for:
- Queries on
aalone - Queries on
a, b - Queries on
a, b, c - Not queries on
balone - Not queries on
b, c
The rule: equality columns first, then range columns, then sort columns.
-- Query pattern: filter by status (equality), then range on date, then sort
SELECT * FROM orders
WHERE status = 'pending' -- equality
AND created_at > '2026-01-01' -- range
ORDER BY created_at DESC -- sort
LIMIT 20;
-- Wrong index order (range before sort)
CREATE INDEX idx_wrong ON orders (created_at, status);
-- This index cannot be used for the ORDER BY without a sort step
-- Correct index order
CREATE INDEX idx_orders_status_date
ON orders (status, created_at DESC);
-- Equality (status) first, then range+sort (created_at)
-- Postgres can scan this index in order and avoid a Sort node
EXPLAIN SELECT * FROM orders
WHERE status = 'pending' AND created_at > '2026-01-01'
ORDER BY created_at DESC LIMIT 20;
-- -> Index Scan using idx_orders_status_date (no Sort node!)
Partial indexes: smaller, faster, targeted
A partial index only indexes rows that match a WHERE clause. If only 2% of your rows are
status = 'pending', a partial index on pending orders is 50× smaller than a full
index on status. Smaller index = fits in RAM = faster lookups.
-- Full index: indexes ALL 10M rows
CREATE INDEX idx_orders_status_full ON orders (status, created_at);
-- Partial index: indexes only the 200k pending rows
CREATE INDEX idx_orders_pending ON orders (created_at DESC)
WHERE status = 'pending';
-- Query that benefits from partial index
SELECT * FROM orders
WHERE status = 'pending' AND created_at > NOW() - INTERVAL '7 days'
ORDER BY created_at DESC;
-- Uses idx_orders_pending — 50x smaller, faster scan, fits in shared_buffers
-- Another common use: non-null partial index
CREATE INDEX idx_users_unverified_email ON users (email)
WHERE verified_at IS NULL;
-- Only indexes users who haven't verified — a small subset
Covering indexes with INCLUDE
A covering index stores extra columns in the index leaf pages without including them in the sort key. The benefit: Index Only Scans — no heap fetch needed at all. This is the difference between a query that touches only the index (RAM) vs one that also touches the table (random disk I/O).
-- Without INCLUDE: index scan + heap fetch for every row
CREATE INDEX idx_orders_user ON orders (user_id);
SELECT user_id, amount, created_at FROM orders WHERE user_id = 42;
-- -> Index Scan using idx_orders_user
-- -> Heap fetches for every row (to get amount, created_at)
-- With INCLUDE: index only scan, no heap fetch
CREATE INDEX idx_orders_user_covering
ON orders (user_id)
INCLUDE (amount, created_at);
SELECT user_id, amount, created_at FROM orders WHERE user_id = 42;
-- -> Index Only Scan using idx_orders_user_covering
-- -> Zero heap fetches (amount and created_at are in the index)
-- Check visibility map for Index Only Scan efficiency
VACUUM orders; -- updates visibility map so index only scan is fully effective
When NOT to index
Every index you add has a write cost. INSERT, UPDATE, and DELETE must maintain all indexes on the table. On a write-heavy table with 10 indexes, each write touches 10 index pages. This is why "just add an index" is not a free optimisation.
Indexes are generally not worth it when:
- The column has very low cardinality (e.g., a boolean
is_deletedon a table where 99% of rows haveis_deleted = false) — the planner may prefer a seq scan anyway. - The table is tiny (<1000 rows) — sequential scan fits in one I/O.
- The column is rarely queried but written to frequently.
- You already have a composite index where this column is the leftmost prefix — a separate single-column index is redundant.
-- Find unused indexes (from pg_stat_user_indexes)
SELECT
schemaname,
tablename,
indexname,
idx_scan, -- number of times this index was used
idx_tup_read,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Drop indexes with 0 scans (after confirming they are not used elsewhere)
-- Note: reset stats after major traffic pattern changes with pg_stat_reset()
Building indexes without downtime
-- CONCURRENTLY: builds index without locking writes
-- Takes longer but does not block INSERT/UPDATE/DELETE
CREATE INDEX CONCURRENTLY idx_orders_user_covering
ON orders (user_id)
INCLUDE (amount, created_at);
-- Note: CONCURRENTLY cannot be run inside a transaction block
-- If it fails, it leaves an INVALID index — check and clean up:
SELECT indexname, pg_get_indexdef(indexrelid)
FROM pg_indexes
WHERE tablename = 'orders';
-- Drop and retry if invalid
DROP INDEX CONCURRENTLY idx_orders_user_covering;
Quiz: test your understanding
Before moving on, answer these in your head (or out loud):
- You have a composite index on
(tenant_id, status, created_at). Which of these queries can use it:WHERE tenant_id = 5/WHERE status = 'active'/WHERE tenant_id = 5 AND created_at > '2026-01-01'/WHERE tenant_id = 5 AND status = 'active'? - What is the difference between a covering index and a partial index? Give a use case where you would combine both.
- Why does a B-tree index on
LIKE '%phone'not help? What index type would you use for full-text substring search instead? - You find an index with
idx_scan = 0in production. Walk through your decision process for whether to drop it. - What is the risk of
CREATE INDEXvsCREATE INDEX CONCURRENTLYon a production table? What is the downside of CONCURRENTLY?
Next up — Part 04: Scaling Reads. Read replicas, PgBouncer modes, and the connection pooling math that most engineers get wrong.