Staff Prep 22: Postgres Partitioning & Sharding — When and How


April 4, 2026 · 9 min read

Following on from Part 21 on MVCC. Partitioning is a single-server strategy: one Postgres instance splits a very large table into smaller, more manageable pieces. Sharding is a multi-server strategy: data is distributed across multiple Postgres instances, each holding a subset. Understanding when to use each, and what each costs operationally, is a core Staff-level architectural skill.

Declarative partitioning: the three strategies

sql
-- RANGE partitioning: for time-series data
CREATE TABLE events (
    id          BIGSERIAL,
    user_id     BIGINT NOT NULL,
    type        TEXT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    payload     JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_q1 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');
CREATE TABLE events_2026_q2 PARTITION OF events
    FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');

-- LIST partitioning: for categorical data
CREATE TABLE orders (
    id      BIGSERIAL,
    region  TEXT NOT NULL,
    amount  NUMERIC(12,2)
) PARTITION BY LIST (region);

CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('US', 'CA');
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('UK', 'DE', 'FR');
CREATE TABLE orders_apac PARTITION OF orders FOR VALUES IN ('AU', 'SG', 'JP');

-- HASH partitioning: for uniform distribution by key
CREATE TABLE user_sessions (
    id      BIGSERIAL,
    user_id BIGINT NOT NULL,
    data    JSONB
) PARTITION BY HASH (user_id);

CREATE TABLE user_sessions_0 PARTITION OF user_sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE user_sessions_1 PARTITION OF user_sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE user_sessions_2 PARTITION OF user_sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE user_sessions_3 PARTITION OF user_sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

Partition pruning: why the query planner needs to see the partition key

sql
-- PRUNING: Postgres skips partitions that cannot possibly match the query

-- Query with partition key in WHERE clause: pruned (fast)
EXPLAIN SELECT * FROM events
WHERE created_at BETWEEN '2026-04-01' AND '2026-04-30';
-- -> Seq Scan on events_2026_q2 only (one partition scanned, the rest pruned)

-- Query WITHOUT partition key: no pruning (all partitions scanned)
EXPLAIN SELECT * FROM events
WHERE user_id = 42;
-- -> Append -> Seq Scan on events_2026_q1
--           -> Seq Scan on events_2026_q2
--           ... every partition is scanned

-- Fix: add user_id to the partition scheme, or add an index
-- (PG 11+ creates the index on each partition automatically)
CREATE INDEX idx_events_user_id ON events (user_id, created_at);

-- Enable pruning at runtime (default: on)
SET enable_partition_pruning = ON;

Automating partition management with pg_partman

sql
-- pg_partman: extension for automated time-based partition maintenance
-- Creates future partitions automatically, drops old ones on schedule

-- Set up pg_partman (its docs recommend installing it in a dedicated schema)
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman SCHEMA partman;

SELECT partman.create_parent(
    p_parent_table => 'public.events',
    p_control => 'created_at',
    p_type => 'native',  -- pg_partman 4.x; v5 drops this arg (native only)
    p_interval => 'monthly',
    p_premake => 3  -- pre-create 3 future partitions
);

-- Maintenance function (run via pg_cron or crontab)
SELECT partman.run_maintenance('public.events');

-- Retention: auto-drop partitions older than 12 months
UPDATE partman.part_config
SET retention = '12 months', retention_keep_table = false
WHERE parent_table = 'public.events';

-- Dropping a partition is O(1): DROP TABLE events_2024_01
-- vs DELETE ... WHERE created_at < '2024-02-01' which takes minutes and leaves dead tuples
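
When retiring a partition manually, detaching it first keeps the parent table available while it is removed. A sketch against the quarterly partitions defined above (DETACH ... CONCURRENTLY requires PostgreSQL 14+ and cannot run inside a transaction block):

```sql
-- Detach first so the drop never holds a strong lock on the parent
ALTER TABLE events DETACH PARTITION events_2026_q1 CONCURRENTLY;
DROP TABLE events_2026_q1;
```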

When to actually shard

Sharding is operationally expensive. You lose JOINs across shards, lose global transactions, and add application complexity to determine which shard to query. Most teams shard too early. The correct order of operations before sharding:

  1. Vertical scaling (bigger instance)
  2. Read replicas (offload reads)
  3. Connection pooling (PgBouncer)
  4. Caching (Redis)
  5. Partitioning (manage table size)
  6. ONLY THEN: shard, if writes are still the bottleneck

python
import hashlib

# Application-level sharding: route queries based on tenant/user ID
SHARD_CONNECTIONS = {
    0: "postgresql://shard-0:5432/mydb",
    1: "postgresql://shard-1:5432/mydb",
    2: "postgresql://shard-2:5432/mydb",
    3: "postgresql://shard-3:5432/mydb",
}

def get_shard(tenant_id: int, num_shards: int = 4) -> int:
    return tenant_id % num_shards

def get_db_for_tenant(tenant_id: int) -> str:
    # Return the DSN; in practice, keep one connection pool per shard
    return SHARD_CONNECTIONS[get_shard(tenant_id)]

# Problem: adding a shard requires resharding existing data.
# Hashing the ID first spreads non-uniform IDs evenly across shards,
# but hash-then-modulo still remaps most keys when num_shards changes;
# only a true consistent-hash ring limits movement to ~1/N of the keys.

def get_shard_hashed(tenant_id: int, num_shards: int = 4) -> int:
    hash_val = int(hashlib.md5(str(tenant_id).encode()).hexdigest(), 16)
    return hash_val % num_shards
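
To make the difference concrete, here is a minimal, self-contained sketch (the `build_ring` / `ring_lookup` helpers are illustrative, not a library API) counting how many of 10,000 tenants change shards when growing from 4 to 5 shards, under plain modulo routing versus a consistent-hash ring:

```python
import bisect
import hashlib

def _h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(shards, vnodes: int = 200):
    # Place vnodes points per shard on the ring for smoother balance
    ring = sorted((_h(f"{s}:{v}"), s) for s in shards for v in range(vnodes))
    return [h for h, _ in ring], [s for _, s in ring]

def ring_lookup(keys, owners, tenant_id: int) -> int:
    # The first ring point at or after the tenant's hash owns the tenant
    i = bisect.bisect(keys, _h(str(tenant_id))) % len(owners)
    return owners[i]

tenants = range(10_000)

# Modulo routing, 4 -> 5 shards: a tenant keeps its shard only when
# t % 4 == t % 5, i.e. t % 20 < 4, so exactly 8,000 of 10,000 move
moved_mod = sum((t % 4) != (t % 5) for t in tenants)

k4, o4 = build_ring(range(4))
k5, o5 = build_ring(range(5))
# Ring routing: only keys landing in the new shard's arcs move (~1/5)
moved_ring = sum(ring_lookup(k4, o4, t) != ring_lookup(k5, o5, t)
                 for t in tenants)

print(f"modulo routing: {moved_mod} of 10,000 tenants move")
print(f"hash ring:      {moved_ring} of 10,000 tenants move")
```

The ring still moves some data, but an order of magnitude less, and only onto the new shard, which is what makes incremental rebalancing practical.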

Citus: distributed Postgres

sql
-- Citus: Postgres extension that adds transparent horizontal sharding
-- Works as a coordinator node routing queries to worker nodes

-- Distribute a table across worker nodes, sharded by tenant_id
-- (assume an orders table that carries a tenant_id column)
SELECT create_distributed_table('orders', 'tenant_id');

-- Citus automatically routes queries to the right shard:
SELECT * FROM orders WHERE tenant_id = 42;
-- -> Routes to the worker holding tenant 42's data

-- Co-location: tables sharded by the same column are co-located
-- JOINs between co-located tables happen locally, no cross-worker joins
SELECT create_distributed_table('order_items', 'tenant_id', colocate_with => 'orders');

-- Now this JOIN is efficient (same shard for both tables)
SELECT o.id, oi.product_id
FROM orders o JOIN order_items oi ON oi.order_id = o.id
WHERE o.tenant_id = 42;
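
Small, shared lookup tables are handled differently: Citus can replicate them in full to every worker as reference tables, so joins against them stay local. A sketch with a hypothetical products table:

```sql
-- Reference table: fully copied to every worker node, so any shard
-- can join against it without cross-worker traffic
SELECT create_reference_table('products');
```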

Quiz: test your understanding

Before moving on, answer these in your head (or out loud):

  1. You have a 500M-row events table. Queries filter by created_at. How does range partitioning by month help? What must you include in every query for partition pruning to work?
  2. Dropping old data from a partitioned table vs deleting with WHERE: what is the performance difference and why?
  3. What breaks when you shard a database? Name three capabilities you lose and how you typically work around them.
  4. Your e-commerce app has 10,000 tenants. You shard by tenant_id with 4 shards. Tenant 1 has 80% of all data. What problem does this create? How does consistent hashing help?
  5. List the steps you would take before recommending sharding to your team. At what point would you actually recommend it?

Next up — Part 23: VACUUM & Table Bloat. Dead tuples, autovacuum tuning, and how to reclaim space without a maintenance window.
