Postgres Internals 103: Schemas & Partitioning
Two tools that most engineers underuse until it's too late. Schemas are not just cosmetic folders — they're how you enforce boundaries in a shared database. Partitioning is not just a performance trick — it's how you keep a 500-million-row table from grinding your queries to a halt. This is Part 3 of the Postgres Internals series.
Schemas
A schema is a namespace inside a database. Think of it as a folder that holds tables, views, functions, and sequences — and controls who can access them.
-- Every Postgres database starts with a 'public' schema.
-- Most teams dump everything there. That's fine until it isn't.
-- A better structure for a SaaS product:
database: myapp
schema: public -- shared utilities, enums
schema: analytics -- aggregated views, reporting tables
schema: auth -- users, sessions, tokens, api_keys
schema: archive -- cold data, soft-deleted rows
Nothing in analytics leaks to a read-only reporting user unless you explicitly
grant it. Nothing in auth is visible to your analytics pipeline by accident.
The separation is enforced at the database level, not the application level.
Creating and using schemas
-- Create schemas
CREATE SCHEMA analytics;
CREATE SCHEMA auth;
-- Create a table inside a specific schema
CREATE TABLE analytics.page_views (
id BIGSERIAL PRIMARY KEY,
path TEXT NOT NULL,
user_id INT,
viewed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Without explicit qualification, Postgres uses search_path
-- Default: "$user", public (your username schema first, then public)
SET search_path TO analytics, public;
-- Now this resolves to analytics.page_views
SELECT * FROM page_views WHERE viewed_at > now() - interval '1 day';
-- Grant a reporting role access to the analytics schema only
GRANT USAGE ON SCHEMA analytics TO reporting_role;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO reporting_role;
-- Future tables in the schema are also covered
ALTER DEFAULT PRIVILEGES IN SCHEMA analytics
GRANT SELECT ON TABLES TO reporting_role;
GRANT USAGE ON SCHEMA lets the role look up objects inside the schema.
GRANT SELECT ON TABLES lets it actually read them. You need both —
either one without the other grants no usable access.
One schema per tenant?
It's a real pattern — give each customer their own schema so their data is physically isolated, permissions are trivial, and you can dump or delete one tenant cleanly. The problem is scale: 10,000 customers means 10,000 schemas. Migrations run 10,000 times. Connection pooling breaks. Schema switching in application code adds complexity.
The practical answer for most products: use a tenant_id column with Row Level Security.
Reserve schema-per-tenant for regulated industries (healthcare, finance) where data isolation
must be demonstrably hard rather than just policy-enforced.
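A minimal sketch of the tenant_id + RLS approach. The table, policy, and setting names here are illustrative, not part of the series:

```sql
-- Illustrative multi-tenant table: every row carries its tenant
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    tenant_id INT NOT NULL,
    body TEXT NOT NULL
);

ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Rows are visible only when tenant_id matches a per-session setting.
-- The second argument to current_setting makes a missing setting return
-- NULL instead of erroring, which simply hides all rows.
CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_setting('app.current_tenant', true)::int);

-- Application code, per request, after authenticating:
-- SET app.current_tenant = '42';
-- SELECT * FROM documents;  -- only tenant 42's rows are visible
```

One caveat worth knowing: the table owner and superusers bypass RLS by default. Run the application as a non-owner role, or use ALTER TABLE ... FORCE ROW LEVEL SECURITY.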
Partitioning
Partitioning splits one logical table into multiple physical storage pieces — called partitions — based on the value of a column. From your application's perspective, it's still one table. Postgres routes reads and writes to the right partition automatically.
-- One logical table...
SELECT * FROM orders WHERE created_at > '2025-01-01';
-- ...but Postgres only scans this partition:
-- orders_2025 (physical file on disk)
-- Skips orders_2023, orders_2024 entirely. That's partition pruning.
Range partitioning — the most common case
-- 1. Create the parent table (no data stored here)
CREATE TABLE orders (
id BIGSERIAL NOT NULL,
user_id INT NOT NULL,
total NUMERIC(10,2) NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);
-- 2. Create child partitions
CREATE TABLE orders_2024 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
-- 3. Indexes go on the partitions (or the parent — Postgres 11+ propagates automatically)
CREATE INDEX ON orders (user_id, created_at);
-- 4. Inserts route automatically — no application change needed
INSERT INTO orders (user_id, total, created_at)
VALUES (42, 299.99, '2025-03-15'); -- lands in orders_2025
-- 5. Confirm pruning is happening
EXPLAIN SELECT * FROM orders WHERE created_at > '2025-01-01';
-- Pruned partitions simply don't appear in the plan:
-- you should see a scan on orders_2025 but not orders_2024.
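The steps above assume every row fits a declared range; an insert whose created_at falls outside both ranges is rejected with an error. If you would rather catch stray rows than reject them, a DEFAULT partition (Postgres 11+) acts as a catch-all. The name orders_default below is illustrative:

```sql
-- Catch-all for rows outside every declared range (Postgres 11+)
CREATE TABLE orders_default PARTITION OF orders DEFAULT;

-- Without the default partition, this insert fails with:
--   ERROR: no partition of relation "orders" found for row
INSERT INTO orders (user_id, total, created_at)
VALUES (7, 10.00, '2023-06-01');  -- lands in orders_default
```

Trade-off: once a default partition exists, attaching new partitions is slower, because Postgres must verify the default contains no rows that belong in the new bounds.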
List partitioning — for known enum values
CREATE TABLE payments (
id BIGSERIAL NOT NULL,
amount NUMERIC(10,2) NOT NULL,
status TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY LIST (status);
CREATE TABLE payments_pending PARTITION OF payments FOR VALUES IN ('pending');
CREATE TABLE payments_completed PARTITION OF payments FOR VALUES IN ('completed');
CREATE TABLE payments_failed PARTITION OF payments FOR VALUES IN ('failed', 'refunded');
Use LIST when your partition column has a small, fixed set of values.
Don't use it when those values are skewed — 95% of payments end up completed,
so you'd have one giant partition and the whole point collapses.
Hash partitioning — for even distribution
-- When there's no natural range and no enum — spread evenly by hash
CREATE TABLE events (
id BIGSERIAL NOT NULL,
user_id INT NOT NULL,
event_type TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY HASH (user_id);
CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);
Hash partitioning gives you even distribution but no pruning benefit for range queries. It helps when the goal is parallelism and smaller per-partition indexes, not date filtering.
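Pruning does still work for equality on the hash key, though: the planner hashes the literal and picks one partition. A quick way to check, assuming the events table above:

```sql
-- Equality on the partition key prunes to a single hash partition:
-- the plan shows a scan on only one of events_p0..events_p3
EXPLAIN SELECT * FROM events WHERE user_id = 42;

-- A range predicate on the hash key cannot be pruned, because hashing
-- doesn't preserve order: the plan scans all four partitions
EXPLAIN SELECT * FROM events WHERE user_id BETWEEN 40 AND 50;
```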
Detaching and archiving old partitions
This is one of the best arguments for partitioning. Deleting two years of old data from a 500M-row unpartitioned table is a multi-hour operation that causes massive VACUUM pressure. With partitioning, it's instant and clean:
-- Detach the partition (it becomes a standalone table, still queryable)
ALTER TABLE orders DETACH PARTITION orders_2023;
-- Archive: dump it from the shell, ship it to cold storage, then drop
--   pg_dump -t orders_2023 myapp > orders_2023_archive.sql
-- Drop it — instant, no bloat, no VACUUM needed
DROP TABLE orders_2023;
Always detach before dropping — it gives you a window to verify the archive
before the data is gone. One caveat: a plain DETACH briefly takes an ACCESS
EXCLUSIVE lock on the parent table, so it is not entirely free under heavy load.
Once detached, the partition stops receiving new writes and disappears from
queries on the parent table.
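On busy tables, Postgres 14+ offers a concurrent form of detach that avoids holding a blocking lock, at the cost of not being allowed inside a transaction block. A sketch, with the partition bounds assumed from the earlier orders example:

```sql
-- Postgres 14+: waits out existing queries instead of blocking new ones;
-- cannot run inside BEGIN ... COMMIT
ALTER TABLE orders DETACH PARTITION orders_2023 CONCURRENTLY;

-- Changed your mind before dropping? Re-attach (bounds must match):
ALTER TABLE orders ATTACH PARTITION orders_2023
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
```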
What you need to know before partitioning
Don't partition reflexively. It adds complexity — more partitions to manage, more constraints to keep consistent, migration scripts that need to loop over partitions. Partition when:
- The table is already large and growing predictably (tens of millions of rows minimum)
- Most queries filter on the partition key — otherwise you get no pruning benefit
- You need to archive or drop old data regularly
- You want to run VACUUM and ANALYZE on smaller chunks independently
If your table has 2M rows and you're partitioning because it "might grow" — don't.
A good index on an unpartitioned table will outperform a poorly designed partitioned one every time.
Check EXPLAIN ANALYZE first. Partition when the evidence says you need to.
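Gathering that evidence takes two catalog queries: on-disk size and the planner's row estimate for the candidate table (the table name here is illustrative):

```sql
-- Total on-disk size, including indexes and TOAST
SELECT pg_size_pretty(pg_total_relation_size('orders'));

-- Planner's row estimate; cheap to read, refreshed by ANALYZE
-- (shows -1 on Postgres 14+ if the table has never been analyzed)
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'orders';
```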
Quiz
Q1. You're building a SaaS platform with 200 business clients.
You're deciding between schema-per-tenant vs a tenant_id column on every table.
What are the tradeoffs? When would you pick each approach?
💡 Show Answer
Schema-per-tenant:
- Strong physical isolation — each tenant's data is in separate tables on disk
- Permissions are trivial: GRANT USAGE ON SCHEMA tenant_42 TO ...
- Archiving or deleting a tenant is a single DROP SCHEMA tenant_42 CASCADE
- Problems: migrations must run N times (once per schema). Connection pooling struggles with schema switching. At 10,000+ tenants, Postgres catalog bloat becomes real.
tenant_id column + Row Level Security:
- One set of tables, one migration, no schema switching overhead
- Scales to millions of tenants without catalog issues
- Isolation enforced via RLS policies at the Postgres level — not just application code
- Problem: isolation is policy-based, not physical. Regulated industries (healthcare, finance) often require demonstrable physical separation.
Pick schema-per-tenant when you have a small number of high-value enterprise clients, strict regulatory isolation requirements, or need per-tenant customisation (different columns, different constraints). Pick tenant_id + RLS for everything else — it scales further with far less operational cost.
Q2. Your events table has 2 billion rows partitioned monthly by created_at.
A developer runs:
SELECT * FROM events WHERE user_id = 42;
No created_at filter. What happens? How many partitions does Postgres scan?
💡 Show Answer
Postgres scans all partitions. Without a filter on the partition key (created_at), partition pruning cannot kick in — the planner has no way to know which partitions contain rows where user_id = 42.
If you have 24 monthly partitions, Postgres plans a scan of every one of them: index scans if each partition has an index on user_id, sequential scans otherwise, possibly parallelised. It might still be acceptably fast, but you get zero pruning benefit.
The lesson: partition pruning only works when the query filters on the partition key. If most of your queries filter by user_id and not by date, partitioning by date hurts more than it helps — consider a composite index on (user_id, created_at) on an unpartitioned table instead, or partition by hash on user_id.
Q3. You want to partition a payments table by status
(pending, completed, failed, refunded).
Which strategy — RANGE, LIST, or HASH — and why?
💡 Show Answer
LIST — because status is a small, fixed set of known values, not a numeric range and not something you want distributed randomly.
CREATE TABLE payments_pending PARTITION OF payments FOR VALUES IN ('pending');
CREATE TABLE payments_completed PARTITION OF payments FOR VALUES IN ('completed');
CREATE TABLE payments_failed PARTITION OF payments FOR VALUES IN ('failed', 'refunded');
But there's a gotcha: if 95% of payments end up completed, you have one giant partition and three tiny ones. You've added partitioning complexity with almost no benefit. Before committing, check the actual distribution of values. If it's heavily skewed, a simple index on status on an unpartitioned table is probably better.
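Checking that skew is a single GROUP BY, assuming the payments table from earlier:

```sql
-- Actual distribution of the candidate partition key
SELECT status,
       count(*) AS rows,
       round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct
FROM payments
GROUP BY status
ORDER BY rows DESC;
```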
Q4. A teammate says: "Let's just partition everything by date for performance." What's the correct response?
💡 Show Answer
Push back. Partitioning is not a default performance tool — it's a specific solution for specific problems. The right question before partitioning is:
- Is the table already large enough that scans are actually slow? (Check EXPLAIN ANALYZE)
- Do most queries filter on the date column? If not, pruning won't help
- Do you need to archive or drop old data regularly?
- Are you already using good indexes on the right columns?
Partitioning adds real operational overhead: migrations run per-partition, constraints must be kept consistent, and a query that doesn't filter on the partition key now scans every partition. A composite index on an unpartitioned table often outperforms poorly-planned partitioning at 10x less complexity. Partition when the evidence says you need to — not as a pre-emptive move.
Q5. Walk through the safest way to archive and remove the orders_2023 partition
without downtime.
💡 Show Answer
-- Step 1: Detach the partition
-- It becomes a standalone table — still queryable, no longer part of the parent.
-- A plain DETACH briefly locks the parent; on Postgres 14+,
-- DETACH ... CONCURRENTLY avoids blocking concurrent queries.
ALTER TABLE orders DETACH PARTITION orders_2023;
-- Step 2: Verify the archive (query it, count rows, sanity check)
SELECT count(*) FROM orders_2023;
SELECT min(created_at), max(created_at) FROM orders_2023;
-- Step 3: Dump to cold storage before dropping
-- (run from shell, not SQL)
-- pg_dump -t orders_2023 myapp > orders_2023_archive.sql
-- Step 4: Drop — instant, no VACUUM needed, no bloat
DROP TABLE orders_2023;
The key is detach before drop. Detaching gives you a safety window — the table is off the parent but still exists. If you realise you forgot to archive it, you can re-attach with ALTER TABLE orders ATTACH PARTITION orders_2023 FOR VALUES FROM (...) TO (...). Once you DROP, it's gone.
Part 3 done. Next up — Part 4: Indexes. B-trees, composite index ordering, partial indexes, covering indexes, and when Postgres ignores your index entirely.