Postgres Internals 103: Schemas & Partitioning
Two tools that most engineers underuse until it's too late. Schemas are not just cosmetic folders — they're how you enforce boundaries in a shared database. Partitioning is not just a performance trick — it's how you keep a 500-million-row table from grinding your queries to a halt. This is Part 3 of the Postgres Internals series.
Schemas
A schema is a namespace inside a database. Think of it as a folder that holds tables, views, functions, and sequences — and controls who can access them.
-- Every Postgres database starts with a 'public' schema.
-- Most teams dump everything there. That's fine until it isn't.
-- A better structure for a SaaS product:
database: myapp
schema: public -- shared utilities, enums
schema: analytics -- aggregated views, reporting tables
schema: auth -- users, sessions, tokens, api_keys
schema: archive -- cold data, soft-deleted rows
Nothing in analytics leaks to a read-only reporting user unless you explicitly
grant it. Nothing in auth is visible to your analytics pipeline by accident.
The separation is enforced at the database level, not the application level.
Creating and using schemas
-- Create schemas
CREATE SCHEMA analytics;
CREATE SCHEMA auth;
-- Create a table inside a specific schema
CREATE TABLE analytics.page_views (
id BIGSERIAL PRIMARY KEY,
path TEXT NOT NULL,
user_id INT,
viewed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Without explicit qualification, Postgres uses search_path
-- Default: "$user", public (your username schema first, then public)
SET search_path TO analytics, public;
-- Now this resolves to analytics.page_views
SELECT * FROM page_views WHERE viewed_at > now() - interval '1 day';
-- Grant a reporting role access to the analytics schema only
GRANT USAGE ON SCHEMA analytics TO reporting_role;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO reporting_role;
-- Future tables in the schema are also covered
ALTER DEFAULT PRIVILEGES IN SCHEMA analytics
GRANT SELECT ON TABLES TO reporting_role;
GRANT USAGE ON SCHEMA lets the role look up objects inside the schema.
GRANT SELECT ON TABLES lets it actually read them. You need both —
either one without the other grants no usable access.
One schema per tenant?
It's a real pattern — give each customer their own schema so their data is physically isolated, permissions are trivial, and you can dump or delete one tenant cleanly. The problem is scale: 10,000 customers means 10,000 schemas. Migrations run 10,000 times. Connection pooling breaks. Schema switching in application code adds complexity.
The practical answer for most products: use a tenant_id column with Row Level Security.
Reserve schema-per-tenant for regulated industries (healthcare, finance) where data isolation
must be demonstrably hard rather than just policy-enforced.
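A minimal sketch of the tenant_id + RLS approach. The table, policy, and setting names here are illustrative, not part of the series:

```sql
-- Illustrative multi-tenant table: every row carries its tenant
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    tenant_id INT NOT NULL,
    body TEXT NOT NULL
);

ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Rows are visible only when tenant_id matches a per-session setting.
-- The second argument to current_setting makes a missing setting return
-- NULL instead of erroring, which simply hides all rows.
CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_setting('app.current_tenant', true)::int);

-- Application code, per request, after authenticating:
-- SET app.current_tenant = '42';
-- SELECT * FROM documents;  -- only tenant 42's rows are visible
```

One caveat worth knowing: the table owner and superusers bypass RLS by default. Run the application as a non-owner role, or use ALTER TABLE ... FORCE ROW LEVEL SECURITY.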
Partitioning
Partitioning splits one logical table into multiple physical storage pieces — called partitions — based on the value of a column. From your application's perspective, it's still one table. Postgres routes reads and writes to the right partition automatically.
-- One logical table...
SELECT * FROM orders WHERE created_at > '2025-01-01';
-- ...but Postgres only scans this partition:
-- orders_2025 (physical file on disk)
-- Skips orders_2023, orders_2024 entirely. That's partition pruning.
Range partitioning — the most common case
-- 1. Create the parent table (no data stored here)
CREATE TABLE orders (
id BIGSERIAL NOT NULL,
user_id INT NOT NULL,
total NUMERIC(10,2) NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);
-- 2. Create child partitions
CREATE TABLE orders_2024 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
-- 3. Indexes go on the partitions (or the parent — Postgres 11+ propagates automatically)
CREATE INDEX ON orders (user_id, created_at);
-- 4. Inserts route automatically — no application change needed
INSERT INTO orders (user_id, total, created_at)
VALUES (42, 299.99, '2025-03-15'); -- lands in orders_2025
-- 5. Confirm pruning is happening
EXPLAIN SELECT * FROM orders WHERE created_at > '2025-01-01';
-- Pruned partitions simply don't appear in the plan:
-- you should see a scan on orders_2025 but not orders_2024.
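The steps above assume every row fits a declared range; an insert whose created_at falls outside both ranges is rejected with an error. If you would rather catch stray rows than reject them, a DEFAULT partition (Postgres 11+) acts as a catch-all. The name orders_default below is illustrative:

```sql
-- Catch-all for rows outside every declared range (Postgres 11+)
CREATE TABLE orders_default PARTITION OF orders DEFAULT;

-- Without the default partition, this insert fails with:
--   ERROR: no partition of relation "orders" found for row
INSERT INTO orders (user_id, total, created_at)
VALUES (7, 10.00, '2023-06-01');  -- lands in orders_default
```

Trade-off: once a default partition exists, attaching new partitions is slower, because Postgres must verify the default contains no rows that belong in the new bounds.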
List partitioning — for known enum values
CREATE TABLE payments (
id BIGSERIAL NOT NULL,
amount NUMERIC(10,2) NOT NULL,
status TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY LIST (status);
CREATE TABLE payments_pending PARTITION OF payments FOR VALUES IN ('pending');
CREATE TABLE payments_completed PARTITION OF payments FOR VALUES IN ('completed');
CREATE TABLE payments_failed PARTITION OF payments FOR VALUES IN ('failed', 'refunded');
Use LIST when your partition column has a small, fixed set of values.
Don't use it when those values are skewed — 95% of payments end up completed,
so you'd have one giant partition and the whole point collapses.
Hash partitioning — for even distribution
-- When there's no natural range and no enum — spread evenly by hash
CREATE TABLE events (
id BIGSERIAL NOT NULL,
user_id INT NOT NULL,
event_type TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY HASH (user_id);
CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);
Hash partitioning gives you even distribution but no pruning benefit for range queries. It helps when the goal is parallelism and smaller per-partition indexes, not date filtering.
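Pruning does still work for equality on the hash key, though: the planner hashes the literal and picks one partition. A quick way to check, assuming the events table above:

```sql
-- Equality on the partition key prunes to a single hash partition:
-- the plan shows a scan on only one of events_p0..events_p3
EXPLAIN SELECT * FROM events WHERE user_id = 42;

-- A range predicate on the hash key cannot be pruned, because hashing
-- doesn't preserve order: the plan scans all four partitions
EXPLAIN SELECT * FROM events WHERE user_id BETWEEN 40 AND 50;
```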
Detaching and archiving old partitions
This is one of the best arguments for partitioning. Deleting two years of old data from a 500M-row unpartitioned table is a multi-hour operation that causes massive VACUUM pressure. With partitioning, it's instant and clean:
-- Detach the partition (it becomes a standalone table, still queryable)
ALTER TABLE orders DETACH PARTITION orders_2023;
-- Archive: dump it from the shell, ship it to cold storage, then drop
--   pg_dump -t orders_2023 myapp > orders_2023_archive.sql
-- Drop it — instant, no bloat, no VACUUM needed
DROP TABLE orders_2023;
Always detach before dropping — it gives you a window to verify the archive
before the data is gone. One caveat: a plain DETACH briefly takes an ACCESS
EXCLUSIVE lock on the parent table, so it is not entirely free under heavy load.
Once detached, the partition stops receiving new writes and disappears from
queries on the parent table.
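On busy tables, Postgres 14+ offers a concurrent form of detach that avoids holding a blocking lock, at the cost of not being allowed inside a transaction block. A sketch, with the partition bounds assumed from the earlier orders example:

```sql
-- Postgres 14+: waits out existing queries instead of blocking new ones;
-- cannot run inside BEGIN ... COMMIT
ALTER TABLE orders DETACH PARTITION orders_2023 CONCURRENTLY;

-- Changed your mind before dropping? Re-attach (bounds must match):
ALTER TABLE orders ATTACH PARTITION orders_2023
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
```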
What you need to know before partitioning
Don't partition reflexively. It adds complexity — more partitions to manage, more constraints to keep consistent, migration scripts that need to loop over partitions. Partition when:
- The table is already large and growing predictably (tens of millions of rows minimum)
- Most queries filter on the partition key — otherwise you get no pruning benefit
- You need to archive or drop old data regularly
- You want to run VACUUM and ANALYZE on smaller chunks independently
If your table has 2M rows and you're partitioning because it "might grow" — don't.
A good index on an unpartitioned table will outperform a poorly designed partitioned one every time.
Check EXPLAIN ANALYZE first. Partition when the evidence says you need to.
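Gathering that evidence takes two catalog queries: on-disk size and the planner's row estimate for the candidate table (the table name here is illustrative):

```sql
-- Total on-disk size, including indexes and TOAST
SELECT pg_size_pretty(pg_total_relation_size('orders'));

-- Planner's row estimate; cheap to read, refreshed by ANALYZE
-- (shows -1 on Postgres 14+ if the table has never been analyzed)
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'orders';
```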
Quiz
Q1. You're building a SaaS platform with 200 business clients.
You're deciding between schema-per-tenant vs a tenant_id column on every table.
What are the tradeoffs? When would you pick each approach?
💡 Show Answer
Schema-per-tenant:
- Strong physical isolation — each tenant's data is in separate tables on disk
- Permissions are trivial: GRANT USAGE ON SCHEMA tenant_42 TO ...
- Archiving or deleting a tenant is a single DROP SCHEMA tenant_42 CASCADE
- Problems: migrations must run N times (once per schema). Connection pooling struggles with schema switching. At 10,000+ tenants, Postgres catalog bloat becomes real.
tenant_id column + Row Level Security:
- One set of tables, one migration, no schema switching overhead
- Scales to millions of tenants without catalog issues
- Isolation enforced via RLS policies at the Postgres level — not just application code
- Problem: isolation is policy-based, not physical. Regulated industries (healthcare, finance) often require demonstrable physical separation.
Pick schema-per-tenant when you have a small number of high-value enterprise clients, strict regulatory isolation requirements, or need per-tenant customisation (different columns, different constraints). Pick tenant_id + RLS for everything else — it scales further with far less operational cost.
Q2. Your events table has 2 billion rows partitioned monthly by created_at.
A developer runs:
SELECT * FROM events WHERE user_id = 42;
No created_at filter. What happens? How many partitions does Postgres scan?
💡 Show Answer
Postgres scans all partitions. Without a filter on the partition key (created_at), partition pruning cannot kick in — the planner has no way to know which partitions contain rows where user_id = 42.
If you have 24 monthly partitions, Postgres plans a scan of every one of them: index scans if each partition has an index on user_id, sequential scans otherwise, possibly parallelised. It might still be acceptably fast, but you get zero pruning benefit.
The lesson: partition pruning only works when the query filters on the partition key. If most of your queries filter by user_id and not by date, partitioning by date hurts more than it helps — consider a composite index on (user_id, created_at) on an unpartitioned table instead, or partition by hash on user_id.
Q3. You want to partition a payments table by status
(pending, completed, failed, refunded).
Which strategy — RANGE, LIST, or HASH — and why?
💡 Show Answer
LIST — because status is a small, fixed set of known values, not a numeric range and not something you want distributed randomly.
CREATE TABLE payments_pending PARTITION OF payments FOR VALUES IN ('pending');
CREATE TABLE payments_completed PARTITION OF payments FOR VALUES IN ('completed');
CREATE TABLE payments_failed PARTITION OF payments FOR VALUES IN ('failed', 'refunded');
But there's a gotcha: if 95% of payments end up completed, you have one giant partition and three tiny ones. You've added partitioning complexity with almost no benefit. Before committing, check the actual distribution of values. If it's heavily skewed, a simple index on status on an unpartitioned table is probably better.
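Checking that skew is a single GROUP BY, assuming the payments table from earlier:

```sql
-- Actual distribution of the candidate partition key
SELECT status,
       count(*) AS rows,
       round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct
FROM payments
GROUP BY status
ORDER BY rows DESC;
```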
Q4. A teammate says: "Let's just partition everything by date for performance." What's the correct response?
💡 Show Answer
Push back. Partitioning is not a default performance tool — it's a specific solution for specific problems. The right question before partitioning is:
- Is the table already large enough that scans are actually slow? (Check EXPLAIN ANALYZE)
- Do most queries filter on the date column? If not, pruning won't help
- Do you need to archive or drop old data regularly?
- Are you already using good indexes on the right columns?
Partitioning adds real operational overhead: migrations run per-partition, constraints must be kept consistent, and a query that doesn't filter on the partition key now scans every partition. A composite index on an unpartitioned table often outperforms poorly-planned partitioning at 10x less complexity. Partition when the evidence says you need to — not as a pre-emptive move.
Q5. Walk through the safest way to archive and remove the orders_2023 partition
without downtime.
💡 Show Answer
-- Step 1: Detach the partition
-- It becomes a standalone table — still queryable, no longer part of the parent.
-- A plain DETACH briefly locks the parent; on Postgres 14+,
-- DETACH ... CONCURRENTLY avoids blocking concurrent queries.
ALTER TABLE orders DETACH PARTITION orders_2023;
-- Step 2: Verify the archive (query it, count rows, sanity check)
SELECT count(*) FROM orders_2023;
SELECT min(created_at), max(created_at) FROM orders_2023;
-- Step 3: Dump to cold storage before dropping
-- (run from shell, not SQL)
-- pg_dump -t orders_2023 myapp > orders_2023_archive.sql
-- Step 4: Drop — instant, no VACUUM needed, no bloat
DROP TABLE orders_2023;
The key is detach before drop. Detaching gives you a safety window — the table is off the parent but still exists. If you realise you forgot to archive it, you can re-attach with ALTER TABLE orders ATTACH PARTITION orders_2023 FOR VALUES FROM (...) TO (...). Once you DROP, it's gone.
Part 3 done. Next up — Part 4: Indexes. B-trees, composite index ordering, partial indexes, covering indexes, and when Postgres ignores your index entirely.