Staff Prep 01: Postgres Schema Design Under Constraints

April 4, 2026 · 9 min read · PART 01 / 19

This is Day 1 of the 28-day Staff Engineer prep series. We start with Postgres schema design — not because it is glamorous, but because every system design interview, every architectural review, every production incident eventually traces back to the data model. If your schema is wrong, everything built on top of it is fighting an uphill battle.

The staff engineer mindset on schema design

Junior engineers ask: "Will this store the data?" Senior engineers ask: "Will this query be fast?" Staff engineers ask: "What will this schema look like in 18 months when requirements change, and can we migrate it without a maintenance window?" That third question changes everything about how you design.

The core tension in schema design is between normalisation (avoid duplication, enforce consistency) and denormalisation (accept duplication, gain query speed). Neither extreme is correct. The right schema is the one that makes your actual queries fast while staying maintainable.

Normalisation in practice

Third Normal Form (3NF) is the practical target for transactional systems. Every non-key column depends on the primary key, the whole primary key, and nothing but the primary key. What this means concretely:

sql
-- Bad: city_name depends on zip_code, not on the primary key (id)
CREATE TABLE users (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    zip_code    TEXT,
    city_name   TEXT   -- violation: transitive dependency on zip_code
);

-- Good: separate the zip-to-city mapping
CREATE TABLE zip_codes (
    zip_code    TEXT PRIMARY KEY,
    city_name   TEXT NOT NULL,
    state_code  CHAR(2) NOT NULL
);

CREATE TABLE users (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    zip_code    TEXT REFERENCES zip_codes(zip_code)
);

The fixed version means updating a city name requires touching one row in zip_codes, not scanning the entire users table. That is the practical payoff of normalisation.
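
Concretely, renaming a city under the normalised schema touches one row (the values are illustrative):

sql
-- One row changes, no matter how many users share the zip code
UPDATE zip_codes SET city_name = 'Jersey City' WHERE zip_code = '07302';

-- The unnormalised version would have needed:
--   UPDATE users SET city_name = 'Jersey City' WHERE zip_code = '07302';
-- which rewrites and locks every matching user row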

When to denormalise intentionally

Denormalisation is not a mistake — it is a deliberate trade-off. You accept write complexity (keeping duplicated data in sync) in exchange for read performance (no joins at query time). The cases where it is justified:

  • Reporting and analytics: An order_count counter on the user row (as in the trigger example below) avoids a COUNT(*) over the orders table on every profile load.
  • Denormalised audit fields: Storing created_by_name alongside created_by_id means the audit log survives user deletion without orphaned lookups.
  • Read-optimised materialised views: Pre-aggregated summaries that are refreshed asynchronously, so OLAP queries do not hit the OLTP tables.

sql
-- Denormalised counter with trigger maintenance
ALTER TABLE users ADD COLUMN order_count INT NOT NULL DEFAULT 0;

CREATE OR REPLACE FUNCTION increment_user_order_count()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE users SET order_count = order_count + 1
    WHERE id = NEW.user_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_order_count
AFTER INSERT ON orders
FOR EACH ROW EXECUTE FUNCTION increment_user_order_count();

The trigger keeps order_count consistent automatically, but only for inserts; if orders can ever be removed, you also need a companion decrement trigger (sketched below). The other trade-off: every INSERT into orders now takes a row lock on the corresponding user row. At high insert rates on the same user, this serialises writes and becomes a bottleneck. Know your write patterns before denormalising counters.
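
A minimal sketch of that companion trigger, mirroring the insert trigger above:

sql
-- Companion decrement trigger (sketch): OLD refers to the deleted row
CREATE OR REPLACE FUNCTION decrement_user_order_count()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE users SET order_count = order_count - 1
    WHERE id = OLD.user_id;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_order_count_delete
AFTER DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION decrement_user_order_count();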

Data types: precision matters

Wrong data types are silent performance killers. The common mistakes at the Staff level:

sql
-- Wrong types (common mistakes)
id          VARCHAR(36)    -- UUIDs as varchar: slower comparisons, bloated indexes
price       FLOAT          -- floating point for money: rounding errors will hurt you
status      VARCHAR(20)    -- free-text status: no enforcement, inconsistent values
created_at  VARCHAR(30)    -- timestamps as strings: no timezone, no range queries

-- Correct types
id          UUID DEFAULT gen_random_uuid()   -- native UUID type, compact storage (built in from PG 13; pgcrypto before that)
price       NUMERIC(12, 2)                   -- exact decimal arithmetic
status      TEXT CHECK (status IN ('active', 'inactive', 'pending'))  -- or use an ENUM
created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()  -- timezone-aware timestamp

NUMERIC(12, 2) for money is non-negotiable. A FLOAT cannot exactly represent 0.1 in binary. Multiply that across millions of transactions and you get rounding errors that are nearly impossible to debug in production.
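
You can see the failure mode directly in psql (output shown for Postgres 12+, which prints floats at full precision):

sql
-- Binary floating point cannot represent 0.1 exactly
SELECT 0.1::float8 + 0.2::float8;    -- 0.30000000000000004
-- NUMERIC does exact decimal arithmetic
SELECT 0.1::numeric + 0.2::numeric;  -- 0.3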

Constraints as documentation and guards

Database constraints are the last line of defence. Application code has bugs. Migrations get run incorrectly. Direct SQL gets executed in emergencies. Constraints guarantee your invariants hold regardless of how data enters the database.

sql
CREATE TABLE orders (
    id          BIGSERIAL PRIMARY KEY,
    user_id     BIGINT NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
    amount      NUMERIC(12, 2) NOT NULL CHECK (amount > 0),
    status      TEXT NOT NULL DEFAULT 'pending'
                    CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled')),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    shipped_at  TIMESTAMPTZ,

    -- Cross-column constraint: shipped_at only set when status is shipped
    CONSTRAINT chk_shipped_at CHECK (
        (status = 'shipped' AND shipped_at IS NOT NULL) OR
        (status != 'shipped')
    )
);

The cross-column constraint catches the invariant "a shipped order must have a ship date" at the database level. No amount of application-layer validation is as reliable.
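
Watching it reject bad data (values illustrative; assumes a users row with id 1 exists to satisfy the FK):

sql
-- Rejected: shipped order with no ship date
INSERT INTO orders (user_id, amount, status)
VALUES (1, 49.99, 'shipped');
-- ERROR:  new row for relation "orders" violates check constraint "chk_shipped_at"

-- Accepted: shipped_at supplied
INSERT INTO orders (user_id, amount, status, shipped_at)
VALUES (1, 49.99, 'shipped', NOW());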

Multi-tenancy schema patterns

Multi-tenant SaaS is a common Staff-level design challenge. The three patterns, with trade-offs:

1. Row-level tenancy — add tenant_id to every table, enforce with Row Level Security (RLS). Simplest operationally: one database, one schema, all tenants co-mingled. Risk: a missing WHERE clause leaks cross-tenant data. Mitigation: RLS policies enforce it at the Postgres level.

sql
-- Row Level Security for multi-tenancy
-- (assumes orders carries a tenant_id BIGINT column)
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
-- Policies are bypassed by the table owner unless forced:
ALTER TABLE orders FORCE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.tenant_id')::BIGINT);

-- Set at connection/session time in your app:
-- SET app.tenant_id = '42';

2. Schema-per-tenant — each tenant gets their own Postgres schema (tenant_42.orders). Strong isolation. Migrations must run N times (once per tenant). Works up to ~100 tenants; gets painful at thousands.
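
A minimal sketch of provisioning and routing under this pattern (schema and table names illustrative; LIKE ... INCLUDING ALL copies structure but not foreign keys):

sql
-- Provision a new tenant: clone the canonical table layout
CREATE SCHEMA tenant_42;
CREATE TABLE tenant_42.orders (LIKE public.orders INCLUDING ALL);

-- Route a connection to the tenant's schema:
SET search_path TO tenant_42, public;
SELECT * FROM orders;  -- now resolves to tenant_42.orders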

3. Database-per-tenant — maximum isolation, full-cost operational overhead. Reserved for enterprise customers with compliance requirements.

Designing for future migrations

The best schema change is the one you do not need. Every column you add costs a migration. Design with flexibility in mind:

  • JSONB for extensible attributes: A metadata JSONB column lets you attach arbitrary key-value data without schema migrations. Index it with GIN for querying.
  • Soft deletes: deleted_at TIMESTAMPTZ instead of hard DELETE. Preserves referential integrity, enables recovery, simplifies audit trails (sketch after the JSONB example below).
  • Avoid NOT NULL on new columns: Before Postgres 11, adding a column with a DEFAULT forced a full table rewrite; on modern versions a constant DEFAULT is metadata-only, but ALTER COLUMN ... SET NOT NULL still scans the table under an exclusive lock. Add the column nullable, backfill in batches, then promote the constraint (also sketched below).

sql
-- Extensible attributes without schema changes
CREATE TABLE products (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    price       NUMERIC(12, 2) NOT NULL,
    attributes  JSONB NOT NULL DEFAULT '{}'
);

-- GIN index for JSONB attribute queries
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);

-- Query: find all products that are "featured"
SELECT * FROM products WHERE attributes @> '{"featured": true}'::jsonb;

-- Query: find all products in a specific category
-- (@> is what the GIN index accelerates; an ->> equality test would
--  need its own btree expression index)
SELECT * FROM products WHERE attributes @> '{"category": "electronics"}';
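
Soft deletes from the bullet list above, sketched on the products table; the partial index is the piece people forget:

sql
-- Soft delete: mark rows instead of removing them
ALTER TABLE products ADD COLUMN deleted_at TIMESTAMPTZ;

UPDATE products SET deleted_at = NOW() WHERE id = 123;

-- Partial index so "live rows" queries stay fast and small
CREATE INDEX idx_products_live ON products (name) WHERE deleted_at IS NULL;

And the zero-downtime NOT NULL pattern, as a sketch using a hypothetical verified column on users. The VALIDATE step assumes Postgres 12+, where SET NOT NULL can reuse a validated CHECK constraint instead of rescanning the table:

sql
-- On Postgres 11+ a constant default is metadata-only, so this one-liner
-- is already safe:
--   ALTER TABLE users ADD COLUMN verified BOOLEAN NOT NULL DEFAULT false;
-- The version-agnostic pattern (also needed for volatile defaults):

-- Step 1: add the column nullable (no default, so no rewrite on any version)
ALTER TABLE users ADD COLUMN verified BOOLEAN;
ALTER TABLE users ALTER COLUMN verified SET DEFAULT false;  -- new rows only

-- Step 2: backfill in batches to keep row locks short
UPDATE users SET verified = false
WHERE verified IS NULL AND id BETWEEN 1 AND 100000;
-- ...repeat per id range until no NULLs remain...

-- Step 3: prove the invariant without blocking writes, then promote it
ALTER TABLE users ADD CONSTRAINT chk_verified_not_null
    CHECK (verified IS NOT NULL) NOT VALID;
ALTER TABLE users VALIDATE CONSTRAINT chk_verified_not_null;
ALTER TABLE users ALTER COLUMN verified SET NOT NULL;  -- skips the scan on PG 12+
ALTER TABLE users DROP CONSTRAINT chk_verified_not_null;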

Quiz: test your understanding

Before moving on, answer these in your head (or out loud):

  1. What is a transitive dependency and why does it violate 3NF? Give a concrete example.
  2. You have a comments table with a user_id FK. A user gets deleted. What happens with ON DELETE RESTRICT vs ON DELETE CASCADE vs ON DELETE SET NULL? Which would you choose for comments?
  3. Why should you never use FLOAT for monetary values in a database? What type would you use instead?
  4. You need to add a verified BOOLEAN NOT NULL DEFAULT false column to a 500M-row table in production. Walk through exactly how you would do this without downtime.
  5. Your SaaS app has 2,000 tenants. Compare row-level tenancy with RLS vs schema-per-tenant. What is the operational break-even point?

Next up — Part 02: EXPLAIN ANALYZE. We go inside the query planner to understand why your queries are slow.
