Postgres Internals 101: Tables, Types & Constraints
PostgresFoundation

Postgres Internals 101: Tables, Types & Constraints

March 22, 202612 min readPART 03 / 19

Most engineers use Postgres every day and have never thought about what a table actually is on disk, why TIMESTAMP without a timezone will eventually ruin your life, or why storing money as FLOAT is a slow-motion bug waiting to explode. This is Part 1 of the Postgres Internals series — building real understanding from the ground up, one layer at a time.

The series: Postgres internals

This is a living series. Each post covers one layer of Postgres — written as I learn it, with real examples, the mistakes I've seen in production, and quiz questions at the end so you can test yourself.

Layer 1 — Storage & Structure     ← You are here
  Tables, Views, Schemas, Partitioning

Layer 2 — Querying
  JOINs, Aggregations, CTEs, Window Functions

Layer 3 — Performance
  Indexes, EXPLAIN ANALYZE, VACUUM, Connection Pooling

Layer 4 — Reliability
  Transactions, ACID, Locking, WAL, Replication

Layer 5 — Scale
  Sharding, Read Replicas, JSONB, Postgres at 100M+ rows

What a table actually is

Postgres stores tables as heap files on disk. Each table is one or more 8KB pages. Rows are packed into these pages with no inherent order.

Page 1 (8KB)          Page 2 (8KB)
┌──────────────────┐  ┌──────────────────┐
│ row 1            │  │ row 9            │
│ row 2            │  │ row 10           │
│ row 3            │  │ ...              │
│ ...              │  │                  │
│ row 8            │  │                  │
└──────────────────┘  └──────────────────┘

This means: SELECT * FROM users without ORDER BY gives you rows in whatever order they're physically stored — which can change as rows are inserted, updated, and deleted. Never rely on implicit ordering.

A corollary that trips people up: Postgres does not guarantee row order within a transaction. Two identical queries run back-to-back can return rows in different order if autovacuum or concurrent inserts happened in between.

Column types — the ones that matter

Postgres has 40+ types. You'll use about 10 of them regularly. Here's what actually matters:

-- Numbers
id          BIGSERIAL       -- auto-increment 64-bit (use this, not SERIAL)
price       NUMERIC(10, 2)  -- exact decimal — for money, always
count       INTEGER         -- 32-bit int, fine for most counters
ratio       FLOAT           -- approximate — fine for analytics, fatal for money

-- Text
name        TEXT            -- no length limit, use this by default
code        VARCHAR(10)     -- only when you genuinely need to enforce length
flag        CHAR(1)         -- avoid — pads with spaces, causes subtle bugs

-- Time
created_at  TIMESTAMPTZ     -- always with timezone. ALWAYS.
birth_date  DATE            -- date only, no time component

-- Everything else
data        JSONB           -- binary JSON, indexable, fast
is_active   BOOLEAN
user_id     UUID            -- good for distributed systems, avoids ID enumeration

The two rules everyone learns the hard way

Rule 1: Always use TIMESTAMPTZ, never TIMESTAMP.

Plain TIMESTAMP stores no timezone information. It looks fine in development because your app server and DB server are in the same timezone. Then you deploy to a server in Singapore, or daylight saving time kicks in, or a teammate in a different timezone runs a migration — and suddenly every timestamp is off by hours.

-- This is a time bomb
created_at  TIMESTAMP

-- This is correct
created_at  TIMESTAMPTZ

TIMESTAMPTZ stores the moment in UTC internally and converts on display based on the session timezone. It costs the same storage. There is no reason to use plain TIMESTAMP.

Rule 2: Never store money as FLOAT.

Floating point numbers are approximations. IEEE 754 cannot represent most decimal fractions exactly in binary. This is fine for scientific computation. It is not fine for money.

-- Dangerous — floating point imprecision
price  FLOAT

-- Safe — exact decimal arithmetic
price  NUMERIC(10, 2)   -- up to 99,999,999.99

-- Example of the float bug:
SELECT 0.1 + 0.2;
-- Returns: 0.30000000000000004
-- Not 0.30. Your accountant will find this eventually.

Constraints — the database enforcing your rules

Application code has bugs. APIs get called directly. Scripts run in emergencies. Bad data will get into your database unless the database itself rejects it. Constraints are your last line of defence.

CREATE TABLE members (
    id          BIGSERIAL PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,
    gym_id      BIGINT NOT NULL REFERENCES gyms(id) ON DELETE CASCADE,
    plan        TEXT NOT NULL DEFAULT 'monthly'
                    CHECK (plan IN ('monthly', 'quarterly', 'annual')),
    status      TEXT NOT NULL DEFAULT 'active'
                    CHECK (status IN ('active', 'expired', 'paused')),
    fee         NUMERIC(8, 2) NOT NULL CHECK (fee > 0),
    joined_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    expires_at  TIMESTAMPTZ NOT NULL
);

Breaking down each constraint:

PRIMARY KEY     → unique + not null + auto-indexed. Every table should have one.
NOT NULL        → this column must always have a value. Be liberal with this.
UNIQUE          → no two rows can have the same value. Also creates an index.
REFERENCES      → foreign key. Postgres will reject orphaned records.
ON DELETE CASCADE → if the gym is deleted, delete all its members too.
CHECK           → custom validation. Enforced at the DB level, always.
DEFAULT         → value inserted when none is provided.

NULL — the silent killer

NULL means "unknown" — not zero, not empty string, not false. It has counterintuitive behaviour that causes real bugs:

SELECT NULL = NULL;      -- Returns NULL, not TRUE
SELECT NULL != NULL;     -- Returns NULL, not FALSE
SELECT NULL + 1;         -- Returns NULL
SELECT '' = NULL;        -- Returns NULL

-- This query will silently miss rows where status IS NULL:
WHERE status != 'expired'

-- This is correct:
WHERE status != 'expired' OR status IS NULL

-- Safe NULL comparison:
WHERE status IS DISTINCT FROM 'expired'

The rule of thumb: use NOT NULL on every column where a missing value doesn't make semantic sense. Most columns. The fewer NULLable columns you have, the fewer NULL-related bugs you'll debug at 2am.

ON DELETE behaviour — choose carefully

When you define a foreign key, you need to decide what happens when the referenced row is deleted:

-- RESTRICT (default) — blocks the delete if child rows exist
gym_id  BIGINT REFERENCES gyms(id)

-- CASCADE — deletes all child rows automatically
gym_id  BIGINT REFERENCES gyms(id) ON DELETE CASCADE

-- SET NULL — sets the foreign key to NULL on child rows
gym_id  BIGINT REFERENCES gyms(id) ON DELETE SET NULL

-- SET DEFAULT — sets the foreign key to its default value
gym_id  BIGINT REFERENCES gyms(id) ON DELETE SET DEFAULT

ON DELETE CASCADE is convenient but dangerous if misapplied. Deleting a gym should probably delete its members. But deleting a category probably should not silently delete all products in that category. Think before you cascade.

Quick reference: table design checklist

✅ Every table has a BIGSERIAL or UUID primary key
✅ Timestamp columns use TIMESTAMPTZ, not TIMESTAMP
✅ Money columns use NUMERIC, not FLOAT
✅ Columns that must exist are NOT NULL
✅ Text columns with a fixed domain use CHECK constraints
✅ Foreign keys are defined for all relationships
✅ ON DELETE behaviour is explicitly chosen, not defaulted blindly
✅ No CHAR() columns — use TEXT or VARCHAR

Test yourself

Write your answers before revealing them — that's how this sticks.

Q1. Design the members table for a gym management app. Include: member name, email, phone, gym (foreign key), plan type (monthly/quarterly/annual), status (active/expired/paused), fee amount, join date, expiry date. Write the full CREATE TABLE statement with correct types and constraints.

💡 Show Answer
CREATE TABLE members (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL UNIQUE,
    phone       TEXT NOT NULL,          -- TEXT, not INTEGER (leading zeros, +91)
    gym_id      BIGINT NOT NULL REFERENCES gyms(id) ON DELETE RESTRICT,
    plan        TEXT NOT NULL DEFAULT 'monthly'
                    CHECK (plan IN ('monthly', 'quarterly', 'annual')),
    status      TEXT NOT NULL DEFAULT 'active'
                    CHECK (status IN ('active', 'expired', 'paused')),
    fee         NUMERIC(8, 2) NOT NULL CHECK (fee > 0),
    joined_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    expires_at  TIMESTAMPTZ NOT NULL
);

Key things: phone is TEXT (never INTEGER — you'd lose the leading zero and can't store +91). Money is NUMERIC. Both date columns are TIMESTAMPTZ. Status is enforced with CHECK at the DB level.

Q2. A teammate uses TIMESTAMP instead of TIMESTAMPTZ for joined_at. The app runs fine in Mumbai. What breaks when you deploy to a Singapore server six months later?

💡 Show Answer

Plain TIMESTAMP stores no timezone. When your server was in Mumbai (UTC+5:30), timestamps were written assuming IST. When the server moves to Singapore (UTC+8), Postgres interprets all stored timestamps in the new timezone — every historical record is now off by 2.5 hours. Membership expiry dates are wrong. Reports show incorrect times. There's no fix without a data migration.

TIMESTAMPTZ stores the moment in UTC internally and converts at query time based on the session timezone — same data, always correct regardless of where the server lives.

Q3. Why is storing a membership fee as FLOAT dangerous? Give a concrete example of how it breaks.

💡 Show Answer

IEEE 754 floating point cannot represent most decimal fractions exactly in binary. Classic example:

SELECT 0.1 + 0.2;
-- Returns: 0.30000000000000004
-- Not 0.30

At scale: if you store 499.99 as FLOAT, Postgres might store 499.98999999999997. Sum 10,000 memberships and the rounding error is real money your accountant will find. Use NUMERIC(8, 2) — it does exact decimal arithmetic. No surprises.

Q4. You have a status column that should only ever be 'active', 'expired', or 'paused'. A junior engineer enforces this in the API layer only. What are two ways invalid data can still get into the database?

💡 Show Answer
  • Direct DB access — a hotfix script, a migration, or a developer running raw SQL bypasses the API entirely. No constraint = anything goes.
  • Another service / future code — a new microservice writes to the same table without knowing the validation rules. Introduces 'cancelled' or 'ACTIVE' (wrong case) and nothing stops it.
  • Bonus: the API validation code itself has a bug and allows unexpected values through.

A DB-level CHECK constraint catches all of these because it runs regardless of how the write arrives.

Q5. What does ON DELETE CASCADE on a foreign key do? Give one scenario where it's the right choice, and one where it would be catastrophic.

💡 Show Answer

ON DELETE CASCADE automatically deletes all child rows when the referenced parent row is deleted.

Right choice: Deleting a gym should delete all its members and their attendance records. The data is meaningless without the parent — cascade makes sense.

Catastrophic: Deleting a category row silently cascades to delete all products in that category, their orders, their payment records. A fat-fingered admin deletes one category and wipes transaction history. The default (RESTRICT) would have blocked the delete and saved the day.

Next up — Part 2: Views, Materialized Views & Schemas. When to use a view vs a CTE vs a materialized view, and how schemas help you organize large databases without losing your mind.

← PREV
Staff Prep 02: Reading EXPLAIN ANALYZE Like a Query Planner
← All Postgres Posts