Schema Design: Tables That Enforce Their Own Rules

This chapter shows you how to design tables so bad data never reaches them in the first place.

The Deal with Schemas

A schema is the contract between your application and your data. Good schemas let the database reject bad inputs without any help from application code. That matters because application code changes, bugs ship, and new services get added. The database is the last line of defense.

The tools are: data types, NOT NULL, primary keys, foreign keys, UNIQUE, CHECK, and defaults. Use them aggressively.

A Realistic Starting Schema

We'll build a small e-commerce schema and use it for the rest of the tutorial. Four tables: users, products, orders, order_items.

CREATE TABLE users (
    id         SERIAL PRIMARY KEY,
    email      TEXT UNIQUE NOT NULL,
    name       TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE products (
    id         SERIAL PRIMARY KEY,
    sku        TEXT UNIQUE NOT NULL,
    name       TEXT NOT NULL,
    price      NUMERIC(10, 2) NOT NULL CHECK (price >= 0),
    stock      INTEGER NOT NULL DEFAULT 0 CHECK (stock >= 0),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE orders (
    id         SERIAL PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    status     TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'shipped', 'cancelled')),
    total      NUMERIC(10, 2) NOT NULL CHECK (total >= 0),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE order_items (
    order_id   INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    product_id INTEGER NOT NULL REFERENCES products(id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    price      NUMERIC(10, 2) NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

Every word of that is doing work. The rest of the chapter explains each one.

Data Types That Matter

PostgreSQL has dozens of types. You'll use a handful constantly.

Numeric:

INTEGER         32-bit signed integer (-2.1B to 2.1B). Default for IDs and counts.
BIGINT          64-bit signed integer. Use when INTEGER might overflow.
NUMERIC(p, s)   Arbitrary precision decimal. Use for money, always.
REAL / DOUBLE   Floating point. Use for measurements, never for money.

Text:

TEXT            Variable-length string. Your default.
VARCHAR(n)      Same as TEXT, with a length cap. Rarely needed.
CHAR(n)         Fixed-length, space-padded. Almost never what you want.

Temporal:

TIMESTAMPTZ     Date + time + timezone. Use this for "when did X happen".
TIMESTAMP       Date + time, no timezone. Avoid; timezones will bite you.
DATE            Just the date (2026-04-19).
INTERVAL        A duration ('3 days', '1 hour 30 minutes').

Others you'll meet:

BOOLEAN         TRUE / FALSE / NULL.
UUID            128-bit identifier. Use with gen_random_uuid().
JSONB           Binary JSON. Indexed, queryable. See Chapter 11.
BYTEA           Bytes. Use sparingly; most binary data belongs outside the DB.

Pick the narrowest type that fits. INTEGER for a user ID, NUMERIC(10, 2) for a price, TEXT for strings. Don't reach for VARCHAR(255) out of habit.

Primary Keys

Every table needs a primary key. It uniquely identifies a row and is automatically indexed.

Two choices for the strategy:

Surrogate key (SERIAL or BIGSERIAL): an auto-incrementing integer with no business meaning. The default choice.

id SERIAL PRIMARY KEY

SERIAL is shorthand for "INTEGER, auto-incremented, NOT NULL, PRIMARY KEY". In modern PostgreSQL, prefer the SQL standard form:

id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY

It behaves slightly better around permissions and sequences.

Natural key: a column that already uniquely identifies the row (email, sku, isbn). Tempting, but natural keys tend to change or get reformatted, and changing a primary key ripples everywhere. Use a surrogate key and put a UNIQUE constraint on the natural identifier instead.

Composite key: two or more columns together form the key. Common for join tables:

PRIMARY KEY (order_id, product_id)

Foreign Keys

A foreign key says "this column references a row in another table". The database enforces it:

user_id INTEGER NOT NULL REFERENCES users(id)

You cannot insert an order with user_id = 42 unless a user with id = 42 exists. You cannot delete a user who has orders (by default).

Control what happens on delete with ON DELETE:

ON DELETE RESTRICT   Refuse the delete if children exist. (Default.)
ON DELETE CASCADE    Delete the children too.
ON DELETE SET NULL   Null out the foreign key on children.

For order_items, cascade makes sense: if the order is gone, its line items are meaningless. For orders.user_id, restrict is probably right; you don't want to silently delete a user's order history.

NOT NULL and DEFAULT

NOT NULL says "this column must always have a value". Combine with DEFAULT to make the common case painless:

created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()

The default is only applied when you don't provide a value in INSERT. If you insert NULL explicitly, it still fails the NOT NULL check.

As a rule: make columns NOT NULL unless you have a real reason for them to be optional. NULL is the source of most "why did that query return nothing?" moments.

UNIQUE Constraints

email TEXT UNIQUE NOT NULL

The database enforces that no two rows share an email. You get this for free; don't try to enforce uniqueness in application code.

Multi-column unique:

CREATE TABLE memberships (
    user_id INTEGER NOT NULL REFERENCES users(id),
    team_id INTEGER NOT NULL REFERENCES teams(id),
    role    TEXT NOT NULL,
    UNIQUE (user_id, team_id)
);

A user can belong to many teams, but only once per team.

CHECK Constraints

A CHECK constraint runs an expression on every insert or update. If it's false, the write is rejected.

price    NUMERIC(10, 2) NOT NULL CHECK (price >= 0),
stock    INTEGER NOT NULL DEFAULT 0 CHECK (stock >= 0),
quantity INTEGER NOT NULL CHECK (quantity > 0),
status   TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'shipped', 'cancelled'))

These checks are cheap to run and priceless when a buggy client tries to write a negative stock level. The client gets an error. The data stays correct.

Altering a Schema

Schemas change. PostgreSQL handles most changes online (without locking the table for long):

ALTER TABLE users ADD COLUMN phone TEXT;
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
ALTER TABLE users ADD CONSTRAINT phone_format CHECK (phone ~ '^\+[0-9]+$');
ALTER TABLE users DROP COLUMN phone;
ALTER TABLE users RENAME COLUMN name TO full_name;

A few rules of thumb:

  • Adding a nullable column is free.
  • Adding a NOT NULL column with a constant DEFAULT is fast in PostgreSQL 11+ (it stores the default without rewriting rows).
  • Adding a CHECK constraint scans the table. On big tables, add it as NOT VALID first, then VALIDATE CONSTRAINT later when the table is quiet.

Chapter 12 covers migration tooling.

Common Pitfalls

Using VARCHAR(255). PostgreSQL TEXT has no performance cost over VARCHAR. The 255 is cargo-culted from MySQL.

Storing money as FLOAT. 0.1 + 0.2 != 0.3 in floating point. Use NUMERIC(10, 2).

Storing timestamps without timezone. TIMESTAMP drops the timezone silently. Six months later, someone asks "what timezone are these in?" and nobody knows. Always TIMESTAMPTZ.

Keeping enums in application code only. A CHECK (status IN (...)) at the DB level catches typos nothing else will.

Soft-deleting with a deleted_at column and forgetting to filter it. Works, but every query now needs WHERE deleted_at IS NULL. Consider whether you actually need soft deletes, or whether archive tables and foreign keys with ON DELETE CASCADE would serve you better.

Next Steps

Continue to 04-writing-data.md to put data into these tables.