Schema Design: Tables That Enforce Their Own Rules
This chapter shows you how to design tables so bad data never reaches them in the first place.
The Deal with Schemas
A schema is the contract between your application and your data. Good schemas let the database reject bad inputs without any help from application code. That matters because application code changes, bugs ship, and new services get added. The database is the last line of defense.
The tools are: data types, NOT NULL, primary keys, foreign keys, UNIQUE, CHECK, and defaults. Use them aggressively.
A Realistic Starting Schema
We'll build a small e-commerce schema and use it for the rest of the tutorial. Four tables: users, products, orders, order_items.
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
sku TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
price NUMERIC(10, 2) NOT NULL CHECK (price >= 0),
stock INTEGER NOT NULL DEFAULT 0 CHECK (stock >= 0),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id),
status TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'shipped', 'cancelled')),
total NUMERIC(10, 2) NOT NULL CHECK (total >= 0),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE order_items (
order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id INTEGER NOT NULL REFERENCES products(id),
quantity INTEGER NOT NULL CHECK (quantity > 0),
price NUMERIC(10, 2) NOT NULL,
PRIMARY KEY (order_id, product_id)
);
Every word of that is doing work. The rest of the chapter explains each one.
Data Types That Matter
PostgreSQL has dozens of types. You'll use a handful constantly.
Numeric:
INTEGER 32-bit signed integer (-2.1B to 2.1B). Default for IDs and counts.
BIGINT 64-bit signed integer. Use when INTEGER might overflow.
NUMERIC(p, s) Arbitrary precision decimal. Use for money, always.
REAL / DOUBLE Floating point. Use for measurements, never for money.
Text:
TEXT Variable-length string. Your default.
VARCHAR(n) Same as TEXT, with a length cap. Rarely needed.
CHAR(n) Fixed-length, space-padded. Almost never what you want.
Temporal:
TIMESTAMPTZ Date + time + timezone. Use this for "when did X happen".
TIMESTAMP Date + time, no timezone. Avoid; timezones will bite you.
DATE Just the date (2026-04-19).
INTERVAL A duration ('3 days', '1 hour 30 minutes').
Others you'll meet:
BOOLEAN TRUE / FALSE / NULL.
UUID 128-bit identifier. Use with gen_random_uuid().
JSONB Binary JSON. Indexed, queryable. See Chapter 11.
BYTEA Bytes. Use sparingly; most binary data belongs outside the DB.
Pick the narrowest type that fits. INTEGER for a user ID, NUMERIC(10, 2) for a price, TEXT for strings. Don't reach for VARCHAR(255) out of habit.
Primary Keys
Every table needs a primary key. It uniquely identifies a row and is automatically indexed.
Two choices for the strategy:
Surrogate key (SERIAL or BIGSERIAL): an auto-incrementing integer with no business meaning. The default choice.
id SERIAL PRIMARY KEY
SERIAL is shorthand for "INTEGER, auto-incremented, NOT NULL, PRIMARY KEY". In modern PostgreSQL, prefer the SQL standard form:
id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY
It behaves slightly better around permissions and sequences.
Natural key: a column that already uniquely identifies the row (email, sku, isbn). Tempting, but natural keys tend to change or get reformatted, and changing a primary key ripples everywhere. Use a surrogate key and put a UNIQUE constraint on the natural identifier instead.
Composite key: two or more columns together form the key. Common for join tables:
PRIMARY KEY (order_id, product_id)
Foreign Keys
A foreign key says "this column references a row in another table". The database enforces it:
user_id INTEGER NOT NULL REFERENCES users(id)
You cannot insert an order with user_id = 42 unless a user with id = 42 exists. You cannot delete a user who has orders (by default).
Control what happens on delete with ON DELETE:
ON DELETE RESTRICT Refuse the delete if children exist. (Default.)
ON DELETE CASCADE Delete the children too.
ON DELETE SET NULL Null out the foreign key on children.
For order_items, cascade makes sense: if the order is gone, its line items are meaningless. For orders.user_id, restrict is probably right; you don't want to silently delete a user's order history.
NOT NULL and DEFAULT
NOT NULL says "this column must always have a value". Combine with DEFAULT to make the common case painless:
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
The default is only applied when you don't provide a value in INSERT. If you insert NULL explicitly, it still fails the NOT NULL check.
As a rule: make columns NOT NULL unless you have a real reason for them to be optional. NULL is the source of most "why did that query return nothing?" moments.
UNIQUE Constraints
email TEXT UNIQUE NOT NULL
The database enforces that no two rows share an email. You get this for free; don't try to enforce uniqueness in application code.
Multi-column unique:
CREATE TABLE memberships (
user_id INTEGER NOT NULL REFERENCES users(id),
team_id INTEGER NOT NULL REFERENCES teams(id),
role TEXT NOT NULL,
UNIQUE (user_id, team_id)
);
A user can belong to many teams, but only once per team.
CHECK Constraints
A CHECK constraint runs an expression on every insert or update. If it's false, the write is rejected.
price NUMERIC(10, 2) NOT NULL CHECK (price >= 0),
stock INTEGER NOT NULL DEFAULT 0 CHECK (stock >= 0),
quantity INTEGER NOT NULL CHECK (quantity > 0),
status TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'shipped', 'cancelled'))
These checks are cheap to run and priceless when a buggy client tries to write a negative stock level. The client gets an error. The data stays correct.
Altering a Schema
Schemas change. PostgreSQL handles most changes online (without locking the table for long):
ALTER TABLE users ADD COLUMN phone TEXT;
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
ALTER TABLE users ADD CONSTRAINT phone_format CHECK (phone ~ '^\+[0-9]+$');
ALTER TABLE users DROP COLUMN phone;
ALTER TABLE users RENAME COLUMN name TO full_name;
A few rules of thumb:
- Adding a nullable column is free.
- Adding a
NOT NULLcolumn with a constantDEFAULTis fast in PostgreSQL 11+ (it stores the default without rewriting rows). - Adding a
CHECKconstraint scans the table. On big tables, add it asNOT VALIDfirst, thenVALIDATE CONSTRAINTlater when the table is quiet.
Chapter 12 covers migration tooling.
Common Pitfalls
Using VARCHAR(255). PostgreSQL TEXT has no performance cost over VARCHAR. The 255 is cargo-culted from MySQL.
Storing money as FLOAT. 0.1 + 0.2 != 0.3 in floating point. Use NUMERIC(10, 2).
Storing timestamps without timezone. TIMESTAMP drops the timezone silently. Six months later, someone asks "what timezone are these in?" and nobody knows. Always TIMESTAMPTZ.
Keeping enums in application code only. A CHECK (status IN (...)) at the DB level catches typos nothing else will.
Soft-deleting with a deleted_at column and forgetting to filter it. Works, but every query now needs WHERE deleted_at IS NULL. Consider whether you actually need soft deletes, or whether archive tables and foreign keys with ON DELETE CASCADE would serve you better.
Next Steps
Continue to 04-writing-data.md to put data into these tables.