Indexing: Making Queries Fast
This chapter teaches you how indexes work, when to add them, and how to spot the ones that are quietly hurting your writes.
What an Index Is
An index is a separate data structure that lets the database find rows by a column's value without scanning the whole table. The default index type in PostgreSQL is a B-tree, a balanced search tree. Lookups and range queries on a B-tree are logarithmic in the table size, so 10 million rows costs about 24 steps.
Every PRIMARY KEY and UNIQUE constraint automatically creates an index. Foreign keys do not (though you almost always want one; see below).
You create indexes manually with CREATE INDEX:
CREATE INDEX ON orders (user_id);
CREATE INDEX ON orders (status, created_at);
CREATE UNIQUE INDEX ON products (sku);
PostgreSQL auto-names indexes if you don't. If you want a name (for later DROP INDEX):
CREATE INDEX orders_user_created_idx ON orders (user_id, created_at DESC);
How to Tell an Index Is Being Used
EXPLAIN shows the query plan. EXPLAIN ANALYZE runs the query and shows actual timings.
EXPLAIN SELECT * FROM orders WHERE user_id = 1;
With no index:
Seq Scan on orders (cost=0.00..120.00 rows=3 width=52)
Filter: (user_id = 1)
"Seq Scan" means "read every row". For three rows, that's fine. For 10 million, not fine.
After CREATE INDEX ON orders (user_id);:
Index Scan using orders_user_id_idx on orders (cost=0.15..8.17 rows=3 width=52)
Index Cond: (user_id = 1)
"Index Scan" means the engine used the index.
Chapter 10 goes deep on reading plans. For now: if you see Seq Scan where you expected Index Scan, either the index doesn't exist, the planner thinks it's not worth using (usually when the table is tiny), or the query isn't written in a way the index can help.
Composite Indexes
An index on (a, b) can serve queries that filter on a, or on a AND b, but not on b alone. Think of a phone book sorted by last name, then first name. It's great for looking up "all the Joneses named Harry" or "all the Joneses". It doesn't help "find everyone named Harry".
Design composite indexes with the most-selective, most-frequently-filtered column first.
-- Queries filter by status, then optionally by date
CREATE INDEX ON orders (status, created_at DESC);
Partial Indexes
A partial index covers only rows matching a predicate. Much smaller, much faster, and often the right answer for status-like columns where one value dominates.
-- 95% of orders are 'shipped'. Only index the other 5%.
CREATE INDEX ON orders (created_at) WHERE status IN ('pending', 'paid');
Queries with the matching predicate automatically use the index:
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 20;
Expression Indexes
Index an expression, not just a column:
CREATE INDEX ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = LOWER('Ada@Example.com');
The planner matches the indexed expression literally. WHERE email = ... won't use an index on LOWER(email); it needs WHERE LOWER(email) = ....
Other Index Types
Beyond B-tree, PostgreSQL ships four more index types. You won't need most of them often.
HASH equality-only. B-tree does everything HASH does; skip it.
GIN for JSONB, arrays, full-text search.
GiST for geometric data, ranges, nearest-neighbor.
BRIN for huge, naturally-ordered tables (time series logs).
The most common non-default: GIN for JSONB:
CREATE INDEX ON events USING GIN (payload jsonb_path_ops);
SELECT * FROM events WHERE payload @> '{"action": "checkout"}';
Foreign Keys and Indexes
PostgreSQL doesn't automatically index the referencing side of a foreign key. You almost always want one, because:
- Queries that join via the FK will do seq scans otherwise.
ON DELETEandON UPDATEon the parent need to find matching children; without an index, that's a full scan every time.
CREATE INDEX ON orders (user_id); -- orders.user_id → users.id
CREATE INDEX ON order_items (product_id); -- order_items.product_id → products.id
When NOT to Index
Every index is data the database keeps up to date. Every INSERT, UPDATE of an indexed column, and DELETE writes to the index too. Too many indexes and writes slow down; the working set no longer fits in memory.
Skip indexes when:
- The table is small (a few thousand rows). Seq Scan is often faster than index lookup.
- The column isn't used in
WHERE,JOIN, orORDER BY. - The column has very low cardinality (e.g. a
booleanwhere 50% are true). The index rarely helps; the planner usually ignores it. - You already have a composite index that starts with that column.
Inspect existing indexes:
\d+ orders -- in psql, shows columns and indexes
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'orders';
Find unused indexes over time:
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
Indexes with idx_scan = 0 after weeks of traffic are candidates for removal.
Covering Indexes (INCLUDE)
PostgreSQL 11+ supports INCLUDE columns on an index. These aren't part of the index's search key but are stored alongside it, letting the planner answer some queries from the index alone ("index-only scan").
CREATE INDEX ON orders (user_id) INCLUDE (total, status);
-- Can be answered without touching the table
SELECT total, status FROM orders WHERE user_id = 1;
Useful for hot queries on wide tables. Don't reach for it until EXPLAIN ANALYZE shows heap fetches as the bottleneck.
Common Pitfalls
Indexing every column. Each index costs writes. Measure before adding.
Leading wildcard on LIKE. WHERE name LIKE '%ada%' can't use a B-tree index. Use pg_trgm with a GIN index, or a full-text search index.
Function on the indexed column. WHERE LOWER(email) = ... won't use an index on email. You need an expression index on LOWER(email).
Type mismatch. WHERE phone = 1234567890 when phone is TEXT forces a cast and skips the index. Quote the number.
Missing FK index. The most common performance bug: a foreign key with no index on the child side. Queries like "every order for this user" scan the whole orders table.
Next Steps
Continue to 08-transactions.md to keep data consistent under concurrent writes.