Indexing: Making Queries Fast | Databases Tutorial

This chapter teaches you how indexes work, when to add them, and how to spot the ones that are quietly hurting your writes.

What an Index Is

An index is a separate data structure that lets the database find rows by a column's value without scanning the whole table. The default index type in PostgreSQL is a B-tree, a balanced search tree. Lookups and range queries on a B-tree are logarithmic in the table size, so 10 million rows costs about 24 steps.

Every PRIMARY KEY and UNIQUE constraint automatically creates an index. Foreign keys do not (though you almost always want one; see below).

You create indexes manually with CREATE INDEX:

CREATE INDEX ON orders (user_id);
CREATE INDEX ON orders (status, created_at);
CREATE UNIQUE INDEX ON products (sku);

PostgreSQL auto-names indexes if you don't. If you want a name (for later DROP INDEX):

CREATE INDEX orders_user_created_idx ON orders (user_id, created_at DESC);

How to Tell an Index Is Being Used

EXPLAIN shows the query plan. EXPLAIN ANALYZE runs the query and shows actual timings.

EXPLAIN SELECT * FROM orders WHERE user_id = 1;

With no index:

 Seq Scan on orders  (cost=0.00..120.00 rows=3 width=52)
   Filter: (user_id = 1)

"Seq Scan" means "read every row". For three rows, that's fine. For 10 million, not fine.

After CREATE INDEX ON orders (user_id);:

 Index Scan using orders_user_id_idx on orders  (cost=0.15..8.17 rows=3 width=52)
   Index Cond: (user_id = 1)

"Index Scan" means the engine used the index.

Chapter 10 goes deep on reading plans. For now: if you see Seq Scan where you expected Index Scan, either the index doesn't exist, the planner thinks it's not worth using (usually when the table is tiny), or the query isn't written in a way the index can help.

Composite Indexes

An index on (a, b) can serve queries that filter on a, or on a AND b, but not on b alone. Think of a phone book sorted by last name, then first name. It's great for looking up "all the Joneses named Harry" or "all the Joneses". It doesn't help "find everyone named Harry".

Design composite indexes with the most-selective, most-frequently-filtered column first.

-- Queries filter by status, then optionally by date
CREATE INDEX ON orders (status, created_at DESC);

Partial Indexes

A partial index covers only rows matching a predicate. Much smaller, much faster, and often the right answer for status-like columns where one value dominates.

-- 95% of orders are 'shipped'. Only index the other 5%.
CREATE INDEX ON orders (created_at) WHERE status IN ('pending', 'paid');

Queries with the matching predicate automatically use the index:

SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 20;

Expression Indexes

Index an expression, not just a column:

CREATE INDEX ON users (LOWER(email));

SELECT * FROM users WHERE LOWER(email) = LOWER('Ada@Example.com');

The planner matches the indexed expression literally. WHERE email = ... won't use an index on LOWER(email); it needs WHERE LOWER(email) = ....

Other Index Types

Beyond B-tree, PostgreSQL ships four more index types. You won't need most of them often.

HASH    equality-only. B-tree does everything HASH does; skip it.
GIN     for JSONB, arrays, full-text search.
GiST    for geometric data, ranges, nearest-neighbor.
BRIN    for huge, naturally-ordered tables (time series logs).

The most common non-default: GIN for JSONB:

CREATE INDEX ON events USING GIN (payload jsonb_path_ops);

SELECT * FROM events WHERE payload @> '{"action": "checkout"}';

Foreign Keys and Indexes

PostgreSQL doesn't automatically index the referencing side of a foreign key. You almost always want one, because:

Queries that join via the FK will do seq scans otherwise.
ON DELETE and ON UPDATE on the parent need to find matching children; without an index, that's a full scan every time.

CREATE INDEX ON orders (user_id);           -- orders.user_id → users.id
CREATE INDEX ON order_items (product_id);   -- order_items.product_id → products.id

When NOT to Index

Every index is data the database keeps up to date. Every INSERT, UPDATE of an indexed column, and DELETE writes to the index too. Too many indexes and writes slow down; the working set no longer fits in memory.

Skip indexes when:

The table is small (a few thousand rows). Seq Scan is often faster than index lookup.
The column isn't used in WHERE, JOIN, or ORDER BY.
The column has very low cardinality (e.g. a boolean where 50% are true). The index rarely helps; the planner usually ignores it.
You already have a composite index that starts with that column.

Inspect existing indexes:

\d+ orders    -- in psql, shows columns and indexes

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'orders';

Find unused indexes over time:

SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

Indexes with idx_scan = 0 after weeks of traffic are candidates for removal.

Covering Indexes (INCLUDE)

PostgreSQL 11+ supports INCLUDE columns on an index. These aren't part of the index's search key but are stored alongside it, letting the planner answer some queries from the index alone ("index-only scan").

CREATE INDEX ON orders (user_id) INCLUDE (total, status);

-- Can be answered without touching the table
SELECT total, status FROM orders WHERE user_id = 1;

Useful for hot queries on wide tables. Don't reach for it until EXPLAIN ANALYZE shows heap fetches as the bottleneck.

Common Pitfalls

Indexing every column. Each index costs writes. Measure before adding.

Leading wildcard on LIKE. WHERE name LIKE '%ada%' can't use a B-tree index. Use pg_trgm with a GIN index, or a full-text search index.

Function on the indexed column. WHERE LOWER(email) = ... won't use an index on email. You need an expression index on LOWER(email).

Type mismatch. WHERE phone = 1234567890 when phone is TEXT forces a cast and skips the index. Quote the number.

Missing FK index. The most common performance bug: a foreign key with no index on the child side. Queries like "every order for this user" scan the whole orders table.

Next Steps

Continue to 08-transactions.md to keep data consistent under concurrent writes.