SQL Basics: Reading Data
This chapter teaches you to pull data out of a database with confidence: SELECT, WHERE, ORDER BY, LIMIT, expressions, and the fiddly bits of NULL.
The Anatomy of a SELECT
Every SELECT statement has the same skeleton:
SELECT <columns>
FROM <table>
WHERE <row filter>
ORDER BY <sort expression>
LIMIT <row count>
OFFSET <row count>;
Only SELECT and FROM are required. The rest are optional but keep the same order. Put them out of order and the parser gives up.
Start with the sample users table from Chapter 1. If you don't have it yet:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
INSERT INTO users (email, name) VALUES
('ada@example.com', 'Ada Lovelace'),
('grace@example.com', 'Grace Hopper'),
('alan@example.com', 'Alan Turing'),
('margaret@example.com', 'Margaret Hamilton'),
('katherine@example.com', 'Katherine Johnson');
Picking Columns
Select everything:
SELECT * FROM users;
SELECT * is fine in the shell. In application code, name the columns you want. It documents intent, and it doesn't break when someone adds a new column.
SELECT id, email, name FROM users;
You can rename columns in the output with AS. The AS keyword is optional but reads better:
SELECT email AS address, name AS full_name
FROM users;
Filtering with WHERE
WHERE keeps rows where the expression is TRUE. Rows where it's FALSE or NULL drop out.
SELECT name
FROM users
WHERE email = 'ada@example.com';
Common comparisons:
SELECT name, created_at FROM users
WHERE created_at > NOW() - INTERVAL '7 days';
SELECT * FROM users
WHERE name LIKE 'A%'; -- starts with A
SELECT * FROM users
WHERE name ILIKE '%hopper%'; -- case-insensitive contains
SELECT * FROM users
WHERE id IN (1, 3, 5);
SELECT * FROM users
WHERE id BETWEEN 2 AND 4; -- inclusive on both sides
Combine with AND, OR, NOT. Use parentheses when in doubt; SQL's precedence surprises people.
SELECT *
FROM users
WHERE (name LIKE 'A%' OR name LIKE 'G%')
AND created_at > '2024-01-01';
NULL Is Not a Value
NULL means "unknown". That has two consequences that catch everyone:
- You cannot compare to
NULLwith=. UseIS NULLorIS NOT NULL. NULLin any arithmetic or comparison producesNULL, whichWHEREtreats as false.
-- Does nothing useful. NULL = NULL is NULL, not TRUE.
SELECT * FROM users WHERE email = NULL;
-- This is what you actually want.
SELECT * FROM users WHERE email IS NULL;
COALESCE turns NULL into something concrete:
SELECT name, COALESCE(email, 'no email on file') AS contact
FROM users;
COALESCE(a, b, c) returns the first non-null argument. Think of it as "try a, then b, then c".
Sorting with ORDER BY
SELECT name FROM users ORDER BY name; -- ascending
SELECT name FROM users ORDER BY name DESC; -- descending
SELECT name, created_at FROM users
ORDER BY created_at DESC, name ASC; -- multi-key
Sorting on an expression works too:
SELECT name FROM users ORDER BY LENGTH(name);
NULL sorts last by default in PostgreSQL for ascending order, first for descending. If you care, be explicit:
SELECT name, updated_at FROM users
ORDER BY updated_at DESC NULLS LAST;
Paging with LIMIT and OFFSET
SELECT name FROM users ORDER BY created_at DESC LIMIT 10;
SELECT name FROM users ORDER BY created_at DESC LIMIT 10 OFFSET 20;
LIMIT 10 OFFSET 20 means "skip the first 20, return the next 10". Good for paging UI. Bad for paging through millions of rows, because each page re-scans everything it's skipping. Chapter 10 covers keyset pagination, which fixes this.
Expressions in the SELECT List
You are not limited to raw columns. Any expression works:
SELECT name,
LENGTH(name) AS name_length,
UPPER(name) AS shouty,
EXTRACT(YEAR FROM created_at) AS signup_year
FROM users;
Strings, numbers, dates, booleans all support rich operator libraries. A few you'll use constantly:
-- String concatenation
SELECT name || ' <' || email || '>' AS contact_line FROM users;
-- Arithmetic
SELECT id, id * 100 AS padded_id FROM users;
-- Conditional expression (the SQL version of if/else)
SELECT name,
CASE
WHEN created_at > NOW() - INTERVAL '30 days' THEN 'new'
WHEN created_at > NOW() - INTERVAL '1 year' THEN 'regular'
ELSE 'veteran'
END AS tier
FROM users;
DISTINCT and Deduplication
SELECT DISTINCT removes duplicate rows from the result:
SELECT DISTINCT LEFT(email, 1) AS first_letter
FROM users
ORDER BY first_letter;
DISTINCT applies to the full row of selected columns, not just the first. If you want "one row per X", use GROUP BY (Chapter 6) or window functions, not DISTINCT.
Common Pitfalls
Quoting strings vs identifiers. Single quotes are for string literals. Double quotes are for identifiers (column and table names, often case-sensitive). This matters:
-- Correct: string literal
SELECT * FROM users WHERE name = 'Ada Lovelace';
-- Wrong: looks for a column called "Ada Lovelace"
SELECT * FROM users WHERE name = "Ada Lovelace";
Comparing TEXT to a differently-cased version. 'Ada' = 'ada' is false. Use LOWER() or ILIKE when you want case-insensitivity:
SELECT * FROM users WHERE LOWER(name) = LOWER('ada lovelace');
Forgetting the semicolon. psql will happily wait forever. If the prompt changes from learning=# to learning-#, psql thinks your statement isn't finished. Add the semicolon.
Next Steps
Continue to 03-schema-design.md to design tables that enforce their own rules.