Relational Model and SQL

Learning Objectives

By the end of this reading, you will be able to:

  • Understand the mathematical foundations of the relational model
  • Write complex SQL queries using SELECT, WHERE, and ORDER BY
  • Perform JOIN operations (INNER, LEFT, RIGHT, FULL OUTER)
  • Use aggregate functions and GROUP BY for data analysis
  • Write subqueries and correlated subqueries
  • Understand set operations (UNION, INTERSECT, EXCEPT)
  • Apply NULL handling in queries
  • Optimize query writing for readability and performance

Introduction

Structured Query Language (SQL) is the standard language for interacting with relational databases. Developed in the 1970s at IBM, SQL has become the lingua franca of data management. In this reading, we'll explore SQL in depth, from basic queries to complex multi-table operations.

The Relational Model: Formal Foundations

Relations as Sets

In the relational model, a relation is a set of tuples. Each tuple is an ordered set of attribute values.

Mathematically:

  • Given domains D₁, D₂, ..., Dₙ
  • A relation R is a subset of D₁ × D₂ × ... × Dₙ
  • Each element in R is a tuple (d₁, d₂, ..., dₙ) where dᵢ ∈ Dᵢ

Properties:

  1. No duplicate tuples: Each row is unique
  2. No ordering of tuples: Rows have no inherent order
  3. No ordering of attributes: Columns are accessed by name
  4. Atomic values: Each cell contains a single value

Relational Algebra Operations

Relational algebra provides the theoretical foundation for SQL:

  • Selection (σ): Choose rows that satisfy a condition
  • Projection (π): Choose specific columns
  • Union (∪): Combine tuples from two relations
  • Difference (−): Tuples in one relation but not another
  • Cartesian Product (×): All combinations of tuples
  • Join (⋈): Combine related tuples from two relations

We'll see how these map to SQL operations throughout this reading.

Sample Database Schema

Let's work with an example database for an online bookstore:

-- Create the database
CREATE DATABASE Bookstore;
USE Bookstore;

-- Authors table
CREATE TABLE Authors (
    AuthorID INT PRIMARY KEY,
    FirstName VARCHAR(50) NOT NULL,
    LastName VARCHAR(50) NOT NULL,
    Country VARCHAR(50),
    BirthYear INT
);

-- Publishers table
CREATE TABLE Publishers (
    PublisherID INT PRIMARY KEY,
    PublisherName VARCHAR(100) NOT NULL,
    City VARCHAR(50),
    Country VARCHAR(50)
);

-- Books table
CREATE TABLE Books (
    BookID INT PRIMARY KEY,
    Title VARCHAR(200) NOT NULL,
    ISBN VARCHAR(13) UNIQUE,
    PublicationYear INT,
    Price DECIMAL(10,2),
    Pages INT,
    PublisherID INT,
    FOREIGN KEY (PublisherID) REFERENCES Publishers(PublisherID)
);

-- Book_Authors junction table (many-to-many)
CREATE TABLE Book_Authors (
    BookID INT,
    AuthorID INT,
    AuthorOrder INT,  -- 1 for primary author, 2 for second, etc.
    PRIMARY KEY (BookID, AuthorID),
    FOREIGN KEY (BookID) REFERENCES Books(BookID),
    FOREIGN KEY (AuthorID) REFERENCES Authors(AuthorID)
);

-- Customers table
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    FirstName VARCHAR(50) NOT NULL,
    LastName VARCHAR(50) NOT NULL,
    Email VARCHAR(100) UNIQUE,
    City VARCHAR(50),
    Country VARCHAR(50),
    RegistrationDate DATE
);

-- Orders table
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10,2),
    Status VARCHAR(20),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

-- Order_Items table
CREATE TABLE Order_Items (
    OrderID INT,
    BookID INT,
    Quantity INT,
    Price DECIMAL(10,2),  -- Price at time of order
    PRIMARY KEY (OrderID, BookID),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID),
    FOREIGN KEY (BookID) REFERENCES Books(BookID)
);

-- Reviews table
CREATE TABLE Reviews (
    ReviewID INT PRIMARY KEY,
    BookID INT,
    CustomerID INT,
    Rating INT CHECK (Rating >= 1 AND Rating <= 5),
    ReviewText TEXT,
    ReviewDate DATE,
    FOREIGN KEY (BookID) REFERENCES Books(BookID),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

-- Insert sample data
INSERT INTO Authors VALUES
    (1, 'J.K.', 'Rowling', 'UK', 1965),
    (2, 'George', 'Orwell', 'UK', 1903),
    (3, 'Jane', 'Austen', 'UK', 1775),
    (4, 'Mark', 'Twain', 'USA', 1835),
    (5, 'Ernest', 'Hemingway', 'USA', 1899),
    (6, 'Agatha', 'Christie', 'UK', 1890),
    (7, 'Isaac', 'Asimov', 'USA', 1920),
    (8, 'Arthur', 'Doyle', 'UK', 1859);

INSERT INTO Publishers VALUES
    (1, 'Penguin Random House', 'New York', 'USA'),
    (2, 'HarperCollins', 'New York', 'USA'),
    (3, 'Simon & Schuster', 'New York', 'USA'),
    (4, 'Hachette', 'Paris', 'France'),
    (5, 'Macmillan', 'London', 'UK');

INSERT INTO Books VALUES
    (1, 'Harry Potter and the Philosopher''s Stone', '9780747532699', 1997, 19.99, 223, 1),
    (2, '1984', '9780451524935', 1949, 15.99, 328, 2),
    (3, 'Animal Farm', '9780451526342', 1945, 12.99, 112, 2),
    (4, 'Pride and Prejudice', '9780141439518', 1813, 10.99, 279, 3),
    (5, 'Adventures of Huckleberry Finn', '9780486280615', 1884, 8.99, 224, 3),
    (6, 'The Old Man and the Sea', '9780684801223', 1952, 14.99, 127, 4),
    (7, 'Murder on the Orient Express', '9780062693662', 1934, 16.99, 256, 2),
    (8, 'Foundation', '9780553293357', 1951, 18.99, 255, 5),
    (9, 'The Adventures of Sherlock Holmes', '9780486474915', 1892, 11.99, 307, 3);

INSERT INTO Book_Authors VALUES
    (1, 1, 1), (2, 2, 1), (3, 2, 1), (4, 3, 1),
    (5, 4, 1), (6, 5, 1), (7, 6, 1), (8, 7, 1), (9, 8, 1);

INSERT INTO Customers VALUES
    (1, 'Alice', 'Johnson', 'alice@email.com', 'New York', 'USA', '2022-01-15'),
    (2, 'Bob', 'Smith', 'bob@email.com', 'London', 'UK', '2022-03-20'),
    (3, 'Carol', 'Davis', 'carol@email.com', 'Toronto', 'Canada', '2022-05-10'),
    (4, 'David', 'Wilson', 'david@email.com', 'Sydney', 'Australia', '2023-01-05'),
    (5, 'Eve', 'Brown', 'eve@email.com', 'Paris', 'France', '2023-02-14');

INSERT INTO Orders VALUES
    (1, 1, '2023-06-15', 35.98, 'Delivered'),
    (2, 2, '2023-07-20', 15.99, 'Delivered'),
    (3, 1, '2023-08-05', 51.97, 'Shipped'),
    (4, 3, '2023-08-10', 12.99, 'Processing'),
    (5, 4, '2023-09-01', 33.98, 'Delivered');

INSERT INTO Order_Items VALUES
    (1, 1, 1, 19.99), (1, 2, 1, 15.99),
    (2, 2, 1, 15.99),
    (3, 1, 1, 19.99), (3, 7, 1, 16.99), (3, 8, 1, 14.99),
    (4, 3, 1, 12.99),
    (5, 6, 1, 14.99), (5, 7, 1, 18.99);

INSERT INTO Reviews VALUES
    (1, 1, 1, 5, 'Amazing book! Could not put it down.', '2023-06-20'),
    (2, 2, 1, 4, 'Thought-provoking and still relevant today.', '2023-06-22'),
    (3, 2, 2, 5, 'A masterpiece of dystopian fiction.', '2023-07-25'),
    (4, 7, 1, 5, 'Classic Christie at her best!', '2023-08-10'),
    (5, 1, 4, 4, 'Great start to the series.', '2023-09-05');

Basic SELECT Queries

Simple SELECT

The SELECT statement retrieves data from one or more tables. In relational algebra, this combines projection (π) and selection (σ).

-- Select all columns
SELECT * FROM Books;

-- Select specific columns (Projection: π)
SELECT Title, Price FROM Books;

-- Select with WHERE clause (Selection: σ)
SELECT Title, Price
FROM Books
WHERE Price < 15;

-- Multiple conditions
SELECT Title, PublicationYear
FROM Books
WHERE PublicationYear > 1900 AND Price < 20;

-- Pattern matching with LIKE
SELECT Title, AuthorID
FROM Books b
JOIN Book_Authors ba ON b.BookID = ba.BookID
WHERE Title LIKE '%Sherlock%';

-- IN operator
SELECT FirstName, LastName
FROM Authors
WHERE Country IN ('UK', 'USA');

-- BETWEEN operator
SELECT Title, Price
FROM Books
WHERE Price BETWEEN 10 AND 20;

Column Aliases

-- Column aliases for readability
SELECT
    Title AS BookTitle,
    Price AS CurrentPrice,
    Price * 0.9 AS DiscountedPrice
FROM Books;

-- Calculated fields
SELECT
    FirstName,
    LastName,
    2024 - BirthYear AS Age
FROM Authors;

DISTINCT

-- Remove duplicates
SELECT DISTINCT Country FROM Authors;

-- Count distinct values
SELECT COUNT(DISTINCT Country) AS NumberOfCountries
FROM Authors;

ORDER BY

-- Sort by one column
SELECT Title, Price
FROM Books
ORDER BY Price;

-- Descending order
SELECT Title, Price
FROM Books
ORDER BY Price DESC;

-- Multiple columns
SELECT Title, PublicationYear, Price
FROM Books
ORDER BY PublicationYear DESC, Price ASC;

-- Sort by calculated field
SELECT
    Title,
    Price,
    Pages,
    Price / Pages AS PricePerPage
FROM Books
ORDER BY PricePerPage;

LIMIT and OFFSET

-- Get top 5 most expensive books
SELECT Title, Price
FROM Books
ORDER BY Price DESC
LIMIT 5;

-- Pagination: Get books 6-10
SELECT Title, Price
FROM Books
ORDER BY Price DESC
LIMIT 5 OFFSET 5;

Aggregate Functions

Aggregate functions perform calculations on sets of rows.

-- COUNT: Number of rows
SELECT COUNT(*) AS TotalBooks FROM Books;
SELECT COUNT(DISTINCT PublisherID) AS NumberOfPublishers FROM Books;

-- SUM: Total
SELECT SUM(TotalAmount) AS TotalRevenue FROM Orders;

-- AVG: Average
SELECT AVG(Price) AS AveragePrice FROM Books;

-- MIN and MAX
SELECT
    MIN(Price) AS CheapestBook,
    MAX(Price) AS MostExpensiveBook
FROM Books;

-- Multiple aggregates
SELECT
    COUNT(*) AS TotalBooks,
    AVG(Price) AS AvgPrice,
    MIN(Price) AS MinPrice,
    MAX(Price) AS MaxPrice,
    SUM(Pages) AS TotalPages
FROM Books;

GROUP BY and HAVING

GROUP BY groups rows with the same values in specified columns.

-- Books per publisher
SELECT
    PublisherID,
    COUNT(*) AS BookCount
FROM Books
GROUP BY PublisherID;

-- Average price by publication decade
SELECT
    FLOOR(PublicationYear / 10) * 10 AS Decade,
    AVG(Price) AS AvgPrice,
    COUNT(*) AS BookCount
FROM Books
GROUP BY FLOOR(PublicationYear / 10) * 10
ORDER BY Decade;

-- Orders per customer
SELECT
    CustomerID,
    COUNT(*) AS OrderCount,
    SUM(TotalAmount) AS TotalSpent,
    AVG(TotalAmount) AS AvgOrderValue
FROM Orders
GROUP BY CustomerID
ORDER BY TotalSpent DESC;

-- HAVING: Filter groups (like WHERE but for groups)
SELECT
    PublisherID,
    COUNT(*) AS BookCount,
    AVG(Price) AS AvgPrice
FROM Books
GROUP BY PublisherID
HAVING COUNT(*) > 1;

-- Combine WHERE and HAVING
-- WHERE filters rows before grouping
-- HAVING filters groups after aggregation
SELECT
    PublisherID,
    AVG(Price) AS AvgPrice
FROM Books
WHERE PublicationYear > 1900  -- Filter books first
GROUP BY PublisherID
HAVING AVG(Price) > 15        -- Then filter groups
ORDER BY AvgPrice DESC;

JOIN Operations

JOINs combine rows from multiple tables based on related columns.

INNER JOIN

Returns only rows with matches in both tables.

-- Books with publisher information
SELECT
    b.Title,
    b.Price,
    p.PublisherName,
    p.City
FROM Books b
INNER JOIN Publishers p ON b.PublisherID = p.PublisherID;

-- Authors and their books
SELECT
    a.FirstName,
    a.LastName,
    b.Title,
    b.PublicationYear
FROM Authors a
INNER JOIN Book_Authors ba ON a.AuthorID = ba.AuthorID
INNER JOIN Books b ON ba.BookID = b.BookID
ORDER BY a.LastName, b.PublicationYear;

-- Multiple joins: Customers, orders, and books
SELECT
    c.FirstName,
    c.LastName,
    o.OrderDate,
    b.Title,
    oi.Quantity,
    oi.Price
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID
INNER JOIN Order_Items oi ON o.OrderID = oi.OrderID
INNER JOIN Books b ON oi.BookID = b.BookID
ORDER BY o.OrderDate;

LEFT JOIN (LEFT OUTER JOIN)

Returns all rows from the left table, with matching rows from the right table (or NULL if no match).

-- All books, with reviews if they exist
SELECT
    b.Title,
    r.Rating,
    r.ReviewText
FROM Books b
LEFT JOIN Reviews r ON b.BookID = r.BookID;

-- Count reviews per book (including books with no reviews)
SELECT
    b.Title,
    COUNT(r.ReviewID) AS ReviewCount,
    AVG(r.Rating) AS AvgRating
FROM Books b
LEFT JOIN Reviews r ON b.BookID = r.BookID
GROUP BY b.BookID, b.Title
ORDER BY ReviewCount DESC;

-- Find books with no reviews
SELECT
    b.Title,
    b.Price
FROM Books b
LEFT JOIN Reviews r ON b.BookID = r.BookID
WHERE r.ReviewID IS NULL;

RIGHT JOIN (RIGHT OUTER JOIN)

Returns all rows from the right table, with matching rows from the left table.

-- All customers with their order count (including customers with no orders)
SELECT
    c.FirstName,
    c.LastName,
    COUNT(o.OrderID) AS OrderCount
FROM Orders o
RIGHT JOIN Customers c ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID, c.FirstName, c.LastName;

FULL OUTER JOIN

Returns all rows from both tables, with NULLs where there's no match.

-- All books and all reviews (including unmatched)
SELECT
    b.Title,
    r.Rating,
    r.ReviewText
FROM Books b
FULL OUTER JOIN Reviews r ON b.BookID = r.BookID;

-- Note: MySQL doesn't support FULL OUTER JOIN directly
-- Workaround using UNION:
SELECT b.Title, r.Rating, r.ReviewText
FROM Books b LEFT JOIN Reviews r ON b.BookID = r.BookID
UNION
SELECT b.Title, r.Rating, r.ReviewText
FROM Books b RIGHT JOIN Reviews r ON b.BookID = r.BookID;

CROSS JOIN

Returns the Cartesian product of two tables (all possible combinations).

-- All combinations of authors and publishers
SELECT
    a.LastName AS Author,
    p.PublisherName AS Publisher
FROM Authors a
CROSS JOIN Publishers p;

-- Practical use: Generate all possible combinations for analysis
SELECT
    c.Country AS CustomerCountry,
    p.Country AS PublisherCountry,
    COUNT(*) AS Combinations
FROM Customers c
CROSS JOIN Publishers p
GROUP BY c.Country, p.Country;

SELF JOIN

Join a table to itself.

-- Find authors from the same country
SELECT
    a1.FirstName || ' ' || a1.LastName AS Author1,
    a2.FirstName || ' ' || a2.LastName AS Author2,
    a1.Country
FROM Authors a1
INNER JOIN Authors a2
    ON a1.Country = a2.Country
    AND a1.AuthorID < a2.AuthorID  -- Avoid duplicates and self-matches
ORDER BY a1.Country;

Subqueries

A subquery is a query nested within another query.

Subqueries in WHERE Clause

-- Books more expensive than average
SELECT Title, Price
FROM Books
WHERE Price > (SELECT AVG(Price) FROM Books);

-- Books from publishers in the USA
SELECT Title, Price
FROM Books
WHERE PublisherID IN (
    SELECT PublisherID
    FROM Publishers
    WHERE Country = 'USA'
);

-- Customers who have placed orders
SELECT FirstName, LastName
FROM Customers
WHERE CustomerID IN (
    SELECT DISTINCT CustomerID FROM Orders
);

-- Books with above-average ratings
SELECT b.Title, AVG(r.Rating) AS AvgRating
FROM Books b
JOIN Reviews r ON b.BookID = r.BookID
GROUP BY b.BookID, b.Title
HAVING AVG(r.Rating) > (
    SELECT AVG(Rating) FROM Reviews
);

Subqueries in FROM Clause (Derived Tables)

-- Average price by publisher (using derived table)
SELECT
    p.PublisherName,
    BookStats.AvgPrice,
    BookStats.BookCount
FROM Publishers p
JOIN (
    SELECT
        PublisherID,
        AVG(Price) AS AvgPrice,
        COUNT(*) AS BookCount
    FROM Books
    GROUP BY PublisherID
) AS BookStats ON p.PublisherID = BookStats.PublisherID;

-- Top 3 customers by spending
SELECT CustomerName, TotalSpent
FROM (
    SELECT
        c.FirstName || ' ' || c.LastName AS CustomerName,
        SUM(o.TotalAmount) AS TotalSpent
    FROM Customers c
    JOIN Orders o ON c.CustomerID = o.CustomerID
    GROUP BY c.CustomerID, c.FirstName, c.LastName
    ORDER BY TotalSpent DESC
    LIMIT 3
) AS TopCustomers;

Correlated Subqueries

A correlated subquery references columns from the outer query.

-- Books more expensive than the average for their publisher
SELECT
    b.Title,
    b.Price,
    b.PublisherID
FROM Books b
WHERE b.Price > (
    SELECT AVG(Price)
    FROM Books b2
    WHERE b2.PublisherID = b.PublisherID
);

-- Customers with above-average order values
SELECT
    c.FirstName,
    c.LastName,
    (SELECT AVG(TotalAmount)
     FROM Orders o
     WHERE o.CustomerID = c.CustomerID) AS AvgOrderValue
FROM Customers c
WHERE (
    SELECT AVG(TotalAmount)
    FROM Orders o
    WHERE o.CustomerID = c.CustomerID
) > (
    SELECT AVG(TotalAmount) FROM Orders
);

-- EXISTS: Check if subquery returns any rows
SELECT Title, Price
FROM Books b
WHERE EXISTS (
    SELECT 1
    FROM Reviews r
    WHERE r.BookID = b.BookID AND r.Rating = 5
);

-- NOT EXISTS: Find books with no reviews
SELECT Title, Price
FROM Books b
WHERE NOT EXISTS (
    SELECT 1
    FROM Reviews r
    WHERE r.BookID = b.BookID
);

Set Operations

Combine results from multiple queries.

UNION

Combines results and removes duplicates.

-- All countries where we have authors or publishers
SELECT Country FROM Authors
UNION
SELECT Country FROM Publishers;

-- UNION ALL: Keep duplicates
SELECT Country FROM Authors
UNION ALL
SELECT Country FROM Publishers;

INTERSECT

Returns rows that appear in both queries.

-- Countries with both authors and publishers
SELECT Country FROM Authors
INTERSECT
SELECT Country FROM Publishers;

-- MySQL alternative (no native INTERSECT):
SELECT DISTINCT a.Country
FROM Authors a
INNER JOIN Publishers p ON a.Country = p.Country;

EXCEPT (or MINUS)

Returns rows from the first query that aren't in the second.

-- Countries with authors but no publishers
SELECT Country FROM Authors
EXCEPT
SELECT Country FROM Publishers;

-- MySQL alternative (no native EXCEPT):
SELECT DISTINCT Country
FROM Authors
WHERE Country NOT IN (SELECT Country FROM Publishers);

NULL Handling

NULL represents missing or unknown data.

-- Find authors without birth year
SELECT FirstName, LastName
FROM Authors
WHERE BirthYear IS NULL;

-- NOT NULL
SELECT FirstName, LastName
FROM Authors
WHERE BirthYear IS NOT NULL;

-- COALESCE: Return first non-NULL value
SELECT
    Title,
    COALESCE(Price, 0) AS Price  -- If Price is NULL, return 0
FROM Books;

-- NULL in comparisons
-- NULL = NULL is not TRUE (it's NULL)
-- Always use IS NULL or IS NOT NULL

-- NULLIF: Return NULL if two values are equal
SELECT
    Title,
    NULLIF(Pages, 0) AS Pages  -- Return NULL if Pages is 0
FROM Books;

Advanced Query Examples

Window Functions (Analytics)

Window functions perform calculations across rows related to the current row.

-- Rank books by price
SELECT
    Title,
    Price,
    RANK() OVER (ORDER BY Price DESC) AS PriceRank,
    DENSE_RANK() OVER (ORDER BY Price DESC) AS DensePriceRank
FROM Books;

-- Running total of order amounts
SELECT
    OrderID,
    OrderDate,
    TotalAmount,
    SUM(TotalAmount) OVER (ORDER BY OrderDate) AS RunningTotal
FROM Orders;

-- Average price per publisher with comparison
SELECT
    b.Title,
    b.Price,
    p.PublisherName,
    AVG(Price) OVER (PARTITION BY b.PublisherID) AS AvgPriceForPublisher,
    b.Price - AVG(Price) OVER (PARTITION BY b.PublisherID) AS DifferenceFromAvg
FROM Books b
JOIN Publishers p ON b.PublisherID = p.PublisherID;

Common Table Expressions (CTEs)

CTEs make complex queries more readable.

-- Calculate order statistics
WITH OrderStats AS (
    SELECT
        CustomerID,
        COUNT(*) AS OrderCount,
        SUM(TotalAmount) AS TotalSpent,
        AVG(TotalAmount) AS AvgOrderValue
    FROM Orders
    GROUP BY CustomerID
)
SELECT
    c.FirstName,
    c.LastName,
    os.OrderCount,
    os.TotalSpent,
    os.AvgOrderValue
FROM Customers c
JOIN OrderStats os ON c.CustomerID = os.CustomerID
WHERE os.TotalSpent > 30
ORDER BY os.TotalSpent DESC;

-- Multiple CTEs
WITH
    BookSales AS (
        SELECT
            oi.BookID,
            SUM(oi.Quantity) AS TotalSold,
            SUM(oi.Quantity * oi.Price) AS Revenue
        FROM Order_Items oi
        GROUP BY oi.BookID
    ),
    BookRatings AS (
        SELECT
            BookID,
            AVG(Rating) AS AvgRating,
            COUNT(*) AS ReviewCount
        FROM Reviews
        GROUP BY BookID
    )
SELECT
    b.Title,
    COALESCE(bs.TotalSold, 0) AS TotalSold,
    COALESCE(bs.Revenue, 0) AS Revenue,
    COALESCE(br.AvgRating, 0) AS AvgRating,
    COALESCE(br.ReviewCount, 0) AS ReviewCount
FROM Books b
LEFT JOIN BookSales bs ON b.BookID = bs.BookID
LEFT JOIN BookRatings br ON b.BookID = br.BookID
ORDER BY Revenue DESC;

Recursive CTEs

Useful for hierarchical data.

-- Example with employee hierarchy
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    Name VARCHAR(100),
    ManagerID INT,
    FOREIGN KEY (ManagerID) REFERENCES Employees(EmployeeID)
);

-- Find all subordinates of a manager
WITH RECURSIVE EmployeeHierarchy AS (
    -- Base case: Start with the CEO (no manager)
    SELECT EmployeeID, Name, ManagerID, 1 AS Level
    FROM Employees
    WHERE ManagerID IS NULL

    UNION ALL

    -- Recursive case: Find direct reports
    SELECT e.EmployeeID, e.Name, e.ManagerID, eh.Level + 1
    FROM Employees e
    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT * FROM EmployeeHierarchy;

Query Optimization Tips

Use Indexes Wisely

-- We'll cover this in detail in the indexing module
-- For now, understand that WHERE, JOIN, and ORDER BY columns benefit from indexes
SELECT Title, Price
FROM Books
WHERE PublisherID = 3;  -- Index on PublisherID helps here

Select Only Needed Columns

-- Bad: Retrieves unnecessary data
SELECT * FROM Books;

-- Good: Only get what you need
SELECT Title, Price FROM Books;

Use EXISTS Instead of COUNT

-- Less efficient
SELECT *
FROM Customers c
WHERE (SELECT COUNT(*) FROM Orders o WHERE o.CustomerID = c.CustomerID) > 0;

-- More efficient
SELECT *
FROM Customers c
WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.CustomerID = c.CustomerID);

Avoid Functions on Indexed Columns in WHERE

-- Bad: Function prevents index use
SELECT * FROM Books
WHERE YEAR(PublicationYear) = 2000;

-- Good: Rewrite without function
SELECT * FROM Books
WHERE PublicationYear BETWEEN 2000 AND 2000;

Exercises

Basic Exercises

  1. Simple SELECT: Write a query to find all books published after 1950.

  2. Pattern Matching: Find all authors whose last name starts with 'D'.

  3. Aggregate Functions: Calculate the total number of pages across all books.

  4. ORDER BY: List all books ordered by price (most expensive first), then by title alphabetically.

  5. DISTINCT: Find all unique countries where customers are located.

Intermediate Exercises

  1. JOIN: Write a query to show each book's title, author name, and publisher name.

  2. GROUP BY: Find the total revenue generated by each book.

  3. HAVING: List publishers that have published more than 2 books, showing the publisher name and book count.

  4. LEFT JOIN: Show all customers and the number of orders they've placed (include customers with zero orders).

  5. Subquery: Find all books that are more expensive than the most expensive book published by HarperCollins.

  6. CASE Statement: Categorize books as 'Budget' (< $12), 'Standard' ($12-$17), or 'Premium' (> $17).

-- Template:
SELECT
    Title,
    Price,
    CASE
        WHEN Price < 12 THEN 'Budget'
        WHEN Price <= 17 THEN 'Standard'
        ELSE 'Premium'
    END AS PriceCategory
FROM Books;

Advanced Exercises

  1. Complex JOIN: Write a query to show:

    • Customer name
    • Number of orders
    • Total amount spent
    • Average order value
    • Most recent order date

    Only include customers who have placed at least one order.

  2. Correlated Subquery: For each book, show its title, price, and how much more expensive it is than the cheapest book from the same publisher.

  3. CTE: Using a Common Table Expression, find the top 3 bestselling books and show:

    • Book title
    • Total quantity sold
    • Total revenue
    • Number of different customers who bought it
  4. Window Function: For each order, show:

    • Order ID
    • Order date
    • Total amount
    • A running total of all orders up to that date
    • The rank of this order by amount
  5. Multi-Table Analysis: Create a report showing:

    • Publisher name
    • Number of books published
    • Average book price
    • Total revenue from book sales
    • Number of 5-star reviews received

    Only include publishers with at least one book that has been reviewed.

  6. Set Operations: Find customers who have reviewed books but never ordered any books.

  7. Complex Aggregation: For each author, calculate:

    • Total number of books written
    • Average price of their books
    • Total number of reviews their books have received
    • Average rating of their books
    • Whether they've written any bestsellers (books sold more than 2 copies)

Summary

In this reading, we explored SQL in depth:

  • Basic SELECT queries with WHERE, ORDER BY, and LIMIT
  • Aggregate functions (COUNT, SUM, AVG, MIN, MAX) for data analysis
  • GROUP BY for grouping data and HAVING for filtering groups
  • JOIN operations (INNER, LEFT, RIGHT, FULL OUTER, CROSS, SELF) for combining tables
  • Subqueries in WHERE, FROM, and SELECT clauses
  • Correlated subqueries that reference the outer query
  • Set operations (UNION, INTERSECT, EXCEPT) for combining results
  • NULL handling with IS NULL, COALESCE, and NULLIF
  • Advanced features like window functions and CTEs
  • Query optimization techniques for better performance

Mastering SQL is essential for working with relational databases. Practice writing queries regularly to build intuition for when to use each technique.

Next Reading

Continue to 03-normalization.md to learn about database design principles, functional dependencies, and normal forms (1NF, 2NF, 3NF, BCNF) to create well-structured databases.