Indexing and Query Optimization

Learning Objectives

By the end of this reading, you will be able to:

  • Understand what indexes are and why they improve query performance
  • Explain how B-tree and B+ tree indexes work
  • Understand hash indexes and their use cases
  • Create and manage indexes in SQL
  • Analyze query execution plans using EXPLAIN
  • Identify queries that benefit from indexing
  • Recognize the trade-offs of indexing
  • Optimize queries for better performance
  • Understand covering indexes and composite indexes
  • Apply indexing best practices

Introduction

As databases grow, query performance becomes critical. A query that takes milliseconds on 1,000 rows might take seconds or minutes on millions of rows. Indexing is the primary technique for improving query performance.

An index is a data structure that improves the speed of data retrieval operations on a database table. Think of it like a book's index: instead of reading every page to find a topic, you check the index and jump directly to the relevant pages.

Why Indexes Matter

Without an Index: Full Table Scan

-- Table with 1 million rows
SELECT * FROM Customers WHERE Email = 'alice@email.com';

Without an index:

  • Database scans all 1,000,000 rows
  • Time complexity: O(n)
  • Slow and inefficient

With an Index: Direct Lookup

With an index on Email:

  • Database uses index to find matching rows directly
  • Time complexity: O(log n) for B-tree or O(1) for hash
  • Fast and efficient

Performance Example

-- Create test table with 1 million rows
CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    Email VARCHAR(100),
    Name VARCHAR(100),
    City VARCHAR(50)
);

-- Insert 1 million rows (simulated)
-- Without index on Email
SELECT * FROM Users WHERE Email = 'user123@email.com';
-- Time: ~500ms (full table scan)

-- Create index
CREATE INDEX idx_email ON Users(Email);

-- With index
SELECT * FROM Users WHERE Email = 'user123@email.com';
-- Time: ~5ms (index lookup)
-- 100x faster!

How Indexes Work: B-Tree

The most common index structure is the B-tree (Balanced Tree) or its variant B+ tree.

B-Tree Structure

                    [50]
                   /    \
              [25]        [75]
             /    \      /    \
         [10] [35] [60] [85]

Properties:

  • All leaf nodes are at the same depth (balanced)
  • Each node contains multiple keys (high branching factor)
  • Keys are sorted within nodes
  • Search time: O(log n)

B+ Tree Structure

Used by most databases (MySQL, PostgreSQL):

                    [50|75]  ← Internal nodes (keys only)
                   /    |    \
              [25|35] [60|70] [85|90]
                |       |       |
                ↓       ↓       ↓
    [10,15,20,25,30,35] → [50,55,60,65,70] → [75,80,85,90,95]
                          ← Leaf nodes (keys + data pointers) →
                             Linked list for range queries

Advantages of B+ tree:

  • All data in leaf nodes (efficient range queries)
  • Internal nodes have more keys (better fanout)
  • Leaf nodes linked (fast sequential access)

Example: Finding UserID = 65

  1. Start at root: 65 > 50 and 65 < 75 → go middle
  2. At [60|70]: 65 > 60 and 65 < 70 → go middle
  3. At leaf: scan [60,65,70] → found at position 2
  4. Follow pointer to row data

Operations: 3 (vs. 1,000,000 without index)

Hash Indexes

Hash indexes use a hash table for equality lookups.

Hash Index Structure

Hash Function: h(key) = key mod 10

Keys: [5, 15, 23, 31, 42, 55]

Hash Table:
Bucket 0:
Bucket 1: 31 → row pointer
Bucket 2: 42 → row pointer
Bucket 3: 23 → row pointer
Bucket 4:
Bucket 5: 5 → 15 → 55 → row pointers (chaining for collisions)
Bucket 6:
Bucket 7:
Bucket 8:
Bucket 9:

Hash vs. B-Tree

Hash Index:

  • Pros: O(1) equality lookups
  • Cons: No range queries, no sorting, no partial matches
-- Good for hash index
SELECT * FROM Users WHERE UserID = 12345;

-- Cannot use hash index
SELECT * FROM Users WHERE UserID > 12345;  -- Range query
SELECT * FROM Users WHERE UserID BETWEEN 100 AND 200;  -- Range
SELECT * FROM Users ORDER BY UserID;  -- Sorting

B-Tree Index:

  • Pros: Supports ranges, sorting, partial matches
  • Cons: O(log n) lookups (slower than hash for equality)
-- All these benefit from B-tree
SELECT * FROM Users WHERE UserID = 12345;  -- Equality
SELECT * FROM Users WHERE UserID > 12345;  -- Range
SELECT * FROM Users WHERE UserID BETWEEN 100 AND 200;  -- Range
SELECT * FROM Users ORDER BY UserID;  -- Sorting

Most databases default to B-tree indexes because they're more versatile.

Creating and Managing Indexes

Basic Index Creation

-- Single column index
CREATE INDEX idx_email ON Customers(Email);

-- Unique index (enforces uniqueness)
CREATE UNIQUE INDEX idx_unique_email ON Customers(Email);

-- Composite (multi-column) index
CREATE INDEX idx_name_city ON Customers(LastName, FirstName, City);

-- Partial index (PostgreSQL)
CREATE INDEX idx_active_customers ON Customers(CustomerID)
WHERE Active = TRUE;

-- Expression index (PostgreSQL)
CREATE INDEX idx_lower_email ON Customers(LOWER(Email));

Viewing Indexes

-- MySQL
SHOW INDEX FROM Customers;

-- PostgreSQL
\d Customers

-- Standard SQL
SELECT * FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_NAME = 'Customers';

Dropping Indexes

DROP INDEX idx_email ON Customers;  -- MySQL
DROP INDEX idx_email;                -- PostgreSQL

Index Types by Database

MySQL:

-- B-Tree (default)
CREATE INDEX idx_name ON Users(Name);

-- Hash (MEMORY tables only)
CREATE INDEX idx_id USING HASH ON Users(UserID);

-- Full-text index
CREATE FULLTEXT INDEX idx_description ON Products(Description);

-- Spatial index
CREATE SPATIAL INDEX idx_location ON Stores(Location);

PostgreSQL:

-- B-Tree (default)
CREATE INDEX idx_name ON Users(Name);

-- Hash
CREATE INDEX idx_id USING HASH ON Users(UserID);

-- GiST (Generalized Search Tree)
CREATE INDEX idx_location USING GIST ON Stores(Location);

-- GIN (Generalized Inverted Index) - for arrays, JSON
CREATE INDEX idx_tags USING GIN ON Articles(Tags);

Composite Indexes

A composite index includes multiple columns.

Column Order Matters

CREATE INDEX idx_lastname_firstname ON Customers(LastName, FirstName);

This index can optimize:

-- Uses index (matches leftmost prefix)
SELECT * FROM Customers WHERE LastName = 'Smith';
SELECT * FROM Customers WHERE LastName = 'Smith' AND FirstName = 'John';

-- Uses index partially (only LastName)
SELECT * FROM Customers WHERE LastName = 'Smith' OR FirstName = 'John';

-- Cannot use index (doesn't start with LastName)
SELECT * FROM Customers WHERE FirstName = 'John';

Leftmost Prefix Rule

The index (A, B, C) can optimize queries on:

  • (A)
  • (A, B)
  • (A, B, C)

But NOT on:

  • (B)
  • (C)
  • (B, C)

Designing Composite Indexes

-- Frequently queried columns
SELECT * FROM Orders
WHERE CustomerID = 123 AND Status = 'Pending';

-- Create composite index matching the query
CREATE INDEX idx_customer_status ON Orders(CustomerID, Status);

-- Order matters: most selective column first
-- If CustomerID has high cardinality, put it first
CREATE INDEX idx_customer_status ON Orders(CustomerID, Status);

-- If Status has low cardinality (few values), put CustomerID first
-- This is already the case above

Example: Finding the Right Index

-- Sample queries
SELECT * FROM Orders WHERE CustomerID = 100;
SELECT * FROM Orders WHERE CustomerID = 100 AND OrderDate > '2024-01-01';
SELECT * FROM Orders WHERE CustomerID = 100 ORDER BY OrderDate DESC;

-- Optimal index supports all three queries
CREATE INDEX idx_customer_date ON Orders(CustomerID, OrderDate);

Covering Indexes

A covering index contains all columns needed for a query, avoiding table lookups.

-- Query needs CustomerID, OrderDate, TotalAmount
SELECT OrderDate, TotalAmount
FROM Orders
WHERE CustomerID = 100;

-- Index covers all columns in SELECT and WHERE
CREATE INDEX idx_covering ON Orders(CustomerID, OrderDate, TotalAmount);

-- Without covering index:
-- 1. Use index to find matching rows
-- 2. Look up table to get TotalAmount
-- Two operations

-- With covering index:
-- 1. Use index to find matching rows AND get TotalAmount
-- One operation (faster)

Include Columns (SQL Server, PostgreSQL 11+)

-- PostgreSQL INCLUDE syntax
CREATE INDEX idx_customer_covering ON Orders(CustomerID)
INCLUDE (OrderDate, TotalAmount);

-- CustomerID is in the tree structure (can be searched)
-- OrderDate and TotalAmount are only in leaf nodes (covering)

Query Execution Plans: EXPLAIN

The EXPLAIN command shows how the database executes a query.

Basic EXPLAIN

EXPLAIN SELECT * FROM Customers WHERE Email = 'alice@email.com';

-- Output (MySQL):
+----+-------------+-----------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-----------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | Customers | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+------+-------------+

-- type = ALL means full table scan (BAD)
-- rows = 1000 means scanning 1000 rows

After creating index:

CREATE INDEX idx_email ON Customers(Email);
EXPLAIN SELECT * FROM Customers WHERE Email = 'alice@email.com';

-- Output:
+----+-------------+-----------+------+---------------+-----------+---------+-------+------+-------+
| id | select_type | table     | type | possible_keys | key       | key_len | ref   | rows | Extra |
+----+-------------+-----------+------+---------------+-----------+---------+-------+------+-------+
|  1 | SIMPLE      | Customers | ref  | idx_email     | idx_email | 103     | const |    1 | NULL  |
+----+-------------+-----------+------+---------------+-----------+---------+-------+------+-------+

-- type = ref means index lookup (GOOD)
-- rows = 1 means only 1 row examined

EXPLAIN Output Fields

type (access method):

  • system: Table has only one row
  • const: Primary key or unique index lookup
  • eq_ref: Unique index used in join
  • ref: Non-unique index lookup
  • range: Index range scan (BETWEEN, >, <)
  • index: Full index scan
  • ALL: Full table scan (worst)

Best to worst: const > eq_ref > ref > range > index > ALL

possible_keys: Indexes the optimizer considered key: Index actually used rows: Estimated rows examined (lower is better) Extra: Additional information

EXPLAIN ANALYZE (PostgreSQL)

Shows actual execution time:

EXPLAIN ANALYZE
SELECT * FROM Orders
WHERE CustomerID = 100 AND OrderDate > '2024-01-01';

-- Output includes actual runtime:
-- Planning Time: 0.123 ms
-- Execution Time: 0.456 ms

Interpreting Plans

-- Example 1: Full table scan
EXPLAIN SELECT * FROM Products WHERE Price > 50;

-- If shows type=ALL, rows=100000
-- Problem: No index on Price
-- Solution: CREATE INDEX idx_price ON Products(Price);
-- Example 2: Index not used
EXPLAIN SELECT * FROM Users WHERE YEAR(BirthDate) = 1990;

-- If shows type=ALL despite index on BirthDate
-- Problem: Function on indexed column prevents index use
-- Solution: Rewrite query
SELECT * FROM Users
WHERE BirthDate >= '1990-01-01' AND BirthDate < '1991-01-01';
-- Example 3: Wrong index used
CREATE INDEX idx_status ON Orders(Status);
CREATE INDEX idx_customer ON Orders(CustomerID);

EXPLAIN SELECT * FROM Orders
WHERE CustomerID = 100 AND Status = 'Pending';

-- If uses idx_status (less selective)
-- Problem: Optimizer chose wrong index
-- Solution: Create composite index
CREATE INDEX idx_customer_status ON Orders(CustomerID, Status);

Indexing Best Practices

1. Index Columns Used in WHERE

-- Frequently queried
SELECT * FROM Products WHERE Category = 'Electronics';

-- Create index
CREATE INDEX idx_category ON Products(Category);

2. Index Foreign Keys

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

-- Index foreign key for joins
CREATE INDEX idx_customer ON Orders(CustomerID);

-- Now this join is fast
SELECT c.Name, o.OrderDate
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID;

3. Index Columns Used in ORDER BY

-- Frequently sorted
SELECT * FROM Products ORDER BY Price DESC;

-- Index helps sorting
CREATE INDEX idx_price ON Products(Price);

4. Index Columns Used in GROUP BY

-- Aggregation query
SELECT Category, COUNT(*)
FROM Products
GROUP BY Category;

-- Index helps grouping
CREATE INDEX idx_category ON Products(Category);

5. Don't Over-Index

Every index has costs:

  • Storage: Indexes take disk space
  • Write performance: INSERT, UPDATE, DELETE are slower
  • Maintenance: Indexes need updating
-- Too many indexes (BAD)
CREATE INDEX idx_col1 ON Table(Col1);
CREATE INDEX idx_col2 ON Table(Col2);
CREATE INDEX idx_col3 ON Table(Col3);
CREATE INDEX idx_col4 ON Table(Col4);
CREATE INDEX idx_col5 ON Table(Col5);

-- Better: Identify query patterns and create targeted indexes
CREATE INDEX idx_frequent_query ON Table(Col1, Col2);  -- For common query

6. Choose Index Column Order Carefully

-- Query pattern
SELECT * FROM Orders
WHERE Status = 'Pending' AND CustomerID = 100;

-- Option 1: Low cardinality first
CREATE INDEX idx_status_customer ON Orders(Status, CustomerID);
-- Status has ~5 values, CustomerID has thousands
-- Less selective column first = larger intermediate result

-- Option 2: High cardinality first (BETTER)
CREATE INDEX idx_customer_status ON Orders(CustomerID, Status);
-- Narrows down quickly to one customer, then filters by status

7. Use Partial Indexes for Filtered Queries

-- Queries only on active customers
SELECT * FROM Customers WHERE Active = TRUE AND City = 'New York';

-- Regular index includes inactive customers (wasted space)
CREATE INDEX idx_city ON Customers(City);

-- Partial index (PostgreSQL) - smaller, faster
CREATE INDEX idx_active_city ON Customers(City) WHERE Active = TRUE;

8. Avoid Indexing Low-Cardinality Columns Alone

-- Gender has only 2-3 values (very low cardinality)
-- Index not helpful for queries like:
SELECT * FROM Users WHERE Gender = 'F';

-- This would still scan ~50% of rows
-- Full table scan might be faster than index lookup + table lookups

-- However, useful in composite indexes
CREATE INDEX idx_gender_age ON Users(Gender, Age);

Query Optimization Techniques

1. Use Indexes Effectively

-- Bad: Function prevents index use
SELECT * FROM Users WHERE UPPER(Email) = 'ALICE@EMAIL.COM';

-- Good: Store lowercase, query lowercase
SELECT * FROM Users WHERE Email = 'alice@email.com';

-- Or use expression index (PostgreSQL)
CREATE INDEX idx_lower_email ON Users(LOWER(Email));
SELECT * FROM Users WHERE LOWER(Email) = 'alice@email.com';

2. Avoid SELECT *

-- Bad: Fetches all columns (may not use covering index)
SELECT * FROM Orders WHERE CustomerID = 100;

-- Good: Select only needed columns (may use covering index)
SELECT OrderID, OrderDate, TotalAmount
FROM Orders
WHERE CustomerID = 100;

3. Use EXISTS Instead of IN for Subqueries

-- Less efficient
SELECT * FROM Customers
WHERE CustomerID IN (SELECT CustomerID FROM Orders);

-- More efficient
SELECT * FROM Customers c
WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.CustomerID = c.CustomerID);

4. Use LIMIT for Large Result Sets

-- Bad: Fetches all matching rows
SELECT * FROM Products WHERE Category = 'Electronics';

-- Good: Limit result size
SELECT * FROM Products WHERE Category = 'Electronics' LIMIT 100;

5. Optimize Joins

-- Bad: Joining large tables without indexes
SELECT c.Name, o.OrderDate
FROM Customers c
JOIN Orders o ON c.Email = o.CustomerEmail;  -- No index on Email

-- Good: Index join columns
CREATE INDEX idx_customer_email ON Customers(Email);
CREATE INDEX idx_order_email ON Orders(CustomerEmail);

6. Use Appropriate Data Types

-- Bad: Using VARCHAR for numeric data
CREATE TABLE Products (
    ProductID VARCHAR(20),  -- Should be INT
    Price VARCHAR(10)       -- Should be DECIMAL
);

-- Indexes on numeric types are more efficient
CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    Price DECIMAL(10,2)
);

7. Analyze Query Patterns

-- Identify slow queries
-- MySQL
SELECT query_time, sql_text
FROM mysql.slow_log
ORDER BY query_time DESC;

-- PostgreSQL
SELECT query, total_time
FROM pg_stat_statements
ORDER BY total_time DESC;

Advanced Indexing Concepts

Clustered vs. Non-Clustered Indexes

Clustered Index:

  • Determines physical order of data in table
  • One per table (usually primary key)
  • Fast for range queries
-- MySQL InnoDB: Primary key is clustered index
CREATE TABLE Users (
    UserID INT PRIMARY KEY,  -- Clustered index
    Name VARCHAR(100)
);
-- Data rows are stored in UserID order

Non-Clustered Index:

  • Separate structure from table data
  • Multiple per table
  • Contains pointers to data rows
CREATE INDEX idx_name ON Users(Name);  -- Non-clustered index
-- Index contains Name + pointer to row

Index Selectivity

Selectivity = Number of distinct values / Total number of rows

High selectivity (close to 1.0) = good for indexing Low selectivity (close to 0.0) = poor for indexing

-- UserID: 1,000,000 distinct values / 1,000,000 rows = 1.0 (excellent)
-- Gender: 3 distinct values / 1,000,000 rows = 0.000003 (poor)

-- Check selectivity
SELECT
    COUNT(DISTINCT Email) / COUNT(*) AS Selectivity
FROM Customers;

Full-Text Indexes

For text search:

-- MySQL
CREATE FULLTEXT INDEX idx_description ON Products(Description);

SELECT * FROM Products
WHERE MATCH(Description) AGAINST('laptop computer');

-- PostgreSQL
CREATE INDEX idx_description ON Products USING GIN(to_tsvector('english', Description));

SELECT * FROM Products
WHERE to_tsvector('english', Description) @@ to_tsquery('laptop & computer');

Monitoring and Maintenance

Index Usage Statistics

-- PostgreSQL: Check if indexes are being used
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan AS index_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

-- Indexes with idx_scan = 0 are never used (consider dropping)

Rebuilding Indexes

Over time, indexes can become fragmented:

-- MySQL
OPTIMIZE TABLE Customers;

-- PostgreSQL
REINDEX TABLE Customers;

-- Rebuild specific index
REINDEX INDEX idx_email;

Analyzing Tables

Update statistics for the query optimizer:

-- MySQL
ANALYZE TABLE Customers;

-- PostgreSQL
ANALYZE Customers;

-- SQL Server
UPDATE STATISTICS Customers;

Practical Example: E-commerce Query Optimization

-- Slow query (no indexes)
SELECT
    p.ProductName,
    p.Price,
    c.CategoryName,
    AVG(r.Rating) AS AvgRating
FROM Products p
JOIN Categories c ON p.CategoryID = c.CategoryID
LEFT JOIN Reviews r ON p.ProductID = r.ProductID
WHERE p.Price BETWEEN 50 AND 200
    AND c.CategoryName = 'Electronics'
GROUP BY p.ProductID, p.ProductName, p.Price, c.CategoryName
HAVING AVG(r.Rating) >= 4.0
ORDER BY p.Price DESC;

EXPLAIN ...
-- Shows: type=ALL (full table scans), rows=1000000

-- Optimization steps:

-- 1. Index foreign keys (for joins)
CREATE INDEX idx_product_category ON Products(CategoryID);
CREATE INDEX idx_review_product ON Reviews(ProductID);

-- 2. Index WHERE clause columns
CREATE INDEX idx_product_price ON Products(Price);
CREATE INDEX idx_category_name ON Categories(CategoryName);

-- 3. Consider composite index for Products
CREATE INDEX idx_category_price ON Products(CategoryID, Price);

-- 4. Add covering index for frequent queries
CREATE INDEX idx_product_covering
ON Products(CategoryID, Price, ProductID, ProductName);

-- After optimization:
EXPLAIN ...
-- Shows: type=ref/range, rows=100 (10,000x fewer rows examined)

Exercises

Basic Exercises

  1. Create Indexes: Given the table:

    CREATE TABLE Employees (
        EmployeeID INT PRIMARY KEY,
        FirstName VARCHAR(50),
        LastName VARCHAR(50),
        Department VARCHAR(50),
        Salary DECIMAL(10,2),
        HireDate DATE
    );
    

    Create appropriate indexes for these queries:

    SELECT * FROM Employees WHERE Department = 'Sales';
    SELECT * FROM Employees WHERE LastName = 'Smith';
    SELECT * FROM Employees ORDER BY HireDate DESC;
    
  2. Analyze EXPLAIN: Run EXPLAIN on:

    SELECT * FROM Products WHERE Price > 100;
    

    Before and after creating an index on Price. Compare the results.

  3. Composite Index: You have frequent queries:

    SELECT * FROM Orders WHERE CustomerID = 100 AND Status = 'Shipped';
    

    Design an appropriate index.

Intermediate Exercises

  1. Index Effectiveness: Given:

    CREATE TABLE Students (
        StudentID INT PRIMARY KEY,
        Name VARCHAR(100),
        Gender CHAR(1),
        GPA DECIMAL(3,2)
    );
    

    Which columns benefit most from indexing? Why?

  2. Covering Index: Design a covering index for:

    SELECT ProductID, ProductName, Price
    FROM Products
    WHERE CategoryID = 5;
    
  3. Query Rewrite: This query doesn't use the index on BirthDate:

    SELECT * FROM Users WHERE YEAR(BirthDate) = 1990;
    

    Rewrite it to use the index.

  4. Index Order: You have queries:

    SELECT * FROM Logs WHERE UserID = 100 AND LogDate > '2024-01-01';
    SELECT * FROM Logs WHERE UserID = 100 ORDER BY LogDate DESC LIMIT 10;
    

    Design an optimal composite index.

Advanced Exercises

  1. Optimization Challenge: Given:

    SELECT
        c.CustomerName,
        COUNT(o.OrderID) AS OrderCount,
        SUM(o.TotalAmount) AS TotalSpent
    FROM Customers c
    LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
    WHERE c.Country = 'USA'
        AND o.OrderDate >= '2024-01-01'
    GROUP BY c.CustomerID, c.CustomerName
    HAVING SUM(o.TotalAmount) > 1000
    ORDER BY TotalSpent DESC;
    

    Design all necessary indexes and explain your choices.

  2. Index Audit: You have these indexes:

    CREATE INDEX idx1 ON Orders(CustomerID);
    CREATE INDEX idx2 ON Orders(OrderDate);
    CREATE INDEX idx3 ON Orders(CustomerID, OrderDate);
    CREATE INDEX idx4 ON Orders(Status);
    CREATE INDEX idx5 ON Orders(CustomerID, Status, OrderDate);
    

    Which indexes are redundant? Explain.

  3. Real-World Scenario: Design indexes for a blog platform:

    CREATE TABLE Posts (
        PostID INT PRIMARY KEY,
        AuthorID INT,
        Title VARCHAR(200),
        Content TEXT,
        PublishDate DATETIME,
        CategoryID INT,
        ViewCount INT,
        Status VARCHAR(20)  -- 'draft', 'published', 'archived'
    );
    

    Common queries:

    • Get published posts by author
    • Get recent published posts by category
    • Get most viewed posts
    • Search posts by title

Summary

In this reading, we explored database indexing and query optimization:

  • Indexes dramatically improve query performance using B-trees and hash structures
  • B-tree indexes support equality, range queries, and sorting
  • Hash indexes provide O(1) lookups for equality but no range support
  • Composite indexes must follow the leftmost prefix rule
  • Covering indexes avoid table lookups by including all needed columns
  • EXPLAIN shows query execution plans and identifies optimization opportunities
  • Best practices: Index WHERE/JOIN/ORDER BY columns, avoid over-indexing, monitor usage
  • Trade-offs: Indexes improve reads but slow writes and use storage

Effective indexing requires understanding query patterns, analyzing execution plans, and balancing read vs. write performance.

Next Reading

Continue to 05-transactions.md to learn about transactions, ACID properties, isolation levels, concurrency control, and distributed transactions.