Indexing and Query Optimization | Computer Science Fundamentals

As databases grow, query performance becomes critical. A query that takes milliseconds on 1,000 rows might take seconds or minutes on millions of rows. Indexing is the primary technique for improving query performance.

An index is a data structure that improves the speed of data retrieval operations on a database table. Think of it like a book's index: instead of reading every page to find a topic, you check the index and jump directly to the relevant pages.

Why Indexes Matter

Without an Index: Full Table Scan

-- Table with 1 million rows
SELECT * FROM Customers WHERE Email = 'alice@email.com';

Without an index:

Database scans all 1,000,000 rows
Time complexity: O(n)
Slow and inefficient

With an Index: Direct Lookup

With an index on Email:

Database uses index to find matching rows directly
Time complexity: O(log n) for B-tree or O(1) for hash
Fast and efficient

Performance Example

-- Create test table with 1 million rows
CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    Email VARCHAR(100),
    Name VARCHAR(100),
    City VARCHAR(50)
);

-- Insert 1 million rows (simulated)
-- Without index on Email
SELECT * FROM Users WHERE Email = 'user123@email.com';
-- Time: ~500ms (full table scan)

-- Create index
CREATE INDEX idx_email ON Users(Email);

-- With index
SELECT * FROM Users WHERE Email = 'user123@email.com';
-- Time: ~5ms (index lookup)
-- 100x faster!

How Indexes Work: B-Tree

The most common index structure is the B-tree (Balanced Tree) or its variant B+ tree.

B-Tree Structure

                    [50]
                   /    \
              [25]        [75]
             /    \      /    \
         [10] [35] [60] [85]

Properties:

All leaf nodes are at the same depth (balanced)
Each node contains multiple keys (high branching factor)
Keys are sorted within nodes
Search time: O(log n)

B+ Tree Structure

Used by most databases (MySQL, PostgreSQL):

                    [50|75]  ← Internal nodes (keys only)
                   /    |    \
              [25|35] [60|70] [85|90]
                |       |       |
                ↓       ↓       ↓
    [10,15,20,25,30,35] → [50,55,60,65,70] → [75,80,85,90,95]
                          ← Leaf nodes (keys + data pointers) →
                             Linked list for range queries

Advantages of B+ tree:

All data in leaf nodes (efficient range queries)
Internal nodes have more keys (better fanout)
Leaf nodes linked (fast sequential access)

Example: Finding UserID = 65

Start at root: 65 > 50 and 65 < 75 → go middle
At [60|70]: 65 > 60 and 65 < 70 → go middle
At leaf: scan [60,65,70] → found at position 2
Follow pointer to row data

Operations: 3 (vs. 1,000,000 without index)

Hash Indexes

Hash indexes use a hash table for equality lookups.

Hash Index Structure

Hash Function: h(key) = key mod 10

Keys: [5, 15, 23, 31, 42, 55]

Hash Table:
Bucket 0:
Bucket 1: 31 → row pointer
Bucket 2: 42 → row pointer
Bucket 3: 23 → row pointer
Bucket 4:
Bucket 5: 5 → 15 → 55 → row pointers (chaining for collisions)
Bucket 6:
Bucket 7:
Bucket 8:
Bucket 9:

Hash vs. B-Tree

Hash Index:

Pros: O(1) equality lookups
Cons: No range queries, no sorting, no partial matches

-- Good for hash index
SELECT * FROM Users WHERE UserID = 12345;

-- Cannot use hash index
SELECT * FROM Users WHERE UserID > 12345;  -- Range query
SELECT * FROM Users WHERE UserID BETWEEN 100 AND 200;  -- Range
SELECT * FROM Users ORDER BY UserID;  -- Sorting

B-Tree Index:

Pros: Supports ranges, sorting, partial matches
Cons: O(log n) lookups (slower than hash for equality)

-- All these benefit from B-tree
SELECT * FROM Users WHERE UserID = 12345;  -- Equality
SELECT * FROM Users WHERE UserID > 12345;  -- Range
SELECT * FROM Users WHERE UserID BETWEEN 100 AND 200;  -- Range
SELECT * FROM Users ORDER BY UserID;  -- Sorting

Most databases default to B-tree indexes because they're more versatile.

Creating and Managing Indexes

Basic Index Creation

-- Single column index
CREATE INDEX idx_email ON Customers(Email);

-- Unique index (enforces uniqueness)
CREATE UNIQUE INDEX idx_unique_email ON Customers(Email);

-- Composite (multi-column) index
CREATE INDEX idx_name_city ON Customers(LastName, FirstName, City);

-- Partial index (PostgreSQL)
CREATE INDEX idx_active_customers ON Customers(CustomerID)
WHERE Active = TRUE;

-- Expression index (PostgreSQL)
CREATE INDEX idx_lower_email ON Customers(LOWER(Email));

Viewing Indexes

-- MySQL
SHOW INDEX FROM Customers;

-- PostgreSQL
\d Customers

-- Standard SQL
SELECT * FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_NAME = 'Customers';

Dropping Indexes

DROP INDEX idx_email ON Customers;  -- MySQL
DROP INDEX idx_email;                -- PostgreSQL

Index Types by Database

MySQL:

-- B-Tree (default)
CREATE INDEX idx_name ON Users(Name);

-- Hash (MEMORY tables only)
CREATE INDEX idx_id USING HASH ON Users(UserID);

-- Full-text index
CREATE FULLTEXT INDEX idx_description ON Products(Description);

-- Spatial index
CREATE SPATIAL INDEX idx_location ON Stores(Location);

PostgreSQL:

-- B-Tree (default)
CREATE INDEX idx_name ON Users(Name);

-- Hash
CREATE INDEX idx_id USING HASH ON Users(UserID);

-- GiST (Generalized Search Tree)
CREATE INDEX idx_location USING GIST ON Stores(Location);

-- GIN (Generalized Inverted Index) - for arrays, JSON
CREATE INDEX idx_tags USING GIN ON Articles(Tags);

Composite Indexes

A composite index includes multiple columns.

Column Order Matters

CREATE INDEX idx_lastname_firstname ON Customers(LastName, FirstName);

This index can optimize:

-- Uses index (matches leftmost prefix)
SELECT * FROM Customers WHERE LastName = 'Smith';
SELECT * FROM Customers WHERE LastName = 'Smith' AND FirstName = 'John';

-- Uses index partially (only LastName)
SELECT * FROM Customers WHERE LastName = 'Smith' OR FirstName = 'John';

-- Cannot use index (doesn't start with LastName)
SELECT * FROM Customers WHERE FirstName = 'John';

Leftmost Prefix Rule

The index (A, B, C) can optimize queries on:

(A)
(A, B)
(A, B, C)

But NOT on:

(B)
(C)
(B, C)

Designing Composite Indexes

-- Frequently queried columns
SELECT * FROM Orders
WHERE CustomerID = 123 AND Status = 'Pending';

-- Create composite index matching the query
CREATE INDEX idx_customer_status ON Orders(CustomerID, Status);

-- Order matters: most selective column first
-- If CustomerID has high cardinality, put it first
CREATE INDEX idx_customer_status ON Orders(CustomerID, Status);

-- If Status has low cardinality (few values), put CustomerID first
-- This is already the case above

Example: Finding the Right Index

-- Sample queries
SELECT * FROM Orders WHERE CustomerID = 100;
SELECT * FROM Orders WHERE CustomerID = 100 AND OrderDate > '2024-01-01';
SELECT * FROM Orders WHERE CustomerID = 100 ORDER BY OrderDate DESC;

-- Optimal index supports all three queries
CREATE INDEX idx_customer_date ON Orders(CustomerID, OrderDate);

Covering Indexes

A covering index contains all columns needed for a query, avoiding table lookups.

-- Query needs CustomerID, OrderDate, TotalAmount
SELECT OrderDate, TotalAmount
FROM Orders
WHERE CustomerID = 100;

-- Index covers all columns in SELECT and WHERE
CREATE INDEX idx_covering ON Orders(CustomerID, OrderDate, TotalAmount);

-- Without covering index:
-- 1. Use index to find matching rows
-- 2. Look up table to get TotalAmount
-- Two operations

-- With covering index:
-- 1. Use index to find matching rows AND get TotalAmount
-- One operation (faster)

Include Columns (SQL Server, PostgreSQL 11+)

-- PostgreSQL INCLUDE syntax
CREATE INDEX idx_customer_covering ON Orders(CustomerID)
INCLUDE (OrderDate, TotalAmount);

-- CustomerID is in the tree structure (can be searched)
-- OrderDate and TotalAmount are only in leaf nodes (covering)

Query Execution Plans: EXPLAIN

The EXPLAIN command shows how the database executes a query.

Basic EXPLAIN

EXPLAIN SELECT * FROM Customers WHERE Email = 'alice@email.com';

-- Output (MySQL):
+----+-------------+-----------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-----------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | Customers | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+------+-------------+

-- type = ALL means full table scan (BAD)
-- rows = 1000 means scanning 1000 rows

After creating index:

CREATE INDEX idx_email ON Customers(Email);
EXPLAIN SELECT * FROM Customers WHERE Email = 'alice@email.com';

-- Output:
+----+-------------+-----------+------+---------------+-----------+---------+-------+------+-------+
| id | select_type | table     | type | possible_keys | key       | key_len | ref   | rows | Extra |
+----+-------------+-----------+------+---------------+-----------+---------+-------+------+-------+
|  1 | SIMPLE      | Customers | ref  | idx_email     | idx_email | 103     | const |    1 | NULL  |
+----+-------------+-----------+------+---------------+-----------+---------+-------+------+-------+

-- type = ref means index lookup (GOOD)
-- rows = 1 means only 1 row examined

EXPLAIN Output Fields

type (access method):

system: Table has only one row
const: Primary key or unique index lookup
eq_ref: Unique index used in join
ref: Non-unique index lookup
range: Index range scan (BETWEEN, >, <)
index: Full index scan
ALL: Full table scan (worst)

Best to worst: const > eq_ref > ref > range > index > ALL

possible_keys: Indexes the optimizer considered key: Index actually used rows: Estimated rows examined (lower is better) Extra: Additional information

EXPLAIN ANALYZE (PostgreSQL)

Shows actual execution time:

EXPLAIN ANALYZE
SELECT * FROM Orders
WHERE CustomerID = 100 AND OrderDate > '2024-01-01';

-- Output includes actual runtime:
-- Planning Time: 0.123 ms
-- Execution Time: 0.456 ms

Interpreting Plans

-- Example 1: Full table scan
EXPLAIN SELECT * FROM Products WHERE Price > 50;

-- If shows type=ALL, rows=100000
-- Problem: No index on Price
-- Solution: CREATE INDEX idx_price ON Products(Price);

-- Example 2: Index not used
EXPLAIN SELECT * FROM Users WHERE YEAR(BirthDate) = 1990;

-- If shows type=ALL despite index on BirthDate
-- Problem: Function on indexed column prevents index use
-- Solution: Rewrite query
SELECT * FROM Users
WHERE BirthDate >= '1990-01-01' AND BirthDate < '1991-01-01';

-- Example 3: Wrong index used
CREATE INDEX idx_status ON Orders(Status);
CREATE INDEX idx_customer ON Orders(CustomerID);

EXPLAIN SELECT * FROM Orders
WHERE CustomerID = 100 AND Status = 'Pending';

-- If uses idx_status (less selective)
-- Problem: Optimizer chose wrong index
-- Solution: Create composite index
CREATE INDEX idx_customer_status ON Orders(CustomerID, Status);

Indexing Best Practices

1. Index Columns Used in WHERE

-- Frequently queried
SELECT * FROM Products WHERE Category = 'Electronics';

-- Create index
CREATE INDEX idx_category ON Products(Category);

2. Index Foreign Keys

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

-- Index foreign key for joins
CREATE INDEX idx_customer ON Orders(CustomerID);

-- Now this join is fast
SELECT c.Name, o.OrderDate
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID;

3. Index Columns Used in ORDER BY

-- Frequently sorted
SELECT * FROM Products ORDER BY Price DESC;

-- Index helps sorting
CREATE INDEX idx_price ON Products(Price);

4. Index Columns Used in GROUP BY

-- Aggregation query
SELECT Category, COUNT(*)
FROM Products
GROUP BY Category;

-- Index helps grouping
CREATE INDEX idx_category ON Products(Category);

5. Don't Over-Index

Every index has costs:

Storage: Indexes take disk space
Write performance: INSERT, UPDATE, DELETE are slower
Maintenance: Indexes need updating

-- Too many indexes (BAD)
CREATE INDEX idx_col1 ON Table(Col1);
CREATE INDEX idx_col2 ON Table(Col2);
CREATE INDEX idx_col3 ON Table(Col3);
CREATE INDEX idx_col4 ON Table(Col4);
CREATE INDEX idx_col5 ON Table(Col5);

-- Better: Identify query patterns and create targeted indexes
CREATE INDEX idx_frequent_query ON Table(Col1, Col2);  -- For common query

6. Choose Index Column Order Carefully

-- Query pattern
SELECT * FROM Orders
WHERE Status = 'Pending' AND CustomerID = 100;

-- Option 1: Low cardinality first
CREATE INDEX idx_status_customer ON Orders(Status, CustomerID);
-- Status has ~5 values, CustomerID has thousands
-- Less selective column first = larger intermediate result

-- Option 2: High cardinality first (BETTER)
CREATE INDEX idx_customer_status ON Orders(CustomerID, Status);
-- Narrows down quickly to one customer, then filters by status

7. Use Partial Indexes for Filtered Queries

-- Queries only on active customers
SELECT * FROM Customers WHERE Active = TRUE AND City = 'New York';

-- Regular index includes inactive customers (wasted space)
CREATE INDEX idx_city ON Customers(City);

-- Partial index (PostgreSQL) - smaller, faster
CREATE INDEX idx_active_city ON Customers(City) WHERE Active = TRUE;

8. Avoid Indexing Low-Cardinality Columns Alone

-- Gender has only 2-3 values (very low cardinality)
-- Index not helpful for queries like:
SELECT * FROM Users WHERE Gender = 'F';

-- This would still scan ~50% of rows
-- Full table scan might be faster than index lookup + table lookups

-- However, useful in composite indexes
CREATE INDEX idx_gender_age ON Users(Gender, Age);

Query Optimization Techniques

1. Use Indexes Effectively

-- Bad: Function prevents index use
SELECT * FROM Users WHERE UPPER(Email) = 'ALICE@EMAIL.COM';

-- Good: Store lowercase, query lowercase
SELECT * FROM Users WHERE Email = 'alice@email.com';

-- Or use expression index (PostgreSQL)
CREATE INDEX idx_lower_email ON Users(LOWER(Email));
SELECT * FROM Users WHERE LOWER(Email) = 'alice@email.com';

2. Avoid SELECT *

-- Bad: Fetches all columns (may not use covering index)
SELECT * FROM Orders WHERE CustomerID = 100;

-- Good: Select only needed columns (may use covering index)
SELECT OrderID, OrderDate, TotalAmount
FROM Orders
WHERE CustomerID = 100;

3. Use EXISTS Instead of IN for Subqueries

-- Less efficient
SELECT * FROM Customers
WHERE CustomerID IN (SELECT CustomerID FROM Orders);

-- More efficient
SELECT * FROM Customers c
WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.CustomerID = c.CustomerID);

4. Use LIMIT for Large Result Sets

-- Bad: Fetches all matching rows
SELECT * FROM Products WHERE Category = 'Electronics';

-- Good: Limit result size
SELECT * FROM Products WHERE Category = 'Electronics' LIMIT 100;

5. Optimize Joins

-- Bad: Joining large tables without indexes
SELECT c.Name, o.OrderDate
FROM Customers c
JOIN Orders o ON c.Email = o.CustomerEmail;  -- No index on Email

-- Good: Index join columns
CREATE INDEX idx_customer_email ON Customers(Email);
CREATE INDEX idx_order_email ON Orders(CustomerEmail);

6. Use Appropriate Data Types

-- Bad: Using VARCHAR for numeric data
CREATE TABLE Products (
    ProductID VARCHAR(20),  -- Should be INT
    Price VARCHAR(10)       -- Should be DECIMAL
);

-- Indexes on numeric types are more efficient
CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    Price DECIMAL(10,2)
);

7. Analyze Query Patterns

-- Identify slow queries
-- MySQL
SELECT query_time, sql_text
FROM mysql.slow_log
ORDER BY query_time DESC;

-- PostgreSQL
SELECT query, total_time
FROM pg_stat_statements
ORDER BY total_time DESC;

Advanced Indexing Concepts

Clustered vs. Non-Clustered Indexes

Clustered Index:

Determines physical order of data in table
One per table (usually primary key)
Fast for range queries

-- MySQL InnoDB: Primary key is clustered index
CREATE TABLE Users (
    UserID INT PRIMARY KEY,  -- Clustered index
    Name VARCHAR(100)
);
-- Data rows are stored in UserID order

Non-Clustered Index:

Separate structure from table data
Multiple per table
Contains pointers to data rows

CREATE INDEX idx_name ON Users(Name);  -- Non-clustered index
-- Index contains Name + pointer to row

Index Selectivity

Selectivity = Number of distinct values / Total number of rows

High selectivity (close to 1.0) = good for indexing Low selectivity (close to 0.0) = poor for indexing

-- UserID: 1,000,000 distinct values / 1,000,000 rows = 1.0 (excellent)
-- Gender: 3 distinct values / 1,000,000 rows = 0.000003 (poor)

-- Check selectivity
SELECT
    COUNT(DISTINCT Email) / COUNT(*) AS Selectivity
FROM Customers;

Full-Text Indexes

For text search:

-- MySQL
CREATE FULLTEXT INDEX idx_description ON Products(Description);

SELECT * FROM Products
WHERE MATCH(Description) AGAINST('laptop computer');

-- PostgreSQL
CREATE INDEX idx_description ON Products USING GIN(to_tsvector('english', Description));

SELECT * FROM Products
WHERE to_tsvector('english', Description) @@ to_tsquery('laptop & computer');

Monitoring and Maintenance

Index Usage Statistics

-- PostgreSQL: Check if indexes are being used
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan AS index_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

-- Indexes with idx_scan = 0 are never used (consider dropping)

Rebuilding Indexes

Over time, indexes can become fragmented:

-- MySQL
OPTIMIZE TABLE Customers;

-- PostgreSQL
REINDEX TABLE Customers;

-- Rebuild specific index
REINDEX INDEX idx_email;

Analyzing Tables

Update statistics for the query optimizer:

-- MySQL
ANALYZE TABLE Customers;

-- PostgreSQL
ANALYZE Customers;

-- SQL Server
UPDATE STATISTICS Customers;

Practical Example: E-commerce Query Optimization

-- Slow query (no indexes)
SELECT
    p.ProductName,
    p.Price,
    c.CategoryName,
    AVG(r.Rating) AS AvgRating
FROM Products p
JOIN Categories c ON p.CategoryID = c.CategoryID
LEFT JOIN Reviews r ON p.ProductID = r.ProductID
WHERE p.Price BETWEEN 50 AND 200
    AND c.CategoryName = 'Electronics'
GROUP BY p.ProductID, p.ProductName, p.Price, c.CategoryName
HAVING AVG(r.Rating) >= 4.0
ORDER BY p.Price DESC;

EXPLAIN ...
-- Shows: type=ALL (full table scans), rows=1000000

-- Optimization steps:

-- 1. Index foreign keys (for joins)
CREATE INDEX idx_product_category ON Products(CategoryID);
CREATE INDEX idx_review_product ON Reviews(ProductID);

-- 2. Index WHERE clause columns
CREATE INDEX idx_product_price ON Products(Price);
CREATE INDEX idx_category_name ON Categories(CategoryName);

-- 3. Consider composite index for Products
CREATE INDEX idx_category_price ON Products(CategoryID, Price);

-- 4. Add covering index for frequent queries
CREATE INDEX idx_product_covering
ON Products(CategoryID, Price, ProductID, ProductName);

-- After optimization:
EXPLAIN ...
-- Shows: type=ref/range, rows=100 (10,000x fewer rows examined)

Exercises

Basic Exercises

Create Indexes: Given the table:

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Department VARCHAR(50),
    Salary DECIMAL(10,2),
    HireDate DATE
);

Create appropriate indexes for these queries:

SELECT * FROM Employees WHERE Department = 'Sales';
SELECT * FROM Employees WHERE LastName = 'Smith';
SELECT * FROM Employees ORDER BY HireDate DESC;

Analyze EXPLAIN: Run EXPLAIN on:
```
SELECT * FROM Products WHERE Price > 100;
```
Before and after creating an index on Price. Compare the results.
Composite Index: You have frequent queries:
```
SELECT * FROM Orders WHERE CustomerID = 100 AND Status = 'Shipped';
```
Design an appropriate index.

Intermediate Exercises

Index Effectiveness: Given:

CREATE TABLE Students (
    StudentID INT PRIMARY KEY,
    Name VARCHAR(100),
    Gender CHAR(1),
    GPA DECIMAL(3,2)
);

Which columns benefit most from indexing? Why?

Covering Index: Design a covering index for:

SELECT ProductID, ProductName, Price
FROM Products
WHERE CategoryID = 5;

Query Rewrite: This query doesn't use the index on BirthDate:
```
SELECT * FROM Users WHERE YEAR(BirthDate) = 1990;
```
Rewrite it to use the index.

Index Order: You have queries:

SELECT * FROM Logs WHERE UserID = 100 AND LogDate > '2024-01-01';
SELECT * FROM Logs WHERE UserID = 100 ORDER BY LogDate DESC LIMIT 10;

Design an optimal composite index.

Advanced Exercises

Optimization Challenge: Given:

SELECT
    c.CustomerName,
    COUNT(o.OrderID) AS OrderCount,
    SUM(o.TotalAmount) AS TotalSpent
FROM Customers c
LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE c.Country = 'USA'
    AND o.OrderDate >= '2024-01-01'
GROUP BY c.CustomerID, c.CustomerName
HAVING SUM(o.TotalAmount) > 1000
ORDER BY TotalSpent DESC;

Design all necessary indexes and explain your choices.

Index Audit: You have these indexes:

CREATE INDEX idx1 ON Orders(CustomerID);
CREATE INDEX idx2 ON Orders(OrderDate);
CREATE INDEX idx3 ON Orders(CustomerID, OrderDate);
CREATE INDEX idx4 ON Orders(Status);
CREATE INDEX idx5 ON Orders(CustomerID, Status, OrderDate);

Which indexes are redundant? Explain.

Real-World Scenario: Design indexes for a blog platform:

CREATE TABLE Posts (
    PostID INT PRIMARY KEY,
    AuthorID INT,
    Title VARCHAR(200),
    Content TEXT,
    PublishDate DATETIME,
    CategoryID INT,
    ViewCount INT,
    Status VARCHAR(20)  -- 'draft', 'published', 'archived'
);

Common queries:

Get published posts by author
Get recent published posts by category
Get most viewed posts
Search posts by title

Summary

In this reading, we explored database indexing and query optimization:

Indexes dramatically improve query performance using B-trees and hash structures
B-tree indexes support equality, range queries, and sorting
Hash indexes provide O(1) lookups for equality but no range support
Composite indexes must follow the leftmost prefix rule
Covering indexes avoid table lookups by including all needed columns
EXPLAIN shows query execution plans and identifies optimization opportunities
Best practices: Index WHERE/JOIN/ORDER BY columns, avoid over-indexing, monitor usage
Trade-offs: Indexes improve reads but slow writes and use storage

Effective indexing requires understanding query patterns, analyzing execution plans, and balancing read vs. write performance.

Next Reading

Continue to 05-transactions.md to learn about transactions, ACID properties, isolation levels, concurrency control, and distributed transactions.