Database Design and Normalization

Learning Objectives

By the end of this reading, you will be able to:

  • Understand the goals of database design and normalization
  • Identify functional dependencies in a database schema
  • Apply the normal forms: 1NF, 2NF, 3NF, and BCNF
  • Recognize and eliminate update anomalies
  • Design normalized database schemas from scratch
  • Understand when denormalization is appropriate
  • Use entity-relationship diagrams to visualize database design
  • Apply normalization principles to real-world scenarios

Introduction

Database design is one of the most critical aspects of building reliable applications. A well-designed database ensures data integrity, eliminates redundancy, and provides a solid foundation for efficient queries. Normalization is the systematic process of organizing data to minimize redundancy and dependency.

Poor database design leads to:

  • Update anomalies: Inconsistent data after updates
  • Insertion anomalies: Inability to insert certain data
  • Deletion anomalies: Unintended data loss
  • Data redundancy: Wasted storage and inconsistency risks

Update Anomalies: The Problem

Consider this poorly designed table for a university:

StudentCourses Table (Unnormalized)
+------------+----------+----------+---------+---------------+--------------+
| StudentID  | StuName  | CourseID | CoName  | InstructorID  | InstrName    |
+------------+----------+----------+---------+---------------+--------------+
| S001       | Alice    | CS101    | Intro   | I001          | Dr. Smith    |
| S001       | Alice    | CS201    | Data S  | I002          | Dr. Jones    |
| S002       | Bob      | CS101    | Intro   | I001          | Dr. Smith    |
| S003       | Carol    | CS201    | Data S  | I002          | Dr. Jones    |
| S003       | Carol    | MATH101  | Calc    | I003          | Dr. Wilson   |
+------------+----------+----------+---------+---------------+--------------+

This design has serious problems:

Update Anomaly

If Dr. Smith changes their name to Dr. Smith-Johnson, we must update multiple rows. Missing one row creates inconsistency.

-- Must update multiple rows - error-prone
UPDATE StudentCourses
SET InstrName = 'Dr. Smith-Johnson'
WHERE InstructorID = 'I001';

Insertion Anomaly

We cannot add a new instructor who isn't teaching any courses yet, because we'd have NULL values for student and course information.

Deletion Anomaly

If Carol drops MATH101 and it's the only course taught by Dr. Wilson, we lose information about Dr. Wilson entirely.

-- Deleting the enrollment loses instructor information
DELETE FROM StudentCourses
WHERE StudentID = 'S003' AND CourseID = 'MATH101';
-- Dr. Wilson information is now gone!

Normalization solves these problems by decomposing tables properly.

Functional Dependencies

A functional dependency X → Y means that the value of X uniquely determines the value of Y.

Examples

In our student database:

  • StudentID → StudentName (Student ID determines name)
  • CourseID → CourseName (Course ID determines course name)
  • InstructorID → InstructorName (Instructor ID determines name)
  • (StudentID, CourseID) → Grade (Student + Course determines grade)

Types of Functional Dependencies

Trivial Dependency

X → Y where Y ⊆ X

(StudentID, Name) → StudentID  -- Trivial
(StudentID, Name) → Name        -- Trivial

Non-Trivial Dependency

X → Y where Y ⊄ X

StudentID → Name               -- Non-trivial
StudentID → (Name, Major)      -- Non-trivial

Partial Dependency

When a non-prime attribute depends on part of a candidate key.

Given candidate key (StudentID, CourseID):
(StudentID, CourseID) → Grade       -- Full dependency (good)
StudentID → StudentName             -- Partial dependency (bad in 2NF)
CourseID → CourseName               -- Partial dependency (bad in 2NF)

Transitive Dependency

X → Y and Y → Z, therefore X → Z (but Y is not a candidate key)

StudentID → DepartmentID → DepartmentName
-- StudentID transitively determines DepartmentName

Determining Functional Dependencies

Given a table, identify functional dependencies by asking:

  1. What uniquely identifies each row? (candidate keys)
  2. What determines what? (dependencies)
  3. Are there partial or transitive dependencies?

Normal Forms

Normal forms are progressive rules that ensure better database design. Each normal form builds on the previous one.

First Normal Form (1NF)

Rule: Each cell contains only atomic (indivisible) values, and there are no repeating groups.

Violation Example

Students Table (NOT in 1NF)
+------------+----------+------------------+
| StudentID  | Name     | Courses          |
+------------+----------+------------------+
| S001       | Alice    | CS101, CS201     |  -- Multiple values in one cell
| S002       | Bob      | CS101            |
| S003       | Carol    | CS201, MATH101   |  -- Multiple values in one cell
+------------+----------+------------------+

Another violation:

Students Table (NOT in 1NF)
+------------+----------+----------+----------+
| StudentID  | Name     | Course1  | Course2  |  -- Repeating groups
+------------+----------+----------+----------+
| S001       | Alice    | CS101    | CS201    |
| S002       | Bob      | CS101    | NULL     |
| S003       | Carol    | CS201    | MATH101  |
+------------+----------+----------+----------+

Converting to 1NF

-- Proper 1NF design
CREATE TABLE Students (
    StudentID VARCHAR(10) PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

CREATE TABLE Courses (
    CourseID VARCHAR(10) PRIMARY KEY,
    CourseName VARCHAR(100) NOT NULL
);

CREATE TABLE Enrollments (
    StudentID VARCHAR(10),
    CourseID VARCHAR(10),
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);

-- Sample data
INSERT INTO Students VALUES
    ('S001', 'Alice'),
    ('S002', 'Bob'),
    ('S003', 'Carol');

INSERT INTO Courses VALUES
    ('CS101', 'Introduction to CS'),
    ('CS201', 'Data Structures'),
    ('MATH101', 'Calculus I');

INSERT INTO Enrollments VALUES
    ('S001', 'CS101', 'A'),
    ('S001', 'CS201', 'B+'),
    ('S002', 'CS101', 'A-'),
    ('S003', 'CS201', 'A'),
    ('S003', 'MATH101', 'B');

Second Normal Form (2NF)

Rule: Must be in 1NF, and all non-prime attributes must be fully dependent on the entire candidate key (no partial dependencies).

This only applies to tables with composite keys.

Violation Example

Enrollments Table (1NF but NOT 2NF)
+------------+----------+----------+-------------+-----------+---------------+
| StudentID  | CourseID | Grade    | StudentName | CoName    | InstructorID  |
+------------+----------+----------+-------------+-----------+---------------+
| S001       | CS101    | A        | Alice       | Intro CS  | I001          |
| S001       | CS201    | B+       | Alice       | Data Str  | I002          |
| S002       | CS101    | A-       | Bob         | Intro CS  | I001          |
+------------+----------+----------+-------------+-----------+---------------+

Functional dependencies:
(StudentID, CourseID) → Grade          -- Full dependency ✓
StudentID → StudentName                -- Partial dependency ✗
CourseID → CourseName                  -- Partial dependency ✗
CourseID → InstructorID                -- Partial dependency ✗

StudentName depends only on StudentID (part of the key), not on the whole key.

Converting to 2NF

Decompose the table to eliminate partial dependencies:

-- Students table (student info depends only on StudentID)
CREATE TABLE Students (
    StudentID VARCHAR(10) PRIMARY KEY,
    StudentName VARCHAR(100) NOT NULL
);

-- Courses table (course info depends only on CourseID)
CREATE TABLE Courses (
    CourseID VARCHAR(10) PRIMARY KEY,
    CourseName VARCHAR(100) NOT NULL,
    InstructorID VARCHAR(10)
);

-- Enrollments table (only the grade depends on both)
CREATE TABLE Enrollments (
    StudentID VARCHAR(10),
    CourseID VARCHAR(10),
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);

INSERT INTO Students VALUES
    ('S001', 'Alice'),
    ('S002', 'Bob'),
    ('S003', 'Carol');

INSERT INTO Courses VALUES
    ('CS101', 'Introduction to CS', 'I001'),
    ('CS201', 'Data Structures', 'I002'),
    ('MATH101', 'Calculus I', 'I003');

INSERT INTO Enrollments VALUES
    ('S001', 'CS101', 'A'),
    ('S001', 'CS201', 'B+'),
    ('S002', 'CS101', 'A-');

Third Normal Form (3NF)

Rule: Must be in 2NF, and no transitive dependencies (non-prime attributes must depend only on candidate keys, not on other non-prime attributes).

Violation Example

Courses Table (2NF but NOT 3NF)
+----------+-------------+--------------+--------------+
| CourseID | CourseName  | InstructorID | InstrName    |
+----------+-------------+--------------+--------------+
| CS101    | Intro CS    | I001         | Dr. Smith    |
| CS201    | Data Str    | I002         | Dr. Jones    |
| MATH101  | Calculus I  | I003         | Dr. Wilson   |
+----------+-------------+--------------+--------------+

Functional dependencies:
CourseID → InstructorID → InstructorName
-- Transitive dependency: CourseID → InstructorName through InstructorID

Converting to 3NF

-- Courses table (remove InstructorName)
CREATE TABLE Courses (
    CourseID VARCHAR(10) PRIMARY KEY,
    CourseName VARCHAR(100) NOT NULL,
    InstructorID VARCHAR(10),
    FOREIGN KEY (InstructorID) REFERENCES Instructors(InstructorID)
);

-- Instructors table (separate table for instructor info)
CREATE TABLE Instructors (
    InstructorID VARCHAR(10) PRIMARY KEY,
    InstructorName VARCHAR(100) NOT NULL,
    Department VARCHAR(50)
);

INSERT INTO Instructors VALUES
    ('I001', 'Dr. Smith', 'Computer Science'),
    ('I002', 'Dr. Jones', 'Computer Science'),
    ('I003', 'Dr. Wilson', 'Mathematics');

INSERT INTO Courses VALUES
    ('CS101', 'Introduction to CS', 'I001'),
    ('CS201', 'Data Structures', 'I002'),
    ('MATH101', 'Calculus I', 'I003');

Boyce-Codd Normal Form (BCNF)

Rule: Must be in 3NF, and for every functional dependency X → Y, X must be a superkey.

BCNF is a stricter version of 3NF. A relation is in BCNF if there are no anomalies related to functional dependencies.

3NF vs BCNF

A table can be in 3NF but not BCNF when:

  • There are overlapping candidate keys
  • A prime attribute depends on a non-prime attribute

Violation Example

CourseInstructor Table (3NF but NOT BCNF)
+----------+--------------+-------------+
| CourseID | InstructorID | TextbookID  |
+----------+--------------+-------------+
| CS101    | I001         | T001        |
| CS101    | I002         | T001        |
| CS201    | I002         | T002        |
| CS201    | I003         | T002        |
+----------+--------------+-------------+

Candidate keys: (CourseID, InstructorID)
Functional dependencies:
(CourseID, InstructorID) → TextbookID  -- ✓ Superkey
CourseID → TextbookID                  -- ✗ Not a superkey (violates BCNF)

The dependency CourseID → TextbookID violates BCNF because CourseID alone is not a superkey.

Converting to BCNF

-- CourseTextbook table
CREATE TABLE CourseTextbook (
    CourseID VARCHAR(10) PRIMARY KEY,
    TextbookID VARCHAR(10),
    FOREIGN KEY (TextbookID) REFERENCES Textbooks(TextbookID)
);

-- CourseInstructor table
CREATE TABLE CourseInstructor (
    CourseID VARCHAR(10),
    InstructorID VARCHAR(10),
    Semester VARCHAR(20),
    PRIMARY KEY (CourseID, InstructorID, Semester),
    FOREIGN KEY (CourseID) REFERENCES Courses(CourseID),
    FOREIGN KEY (InstructorID) REFERENCES Instructors(InstructorID)
);

INSERT INTO CourseTextbook VALUES
    ('CS101', 'T001'),
    ('CS201', 'T002');

INSERT INTO CourseInstructor VALUES
    ('CS101', 'I001', 'Fall 2023'),
    ('CS101', 'I002', 'Spring 2024'),
    ('CS201', 'I002', 'Fall 2023'),
    ('CS201', 'I003', 'Spring 2024');

Higher Normal Forms

Fourth Normal Form (4NF)

Eliminates multi-valued dependencies. Rarely needed in practice.

Fifth Normal Form (5NF)

Eliminates join dependencies. Very rarely needed.

For most applications, BCNF is sufficient.

Normalization Process: Step-by-Step Example

Let's normalize a complete example from scratch.

Initial Unnormalized Table

LibraryRecords Table (Unnormalized)
+--------+----------+----------+----------------+----------+----------+--------------+
| CardNo | PName    | BookID   | Title          | Author   | BDate    | ReturnDate   |
+--------+----------+----------+----------------+----------+----------+--------------+
| C001   | Alice    | B101,B102| DB Sys, SQL    | Date,Kim | 1/1/24   | 1/15/24      |
| C002   | Bob      | B101     | DB Systems     | Date     | 1/5/24   | 1/19/24      |
+--------+----------+----------+----------------+----------+----------+--------------+

Step 1: Convert to 1NF

Eliminate repeating groups and ensure atomic values.

CREATE TABLE LibraryRecords_1NF (
    CardNo VARCHAR(10),
    PatronName VARCHAR(100),
    BookID VARCHAR(10),
    Title VARCHAR(200),
    Author VARCHAR(100),
    BorrowDate DATE,
    ReturnDate DATE,
    PRIMARY KEY (CardNo, BookID, BorrowDate)
);

INSERT INTO LibraryRecords_1NF VALUES
    ('C001', 'Alice', 'B101', 'Database Systems', 'Date', '2024-01-01', '2024-01-15'),
    ('C001', 'Alice', 'B102', 'SQL Fundamentals', 'Kim', '2024-01-01', '2024-01-15'),
    ('C002', 'Bob', 'B101', 'Database Systems', 'Date', '2024-01-05', '2024-01-19');

Step 2: Convert to 2NF

Identify functional dependencies:

  • CardNo → PatronName (partial)
  • BookID → Title, Author (partial)
  • (CardNo, BookID, BorrowDate) → ReturnDate (full)

Remove partial dependencies:

-- Patrons table
CREATE TABLE Patrons (
    CardNo VARCHAR(10) PRIMARY KEY,
    PatronName VARCHAR(100) NOT NULL
);

-- Books table
CREATE TABLE Books (
    BookID VARCHAR(10) PRIMARY KEY,
    Title VARCHAR(200) NOT NULL,
    Author VARCHAR(100)
);

-- Borrowing records table
CREATE TABLE BorrowingRecords (
    CardNo VARCHAR(10),
    BookID VARCHAR(10),
    BorrowDate DATE,
    ReturnDate DATE,
    PRIMARY KEY (CardNo, BookID, BorrowDate),
    FOREIGN KEY (CardNo) REFERENCES Patrons(CardNo),
    FOREIGN KEY (BookID) REFERENCES Books(BookID)
);

INSERT INTO Patrons VALUES
    ('C001', 'Alice'),
    ('C002', 'Bob');

INSERT INTO Books VALUES
    ('B101', 'Database Systems', 'Date'),
    ('B102', 'SQL Fundamentals', 'Kim');

INSERT INTO BorrowingRecords VALUES
    ('C001', 'B101', '2024-01-01', '2024-01-15'),
    ('C001', 'B102', '2024-01-01', '2024-01-15'),
    ('C002', 'B101', '2024-01-05', '2024-01-19');

Step 3: Convert to 3NF

Check for transitive dependencies. If we had:

  • BookID → PublisherID → PublisherName

We'd need to separate Publishers:

-- Publishers table (if needed)
CREATE TABLE Publishers (
    PublisherID VARCHAR(10) PRIMARY KEY,
    PublisherName VARCHAR(100) NOT NULL,
    City VARCHAR(50)
);

-- Update Books table
CREATE TABLE Books_3NF (
    BookID VARCHAR(10) PRIMARY KEY,
    Title VARCHAR(200) NOT NULL,
    Author VARCHAR(100),
    PublisherID VARCHAR(10),
    FOREIGN KEY (PublisherID) REFERENCES Publishers(PublisherID)
);

Step 4: Verify BCNF

Check all functional dependencies - each should have a superkey on the left side. Our design is already in BCNF.

Complete Example: E-commerce Database

Let's design a normalized e-commerce database from scratch.

Requirements Analysis

We need to track:

  • Customers (name, email, address)
  • Products (name, description, price, category)
  • Orders (customer, date, status, shipping address)
  • Order items (products in each order, quantities)

Normalized Schema

-- Customers table
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY AUTO_INCREMENT,
    FirstName VARCHAR(50) NOT NULL,
    LastName VARCHAR(50) NOT NULL,
    Email VARCHAR(100) UNIQUE NOT NULL,
    Phone VARCHAR(20),
    RegistrationDate DATE
);

-- Addresses table (customers can have multiple addresses)
CREATE TABLE Addresses (
    AddressID INT PRIMARY KEY AUTO_INCREMENT,
    CustomerID INT NOT NULL,
    AddressType VARCHAR(20), -- 'Billing', 'Shipping'
    Street VARCHAR(100),
    City VARCHAR(50),
    State VARCHAR(50),
    ZipCode VARCHAR(10),
    Country VARCHAR(50),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

-- Categories table
CREATE TABLE Categories (
    CategoryID INT PRIMARY KEY AUTO_INCREMENT,
    CategoryName VARCHAR(50) NOT NULL UNIQUE,
    Description TEXT
);

-- Products table
CREATE TABLE Products (
    ProductID INT PRIMARY KEY AUTO_INCREMENT,
    ProductName VARCHAR(100) NOT NULL,
    Description TEXT,
    Price DECIMAL(10,2) NOT NULL,
    StockQuantity INT DEFAULT 0,
    CategoryID INT,
    FOREIGN KEY (CategoryID) REFERENCES Categories(CategoryID)
);

-- Orders table
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY AUTO_INCREMENT,
    CustomerID INT NOT NULL,
    OrderDate DATETIME DEFAULT CURRENT_TIMESTAMP,
    Status VARCHAR(20) DEFAULT 'Pending',
    ShippingAddressID INT,
    TotalAmount DECIMAL(10,2),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID),
    FOREIGN KEY (ShippingAddressID) REFERENCES Addresses(AddressID)
);

-- OrderItems table
CREATE TABLE OrderItems (
    OrderID INT,
    ProductID INT,
    Quantity INT NOT NULL,
    UnitPrice DECIMAL(10,2) NOT NULL, -- Price at time of order
    Subtotal DECIMAL(10,2) GENERATED ALWAYS AS (Quantity * UnitPrice),
    PRIMARY KEY (OrderID, ProductID),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID),
    FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);

-- Sample data
INSERT INTO Customers (FirstName, LastName, Email, Phone, RegistrationDate) VALUES
    ('John', 'Doe', 'john@email.com', '555-0101', '2024-01-15'),
    ('Jane', 'Smith', 'jane@email.com', '555-0102', '2024-02-20');

INSERT INTO Addresses (CustomerID, AddressType, Street, City, State, ZipCode, Country) VALUES
    (1, 'Billing', '123 Main St', 'New York', 'NY', '10001', 'USA'),
    (1, 'Shipping', '456 Oak Ave', 'New York', 'NY', '10002', 'USA'),
    (2, 'Billing', '789 Pine Rd', 'Boston', 'MA', '02101', 'USA');

INSERT INTO Categories (CategoryName, Description) VALUES
    ('Electronics', 'Electronic devices and accessories'),
    ('Books', 'Physical and digital books'),
    ('Clothing', 'Apparel and accessories');

INSERT INTO Products (ProductName, Description, Price, StockQuantity, CategoryID) VALUES
    ('Laptop', 'High-performance laptop', 999.99, 50, 1),
    ('Mouse', 'Wireless mouse', 29.99, 200, 1),
    ('SQL Guide', 'Comprehensive SQL tutorial', 49.99, 100, 2),
    ('T-Shirt', 'Cotton t-shirt', 19.99, 500, 3);

INSERT INTO Orders (CustomerID, OrderDate, Status, ShippingAddressID, TotalAmount) VALUES
    (1, '2024-03-01', 'Delivered', 2, 1029.98),
    (2, '2024-03-15', 'Shipped', 3, 49.99);

INSERT INTO OrderItems (OrderID, ProductID, Quantity, UnitPrice) VALUES
    (1, 1, 1, 999.99),
    (1, 2, 1, 29.99),
    (2, 3, 1, 49.99);

-- Useful queries on normalized data
-- Get customer orders with details
SELECT
    c.FirstName,
    c.LastName,
    o.OrderID,
    o.OrderDate,
    o.TotalAmount,
    p.ProductName,
    oi.Quantity,
    oi.UnitPrice
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID
JOIN OrderItems oi ON o.OrderID = oi.OrderID
JOIN Products p ON oi.ProductID = p.ProductID
ORDER BY o.OrderDate DESC;

-- Category sales report
SELECT
    cat.CategoryName,
    COUNT(DISTINCT o.OrderID) AS OrderCount,
    SUM(oi.Quantity) AS TotalItemsSold,
    SUM(oi.Subtotal) AS TotalRevenue
FROM Categories cat
JOIN Products p ON cat.CategoryID = p.CategoryID
JOIN OrderItems oi ON p.ProductID = oi.ProductID
JOIN Orders o ON oi.OrderID = o.OrderID
GROUP BY cat.CategoryID, cat.CategoryName
ORDER BY TotalRevenue DESC;

Denormalization: When to Break the Rules

While normalization is important, sometimes controlled denormalization improves performance.

When to Denormalize

  1. Read-heavy workloads: Reduce JOIN operations
  2. Reporting databases: Pre-aggregate data
  3. Performance critical queries: Avoid complex JOINs
  4. Caching: Store computed values

Example: Denormalization for Performance

-- Normalized design (requires JOINs)
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

CREATE TABLE OrderItems (
    OrderID INT,
    ProductID INT,
    Quantity INT,
    UnitPrice DECIMAL(10,2),
    PRIMARY KEY (OrderID, ProductID)
);

-- Query requires JOIN and aggregation
SELECT OrderID, SUM(Quantity * UnitPrice) AS Total
FROM OrderItems
GROUP BY OrderID;

-- Denormalized design (duplicates total)
CREATE TABLE Orders_Denormalized (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10,2),  -- Denormalized: stores computed value
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

-- Trade-off: Faster reads, but must maintain TotalAmount on updates

Maintaining Denormalized Data

-- Trigger to maintain denormalized TotalAmount
DELIMITER //
CREATE TRIGGER update_order_total
AFTER INSERT ON OrderItems
FOR EACH ROW
BEGIN
    UPDATE Orders_Denormalized
    SET TotalAmount = (
        SELECT SUM(Quantity * UnitPrice)
        FROM OrderItems
        WHERE OrderID = NEW.OrderID
    )
    WHERE OrderID = NEW.OrderID;
END//
DELIMITER ;

Denormalization Guidelines

  • Document it: Clearly note denormalized fields
  • Maintain consistency: Use triggers or application logic
  • Measure impact: Ensure performance gains justify complexity
  • Consider alternatives: Materialized views, caching, indexing

Entity-Relationship Diagrams (ERD)

ERDs visualize database design before implementation.

ER Diagram Components

Entity: [Customer]
        +-----------+
        | Customer  |
        +-----------+
        | CustomerID|  (PK)
        | Name      |
        | Email     |
        +-----------+

Relationship: Customer --< places >-- Order
              (One customer places many orders)

Cardinality:
  1:1  One-to-one       ---|---
  1:N  One-to-many      ---|<
  N:M  Many-to-many     >--|<

Example ERD: University

[Student] ---|< enrolls >---|< [Course]
   |                            |
   |                            |
  1:1                          N:1
   |                            |
[StudentInfo]              [Department]

Translated to tables:

CREATE TABLE Students (
    StudentID INT PRIMARY KEY,
    Name VARCHAR(100)
);

CREATE TABLE Courses (
    CourseID INT PRIMARY KEY,
    CourseName VARCHAR(100),
    DepartmentID INT,
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);

CREATE TABLE Enrollments ( -- Many-to-many junction table
    StudentID INT,
    CourseID INT,
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);

Practical Design Guidelines

1. Identify Entities and Attributes

List all nouns in requirements (potential entities and attributes).

2. Identify Relationships

Determine how entities relate (1:1, 1:N, N:M).

3. Apply Normal Forms

Progressively normalize to at least 3NF.

4. Consider Performance

Identify frequently-run queries and optimize schema accordingly.

5. Document Decisions

Explain any denormalization or design trade-offs.

Exercises

Basic Exercises

  1. Identify Violations: For each table below, identify which normal form is violated:

    Table A:
    +------+-------+-------------+
    | ID   | Name  | Phones      |
    +------+-------+-------------+
    | 1    | Alice | 555-1, 555-2|
    +------+-------+-------------+
    
  2. Functional Dependencies: List all functional dependencies:

    Employee(EmpID, Name, DeptID, DeptName, ManagerID, ManagerName)
    
  3. Convert to 1NF: Normalize this table to 1NF:

    Authors Table:
    +----------+----------------+
    | AuthorID | Books          |
    +----------+----------------+
    | A1       | Book1, Book2   |
    | A2       | Book3          |
    +----------+----------------+
    

Intermediate Exercises

  1. Convert to 2NF: Given:

    CREATE TABLE ProjectAssignments (
        EmployeeID INT,
        ProjectID INT,
        EmployeeName VARCHAR(100),
        ProjectName VARCHAR(100),
        HoursWorked INT,
        PRIMARY KEY (EmployeeID, ProjectID)
    );
    

    Normalize to 2NF. Show all resulting tables.

  2. Convert to 3NF: Given:

    CREATE TABLE Books (
        BookID INT PRIMARY KEY,
        Title VARCHAR(200),
        AuthorID INT,
        AuthorName VARCHAR(100),
        PublisherID INT,
        PublisherName VARCHAR(100)
    );
    

    Normalize to 3NF. Show all resulting tables.

  3. BCNF Analysis: Determine if this table is in BCNF:

    CourseSchedule(CourseID, Room, Time, Instructor)
    Candidate keys: (CourseID, Time), (Room, Time)
    FDs: CourseID → Instructor, (Room, Time) → CourseID
    

Advanced Exercises

  1. Complete Normalization: Normalize this table to BCNF:

    Hospital(PatientID, PatientName, DoctorID, DoctorName,
             DoctorSpecialty, RoomNo, RoomType, AdmitDate)
    
    FDs:
    PatientID → PatientName, RoomNo, AdmitDate
    DoctorID → DoctorName, DoctorSpecialty
    RoomNo → RoomType
    (PatientID, DoctorID) → (all attributes)
    
  2. Design from Scratch: Design a normalized database for a movie rental system:

    • Customers rent movies
    • Movies have genres, directors, actors
    • Track rental dates and return dates
    • Late fees calculated based on return date

    Show ERD and SQL CREATE statements.

  3. Denormalization Decision: You have a normalized design:

    Products(ProductID, Name, Price)
    OrderItems(OrderID, ProductID, Quantity)
    

    Orders table needs frequent queries for total amount. Should you: a) Keep normalized and use JOINs b) Add TotalAmount to Orders table (denormalized) c) Use a materialized view

    Justify your choice considering read/write patterns.

  4. Complex Schema: Design a complete database for a social media platform:

    • Users with profiles
    • Posts (text, images, videos)
    • Comments on posts
    • Likes on posts and comments
    • Friendships (symmetrical relationship)
    • Groups and memberships
    • Messages between users

    Provide:

    • ERD showing all entities and relationships
    • Complete SQL schema in at least 3NF
    • 5 sample queries demonstrating the design

Summary

In this reading, we explored database normalization:

  • Update anomalies (update, insertion, deletion) motivate normalization
  • Functional dependencies describe how attributes determine each other
  • Normal forms progressively eliminate redundancy:
    • 1NF: Atomic values, no repeating groups
    • 2NF: No partial dependencies
    • 3NF: No transitive dependencies
    • BCNF: Every determinant is a candidate key
  • Normalization process: Systematically decompose tables
  • Denormalization: Controlled redundancy for performance
  • ERD: Visual tool for database design

Good database design is foundational to building reliable, scalable applications. Practice normalizing schemas to develop intuition for spotting anomalies and dependencies.

Next Reading

Continue to 04-indexing.md to learn about indexes, B-trees, hash indexes, query optimization, and performance tuning techniques.