Database Design and Normalization
Learning Objectives
By the end of this reading, you will be able to:
- Understand the goals of database design and normalization
- Identify functional dependencies in a database schema
- Apply the normal forms: 1NF, 2NF, 3NF, and BCNF
- Recognize and eliminate update anomalies
- Design normalized database schemas from scratch
- Understand when denormalization is appropriate
- Use entity-relationship diagrams to visualize database design
- Apply normalization principles to real-world scenarios
Introduction
Database design is one of the most critical aspects of building reliable applications. A well-designed database ensures data integrity, eliminates redundancy, and provides a solid foundation for efficient queries. Normalization is the systematic process of organizing data to minimize redundancy and dependency.
Poor database design leads to:
- Update anomalies: Inconsistent data after updates
- Insertion anomalies: Inability to insert certain data
- Deletion anomalies: Unintended data loss
- Data redundancy: Wasted storage and inconsistency risks
Update Anomalies: The Problem
Consider this poorly designed table for a university:
StudentCourses Table (Unnormalized)
+------------+----------+----------+---------+---------------+--------------+
| StudentID | StuName | CourseID | CoName | InstructorID | InstrName |
+------------+----------+----------+---------+---------------+--------------+
| S001 | Alice | CS101 | Intro | I001 | Dr. Smith |
| S001 | Alice | CS201 | Data S | I002 | Dr. Jones |
| S002 | Bob | CS101 | Intro | I001 | Dr. Smith |
| S003 | Carol | CS201 | Data S | I002 | Dr. Jones |
| S003 | Carol | MATH101 | Calc | I003 | Dr. Wilson |
+------------+----------+----------+---------+---------------+--------------+
This design has serious problems:
Update Anomaly
If Dr. Smith changes their name to Dr. Smith-Johnson, we must update multiple rows. Missing one row creates inconsistency.
-- Must update multiple rows - error-prone
UPDATE StudentCourses
SET InstrName = 'Dr. Smith-Johnson'
WHERE InstructorID = 'I001';
Insertion Anomaly
We cannot add a new instructor who isn't teaching any courses yet, because we'd have NULL values for student and course information.
Deletion Anomaly
If Carol drops MATH101 and it's the only course taught by Dr. Wilson, we lose information about Dr. Wilson entirely.
-- Deleting the enrollment loses instructor information
DELETE FROM StudentCourses
WHERE StudentID = 'S003' AND CourseID = 'MATH101';
-- Dr. Wilson information is now gone!
Normalization solves these problems by decomposing tables properly.
Functional Dependencies
A functional dependency X → Y means that the value of X uniquely determines the value of Y.
Examples
In our student database:
StudentID → StudentName(Student ID determines name)CourseID → CourseName(Course ID determines course name)InstructorID → InstructorName(Instructor ID determines name)(StudentID, CourseID) → Grade(Student + Course determines grade)
Types of Functional Dependencies
Trivial Dependency
X → Y where Y ⊆ X
(StudentID, Name) → StudentID -- Trivial
(StudentID, Name) → Name -- Trivial
Non-Trivial Dependency
X → Y where Y ⊄ X
StudentID → Name -- Non-trivial
StudentID → (Name, Major) -- Non-trivial
Partial Dependency
When a non-prime attribute depends on part of a candidate key.
Given candidate key (StudentID, CourseID):
(StudentID, CourseID) → Grade -- Full dependency (good)
StudentID → StudentName -- Partial dependency (bad in 2NF)
CourseID → CourseName -- Partial dependency (bad in 2NF)
Transitive Dependency
X → Y and Y → Z, therefore X → Z (but Y is not a candidate key)
StudentID → DepartmentID → DepartmentName
-- StudentID transitively determines DepartmentName
Determining Functional Dependencies
Given a table, identify functional dependencies by asking:
- What uniquely identifies each row? (candidate keys)
- What determines what? (dependencies)
- Are there partial or transitive dependencies?
Normal Forms
Normal forms are progressive rules that ensure better database design. Each normal form builds on the previous one.
First Normal Form (1NF)
Rule: Each cell contains only atomic (indivisible) values, and there are no repeating groups.
Violation Example
Students Table (NOT in 1NF)
+------------+----------+------------------+
| StudentID | Name | Courses |
+------------+----------+------------------+
| S001 | Alice | CS101, CS201 | -- Multiple values in one cell
| S002 | Bob | CS101 |
| S003 | Carol | CS201, MATH101 | -- Multiple values in one cell
+------------+----------+------------------+
Another violation:
Students Table (NOT in 1NF)
+------------+----------+----------+----------+
| StudentID | Name | Course1 | Course2 | -- Repeating groups
+------------+----------+----------+----------+
| S001 | Alice | CS101 | CS201 |
| S002 | Bob | CS101 | NULL |
| S003 | Carol | CS201 | MATH101 |
+------------+----------+----------+----------+
Converting to 1NF
-- Proper 1NF design
CREATE TABLE Students (
StudentID VARCHAR(10) PRIMARY KEY,
Name VARCHAR(100) NOT NULL
);
CREATE TABLE Courses (
CourseID VARCHAR(10) PRIMARY KEY,
CourseName VARCHAR(100) NOT NULL
);
CREATE TABLE Enrollments (
StudentID VARCHAR(10),
CourseID VARCHAR(10),
Grade CHAR(2),
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);
-- Sample data
INSERT INTO Students VALUES
('S001', 'Alice'),
('S002', 'Bob'),
('S003', 'Carol');
INSERT INTO Courses VALUES
('CS101', 'Introduction to CS'),
('CS201', 'Data Structures'),
('MATH101', 'Calculus I');
INSERT INTO Enrollments VALUES
('S001', 'CS101', 'A'),
('S001', 'CS201', 'B+'),
('S002', 'CS101', 'A-'),
('S003', 'CS201', 'A'),
('S003', 'MATH101', 'B');
Second Normal Form (2NF)
Rule: Must be in 1NF, and all non-prime attributes must be fully dependent on the entire candidate key (no partial dependencies).
This only applies to tables with composite keys.
Violation Example
Enrollments Table (1NF but NOT 2NF)
+------------+----------+----------+-------------+-----------+---------------+
| StudentID | CourseID | Grade | StudentName | CoName | InstructorID |
+------------+----------+----------+-------------+-----------+---------------+
| S001 | CS101 | A | Alice | Intro CS | I001 |
| S001 | CS201 | B+ | Alice | Data Str | I002 |
| S002 | CS101 | A- | Bob | Intro CS | I001 |
+------------+----------+----------+-------------+-----------+---------------+
Functional dependencies:
(StudentID, CourseID) → Grade -- Full dependency ✓
StudentID → StudentName -- Partial dependency ✗
CourseID → CourseName -- Partial dependency ✗
CourseID → InstructorID -- Partial dependency ✗
StudentName depends only on StudentID (part of the key), not on the whole key.
Converting to 2NF
Decompose the table to eliminate partial dependencies:
-- Students table (student info depends only on StudentID)
CREATE TABLE Students (
StudentID VARCHAR(10) PRIMARY KEY,
StudentName VARCHAR(100) NOT NULL
);
-- Courses table (course info depends only on CourseID)
CREATE TABLE Courses (
CourseID VARCHAR(10) PRIMARY KEY,
CourseName VARCHAR(100) NOT NULL,
InstructorID VARCHAR(10)
);
-- Enrollments table (only the grade depends on both)
CREATE TABLE Enrollments (
StudentID VARCHAR(10),
CourseID VARCHAR(10),
Grade CHAR(2),
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);
INSERT INTO Students VALUES
('S001', 'Alice'),
('S002', 'Bob'),
('S003', 'Carol');
INSERT INTO Courses VALUES
('CS101', 'Introduction to CS', 'I001'),
('CS201', 'Data Structures', 'I002'),
('MATH101', 'Calculus I', 'I003');
INSERT INTO Enrollments VALUES
('S001', 'CS101', 'A'),
('S001', 'CS201', 'B+'),
('S002', 'CS101', 'A-');
Third Normal Form (3NF)
Rule: Must be in 2NF, and no transitive dependencies (non-prime attributes must depend only on candidate keys, not on other non-prime attributes).
Violation Example
Courses Table (2NF but NOT 3NF)
+----------+-------------+--------------+--------------+
| CourseID | CourseName | InstructorID | InstrName |
+----------+-------------+--------------+--------------+
| CS101 | Intro CS | I001 | Dr. Smith |
| CS201 | Data Str | I002 | Dr. Jones |
| MATH101 | Calculus I | I003 | Dr. Wilson |
+----------+-------------+--------------+--------------+
Functional dependencies:
CourseID → InstructorID → InstructorName
-- Transitive dependency: CourseID → InstructorName through InstructorID
Converting to 3NF
-- Courses table (remove InstructorName)
CREATE TABLE Courses (
CourseID VARCHAR(10) PRIMARY KEY,
CourseName VARCHAR(100) NOT NULL,
InstructorID VARCHAR(10),
FOREIGN KEY (InstructorID) REFERENCES Instructors(InstructorID)
);
-- Instructors table (separate table for instructor info)
CREATE TABLE Instructors (
InstructorID VARCHAR(10) PRIMARY KEY,
InstructorName VARCHAR(100) NOT NULL,
Department VARCHAR(50)
);
INSERT INTO Instructors VALUES
('I001', 'Dr. Smith', 'Computer Science'),
('I002', 'Dr. Jones', 'Computer Science'),
('I003', 'Dr. Wilson', 'Mathematics');
INSERT INTO Courses VALUES
('CS101', 'Introduction to CS', 'I001'),
('CS201', 'Data Structures', 'I002'),
('MATH101', 'Calculus I', 'I003');
Boyce-Codd Normal Form (BCNF)
Rule: Must be in 3NF, and for every functional dependency X → Y, X must be a superkey.
BCNF is a stricter version of 3NF. A relation is in BCNF if there are no anomalies related to functional dependencies.
3NF vs BCNF
A table can be in 3NF but not BCNF when:
- There are overlapping candidate keys
- A prime attribute depends on a non-prime attribute
Violation Example
CourseInstructor Table (3NF but NOT BCNF)
+----------+--------------+-------------+
| CourseID | InstructorID | TextbookID |
+----------+--------------+-------------+
| CS101 | I001 | T001 |
| CS101 | I002 | T001 |
| CS201 | I002 | T002 |
| CS201 | I003 | T002 |
+----------+--------------+-------------+
Candidate keys: (CourseID, InstructorID)
Functional dependencies:
(CourseID, InstructorID) → TextbookID -- ✓ Superkey
CourseID → TextbookID -- ✗ Not a superkey (violates BCNF)
The dependency CourseID → TextbookID violates BCNF because CourseID alone is not a superkey.
Converting to BCNF
-- CourseTextbook table
CREATE TABLE CourseTextbook (
CourseID VARCHAR(10) PRIMARY KEY,
TextbookID VARCHAR(10),
FOREIGN KEY (TextbookID) REFERENCES Textbooks(TextbookID)
);
-- CourseInstructor table
CREATE TABLE CourseInstructor (
CourseID VARCHAR(10),
InstructorID VARCHAR(10),
Semester VARCHAR(20),
PRIMARY KEY (CourseID, InstructorID, Semester),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID),
FOREIGN KEY (InstructorID) REFERENCES Instructors(InstructorID)
);
INSERT INTO CourseTextbook VALUES
('CS101', 'T001'),
('CS201', 'T002');
INSERT INTO CourseInstructor VALUES
('CS101', 'I001', 'Fall 2023'),
('CS101', 'I002', 'Spring 2024'),
('CS201', 'I002', 'Fall 2023'),
('CS201', 'I003', 'Spring 2024');
Higher Normal Forms
Fourth Normal Form (4NF)
Eliminates multi-valued dependencies. Rarely needed in practice.
Fifth Normal Form (5NF)
Eliminates join dependencies. Very rarely needed.
For most applications, BCNF is sufficient.
Normalization Process: Step-by-Step Example
Let's normalize a complete example from scratch.
Initial Unnormalized Table
LibraryRecords Table (Unnormalized)
+--------+----------+----------+----------------+----------+----------+--------------+
| CardNo | PName | BookID | Title | Author | BDate | ReturnDate |
+--------+----------+----------+----------------+----------+----------+--------------+
| C001 | Alice | B101,B102| DB Sys, SQL | Date,Kim | 1/1/24 | 1/15/24 |
| C002 | Bob | B101 | DB Systems | Date | 1/5/24 | 1/19/24 |
+--------+----------+----------+----------------+----------+----------+--------------+
Step 1: Convert to 1NF
Eliminate repeating groups and ensure atomic values.
CREATE TABLE LibraryRecords_1NF (
CardNo VARCHAR(10),
PatronName VARCHAR(100),
BookID VARCHAR(10),
Title VARCHAR(200),
Author VARCHAR(100),
BorrowDate DATE,
ReturnDate DATE,
PRIMARY KEY (CardNo, BookID, BorrowDate)
);
INSERT INTO LibraryRecords_1NF VALUES
('C001', 'Alice', 'B101', 'Database Systems', 'Date', '2024-01-01', '2024-01-15'),
('C001', 'Alice', 'B102', 'SQL Fundamentals', 'Kim', '2024-01-01', '2024-01-15'),
('C002', 'Bob', 'B101', 'Database Systems', 'Date', '2024-01-05', '2024-01-19');
Step 2: Convert to 2NF
Identify functional dependencies:
CardNo → PatronName(partial)BookID → Title, Author(partial)(CardNo, BookID, BorrowDate) → ReturnDate(full)
Remove partial dependencies:
-- Patrons table
CREATE TABLE Patrons (
CardNo VARCHAR(10) PRIMARY KEY,
PatronName VARCHAR(100) NOT NULL
);
-- Books table
CREATE TABLE Books (
BookID VARCHAR(10) PRIMARY KEY,
Title VARCHAR(200) NOT NULL,
Author VARCHAR(100)
);
-- Borrowing records table
CREATE TABLE BorrowingRecords (
CardNo VARCHAR(10),
BookID VARCHAR(10),
BorrowDate DATE,
ReturnDate DATE,
PRIMARY KEY (CardNo, BookID, BorrowDate),
FOREIGN KEY (CardNo) REFERENCES Patrons(CardNo),
FOREIGN KEY (BookID) REFERENCES Books(BookID)
);
INSERT INTO Patrons VALUES
('C001', 'Alice'),
('C002', 'Bob');
INSERT INTO Books VALUES
('B101', 'Database Systems', 'Date'),
('B102', 'SQL Fundamentals', 'Kim');
INSERT INTO BorrowingRecords VALUES
('C001', 'B101', '2024-01-01', '2024-01-15'),
('C001', 'B102', '2024-01-01', '2024-01-15'),
('C002', 'B101', '2024-01-05', '2024-01-19');
Step 3: Convert to 3NF
Check for transitive dependencies. If we had:
BookID → PublisherID → PublisherName
We'd need to separate Publishers:
-- Publishers table (if needed)
CREATE TABLE Publishers (
PublisherID VARCHAR(10) PRIMARY KEY,
PublisherName VARCHAR(100) NOT NULL,
City VARCHAR(50)
);
-- Update Books table
CREATE TABLE Books_3NF (
BookID VARCHAR(10) PRIMARY KEY,
Title VARCHAR(200) NOT NULL,
Author VARCHAR(100),
PublisherID VARCHAR(10),
FOREIGN KEY (PublisherID) REFERENCES Publishers(PublisherID)
);
Step 4: Verify BCNF
Check all functional dependencies - each should have a superkey on the left side. Our design is already in BCNF.
Complete Example: E-commerce Database
Let's design a normalized e-commerce database from scratch.
Requirements Analysis
We need to track:
- Customers (name, email, address)
- Products (name, description, price, category)
- Orders (customer, date, status, shipping address)
- Order items (products in each order, quantities)
Normalized Schema
-- Customers table
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY AUTO_INCREMENT,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
Email VARCHAR(100) UNIQUE NOT NULL,
Phone VARCHAR(20),
RegistrationDate DATE
);
-- Addresses table (customers can have multiple addresses)
CREATE TABLE Addresses (
AddressID INT PRIMARY KEY AUTO_INCREMENT,
CustomerID INT NOT NULL,
AddressType VARCHAR(20), -- 'Billing', 'Shipping'
Street VARCHAR(100),
City VARCHAR(50),
State VARCHAR(50),
ZipCode VARCHAR(10),
Country VARCHAR(50),
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
-- Categories table
CREATE TABLE Categories (
CategoryID INT PRIMARY KEY AUTO_INCREMENT,
CategoryName VARCHAR(50) NOT NULL UNIQUE,
Description TEXT
);
-- Products table
CREATE TABLE Products (
ProductID INT PRIMARY KEY AUTO_INCREMENT,
ProductName VARCHAR(100) NOT NULL,
Description TEXT,
Price DECIMAL(10,2) NOT NULL,
StockQuantity INT DEFAULT 0,
CategoryID INT,
FOREIGN KEY (CategoryID) REFERENCES Categories(CategoryID)
);
-- Orders table
CREATE TABLE Orders (
OrderID INT PRIMARY KEY AUTO_INCREMENT,
CustomerID INT NOT NULL,
OrderDate DATETIME DEFAULT CURRENT_TIMESTAMP,
Status VARCHAR(20) DEFAULT 'Pending',
ShippingAddressID INT,
TotalAmount DECIMAL(10,2),
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID),
FOREIGN KEY (ShippingAddressID) REFERENCES Addresses(AddressID)
);
-- OrderItems table
CREATE TABLE OrderItems (
OrderID INT,
ProductID INT,
Quantity INT NOT NULL,
UnitPrice DECIMAL(10,2) NOT NULL, -- Price at time of order
Subtotal DECIMAL(10,2) GENERATED ALWAYS AS (Quantity * UnitPrice),
PRIMARY KEY (OrderID, ProductID),
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID),
FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);
-- Sample data
INSERT INTO Customers (FirstName, LastName, Email, Phone, RegistrationDate) VALUES
('John', 'Doe', 'john@email.com', '555-0101', '2024-01-15'),
('Jane', 'Smith', 'jane@email.com', '555-0102', '2024-02-20');
INSERT INTO Addresses (CustomerID, AddressType, Street, City, State, ZipCode, Country) VALUES
(1, 'Billing', '123 Main St', 'New York', 'NY', '10001', 'USA'),
(1, 'Shipping', '456 Oak Ave', 'New York', 'NY', '10002', 'USA'),
(2, 'Billing', '789 Pine Rd', 'Boston', 'MA', '02101', 'USA');
INSERT INTO Categories (CategoryName, Description) VALUES
('Electronics', 'Electronic devices and accessories'),
('Books', 'Physical and digital books'),
('Clothing', 'Apparel and accessories');
INSERT INTO Products (ProductName, Description, Price, StockQuantity, CategoryID) VALUES
('Laptop', 'High-performance laptop', 999.99, 50, 1),
('Mouse', 'Wireless mouse', 29.99, 200, 1),
('SQL Guide', 'Comprehensive SQL tutorial', 49.99, 100, 2),
('T-Shirt', 'Cotton t-shirt', 19.99, 500, 3);
INSERT INTO Orders (CustomerID, OrderDate, Status, ShippingAddressID, TotalAmount) VALUES
(1, '2024-03-01', 'Delivered', 2, 1029.98),
(2, '2024-03-15', 'Shipped', 3, 49.99);
INSERT INTO OrderItems (OrderID, ProductID, Quantity, UnitPrice) VALUES
(1, 1, 1, 999.99),
(1, 2, 1, 29.99),
(2, 3, 1, 49.99);
-- Useful queries on normalized data
-- Get customer orders with details
SELECT
c.FirstName,
c.LastName,
o.OrderID,
o.OrderDate,
o.TotalAmount,
p.ProductName,
oi.Quantity,
oi.UnitPrice
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID
JOIN OrderItems oi ON o.OrderID = oi.OrderID
JOIN Products p ON oi.ProductID = p.ProductID
ORDER BY o.OrderDate DESC;
-- Category sales report
SELECT
cat.CategoryName,
COUNT(DISTINCT o.OrderID) AS OrderCount,
SUM(oi.Quantity) AS TotalItemsSold,
SUM(oi.Subtotal) AS TotalRevenue
FROM Categories cat
JOIN Products p ON cat.CategoryID = p.CategoryID
JOIN OrderItems oi ON p.ProductID = oi.ProductID
JOIN Orders o ON oi.OrderID = o.OrderID
GROUP BY cat.CategoryID, cat.CategoryName
ORDER BY TotalRevenue DESC;
Denormalization: When to Break the Rules
While normalization is important, sometimes controlled denormalization improves performance.
When to Denormalize
- Read-heavy workloads: Reduce JOIN operations
- Reporting databases: Pre-aggregate data
- Performance critical queries: Avoid complex JOINs
- Caching: Store computed values
Example: Denormalization for Performance
-- Normalized design (requires JOINs)
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
CREATE TABLE OrderItems (
OrderID INT,
ProductID INT,
Quantity INT,
UnitPrice DECIMAL(10,2),
PRIMARY KEY (OrderID, ProductID)
);
-- Query requires JOIN and aggregation
SELECT OrderID, SUM(Quantity * UnitPrice) AS Total
FROM OrderItems
GROUP BY OrderID;
-- Denormalized design (duplicates total)
CREATE TABLE Orders_Denormalized (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
TotalAmount DECIMAL(10,2), -- Denormalized: stores computed value
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
-- Trade-off: Faster reads, but must maintain TotalAmount on updates
Maintaining Denormalized Data
-- Trigger to maintain denormalized TotalAmount
DELIMITER //
CREATE TRIGGER update_order_total
AFTER INSERT ON OrderItems
FOR EACH ROW
BEGIN
UPDATE Orders_Denormalized
SET TotalAmount = (
SELECT SUM(Quantity * UnitPrice)
FROM OrderItems
WHERE OrderID = NEW.OrderID
)
WHERE OrderID = NEW.OrderID;
END//
DELIMITER ;
Denormalization Guidelines
- Document it: Clearly note denormalized fields
- Maintain consistency: Use triggers or application logic
- Measure impact: Ensure performance gains justify complexity
- Consider alternatives: Materialized views, caching, indexing
Entity-Relationship Diagrams (ERD)
ERDs visualize database design before implementation.
ER Diagram Components
Entity: [Customer]
+-----------+
| Customer |
+-----------+
| CustomerID| (PK)
| Name |
| Email |
+-----------+
Relationship: Customer --< places >-- Order
(One customer places many orders)
Cardinality:
1:1 One-to-one ---|---
1:N One-to-many ---|<
N:M Many-to-many >--|<
Example ERD: University
[Student] ---|< enrolls >---|< [Course]
| |
| |
1:1 N:1
| |
[StudentInfo] [Department]
Translated to tables:
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
Name VARCHAR(100)
);
CREATE TABLE Courses (
CourseID INT PRIMARY KEY,
CourseName VARCHAR(100),
DepartmentID INT,
FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);
CREATE TABLE Enrollments ( -- Many-to-many junction table
StudentID INT,
CourseID INT,
Grade CHAR(2),
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);
Practical Design Guidelines
1. Identify Entities and Attributes
List all nouns in requirements (potential entities and attributes).
2. Identify Relationships
Determine how entities relate (1:1, 1:N, N:M).
3. Apply Normal Forms
Progressively normalize to at least 3NF.
4. Consider Performance
Identify frequently-run queries and optimize schema accordingly.
5. Document Decisions
Explain any denormalization or design trade-offs.
Exercises
Basic Exercises
Identify Violations: For each table below, identify which normal form is violated:
Table A: +------+-------+-------------+ | ID | Name | Phones | +------+-------+-------------+ | 1 | Alice | 555-1, 555-2| +------+-------+-------------+Functional Dependencies: List all functional dependencies:
Employee(EmpID, Name, DeptID, DeptName, ManagerID, ManagerName)Convert to 1NF: Normalize this table to 1NF:
Authors Table: +----------+----------------+ | AuthorID | Books | +----------+----------------+ | A1 | Book1, Book2 | | A2 | Book3 | +----------+----------------+
Intermediate Exercises
Convert to 2NF: Given:
CREATE TABLE ProjectAssignments ( EmployeeID INT, ProjectID INT, EmployeeName VARCHAR(100), ProjectName VARCHAR(100), HoursWorked INT, PRIMARY KEY (EmployeeID, ProjectID) );Normalize to 2NF. Show all resulting tables.
Convert to 3NF: Given:
CREATE TABLE Books ( BookID INT PRIMARY KEY, Title VARCHAR(200), AuthorID INT, AuthorName VARCHAR(100), PublisherID INT, PublisherName VARCHAR(100) );Normalize to 3NF. Show all resulting tables.
BCNF Analysis: Determine if this table is in BCNF:
CourseSchedule(CourseID, Room, Time, Instructor) Candidate keys: (CourseID, Time), (Room, Time) FDs: CourseID → Instructor, (Room, Time) → CourseID
Advanced Exercises
Complete Normalization: Normalize this table to BCNF:
Hospital(PatientID, PatientName, DoctorID, DoctorName, DoctorSpecialty, RoomNo, RoomType, AdmitDate) FDs: PatientID → PatientName, RoomNo, AdmitDate DoctorID → DoctorName, DoctorSpecialty RoomNo → RoomType (PatientID, DoctorID) → (all attributes)Design from Scratch: Design a normalized database for a movie rental system:
- Customers rent movies
- Movies have genres, directors, actors
- Track rental dates and return dates
- Late fees calculated based on return date
Show ERD and SQL CREATE statements.
Denormalization Decision: You have a normalized design:
Products(ProductID, Name, Price) OrderItems(OrderID, ProductID, Quantity)Orders table needs frequent queries for total amount. Should you: a) Keep normalized and use JOINs b) Add TotalAmount to Orders table (denormalized) c) Use a materialized view
Justify your choice considering read/write patterns.
Complex Schema: Design a complete database for a social media platform:
- Users with profiles
- Posts (text, images, videos)
- Comments on posts
- Likes on posts and comments
- Friendships (symmetrical relationship)
- Groups and memberships
- Messages between users
Provide:
- ERD showing all entities and relationships
- Complete SQL schema in at least 3NF
- 5 sample queries demonstrating the design
Summary
In this reading, we explored database normalization:
- Update anomalies (update, insertion, deletion) motivate normalization
- Functional dependencies describe how attributes determine each other
- Normal forms progressively eliminate redundancy:
- 1NF: Atomic values, no repeating groups
- 2NF: No partial dependencies
- 3NF: No transitive dependencies
- BCNF: Every determinant is a candidate key
- Normalization process: Systematically decompose tables
- Denormalization: Controlled redundancy for performance
- ERD: Visual tool for database design
Good database design is foundational to building reliable, scalable applications. Practice normalizing schemas to develop intuition for spotting anomalies and dependencies.
Next Reading
Continue to 04-indexing.md to learn about indexes, B-trees, hash indexes, query optimization, and performance tuning techniques.