# Greedy Algorithms

## Introduction
Greedy algorithms make the locally optimal choice at each step, hoping to find a global optimum. They're simpler than dynamic programming and often more efficient, but they don't always produce optimal solutions.
## Learning Objectives
By the end of this reading, you will be able to:
- Understand when greedy algorithms work
- Apply greedy strategies to classic problems
- Prove greedy correctness using exchange arguments
- Distinguish between greedy and DP problems
## 1. The Greedy Approach

### Key Characteristics

- **Greedy Choice Property**: a global optimum can be reached by making locally optimal choices
- **Optimal Substructure**: an optimal solution contains optimal solutions to subproblems

### When Does Greedy Work?
- When local optimum leads to global optimum
- Often in scheduling, selection, and covering problems
- Must be proven correct (or shown incorrect by counterexample)
### Greedy vs DP
| Aspect | Greedy | Dynamic Programming |
|---|---|---|
| Choices | Make one choice, never reconsider | Consider all choices |
| Subproblems | One subproblem remains | Multiple subproblems |
| Time | Usually faster | Often slower |
| Correctness | Must prove | Always optimal |
## 2. Classic Greedy Problems

### Activity Selection

Select a maximum number of non-overlapping activities.
```python
def activity_selection(activities):
    """
    activities: list of (start, end) tuples
    Returns a maximum-size list of non-overlapping activities.
    Greedy choice: always pick the activity that ends earliest.
    """
    # Sort by end time
    activities = sorted(activities, key=lambda x: x[1])
    selected = []
    last_end = float('-inf')  # handles activities with negative start times
    for start, end in activities:
        if start >= last_end:
            selected.append((start, end))
            last_end = end
    return selected

# Example
activities = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
print(activity_selection(activities))
# [(1, 4), (5, 7), (8, 11)] - 3 activities
```
**Why it works:** By choosing the earliest-ending activity, we leave maximum room for the remaining activities.
### Fractional Knapsack

Unlike the 0/1 knapsack, we can take fractions of items.
```python
def fractional_knapsack(items, capacity):
    """
    items: list of (value, weight)
    capacity: maximum total weight
    Greedy choice: take items with the best value/weight ratio first.
    """
    # Calculate value per unit weight
    items_with_ratio = [(v, w, v / w) for v, w in items]
    # Sort by ratio (descending)
    items_with_ratio.sort(key=lambda x: x[2], reverse=True)
    total_value = 0
    remaining = capacity
    for value, weight, ratio in items_with_ratio:
        if weight <= remaining:
            # Take the entire item
            total_value += value
            remaining -= weight
        else:
            # Take a fraction of the item
            total_value += ratio * remaining
            break
    return total_value

# Example
items = [(60, 10), (100, 20), (120, 30)]  # (value, weight)
print(fractional_knapsack(items, 50))  # 240.0
```
### Coin Change (Greedy - Sometimes Works)
```python
def coin_change_greedy(coins, amount):
    """
    WARNING: only optimal for certain coin systems (e.g., US coins).
    """
    coins = sorted(coins, reverse=True)
    count = 0
    remaining = amount
    for coin in coins:
        if remaining >= coin:
            num_coins = remaining // coin
            count += num_coins
            remaining -= num_coins * coin
    return count if remaining == 0 else -1

# Works for US coins: [25, 10, 5, 1]
print(coin_change_greedy([25, 10, 5, 1], 67))  # 6: 25+25+10+5+1+1

# FAILS for some coin systems:
#   coins = [1, 3, 4], amount = 6
#   Greedy: 4+1+1 = 3 coins
#   Optimal: 3+3 = 2 coins
```
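For arbitrary coin systems, dynamic programming always recovers the minimum. A minimal sketch for comparison (the name `coin_change_dp` is ours, not a library function):

```python
def coin_change_dp(coins, amount):
    """Minimum number of coins for any coin system (always optimal)."""
    INF = float('inf')
    # best[a] = fewest coins needed to make amount a
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for coin in coins:
            if coin <= a and best[a - coin] + 1 < best[a]:
                best[a] = best[a - coin] + 1
    return best[amount] if best[amount] != INF else -1

print(coin_change_dp([1, 3, 4], 6))        # 2 (3 + 3), where greedy gave 3
print(coin_change_dp([25, 10, 5, 1], 67))  # 6, matching greedy for US coins
```

The trade-off is O(amount x len(coins)) time versus greedy's O(len(coins) log len(coins)).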
### Huffman Coding

Build an optimal prefix-free encoding.
```python
import heapq

class HuffmanNode:
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.freq < other.freq

def huffman_coding(frequencies):
    """
    frequencies: dict of char -> frequency
    Returns encoding dict (char -> bit string).
    """
    # Create leaf nodes
    heap = [HuffmanNode(char, freq) for char, freq in frequencies.items()]
    heapq.heapify(heap)
    # Build the tree by merging the two lowest-frequency nodes
    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = HuffmanNode(None, left.freq + right.freq)
        merged.left = left
        merged.right = right
        heapq.heappush(heap, merged)
    # Generate codes by walking the tree
    codes = {}
    def generate_codes(node, code=""):
        if node.char is not None:
            codes[node.char] = code or "0"  # single-symbol edge case
            return
        generate_codes(node.left, code + "0")
        generate_codes(node.right, code + "1")
    if heap:
        generate_codes(heap[0])
    return codes

# Example
frequencies = {'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}
codes = huffman_coding(frequencies)
print(codes)
# {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 'b': '1101', 'e': '111'}
```
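Two properties are worth checking on that output: no codeword is a prefix of another (so decoding is unambiguous), and frequent symbols get short codes. A small standalone check using the table printed above (the helper `is_prefix_free` is ours, written for illustration):

```python
def is_prefix_free(codes):
    """True iff no codeword is a prefix of another codeword."""
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

codes = {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 'b': '1101', 'e': '111'}
frequencies = {'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}
assert is_prefix_free(codes)

# Weighted code length vs. a fixed 3-bit code for 6 symbols
huffman_bits = sum(frequencies[ch] * len(codes[ch]) for ch in frequencies)
print(huffman_bits)  # 224, vs. 100 * 3 = 300 bits for a fixed-width code
```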
## 3. Scheduling Problems

### Job Scheduling with Deadlines
```python
def job_scheduling(jobs):
    """
    jobs: list of (id, profit, deadline)
    Schedule jobs to maximize profit, one job per unit time slot.
    """
    # Sort by profit (descending)
    jobs = sorted(jobs, key=lambda x: x[1], reverse=True)
    max_deadline = max(job[2] for job in jobs)
    slots = [None] * (max_deadline + 1)  # slot 0 unused; valid slots are 1..max_deadline
    total_profit = 0
    scheduled = []
    for job_id, profit, deadline in jobs:
        # Find the latest free slot at or before the deadline
        for slot in range(deadline, 0, -1):
            if slots[slot] is None:
                slots[slot] = job_id
                total_profit += profit
                scheduled.append(job_id)
                break
    return total_profit, scheduled

# Example
jobs = [('a', 100, 2), ('b', 19, 1), ('c', 27, 2), ('d', 25, 1), ('e', 15, 3)]
print(job_scheduling(jobs))  # (142, ['a', 'c', 'e'])
```
### Minimum Meeting Rooms
```python
def min_meeting_rooms(intervals):
    """
    intervals: list of (start, end)
    Find the minimum number of rooms needed for all meetings.
    """
    if not intervals:
        return 0
    # Track start and end events
    events = []
    for start, end in intervals:
        events.append((start, 1))   # Meeting starts
        events.append((end, -1))    # Meeting ends
    # Tuple sorting puts (t, -1) before (t, 1): a meeting ending at t
    # frees its room before a meeting starting at t claims one
    events.sort()
    max_rooms = 0
    current_rooms = 0
    for time, delta in events:
        current_rooms += delta
        max_rooms = max(max_rooms, current_rooms)
    return max_rooms

# Example
intervals = [(0, 30), (5, 10), (15, 20)]
print(min_meeting_rooms(intervals))  # 2
```
### Task Scheduling with Cooldown
```python
from collections import Counter

def least_interval(tasks, n):
    """
    Schedule tasks with a cooldown of n slots between identical tasks.
    Returns the minimum total time needed.
    """
    counts = Counter(tasks)
    max_count = max(counts.values())
    num_max = sum(1 for c in counts.values() if c == max_count)
    # Minimum time = (max_count - 1) * (n + 1) + num_max:
    # (max_count - 1) full cycles of length n + 1, plus one final slot
    # for each task tied at the maximum frequency
    result = (max_count - 1) * (n + 1) + num_max
    # Can't be less than the total number of tasks
    return max(result, len(tasks))

# Example
tasks = ['A', 'A', 'A', 'B', 'B', 'B']
print(least_interval(tasks, 2))  # 8: A B _ A B _ A B
```
## 4. Interval Problems

### Merge Intervals
```python
def merge_intervals(intervals):
    """Merge overlapping intervals."""
    if not intervals:
        return []
    intervals.sort()
    merged = [intervals[0]]
    for start, end in intervals[1:]:
        if start <= merged[-1][1]:
            # Overlaps the last merged interval; extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Example
intervals = [(1, 3), (2, 6), (8, 10), (15, 18)]
print(merge_intervals(intervals))  # [(1, 6), (8, 10), (15, 18)]
```
### Interval Covering
```python
def min_points_to_cover(intervals):
    """
    Find a minimum set of points that together cover all intervals.
    A point covers an interval if it lies within [start, end].
    """
    if not intervals:
        return []
    # Sort by end point
    intervals.sort(key=lambda x: x[1])
    points = []
    last_point = float('-inf')
    for start, end in intervals:
        if start > last_point:
            # Interval not yet covered; place a point at its end
            points.append(end)
            last_point = end
    return points

# Example
intervals = [(1, 4), (2, 8), (3, 6), (4, 7)]
print(min_points_to_cover(intervals))  # [4] - the point 4 lies in every interval
```
## 5. Graph Greedy Algorithms

### Minimum Spanning Tree (Kruskal's)
```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False  # already in the same component; edge would form a cycle
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

def kruskal(n, edges):
    """
    n: number of vertices
    edges: list of (weight, u, v)
    Greedy choice: always pick the smallest edge that doesn't form a cycle.
    """
    edges.sort()
    uf = UnionFind(n)
    mst = []
    total_weight = 0
    for weight, u, v in edges:
        if uf.union(u, v):
            mst.append((u, v, weight))
            total_weight += weight
            if len(mst) == n - 1:
                break
    return total_weight, mst
```
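The cycle-skipping rule really does find the cheapest spanning tree. As a standalone sanity check (the toy graph and the helper names `mst_weight_greedy` / `mst_weight_brute` are invented for illustration), the snippet below restates Kruskal's rule compactly and compares it against brute force over all (n-1)-edge subsets:

```python
from itertools import combinations

# Toy 4-vertex graph; weights chosen arbitrarily. Edges are (weight, u, v).
edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
n = 4

def mst_weight_greedy(n, edges):
    """Kruskal's rule: smallest edges first, skipping cycle-formers."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    total = 0
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
    return total

def mst_weight_brute(n, edges):
    """Cheapest (n-1)-edge subset that connects all vertices."""
    def connected(subset):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for w, u, v in subset:
            parent[find(u)] = find(v)
        return len({find(v) for v in range(n)}) == 1
    return min(sum(w for w, _, _ in s)
               for s in combinations(edges, n - 1) if connected(s))

print(mst_weight_greedy(n, edges), mst_weight_brute(n, edges))  # 7 7
```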
### Minimum Spanning Tree (Prim's)
```python
import heapq

def prim(graph, start=0):
    """
    graph: adjacency list {node: [(neighbor, weight), ...]}
    Greedy choice: always add the minimum-weight edge leaving the growing tree.
    """
    n = len(graph)
    visited = [False] * n
    mst_edges = []
    total_weight = 0
    # Heap entries: (weight, from_node, to_node)
    heap = [(0, -1, start)]
    while heap and len(mst_edges) < n - 1:  # a spanning tree has n - 1 edges
        weight, from_node, to_node = heapq.heappop(heap)
        if visited[to_node]:
            continue
        visited[to_node] = True
        if from_node != -1:
            mst_edges.append((from_node, to_node, weight))
            total_weight += weight
        for neighbor, edge_weight in graph[to_node]:
            if not visited[neighbor]:
                heapq.heappush(heap, (edge_weight, to_node, neighbor))
    return total_weight, mst_edges
```
## 6. Other Greedy Problems

### Jump Game
```python
def can_jump(nums):
    """
    Can you reach the last index? nums[i] is the max jump length from i.
    """
    max_reach = 0
    for i, jump in enumerate(nums):
        if i > max_reach:
            return False
        max_reach = max(max_reach, i + jump)
    return True

def min_jumps(nums):
    """
    Minimum number of jumps needed to reach the last index.
    """
    n = len(nums)
    if n <= 1:
        return 0
    jumps = 0
    current_end = 0  # end of the range reachable with `jumps` jumps
    farthest = 0     # farthest index reachable with one more jump
    for i in range(n - 1):
        farthest = max(farthest, i + nums[i])
        if i == current_end:
            jumps += 1
            current_end = farthest
            if current_end >= n - 1:
                break
    return jumps
```
### Gas Station
```python
def can_complete_circuit(gas, cost):
    """
    Find a starting station from which the circular route can be completed.
    gas[i]: gas available at station i
    cost[i]: gas needed to travel from station i to i + 1
    Returns the starting index, or -1 if no start works.
    """
    total_tank = 0
    current_tank = 0
    start = 0
    for i in range(len(gas)):
        net = gas[i] - cost[i]
        total_tank += net
        current_tank += net
        if current_tank < 0:
            # Can't reach station i + 1 from the current start, and no
            # station in between can do better; restart just past i
            start = i + 1
            current_tank = 0
    return start if total_tank >= 0 else -1
```
## 7. Proving Greedy Correctness

### Exchange Argument
To prove that a greedy algorithm is optimal:

1. Assume there is an optimal solution different from the greedy one
2. Show it can be transformed toward the greedy solution without making it worse
3. Conclude that the greedy solution is also optimal
### Example: Activity Selection

**Claim:** Selecting the activity with the earliest end time is optimal.

**Proof sketch:**

- Let S be the greedy solution and O any optimal solution
- Let a be the activity with the earliest end time overall
- If a ∈ O, the first greedy choice agrees with O
- If a ∉ O, let b be the earliest-ending activity in O
- Since a ends earliest, end(a) ≤ end(b)
- Replace b with a: O' = (O - {b}) ∪ {a}
- O' is still conflict-free: a ends no later than b, so it cannot overlap any later activity in O
- |O'| = |O|, so O' is also optimal and agrees with greedy's first choice
- Repeating the exchange on the remaining activities transforms O into S, so greedy is optimal
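The exchange argument predicts that greedy and exhaustive search always agree on this problem. A quick empirical check (the helpers `greedy_size` and `optimal_size` are ours; the greedy rule is the one from Section 2):

```python
from itertools import combinations
import random

def greedy_size(acts):
    """Earliest-end-time rule: count of activities it selects."""
    acts = sorted(acts, key=lambda a: a[1])
    size, last_end = 0, float('-inf')
    for start, end in acts:
        if start >= last_end:
            size, last_end = size + 1, end
    return size

def optimal_size(acts):
    """Brute force: size of the largest pairwise-compatible subset."""
    for k in range(len(acts), 0, -1):
        for sub in combinations(acts, k):
            if all(a[1] <= b[0] or b[1] <= a[0]
                   for a, b in combinations(sub, 2)):
                return k
    return 0

random.seed(0)
for _ in range(200):
    acts = [(s, s + random.randint(1, 5))
            for s in (random.randint(0, 10) for _ in range(6))]
    assert greedy_size(acts) == optimal_size(acts)
print("greedy matched brute force on 200 random instances")
```

A passing check is not a proof, of course, but a single mismatch would be a disproof, which makes this a cheap first test for any proposed greedy rule.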
## 8. When Greedy Fails

### Counterexample: Coin Change
- Coins: [1, 3, 4], amount: 6
- Greedy: 4 + 1 + 1 = 3 coins
- Optimal: 3 + 3 = 2 coins
### Counterexample: 0/1 Knapsack
- Items: [(value=60, weight=10), (value=100, weight=20), (value=120, weight=30)], capacity: 50
- Greedy by value/weight ratio: take item 1 (ratio 6), then item 2 (ratio 5) = 60 + 100 = 160
- Optimal: take items 2 and 3 = 100 + 120 = 220
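For contrast, the standard 0/1 knapsack DP recovers 220 on the same instance. A sketch (`knapsack_01` is our name for it):

```python
def knapsack_01(items, capacity):
    """0/1 knapsack via DP over capacities; items are (value, weight) pairs."""
    best = [0] * (capacity + 1)
    for value, weight in items:
        # Iterate capacity downward so each item is used at most once
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]

items = [(60, 10), (100, 20), (120, 30)]
print(knapsack_01(items, 50))  # 220: items 2 and 3, beating greedy's 160
```

The greedy ratio rule fails here precisely because it cannot undo the commitment to item 1; the fractional variant in Section 2 escapes this by taking partial items.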
## Exercises

### Basic
1. Implement a greedy algorithm for making change with US coins (1, 5, 10, 25 cents).
2. Find the maximum number of non-overlapping intervals.
3. Assign cookies to children to maximize contentment (each child needs a minimum cookie size).
### Intermediate
1. Find the minimum number of arrows to burst all balloons (each balloon is an interval).
2. Implement a greedy solution for the task scheduler problem.
3. Partition labels: partition a string so each letter appears in at most one part.
### Advanced
1. Prove the correctness of Huffman coding using an exchange argument.
2. Implement Dijkstra's shortest path algorithm (greedy on distances).
3. Find a counterexample showing that greedy fails for subset sum.
## Summary
- Greedy makes locally optimal choices
- Works when greedy choice property and optimal substructure hold
- Faster and simpler than DP when applicable
- Must prove correctness or find counterexample
- Common applications: scheduling, intervals, graphs (MST)
- When greedy fails, consider DP