Greedy Algorithms

Introduction

Greedy algorithms make the locally optimal choice at each step, hoping to find a global optimum. They're simpler than dynamic programming and often more efficient, but they don't always produce optimal solutions.

Learning Objectives

By the end of this reading, you will be able to:

  • Understand when greedy algorithms work
  • Apply greedy strategies to classic problems
  • Prove greedy correctness using exchange arguments
  • Distinguish between greedy and DP problems

1. The Greedy Approach

Key Characteristics

  1. Greedy Choice Property: A global optimum can be reached by making locally optimal choices
  2. Optimal Substructure: An optimal solution contains optimal solutions to subproblems

When Does Greedy Work?

  • When local optimum leads to global optimum
  • Often in scheduling, selection, and covering problems
  • Must be proven correct (or shown incorrect by counterexample)

Greedy vs DP

Aspect         Greedy                               Dynamic Programming
-----------    ---------------------------------    ----------------------------------
Choices        Make one choice, never reconsider    Consider all choices
Subproblems    One subproblem remains               Multiple subproblems
Time           Usually faster                       Often slower
Correctness    Must be proven per problem           Optimal given a correct recurrence

2. Classic Greedy Problems

Activity Selection

Select maximum non-overlapping activities.

def activity_selection(activities):
    """
    activities: list of (start, end) tuples
    Returns the selected non-overlapping activities (of maximum count)

    Greedy choice: Always pick the activity that ends earliest
    """
    # Sort by end time
    activities = sorted(activities, key=lambda x: x[1])

    selected = []
    last_end = float('-inf')  # handles negative start times too

    for start, end in activities:
        if start >= last_end:
            selected.append((start, end))
            last_end = end

    return selected


# Example
activities = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
print(activity_selection(activities))
# [(1, 4), (5, 7), (8, 11)] - 3 activities

Why it works: By choosing the earliest-ending activity, we leave maximum room for remaining activities.

Fractional Knapsack

Unlike 0/1 knapsack, we can take fractions of items.

def fractional_knapsack(items, capacity):
    """
    items: list of (value, weight)
    Greedy choice: Take items with best value/weight ratio
    """
    # Calculate value per unit weight
    items_with_ratio = [(v, w, v / w) for v, w in items]  # assumes positive weights

    # Sort by ratio (descending)
    items_with_ratio.sort(key=lambda x: x[2], reverse=True)

    total_value = 0
    remaining = capacity

    for value, weight, ratio in items_with_ratio:
        if weight <= remaining:
            # Take entire item
            total_value += value
            remaining -= weight
        else:
            # Take fraction
            total_value += ratio * remaining
            break

    return total_value


# Example
items = [(60, 10), (100, 20), (120, 30)]  # (value, weight)
print(fractional_knapsack(items, 50))  # 240.0

Coin Change (Greedy - Sometimes Works)

def coin_change_greedy(coins, amount):
    """
    WARNING: Only optimal for canonical coin systems (e.g., US coins)
    """
    coins = sorted(coins, reverse=True)
    count = 0
    remaining = amount

    for coin in coins:
        if remaining >= coin:
            num_coins = remaining // coin
            count += num_coins
            remaining -= num_coins * coin

    return count if remaining == 0 else -1


# Works for US coins: [25, 10, 5, 1]
print(coin_change_greedy([25, 10, 5, 1], 67))  # 6: 25+25+10+5+1+1

# FAILS for some coin systems
# coins = [1, 3, 4], amount = 6
# Greedy: 4+1+1 = 3 coins
# Optimal: 3+3 = 2 coins
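For non-canonical coin systems like this one, dynamic programming finds the true minimum. A minimal DP sketch for comparison (`coin_change_dp` is an illustrative name, not part of the original text):

```python
def coin_change_dp(coins, amount):
    """Minimum coins to make amount; -1 if impossible. O(len(coins) * amount)."""
    INF = float('inf')
    dp = [0] + [INF] * amount  # dp[a] = fewest coins summing to a
    for a in range(1, amount + 1):
        for coin in coins:
            if coin <= a and dp[a - coin] + 1 < dp[a]:
                dp[a] = dp[a - coin] + 1
    return dp[amount] if dp[amount] != INF else -1


print(coin_change_dp([1, 3, 4], 6))  # 2 (3 + 3), where greedy returns 3
```

Unlike the greedy loop, the DP considers every coin at every sub-amount, which is exactly the "consider all choices" trade-off from Section 1.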

Huffman Coding

Build optimal prefix-free encoding.

import heapq

class HuffmanNode:
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.freq < other.freq


def huffman_coding(frequencies):
    """
    frequencies: dict of char -> frequency
    Returns encoding dict
    """
    # Create leaf nodes
    heap = [HuffmanNode(char, freq) for char, freq in frequencies.items()]
    heapq.heapify(heap)

    # Build tree by combining lowest frequency nodes
    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)

        merged = HuffmanNode(None, left.freq + right.freq)
        merged.left = left
        merged.right = right

        heapq.heappush(heap, merged)

    # Generate codes
    codes = {}

    def generate_codes(node, code=""):
        if node.char is not None:
            codes[node.char] = code or "0"
            return
        generate_codes(node.left, code + "0")
        generate_codes(node.right, code + "1")

    if heap:
        generate_codes(heap[0])

    return codes


# Example
frequencies = {'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}
codes = huffman_coding(frequencies)
print(codes)
# {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 'b': '1101', 'e': '111'}
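A quick sanity check on the result: Huffman codes must be prefix-free, meaning no code is a prefix of another, so the encoded bitstream can be decoded unambiguously. A small checker (illustrative, not part of the algorithm):

```python
def is_prefix_free(codes):
    """Return True if no code is a prefix of another (required for unambiguous decoding)."""
    values = list(codes.values())
    return all(
        not b.startswith(a)
        for i, a in enumerate(values)
        for j, b in enumerate(values)
        if i != j
    )


# The codes from the example above
example_codes = {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 'b': '1101', 'e': '111'}
print(is_prefix_free(example_codes))  # True
```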

3. Scheduling Problems

Job Scheduling with Deadlines

def job_scheduling(jobs):
    """
    jobs: list of (id, profit, deadline)
    Schedule jobs to maximize profit, one job per time slot
    """
    # Sort by profit (descending)
    jobs = sorted(jobs, key=lambda x: x[1], reverse=True)

    max_deadline = max(job[2] for job in jobs)
    slots = [None] * (max_deadline + 1)

    total_profit = 0
    scheduled = []

    for job_id, profit, deadline in jobs:
        # Find latest available slot before deadline
        for slot in range(deadline, 0, -1):
            if slots[slot] is None:
                slots[slot] = job_id
                total_profit += profit
                scheduled.append(job_id)
                break

    return total_profit, scheduled


# Example
jobs = [('a', 100, 2), ('b', 19, 1), ('c', 27, 2), ('d', 25, 1), ('e', 15, 3)]
print(job_scheduling(jobs))  # (142, ['a', 'c', 'e'])

Minimum Meeting Rooms

def min_meeting_rooms(intervals):
    """
    intervals: list of (start, end)
    Find minimum rooms needed for all meetings
    """
    if not intervals:
        return 0

    # Track start and end events
    events = []
    for start, end in intervals:
        events.append((start, 1))   # Meeting starts
        events.append((end, -1))    # Meeting ends

    events.sort()

    max_rooms = 0
    current_rooms = 0

    for time, delta in events:
        current_rooms += delta
        max_rooms = max(max_rooms, current_rooms)

    return max_rooms


# Example
intervals = [(0, 30), (5, 10), (15, 20)]
print(min_meeting_rooms(intervals))  # 2

Task Scheduling with Cooldown

from collections import Counter

def least_interval(tasks, n):
    """
    Schedule tasks with cooldown n between same tasks.
    Return minimum time needed.
    """
    counts = Counter(tasks)
    max_count = max(counts.values())
    num_max = sum(1 for c in counts.values() if c == max_count)

    # Minimum time = (max_count - 1) * (n + 1) + num_max
    # This accounts for cycles with max-frequency tasks
    result = (max_count - 1) * (n + 1) + num_max

    # Can't be less than total tasks
    return max(result, len(tasks))


# Example
tasks = ['A', 'A', 'A', 'B', 'B', 'B']
print(least_interval(tasks, 2))  # 8: A B _ A B _ A B
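The closed-form answer can be cross-checked by actually simulating a greedy schedule: at each time step, run the ready task with the most remaining work, or idle if nothing is ready. A sketch of that simulation (`simulate_intervals` is an illustrative name):

```python
from collections import Counter

def simulate_intervals(tasks, n):
    """Greedy simulation: each step, run the ready task with the most remaining work."""
    counts = Counter(tasks)
    next_free = {task: 0 for task in counts}  # earliest time each task may run again
    time = 0
    remaining = len(tasks)
    while remaining:
        ready = [t for t in counts if counts[t] > 0 and next_free[t] <= time]
        if ready:
            task = max(ready, key=lambda t: counts[t])  # most remaining work first
            counts[task] -= 1
            next_free[task] = time + n + 1
            remaining -= 1
        time += 1  # idle slot if nothing is ready
    return time


print(simulate_intervals(['A', 'A', 'A', 'B', 'B', 'B'], 2))  # 8, matching the formula
```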

4. Interval Problems

Merge Intervals

def merge_intervals(intervals):
    """Merge overlapping intervals."""
    if not intervals:
        return []

    intervals.sort()
    merged = [intervals[0]]

    for start, end in intervals[1:]:
        if start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    return merged


# Example
intervals = [(1, 3), (2, 6), (8, 10), (15, 18)]
print(merge_intervals(intervals))  # [(1, 6), (8, 10), (15, 18)]

Interval Covering

def min_points_to_cover(intervals):
    """
    Find minimum points that cover all intervals.
    A point covers an interval if it lies within [start, end].
    """
    if not intervals:
        return []

    # Sort by end point
    intervals.sort(key=lambda x: x[1])
    points = []
    last_point = float('-inf')

    for start, end in intervals:
        if start > last_point:
            # Need new point
            points.append(end)
            last_point = end

    return points


# Example
intervals = [(1, 4), (2, 8), (3, 6), (4, 7)]
print(min_points_to_cover(intervals))  # [4] - the single point 4 lies in every interval


5. Graph Greedy Algorithms

Minimum Spanning Tree (Kruskal's)

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True


def kruskal(n, edges):
    """
    n: number of vertices
    edges: list of (weight, u, v)
    Greedy choice: Always pick the smallest edge that doesn't form a cycle
    """
    edges.sort()
    uf = UnionFind(n)
    mst = []
    total_weight = 0

    for weight, u, v in edges:
        if uf.union(u, v):
            mst.append((u, v, weight))
            total_weight += weight
            if len(mst) == n - 1:
                break

    return total_weight, mst
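On small graphs, the greedy result can be verified by brute force over all subsets of n - 1 edges. This self-contained cross-check (illustrative; the sample graph is not from the original text) confirms that the cheapest spanning tree of the graph below has weight 6, which is what kruskal returns for the same edge list:

```python
from itertools import combinations

def mst_weight_bruteforce(n, edges):
    """Minimum total weight over all spanning trees; tiny graphs only, O(C(m, n-1))."""
    best = float('inf')
    for subset in combinations(edges, n - 1):
        comps = {v: v for v in range(n)}  # component labels for a cycle check

        def root(v):
            while comps[v] != v:
                v = comps[v]
            return v

        ok = True
        for w, u, v in subset:
            ru, rv = root(u), root(v)
            if ru == rv:
                ok = False  # cycle: not a spanning tree
                break
            comps[ru] = rv
        if ok:
            best = min(best, sum(w for w, _, _ in subset))
    return best


edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(mst_weight_bruteforce(4, edges))  # 6
```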

Minimum Spanning Tree (Prim's)

import heapq

def prim(graph, start=0):
    """
    graph: adjacency list {node: [(neighbor, weight), ...]}
    Greedy choice: Always add the minimum-weight edge to the growing tree
    """
    n = len(graph)
    visited = [False] * n
    mst_edges = []
    total_weight = 0

    # (weight, from_node, to_node)
    heap = [(0, -1, start)]

    while heap and len(mst_edges) < n - 1:
        weight, from_node, to_node = heapq.heappop(heap)

        if visited[to_node]:
            continue

        visited[to_node] = True
        if from_node != -1:
            mst_edges.append((from_node, to_node, weight))
            total_weight += weight

        for neighbor, edge_weight in graph[to_node]:
            if not visited[neighbor]:
                heapq.heappush(heap, (edge_weight, to_node, neighbor))

    return total_weight, mst_edges
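Note that prim expects an adjacency list while kruskal takes an edge list. A small helper (illustrative) converts between the two so both algorithms can be run on the same graph:

```python
def edges_to_adjacency(n, edges):
    """Convert [(weight, u, v), ...] to {node: [(neighbor, weight), ...]} (undirected)."""
    graph = {v: [] for v in range(n)}
    for weight, u, v in edges:
        graph[u].append((v, weight))
        graph[v].append((u, weight))  # add both directions for an undirected graph
    return graph


edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
graph = edges_to_adjacency(4, edges)
print(graph[0])  # [(1, 1), (2, 4)]
```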

6. Other Greedy Problems

Jump Game

def can_jump(nums):
    """
    Can you reach the last index? nums[i] is max jump from i.
    """
    max_reach = 0

    for i, jump in enumerate(nums):
        if i > max_reach:
            return False
        max_reach = max(max_reach, i + jump)

    return True


def min_jumps(nums):
    """
    Minimum jumps to reach the last index.
    """
    n = len(nums)
    if n <= 1:
        return 0

    jumps = 0
    current_end = 0
    farthest = 0

    for i in range(n - 1):
        farthest = max(farthest, i + nums[i])

        if i == current_end:
            jumps += 1
            current_end = farthest

            if current_end >= n - 1:
                break

    return jumps
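A breadth-first search over indices gives an independent answer for the minimum number of jumps, useful for cross-checking the greedy version on small inputs (`min_jumps_bfs` is an illustrative name):

```python
from collections import deque

def min_jumps_bfs(nums):
    """Fewest jumps from index 0 to the last index; -1 if unreachable. O(n^2) worst case."""
    n = len(nums)
    if n <= 1:
        return 0
    dist = [-1] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in range(i + 1, min(i + nums[i], n - 1) + 1):
            if dist[j] == -1:
                dist[j] = dist[i] + 1
                if j == n - 1:
                    return dist[j]
                queue.append(j)
    return -1


print(min_jumps_bfs([2, 3, 1, 1, 4]))  # 2 - agrees with the greedy min_jumps
```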

Gas Station

def can_complete_circuit(gas, cost):
    """
    Find starting station to complete circular route.
    gas[i]: gas at station i
    cost[i]: cost to travel from i to i+1
    """
    total_tank = 0
    current_tank = 0
    start = 0

    for i in range(len(gas)):
        net = gas[i] - cost[i]
        total_tank += net
        current_tank += net

        if current_tank < 0:
            start = i + 1
            current_tank = 0

    return start if total_tank >= 0 else -1
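A brute-force check that tries every starting station confirms the one-pass result on small inputs (`can_complete_bruteforce` is an illustrative name):

```python
def can_complete_bruteforce(gas, cost):
    """Try every start; return the first index that completes the circuit, else -1. O(n^2)."""
    n = len(gas)
    for start in range(n):
        tank = 0
        for step in range(n):
            i = (start + step) % n
            tank += gas[i] - cost[i]
            if tank < 0:
                break  # ran out of gas; try the next start
        else:
            return start
    return -1


print(can_complete_bruteforce([1, 2, 3, 4, 5], [3, 4, 5, 1, 2]))  # 3
```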

7. Proving Greedy Correctness

Exchange Argument

To prove greedy is optimal:

  1. Assume there's an optimal solution different from greedy
  2. Show we can transform it toward greedy without worsening it
  3. Conclude greedy is also optimal

Example: Activity Selection

Claim: Selecting the activity with earliest end time is optimal.

Proof:

  1. Let S be our greedy solution, O be any optimal solution
  2. Let a be the activity with earliest end time
  3. If a ∈ O, we're done for the first choice
  4. If a ∉ O, let b be the earliest-ending activity in O
  5. Since a ends earliest overall, end(a) ≤ end(b)
  6. Replace b with a in O: O' = (O - {b}) ∪ {a}
  7. O' is valid: a ends no later than b did, so a cannot conflict with any later activity in O
  8. |O'| = |O|, so O' is also optimal
  9. By induction, we can transform O to match S
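The exchange argument can also be spot-checked empirically by comparing greedy counts against a brute force over all subsets on random small instances (all names below are illustrative):

```python
import random
from itertools import combinations

def greedy_count(activities):
    """Greedy: earliest end time first."""
    count, last_end = 0, float('-inf')
    for start, end in sorted(activities, key=lambda x: x[1]):
        if start >= last_end:
            count += 1
            last_end = end
    return count

def brute_force_count(activities):
    """Size of the largest subset of pairwise non-overlapping activities."""
    for r in range(len(activities), 0, -1):
        for subset in combinations(activities, r):
            if all(a[1] <= b[0] or b[1] <= a[0]
                   for a, b in combinations(subset, 2)):
                return r
    return 0

random.seed(0)
for _ in range(200):
    acts = [(s, s + random.randint(1, 5))
            for s in (random.randint(0, 10) for _ in range(6))]
    assert greedy_count(acts) == brute_force_count(acts)
print("greedy matches brute force on all trials")
```

This is evidence, not proof, but running it is a fast way to gain confidence in a greedy choice before attempting the exchange argument.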

8. When Greedy Fails

Counterexample: Coin Change

Coins: [1, 3, 4]
Amount: 6

Greedy: 4 + 1 + 1 = 3 coins
Optimal: 3 + 3 = 2 coins

Counterexample: 0/1 Knapsack

Items: [(value=60, weight=10), (value=100, weight=20), (value=120, weight=30)]
Capacity: 50

Greedy by value/weight: Take item 1 (ratio 6), then item 2 (ratio 5)
  = 60 + 100 = 160

Optimal: Take items 2 and 3
  = 100 + 120 = 220
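For the 0/1 case, dynamic programming recovers the correct answer. A minimal sketch (`knapsack_01` is an illustrative name):

```python
def knapsack_01(items, capacity):
    """items: list of (value, weight). Classic O(n * capacity) DP over remaining capacity."""
    dp = [0] * (capacity + 1)  # dp[c] = best value achievable with capacity c
    for value, weight in items:
        for c in range(capacity, weight - 1, -1):  # descending: each item used at most once
            dp[c] = max(dp[c], dp[c - weight] + value)
    return dp[capacity]


items = [(60, 10), (100, 20), (120, 30)]
print(knapsack_01(items, 50))  # 220 - items 2 and 3, beating greedy's 160
```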

Exercises

Basic

  1. Implement a greedy algorithm for making change with US coins (1, 5, 10, 25 cents).

  2. Find the maximum number of non-overlapping intervals.

  3. Assign cookies to children to maximize contentment (each child needs minimum cookie size).

Intermediate

  1. Find minimum number of arrows to burst all balloons (each balloon is an interval).

  2. Implement a greedy solution for the task scheduler problem.

  3. Partition labels: partition a string so each letter appears in at most one part.

Advanced

  1. Prove the correctness of Huffman coding using an exchange argument.

  2. Implement Dijkstra's shortest path algorithm (greedy on distances).

  3. Find counterexample showing greedy fails for subset sum.


Summary

  • Greedy makes locally optimal choices
  • Works when greedy choice property and optimal substructure hold
  • Faster and simpler than DP when applicable
  • Must prove correctness or find counterexample
  • Common applications: scheduling, intervals, graphs (MST)
  • When greedy fails, consider DP

Next Reading

Graph Algorithms →