Graphs: Nodes, Edges, and the Algorithms That Navigate Them

This chapter covers graph representations, BFS, DFS, shortest paths, topological sort, and Union-Find.

What a Graph Is

A graph is a set of nodes (vertices) connected by edges. Directed or undirected. Weighted or unweighted. Trees are graphs (acyclic, connected, with a root); almost everything else interesting is a graph too.

Real-world graphs: social networks, road maps, dependency chains, web pages, state machines, package trees.

Representations

Three common ways to store a graph.

Adjacency List

A dict from node to list of neighbors. Most flexible and Pythonic.

graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

Space: O(V + E). Good for sparse graphs, which is most real graphs.

For weighted graphs:

graph = {
    "A": [("B", 2), ("C", 5)],
    "B": [("D", 3)],
    # ...
}

Adjacency Matrix

A V-by-V 2D array where matrix[i][j] is 1 (or the weight) if there's an edge from i to j.

    A  B  C  D  E
A [ 0, 1, 1, 0, 0 ]
B [ 1, 0, 0, 1, 0 ]
C [ 1, 0, 0, 1, 0 ]
D [ 0, 1, 1, 0, 1 ]
E [ 0, 0, 0, 1, 0 ]

O(V^2) space. O(1) edge lookup. Bad for sparse graphs (mostly zeroes).

Good for: dense graphs, matrix-based algorithms (Floyd-Warshall).

Edge List

Just a list of (u, v) pairs.

edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]

Rare as a working representation. Sometimes convenient as input format.

For most algorithms, build an adjacency list as your first step.

Explore nodes in order of distance from the start. Uses a queue.

from collections import deque

def bfs(graph, start):
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        print(node)
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)

O(V + E) time, O(V) space.

Uses:

  • Shortest path in an unweighted graph (distance = fewest edges).
  • Level-by-level traversal.
  • Finding connected components.

BFS for Shortest Path

def shortest_path(graph, start, end):
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            return seen[node]
        for nb in graph[node]:
            if nb not in seen:
                seen[nb] = seen[node] + 1
                queue.append(nb)
    return -1

Distance equals number of edges on the shortest path. BFS guarantees that because it explores in order of distance.

Follow one path as deep as it goes; backtrack. Uses a stack (recursion or explicit).

def dfs(graph, start, seen=None):
    if seen is None:
        seen = set()
    seen.add(start)
    print(start)
    for nb in graph[start]:
        if nb not in seen:
            dfs(graph, nb, seen)

O(V + E), same as BFS.

Uses:

  • Cycle detection.
  • Topological sort.
  • Finding strongly connected components.
  • Path existence (not shortest path).

Iterative DFS

def dfs_iter(graph, start):
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        print(node)
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)

Swap the queue for a stack and you've converted BFS to DFS. Handy when recursion depth is a concern.

Dijkstra's Algorithm

Shortest path with non-negative weights. BFS weighted by edge cost.

import heapq

def dijkstra(graph, start):
    dist = {start: 0}
    pq = [(0, start)]      # (distance, node)
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue       # stale entry
        for nb, weight in graph[node]:
            new_dist = d + weight
            if new_dist < dist.get(nb, float("inf")):
                dist[nb] = new_dist
                heapq.heappush(pq, (new_dist, nb))
    return dist

O((V + E) log V) with a binary heap. Faster variants exist (Fibonacci heap: O(E + V log V)), rarely worth the complexity.

Dijkstra fails on graphs with negative edges. Use Bellman-Ford for those (slower: O(V * E)).

Topological Sort

A linear ordering of nodes in a DAG (directed acyclic graph) such that every edge points from earlier to later. Used for build systems, dependency resolution, course prerequisites.

Kahn's Algorithm (BFS-based)

from collections import deque

def topo_sort(graph):
    indegree = {node: 0 for node in graph}
    for node in graph:
        for nb in graph[node]:
            indegree[nb] = indegree.get(nb, 0) + 1
            indegree.setdefault(node, 0)

    queue = deque([n for n in indegree if indegree[n] == 0])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in graph.get(node, []):
            indegree[nb] -= 1
            if indegree[nb] == 0:
                queue.append(nb)

    if len(order) != len(indegree):
        raise ValueError("cycle detected")
    return order

O(V + E). Also detects cycles (if no valid order exists, the graph has a cycle).

Cycle Detection

In a directed graph, DFS with three colors: white (unvisited), gray (in the current path), black (done).

def has_cycle(graph):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nb in graph[node]:
            if color.get(nb, WHITE) == GRAY:
                return True
            if color.get(nb, WHITE) == WHITE and visit(nb):
                return True
        color[node] = BLACK
        return False

    for node in graph:
        if color[node] == WHITE and visit(node):
            return True
    return False

A gray-to-gray edge means a back edge, which means a cycle.

For undirected graphs, Union-Find is often simpler.

Union-Find (Disjoint Set Union)

Track which nodes are in the same set. Very fast when combined with path compression and union by rank.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False   # already in same set
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True

Find and Union are amortized nearly O(1) (technically O(α(n)) where α is the inverse Ackermann function, which is less than 5 for any practical n).

Uses:

  • Detecting cycles in undirected graphs.
  • Minimum spanning trees (Kruskal's).
  • Grouping connected components.
  • Offline connectivity queries.

Example: Connected Components

def count_components(n, edges):
    uf = UnionFind(n)
    for a, b in edges:
        uf.union(a, b)
    return len({uf.find(i) for i in range(n)})

Each union collapses two sets into one; distinct roots at the end are distinct components.

Minimum Spanning Trees

Given a weighted undirected graph, find a tree (no cycles, connected) with minimum total edge weight.

Two algorithms:

  • Kruskal's: sort edges by weight, add each if it doesn't form a cycle (Union-Find). O(E log E).
  • Prim's: grow a tree from a start node, adding the cheapest edge to a node not yet in the tree (priority queue). O(E log V).

Kruskal's is easier to code with Union-Find:

def kruskal(n, edges):
    edges = sorted(edges, key=lambda e: e[2])   # (u, v, weight)
    uf = UnionFind(n)
    total = 0
    mst = []
    for u, v, w in edges:
        if uf.union(u, v):
            total += w
            mst.append((u, v, w))
    return total, mst

Picking the Right Algorithm

Problem                              Algorithm
Shortest path, unweighted            BFS
Shortest path, non-negative weights  Dijkstra
Shortest path, negative weights      Bellman-Ford
All-pairs shortest paths             Floyd-Warshall (dense) or Johnson (sparse)
Cycle detection (directed)           DFS with colors
Cycle detection (undirected)         Union-Find or DFS
Topological sort                     Kahn's (BFS) or DFS
Connected components                 BFS/DFS or Union-Find
Minimum spanning tree                Kruskal's or Prim's

Memorize the mapping. In interviews, the hardest part is often recognizing which shape a problem has.

Common Pitfalls

Recursion depth on deep graphs. 10,000-node DFS will crash Python. Use iterative DFS for deep inputs, or raise sys.setrecursionlimit cautiously.

Forgetting to mark nodes as seen. Infinite loop on cycles.

Using Dijkstra with negative edges. Wrong answer. Use Bellman-Ford.

Building an adjacency matrix for a sparse graph. Wastes O(V^2) space; most entries are 0.

Treating undirected graphs as directed. If (u, v) is an undirected edge, add both u → v and v → u to the adjacency list.

Next Steps

Continue to 09-dynamic-programming.md to solve problems by remembering subproblems.