Graphs: Nodes, Edges, and the Algorithms That Navigate Them
This chapter covers graph representations, BFS, DFS, shortest paths, topological sort, and Union-Find.
What a Graph Is
A graph is a set of nodes (vertices) connected by edges. Directed or undirected. Weighted or unweighted. Trees are graphs (acyclic, connected, with a root); almost everything else interesting is a graph too.
Real-world graphs: social networks, road maps, dependency chains, web pages, state machines, package trees.
Representations
Three common ways to store a graph.
Adjacency List
A dict from node to list of neighbors. Most flexible and Pythonic.
graph = {
"A": ["B", "C"],
"B": ["A", "D"],
"C": ["A", "D"],
"D": ["B", "C", "E"],
"E": ["D"],
}
Space: O(V + E). Good for sparse graphs, which is most real graphs.
For weighted graphs:
graph = {
"A": [("B", 2), ("C", 5)],
"B": [("D", 3)],
# ...
}
Adjacency Matrix
A V-by-V 2D array where matrix[i][j] is 1 (or the weight) if there's an edge from i to j.
A B C D E
A [ 0, 1, 1, 0, 0 ]
B [ 1, 0, 0, 1, 0 ]
C [ 1, 0, 0, 1, 0 ]
D [ 0, 1, 1, 0, 1 ]
E [ 0, 0, 0, 1, 0 ]
O(V^2) space. O(1) edge lookup. Bad for sparse graphs (mostly zeroes).
Good for: dense graphs, matrix-based algorithms (Floyd-Warshall).
Edge List
Just a list of (u, v) pairs.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
Rare as a working representation. Sometimes convenient as input format.
For most algorithms, build an adjacency list as your first step.
BFS (Breadth-First Search)
Explore nodes in order of distance from the start. Uses a queue.
from collections import deque
def bfs(graph, start):
seen = {start}
queue = deque([start])
while queue:
node = queue.popleft()
print(node)
for nb in graph[node]:
if nb not in seen:
seen.add(nb)
queue.append(nb)
O(V + E) time, O(V) space.
Uses:
- Shortest path in an unweighted graph (distance = fewest edges).
- Level-by-level traversal.
- Finding connected components.
BFS for Shortest Path
def shortest_path(graph, start, end):
seen = {start: 0}
queue = deque([start])
while queue:
node = queue.popleft()
if node == end:
return seen[node]
for nb in graph[node]:
if nb not in seen:
seen[nb] = seen[node] + 1
queue.append(nb)
return -1
Distance equals number of edges on the shortest path. BFS guarantees that because it explores in order of distance.
DFS (Depth-First Search)
Follow one path as deep as it goes; backtrack. Uses a stack (recursion or explicit).
def dfs(graph, start, seen=None):
if seen is None:
seen = set()
seen.add(start)
print(start)
for nb in graph[start]:
if nb not in seen:
dfs(graph, nb, seen)
O(V + E), same as BFS.
Uses:
- Cycle detection.
- Topological sort.
- Finding strongly connected components.
- Path existence (not shortest path).
Iterative DFS
def dfs_iter(graph, start):
seen = {start}
stack = [start]
while stack:
node = stack.pop()
print(node)
for nb in graph[node]:
if nb not in seen:
seen.add(nb)
stack.append(nb)
Swap the queue for a stack and you've converted BFS to DFS. Handy when recursion depth is a concern.
Dijkstra's Algorithm
Shortest path with non-negative weights. BFS weighted by edge cost.
import heapq
def dijkstra(graph, start):
dist = {start: 0}
pq = [(0, start)] # (distance, node)
while pq:
d, node = heapq.heappop(pq)
if d > dist.get(node, float("inf")):
continue # stale entry
for nb, weight in graph[node]:
new_dist = d + weight
if new_dist < dist.get(nb, float("inf")):
dist[nb] = new_dist
heapq.heappush(pq, (new_dist, nb))
return dist
O((V + E) log V) with a binary heap. Faster variants exist (Fibonacci heap: O(E + V log V)), rarely worth the complexity.
Dijkstra fails on graphs with negative edges. Use Bellman-Ford for those (slower: O(V * E)).
Topological Sort
A linear ordering of nodes in a DAG (directed acyclic graph) such that every edge points from earlier to later. Used for build systems, dependency resolution, course prerequisites.
Kahn's Algorithm (BFS-based)
from collections import deque
def topo_sort(graph):
indegree = {node: 0 for node in graph}
for node in graph:
for nb in graph[node]:
indegree[nb] = indegree.get(nb, 0) + 1
indegree.setdefault(node, 0)
queue = deque([n for n in indegree if indegree[n] == 0])
order = []
while queue:
node = queue.popleft()
order.append(node)
for nb in graph.get(node, []):
indegree[nb] -= 1
if indegree[nb] == 0:
queue.append(nb)
if len(order) != len(indegree):
raise ValueError("cycle detected")
return order
O(V + E). Also detects cycles (if no valid order exists, the graph has a cycle).
Cycle Detection
In a directed graph, DFS with three colors: white (unvisited), gray (in the current path), black (done).
def has_cycle(graph):
WHITE, GRAY, BLACK = 0, 1, 2
color = {node: WHITE for node in graph}
def visit(node):
color[node] = GRAY
for nb in graph[node]:
if color.get(nb, WHITE) == GRAY:
return True
if color.get(nb, WHITE) == WHITE and visit(nb):
return True
color[node] = BLACK
return False
for node in graph:
if color[node] == WHITE and visit(node):
return True
return False
A gray-to-gray edge means a back edge, which means a cycle.
For undirected graphs, Union-Find is often simpler.
Union-Find (Disjoint Set Union)
Track which nodes are in the same set. Very fast when combined with path compression and union by rank.
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
while self.parent[x] != x:
self.parent[x] = self.parent[self.parent[x]] # path compression
x = self.parent[x]
return x
def union(self, a, b):
ra, rb = self.find(a), self.find(b)
if ra == rb:
return False # already in same set
if self.rank[ra] < self.rank[rb]:
ra, rb = rb, ra
self.parent[rb] = ra
if self.rank[ra] == self.rank[rb]:
self.rank[ra] += 1
return True
Find and Union are amortized nearly O(1) (technically O(α(n)) where α is the inverse Ackermann function, which is less than 5 for any practical n).
Uses:
- Detecting cycles in undirected graphs.
- Minimum spanning trees (Kruskal's).
- Grouping connected components.
- Offline connectivity queries.
Example: Connected Components
def count_components(n, edges):
uf = UnionFind(n)
for a, b in edges:
uf.union(a, b)
return len({uf.find(i) for i in range(n)})
Each union collapses two sets into one; distinct roots at the end are distinct components.
Minimum Spanning Trees
Given a weighted undirected graph, find a tree (no cycles, connected) with minimum total edge weight.
Two algorithms:
- Kruskal's: sort edges by weight, add each if it doesn't form a cycle (Union-Find). O(E log E).
- Prim's: grow a tree from a start node, adding the cheapest edge to a node not yet in the tree (priority queue). O(E log V).
Kruskal's is easier to code with Union-Find:
def kruskal(n, edges):
edges = sorted(edges, key=lambda e: e[2]) # (u, v, weight)
uf = UnionFind(n)
total = 0
mst = []
for u, v, w in edges:
if uf.union(u, v):
total += w
mst.append((u, v, w))
return total, mst
Picking the Right Algorithm
Problem Algorithm
Shortest path, unweighted BFS
Shortest path, non-negative weights Dijkstra
Shortest path, negative weights Bellman-Ford
All-pairs shortest paths Floyd-Warshall (dense) or Johnson (sparse)
Cycle detection (directed) DFS with colors
Cycle detection (undirected) Union-Find or DFS
Topological sort Kahn's (BFS) or DFS
Connected components BFS/DFS or Union-Find
Minimum spanning tree Kruskal's or Prim's
Memorize the mapping. In interviews, the hardest part is often recognizing which shape a problem has.
Common Pitfalls
Recursion depth on deep graphs. 10,000-node DFS will crash Python. Use iterative DFS for deep inputs, or raise sys.setrecursionlimit cautiously.
Forgetting to mark nodes as seen. Infinite loop on cycles.
Using Dijkstra with negative edges. Wrong answer. Use Bellman-Ford.
Building an adjacency matrix for a sparse graph. Wastes O(V^2) space; most entries are 0.
Treating undirected graphs as directed. If (u, v) is an undirected edge, add both u → v and v → u to the adjacency list.
Next Steps
Continue to 09-dynamic-programming.md to solve problems by remembering subproblems.