Big-O: Measuring What Your Code Costs
This chapter covers Big-O notation, common complexity classes, and the thinking that turns "my code is slow" into a concrete plan.
Why Big-O
You have two versions of an algorithm. Which is faster?
Timing them at n=100 is almost useless. Both feel instant. But what happens at n=1,000,000? The answer isn't obvious from the source code alone. You need a way to reason about cost that doesn't require running the code.
Big-O is that way. It describes how work grows as input grows, ignoring constant factors and smaller terms.
The Idea
If an algorithm does 3n + 5 steps on input of size n, we say it's O(n). The 3 and the 5 don't matter at scale. At n = 1,000,000, "3n + 5" and "n" are essentially the same; at n = 10, they differ by a factor of 3. We care about the shape of growth, not exact counts.
This drops:
- Constant factors. 3n → n.
- Lower-order terms. n² + n → n².
- Additive constants. n + 100 → n.
What remains is the dominant term, which determines asymptotic behavior.
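To see why the constants vanish, compare a hypothetical exact step count of 3n + 5 against plain n (the function name here is ours, for illustration):

```python
def steps_3n_plus_5(n: int) -> int:
    """Hypothetical exact step count for an O(n) algorithm."""
    return 3 * n + 5

# The ratio to plain n settles toward the constant 3 as n grows;
# Big-O deliberately throws that constant away.
for n in (10, 1_000, 1_000_000):
    print(n, steps_3n_plus_5(n) / n)
```

At n = 10 the ratio is 3.5; by n = 1,000,000 it is essentially 3. The shape of growth is identical either way.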
Common Classes
From best to worst:
| Class | Name | Typical examples |
|---|---|---|
| O(1) | constant | hash lookup, array index, stack push |
| O(log n) | logarithmic | binary search, balanced tree operations |
| O(n) | linear | scan an array, most single loops |
| O(n log n) | log-linear | efficient sorts (merge, heap, Timsort) |
| O(n^2) | quadratic | nested loops, bubble sort, naive solutions |
| O(n^3) | cubic | triple nested loops, naive matrix multiply |
| O(2^n) | exponential | brute-force subsets, naive recursion |
| O(n!) | factorial | permutations, brute-force TSP |
Rough intuition for "will this finish in a second":
| Class | n = 10 | n = 1,000 | n = 1,000,000 |
|---|---|---|---|
| O(log n) | instant | instant | instant |
| O(n) | instant | instant | ~0.1 s |
| O(n log n) | instant | instant | ~2 s |
| O(n^2) | instant | ~0.1 s | days |
| O(n^3) | instant | ~2 min | millennia |
| O(2^n) | instant | forever | forever |
"Instant" here means "under a few milliseconds in Python", assuming roughly 10 million simple operations per second. The point isn't the exact times; it's the gap between classes at scale.
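You don't need a stopwatch to feel the gap; counting iterations directly makes it concrete (a sketch; the helper names are ours):

```python
def linear_ops(n: int) -> int:
    """Count the iterations of a single O(n) pass."""
    ops = 0
    for _ in range(n):
        ops += 1
    return ops

def quadratic_ops(n: int) -> int:
    """Count the iterations of two nested loops: O(n^2)."""
    ops = 0
    for _ in range(n):
        for _ in range(n):
            ops += 1
    return ops

for n in (10, 100, 1_000):
    print(n, linear_ops(n), quadratic_ops(n))
```

At n = 1,000 the linear version does 1,000 iterations and the quadratic one does 1,000,000: the same factor-of-n gap the table shows in seconds.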
Reading Complexity from Code
The quick rules.
Single Loop over n
```python
for item in items:
    do_something(item)
```
One pass: O(n).
Nested Loops over n
```python
for a in items:
    for b in items:
        pair(a, b)
```
n × n = O(n^2).
Halving Each Iteration
```python
while n > 0:
    n = n // 2
```
Halving log₂(n) times: O(log n).
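To check the log intuition, count the iterations. For n ≥ 1 the count is floor(log₂ n) + 1, which Python happens to expose as n.bit_length() (the helper function is ours):

```python
def halvings(n: int) -> int:
    """Count how many times n can be halved before reaching 0."""
    count = 0
    while n > 0:
        n //= 2
        count += 1
    return count

print(halvings(8))          # 4: the loop sees 8 -> 4 -> 2 -> 1 -> 0
print(halvings(1_000_000))  # 20, not 1,000,000
```

A million-element input costs 20 halvings. That is why logarithmic algorithms feel free at any practical scale.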
Loop × Log
```python
for item in items:
    # some O(log n) work per item
    binary_search(other, item)
```
n × log n = O(n log n).
Multiple Inputs
```python
for a in items_a:      # size m
    for b in items_b:  # size n
        pair(a, b)
```
O(m × n). Don't collapse unrelated inputs into one n.
Recursion
Apply the recurrence. Merge sort splits into 2 halves (log n levels) and does O(n) work per level: O(n log n). Classic Fibonacci (naive) branches twice per call: O(2^n).
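As a sketch of the Fibonacci case, here is the naive version next to a memoized one. Caching each result collapses the O(2^n) call tree to O(n) calls (the memoized variant is a preview of a later technique, not part of the recurrence rule itself):

```python
from functools import lru_cache

def fib_naive(n: int) -> int:
    """Branches twice per call: roughly O(2^n) total calls."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Each value computed once, then cached: O(n) total calls."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)
```

fib_naive(35) already takes seconds in Python; fib_memo(500) is instant.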
Space Complexity
Same idea, different axis. Counts extra memory used (beyond the input).
```python
def reverse(s: str) -> str:
    return s[::-1]  # O(n) extra memory for the result

def reverse_in_place(arr: list) -> None:
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        arr[lo], arr[hi] = arr[hi], arr[lo]
        lo, hi = lo + 1, hi - 1
    # O(1) extra memory (swaps don't allocate)
```
Recursion has hidden space cost: the call stack. Deeply recursive code uses O(depth) extra memory. Python's default recursion limit is ~1000.
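A sketch of that hidden cost: the recursive version below spends one stack frame per level and hits Python's limit long before the iterative one breaks a sweat (both functions are ours, for illustration):

```python
import sys

def count_down_recursive(n: int) -> int:
    """O(n) stack frames: one per call."""
    if n == 0:
        return 0
    return 1 + count_down_recursive(n - 1)

def count_down_iterative(n: int) -> int:
    """Same answer with O(1) extra memory: no stack growth."""
    steps = 0
    while n > 0:
        n -= 1
        steps += 1
    return steps

print(sys.getrecursionlimit())        # typically 1000 in CPython
print(count_down_iterative(100_000))  # fine
try:
    count_down_recursive(100_000)     # far past the frame limit
except RecursionError:
    print("RecursionError")
```

When depth can reach the tens of thousands, convert the recursion to a loop or an explicit stack.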
Worst, Average, Best Case
Unless stated otherwise, Big-O describes the worst case. Some algorithms are commonly reported by their average case as well:
- Quicksort: worst O(n^2), average O(n log n)
- Hash lookup: worst O(n) (all collisions), average O(1)
- Binary search: worst O(log n), average O(log n)
In practice: cite the worst case unless you're specifically discussing typical behavior.
Amortized Complexity
Some operations are usually cheap but occasionally expensive, and the expensive ones are rare enough that the average per operation is cheap.
Python list append is O(1) amortized. Most appends fit in pre-allocated space; occasionally the list doubles its capacity (O(n) work). Over many appends, the average is O(1).
Amortized means "over a sequence of operations, each costs X on average". Useful for: dynamic arrays, union-find with path compression, hash table resizing.
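You can watch the occasional reallocation happen from the outside by tracking the list's allocated size as you append (a sketch; exact resize counts and growth factors are CPython implementation details):

```python
import sys

data = []
resizes = 0
last_size = sys.getsizeof(data)
for _ in range(10_000):
    data.append(0)
    size = sys.getsizeof(data)
    if size != last_size:  # the list reallocated: rare O(n) work
        resizes += 1
        last_size = size

# 10,000 appends trigger far fewer than 10,000 reallocations.
print(f"{resizes} resizes across {len(data)} appends")
```

Most appends change nothing; a small number of them pay for a reallocation, and the average stays O(1).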
Big-O, Big-Theta, Big-Omega
Formally:
- O(f): upper bound (grows no faster than f).
- Θ(f): tight bound (grows exactly like f, both upper and lower).
- Ω(f): lower bound (grows at least as fast as f).
In interviews and daily engineering, people say "O" and mean Θ most of the time. Don't correct them.
Practical Advice
Estimate Before Writing
Before you code, answer: what's the complexity? If it's too high, redesign now, not after a failed deploy.
Measure, Don't Trust
Big-O hides constants. For specific hot code, benchmark both versions. Sometimes O(n^2) with a tiny constant beats O(n log n) with a huge constant, for small n.
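A benchmark sketch of constants deciding the winner: for an 8-element collection, a plain linear scan beats paying set-construction overhead on every call, even though the set lookup itself is O(1) (the names are ours; exact timings vary by machine):

```python
import timeit

small = list(range(8))

def scan(x: int) -> bool:
    return x in small        # O(n) scan, tiny constant

def rebuild_set(x: int) -> bool:
    return x in set(small)   # O(1) lookup, but O(n) set build per call

t_scan = timeit.timeit(lambda: scan(7), number=50_000)
t_set = timeit.timeit(lambda: rebuild_set(7), number=50_000)
print(f"scan: {t_scan:.4f}s  rebuild-set: {t_set:.4f}s")
```

The lesson generalizes: for small or fixed-size inputs, measure before reaching for the asymptotically better structure.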
Python-Specific
Python is slow compared to C, so translate "how many operations" into "seconds" with caution. As a rough rule, Python executes about 10 million simple operations per second. An O(n^2) solution at n = 10,000 does 100 million operations, so about 10 seconds. At n = 100,000, about 1,000 seconds. The gap between an "interview solution" and a "production solution" is often a factor of 1,000 or more in raw operation count.
Dominant Term Rules
O(n + log n) = O(n). Drop the smaller terms. O(2n) = O(n). Drop constants. O(n * 500) = O(n). Constants never matter for the class.
An Example, Walked Through
What's the complexity?
```python
def has_duplicate(nums: list[int]) -> bool:
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                return True
    return False
```
Nested loops, both over n: O(n^2). Space: O(1) (no extra data structures).
Same problem, faster:
```python
def has_duplicate(nums: list[int]) -> bool:
    seen = set()
    for x in nums:
        if x in seen:
            return True
        seen.add(x)
    return False
```
One loop, O(1) set operations: O(n) time, O(n) space.
The trade: extra O(n) space buys O(n) instead of O(n^2) time. For n = 10,000, the first version does up to n(n-1)/2 ≈ 50 million comparisons in the worst case; the second does 10,000 set operations. The space cost is a hash set of 10,000 ints, which is fine.
This trade (time for space) is one of the main levers in algorithm design. Chapters 3 through 11 are variations on pulling that lever.
Common Pitfalls
Counting steps too precisely. Big-O is about growth, not exact counts. 3n and 5n are both O(n).
Forgetting hidden loops. x in list is O(n). sorted(list) is O(n log n). Don't treat library calls as O(1).
Ignoring the dominant term. O(n^2) + O(n) is O(n^2). The linear work is invisible at scale.
Mixing up time and space. An algorithm can be O(n) time and O(n^2) space, or vice versa. Know which you're reporting.
Assuming worst case is rare. In adversarial settings (user inputs), the worst case is what you'll see. Design for it.
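The hidden-loop pitfall is easy to measure: the same membership test against a list and a set (the timings are illustrative; the shape of the gap is the point):

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)

# x in list scans every element: O(n). x in set hashes: O(1) average.
t_list = timeit.timeit(lambda: (n - 1) in as_list, number=100)
t_set = timeit.timeit(lambda: (n - 1) in as_set, number=100)
print(f"list membership: {t_list:.4f}s  set membership: {t_set:.6f}s")
```

One innocent-looking `in` inside a loop over n elements silently turns O(n) code into O(n^2).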
Next Steps
Continue to 03-arrays-and-hashing.md for the everyday data structures.