Algorithm Analysis and Big O
Introduction
Algorithm analysis lets us compare algorithms objectively, independent of hardware or implementation details. Big O notation provides a language for discussing efficiency that every programmer should understand.
Learning Objectives
By the end of this reading, you will be able to:
- Analyze time and space complexity
- Use Big O, Big Omega, and Big Theta notation
- Identify common complexity classes
- Analyze code to determine its complexity
- Make informed decisions about algorithm tradeoffs
1. Why Analyze Algorithms?
The Problem with Timing
```python
import time

start = time.time()
result = my_algorithm(data)
end = time.time()
print(f"Took {end - start} seconds")
```
Problems:
- Results vary by machine, language, load
- Can't compare algorithms on different inputs
- Doesn't predict behavior on larger inputs
What We Really Want
- Machine-independent comparison
- Growth rate as input size increases
- Worst-case guarantees (usually)
2. Counting Operations
Instead of time, count fundamental operations.
```python
def sum_array(arr):
    total = 0         # 1 assignment
    for x in arr:     # n iterations
        total += x    # n additions + n assignments
    return total      # 1 return

# Total: 1 + 2n + 1 = 2n + 2 operations
```
But we don't care about exact counts. We care about growth rate.
3. Big O Notation
Big O describes an upper bound on growth rate.
Definition: f(n) = O(g(n)) means there exist constants c > 0 and n₀ such that: f(n) ≤ c · g(n) for all n ≥ n₀
In plain English: f grows no faster than g (asymptotically).
Examples
- 2n + 2 = O(n) (ignore constants and lower-order terms)
- 3n² + 5n + 10 = O(n²)
- n + log n = O(n)
- 5 = O(1)
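The definition can be sanity-checked numerically. The snippet below is a check, not a proof: for f(n) = 2n + 2 it picks c = 3 and n₀ = 2, since 2n + 2 ≤ 3n whenever n ≥ 2.

```python
def f(n):
    return 2 * n + 2

# Verify f(n) <= c * g(n) for all tested n >= n0, with g(n) = n
c, n0 = 3, 2
print(all(f(n) <= c * n for n in range(n0, 10_000)))  # True
```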
Simplification Rules
- Drop constants: O(2n) → O(n)
- Drop lower-order terms: O(n² + n) → O(n²)
- Focus on dominant term: O(n + log n) → O(n)
4. Common Complexity Classes
Listed from fastest to slowest growth:
| Notation | Name | Example |
|---|---|---|
| O(1) | Constant | Array access |
| O(log n) | Logarithmic | Binary search |
| O(n) | Linear | Linear search |
| O(n log n) | Linearithmic | Merge sort |
| O(n²) | Quadratic | Bubble sort |
| O(n³) | Cubic | Naive matrix multiply |
| O(2ⁿ) | Exponential | Subset generation |
| O(n!) | Factorial | Permutation generation |
Growth Comparison
For n = 1,000,000:
| Complexity | Operations |
|---|---|
| O(1) | 1 |
| O(log n) | ~20 |
| O(n) | 1,000,000 |
| O(n log n) | ~20,000,000 |
| O(n²) | 1,000,000,000,000 |
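The table's values can be reproduced directly with Python's `math` module:

```python
import math

n = 1_000_000
print(f"O(log n):   {math.log2(n):.0f}")       # ~20
print(f"O(n):       {n:,}")                    # 1,000,000
print(f"O(n log n): {n * math.log2(n):,.0f}")  # ~20 million
print(f"O(n^2):     {n * n:,}")                # one trillion
```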
5. Analyzing Code
Constant Time: O(1)
```python
def get_first(arr):
    return arr[0]  # O(1)

def swap(arr, i, j):
    arr[i], arr[j] = arr[j], arr[i]  # O(1)
```
Linear Time: O(n)
```python
def find_max(arr):
    max_val = arr[0]
    for x in arr:  # O(n)
        if x > max_val:
            max_val = x
    return max_val
```
Quadratic Time: O(n²)
```python
def has_duplicates(arr):
    n = len(arr)
    for i in range(n):             # O(n)
        for j in range(i + 1, n):  # O(n)
            if arr[i] == arr[j]:
                return True
    return False

# Total: O(n²)
```
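For contrast, a common alternative (not part of the example above) trades O(n) extra space for O(n) expected time by tracking seen values in a set:

```python
def has_duplicates_fast(arr):
    """O(n) expected time, O(n) extra space."""
    seen = set()
    for x in arr:
        if x in seen:  # O(1) average-case membership test
            return True
        seen.add(x)
    return False
```

This is the kind of time/space tradeoff discussed in section 12.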
Logarithmic Time: O(log n)
```python
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:  # O(log n) iterations
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
```
Why log n? Each iteration halves the search space: n → n/2 → n/4 → ... → 1
Number of halvings: log₂(n)
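The halving count is easy to verify empirically with a small helper:

```python
def count_halvings(n):
    """How many times can n be halved before reaching 1?"""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps

print(count_halvings(1024))       # 10, since 2**10 == 1024
print(count_halvings(1_000_000))  # 19, i.e. floor(log2(1,000,000))
```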
Linearithmic Time: O(n log n)
```python
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])   # T(n/2)
    right = merge_sort(arr[mid:])  # T(n/2)
    return merge(left, right)      # O(n)

# T(n) = 2T(n/2) + O(n) = O(n log n)
```
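The `merge` helper isn't shown above; one standard linear-time version looks like this:

```python
def merge(left, right):
    """Merge two sorted lists in O(len(left) + len(right)) time."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])   # at most one of these two
    result.extend(right[j:])  # still has elements
    return result
```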
Exponential Time: O(2ⁿ)
```python
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)  # Two recursive calls

# Each call branches into two: O(2ⁿ)
```
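Counting the calls makes the blow-up concrete. (The tight bound is actually O(φⁿ) ≈ O(1.618ⁿ), since the two branches are unequal, but O(2ⁿ) is a valid upper bound.) The instrumented variant below is illustrative:

```python
def fib_calls(n):
    """Return (fib(n), total number of calls made)."""
    if n <= 1:
        return n, 1
    a, calls_a = fib_calls(n - 1)
    b, calls_b = fib_calls(n - 2)
    return a + b, calls_a + calls_b + 1

for n in (10, 20, 30):
    _, calls = fib_calls(n)
    print(n, calls)  # 10 → 177, 20 → 21891, 30 → 2692537
```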
6. Analyzing Loops
Sequential Loops: Add Complexities
```python
for i in range(n):  # O(n)
    process(i)

for i in range(n):  # O(n)
    process(i)

# Total: O(n) + O(n) = O(n)
```
Nested Loops: Multiply Complexities
```python
for i in range(n):      # O(n)
    for j in range(n):  # O(n)
        process(i, j)

# Total: O(n) × O(n) = O(n²)
```
Dependent Nested Loops
```python
for i in range(n):
    for j in range(i):  # j goes from 0 to i-1
        process(i, j)

# Iterations: 0 + 1 + 2 + ... + (n-1) = n(n-1)/2 = O(n²)
```
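A quick empirical check confirms the n(n-1)/2 count:

```python
def triangle_iterations(n):
    """Count iterations of the dependent nested loop above."""
    count = 0
    for i in range(n):
        for j in range(i):
            count += 1
    return count

n = 100
print(triangle_iterations(n), n * (n - 1) // 2)  # both 4950
```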
Loops with Different Bounds
```python
for i in range(n):      # O(n)
    for j in range(m):  # O(m)
        process(i, j)

# Total: O(n × m)
```
Logarithmic Loops
```python
i = n
while i > 0:
    process(i)
    i = i // 2  # Halving

# Iterations: log₂(n) = O(log n)
```
7. Space Complexity
Memory usage as a function of input size.
Examples
```python
# O(1) space - constant extra memory
def sum_array(arr):
    total = 0
    for x in arr:
        total += x
    return total

# O(n) space - linear extra memory
def double_array(arr):
    result = []
    for x in arr:
        result.append(x * 2)
    return result

# O(n) space - recursion uses the call stack
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)  # n stack frames
```
In-Place Algorithms
Use O(1) extra space (modifying input is allowed):
```python
def reverse_in_place(arr):
    left, right = 0, len(arr) - 1
    while left < right:
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1

# O(1) extra space
```
8. Other Asymptotic Notations
Big Omega (Ω) - Lower Bound
f(n) = Ω(g(n)) means f grows at least as fast as g.
Example: Any comparison-based sort requires Ω(n log n) comparisons in the worst case
Big Theta (Θ) - Tight Bound
f(n) = Θ(g(n)) means f grows at the same rate as g.
Example: Merge sort is Θ(n log n) - both upper and lower bound
Relationships
- f(n) = Θ(g(n)) ⟺ f(n) = O(g(n)) AND f(n) = Ω(g(n))
- Big O is most commonly used (worst-case focus)
9. Best, Average, Worst Case
Quick Sort Example
```python
def quicksort(arr):
    # Best case: pivot always splits evenly
    #   T(n) = 2T(n/2) + O(n) = O(n log n)
    # Worst case: pivot always smallest/largest
    #   T(n) = T(n-1) + O(n) = O(n²)
    # Average case: random pivot
    #   O(n log n)
    pass
```
Linear Search Example
- Best case: O(1) - target is the first element
- Worst case: O(n) - target is last or not present
- Average case: O(n) - about n/2 elements are examined when the target is present, which is still linear
We typically report worst case unless specified otherwise.
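A comparison-counting linear search (illustrative instrumentation, not a standard library routine) makes the three cases visible:

```python
def linear_search_count(arr, target):
    """Return (index, comparisons); index is -1 if absent."""
    comparisons = 0
    for i, x in enumerate(arr):
        comparisons += 1
        if x == target:
            return i, comparisons
    return -1, comparisons

data = list(range(1000))
print(linear_search_count(data, 0))    # (0, 1)      best case
print(linear_search_count(data, 999))  # (999, 1000) worst case, present
print(linear_search_count(data, -5))   # (-1, 1000)  worst case, absent
```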
10. Amortized Analysis
Average time per operation over a sequence of operations.
Dynamic Array Example
```python
# Individual append: O(n) worst case (resize)
# But most appends are O(1)

# Amortized analysis:
#   n appends cost: n × O(1) + sum of resize costs
#   Resize costs: 1 + 2 + 4 + ... + n/2 ≈ n
#   Total: O(n) + O(n) = O(n)
#   Amortized per append: O(n) / n = O(1)
```
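The argument can be simulated. The model below (starting capacity 1, doubling when full; a simplification of real dynamic arrays) counts element copies caused by resizes:

```python
def append_cost(n_appends):
    """Total element copies across n appends to a doubling array."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n_appends):
        if size == capacity:   # full: double capacity, copy everything
            copies += size
            capacity *= 2
        size += 1
    return copies

n = 1 << 20  # about a million appends
print(append_cost(n) / n)  # just under 1 copy per append on average
```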
11. Common Patterns
Pattern: Look at Growth
```python
# O(1): Fixed number of operations
x = arr[5]

# O(log n): Halving/doubling each step
while n > 0:
    n //= 2

# O(n): Single pass through data
for x in arr:
    process(x)

# O(n log n): Divide and conquer with linear merge
merge_sort(arr)

# O(n²): Nested iteration over data
for i in range(n):
    for j in range(n):
        process(i, j)

# O(2ⁿ): Generate all subsets
for subset in all_subsets(arr):
    process(subset)
```
Pattern: Recursion
```python
# T(n) = T(n-1) + O(1) → O(n)
def countdown(n):
    if n == 0:
        return
    countdown(n - 1)

# T(n) = T(n-1) + O(n) → O(n²)
def remove_each(arr):
    if len(arr) == 0:
        return
    process_all(arr)      # O(n)
    remove_each(arr[1:])

# T(n) = 2T(n/2) + O(n) → O(n log n)
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

# T(n) = 2T(n/2) + O(1) → O(n)
def count_nodes(tree):
    if tree is None:
        return 0
    return 1 + count_nodes(tree.left) + count_nodes(tree.right)
```
12. Practical Considerations
Real-World Factors
Big O ignores:
- Constants: 100n vs 2n both O(n), but 100n is slower
- Cache effects: Array access is faster than linked list (locality)
- Small inputs: O(n²) might beat O(n log n) for small n
- Hidden costs: Memory allocation, garbage collection
Rules of Thumb
| Complexity | Max n for ~1 second |
|---|---|
| O(n!) | ~10 |
| O(2ⁿ) | ~20-25 |
| O(n³) | ~500 |
| O(n²) | ~10,000 |
| O(n log n) | ~10,000,000 |
| O(n) | ~100,000,000 |
| O(log n) | Any |
Exercises
Basic
- What is the time complexity?
```python
for i in range(n):
    for j in range(10):
        print(i, j)
```
- What is the time complexity?
```python
i = n
while i > 0:
    print(i)
    i = i // 3
```
- What is the space complexity?
```python
def mystery(n):
    if n <= 0:
        return 0
    return mystery(n - 1) + n
```
Intermediate
- Analyze time and space:
```python
def process(arr):
    n = len(arr)
    result = []
    for i in range(n):
        for j in range(i, n):
            result.append(arr[i] + arr[j])
    return result
```
- Prove: 3n² + 5n + 10 = O(n²)
- What's the complexity of this recursive function?
```python
def f(n):
    if n <= 1:
        return 1
    return f(n // 2) + f(n // 2)
```
Advanced
- Analyze the amortized complexity of n operations on a stack that doubles its capacity when full.
- Given a function with complexity O(n² log n), how much longer will it take if we double the input size?
- Design an algorithm to find if an array has two elements that sum to zero. Analyze solutions with different time/space tradeoffs.
Summary
- Big O describes growth rate, ignoring constants and lower terms
- Common classes: O(1) < O(log n) < O(n) < O(n log n) < O(n²) < O(2ⁿ)
- Nested loops multiply; sequential loops add
- Space complexity counts extra memory used
- Amortized analysis averages over sequences of operations
- Real-world performance depends on constants, cache, and actual input