Searching: Finding Things Fast
This chapter covers linear search, binary search, and the surprisingly powerful "binary search on the answer" pattern.
Linear Search
The simplest. Walk from left to right; return when you find the target.
def linear_search(arr, target):
for i, x in enumerate(arr):
if x == target:
return i
return -1
O(n) worst case. Unavoidable when the data is unsorted and unindexed.
In Python, arr.index(target) does the same thing and raises ValueError if not found. target in arr does a linear scan too.
When Linear Search Is Fine
Don't reach for fancier algorithms when:
- The list is tiny (under ~20 items). The constant factor beats a binary search's log.
- The list is unsorted and you'd have to sort it to binary-search. Sorting is O(n log n); one linear scan is O(n).
- You'll search once and move on. Building a hash set for one lookup is wasteful.
Rule of thumb: for "search once", linear is fine; for "search many times", build a hash set.
Binary Search
The reason data is often sorted. In a sorted array, you can find a target in O(log n).
def binary_search(arr, target):
lo, hi = 0, len(arr) - 1
while lo <= hi:
mid = (lo + hi) // 2
if arr[mid] == target:
return mid
if arr[mid] < target:
lo = mid + 1
else:
hi = mid - 1
return -1
Each step halves the search range. log₂(1,000,000) is about 20 steps. At a million items, binary search is 50,000x faster than linear.
Hand-rolling binary search is bug-prone. Python has bisect:
from bisect import bisect_left, bisect_right, insort
arr = [1, 3, 5, 7, 9]
i = bisect_left(arr, 5) # 2: leftmost position where 5 fits
j = bisect_right(arr, 5) # 3: rightmost position after any existing 5s
# Check if value exists:
i = bisect_left(arr, 5)
found = i < len(arr) and arr[i] == 5
# Insert while maintaining sorted order:
insort(arr, 4) # [1, 3, 4, 5, 7, 9]
Use bisect for real code. Hand-roll binary search in interviews and to understand the pattern.
The Classic Bugs
Off-by-one errors in binary search are legendary. Three questions to pin down:
- Is
hiinclusive (len(arr) - 1) or exclusive (len(arr))? - Is the condition
lo < hiorlo <= hi? - On miss, do you return
lo,hi, or-1?
The version above uses inclusive hi, lo <= hi, and returns -1 on miss. Pick one style and memorize it. Mixing styles mid-function is where bugs live.
Binary Search on the Answer
The powerful generalization. Instead of searching for a value in an array, you binary-search over the answer space of a problem.
The idea: if you can answer "is X a valid solution?" in polynomial time, and the answer has a monotonic property (small X → true, big X → false, or vice versa), you can binary-search for the boundary.
Example: Koko Eating Bananas
Koko eats k bananas per hour. Given a list of piles and an hour limit h, find the smallest k that lets her finish within h hours.
def min_eating_speed(piles, h):
def can_finish(k):
hours = 0
for p in piles:
hours += (p + k - 1) // k # ceiling division
return hours <= h
lo, hi = 1, max(piles)
while lo < hi:
mid = (lo + hi) // 2
if can_finish(mid):
hi = mid # try smaller
else:
lo = mid + 1 # must go bigger
return lo
The answer space is [1, max(piles)]. can_finish(k) is monotonic: if k works, anything bigger works. Binary-search the boundary.
Pattern: two nested loops (outer over answer, inner over input) collapse to O(n log max) when the check is O(n) and the answer space has log max possibilities.
Example: Capacity to Ship Packages
Given package weights and D days to ship all of them, find the minimum ship capacity.
Same pattern:
def ship_within_days(weights, d):
def can_ship(capacity):
days = 1
load = 0
for w in weights:
if load + w > capacity:
days += 1
load = 0
load += w
return days <= d
lo, hi = max(weights), sum(weights)
while lo < hi:
mid = (lo + hi) // 2
if can_ship(mid):
hi = mid
else:
lo = mid + 1
return lo
Same structure. Different problem. Both are O(n log max).
This pattern solves problems that look like dynamic programming or greedy at first glance. If you can frame it as "smallest X such that condition(X) is true", binary search is often the clean answer.
Finding the Boundary
Variants of binary search find different boundaries.
First Occurrence of a Value
def first_occurrence(arr, target):
lo, hi = 0, len(arr) - 1
result = -1
while lo <= hi:
mid = (lo + hi) // 2
if arr[mid] == target:
result = mid
hi = mid - 1 # keep searching left
elif arr[mid] < target:
lo = mid + 1
else:
hi = mid - 1
return result
Smallest Value Greater Than Target
def first_greater(arr, target):
lo, hi = 0, len(arr)
while lo < hi:
mid = (lo + hi) // 2
if arr[mid] <= target:
lo = mid + 1
else:
hi = mid
return lo if lo < len(arr) else -1
Same shape, different comparisons. Often easier to use bisect_right in Python.
Search a Rotated Sorted Array
Harder. The array was sorted, then rotated. Still binary-searchable, but you have to figure out which half is sorted:
def search_rotated(arr, target):
lo, hi = 0, len(arr) - 1
while lo <= hi:
mid = (lo + hi) // 2
if arr[mid] == target:
return mid
if arr[lo] <= arr[mid]: # left half is sorted
if arr[lo] <= target < arr[mid]:
hi = mid - 1
else:
lo = mid + 1
else: # right half is sorted
if arr[mid] < target <= arr[hi]:
lo = mid + 1
else:
hi = mid - 1
return -1
Subtler invariant but same O(log n) cost.
Hash Lookup vs Binary Search
Both answer "is target in X?". Trade-offs:
- Hash set: O(1) average, O(n) worst. No ordering. O(n) space.
- Sorted array + binary search: O(log n) worst. Ordering preserved. O(n) space (the array itself).
Hash for pure membership. Sorted for membership + range queries (bisect finds ranges; hash doesn't).
Common Pitfalls
Off-by-one. Pick one style of binary search and use it consistently. Mixing < with <= leads to infinite loops or missed elements.
Integer overflow. (lo + hi) // 2 can overflow in fixed-size int languages. In Python, ints are unbounded, so you're safe. In C/Java/Rust, prefer lo + (hi - lo) // 2.
Searching unsorted data. Binary search on an unsorted array returns garbage. Sort first, or use linear search.
Overkill. On a list of 5 items, linear is faster because the log's constant is bigger.
Not using bisect. Python's bisect is both faster (written in C) and less buggy than your hand-rolled version.
Next Steps
Continue to 06-sorting.md for the algorithms that arrange data in the first place.