Searching: Finding Things Fast

This chapter covers linear search, binary search, and the surprisingly powerful "binary search on the answer" pattern.

The simplest. Walk from left to right; return when you find the target.

def linear_search(arr, target):
    for i, x in enumerate(arr):
        if x == target:
            return i
    return -1

O(n) worst case. Unavoidable when the data is unsorted and unindexed.

In Python, arr.index(target) does the same thing and raises ValueError if not found. target in arr does a linear scan too.

When Linear Search Is Fine

Don't reach for fancier algorithms when:

  • The list is tiny (under ~20 items). The constant factor beats a binary search's log.
  • The list is unsorted and you'd have to sort it to binary-search. Sorting is O(n log n); one linear scan is O(n).
  • You'll search once and move on. Building a hash set for one lookup is wasteful.

Rule of thumb: for "search once", linear is fine; for "search many times", build a hash set.

The reason data is often sorted. In a sorted array, you can find a target in O(log n).

def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

Each step halves the search range. log₂(1,000,000) is about 20 steps. At a million items, binary search is 50,000x faster than linear.

Hand-rolling binary search is bug-prone. Python has bisect:

from bisect import bisect_left, bisect_right, insort

arr = [1, 3, 5, 7, 9]

i = bisect_left(arr, 5)     # 2: leftmost position where 5 fits
j = bisect_right(arr, 5)    # 3: rightmost position after any existing 5s

# Check if value exists:
i = bisect_left(arr, 5)
found = i < len(arr) and arr[i] == 5

# Insert while maintaining sorted order:
insort(arr, 4)              # [1, 3, 4, 5, 7, 9]

Use bisect for real code. Hand-roll binary search in interviews and to understand the pattern.

The Classic Bugs

Off-by-one errors in binary search are legendary. Three questions to pin down:

  1. Is hi inclusive (len(arr) - 1) or exclusive (len(arr))?
  2. Is the condition lo < hi or lo <= hi?
  3. On miss, do you return lo, hi, or -1?

The version above uses inclusive hi, lo <= hi, and returns -1 on miss. Pick one style and memorize it. Mixing styles mid-function is where bugs live.

Binary Search on the Answer

The powerful generalization. Instead of searching for a value in an array, you binary-search over the answer space of a problem.

The idea: if you can answer "is X a valid solution?" in polynomial time, and the answer has a monotonic property (small X → true, big X → false, or vice versa), you can binary-search for the boundary.

Example: Koko Eating Bananas

Koko eats k bananas per hour. Given a list of piles and an hour limit h, find the smallest k that lets her finish within h hours.

def min_eating_speed(piles, h):
    def can_finish(k):
        hours = 0
        for p in piles:
            hours += (p + k - 1) // k    # ceiling division
        return hours <= h

    lo, hi = 1, max(piles)
    while lo < hi:
        mid = (lo + hi) // 2
        if can_finish(mid):
            hi = mid              # try smaller
        else:
            lo = mid + 1          # must go bigger
    return lo

The answer space is [1, max(piles)]. can_finish(k) is monotonic: if k works, anything bigger works. Binary-search the boundary.

Pattern: two nested loops (outer over answer, inner over input) collapse to O(n log max) when the check is O(n) and the answer space has log max possibilities.

Example: Capacity to Ship Packages

Given package weights and D days to ship all of them, find the minimum ship capacity.

Same pattern:

def ship_within_days(weights, d):
    def can_ship(capacity):
        days = 1
        load = 0
        for w in weights:
            if load + w > capacity:
                days += 1
                load = 0
            load += w
        return days <= d

    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if can_ship(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

Same structure. Different problem. Both are O(n log max).

This pattern solves problems that look like dynamic programming or greedy at first glance. If you can frame it as "smallest X such that condition(X) is true", binary search is often the clean answer.

Finding the Boundary

Variants of binary search find different boundaries.

First Occurrence of a Value

def first_occurrence(arr, target):
    lo, hi = 0, len(arr) - 1
    result = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            result = mid
            hi = mid - 1           # keep searching left
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return result

Smallest Value Greater Than Target

def first_greater(arr, target):
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(arr) else -1

Same shape, different comparisons. Often easier to use bisect_right in Python.

Search a Rotated Sorted Array

Harder. The array was sorted, then rotated. Still binary-searchable, but you have to figure out which half is sorted:

def search_rotated(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[lo] <= arr[mid]:      # left half is sorted
            if arr[lo] <= target < arr[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:                         # right half is sorted
            if arr[mid] < target <= arr[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1

Subtler invariant but same O(log n) cost.

Both answer "is target in X?". Trade-offs:

  • Hash set: O(1) average, O(n) worst. No ordering. O(n) space.
  • Sorted array + binary search: O(log n) worst. Ordering preserved. O(n) space (the array itself).

Hash for pure membership. Sorted for membership + range queries (bisect finds ranges; hash doesn't).

Common Pitfalls

Off-by-one. Pick one style of binary search and use it consistently. Mixing < with <= leads to infinite loops or missed elements.

Integer overflow. (lo + hi) // 2 can overflow in fixed-size int languages. In Python, ints are unbounded, so you're safe. In C/Java/Rust, prefer lo + (hi - lo) // 2.

Searching unsorted data. Binary search on an unsorted array returns garbage. Sort first, or use linear search.

Overkill. On a list of 5 items, linear is faster because the log's constant is bigger.

Not using bisect. Python's bisect is both faster (written in C) and less buggy than your hand-rolled version.

Next Steps

Continue to 06-sorting.md for the algorithms that arrange data in the first place.