Sorting Algorithms
Introduction
Sorting is arranging elements in a specific order (ascending, descending, or custom). It's one of the most studied problems in computer science, with numerous algorithms offering different tradeoffs between time complexity, space usage, and stability.
Learning Objectives
By the end of this reading, you will be able to:
- Implement fundamental sorting algorithms
- Analyze time and space complexity of sorts
- Understand stability in sorting
- Choose the right sorting algorithm for different scenarios
1. Sorting Fundamentals
What Makes a Good Sort?
- Time complexity: How fast?
- Space complexity: How much extra memory?
- Stability: Do equal elements maintain relative order?
- Adaptivity: Is it faster on nearly-sorted data?
Stability Example
Original: [(Alice, 90), (Bob, 85), (Carol, 90)]
Sort by score:
Stable: [(Bob, 85), (Alice, 90), (Carol, 90)] # Alice before Carol
Unstable: [(Bob, 85), (Carol, 90), (Alice, 90)] # Order may change
2. Simple Sorts: O(n²)
Bubble Sort
Repeatedly swap adjacent elements if out of order.
def bubble_sort(arr):
"""
Time: O(n²), Space: O(1), Stable: Yes
"""
n = len(arr)
for i in range(n):
swapped = False
for j in range(n - i - 1):
if arr[j] > arr[j + 1]:
arr[j], arr[j + 1] = arr[j + 1], arr[j]
swapped = True
if not swapped: # Optimization: already sorted
break
Characteristics:
- Simple to understand
- Very slow for large datasets
- Adaptive: O(n) if already sorted
Selection Sort
Find minimum, swap to front, repeat.
def selection_sort(arr):
"""
Time: O(n²), Space: O(1), Stable: No
"""
n = len(arr)
for i in range(n):
min_idx = i
for j in range(i + 1, n):
if arr[j] < arr[min_idx]:
min_idx = j
arr[i], arr[min_idx] = arr[min_idx], arr[i]
Characteristics:
- Always O(n²) - no early termination
- Minimum number of swaps: O(n)
- Not stable
Insertion Sort
Build sorted portion by inserting each element in correct position.
def insertion_sort(arr):
"""
Time: O(n²) worst, O(n) best, Space: O(1), Stable: Yes
"""
for i in range(1, len(arr)):
key = arr[i]
j = i - 1
while j >= 0 and arr[j] > key:
arr[j + 1] = arr[j]
j -= 1
arr[j + 1] = key
Characteristics:
- O(n) on nearly sorted data
- Good for small arrays
- Online: can sort as data arrives
- Often used as base case in hybrid sorts
3. Efficient Sorts: O(n log n)
Merge Sort
Divide array in half, sort each half, merge.
def merge_sort(arr):
"""
Time: O(n log n), Space: O(n), Stable: Yes
"""
if len(arr) <= 1:
return arr
mid = len(arr) // 2
left = merge_sort(arr[:mid])
right = merge_sort(arr[mid:])
return merge(left, right)
def merge(left, right):
result = []
i = j = 0
while i < len(left) and j < len(right):
if left[i] <= right[j]:
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
result.extend(left[i:])
result.extend(right[j:])
return result
In-place merge sort variant:
def merge_sort_inplace(arr, left=0, right=None):
if right is None:
right = len(arr) - 1
if left < right:
mid = (left + right) // 2
merge_sort_inplace(arr, left, mid)
merge_sort_inplace(arr, mid + 1, right)
merge_inplace(arr, left, mid, right)
def merge_inplace(arr, left, mid, right):
left_copy = arr[left:mid + 1]
right_copy = arr[mid + 1:right + 1]
i = j = 0
k = left
while i < len(left_copy) and j < len(right_copy):
if left_copy[i] <= right_copy[j]:
arr[k] = left_copy[i]
i += 1
else:
arr[k] = right_copy[j]
j += 1
k += 1
while i < len(left_copy):
arr[k] = left_copy[i]
i += 1
k += 1
while j < len(right_copy):
arr[k] = right_copy[j]
j += 1
k += 1
Characteristics:
- Guaranteed O(n log n)
- Stable
- Requires O(n) extra space
- Good for linked lists (O(1) space possible)
Quick Sort
Choose pivot, partition array, sort partitions.
def quick_sort(arr, low=0, high=None):
"""
Time: O(n log n) avg, O(n²) worst, Space: O(log n), Stable: No
"""
if high is None:
high = len(arr) - 1
if low < high:
pivot_idx = partition(arr, low, high)
quick_sort(arr, low, pivot_idx - 1)
quick_sort(arr, pivot_idx + 1, high)
def partition(arr, low, high):
"""Lomuto partition scheme"""
pivot = arr[high]
i = low - 1
for j in range(low, high):
if arr[j] <= pivot:
i += 1
arr[i], arr[j] = arr[j], arr[i]
arr[i + 1], arr[high] = arr[high], arr[i + 1]
return i + 1
Randomized Quick Sort:
import random
def quick_sort_random(arr, low=0, high=None):
if high is None:
high = len(arr) - 1
if low < high:
# Random pivot selection
pivot_idx = random.randint(low, high)
arr[pivot_idx], arr[high] = arr[high], arr[pivot_idx]
pivot_idx = partition(arr, low, high)
quick_sort_random(arr, low, pivot_idx - 1)
quick_sort_random(arr, pivot_idx + 1, high)
Characteristics:
- O(n log n) average, O(n²) worst (rare with random pivot)
- In-place (O(log n) stack space)
- Not stable
- Often fastest in practice
Heap Sort
Build heap, repeatedly extract max.
def heap_sort(arr):
"""
Time: O(n log n), Space: O(1), Stable: No
"""
n = len(arr)
# Build max-heap
for i in range(n // 2 - 1, -1, -1):
heapify(arr, n, i)
# Extract elements
for i in range(n - 1, 0, -1):
arr[0], arr[i] = arr[i], arr[0]
heapify(arr, i, 0)
def heapify(arr, n, i):
largest = i
left = 2 * i + 1
right = 2 * i + 2
if left < n and arr[left] > arr[largest]:
largest = left
if right < n and arr[right] > arr[largest]:
largest = right
if largest != i:
arr[i], arr[largest] = arr[largest], arr[i]
heapify(arr, n, largest)
Characteristics:
- Guaranteed O(n log n)
- In-place
- Not stable
- Poor cache performance
4. Linear Time Sorts: O(n)
These achieve O(n) by not comparing elements.
Counting Sort
Count occurrences, reconstruct array.
def counting_sort(arr, max_val=None):
"""
Time: O(n + k), Space: O(k), Stable: Yes
k = range of values
"""
if max_val is None:
max_val = max(arr)
count = [0] * (max_val + 1)
for num in arr:
count[num] += 1
result = []
for i, c in enumerate(count):
result.extend([i] * c)
return result
For objects (stable):
def counting_sort_stable(arr, key=lambda x: x):
"""Stable counting sort for objects"""
max_key = max(key(x) for x in arr)
count = [0] * (max_key + 1)
for item in arr:
count[key(item)] += 1
# Cumulative count
for i in range(1, len(count)):
count[i] += count[i - 1]
result = [None] * len(arr)
for item in reversed(arr):
k = key(item)
count[k] -= 1
result[count[k]] = item
return result
Characteristics:
- Great when k (range) is small
- Not comparison-based
- Stable
Radix Sort
Sort by each digit, from least to most significant.
def radix_sort(arr):
"""
Time: O(d * (n + b)), Space: O(n + b)
d = number of digits, b = base
"""
if not arr:
return arr
max_val = max(arr)
exp = 1
while max_val // exp > 0:
counting_sort_by_digit(arr, exp)
exp *= 10
def counting_sort_by_digit(arr, exp):
n = len(arr)
output = [0] * n
count = [0] * 10
for num in arr:
digit = (num // exp) % 10
count[digit] += 1
for i in range(1, 10):
count[i] += count[i - 1]
for i in range(n - 1, -1, -1):
digit = (arr[i] // exp) % 10
count[digit] -= 1
output[count[digit]] = arr[i]
for i in range(n):
arr[i] = output[i]
Characteristics:
- Good for fixed-length integers or strings
- O(d * n) where d is digits/length
Bucket Sort
Distribute elements into buckets, sort buckets.
def bucket_sort(arr):
"""
Time: O(n + k) average, Space: O(n + k)
Best for uniformly distributed data in [0, 1)
"""
if not arr:
return arr
n = len(arr)
buckets = [[] for _ in range(n)]
# Distribute into buckets
for num in arr:
bucket_idx = int(n * num) # Assumes num in [0, 1)
buckets[bucket_idx].append(num)
# Sort each bucket
for bucket in buckets:
bucket.sort() # Or use insertion sort
# Concatenate
result = []
for bucket in buckets:
result.extend(bucket)
return result
5. Special Purpose Sorts
Tim Sort (Python's Default)
Hybrid of merge sort and insertion sort.
# Python uses Tim Sort internally
arr.sort() # In-place
sorted_arr = sorted(arr) # Returns new list
# Custom key
arr.sort(key=lambda x: x.lower())
arr.sort(key=len)
# Reverse
arr.sort(reverse=True)
# Multiple criteria
students.sort(key=lambda s: (s.grade, s.name))
Partial Sorts
When you only need k smallest/largest elements.
import heapq
# k smallest - O(n log k)
k_smallest = heapq.nsmallest(k, arr)
# k largest - O(n log k)
k_largest = heapq.nlargest(k, arr)
# Quick select for kth element - O(n) average
def quick_select(arr, k):
if len(arr) == 1:
return arr[0]
pivot = arr[len(arr) // 2]
left = [x for x in arr if x < pivot]
mid = [x for x in arr if x == pivot]
right = [x for x in arr if x > pivot]
if k < len(left):
return quick_select(left, k)
elif k < len(left) + len(mid):
return pivot
else:
return quick_select(right, k - len(left) - len(mid))
6. Sorting Algorithm Comparison
| Algorithm | Time (Best) | Time (Avg) | Time (Worst) | Space | Stable |
|---|---|---|---|---|---|
| Bubble | O(n) | O(n²) | O(n²) | O(1) | Yes |
| Selection | O(n²) | O(n²) | O(n²) | O(1) | No |
| Insertion | O(n) | O(n²) | O(n²) | O(1) | Yes |
| Merge | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
| Quick | O(n log n) | O(n log n) | O(n²) | O(log n) | No |
| Heap | O(n log n) | O(n log n) | O(n log n) | O(1) | No |
| Counting | O(n+k) | O(n+k) | O(n+k) | O(k) | Yes |
| Radix | O(d·n) | O(d·n) | O(d·n) | O(n+k) | Yes |
7. Choosing a Sorting Algorithm
Small Arrays (n < 50)
Use insertion sort - low overhead, good cache performance.
General Purpose
Use quick sort (or language default like Tim Sort).
Guaranteed O(n log n)
Use merge sort or heap sort.
Need Stability
Use merge sort or Tim Sort.
Known Range of Integers
Use counting sort or radix sort.
Nearly Sorted Data
Use insertion sort or Tim Sort (adaptive).
Memory Constrained
Use heap sort (in-place, O(1) extra).
Linked Lists
Use merge sort (O(1) extra space possible).
8. Stability Deep Dive
Why stability matters:
# Sorting by multiple criteria
students = [
('Alice', 'Math', 90),
('Bob', 'Math', 85),
('Alice', 'English', 88),
('Bob', 'English', 92)
]
# First sort by name, then by subject
# With stable sort, we can do:
students.sort(key=lambda x: x[0]) # By name
students.sort(key=lambda x: x[1]) # By subject
# Results are sorted by subject, with names in order within each subject
Exercises
Basic
Implement insertion sort and count the number of swaps for a given array.
Sort an array of 0s, 1s, and 2s in O(n) time (Dutch National Flag problem).
Modify bubble sort to sort in descending order.
Intermediate
Implement merge sort for a linked list.
Find the kth largest element using quick select.
Sort an array of strings by length, then alphabetically for same length.
Advanced
Implement Tim Sort (insertion sort for small runs, merge sort to combine).
Sort an array when each element is at most k positions from its sorted position.
External merge sort: sort a file that doesn't fit in memory.
Summary
- O(n²) sorts: bubble, selection, insertion - simple but slow
- O(n log n) sorts: merge, quick, heap - efficient comparison-based
- O(n) sorts: counting, radix, bucket - non-comparison, special cases
- Consider: time, space, stability, adaptivity, practical performance
- Python's Tim Sort handles most cases well