Probability Fundamentals
Introduction
Probability theory is the mathematical framework for reasoning about uncertainty. In computer science, probability appears everywhere: analyzing randomized algorithms, machine learning, network protocols, cryptography, and data analysis. Understanding probability is essential for any serious computer scientist.
This reading covers the foundations of probability: sample spaces, events, probability axioms, conditional probability, and Bayes' theorem.
Learning Objectives
By the end of this reading, you will be able to:
- Define sample spaces and events formally
- Apply probability axioms to compute probabilities
- Use counting techniques for probability calculations
- Understand and apply conditional probability
- Apply Bayes' theorem to update beliefs with evidence
- Recognize independence and compute joint probabilities
1. Sample Spaces and Events
Sample Space
The sample space (denoted Ω or S) is the set of all possible outcomes of a random experiment.
Examples:
Coin flip: Ω = {H, T}
Die roll: Ω = {1, 2, 3, 4, 5, 6}
Two coin flips: Ω = {HH, HT, TH, TT}
Two die rolls: Ω = {(1,1), (1,2), ..., (6,6)} (36 outcomes)
Events
An event is a subset of the sample space - a collection of outcomes we're interested in.
Die roll where result is even: E = {2, 4, 6}
Die roll where result > 4: F = {5, 6}
At least one head in two flips: G = {HH, HT, TH}
Event Operations
Events can be combined using set operations:
| Operation | Notation | Meaning |
|---|---|---|
| Union | A ∪ B | A or B (or both) occurs |
| Intersection | A ∩ B | Both A and B occur |
| Complement | A' or Ā | A does not occur |
| Difference | A - B | A occurs but not B |
Example:
Ω = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3} (result ≤ 3)
B = {2, 4, 6} (result is even)
A ∪ B = {1, 2, 3, 4, 6}
A ∩ B = {2}
A' = {4, 5, 6}
A - B = {1, 3}
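These operations map directly onto Python's built-in set type; a quick illustration using the example above:

```python
# Sample space and events from the example above
omega = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}  # result <= 3
B = {2, 4, 6}  # result is even

print(sorted(A | B))      # union A ∪ B: [1, 2, 3, 4, 6]
print(sorted(A & B))      # intersection A ∩ B: [2]
print(sorted(omega - A))  # complement A': [4, 5, 6]
print(sorted(A - B))      # difference A - B: [1, 3]
```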
Mutually Exclusive Events
Events A and B are mutually exclusive (disjoint) if they cannot both occur:
A ∩ B = ∅
Example: "Die shows 1" and "Die shows 6" are mutually exclusive.
2. Probability Axioms
Kolmogorov Axioms
Probability is a function P that assigns a number to each event, satisfying:
- Non-negativity: P(A) ≥ 0 for all events A
- Normalization: P(Ω) = 1
- Additivity: for mutually exclusive (disjoint) events A and B, P(A ∪ B) = P(A) + P(B); this extends to any countable collection of pairwise disjoint events
Consequences of the Axioms
From these axioms, we can derive:
- P(∅) = 0 (the empty event has probability 0)
- P(A') = 1 - P(A) (complement rule)
- P(A ∪ B) = P(A) + P(B) - P(A ∩ B) (inclusion-exclusion)
- If A ⊆ B, then P(A) ≤ P(B) (monotonicity)
- 0 ≤ P(A) ≤ 1 (probability bounds)
Computing Probabilities
For finite sample spaces with equally likely outcomes:
P(A) = |A| / |Ω| = (number of favorable outcomes) / (total outcomes)
Example: Probability of rolling an even number:
Ω = {1, 2, 3, 4, 5, 6}, |Ω| = 6
A = {2, 4, 6}, |A| = 3
P(A) = 3/6 = 1/2
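This formula is easy to mechanize; a minimal sketch using exact fractions (the function name is ad hoc):

```python
from fractions import Fraction

def prob(event, omega):
    """P(A) = |A| / |Ω| for a finite sample space with equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

omega = {1, 2, 3, 4, 5, 6}
evens = {2, 4, 6}
print(prob(evens, omega))  # 1/2
```

Using `Fraction` keeps the result exact rather than a floating-point approximation.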
3. Counting Techniques
Counting is fundamental to probability with finite sample spaces.
Multiplication Principle
If experiment 1 has n₁ outcomes and experiment 2 has n₂ outcomes, the combined experiment has n₁ × n₂ outcomes.
# Password with 3 lowercase letters followed by 2 digits
letters = 26
digits = 10
total_passwords = 26 * 26 * 26 * 10 * 10 # = 1,757,600
Permutations
Permutation: An ordered arrangement of objects.
Number of ways to arrange n distinct objects: n!
Number of ways to arrange r objects from n distinct objects: P(n,r) = n!/(n-r)!
import math

def permutations(n, r):
    """Number of ways to arrange r items from n distinct items."""
    return math.factorial(n) // math.factorial(n - r)

# Ways to arrange 3 books from 5 on a shelf
print(permutations(5, 3))  # 60
Combinations
Combination: An unordered selection of objects (order doesn't matter).
Number of ways to choose r objects from n: C(n,r) = n! / (r!(n-r)!)
Also written as $\binom{n}{r}$ ("n choose r").
def combinations(n, r):
    """Number of ways to choose r items from n (order doesn't matter)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Ways to choose 3 books from 5 (not arranging them)
print(combinations(5, 3))  # 10

# Probability of getting exactly 2 heads in 5 coin flips
favorable = combinations(5, 2)  # Ways to choose which 2 flips are heads
total = 2**5                    # Total outcomes
prob = favorable / total        # 10/32 = 0.3125
Counting Example: Poker Hands
# Standard 52-card deck
# Total 5-card hands
total_hands = combinations(52, 5) # 2,598,960
# Probability of a flush (5 cards same suit)
# Choose 1 suit (4 ways), then choose 5 from 13 cards of that suit
flushes = 4 * combinations(13, 5) # 5,148
# But this includes straight flushes, which we might want to exclude
# For simplicity, P(flush including straight flush) = 5148/2598960 ≈ 0.00198
# Probability of four of a kind
# Choose rank for four (13 ways), choose the kicker from remaining 48
four_of_kind = 13 * 48 # 624
p_four = four_of_kind / total_hands # ≈ 0.00024
4. Conditional Probability
Definition
The conditional probability of A given B is the probability of A occurring, knowing that B has occurred:
P(A|B) = P(A ∩ B) / P(B) (provided P(B) > 0)
Intuition: We're restricting our sample space to only outcomes where B occurred.
Example: Two Dice
Roll two dice. What's P(sum = 8 | first die = 3)?
Ω = {(1,1), (1,2), ..., (6,6)} (36 outcomes)
B = {first die is 3} = {(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)}
A = {sum = 8} = {(2,6), (3,5), (4,4), (5,3), (6,2)}
A ∩ B = {(3,5)}
P(A|B) = P(A ∩ B) / P(B) = (1/36) / (6/36) = 1/6
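The same answer can be verified by brute-force enumeration of the 36 outcomes (a small check, not part of the original derivation):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # all 36 ordered pairs
B = [o for o in omega if o[0] == 3]           # first die shows 3
A_and_B = [o for o in B if sum(o) == 8]       # ...and the sum is 8

# P(A|B) = |A ∩ B| / |B| within the restricted sample space
print(Fraction(len(A_and_B), len(B)))  # 1/6
```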
Chain Rule (Multiplication Rule)
Rearranging the conditional probability formula:
P(A ∩ B) = P(A|B) × P(B) = P(B|A) × P(A)
For multiple events:
P(A ∩ B ∩ C) = P(A) × P(B|A) × P(C|A ∩ B)
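As a worked illustration of the chain rule (an example not in the text): the probability of drawing two aces from a standard 52-card deck without replacement multiplies the first draw's probability by the second draw's conditional probability.

```python
from fractions import Fraction

# P(both aces) = P(first is an ace) × P(second is an ace | first is an ace)
p_both_aces = Fraction(4, 52) * Fraction(3, 51)
print(p_both_aces)  # 1/221
```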
Law of Total Probability
If B₁, B₂, ..., Bₙ partition Ω (mutually exclusive and exhaustive), then:
P(A) = Σᵢ P(A|Bᵢ) × P(Bᵢ)
Example: A factory has 3 machines producing widgets.
Machine 1: 30% of production, 2% defect rate
Machine 2: 45% of production, 3% defect rate
Machine 3: 25% of production, 5% defect rate
P(defective) = P(D|M1)P(M1) + P(D|M2)P(M2) + P(D|M3)P(M3)
= 0.02(0.30) + 0.03(0.45) + 0.05(0.25)
= 0.006 + 0.0135 + 0.0125
= 0.032 = 3.2%
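The factory calculation transcribes directly into code:

```python
# (P(Mi), P(D|Mi)) for each machine
machines = [(0.30, 0.02), (0.45, 0.03), (0.25, 0.05)]

# Law of total probability: P(D) = Σ P(D|Mi) P(Mi)
p_defective = sum(p_m * p_d for p_m, p_d in machines)
print(f"{p_defective:.3f}")  # 0.032
```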
5. Bayes' Theorem
The Theorem
Bayes' theorem relates conditional probabilities:
P(B|A) = P(A|B) × P(B) / P(A)
Or, using the law of total probability:
P(Bᵢ|A) = P(A|Bᵢ) × P(Bᵢ) / Σⱼ P(A|Bⱼ) × P(Bⱼ)
Terminology
- P(B): Prior probability (belief before evidence)
- P(A|B): Likelihood (probability of evidence given hypothesis)
- P(B|A): Posterior probability (updated belief after evidence)
- P(A): Marginal likelihood (normalizing constant)
Example: Medical Diagnosis
A disease affects 1% of the population.
A test is 95% accurate:
- P(positive | diseased) = 0.95 (sensitivity)
- P(negative | healthy) = 0.95 (specificity)
If someone tests positive, what's P(diseased | positive)?
Let D = diseased, T+ = positive test
P(D) = 0.01 (prior)
P(T+|D) = 0.95 (sensitivity)
P(T+|D') = 0.05 (false positive rate = 1 - specificity)
P(T+) = P(T+|D)P(D) + P(T+|D')P(D')
= 0.95(0.01) + 0.05(0.99)
= 0.0095 + 0.0495
= 0.059
P(D|T+) = P(T+|D) × P(D) / P(T+)
= 0.95 × 0.01 / 0.059
= 0.161
Only about 16% chance of having the disease despite positive test!
This counterintuitive result shows why understanding Bayes' theorem matters.
Implementation
def bayes_theorem(prior, likelihood, marginal):
    """
    Calculate the posterior probability using Bayes' theorem:
        P(H|E) = P(E|H) * P(H) / P(E)

    prior:      P(H)   - prior probability of the hypothesis
    likelihood: P(E|H) - probability of the evidence given the hypothesis
    marginal:   P(E)   - total probability of the evidence
    """
    return (likelihood * prior) / marginal

def bayes_with_total_prob(prior, likelihood, false_positive_rate):
    """
    Bayes' theorem with the law of total probability for a binary hypothesis.

    prior:               P(H)
    likelihood:          P(E|H)
    false_positive_rate: P(E|not H)
    """
    # P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    marginal = likelihood * prior + false_positive_rate * (1 - prior)
    return (likelihood * prior) / marginal

# Medical diagnosis example
prior_disease = 0.01
sensitivity = 0.95
false_positive = 0.05

p_disease_given_positive = bayes_with_total_prob(
    prior_disease,
    sensitivity,
    false_positive,
)
print(f"P(disease | positive test) = {p_disease_given_positive:.3f}")  # 0.161
6. Independence
Definition
Events A and B are independent if the occurrence of one doesn't affect the probability of the other:
P(A|B) = P(A) (when P(B) > 0), or equivalently P(A ∩ B) = P(A) × P(B)
Testing Independence
def are_independent(p_a, p_b, p_a_and_b, tolerance=1e-9):
    """Check whether events A and B are independent."""
    return abs(p_a_and_b - p_a * p_b) < tolerance

# Example: Roll a die
# A = {1, 2, 3} (low roll)
# B = {1, 3, 5} (odd roll)
# A ∩ B = {1, 3}
p_a = 3/6        # 1/2
p_b = 3/6        # 1/2
p_a_and_b = 2/6  # 1/3

print(are_independent(p_a, p_b, p_a_and_b))  # False
print(f"P(A)P(B) = {p_a * p_b}, P(A∩B) = {p_a_and_b}")
# P(A)P(B) = 0.25, P(A∩B) = 0.333...
# Not equal, so not independent
Independent vs. Mutually Exclusive
These are different concepts:
- Mutually exclusive: A ∩ B = ∅ (can't both happen)
- Independent: P(A ∩ B) = P(A)P(B) (don't affect each other)
If A and B are mutually exclusive with positive probabilities, they cannot be independent:
P(A ∩ B) = 0 ≠ P(A)P(B) > 0
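A quick numeric illustration of the contrast on a fair die (the events here are chosen for this sketch, not taken from the text):

```python
from fractions import Fraction

def p(event):
    """Probability of an event on a fair six-sided die."""
    return Fraction(len(event), 6)

A = {1, 2, 3, 4}  # P(A) = 2/3
B = {2, 4, 6}     # P(B) = 1/2
C = {5, 6}        # P(C) = 1/3, disjoint from A

# Independent: P(A ∩ B) = 1/3 = P(A) × P(B)
print(p(A & B) == p(A) * p(B))  # True
# Mutually exclusive but NOT independent: P(A ∩ C) = 0 ≠ P(A) × P(C) = 2/9
print(p(A & C) == p(A) * p(C))  # False
```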
Conditional Independence
A and B are conditionally independent given C if:
P(A ∩ B | C) = P(A|C) × P(B|C)
This is important in probabilistic graphical models (Bayesian networks).
7. Applications in Computer Science
Randomized Algorithms
def randomized_quicksort_analysis():
    """
    In randomized quicksort, we pick a random pivot.
    If we pick the median:  perfect split, T(n) = 2T(n/2) + O(n) = O(n log n)
    If we pick the min/max: worst split,   T(n) = T(n-1) + O(n) = O(n²)
    What's the probability of a "good" pivot (middle 50%)?
    """
    # A pivot is "good" if it falls in the middle 50% of elements
    p_good = 0.5
    # Expected number of random picks until a good pivot (geometric distribution)
    expected_picks = 1 / p_good  # = 2
    # With high probability, we get O(n log n) performance
    return expected_picks

def analyze_hash_collision():
    """
    Birthday problem: probability of a collision in a hash table.
    With n items hashed to m buckets, what's P(at least one collision)?
    """
    def p_no_collision(n, m):
        """Probability of no collision with n items and m buckets."""
        if n > m:
            return 0
        p = 1.0
        for i in range(n):
            p *= (m - i) / m
        return p

    def p_collision(n, m):
        return 1 - p_no_collision(n, m)

    # Birthday problem: 23 people, 365 days
    print(f"P(collision with 23 people): {p_collision(23, 365):.3f}")  # ≈ 0.507
    # Hash table: 1000 items, 10000 buckets
    print(f"P(collision with 1000 items, 10000 buckets): {p_collision(1000, 10000):.3f}")

analyze_hash_collision()
Network Reliability
def series_reliability(probabilities):
    """
    Components in series: all must work.
    P(system works) = P(A) × P(B) × ... (if independent)
    """
    result = 1.0
    for p in probabilities:
        result *= p
    return result

def parallel_reliability(probabilities):
    """
    Components in parallel: at least one must work.
    P(system works) = 1 - P(all fail) = 1 - (1-P(A))(1-P(B))...
    """
    p_all_fail = 1.0
    for p in probabilities:
        p_all_fail *= (1 - p)
    return 1 - p_all_fail

# Network with two redundant paths, each 0.9 reliable, connected in parallel
print(f"Parallel reliability: {parallel_reliability([0.9, 0.9]):.3f}")  # 0.990
# Three routers in series, each 0.99 reliable
print(f"Series reliability: {series_reliability([0.99, 0.99, 0.99]):.3f}")  # 0.970
Spam Classification (Naive Bayes)
import math

def naive_bayes_spam():
    """
    Simple spam classifier using Bayes' theorem:
        P(spam | words) ∝ P(words | spam) × P(spam)
    Naive assumption: words are conditionally independent given the class.
    """
    # Training data (simplified): P(word appears | class)
    spam_words = {'free': 0.8, 'winner': 0.7, 'click': 0.6, 'hello': 0.1}
    ham_words = {'free': 0.1, 'winner': 0.05, 'click': 0.1, 'hello': 0.5}
    p_spam = 0.3  # Prior: 30% of emails are spam
    p_ham = 0.7

    def classify(email_words):
        """Classify an email as spam or ham using log probabilities."""
        # Sum logs instead of multiplying probabilities (avoids underflow)
        log_p_spam_given_words = math.log(p_spam)
        log_p_ham_given_words = math.log(p_ham)
        for word in email_words:
            if word in spam_words:
                log_p_spam_given_words += math.log(spam_words[word])
            if word in ham_words:
                log_p_ham_given_words += math.log(ham_words[word])
        if log_p_spam_given_words > log_p_ham_given_words:
            return "SPAM"
        else:
            return "HAM"

    # Test
    print(classify(['free', 'winner', 'click']))  # SPAM
    print(classify(['hello']))                    # HAM

naive_bayes_spam()
Exercises
Basic
1. A bag contains 5 red balls and 3 blue balls. Two balls are drawn without replacement. What is:
   - P(both red)?
   - P(second is blue | first is red)?
2. A fair die is rolled twice. Find P(sum = 7).
3. In a class of 30 students, 18 play basketball and 15 play soccer. If 10 play both, what's the probability a randomly chosen student plays neither?
Intermediate
1. You have three boxes:
   - Box 1: 2 gold coins
   - Box 2: 1 gold, 1 silver coin
   - Box 3: 2 silver coins
   You pick a random box and draw a gold coin. What's P(the other coin in the box is gold)?
2. Implement a function to compute P(at least k successes in n independent trials) with success probability p.
3. A system has three components with reliabilities 0.95, 0.90, and 0.85. Component 1 is in series with a parallel combination of components 2 and 3. What's the system reliability?
Advanced
1. Monty Hall Problem: You're on a game show with 3 doors. Behind one is a car; behind the others, goats. You pick door 1. The host, who knows what's behind each door, opens door 3 to reveal a goat. Should you switch to door 2? Prove your answer using Bayes' theorem.
2. Implement a simulation to estimate π using random sampling (Monte Carlo method).
3. Design a probabilistic data structure (Bloom filter) and analyze its false positive probability.
Summary
- Sample space is the set of all outcomes; events are subsets
- Probability axioms provide the foundation: non-negative, normalized, additive
- Counting (permutations, combinations) enables probability calculation
- Conditional probability P(A|B) = P(A∩B)/P(B) restricts to a sub-universe
- Bayes' theorem updates beliefs based on evidence
- Independence means P(A∩B) = P(A)P(B)
These concepts are fundamental to algorithm analysis, machine learning, networking, and many other CS areas.