Probability Fundamentals

Introduction

Probability theory is the mathematical framework for reasoning about uncertainty. In computer science, probability appears everywhere: analyzing randomized algorithms, machine learning, network protocols, cryptography, and data analysis. Understanding probability is essential for any serious computer scientist.

This reading covers the foundations of probability: sample spaces, events, probability axioms, conditional probability, and Bayes' theorem.

Learning Objectives

By the end of this reading, you will be able to:

  • Define sample spaces and events formally
  • Apply probability axioms to compute probabilities
  • Use counting techniques for probability calculations
  • Understand and apply conditional probability
  • Apply Bayes' theorem to update beliefs with evidence
  • Recognize independence and compute joint probabilities

1. Sample Spaces and Events

Sample Space

The sample space (denoted Ω or S) is the set of all possible outcomes of a random experiment.

Examples:

Coin flip:       Ω = {H, T}
Die roll:        Ω = {1, 2, 3, 4, 5, 6}
Two coin flips:  Ω = {HH, HT, TH, TT}
Two die rolls:   Ω = {(1,1), (1,2), ..., (6,6)}  (36 outcomes)
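These sample spaces are small enough to list by hand, but they can also be generated programmatically; a quick sketch using the standard library's itertools:

```python
from itertools import product

# Two coin flips: the Cartesian product {H, T} x {H, T}
two_flips = [''.join(p) for p in product('HT', repeat=2)]
print(two_flips)        # ['HH', 'HT', 'TH', 'TT']

# Two die rolls: all ordered pairs (i, j)
two_rolls = list(product(range(1, 7), repeat=2))
print(len(two_rolls))   # 36
```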

Events

An event is a subset of the sample space: a collection of outcomes we're interested in.

Die roll where result is even:  E = {2, 4, 6}
Die roll where result > 4:      F = {5, 6}
At least one head in two flips: G = {HH, HT, TH}

Event Operations

Events can be combined using set operations:

Operation       Notation    Meaning
Union           A ∪ B       A or B (or both) occurs
Intersection    A ∩ B       Both A and B occur
Complement      A' or Ā     A does not occur
Difference      A - B       A occurs but not B

Example:

Ω = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}      (result ≤ 3)
B = {2, 4, 6}      (result is even)

A ∪ B = {1, 2, 3, 4, 6}
A ∩ B = {2}
A' = {4, 5, 6}
A - B = {1, 3}
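Python's built-in sets mirror these operations directly; here is the same example in code:

```python
omega = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}       # result <= 3
B = {2, 4, 6}       # result is even

print(A | B)        # union:        {1, 2, 3, 4, 6}
print(A & B)        # intersection: {2}
print(omega - A)    # complement A': {4, 5, 6}
print(A - B)        # difference:   {1, 3}
```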

Mutually Exclusive Events

Events A and B are mutually exclusive (disjoint) if they cannot both occur:

A ∩ B = ∅

Example: "Die shows 1" and "Die shows 6" are mutually exclusive.


2. Probability Axioms

Kolmogorov Axioms

Probability is a function P that assigns a number to each event, satisfying:

  1. Non-negativity: P(A) ≥ 0 for all events A
  2. Normalization: P(Ω) = 1
  3. Additivity: For mutually exclusive events A and B: P(A ∪ B) = P(A) + P(B)

Consequences of the Axioms

From these axioms, we can derive:

P(∅) = 0                           Empty event has probability 0
P(A') = 1 - P(A)                   Complement rule
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)  Inclusion-exclusion
P(A) ≤ P(B) if A ⊆ B               Monotonicity
0 ≤ P(A) ≤ 1                       Probability bounds
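These consequences can be checked numerically for equally likely outcomes. A small sanity check using the die events from Section 1, with exact arithmetic via fractions:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event), len(omega))

A = {1, 2, 3}       # result <= 3
B = {2, 4, 6}       # result is even

assert P(set()) == 0                           # empty event
assert P(omega - A) == 1 - P(A)                # complement rule
assert P(A | B) == P(A) + P(B) - P(A & B)      # inclusion-exclusion
print(P(A | B))  # 5/6
```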

Computing Probabilities

For finite sample spaces with equally likely outcomes:

P(A) = |A| / |Ω| = (number of favorable outcomes) / (total outcomes)

Example: Probability of rolling an even number:

Ω = {1, 2, 3, 4, 5, 6}, |Ω| = 6
A = {2, 4, 6}, |A| = 3
P(A) = 3/6 = 1/2

3. Counting Techniques

Counting is fundamental to probability with finite sample spaces.

Multiplication Principle

If experiment 1 has n₁ outcomes and experiment 2 has n₂ outcomes, the combined experiment has n₁ × n₂ outcomes.

# Password with 3 lowercase letters followed by 2 digits
letters = 26
digits = 10
total_passwords = 26 * 26 * 26 * 10 * 10  # = 1,757,600

Permutations

Permutation: An ordered arrangement of objects.

Number of ways to arrange n distinct objects: n!

Number of ways to arrange r objects from n distinct objects: P(n,r) = n!/(n-r)!

import math

def permutations(n, r):
    """Number of ways to arrange r items from n distinct items"""
    return math.factorial(n) // math.factorial(n - r)

# Ways to arrange 3 books from 5 on a shelf
print(permutations(5, 3))  # 60

Combinations

Combination: An unordered selection of objects (order doesn't matter).

Number of ways to choose r objects from n: C(n,r) = n! / (r!(n-r)!)

Also written with binomial-coefficient notation and read aloud as "n choose r".

def combinations(n, r):
    """Number of ways to choose r items from n (order doesn't matter)"""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Ways to choose 3 books from 5 (not arranging them)
print(combinations(5, 3))  # 10

# Probability of getting exactly 2 heads in 5 coin flips
favorable = combinations(5, 2)  # Ways to choose which 2 flips are heads
total = 2**5                     # Total outcomes
prob = favorable / total         # 10/32 = 0.3125

Counting Example: Poker Hands

# Standard 52-card deck
# Total 5-card hands
total_hands = combinations(52, 5)  # 2,598,960

# Probability of a flush (5 cards same suit)
# Choose 1 suit (4 ways), then choose 5 from 13 cards of that suit
flushes = 4 * combinations(13, 5)  # 5,148
# But this includes straight flushes, which we might want to exclude
# For simplicity, P(flush including straight flush) = 5148/2598960 ≈ 0.00198

# Probability of four of a kind
# Choose rank for four (13 ways), choose the kicker from remaining 48
four_of_kind = 13 * 48  # 624
p_four = four_of_kind / total_hands  # ≈ 0.00024

4. Conditional Probability

Definition

The conditional probability of A given B is the probability of A occurring, knowing that B has occurred:

P(A|B) = P(A ∩ B) / P(B)    (provided P(B) > 0)

Intuition: We're restricting our sample space to only outcomes where B occurred.

Example: Two Dice

Roll two dice. What's P(sum = 8 | first die = 3)?

Ω = {(1,1), (1,2), ..., (6,6)}  (36 outcomes)

B = {first die is 3} = {(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)}
A = {sum = 8} = {(2,6), (3,5), (4,4), (5,3), (6,2)}

A ∩ B = {(3,5)}

P(A|B) = P(A ∩ B) / P(B) = (1/36) / (6/36) = 1/6
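The same answer falls out of brute-force enumeration, which is a handy way to check conditional-probability calculations on small sample spaces:

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

B = [o for o in omega if o[0] == 3]            # first die is 3
A_and_B = [o for o in B if o[0] + o[1] == 8]   # ...and the sum is 8

p = len(A_and_B) / len(B)                      # P(A|B) = |A ∩ B| / |B|
print(p)  # 0.1666... = 1/6
```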

Chain Rule (Multiplication Rule)

Rearranging the conditional probability formula:

P(A ∩ B) = P(A|B) × P(B) = P(B|A) × P(A)

For multiple events:

P(A ∩ B ∩ C) = P(A) × P(B|A) × P(C|A ∩ B)
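As a quick illustration of the chain rule (a card-drawing example, not one worked above): the probability of drawing two aces from a standard 52-card deck without replacement.

```python
# P(both aces) = P(first ace) × P(second ace | first ace)
p_first = 4 / 52                 # 4 aces among 52 cards
p_second_given_first = 3 / 51    # 3 aces left among 51 cards
p_both = p_first * p_second_given_first
print(f"{p_both:.5f}")           # ≈ 0.00452 (= 1/221)
```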

Law of Total Probability

If B₁, B₂, ..., Bₙ partition Ω (mutually exclusive and exhaustive), then:

P(A) = Σᵢ P(A|Bᵢ) × P(Bᵢ)

Example: A factory has 3 machines producing widgets.

Machine 1: 30% of production, 2% defect rate
Machine 2: 45% of production, 3% defect rate
Machine 3: 25% of production, 5% defect rate

P(defective) = P(D|M1)P(M1) + P(D|M2)P(M2) + P(D|M3)P(M3)
             = 0.02(0.30) + 0.03(0.45) + 0.05(0.25)
             = 0.006 + 0.0135 + 0.0125
             = 0.032 = 3.2%
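The factory calculation above translates directly into code:

```python
# (production share, defect rate) for machines 1-3
machines = [(0.30, 0.02), (0.45, 0.03), (0.25, 0.05)]

# Law of total probability: P(D) = sum over i of P(D|Mi) × P(Mi)
p_defective = sum(share * rate for share, rate in machines)
print(f"{p_defective:.3f}")  # 0.032
```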

5. Bayes' Theorem

The Theorem

Bayes' theorem relates conditional probabilities:

P(B|A) = P(A|B) × P(B) / P(A)

Or, using the law of total probability:

P(Bᵢ|A) = P(A|Bᵢ) × P(Bᵢ) / Σⱼ P(A|Bⱼ) × P(Bⱼ)

Terminology

  • P(B): Prior probability (belief before evidence)
  • P(A|B): Likelihood (probability of evidence given hypothesis)
  • P(B|A): Posterior probability (updated belief after evidence)
  • P(A): Marginal likelihood (normalizing constant)

Example: Medical Diagnosis

A disease affects 1% of the population.
A test is 95% accurate:
  - P(positive | diseased) = 0.95  (sensitivity)
  - P(negative | healthy) = 0.95   (specificity)

If someone tests positive, what's P(diseased | positive)?

Let D = diseased, T+ = positive test

P(D) = 0.01           (prior)
P(T+|D) = 0.95        (sensitivity)
P(T+|D') = 0.05       (false positive rate = 1 - specificity)

P(T+) = P(T+|D)P(D) + P(T+|D')P(D')
      = 0.95(0.01) + 0.05(0.99)
      = 0.0095 + 0.0495
      = 0.059

P(D|T+) = P(T+|D) × P(D) / P(T+)
        = 0.95 × 0.01 / 0.059
        = 0.161

Only about 16% chance of having the disease despite positive test!

This counterintuitive result shows why understanding Bayes' theorem matters.

Implementation

def bayes_theorem(prior, likelihood, marginal):
    """
    Calculate posterior probability using Bayes' theorem

    P(H|E) = P(E|H) * P(H) / P(E)

    prior: P(H) - prior probability of hypothesis
    likelihood: P(E|H) - probability of evidence given hypothesis
    marginal: P(E) - total probability of evidence
    """
    return (likelihood * prior) / marginal

def bayes_with_total_prob(prior, likelihood, false_positive_rate):
    """
    Bayes' theorem with law of total probability for binary hypothesis

    prior: P(H)
    likelihood: P(E|H)
    false_positive_rate: P(E|not H)
    """
    # P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    marginal = likelihood * prior + false_positive_rate * (1 - prior)
    return (likelihood * prior) / marginal

# Medical diagnosis example
prior_disease = 0.01
sensitivity = 0.95
false_positive = 0.05

p_disease_given_positive = bayes_with_total_prob(
    prior_disease,
    sensitivity,
    false_positive
)
print(f"P(disease | positive test) = {p_disease_given_positive:.3f}")  # 0.161

6. Independence

Definition

Events A and B are independent if the occurrence of one doesn't affect the probability of the other:

P(A|B) = P(A)    or equivalently    P(A ∩ B) = P(A) × P(B)

Testing Independence

def are_independent(p_a, p_b, p_a_and_b, tolerance=1e-9):
    """Check if events A and B are independent"""
    return abs(p_a_and_b - p_a * p_b) < tolerance

# Example: Roll a die
# A = {1, 2, 3} (low roll)
# B = {1, 3, 5} (odd roll)
# A ∩ B = {1, 3}

p_a = 3/6  # 1/2
p_b = 3/6  # 1/2
p_a_and_b = 2/6  # 1/3

print(are_independent(p_a, p_b, p_a_and_b))  # False
print(f"P(A)P(B) = {p_a * p_b}, P(A∩B) = {p_a_and_b}")
# P(A)P(B) = 0.25, P(A∩B) = 0.333...
# Not equal, so not independent

Independent vs. Mutually Exclusive

These are different concepts:

  • Mutually exclusive: A ∩ B = ∅ (can't both happen)
  • Independent: P(A ∩ B) = P(A)P(B) (don't affect each other)

If A and B are mutually exclusive with positive probabilities, they cannot be independent:

P(A ∩ B) = 0 ≠ P(A)P(B) > 0

Conditional Independence

A and B are conditionally independent given C if:

P(A ∩ B | C) = P(A|C) × P(B|C)

This is important in probabilistic graphical models (Bayesian networks).
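A small numeric illustration (a made-up two-coin setup, not from the text above): pick one of two coins at random, a fair one with P(heads) = 0.5 and a biased one with P(heads) = 0.9, then flip it twice. Given the coin C, the two flips are independent by construction; unconditionally they are not, because the first flip carries information about which coin was picked.

```python
p_heads = {'fair': 0.5, 'biased': 0.9}
p_coin = 0.5  # each coin equally likely

# Unconditionally: compare P(H1 ∩ H2) with P(H1) × P(H2)
p_h = sum(p_coin * p for p in p_heads.values())        # P(H1) = P(H2) = 0.7
p_hh = sum(p_coin * p * p for p in p_heads.values())   # P(H1 ∩ H2) = 0.53
print(f"{p_hh:.2f} vs {p_h * p_h:.2f}")  # 0.53 vs 0.49 -> dependent
```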


7. Applications in Computer Science

Randomized Algorithms

import random

def randomized_quicksort_analysis():
    """
    In randomized quicksort, we pick a random pivot.

    If we pick median: perfect split, T(n) = 2T(n/2) + O(n) = O(n log n)
    If we pick min/max: worst split, T(n) = T(n-1) + O(n) = O(n²)

    What's the probability of a "good" pivot (middle 50%)?
    """
    # A pivot is "good" if it's in the middle 50% of elements
    p_good = 0.5

    # Expected number of random picks until good pivot
    expected_picks = 1 / p_good  # = 2

    # With high probability, we get O(n log n) performance
    return expected_picks

def analyze_hash_collision():
    """
    Birthday problem: probability of collision in hash table

    With n items hashed to m buckets, what's P(at least one collision)?
    """
    def p_no_collision(n, m):
        """Probability of no collision with n items, m buckets"""
        if n > m:
            return 0
        p = 1.0
        for i in range(n):
            p *= (m - i) / m
        return p

    def p_collision(n, m):
        return 1 - p_no_collision(n, m)

    # Birthday problem: 23 people, 365 days
    print(f"P(collision with 23 people): {p_collision(23, 365):.3f}")  # ≈ 0.507

    # Hash table: 1000 items, 10000 buckets
    print(f"P(collision with 1000 items, 10000 buckets): {p_collision(1000, 10000):.3f}")

analyze_hash_collision()

Network Reliability

def series_reliability(probabilities):
    """
    Components in series: all must work
    P(system works) = P(A) × P(B) × ... (if independent)
    """
    result = 1.0
    for p in probabilities:
        result *= p
    return result

def parallel_reliability(probabilities):
    """
    Components in parallel: at least one must work
    P(system works) = 1 - P(all fail) = 1 - (1-P(A))(1-P(B))...
    """
    p_all_fail = 1.0
    for p in probabilities:
        p_all_fail *= (1 - p)
    return 1 - p_all_fail

# Network with redundant paths
# Path 1 reliability: 0.9
# Path 2 reliability: 0.9
# Connected in parallel
print(f"Parallel reliability: {parallel_reliability([0.9, 0.9]):.3f}")  # 0.99

# Three routers in series, each 0.99 reliable
print(f"Series reliability: {series_reliability([0.99, 0.99, 0.99]):.3f}")  # 0.970

Spam Classification (Naive Bayes)

def naive_bayes_spam():
    """
    Simple spam classifier using Bayes' theorem

    P(spam | words) ∝ P(words | spam) × P(spam)

    Naive assumption: words are conditionally independent given class
    """
    # Training data (simplified)
    spam_words = {'free': 0.8, 'winner': 0.7, 'click': 0.6, 'hello': 0.1}
    ham_words = {'free': 0.1, 'winner': 0.05, 'click': 0.1, 'hello': 0.5}

    p_spam = 0.3  # Prior: 30% of emails are spam
    p_ham = 0.7

    def classify(email_words):
        """Classify email as spam or ham"""
        import math

        # Accumulate log P(words | class) + log P(class) for each class
        log_p_spam_given_words = 0.0
        log_p_ham_given_words = 0.0

        for word in email_words:
            if word in spam_words:
                log_p_spam_given_words += math.log(spam_words[word])
            if word in ham_words:
                log_p_ham_given_words += math.log(ham_words[word])

        log_p_spam_given_words += math.log(p_spam)
        log_p_ham_given_words += math.log(p_ham)

        if log_p_spam_given_words > log_p_ham_given_words:
            return "SPAM"
        else:
            return "HAM"

    # Test
    print(classify(['free', 'winner', 'click']))  # SPAM
    print(classify(['hello']))  # HAM

naive_bayes_spam()

Exercises

Basic

  1. A bag contains 5 red balls and 3 blue balls. Two balls are drawn without replacement. What is:

    • P(both red)?
    • P(second is blue | first is red)?
  2. A fair die is rolled twice. Find P(sum = 7).

  3. In a class of 30 students, 18 play basketball and 15 play soccer. If 10 play both, what's the probability a randomly chosen student plays neither?

Intermediate

  1. You have three boxes:

    • Box 1: 2 gold coins
    • Box 2: 1 gold, 1 silver coin
    • Box 3: 2 silver coins

    You pick a random box and draw a gold coin. What's P(other coin in box is gold)?

  2. Implement a function to compute P(at least k successes in n independent trials) with success probability p.

  3. A system has three components with reliabilities 0.95, 0.90, and 0.85. Component 1 is in series with a parallel combination of components 2 and 3. What's the system reliability?

Advanced

  1. Monty Hall Problem: You're on a game show with 3 doors. Behind one is a car; behind the others, goats. You pick door 1. The host, who knows what's behind each door, opens door 3 to reveal a goat. Should you switch to door 2? Prove your answer using Bayes' theorem.

  2. Implement a simulation to estimate π using random sampling (Monte Carlo method).

  3. Design a probabilistic data structure (Bloom filter) and analyze its false positive probability.


Summary

  • Sample space is the set of all outcomes; events are subsets
  • Probability axioms provide the foundation: non-negative, normalized, additive
  • Counting (permutations, combinations) enables probability calculation
  • Conditional probability P(A|B) = P(A∩B)/P(B) restricts to a sub-universe
  • Bayes' theorem updates beliefs based on evidence
  • Independence means P(A∩B) = P(A)P(B)

These concepts are fundamental to algorithm analysis, machine learning, networking, and many other CS areas.


Next Reading

Random Variables and Distributions →