Probability Fundamentals

Probability theory is the mathematical framework for reasoning about uncertainty. In computer science, probability appears everywhere: analyzing randomized algorithms, machine learning, network protocols, cryptography, and data analysis. Understanding probability is essential for any serious computer scientist.

1. Sample Spaces and Events

Sample Space

The sample space (denoted Ω or S) is the set of all possible outcomes of a random experiment.

Examples:

Coin flip:       Ω = {H, T}
Die roll:        Ω = {1, 2, 3, 4, 5, 6}
Two coin flips:  Ω = {HH, HT, TH, TT}
Two die rolls:   Ω = {(1,1), (1,2), ..., (6,6)}  (36 outcomes)

Events

An event is a subset of the sample space - a collection of outcomes we're interested in.

Die roll where result is even:  E = {2, 4, 6}
Die roll where result > 4:      F = {5, 6}
At least one head in two flips: G = {HH, HT, TH}

Event Operations

Events can be combined using set operations:

OperationNotationMeaning
UnionA ∪ BA or B (or both) occurs
IntersectionA ∩ BBoth A and B occur
ComplementA' or ĀA does not occur
DifferenceA - BA occurs but not B

Example:

Ω = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}      (result ≤ 3)
B = {2, 4, 6}      (result is even)

A ∪ B = {1, 2, 3, 4, 6}
A ∩ B = {2}
A' = {4, 5, 6}
A - B = {1, 3}

Mutually Exclusive Events

Events A and B are mutually exclusive (disjoint) if they cannot both occur:

A ∩ B = ∅

Example: "Die shows 1" and "Die shows 6" are mutually exclusive.


2. Probability Axioms

Kolmogorov Axioms

Probability is a function P that assigns a number to each event, satisfying:

  1. Non-negativity: P(A) ≥ 0 for all events A
  2. Normalization: P(Ω) = 1
  3. Additivity: For mutually exclusive events A and B: P(A ∪ B) = P(A) + P(B)

Consequences of the Axioms

From these axioms, we can derive:

P(∅) = 0                           Empty event has probability 0
P(A') = 1 - P(A)                   Complement rule
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)  Inclusion-exclusion
P(A) ≤ P(B) if A ⊆ B               Monotonicity
0 ≤ P(A) ≤ 1                       Probability bounds

Computing Probabilities

For finite sample spaces with equally likely outcomes:

P(A) = |A| / |Ω| = (number of favorable outcomes) / (total outcomes)

Example: Probability of rolling an even number:

Ω = {1, 2, 3, 4, 5, 6}, |Ω| = 6
A = {2, 4, 6}, |A| = 3
P(A) = 3/6 = 1/2

3. Counting Techniques

Counting is fundamental to probability with finite sample spaces.

Multiplication Principle

If experiment 1 has n₁ outcomes and experiment 2 has n₂ outcomes, the combined experiment has n₁ × n₂ outcomes.

# Password with 3 lowercase letters followed by 2 digits
letters = 26
digits = 10
total_passwords = 26 * 26 * 26 * 10 * 10  # = 1,757,600

Permutations

Permutation: An ordered arrangement of objects.

Number of ways to arrange n distinct objects: n!

Number of ways to arrange r objects from n distinct objects: P(n,r) = n!/(n-r)!

import math

def permutations(n, r):
    """Number of ways to arrange r items from n distinct items"""
    return math.factorial(n) // math.factorial(n - r)

# Ways to arrange 3 books from 5 on a shelf
print(permutations(5, 3))  # 60

Combinations

Combination: An unordered selection of objects (order doesn't matter).

Number of ways to choose r objects from n: C(n,r) = n! / (r!(n-r)!)

Also written as $\binom{n}{r}$ ("n choose r").

def combinations(n, r):
    """Number of ways to choose r items from n (order doesn't matter)"""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Ways to choose 3 books from 5 (not arranging them)
print(combinations(5, 3))  # 10

# Probability of getting exactly 2 heads in 5 coin flips
favorable = combinations(5, 2)  # Ways to choose which 2 flips are heads
total = 2**5                     # Total outcomes
prob = favorable / total         # 10/32 = 0.3125

Counting Example: Poker Hands

# Standard 52-card deck
# Total 5-card hands
total_hands = combinations(52, 5)  # 2,598,960

# Probability of a flush (5 cards same suit)
# Choose 1 suit (4 ways), then choose 5 from 13 cards of that suit
flushes = 4 * combinations(13, 5)  # 5,148
# But this includes straight flushes, which we might want to exclude
# For simplicity, P(flush including straight flush) = 5148/2598960 ≈ 0.00198

# Probability of four of a kind
# Choose rank for four (13 ways), choose the kicker from remaining 48
four_of_kind = 13 * 48  # 624
p_four = four_of_kind / total_hands  # ≈ 0.00024

4. Conditional Probability

Definition

The conditional probability of A given B is the probability of A occurring, knowing that B has occurred:

P(A|B) = P(A ∩ B) / P(B)    (provided P(B) > 0)

Intuition: We're restricting our sample space to only outcomes where B occurred.

Example: Two Dice

Roll two dice. What's P(sum = 8 | first die = 3)?

Ω = {(1,1), (1,2), ..., (6,6)}  (36 outcomes)

B = {first die is 3} = {(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)}
A = {sum = 8} = {(2,6), (3,5), (4,4), (5,3), (6,2)}

A ∩ B = {(3,5)}

P(A|B) = P(A ∩ B) / P(B) = (1/36) / (6/36) = 1/6

Chain Rule (Multiplication Rule)

Rearranging the conditional probability formula:

P(A ∩ B) = P(A|B) × P(B) = P(B|A) × P(A)

For multiple events:

P(A ∩ B ∩ C) = P(A) × P(B|A) × P(C|A ∩ B)

Law of Total Probability

If B₁, B₂, ..., Bₙ partition Ω (mutually exclusive and exhaustive), then:

P(A) = Σᵢ P(A|Bᵢ) × P(Bᵢ)

Example: A factory has 3 machines producing widgets.

Machine 1: 30% of production, 2% defect rate
Machine 2: 45% of production, 3% defect rate
Machine 3: 25% of production, 5% defect rate

P(defective) = P(D|M1)P(M1) + P(D|M2)P(M2) + P(D|M3)P(M3)
             = 0.02(0.30) + 0.03(0.45) + 0.05(0.25)
             = 0.006 + 0.0135 + 0.0125
             = 0.032 = 3.2%

5. Bayes' Theorem

The Theorem

Bayes' theorem relates conditional probabilities:

P(B|A) = P(A|B) × P(B) / P(A)

Or, using the law of total probability:

P(Bᵢ|A) = P(A|Bᵢ) × P(Bᵢ) / Σⱼ P(A|Bⱼ) × P(Bⱼ)

Terminology

  • P(B): Prior probability (belief before evidence)
  • P(A|B): Likelihood (probability of evidence given hypothesis)
  • P(B|A): Posterior probability (updated belief after evidence)
  • P(A): Marginal likelihood (normalizing constant)

Example: Medical Diagnosis

A disease affects 1% of the population.
A test is 95% accurate:
  - P(positive | diseased) = 0.95  (sensitivity)
  - P(negative | healthy) = 0.95   (specificity)

If someone tests positive, what's P(diseased | positive)?

Let D = diseased, T+ = positive test

P(D) = 0.01           (prior)
P(T+|D) = 0.95        (sensitivity)
P(T+|D') = 0.05       (false positive rate = 1 - specificity)

P(T+) = P(T+|D)P(D) + P(T+|D')P(D')
      = 0.95(0.01) + 0.05(0.99)
      = 0.0095 + 0.0495
      = 0.059

P(D|T+) = P(T+|D) × P(D) / P(T+)
        = 0.95 × 0.01 / 0.059
        = 0.161

Only about 16% chance of having the disease despite positive test!

This counterintuitive result shows why understanding Bayes' theorem matters.

Implementation

def bayes_theorem(prior, likelihood, marginal):
    """
    Calculate posterior probability using Bayes' theorem

    P(H|E) = P(E|H) * P(H) / P(E)

    prior: P(H) - prior probability of hypothesis
    likelihood: P(E|H) - probability of evidence given hypothesis
    marginal: P(E) - total probability of evidence
    """
    return (likelihood * prior) / marginal

def bayes_with_total_prob(prior, likelihood, false_positive_rate):
    """
    Bayes' theorem with law of total probability for binary hypothesis

    prior: P(H)
    likelihood: P(E|H)
    false_positive_rate: P(E|not H)
    """
    # P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    marginal = likelihood * prior + false_positive_rate * (1 - prior)
    return (likelihood * prior) / marginal

# Medical diagnosis example
prior_disease = 0.01
sensitivity = 0.95
false_positive = 0.05

p_disease_given_positive = bayes_with_total_prob(
    prior_disease,
    sensitivity,
    false_positive
)
print(f"P(disease | positive test) = {p_disease_given_positive:.3f}")  # 0.161

6. Independence

Definition

Events A and B are independent if the occurrence of one doesn't affect the probability of the other:

P(A|B) = P(A)    or equivalently    P(A ∩ B) = P(A) × P(B)

Testing Independence

def are_independent(p_a, p_b, p_a_and_b, tolerance=1e-9):
    """Check if events A and B are independent"""
    return abs(p_a_and_b - p_a * p_b) < tolerance

# Example: Roll a die
# A = {1, 2, 3} (low roll)
# B = {1, 3, 5} (odd roll)
# A ∩ B = {1, 3}

p_a = 3/6  # 1/2
p_b = 3/6  # 1/2
p_a_and_b = 2/6  # 1/3

print(are_independent(p_a, p_b, p_a_and_b))  # False
print(f"P(A)P(B) = {p_a * p_b}, P(A∩B) = {p_a_and_b}")
# P(A)P(B) = 0.25, P(A∩B) = 0.333...
# Not equal, so not independent

Independent vs. Mutually Exclusive

These are different concepts:

  • Mutually exclusive: A ∩ B = ∅ (can't both happen)
  • Independent: P(A ∩ B) = P(A)P(B) (don't affect each other)

If A and B are mutually exclusive with positive probabilities, they cannot be independent:

P(A ∩ B) = 0 ≠ P(A)P(B) > 0

Conditional Independence

A and B are conditionally independent given C if:

P(A ∩ B | C) = P(A|C) × P(B|C)

This is important in probabilistic graphical models (Bayesian networks).


7. Applications in Computer Science

Randomized Algorithms

import random

def randomized_quicksort_analysis():
    """
    In randomized quicksort, we pick a random pivot.

    If we pick median: perfect split, T(n) = 2T(n/2) + O(n) = O(n log n)
    If we pick min/max: worst split, T(n) = T(n-1) + O(n) = O(n²)

    What's the probability of a "good" pivot (middle 50%)?
    """
    # A pivot is "good" if it's in the middle 50% of elements
    p_good = 0.5

    # Expected number of random picks until good pivot
    expected_picks = 1 / p_good  # = 2

    # With high probability, we get O(n log n) performance
    return expected_picks

def analyze_hash_collision():
    """
    Birthday problem: probability of collision in hash table

    With n items hashed to m buckets, what's P(at least one collision)?
    """
    def p_no_collision(n, m):
        """Probability of no collision with n items, m buckets"""
        if n > m:
            return 0
        p = 1.0
        for i in range(n):
            p *= (m - i) / m
        return p

    def p_collision(n, m):
        return 1 - p_no_collision(n, m)

    # Birthday problem: 23 people, 365 days
    print(f"P(collision with 23 people): {p_collision(23, 365):.3f}")  # ≈ 0.507

    # Hash table: 1000 items, 10000 buckets
    print(f"P(collision with 1000 items, 10000 buckets): {p_collision(1000, 10000):.3f}")

analyze_hash_collision()

Network Reliability

def series_reliability(probabilities):
    """
    Components in series: all must work
    P(system works) = P(A) × P(B) × ... (if independent)
    """
    result = 1.0
    for p in probabilities:
        result *= p
    return result

def parallel_reliability(probabilities):
    """
    Components in parallel: at least one must work
    P(system works) = 1 - P(all fail) = 1 - (1-P(A))(1-P(B))...
    """
    p_all_fail = 1.0
    for p in probabilities:
        p_all_fail *= (1 - p)
    return 1 - p_all_fail

# Network with redundant paths
# Path 1 reliability: 0.9
# Path 2 reliability: 0.9
# Connected in parallel
print(f"Parallel reliability: {parallel_reliability([0.9, 0.9]):.3f}")  # 0.99

# Three routers in series, each 0.99 reliable
print(f"Series reliability: {series_reliability([0.99, 0.99, 0.99]):.3f}")  # 0.970

Spam Classification (Naive Bayes)

def naive_bayes_spam():
    """
    Simple spam classifier using Bayes' theorem

    P(spam | words) ∝ P(words | spam) × P(spam)

    Naive assumption: words are conditionally independent given class
    """
    # Training data (simplified)
    spam_words = {'free': 0.8, 'winner': 0.7, 'click': 0.6, 'hello': 0.1}
    ham_words = {'free': 0.1, 'winner': 0.05, 'click': 0.1, 'hello': 0.5}

    p_spam = 0.3  # Prior: 30% of emails are spam
    p_ham = 0.7

    def classify(email_words):
        """Classify email as spam or ham"""
        # Calculate P(words | spam) × P(spam)
        log_p_spam_given_words = 0
        log_p_ham_given_words = 0

        import math

        for word in email_words:
            if word in spam_words:
                log_p_spam_given_words += math.log(spam_words[word])
            if word in ham_words:
                log_p_ham_given_words += math.log(ham_words[word])

        log_p_spam_given_words += math.log(p_spam)
        log_p_ham_given_words += math.log(p_ham)

        if log_p_spam_given_words > log_p_ham_given_words:
            return "SPAM"
        else:
            return "HAM"

    # Test
    print(classify(['free', 'winner', 'click']))  # SPAM
    print(classify(['hello']))  # HAM

naive_bayes_spam()

Exercises

Basic

  1. A bag contains 5 red balls and 3 blue balls. Two balls are drawn without replacement. What is:

    • P(both red)?
    • P(second is blue | first is red)?
  2. A fair die is rolled twice. Find P(sum = 7).

  3. In a class of 30 students, 18 play basketball and 15 play soccer. If 10 play both, what's the probability a randomly chosen student plays neither?

Intermediate

  1. You have three boxes:

    • Box 1: 2 gold coins
    • Box 2: 1 gold, 1 silver coin
    • Box 3: 2 silver coins

    You pick a random box and draw a gold coin. What's P(other coin in box is gold)?

  2. Implement a function to compute P(at least k successes in n independent trials) with success probability p.

  3. A system has three components with reliabilities 0.95, 0.90, and 0.85. Component 1 is in series with a parallel combination of components 2 and 3. What's the system reliability?

Advanced

  1. Monty Hall Problem: You're on a game show with 3 doors. Behind one is a car; behind the others, goats. You pick door 1. The host, who knows what's behind each door, opens door 3 to reveal a goat. Should you switch to door 2? Prove your answer using Bayes' theorem.

  2. Implement a simulation to estimate π using random sampling (Monte Carlo method).

  3. Design a probabilistic data structure (Bloom filter) and analyze its false positive probability.


Summary

  • Sample space is the set of all outcomes; events are subsets
  • Probability axioms provide the foundation: non-negative, normalized, additive
  • Counting (permutations, combinations) enables probability calculation
  • Conditional probability P(A|B) = P(A∩B)/P(B) restricts to a sub-universe
  • Bayes' theorem updates beliefs based on evidence
  • Independence means P(A∩B) = P(A)P(B)

These concepts are fundamental to algorithm analysis, machine learning, networking, and many other CS areas.


Next Reading

Random Variables and Distributions →