Random Variables and Distributions
Introduction
A random variable provides a numerical summary of a random experiment's outcome. Instead of working with abstract sample spaces, we can use the machinery of real numbers and calculus. Random variables and their distributions are the workhorses of probabilistic analysis in computer science.
Learning Objectives
By the end of this reading, you will be able to:
- Define discrete and continuous random variables
- Compute and interpret expected value and variance
- Work with common distributions (Bernoulli, Binomial, Geometric, Poisson, Uniform, Normal, Exponential)
- Apply these concepts to analyze algorithms and systems
1. Random Variables
Definition
A random variable X is a function that assigns a numerical value to each outcome in a sample space:
X: Ω → ℝ
Example: Roll two dice. Let X = sum of the dice.
- Outcome (1,1) → X = 2
- Outcome (3,4) → X = 7
- Outcome (6,6) → X = 12
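The mapping from outcomes to values can be made concrete by enumerating the sample space. A minimal sketch (the name dice_sum_pmf is illustrative; Fraction keeps the probabilities exact):

```python
from fractions import Fraction

def dice_sum_pmf():
    """PMF of X = sum of two fair dice, built by enumerating Ω."""
    pmf = {}
    for d1 in range(1, 7):
        for d2 in range(1, 7):
            s = d1 + d2
            pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)
    return pmf

pmf = dice_sum_pmf()
print(pmf[7])             # 1/6 (six of the 36 outcomes sum to 7)
print(sum(pmf.values()))  # 1: total probability
```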
Discrete vs. Continuous
Discrete random variable: Takes countable values (finite or countably infinite)
- Number of heads in 10 flips
- Number of requests per second
- Number of bugs in code
Continuous random variable: Takes values in an interval
- Response time of a server
- CPU utilization percentage
- Height of a person
2. Discrete Random Variables
Probability Mass Function (PMF)
For discrete X, the PMF gives the probability of each value:
p(x) = P(X = x)
Properties:
- p(x) ≥ 0 for all x
- Σₓ p(x) = 1
Example: Fair die roll
def die_pmf(x):
    """PMF for a fair die roll"""
    if x in [1, 2, 3, 4, 5, 6]:
        return 1/6
    return 0

# Verify it sums to 1
total = sum(die_pmf(x) for x in range(1, 7))
print(f"Sum of PMF: {total}")  # ≈ 1.0 (up to floating-point rounding)
Cumulative Distribution Function (CDF)
The CDF gives the probability that X is at most x:
F(x) = P(X ≤ x) = Σᵢ≤ₓ p(i)
Properties:
- F is non-decreasing
- lim(x→-∞) F(x) = 0
- lim(x→+∞) F(x) = 1
- P(a < X ≤ b) = F(b) - F(a)
def die_cdf(x):
    """CDF for a fair die roll"""
    if x < 1:
        return 0
    elif x >= 6:
        return 1
    else:
        return int(x) / 6

# P(X ≤ 3) = P(X ∈ {1,2,3}) = 3/6 = 0.5
print(f"P(X ≤ 3) = {die_cdf(3)}")
# P(2 < X ≤ 5) = F(5) - F(2)
print(f"P(2 < X ≤ 5) = {die_cdf(5) - die_cdf(2)}")  # 3/6 = 0.5
3. Expected Value (Mean)
Definition
The expected value (or mean) of X is the probability-weighted average:
E[X] = Σₓ x · p(x) (discrete)
E[X] = ∫ x · f(x) dx (continuous)
Intuition: The "center of mass" of the distribution; the long-run average value.
def expected_value_discrete(pmf, values):
    """Compute E[X] for a discrete random variable"""
    return sum(x * pmf(x) for x in values)

# Expected value of a fair die
ev_die = expected_value_discrete(die_pmf, range(1, 7))
print(f"E[die roll] = {ev_die}")  # 3.5
Properties of Expected Value
Linearity (most important property):
E[aX + b] = a·E[X] + b
E[X + Y] = E[X] + E[Y] (always, even if X,Y dependent!)
# Example: Expected sum of two dice
# Let X = first die, Y = second die
# E[X + Y] = E[X] + E[Y] = 3.5 + 3.5 = 7
# Example: Score = 2X + 10 where X is die roll
# E[Score] = 2·E[X] + 10 = 2·3.5 + 10 = 17
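Both identities can be spot-checked by simulation. A minimal sketch (the seed is arbitrary, chosen only so the run is reproducible):

```python
import random

random.seed(0)
n = 100_000
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(n)]

# E[X + Y]: linearity predicts 3.5 + 3.5 = 7
avg_sum = sum(x + y for x, y in rolls) / n
# E[2X + 10]: linearity predicts 2·3.5 + 10 = 17
avg_score = sum(2 * x + 10 for x, _ in rolls) / n

print(f"E[X + Y] ≈ {avg_sum:.3f}")      # close to 7
print(f"E[2X + 10] ≈ {avg_score:.3f}")  # close to 17
```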
Expected Value of a Function
E[g(X)] = Σₓ g(x) · p(x)
def expected_value_function(g, pmf, values):
    """Compute E[g(X)]"""
    return sum(g(x) * pmf(x) for x in values)

# E[X²] for a die roll
ev_x_squared = expected_value_function(lambda x: x**2, die_pmf, range(1, 7))
print(f"E[X²] = {ev_x_squared}")  # 91/6 ≈ 15.17
4. Variance and Standard Deviation
Variance
Variance measures the spread of a distribution around its mean:
Var(X) = E[(X - μ)²] = E[X²] - (E[X])²
where μ = E[X].
def variance(pmf, values):
    """Compute Var(X) = E[(X - μ)²]"""
    mu = expected_value_discrete(pmf, values)
    return expected_value_function(lambda x: (x - mu)**2, pmf, values)

# Alternatively, using Var(X) = E[X²] - (E[X])²
def variance_alt(pmf, values):
    ex = expected_value_discrete(pmf, values)
    ex2 = expected_value_function(lambda x: x**2, pmf, values)
    return ex2 - ex**2

var_die = variance(die_pmf, range(1, 7))
print(f"Var(die) = {var_die:.4f}")  # 35/12 ≈ 2.9167
Standard Deviation
Standard deviation σ is the square root of variance:
σ = √Var(X)
Same units as X, easier to interpret.
Properties of Variance
Var(aX + b) = a² · Var(X) (a constant shift b doesn't change spread)
Var(X + Y) = Var(X) + Var(Y) (only if X, Y are independent!)
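Both properties can be verified exactly on a die PMF. A sketch (Fraction keeps the arithmetic exact; the helper var and the dicts die and score are illustrative names):

```python
from fractions import Fraction

def var(pmf):
    """Var(X) = E[X²] - (E[X])² for a dict-based PMF."""
    ex = sum(x * p for x, p in pmf.items())
    ex2 = sum(x * x * p for x, p in pmf.items())
    return ex2 - ex ** 2

die = {x: Fraction(1, 6) for x in range(1, 7)}
# Y = 2X + 10: the shift b = 10 drops out, the scale a = 2 enters squared
score = {2 * x + 10: Fraction(1, 6) for x in range(1, 7)}

print(var(die))                    # 35/12
print(var(score))                  # 4 · 35/12 = 35/3
print(var(score) == 4 * var(die))  # True
```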
5. Common Discrete Distributions
Bernoulli Distribution
Single trial with success probability p.
X ~ Bernoulli(p)
P(X = 1) = p, P(X = 0) = 1-p
E[X] = p
Var(X) = p(1-p)
import random

class Bernoulli:
    def __init__(self, p):
        self.p = p

    def pmf(self, x):
        if x == 1:
            return self.p
        elif x == 0:
            return 1 - self.p
        return 0

    def mean(self):
        return self.p

    def variance(self):
        return self.p * (1 - self.p)

    def sample(self):
        return 1 if random.random() < self.p else 0
Binomial Distribution
Number of successes in n independent Bernoulli trials.
X ~ Binomial(n, p)
P(X = k) = C(n,k) · p^k · (1-p)^(n-k)
E[X] = np
Var(X) = np(1-p)
import math
import random

class Binomial:
    def __init__(self, n, p):
        self.n = n
        self.p = p

    def pmf(self, k):
        if k < 0 or k > self.n:
            return 0
        coeff = math.comb(self.n, k)
        return coeff * (self.p ** k) * ((1 - self.p) ** (self.n - k))

    def mean(self):
        return self.n * self.p

    def variance(self):
        return self.n * self.p * (1 - self.p)

    def sample(self):
        return sum(1 for _ in range(self.n) if random.random() < self.p)

# Example: 10 coin flips, P(exactly 7 heads)
b = Binomial(10, 0.5)
print(f"P(X = 7) = {b.pmf(7):.4f}")  # 0.1172
print(f"E[X] = {b.mean()}")          # 5.0
print(f"Var(X) = {b.variance()}")    # 2.5
Geometric Distribution
Number of trials until first success.
X ~ Geometric(p)
P(X = k) = (1-p)^(k-1) · p for k = 1, 2, 3, ...
E[X] = 1/p
Var(X) = (1-p)/p²
import random

class Geometric:
    def __init__(self, p):
        self.p = p

    def pmf(self, k):
        if k < 1:
            return 0
        return ((1 - self.p) ** (k - 1)) * self.p

    def mean(self):
        return 1 / self.p

    def variance(self):
        return (1 - self.p) / (self.p ** 2)

    def sample(self):
        k = 1
        while random.random() >= self.p:
            k += 1
        return k

# Expected number of coin flips until heads
g = Geometric(0.5)
print(f"E[flips until heads] = {g.mean()}")  # 2.0
Poisson Distribution
Number of events in a fixed interval, with average rate λ.
X ~ Poisson(λ)
P(X = k) = (λ^k · e^(-λ)) / k! for k = 0, 1, 2, ...
E[X] = λ
Var(X) = λ
import random

class Poisson:
    def __init__(self, lam):
        self.lam = lam

    def pmf(self, k):
        if k < 0:
            return 0
        return (self.lam ** k) * math.exp(-self.lam) / math.factorial(k)

    def mean(self):
        return self.lam

    def variance(self):
        return self.lam

    def sample(self):
        # Knuth's algorithm: count uniforms until their product drops below e^(-λ)
        L = math.exp(-self.lam)
        k = 0
        p = 1.0
        while p > L:
            k += 1
            p *= random.random()
        return k - 1

# Server receives an average of 5 requests per second
p = Poisson(5)
print(f"P(exactly 3 requests) = {p.pmf(3):.4f}")  # 0.1404
print(f"P(0 requests) = {p.pmf(0):.4f}")          # 0.0067
6. Continuous Random Variables
Probability Density Function (PDF)
For continuous X, the PDF f(x) gives probability via integration:
P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
Note: P(X = x) = 0 for any specific x (the area under a single point is 0).
Properties:
- f(x) ≥ 0
- ∫ f(x) dx = 1
Expected Value and Variance
E[X] = ∫ x · f(x) dx
Var(X) = ∫ (x - μ)² · f(x) dx = E[X²] - (E[X])²
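These integrals can be approximated numerically when no closed form is handy. A minimal midpoint Riemann-sum sketch for X ~ Uniform(0, 1), where the exact values are E[X] = 1/2 and Var(X) = 1/12 (the helper riemann is an illustrative name):

```python
def riemann(g, a, b, n=100_000):
    """Approximate ∫ₐᵇ g(x) dx with a midpoint Riemann sum."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def f(x):
    return 1.0  # Uniform(0, 1) density on [0, 1]

ex = riemann(lambda x: x * f(x), 0, 1)        # E[X]  -> 0.5
ex2 = riemann(lambda x: x * x * f(x), 0, 1)   # E[X²] -> 1/3
print(f"E[X] ≈ {ex:.4f},  Var(X) ≈ {ex2 - ex**2:.4f}")  # 0.5, 1/12 ≈ 0.0833
```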
7. Common Continuous Distributions
Uniform Distribution
Equal probability over an interval [a, b].
X ~ Uniform(a, b)
f(x) = 1/(b-a) for a ≤ x ≤ b, else 0
E[X] = (a+b)/2
Var(X) = (b-a)²/12
import random

class Uniform:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def pdf(self, x):
        if self.a <= x <= self.b:
            return 1 / (self.b - self.a)
        return 0

    def cdf(self, x):
        if x < self.a:
            return 0
        elif x > self.b:
            return 1
        return (x - self.a) / (self.b - self.a)

    def mean(self):
        return (self.a + self.b) / 2

    def variance(self):
        return (self.b - self.a) ** 2 / 12

    def sample(self):
        return random.uniform(self.a, self.b)

# Response time between 100ms and 200ms
u = Uniform(100, 200)
print(f"E[response time] = {u.mean()} ms")  # 150
print(f"P(response < 130) = {u.cdf(130)}")  # 0.3
Exponential Distribution
Time until next event in a Poisson process.
X ~ Exponential(λ)
f(x) = λ · e^(-λx) for x ≥ 0
E[X] = 1/λ
Var(X) = 1/λ²
Memoryless property: P(X > s + t | X > s) = P(X > t)
import random

class Exponential:
    def __init__(self, lam):
        self.lam = lam

    def pdf(self, x):
        if x < 0:
            return 0
        return self.lam * math.exp(-self.lam * x)

    def cdf(self, x):
        if x < 0:
            return 0
        return 1 - math.exp(-self.lam * x)

    def mean(self):
        return 1 / self.lam

    def variance(self):
        return 1 / (self.lam ** 2)

    def sample(self):
        # Inverse transform: F⁻¹(u) = -ln(1-u)/λ
        return -math.log(1 - random.random()) / self.lam

# Time between server requests (5 per second on average)
e = Exponential(5)
print(f"E[time between requests] = {e.mean()} seconds")  # 0.2
print(f"P(wait > 0.5 sec) = {1 - e.cdf(0.5):.4f}")       # 0.0821
Normal (Gaussian) Distribution
The "bell curve," ubiquitous thanks to the Central Limit Theorem.
X ~ Normal(μ, σ²)
f(x) = (1 / (σ√(2π))) · e^(-(x-μ)²/(2σ²))
E[X] = μ
Var(X) = σ²
import random

class Normal:
    def __init__(self, mu, sigma_sq):
        self.mu = mu
        self.sigma = math.sqrt(sigma_sq)

    def pdf(self, x):
        coeff = 1 / (self.sigma * math.sqrt(2 * math.pi))
        exponent = -((x - self.mu) ** 2) / (2 * self.sigma ** 2)
        return coeff * math.exp(exponent)

    def mean(self):
        return self.mu

    def variance(self):
        return self.sigma ** 2

    def sample(self):
        # Box-Muller transform; 1 - random() keeps u1 in (0, 1] and avoids log(0)
        u1 = 1 - random.random()
        u2 = random.random()
        z = math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2)
        return self.mu + self.sigma * z

# Standard normal: N(0, 1)
std_normal = Normal(0, 1)
# About 68% within 1σ, 95% within 2σ, 99.7% within 3σ
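The 68-95-99.7 rule can be spot-checked by sampling. A sketch using random.gauss from the standard library (an alternative to the Box-Muller sampler above; the seed is arbitrary):

```python
import random

random.seed(1)
n = 200_000
samples = [random.gauss(0, 1) for _ in range(n)]

for k in (1, 2, 3):
    frac = sum(1 for z in samples if abs(z) < k) / n
    print(f"P(|Z| < {k}) ≈ {frac:.4f}")
# roughly 0.6827, 0.9545, 0.9973
```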
Standard Normal and Z-scores
Standard normal Z ~ N(0, 1)
Z-score: Standardize any normal to standard normal:
Z = (X - μ) / σ
def z_score(x, mu, sigma):
    """Convert a value to a z-score"""
    return (x - mu) / sigma

# IQ scores: μ = 100, σ = 15
# What z-score is an IQ of 130?
z = z_score(130, 100, 15)
print(f"Z-score for IQ 130: {z}")  # 2.0
# IQ 130 is 2 standard deviations above the mean
8. Applications in CS
Algorithm Analysis with Randomization
def analyze_randomized_algorithm():
    """
    Randomized QuickSort: expected comparisons

    Let X = total comparisons and
    Xij = 1 if elements i and j are compared, 0 otherwise, so
    X = ΣΣ Xij.

    E[X] = ΣΣ E[Xij] = ΣΣ P(i and j compared)

    Elements i and j are compared iff one of them is the first pivot
    chosen from {i, i+1, ..., j}:
    P(i, j compared) = 2/(j-i+1)

    E[X] = Σᵢ Σⱼ>ᵢ 2/(j-i+1)
         = Σᵢ Σₖ₌₁ⁿ⁻ⁱ 2/(k+1)
         ≈ 2n·Hₙ
         ≈ 2n·ln(n)
         = O(n log n)
    """
    def harmonic(n):
        return sum(1/k for k in range(1, n+1))

    n = 1000
    expected_comparisons = 2 * n * harmonic(n)
    print(f"Expected comparisons for n={n}: {expected_comparisons:.0f}")
    print(f"2n ln(n) approximation: {2 * n * math.log(n):.0f}")

analyze_randomized_algorithm()
Queueing Theory Basics
def mm1_queue():
    """
    M/M/1 Queue Analysis
    - Arrivals: Poisson process, rate λ
    - Service: Exponential, rate μ
    - Single server

    Traffic intensity: ρ = λ/μ (must be < 1 for stability)
    E[number in system] = ρ/(1-ρ)
    E[time in system] = 1/(μ-λ)
    E[number in queue] = ρ²/(1-ρ)
    E[time in queue] = ρ/(μ-λ)
    """
    lambda_rate = 4  # 4 requests per second
    mu_rate = 5      # can serve 5 per second
    rho = lambda_rate / mu_rate
    print(f"Traffic intensity ρ = {rho}")

    if rho >= 1:
        print("Queue is unstable!")
        return

    E_N = rho / (1 - rho)
    E_T = 1 / (mu_rate - lambda_rate)
    E_Nq = rho**2 / (1 - rho)
    E_W = rho / (mu_rate - lambda_rate)
    print(f"E[customers in system] = {E_N:.2f}")
    print(f"E[time in system] = {E_T:.2f} seconds")
    print(f"E[customers in queue] = {E_Nq:.2f}")
    print(f"E[waiting time] = {E_W:.2f} seconds")

mm1_queue()
Monte Carlo Simulation
import random

def monte_carlo_integration(f, a, b, n=10000):
    """
    Estimate ∫ₐᵇ f(x) dx using Monte Carlo.

    For X ~ Uniform(a, b): E[f(X)] = (1/(b-a)) ∫ₐᵇ f(x) dx,
    so ∫ₐᵇ f(x) dx ≈ (b-a) × average of f(Xᵢ).
    """
    total = sum(f(random.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# Estimate ∫₀¹ x² dx = 1/3
estimate = monte_carlo_integration(lambda x: x**2, 0, 1)
print(f"Estimate of ∫₀¹ x² dx = {estimate:.4f} (exact: 0.3333)")

# Estimate π using a quarter circle:
# the area of the quarter circle is π/4, so if (x, y) is
# uniform in [0,1]², then P(x² + y² < 1) = π/4.
def estimate_pi(n=100000):
    inside = sum(1 for _ in range(n)
                 if random.random()**2 + random.random()**2 < 1)
    return 4 * inside / n

print(f"Estimate of π = {estimate_pi():.4f}")
9. Joint Distributions and Covariance
Joint PMF
For two random variables X and Y:
p(x, y) = P(X = x and Y = y)
Marginal Distributions
pₓ(x) = Σᵧ p(x, y)
pᵧ(y) = Σₓ p(x, y)
Independence
X and Y are independent iff:
p(x, y) = pₓ(x) · pᵧ(y) for all x, y
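As a sketch, the independence criterion can be verified on a toy joint PMF of two independent fair coin flips (the names joint, px, and py are illustrative):

```python
from fractions import Fraction
from itertools import product

# Joint PMF of two independent fair coin flips (0 = tails, 1 = heads)
joint = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

# Marginals: sum the joint PMF over the other variable
px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# Check p(x, y) = pₓ(x) · pᵧ(y) for every (x, y)
independent = all(joint[(x, y)] == px[x] * py[y] for x, y in joint)
print(independent)  # True
```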
Covariance
Cov(X, Y) = E[(X - μₓ)(Y - μᵧ)] = E[XY] - E[X]E[Y]
Properties:
- Cov(X, X) = Var(X)
- Cov(X, Y) = Cov(Y, X)
- If X, Y independent: Cov(X, Y) = 0 (converse not always true!)
Correlation
Corr(X, Y) = ρ = Cov(X, Y) / (σₓ · σᵧ)
- -1 ≤ ρ ≤ 1
- ρ = 1: perfect positive linear relationship
- ρ = -1: perfect negative linear relationship
- ρ = 0: no linear relationship
def covariance(x_samples, y_samples):
    """Compute sample covariance"""
    n = len(x_samples)
    mean_x = sum(x_samples) / n
    mean_y = sum(y_samples) / n
    return sum((x - mean_x) * (y - mean_y)
               for x, y in zip(x_samples, y_samples)) / (n - 1)

def correlation(x_samples, y_samples):
    """Compute sample correlation"""
    cov = covariance(x_samples, y_samples)
    std_x = covariance(x_samples, x_samples) ** 0.5
    std_y = covariance(y_samples, y_samples) ** 0.5
    return cov / (std_x * std_y)
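A usage sketch with toy data: a perfect linear relationship yields correlation 1, and flipping the sign yields -1. The compact cov/corr helpers below mirror the sample-statistics functions above so the snippet is self-contained:

```python
def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def corr(xs, ys):
    return cov(xs, ys) / (cov(xs, xs) ** 0.5 * cov(ys, ys) ** 0.5)

xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]           # perfect positive linear relationship
print(corr(xs, ys))                    # ≈ 1.0
print(corr(xs, [-y for y in ys]))      # ≈ -1.0
```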
Exercises
Basic
1. Let X ~ Binomial(10, 0.3). Find P(X = 3), E[X], and Var(X).
2. A die is rolled; let X be the result. Find E[X²] and Var(X).
3. The time between earthquakes is Exponential with rate λ = 0.1 per day. Find the expected time until the next earthquake and P(wait > 20 days).
Intermediate
1. Show algebraically, from the definition, that Var(X) = E[X²] - (E[X])².
2. Let X ~ Geometric(p). Prove that E[X] = 1/p using the definition of expected value.
3. Simulate the coupon collector problem: how many random draws (with replacement) from n distinct items are needed until you have seen all n? What is the expected number?
Advanced
1. Central Limit Theorem demonstration: sample from any distribution and compute the means of samples of size n. Show that the distribution of the means approaches Normal as n increases.
2. Implement rejection sampling to generate samples from a custom distribution.
3. Analyze the expected depth of a randomly built binary search tree.
Summary
- Random variables assign numbers to random outcomes
- PMF (discrete) and PDF (continuous) describe probability distributions
- Expected value E[X] is the probability-weighted average
- Variance Var(X) measures spread around the mean
- Common distributions: Bernoulli, Binomial, Geometric, Poisson (discrete); Uniform, Exponential, Normal (continuous)
- These concepts are essential for analyzing randomized algorithms, queueing systems, and simulations