Random Variables and Distributions
Introduction
A random variable provides a numerical summary of a random experiment's outcome. Instead of working with abstract sample spaces, we can use the machinery of real numbers and calculus. Random variables and their distributions are the workhorses of probabilistic analysis in computer science.
Learning Objectives
By the end of this reading, you will be able to:
- Define discrete and continuous random variables
- Compute and interpret expected value and variance
- Work with common distributions (Bernoulli, Binomial, Geometric, Poisson, Uniform, Normal, Exponential)
- Apply these concepts to analyze algorithms and systems
1. Random Variables
Definition
A random variable X is a function that assigns a numerical value to each outcome in a sample space:
X: Ω → ℝ
Example: Roll two dice. Let X = sum of the dice.
- Outcome (1,1) → X = 2
- Outcome (3,4) → X = 7
- Outcome (6,6) → X = 12
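The mapping from outcomes to values can be made concrete by enumerating the sample space. A minimal sketch (the name dice_sum_pmf is illustrative; Fraction keeps the probabilities exact):

```python
from fractions import Fraction

def dice_sum_pmf():
    """PMF of X = sum of two fair dice, built by enumerating Ω."""
    pmf = {}
    for d1 in range(1, 7):
        for d2 in range(1, 7):
            s = d1 + d2
            pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)
    return pmf

pmf = dice_sum_pmf()
print(pmf[7])             # 1/6 (six of the 36 outcomes sum to 7)
print(sum(pmf.values()))  # 1: total probability
```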
Discrete vs. Continuous
Discrete random variable: Takes countable values (finite or countably infinite)
- Number of heads in 10 flips
- Number of requests per second
- Number of bugs in code
Continuous random variable: Takes values in an interval
- Response time of a server
- CPU utilization percentage
- Height of a person
2. Discrete Random Variables
Probability Mass Function (PMF)
For discrete X, the PMF gives the probability of each value:
p(x) = P(X = x)
Properties:
- p(x) ≥ 0 for all x
- Σₓ p(x) = 1
Example: Fair die roll
def die_pmf(x):
    """PMF for a fair die roll"""
    if x in [1, 2, 3, 4, 5, 6]:
        return 1/6
    return 0

# Verify it sums to 1
total = sum(die_pmf(x) for x in range(1, 7))
print(f"Sum of PMF: {total}")  # ≈ 1.0 (up to floating-point rounding)
Cumulative Distribution Function (CDF)
The CDF gives the probability that X is at most x:
F(x) = P(X ≤ x) = Σᵢ≤ₓ p(i)
Properties:
- F is non-decreasing
- lim(x→-∞) F(x) = 0
- lim(x→+∞) F(x) = 1
- P(a < X ≤ b) = F(b) - F(a)
def die_cdf(x):
    """CDF for a fair die roll"""
    if x < 1:
        return 0
    elif x >= 6:
        return 1
    else:
        return int(x) / 6

# P(X ≤ 3) = P(X ∈ {1,2,3}) = 3/6 = 0.5
print(f"P(X ≤ 3) = {die_cdf(3)}")
# P(2 < X ≤ 5) = F(5) - F(2)
print(f"P(2 < X ≤ 5) = {die_cdf(5) - die_cdf(2)}")  # 3/6 = 0.5
3. Expected Value (Mean)
Definition
The expected value (or mean) of X is the probability-weighted average:
E[X] = Σₓ x · p(x) (discrete)
E[X] = ∫ x · f(x) dx (continuous)
Intuition: The "center of mass" of the distribution; the long-run average value.
def expected_value_discrete(pmf, values):
    """Compute E[X] for a discrete random variable"""
    return sum(x * pmf(x) for x in values)

# Expected value of a fair die
ev_die = expected_value_discrete(die_pmf, range(1, 7))
print(f"E[die roll] = {ev_die}")  # 3.5
Properties of Expected Value
Linearity (most important property):
E[aX + b] = a·E[X] + b
E[X + Y] = E[X] + E[Y] (always, even if X,Y dependent!)
# Example: Expected sum of two dice
# Let X = first die, Y = second die
# E[X + Y] = E[X] + E[Y] = 3.5 + 3.5 = 7
# Example: Score = 2X + 10 where X is die roll
# E[Score] = 2·E[X] + 10 = 2·3.5 + 10 = 17
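Both identities can be spot-checked by simulation. A minimal sketch (the seed is arbitrary, chosen only so the run is reproducible):

```python
import random

random.seed(0)
n = 100_000
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(n)]

# E[X + Y]: linearity predicts 3.5 + 3.5 = 7
avg_sum = sum(x + y for x, y in rolls) / n
# E[2X + 10]: linearity predicts 2·3.5 + 10 = 17
avg_score = sum(2 * x + 10 for x, _ in rolls) / n

print(f"E[X + Y] ≈ {avg_sum:.3f}")      # close to 7
print(f"E[2X + 10] ≈ {avg_score:.3f}")  # close to 17
```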
Expected Value of a Function
E[g(X)] = Σₓ g(x) · p(x)
def expected_value_function(g, pmf, values):
    """Compute E[g(X)]"""
    return sum(g(x) * pmf(x) for x in values)

# E[X²] for a die roll
ev_x_squared = expected_value_function(lambda x: x**2, die_pmf, range(1, 7))
print(f"E[X²] = {ev_x_squared}")  # 91/6 ≈ 15.17
4. Variance and Standard Deviation
Variance
Variance measures the spread of a distribution around its mean:
Var(X) = E[(X - μ)²] = E[X²] - (E[X])²
where μ = E[X].
def variance(pmf, values):
    """Compute Var(X) = E[(X - μ)²]"""
    mu = expected_value_discrete(pmf, values)
    return expected_value_function(lambda x: (x - mu)**2, pmf, values)

# Alternatively, using Var(X) = E[X²] - (E[X])²
def variance_alt(pmf, values):
    ex = expected_value_discrete(pmf, values)
    ex2 = expected_value_function(lambda x: x**2, pmf, values)
    return ex2 - ex**2

var_die = variance(die_pmf, range(1, 7))
print(f"Var(die) = {var_die:.4f}")  # 35/12 ≈ 2.9167
Standard Deviation
Standard deviation σ is the square root of variance:
σ = √Var(X)
Same units as X, easier to interpret.
Properties of Variance
Var(aX + b) = a² · Var(X) (a constant shift b doesn't change spread)
Var(X + Y) = Var(X) + Var(Y) (only if X, Y are independent!)
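Both properties can be verified exactly on a die PMF. A sketch (Fraction keeps the arithmetic exact; the helper var and the dicts die and score are illustrative names):

```python
from fractions import Fraction

def var(pmf):
    """Var(X) = E[X²] - (E[X])² for a dict-based PMF."""
    ex = sum(x * p for x, p in pmf.items())
    ex2 = sum(x * x * p for x, p in pmf.items())
    return ex2 - ex ** 2

die = {x: Fraction(1, 6) for x in range(1, 7)}
# Y = 2X + 10: the shift b = 10 drops out, the scale a = 2 enters squared
score = {2 * x + 10: Fraction(1, 6) for x in range(1, 7)}

print(var(die))                    # 35/12
print(var(score))                  # 4 · 35/12 = 35/3
print(var(score) == 4 * var(die))  # True
```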
5. Common Discrete Distributions
Bernoulli Distribution
Single trial with success probability p.
X ~ Bernoulli(p)
P(X = 1) = p, P(X = 0) = 1-p
E[X] = p
Var(X) = p(1-p)
import random

class Bernoulli:
    def __init__(self, p):
        self.p = p

    def pmf(self, x):
        if x == 1:
            return self.p
        elif x == 0:
            return 1 - self.p
        return 0

    def mean(self):
        return self.p

    def variance(self):
        return self.p * (1 - self.p)

    def sample(self):
        return 1 if random.random() < self.p else 0
Binomial Distribution
Number of successes in n independent Bernoulli trials.
X ~ Binomial(n, p)
P(X = k) = C(n,k) · p^k · (1-p)^(n-k)
E[X] = np
Var(X) = np(1-p)
import math
import random

class Binomial:
    def __init__(self, n, p):
        self.n = n
        self.p = p

    def pmf(self, k):
        if k < 0 or k > self.n:
            return 0
        coeff = math.comb(self.n, k)
        return coeff * (self.p ** k) * ((1 - self.p) ** (self.n - k))

    def mean(self):
        return self.n * self.p

    def variance(self):
        return self.n * self.p * (1 - self.p)

    def sample(self):
        return sum(1 for _ in range(self.n) if random.random() < self.p)

# Example: 10 coin flips, P(exactly 7 heads)
b = Binomial(10, 0.5)
print(f"P(X = 7) = {b.pmf(7):.4f}")  # 0.1172
print(f"E[X] = {b.mean()}")          # 5.0
print(f"Var(X) = {b.variance()}")    # 2.5
Geometric Distribution
Number of trials until first success.
X ~ Geometric(p)
P(X = k) = (1-p)^(k-1) · p for k = 1, 2, 3, ...
E[X] = 1/p
Var(X) = (1-p)/p²
import random

class Geometric:
    def __init__(self, p):
        self.p = p

    def pmf(self, k):
        if k < 1:
            return 0
        return ((1 - self.p) ** (k - 1)) * self.p

    def mean(self):
        return 1 / self.p

    def variance(self):
        return (1 - self.p) / (self.p ** 2)

    def sample(self):
        k = 1
        while random.random() >= self.p:
            k += 1
        return k

# Expected number of coin flips until heads
g = Geometric(0.5)
print(f"E[flips until heads] = {g.mean()}")  # 2.0
Poisson Distribution
Number of events in a fixed interval, with average rate λ.
X ~ Poisson(λ)
P(X = k) = (λ^k · e^(-λ)) / k! for k = 0, 1, 2, ...
E[X] = λ
Var(X) = λ
import random

class Poisson:
    def __init__(self, lam):
        self.lam = lam

    def pmf(self, k):
        if k < 0:
            return 0
        return (self.lam ** k) * math.exp(-self.lam) / math.factorial(k)

    def mean(self):
        return self.lam

    def variance(self):
        return self.lam

    def sample(self):
        # Knuth's algorithm: count uniforms until their product drops below e^(-λ)
        L = math.exp(-self.lam)
        k = 0
        p = 1.0
        while p > L:
            k += 1
            p *= random.random()
        return k - 1

# Server receives an average of 5 requests per second
p = Poisson(5)
print(f"P(exactly 3 requests) = {p.pmf(3):.4f}")  # 0.1404
print(f"P(0 requests) = {p.pmf(0):.4f}")          # 0.0067
6. Continuous Random Variables
Probability Density Function (PDF)
For continuous X, the PDF f(x) gives probability via integration:
P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
Note: P(X = x) = 0 for any specific x (the area under a single point is 0).
Properties:
- f(x) ≥ 0
- ∫ f(x) dx = 1
Expected Value and Variance
E[X] = ∫ x · f(x) dx
Var(X) = ∫ (x - μ)² · f(x) dx = E[X²] - (E[X])²
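These integrals can be approximated numerically when no closed form is handy. A minimal midpoint Riemann-sum sketch for X ~ Uniform(0, 1), where the exact values are E[X] = 1/2 and Var(X) = 1/12 (the helper riemann is an illustrative name):

```python
def riemann(g, a, b, n=100_000):
    """Approximate ∫ₐᵇ g(x) dx with a midpoint Riemann sum."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def f(x):
    return 1.0  # Uniform(0, 1) density on [0, 1]

ex = riemann(lambda x: x * f(x), 0, 1)        # E[X]  -> 0.5
ex2 = riemann(lambda x: x * x * f(x), 0, 1)   # E[X²] -> 1/3
print(f"E[X] ≈ {ex:.4f},  Var(X) ≈ {ex2 - ex**2:.4f}")  # 0.5, 1/12 ≈ 0.0833
```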
7. Common Continuous Distributions
Uniform Distribution
Equal probability over an interval [a, b].
X ~ Uniform(a, b)
f(x) = 1/(b-a) for a ≤ x ≤ b, else 0
E[X] = (a+b)/2
Var(X) = (b-a)²/12
import random

class Uniform:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def pdf(self, x):
        if self.a <= x <= self.b:
            return 1 / (self.b - self.a)
        return 0

    def cdf(self, x):
        if x < self.a:
            return 0
        elif x > self.b:
            return 1
        return (x - self.a) / (self.b - self.a)

    def mean(self):
        return (self.a + self.b) / 2

    def variance(self):
        return (self.b - self.a) ** 2 / 12

    def sample(self):
        return random.uniform(self.a, self.b)

# Response time between 100ms and 200ms
u = Uniform(100, 200)
print(f"E[response time] = {u.mean()} ms")  # 150
print(f"P(response < 130) = {u.cdf(130)}")  # 0.3
Exponential Distribution
Time until next event in a Poisson process.
X ~ Exponential(λ)
f(x) = λ · e^(-λx) for x ≥ 0
E[X] = 1/λ
Var(X) = 1/λ²
Memoryless property: P(X > s + t | X > s) = P(X > t)
import random

class Exponential:
    def __init__(self, lam):
        self.lam = lam

    def pdf(self, x):
        if x < 0:
            return 0
        return self.lam * math.exp(-self.lam * x)

    def cdf(self, x):
        if x < 0:
            return 0
        return 1 - math.exp(-self.lam * x)

    def mean(self):
        return 1 / self.lam

    def variance(self):
        return 1 / (self.lam ** 2)

    def sample(self):
        # Inverse transform: F⁻¹(u) = -ln(1-u)/λ
        return -math.log(1 - random.random()) / self.lam

# Time between server requests (5 per second on average)
e = Exponential(5)
print(f"E[time between requests] = {e.mean()} seconds")  # 0.2
print(f"P(wait > 0.5 sec) = {1 - e.cdf(0.5):.4f}")       # 0.0821
Normal (Gaussian) Distribution
The "bell curve," ubiquitous thanks to the Central Limit Theorem.
X ~ Normal(μ, σ²)
f(x) = (1 / (σ√(2π))) · e^(-(x-μ)²/(2σ²))
E[X] = μ
Var(X) = σ²
import random

class Normal:
    def __init__(self, mu, sigma_sq):
        self.mu = mu
        self.sigma = math.sqrt(sigma_sq)

    def pdf(self, x):
        coeff = 1 / (self.sigma * math.sqrt(2 * math.pi))
        exponent = -((x - self.mu) ** 2) / (2 * self.sigma ** 2)
        return coeff * math.exp(exponent)

    def mean(self):
        return self.mu

    def variance(self):
        return self.sigma ** 2

    def sample(self):
        # Box-Muller transform; 1 - random() keeps u1 in (0, 1] and avoids log(0)
        u1 = 1 - random.random()
        u2 = random.random()
        z = math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2)
        return self.mu + self.sigma * z

# Standard normal: N(0, 1)
std_normal = Normal(0, 1)
# About 68% within 1σ, 95% within 2σ, 99.7% within 3σ
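The 68-95-99.7 rule can be spot-checked by sampling. A sketch using random.gauss from the standard library (an alternative to the Box-Muller sampler above; the seed is arbitrary):

```python
import random

random.seed(1)
n = 200_000
samples = [random.gauss(0, 1) for _ in range(n)]

for k in (1, 2, 3):
    frac = sum(1 for z in samples if abs(z) < k) / n
    print(f"P(|Z| < {k}) ≈ {frac:.4f}")
# roughly 0.6827, 0.9545, 0.9973
```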
Standard Normal and Z-scores
Standard normal Z ~ N(0, 1)
Z-score: Standardize any normal to standard normal:
Z = (X - μ) / σ
def z_score(x, mu, sigma):
    """Convert a value to a z-score"""
    return (x - mu) / sigma

# IQ scores: μ = 100, σ = 15
# What z-score is an IQ of 130?
z = z_score(130, 100, 15)
print(f"Z-score for IQ 130: {z}")  # 2.0
# IQ 130 is 2 standard deviations above the mean
8. Applications in CS
Algorithm Analysis with Randomization
def analyze_randomized_algorithm():
    """
    Randomized QuickSort: expected comparisons

    Let X = total comparisons and
    Xij = 1 if elements i and j are compared, 0 otherwise, so
    X = ΣΣ Xij.

    E[X] = ΣΣ E[Xij] = ΣΣ P(i and j compared)

    Elements i and j are compared iff one of them is the first pivot
    chosen from {i, i+1, ..., j}:
    P(i, j compared) = 2/(j-i+1)

    E[X] = Σᵢ Σⱼ>ᵢ 2/(j-i+1)
         = Σᵢ Σₖ₌₁ⁿ⁻ⁱ 2/(k+1)
         ≈ 2n·Hₙ
         ≈ 2n·ln(n)
         = O(n log n)
    """
    def harmonic(n):
        return sum(1/k for k in range(1, n+1))

    n = 1000
    expected_comparisons = 2 * n * harmonic(n)
    print(f"Expected comparisons for n={n}: {expected_comparisons:.0f}")
    print(f"2n ln(n) approximation: {2 * n * math.log(n):.0f}")

analyze_randomized_algorithm()
Queueing Theory Basics
def mm1_queue():
    """
    M/M/1 Queue Analysis
    - Arrivals: Poisson process, rate λ
    - Service: Exponential, rate μ
    - Single server

    Traffic intensity: ρ = λ/μ (must be < 1 for stability)
    E[number in system] = ρ/(1-ρ)
    E[time in system] = 1/(μ-λ)
    E[number in queue] = ρ²/(1-ρ)
    E[time in queue] = ρ/(μ-λ)
    """
    lambda_rate = 4  # 4 requests per second
    mu_rate = 5      # can serve 5 per second
    rho = lambda_rate / mu_rate
    print(f"Traffic intensity ρ = {rho}")

    if rho >= 1:
        print("Queue is unstable!")
        return

    E_N = rho / (1 - rho)
    E_T = 1 / (mu_rate - lambda_rate)
    E_Nq = rho**2 / (1 - rho)
    E_W = rho / (mu_rate - lambda_rate)
    print(f"E[customers in system] = {E_N:.2f}")
    print(f"E[time in system] = {E_T:.2f} seconds")
    print(f"E[customers in queue] = {E_Nq:.2f}")
    print(f"E[waiting time] = {E_W:.2f} seconds")

mm1_queue()
Monte Carlo Simulation
import random

def monte_carlo_integration(f, a, b, n=10000):
    """
    Estimate ∫ₐᵇ f(x) dx using Monte Carlo.

    For X ~ Uniform(a, b): E[f(X)] = (1/(b-a)) ∫ₐᵇ f(x) dx,
    so ∫ₐᵇ f(x) dx ≈ (b-a) × average of f(Xᵢ).
    """
    total = sum(f(random.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# Estimate ∫₀¹ x² dx = 1/3
estimate = monte_carlo_integration(lambda x: x**2, 0, 1)
print(f"Estimate of ∫₀¹ x² dx = {estimate:.4f} (exact: 0.3333)")

# Estimate π using a quarter circle:
# the area of the quarter circle is π/4, so if (x, y) is
# uniform in [0,1]², then P(x² + y² < 1) = π/4.
def estimate_pi(n=100000):
    inside = sum(1 for _ in range(n)
                 if random.random()**2 + random.random()**2 < 1)
    return 4 * inside / n

print(f"Estimate of π = {estimate_pi():.4f}")
9. Joint Distributions and Covariance
Joint PMF
For two random variables X and Y:
p(x, y) = P(X = x and Y = y)
Marginal Distributions
pₓ(x) = Σᵧ p(x, y)
pᵧ(y) = Σₓ p(x, y)
Independence
X and Y are independent iff:
p(x, y) = pₓ(x) · pᵧ(y) for all x, y
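As a sketch, the independence criterion can be verified on a toy joint PMF of two independent fair coin flips (the names joint, px, and py are illustrative):

```python
from fractions import Fraction
from itertools import product

# Joint PMF of two independent fair coin flips (0 = tails, 1 = heads)
joint = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

# Marginals: sum the joint PMF over the other variable
px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# Check p(x, y) = pₓ(x) · pᵧ(y) for every (x, y)
independent = all(joint[(x, y)] == px[x] * py[y] for x, y in joint)
print(independent)  # True
```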
Covariance
Cov(X, Y) = E[(X - μₓ)(Y - μᵧ)] = E[XY] - E[X]E[Y]
Properties:
- Cov(X, X) = Var(X)
- Cov(X, Y) = Cov(Y, X)
- If X, Y independent: Cov(X, Y) = 0 (converse not always true!)
Correlation
Corr(X, Y) = ρ = Cov(X, Y) / (σₓ · σᵧ)
- -1 ≤ ρ ≤ 1
- ρ = 1: perfect positive linear relationship
- ρ = -1: perfect negative linear relationship
- ρ = 0: no linear relationship
def covariance(x_samples, y_samples):
    """Compute sample covariance"""
    n = len(x_samples)
    mean_x = sum(x_samples) / n
    mean_y = sum(y_samples) / n
    return sum((x - mean_x) * (y - mean_y)
               for x, y in zip(x_samples, y_samples)) / (n - 1)

def correlation(x_samples, y_samples):
    """Compute sample correlation"""
    cov = covariance(x_samples, y_samples)
    std_x = covariance(x_samples, x_samples) ** 0.5
    std_y = covariance(y_samples, y_samples) ** 0.5
    return cov / (std_x * std_y)
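A usage sketch with toy data: a perfect linear relationship yields correlation 1, and flipping the sign yields -1. The compact cov/corr helpers below mirror the sample-statistics functions above so the snippet is self-contained:

```python
def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def corr(xs, ys):
    return cov(xs, ys) / (cov(xs, xs) ** 0.5 * cov(ys, ys) ** 0.5)

xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]           # perfect positive linear relationship
print(corr(xs, ys))                    # ≈ 1.0
print(corr(xs, [-y for y in ys]))      # ≈ -1.0
```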
Exercises
Basic
1. Let X ~ Binomial(10, 0.3). Find P(X = 3), E[X], and Var(X).
2. A die is rolled; let X be the result. Find E[X²] and Var(X).
3. The time between earthquakes is Exponential with rate λ = 0.1 per day. Find the expected time until the next earthquake and P(wait > 20 days).
Intermediate
1. Show algebraically, from the definition, that Var(X) = E[X²] - (E[X])².
2. Let X ~ Geometric(p). Prove that E[X] = 1/p using the definition of expected value.
3. Simulate the coupon collector problem: how many random draws (with replacement) from n distinct items are needed until you have seen all n? What is the expected number?
Advanced
1. Central Limit Theorem demonstration: sample from any distribution and compute the means of samples of size n. Show that the distribution of the means approaches Normal as n increases.
2. Implement rejection sampling to generate samples from a custom distribution.
3. Analyze the expected depth of a randomly built binary search tree.
Summary
- Random variables assign numbers to random outcomes
- PMF (discrete) and PDF (continuous) describe probability distributions
- Expected value E[X] is the probability-weighted average
- Variance Var(X) measures spread around the mean
- Common distributions: Bernoulli, Binomial, Geometric, Poisson (discrete); Uniform, Exponential, Normal (continuous)
- These concepts are essential for analyzing randomized algorithms, queueing systems, and simulations