Statistics Basics

Learn to understand and interpret data, calculate meaningful averages, grasp probability, and make data-driven decisions in business and life.

Measures of Central Tendency

These tell you what's "typical" or "average" in a dataset.

Mean (Average)

Definition: Sum of all values divided by count

Formula: Mean = Sum of all values ÷ Number of values

Example: Test scores: 85, 90, 78, 92, 85

  • Sum: 85 + 90 + 78 + 92 + 85 = 430
  • Count: 5
  • Mean: 430 ÷ 5 = 86

Life Application: Average monthly expense

  • Expenses: $2,400, $2,100, $2,600, $2,200, $2,300, $2,500
  • Mean: $14,100 ÷ 6 = $2,350/month
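Here is the same calculation as a minimal Python sketch. The helper name `mean` is our own; the standard library's `statistics.mean` would do the same job.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)

test_scores = [85, 90, 78, 92, 85]
monthly_expenses = [2400, 2100, 2600, 2200, 2300, 2500]

print(mean(test_scores))       # 86.0
print(mean(monthly_expenses))  # 2350.0
```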

Business Application: Average revenue per customer

  • Total revenue: $125,000
  • Customers: 250
  • Mean: $125,000 ÷ 250 = $500/customer

Limitation: Sensitive to extreme values (outliers)

Example: Salaries: $45k, $48k, $50k, $52k, $300k

  • Mean: $99k (not representative of typical salary)

Median (Middle Value)

Definition: The middle value when data is ordered

Finding Median:

  1. Order values from least to greatest
  2. Odd count: Pick middle value
  3. Even count: Average the two middle values

Example (odd count): 15, 22, 18, 30, 25

  • Ordered: 15, 18, 22, 25, 30
  • Median: 22

Example (even count): 10, 15, 20, 25, 30, 35

  • Ordered: 10, 15, 20, 25, 30, 35
  • Median: (20 + 25) ÷ 2 = 22.5
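The three steps for finding the median translate directly into code. This is a sketch with a helper name of our own choosing; `statistics.median` in Python's standard library gives the same results.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values for an even count."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)          # Step 1: order from least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # Step 2: odd count -> pick the middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # Step 3: even count -> average the two middle values

print(median([15, 22, 18, 30, 25]))      # 22
print(median([10, 15, 20, 25, 30, 35]))  # 22.5
```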

Advantage: Largely unaffected by outliers

Example: Same salaries as before: $45k, $48k, $50k, $52k, $300k

  • Median: $50k (much more representative)

Real Estate Application: Home prices in neighborhood

  • Prices: $280k, $295k, $310k, $305k, $1.2M (outlier)
  • Median: $305k (better represents typical home)
  • Mean: $478k (skewed by mansion)
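Python's standard `statistics` module provides `mean` and `median` directly. This short sketch uses them to reproduce the outlier comparison for the salary and home-price examples. We assume prices are entered in thousands, so $1.2M becomes 1200.

```python
from statistics import mean, median

home_prices = [280, 295, 310, 305, 1200]  # $k; the last value is the mansion outlier
salaries = [45, 48, 50, 52, 300]          # $k; the last value is the outlier

for label, data in [("Home prices", home_prices), ("Salaries", salaries)]:
    print(f"{label}: mean = {mean(data)}k, median = {median(data)}k")
```

It reports a mean of 478 against a median of 305 for the homes, and 99 against 50 for the salaries. In each case the single outlier drags the mean far above the typical value.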

Mode (Most Frequent)

Definition: Value that appears most often

Example: Shoe sizes sold: 7, 8, 8, 8, 9, 9, 10, 11

  • Mode: 8 (appears 3 times)

Business Application: Most popular product size

  • Helps with inventory decisions

Multiple Modes: Dataset can have more than one mode

  • Example: 5, 5, 5, 8, 10, 10, 10, 12
  • Modes: 5 and 10 (bimodal)

No Mode: Every value appears the same number of times

  • Example: 2, 4, 6, 8, 10 (no mode)
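The sketch below finds the mode, handling the multiple-mode and no-mode cases above. It uses `collections.Counter`, and the helper name `modes` is hypothetical. The standard library's `statistics.multimode` returns every value when all values tie, so the chapter's "no mode" convention needs the extra check shown here.

```python
from collections import Counter

def modes(values):
    """Every most-frequent value (sorted), or [] when all values appear equally often."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # every value ties -> no mode, per this chapter's convention
    return sorted(v for v, c in counts.items() if c == top)

print(modes([7, 8, 8, 8, 9, 9, 10, 11]))    # [8]
print(modes([5, 5, 5, 8, 10, 10, 10, 12]))  # [5, 10]
print(modes([2, 4, 6, 8, 10]))              # []
```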

When to Use Which?

Situation | Best Measure | Why
No outliers, symmetric data | Mean | Most precise
Outliers present | Median | Not skewed by extremes
Categorical data (sizes, colors) | Mode | Shows most common
Income/salary data | Median | Outliers common
Test scores (no outliers) | Mean | Represents true average
Product sizes to stock | Mode | Shows what sells most

Measures of Spread

These show how spread out or varied the data is.

Range

Definition: Difference between highest and lowest values

Formula: Range = Maximum − Minimum

Example: Test scores: 65, 78, 82, 91, 95

  • Range: 95 − 65 = 30

Business Application: Salary ranges

  • Entry: $45k, Senior: $95k
  • Range: $50k

Limitation: Only uses two values, ignores everything in between

Quartiles and Interquartile Range (IQR)

Quartiles divide ordered data into four equal parts:

  • Q1: 25th percentile (1/4 of data below)
  • Q2: 50th percentile (median)
  • Q3: 75th percentile (3/4 of data below)

IQR (Interquartile Range): Q3 − Q1

  • Shows spread of middle 50% of data
  • Less affected by outliers than range

Example: Data: 10, 12, 15, 18, 20, 23, 25, 28, 30

  • Q1 (median of the lower half 10, 12, 15, 18): (12 + 15) ÷ 2 = 13.5
  • Q2 (median): 20
  • Q3 (median of the upper half 23, 25, 28, 30): (25 + 28) ÷ 2 = 26.5
  • IQR: 26.5 − 13.5 = 13
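Below is a minimal sketch of range and IQR. It assumes the median-of-halves method used in the example: for an odd count, the overall median is left out of both halves. Other conventions, such as spreadsheet QUARTILE functions or `statistics.quantiles(..., method="inclusive")`, can give slightly different quartiles for the same data. The helper names are our own.

```python
def median_of_sorted(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(values):
    """(Q1, Q2, Q3) by the median-of-halves method; an odd count excludes the median from both halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]         # values below the median
    upper = ordered[(n + 1) // 2 :]   # values above the median
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)

data = [10, 12, 15, 18, 20, 23, 25, 28, 30]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, q3 - q1)    # 13.5 20 26.5 13.0
print(max(data) - min(data))  # range: 20
```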

Standard Deviation (Conceptual)

Definition: Roughly, the typical distance of values from the mean (precisely, the square root of the average squared distance from the mean)

Low standard deviation: Values clustered close to mean
High standard deviation: Values spread out widely

Example: Two classes, both with mean score of 80

Class A: 78, 79, 80, 81, 82 (low spread)
Class B: 60, 70, 80, 90, 100 (high spread)

Class A has lower standard deviation (more consistent).
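To put numbers on the two classes, here is a sketch using `statistics.pstdev`. We take Class B as 60, 70, 80, 90, 100, which gives a mean of exactly 80. `pstdev` treats the scores as the whole population. `statistics.stdev` divides by n − 1 instead and would give slightly larger values.

```python
from statistics import mean, pstdev

class_a = [78, 79, 80, 81, 82]
class_b = [60, 70, 80, 90, 100]  # assumption: values chosen so both means are 80

for name, scores in [("Class A", class_a), ("Class B", class_b)]:
    print(f"{name}: mean = {mean(scores)}, std dev = {pstdev(scores):.2f}")
```

The same mean hides very different spreads: about 1.41 for Class A versus about 14.14 for Class B.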

Business Application:

  • Consistent sales: Low standard deviation (predictable)
  • Volatile sales: High standard deviation (unpredictable)

Probability Basics

Probability: Likelihood of an event occurring, from 0 (impossible) to 1 (certain)

Formula: Probability = (Favorable outcomes) ÷ (Total possible outcomes)

Often expressed as:

  • Decimal: 0.25
  • Fraction: 1/4
  • Percentage: 25%

Simple Probability

Example: Rolling a die, probability of getting 4

  • Favorable: 1 (only one 4)
  • Total: 6 (six sides)
  • Probability: 1/6 ≈ 0.167 or 16.7%

Example: Drawing ace from standard deck

  • Favorable: 4 (four aces)
  • Total: 52 cards
  • Probability: 4/52 = 1/13 ≈ 7.7%
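Python's `fractions.Fraction` keeps probabilities exact and reduces them automatically, for example 4/52 to 1/13. The sketch assumes, as the formula does, that all outcomes are equally likely. The helper name `probability` is ours.

```python
from fractions import Fraction

def probability(favorable, total):
    """Favorable outcomes divided by total equally likely outcomes, as an exact fraction."""
    return Fraction(favorable, total)

p_four = probability(1, 6)   # rolling a 4 on a six-sided die
p_ace = probability(4, 52)   # drawing an ace from a 52-card deck

print(p_four, f"{float(p_four):.1%}")  # 1/6 16.7%
print(p_ace, f"{float(p_ace):.1%}")    # 1/13 7.7%
```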

Complementary Probability

Concept: Probability of event NOT happening

Formula: P(not A) = 1 − P(A)

Example: Probability it rains tomorrow is 30%

  • Probability it doesn't rain: 1 − 0.30 = 0.70 or 70%

Independent Events (AND)

Definition: Events that don't affect each other

Rule: Multiply probabilities

Example: Flip coin twice, probability of two heads

  • First flip: P(H) = 1/2
  • Second flip: P(H) = 1/2
  • Both heads: 1/2 × 1/2 = 1/4 = 25%

Business Example: Three independent sales calls

  • Each has 20% chance of success
  • Probability all three succeed: 0.20 × 0.20 × 0.20 = 0.008 = 0.8%
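The multiplication rule for independent events can be written in one line with `math.prod` (Python 3.8+). Starting from `Fraction(1)` keeps the result exact. The helper name `p_all` is hypothetical.

```python
from fractions import Fraction
from math import prod

def p_all(probabilities):
    """P(all events occur) for independent events: multiply the individual probabilities."""
    return prod(probabilities, start=Fraction(1))

two_heads = p_all([Fraction(1, 2), Fraction(1, 2)])
three_sales = p_all([Fraction(1, 5)] * 3)  # 20% = 1/5 chance per call

print(two_heads, float(two_heads))      # 1/4 0.25
print(three_sales, float(three_sales))  # 1/125 0.008
```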

Dependent Events

Definition: The outcome of the first event changes the probability of the second

Example: Draw two cards without replacement

  • First card is ace: 4/52
  • Second card is also ace: 3/51 (only 3 aces left in 51 cards)
  • Both aces: (4/52) × (3/51) ≈ 0.0045 or 0.45%
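As a sanity check on the dependent-events calculation, this sketch compares the formula with a brute-force count over every ordered pair of two different cards. We assume the cards are labeled 0 to 51, with 0 to 3 standing for the aces. `itertools.permutations(deck, 2)` never repeats a card, which models drawing without replacement.

```python
from fractions import Fraction
from itertools import permutations

# Formula: P(first ace) x P(second ace, given the first was an ace)
formula = Fraction(4, 52) * Fraction(3, 51)

# Brute force: every ordered draw of two different cards from 52
deck = range(52)
draws = list(permutations(deck, 2))
both_aces = sum(1 for a, b in draws if a < 4 and b < 4)
enumerated = Fraction(both_aces, len(draws))

print(formula, enumerated, f"{float(formula):.2%}")  # 1/221 1/221 0.45%
```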

Mutually Exclusive Events (OR)

Definition: Events that can't happen simultaneously

Rule: Add probabilities

Example: Roll die, probability of 3 OR 5

  • P(3) = 1/6
  • P(5) = 1/6
  • P(3 or 5) = 1/6 + 1/6 = 2/6 = 1/3 ≈ 33.3%
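Because a single roll cannot be both a 3 and a 5, adding the probabilities gives the same answer as counting the qualifying faces directly. This short sketch shows both approaches.

```python
from fractions import Fraction

die = range(1, 7)  # six equally likely faces

# Mutually exclusive events: probabilities add
p_added = Fraction(1, 6) + Fraction(1, 6)

# Same answer by counting the faces that satisfy "3 or 5"
p_counted = Fraction(sum(1 for face in die if face in (3, 5)), len(die))

print(p_added, p_counted)  # 1/3 1/3
```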

Data Interpretation

Reading Tables

Sales Data Example:

Month | Units Sold | Revenue
Jan | 150 | $15,000
Feb | 180 | $18,000
Mar | 165 | $16,500

Questions to Ask:

  • What's the average? Mean units: (150+180+165)/3 = 165
  • What's the trend? Increasing then slight decrease
  • What's the best month? February
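The three questions can also be answered in a few lines of Python. This sketch hard-codes the three table rows; the `sales` dictionary layout is our own choice.

```python
sales = {
    "Jan": {"units": 150, "revenue": 15_000},
    "Feb": {"units": 180, "revenue": 18_000},
    "Mar": {"units": 165, "revenue": 16_500},
}

units = [row["units"] for row in sales.values()]
mean_units = sum(units) / len(units)                     # average units per month
best_month = max(sales, key=lambda m: sales[m]["units"]) # month with the most units

months = list(sales)  # dicts keep insertion order: Jan, Feb, Mar
changes = [sales[b]["units"] - sales[a]["units"] for a, b in zip(months, months[1:])]

print(mean_units)  # 165.0
print(best_month)  # Feb
print(changes)     # [30, -15]
```

The month-to-month changes (+30, then −15) show the trend: up in February, then a slight dip in March.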

Understanding Percentages in Data

Survey Result: "60% of 500 respondents prefer option A"

  • Number who prefer A: 500 × 0.60 = 300 people
  • Number who don't: 500 − 300 = 200 people

Correlation vs Causation

Correlation: Two things change together
Causation: One thing causes the other

Example: Ice cream sales and drowning incidents both increase in summer

  • Correlated: Yes (both increase together)
  • Causal: No (ice cream doesn't cause drowning)
  • Real cause: Hot weather affects both

Business Trap: Don't assume correlation means causation

  • Website traffic and sales correlated? Traffic may drive sales, but test it before assuming so
  • Marketing spend and revenue correlated? Causation is plausible, but verify it

Common Statistical Pitfalls

Small Sample Size

Problem: 3 people surveyed, 2 prefer product A = "67% prefer A"

Reality: Sample too small for reliable conclusion

Better: Survey a much larger sample (at least 100 people) for reasonable confidence

Cherry-Picking Data

Problem: "Sales increased 50% in December!"

Reality: The comparison is against the weakest month (November) rather than a fair baseline

Better: Compare to same month last year or use annual average

Misleading Graphs

Problem: Y-axis doesn't start at zero, making changes look dramatic

Example: Stock price graph

  • Starts at $98 instead of $0
  • Change from $98 to $102 looks huge
  • Actually only about a 4% increase ($4 ÷ $98 ≈ 4.1%)

Better: Use appropriate scale and clearly label axes

Survivorship Bias

Problem: Only looking at successful examples

Example: "Successful entrepreneurs dropped out of college"

Reality: Only counting successes, ignoring thousands who dropped out and failed

Better: Look at complete dataset, including failures

Practical Business Statistics

Conversion Rate

Formula: Conversion Rate = (Conversions ÷ Visitors) × 100

Example:

  • Website visitors: 5,000
  • Purchases: 150
  • Conversion rate: (150 ÷ 5,000) × 100 = 3%
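Here is the conversion-rate formula as a small function. The name `conversion_rate` is hypothetical. Multiplying by 100 before dividing is algebraically the same as (Conversions ÷ Visitors) × 100.

```python
def conversion_rate(conversions, visitors):
    """Conversions as a percentage of visitors."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return conversions * 100 / visitors

print(conversion_rate(150, 5_000))  # 3.0
```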

Customer Retention Rate

Formula: Retention = [(End − New) ÷ Start] × 100

Example:

  • Start of year: 500 customers
  • End of year: 480 customers
  • New customers: 100
  • Retention rate: [(480 − 100) ÷ 500] × 100 = (380 ÷ 500) × 100 = 76%
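A minimal sketch of the retention formula follows; the function and parameter names are our own. Subtracting the new customers from the end count leaves only those who were already customers at the start.

```python
def retention_rate(start, end, new):
    """Percent of start-of-period customers still active at the end: [(end - new) / start] x 100."""
    if start <= 0:
        raise ValueError("start must be positive")
    retained = end - new  # end-of-period customers who were already customers at the start
    return retained * 100 / start

print(retention_rate(start=500, end=480, new=100))  # 76.0
```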

Net Promoter Score (NPS)

Scale: 0-10, "How likely to recommend?"

  • 9-10: Promoters
  • 7-8: Passives
  • 0-6: Detractors

Formula: NPS = % Promoters − % Detractors

Example:

  • 100 responses: 50 promoters, 20 passives, 30 detractors
  • NPS: 50% − 30% = 20 (score can be −100 to +100)
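This sketch computes NPS from raw 0–10 survey answers using the promoter and detractor bands above. The response list is hypothetical, built only to reproduce the example's counts of 50 promoters, 20 passives and 30 detractors.

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6); passives (7-8) only add to the total."""
    if not scores:
        raise ValueError("need at least one response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return (promoters - detractors) * 100 / len(scores)

# Hypothetical raw answers matching the example counts
responses = [10] * 50 + [7] * 20 + [3] * 30
print(nps(responses))  # 20.0
```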

Practice Problems

Central Tendency

  1. Find mean, median, mode: 12, 15, 18, 15, 21, 24, 15
  2. Which measure best represents typical salary: $40k, $42k, $45k, $48k, $50k, $200k?

Spread

  3. Find range and IQR for: 10, 15, 18, 22, 25, 30, 35

Probability

  4. Flip a coin three times. What is the probability of exactly two heads?
  5. Draw one card from a standard deck. What is the probability it's red OR a king?

Data Interpretation

  6. A website had 2,000 visitors, and 60 made purchases. What's the conversion rate?
  7. Class A: mean score 75, high standard deviation. Class B: mean score 75, low standard deviation. Which is more consistent?

Solutions

  1. Mean: 17.14, Median: 15, Mode: 15
  2. Median ($46.5k, the average of $45k and $48k); the mean ($70.8k) is skewed by the $200k outlier
  3. Range: 25 (35 − 10), IQR: 15 (Q3 = 30, median of 25, 30, 35; Q1 = 15, median of 10, 15, 18)
  4. 3/8 or 37.5% (HHT, HTH, THH: 3 out of 8 outcomes)
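  5. 28/52 = 7/13 ≈ 53.8% (26 red cards plus the 2 black kings; the events overlap at the 2 red kings, so simply adding 26/52 + 4/52 would count those twice)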
  6. 3% (60÷2000 × 100)
  7. Class B (lower standard deviation means more consistent scores)
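Solutions 4 and 5 can be double-checked by listing every equally likely outcome and counting. In this sketch, the deck is modeled as (rank, suit) pairs built with `itertools.product`.

```python
from fractions import Fraction
from itertools import product

# Solution 4: all 2**3 = 8 sequences of three coin flips
flips = list(product("HT", repeat=3))
exactly_two_heads = Fraction(sum(1 for f in flips if f.count("H") == 2), len(flips))

# Solution 5: a 52-card deck as (rank, suit) pairs
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))
red = {"hearts", "diamonds"}
red_or_king = Fraction(sum(1 for r, s in deck if s in red or r == "K"), len(deck))

print(exactly_two_heads, red_or_king)  # 3/8 7/13
```

Counting each card once handles the overlap automatically: a red king satisfies both conditions but is counted only once.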

Key Takeaways

Mean vs Median: use median when outliers are present
Probability basics: multiply for AND (independent events), add for OR (mutually exclusive events)
Sample size matters: larger samples are more reliable
Correlation ≠ causation: don't confuse the two
Context is critical: numbers without context mislead
Visual representation: graphs can help or deceive

Real-World Applications

  • Business Decisions: Analyze sales data, customer metrics
  • Investing: Understand market statistics and risk
  • Health: Interpret medical statistics and probabilities
  • News: Critically evaluate surveys and studies
  • Marketing: A/B testing, conversion optimization
  • Personal: Understand warranties, odds, risk assessments

Next Steps

Move to Chapter 06: Financial Mathematics to learn about interest calculations, loans, investments, and the time value of money. These skills are essential for wealth building and financial planning.