Statistics Basics
Learn to understand and interpret data, calculate meaningful averages, grasp probability, and make data-driven decisions in business and life.
Measures of Central Tendency
These tell you what's "typical" or "average" in a dataset.
Mean (Average)
Definition: Sum of all values divided by count
Formula: Mean = Sum of all values ÷ Number of values
Example: Test scores: 85, 90, 78, 92, 85
- Sum: 85 + 90 + 78 + 92 + 85 = 430
- Count: 5
- Mean: 430 ÷ 5 = 86
Life Application: Average monthly expense
- Expenses: $2,400, $2,100, $2,600, $2,200, $2,300, $2,500
- Mean: $14,100 ÷ 6 = $2,350/month
Business Application: Average revenue per customer
- Total revenue: $125,000
- Customers: 250
- Mean: $125,000 ÷ 250 = $500/customer
Limitation: Sensitive to extreme values (outliers)
Example: Salaries: $45k, $48k, $50k, $52k, $300k
- Mean: $99k (not representative of typical salary)
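The mean calculations above can be checked with a minimal Python sketch using the standard library's `statistics.mean`. The variable names are illustrative, not part of the original text.

```python
from statistics import mean

scores = [85, 90, 78, 92, 85]
salaries_k = [45, 48, 50, 52, 300]  # salaries in thousands of dollars

mean_score = mean(scores)        # 430 / 5 = 86
mean_salary = mean(salaries_k)   # 495 / 5 = 99

# Dropping the single $300k outlier shows how strongly it pulls the mean up.
mean_without_outlier = mean(salaries_k[:-1])  # 195 / 4 = 48.75

print(mean_score, mean_salary, mean_without_outlier)
```

Removing one value moves the salary mean from $99k to under $49k, which is the sensitivity to outliers described above.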
Median (Middle Value)
Definition: The middle value when data is ordered
Finding Median:
- Order values from least to greatest
- Odd count: Pick middle value
- Even count: Average the two middle values
Example (odd count): 15, 22, 18, 30, 25
- Ordered: 15, 18, 22, 25, 30
- Median: 22
Example (even count): 10, 15, 20, 25, 30, 35
- Ordered: 10, 15, 20, 25, 30, 35
- Median: (20 + 25) ÷ 2 = 22.5
Advantage: Far less affected by outliers than the mean
Example: Same salaries as before: $45k, $48k, $50k, $52k, $300k
- Median: $50k (much more representative)
Real Estate Application: Home prices in neighborhood
- Prices: $280k, $295k, $310k, $305k, $1.2M (outlier)
- Median: $305k (better represents typical home)
- Mean: $478k (skewed by mansion)
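As a quick check of the median examples, here is a small sketch using `statistics.median`, which sorts the data and averages the two middle values when the count is even. The variable names are illustrative.

```python
from statistics import mean, median

odd_count = [15, 22, 18, 30, 25]
even_count = [10, 15, 20, 25, 30, 35]
home_prices_k = [280, 295, 310, 305, 1200]  # prices in thousands of dollars

median_odd = median(odd_count)       # middle of 15, 18, 22, 25, 30
median_even = median(even_count)     # (20 + 25) / 2
median_home = median(home_prices_k)  # middle of the ordered prices
mean_home = mean(home_prices_k)      # 2390 / 5, pulled up by the $1.2M home

print(median_odd, median_even, median_home, mean_home)
```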
Mode (Most Frequent)
Definition: Value that appears most often
Example: Shoe sizes sold: 7, 8, 8, 8, 9, 9, 10, 11
- Mode: 8 (appears 3 times)
Business Application: Most popular product size
- Helps with inventory decisions
Multiple Modes: Dataset can have more than one mode
- Example: 5, 5, 5, 8, 10, 10, 10, 12
- Modes: 5 and 10 (bimodal)
No Mode: All values appear equally often
- Example: 2, 4, 6, 8, 10 (no mode)
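The three cases above (one mode, several modes, no mode) can be sketched with a small helper built on `collections.Counter`. The function `modes` is a hypothetical name for this example, and returning an empty list for "no mode" is a convention chosen here to match the text.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if all values are equally frequent."""
    counts = Counter(values)
    top = max(counts.values())
    # If there are several distinct values and they all tie, treat it as "no mode".
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

shoe_sizes = modes([7, 8, 8, 8, 9, 9, 10, 11])   # single mode
bimodal = modes([5, 5, 5, 8, 10, 10, 10, 12])    # two modes
no_mode = modes([2, 4, 6, 8, 10])                # every value appears once

print(shoe_sizes, bimodal, no_mode)
```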
When to Use Which?
| Situation | Best Measure | Why |
|---|---|---|
| No outliers, symmetric data | Mean | Uses every value |
| Outliers present | Median | Not skewed by extremes |
| Categorical data (sizes, colors) | Mode | Shows most common |
| Income/salary data | Median | Outliers common |
| Test scores (no outliers) | Mean | Represents true average |
| Product sizes to stock | Mode | Shows what sells most |
Measures of Spread
These show how spread out or varied the data is.
Range
Definition: Difference between highest and lowest values
Formula: Range = Maximum − Minimum
Example: Test scores: 65, 78, 82, 91, 95
- Range: 95 − 65 = 30
Business Application: Salary ranges
- Entry: $45k, Senior: $95k
- Range: $50k
Limitation: Only uses two values, ignores everything in between
Quartiles and Interquartile Range (IQR)
Quartiles divide ordered data into four equal parts:
- Q1: 25th percentile (1/4 of data below)
- Q2: 50th percentile (median)
- Q3: 75th percentile (3/4 of data below)
IQR (Interquartile Range): Q3 − Q1
- Shows spread of middle 50% of data
- Less affected by outliers than range
Example: Data: 10, 12, 15, 18, 20, 23, 25, 28, 30
- Q1 (between 12 and 15): 13.5
- Q2 (median): 20
- Q3 (between 25 and 28): 26.5
- IQR: 26.5 − 13.5 = 13
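The quartile example can be reproduced with a short sketch that splits the ordered data around the median and takes the median of each half, which is the method the example appears to use. The helper name `quartiles` is an assumption for illustration. Software packages use several slightly different quartile conventions, so other tools may report slightly different values for the same data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    With an odd count, the overall median is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # values below the median
    upper = data[-half:]   # values above the median
    return median(lower), median(data), median(upper)

data = [10, 12, 15, 18, 20, 23, 25, 28, 30]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```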
Standard Deviation (Conceptual)
Definition: Roughly the typical distance of values from the mean (technically, the square root of the average squared distance)
Low standard deviation: Values clustered close to mean
High standard deviation: Values spread out widely
Example: Two classes, both with mean score of 80
Class A: 78, 79, 80, 81, 82 (low spread)
Class B: 60, 70, 80, 90, 100 (high spread)
Class A has lower standard deviation (more consistent).
Business Application:
- Consistent sales: Low standard deviation (predictable)
- Volatile sales: High standard deviation (unpredictable)
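To make the sales comparison concrete, the sketch below computes the population standard deviation with `statistics.pstdev` for two hypothetical weekly sales series. The numbers are invented for illustration and do not come from the text. Both series have the same mean, so only the spread differs.

```python
from statistics import mean, pstdev

# Hypothetical weekly unit sales, both averaging 100 units.
steady_sales = [100, 102, 98, 101, 99]
volatile_sales = [60, 140, 80, 130, 90]

steady_sd = pstdev(steady_sales)      # about 1.4 units
volatile_sd = pstdev(volatile_sales)  # about 30.3 units

print(mean(steady_sales), round(steady_sd, 1))
print(mean(volatile_sales), round(volatile_sd, 1))
```

`pstdev` treats the data as the whole population. For a sample drawn from a larger population, `statistics.stdev` divides by n − 1 instead of n.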
Probability Basics
Probability: Likelihood of an event occurring, from 0 (impossible) to 1 (certain)
Formula: Probability = (Favorable outcomes) ÷ (Total possible outcomes)
Often expressed as:
- Decimal: 0.25
- Fraction: 1/4
- Percentage: 25%
Simple Probability
Example: Rolling a die, probability of getting 4
- Favorable: 1 (only one 4)
- Total: 6 (six sides)
- Probability: 1/6 ≈ 0.167 or 16.7%
Example: Drawing ace from standard deck
- Favorable: 4 (four aces)
- Total: 52 cards
- Probability: 4/52 = 1/13 ≈ 7.7%
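The formula can be sketched in Python with `fractions.Fraction`, which keeps exact fractions and reduces them automatically. The helper name `probability` is illustrative, and it assumes all outcomes are equally likely.

```python
from fractions import Fraction

def probability(favorable, total):
    """Favorable outcomes divided by total equally likely outcomes."""
    return Fraction(favorable, total)

p_four = probability(1, 6)   # rolling a 4 on a six-sided die
p_ace = probability(4, 52)   # drawing an ace; reduces to 1/13

print(p_four, round(float(p_four), 3))
print(p_ace, round(float(p_ace) * 100, 1), "%")
```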
Complementary Probability
Concept: Probability of event NOT happening
Formula: P(not A) = 1 − P(A)
Example: Probability it rains tomorrow is 30%
- Probability it doesn't rain: 1 − 0.30 = 0.70 or 70%
Independent Events (AND)
Definition: Events that don't affect each other
Rule: Multiply probabilities
Example: Flip coin twice, probability of two heads
- First flip: P(H) = 1/2
- Second flip: P(H) = 1/2
- Both heads: 1/2 × 1/2 = 1/4 = 25%
Business Example: Three independent sales calls
- Each has 20% chance of success
- Probability all three succeed: 0.20 × 0.20 × 0.20 = 0.008 = 0.8%
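The multiplication rule can be sketched with `math.prod` over exact fractions. The variable names are illustrative, and the result is only valid under the stated assumption that the events are independent.

```python
from fractions import Fraction
from math import prod

# Two coin flips, each heads with probability 1/2.
two_heads = prod([Fraction(1, 2)] * 2)

# Three independent sales calls, each succeeding with probability 20% = 1/5.
all_three_sales = prod([Fraction(1, 5)] * 3)

print(two_heads)                        # 1/4
print(float(all_three_sales) * 100)     # as a percentage
```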
Dependent Events
Definition: First event affects the second
Example: Draw two cards without replacement
- First card is ace: 4/52
- Second card is also ace: 3/51 (only 3 aces left in 51 cards)
- Both aces: (4/52) × (3/51) ≈ 0.0045 or 0.45%
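One way to check the two-card result is to count every ordered pair of distinct cards directly and compare it with the product of the two fractions. This is a brute-force sketch with illustrative names. The deck is simplified to "ace" versus "other" because only that distinction matters here.

```python
from fractions import Fraction
from itertools import permutations

deck = ["ace"] * 4 + ["other"] * 48

# All ordered ways to draw two different cards: 52 * 51 = 2652.
draws = list(permutations(range(len(deck)), 2))
both_aces = sum(1 for i, j in draws if deck[i] == "ace" and deck[j] == "ace")

p_enumerated = Fraction(both_aces, len(draws))
p_formula = Fraction(4, 52) * Fraction(3, 51)

print(p_enumerated, round(float(p_enumerated) * 100, 2), "%")
```

For comparison, drawing with replacement would make the events independent and give (4/52) × (4/52) = 1/169, slightly higher than 1/221.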
Mutually Exclusive Events (OR)
Definition: Events that can't happen simultaneously
Rule: Add probabilities
Example: Roll die, probability of 3 OR 5
- P(3) = 1/6
- P(5) = 1/6
- P(3 or 5) = 1/6 + 1/6 = 2/6 = 1/3 ≈ 33.3%
Data Interpretation
Reading Tables
Sales Data Example:
| Month | Units Sold | Revenue |
|---|---|---|
| Jan | 150 | $15,000 |
| Feb | 180 | $18,000 |
| Mar | 165 | $16,500 |
Questions to Ask:
- What's the average? Mean units: (150+180+165)/3 = 165
- What's the trend? Increasing then slight decrease
- What's the best month? February
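The three questions can be answered programmatically with a short sketch over the table's values. The dictionary layout and variable names are assumptions made for this example.

```python
from statistics import mean

# month -> (units sold, revenue in dollars), copied from the table
sales = {"Jan": (150, 15_000), "Feb": (180, 18_000), "Mar": (165, 16_500)}

units = [u for u, _ in sales.values()]
mean_units = mean(units)                                   # average units per month
changes = [b - a for a, b in zip(units, units[1:])]        # month-over-month change
best_month = max(sales, key=lambda m: sales[m][0])         # month with most units
revenue_per_unit = {m: rev / u for m, (u, rev) in sales.items()}

print(mean_units, changes, best_month, revenue_per_unit)
```

The revenue per unit is the same in every month, so the revenue changes in this table come entirely from volume rather than price.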
Understanding Percentages in Data
Survey Result: "60% of 500 respondents prefer option A"
- Number who prefer A: 500 × 0.60 = 300 people
- Number who don't: 500 − 300 = 200 people
Correlation vs Causation
Correlation: Two things change together
Causation: One thing causes the other
Example: Ice cream sales and drowning incidents both increase in summer
- Correlated: Yes (both increase together)
- Causal: No (ice cream doesn't cause drowning)
- Real cause: Hot weather affects both
Business Trap: Don't assume correlation means causation
- Website traffic and sales correlated? Maybe, but need to test
- Marketing spend and revenue correlated? Possible causation, but verify
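To illustrate how strongly two series can move together without one causing the other, the sketch below computes a Pearson correlation coefficient by hand. All figures are hypothetical, invented for this example. The function `pearson_r` is also illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient: -1 to +1, 0 meaning no linear relationship."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly figures from spring into summer.
ice_cream_sales = [200, 260, 340, 400]
drowning_incidents = [2, 3, 5, 6]

r = pearson_r(ice_cream_sales, drowning_incidents)
print(round(r, 3))
```

A coefficient this close to +1 only says the two series rise together. It says nothing about why, and here a third factor (hot weather) drives both.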
Common Statistical Pitfalls
Small Sample Size
Problem: 3 people surveyed, 2 prefer product A = "67% prefer A"
Reality: Sample too small for reliable conclusion
Better: Survey at least 100 people for reasonable confidence
Cherry-Picking Data
Problem: "Sales increased 50% in December!"
Reality: Comparing to lowest month (November) instead of fair comparison
Better: Compare to same month last year or use annual average
Misleading Graphs
Problem: Y-axis doesn't start at zero, making changes look dramatic
Example: Stock price graph
- Starts at $98 instead of $0
- Change from $98 to $102 looks huge
- Actually only about a 4% increase
Better: Use appropriate scale and clearly label axes
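Computing the percentage change directly is a quick way to see past a truncated axis. A minimal sketch, with `percent_change` as an illustrative helper name:

```python
def percent_change(old, new):
    """Change from old to new, as a percentage of old."""
    return (new - old) * 100 / old

stock_change = percent_change(98, 102)
print(round(stock_change, 1))  # about 4.1
```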
Survivorship Bias
Problem: Only looking at successful examples
Example: "Successful entrepreneurs dropped out of college"
Reality: Only counting successes, ignoring thousands who dropped out and failed
Better: Look at complete dataset, including failures
Practical Business Statistics
Conversion Rate
Formula: Conversion Rate = (Conversions ÷ Visitors) × 100
Example:
- Website visitors: 5,000
- Purchases: 150
- Conversion rate: (150 ÷ 5,000) × 100 = 3%
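As a small sketch, the conversion rate formula can be wrapped in a function. The name `conversion_rate` and the zero-visitor guard are choices made for this example.

```python
def conversion_rate(conversions, visitors):
    """Conversions as a percentage of visitors."""
    if visitors == 0:
        raise ValueError("visitors must be greater than zero")
    return conversions * 100 / visitors

rate = conversion_rate(150, 5_000)
print(rate)  # 3.0
```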
Customer Retention Rate
Formula: Retention = [(End − New) ÷ Start] × 100
Example:
- Start of year: 500 customers
- End of year: 480 customers
- New customers: 100
- Retention rate: [(480 − 100) ÷ 500] × 100 = 76%
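The retention formula subtracts new customers from the end count so that only customers who were already there at the start are counted. A minimal sketch, with illustrative function and parameter names:

```python
def retention_rate(start, end, new):
    """Percentage of starting customers still present at the end of the period."""
    retained = end - new          # customers who stayed: 480 - 100 = 380
    return retained * 100 / start

retention = retention_rate(start=500, end=480, new=100)
churned = 500 - (480 - 100)       # starting customers who left

print(retention, churned)
```

In this example the customer count fell by only 20 overall, yet 120 of the original 500 customers left, which is why retention is tracked separately from total customer count.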
Net Promoter Score (NPS)
Scale: 0-10, "How likely to recommend?"
- 9-10: Promoters
- 7-8: Passives
- 0-6: Detractors
Formula: NPS = % Promoters − % Detractors
Example:
- 100 responses: 50 promoters, 20 passives, 30 detractors
- NPS: 50% − 30% = 20 (score can range from −100 to +100)
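In practice NPS is usually computed from raw 0-10 survey answers rather than pre-counted groups. The sketch below classifies each score using the bands above. The raw score list is hypothetical, constructed so that it matches the 50/20/30 split in the example.

```python
def nps(scores):
    """Net Promoter Score from raw 0-10 ratings: % promoters minus % detractors."""
    promoters = sum(1 for s in scores if s >= 9)    # 9-10
    detractors = sum(1 for s in scores if s <= 6)   # 0-6
    return (promoters - detractors) * 100 / len(scores)

# Hypothetical responses: 50 promoters, 20 passives, 30 detractors.
responses = [10] * 30 + [9] * 20 + [8] * 20 + [5] * 30

score = nps(responses)
print(score)  # 20.0
```

Passives count toward the total number of responses but appear in neither term of the subtraction.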
Practice Problems
Central Tendency
- Find mean, median, mode: 12, 15, 18, 15, 21, 24, 15
- Which measure best represents typical salary: $40k, $42k, $45k, $48k, $50k, $200k?
Spread
- Find range and IQR for: 10, 15, 18, 22, 25, 30, 35
Probability
- Flip coin three times. Probability of exactly two heads?
- Draw one card from deck. Probability it's red OR a king?
Data Interpretation
- Website had 2,000 visitors, 60 made purchases. What's the conversion rate?
- Class A: mean score 75, std dev high. Class B: mean score 75, std dev low. Which is more consistent?
Solutions
- Mean: 17.14, Median: 15, Mode: 15
- Median ($46.5k, the average of $45k and $48k); mean ($70.8k) is skewed by outlier
- Range: 25 (35−10), IQR: 15 (Q3 − Q1 = 30−15)
- 3/8 or 37.5% (HHT, HTH, THH: 3 out of 8 outcomes)
- 28/52 or 53.8% (26 red + 2 additional black kings)
- 3% (60÷2000 × 100)
- Class B (lower standard deviation means more consistent scores)
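The two probability answers can be verified by listing every outcome and counting, which is a useful habit when the OR events overlap. This is a brute-force sketch. The rank and suit labels are illustrative.

```python
from fractions import Fraction
from itertools import product

# Three coin flips: 2 * 2 * 2 = 8 equally likely outcomes.
flips = list(product("HT", repeat=3))
exactly_two_heads = [f for f in flips if f.count("H") == 2]
p_two_heads = Fraction(len(exactly_two_heads), len(flips))

# One card: red OR king. The two red kings must not be counted twice.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
red_suits = {"hearts", "diamonds"}
deck = list(product(ranks, suits))
red_or_king = [c for c in deck if c[1] in red_suits or c[0] == "K"]
p_red_or_king = Fraction(len(red_or_king), len(deck))

print(p_two_heads, p_red_or_king)
```

Simply adding P(red) + P(king) = 26/52 + 4/52 would give 30/52, because red and king are not mutually exclusive. Counting outcomes directly gives the correct 28/52.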
Key Takeaways
✓ Mean vs Median: use median when outliers are present
✓ Probability basics: multiply for AND (independent events), add for OR (mutually exclusive events)
✓ Sample size matters: larger samples are more reliable
✓ Correlation ≠ causation: don't confuse the two
✓ Context is critical: numbers without context mislead
✓ Visual representation: graphs can help or deceive
Real-World Applications
- Business Decisions: Analyze sales data, customer metrics
- Investing: Understand market statistics and risk
- Health: Interpret medical statistics and probabilities
- News: Critically evaluate surveys and studies
- Marketing: A/B testing, conversion optimization
- Personal: Understand warranties, odds, risk assessments
Next Steps
Move to Chapter 06: Financial Mathematics to learn about interest calculations, loans, investments, and the time value of money. These skills are essential for wealth building and financial planning.