Introduction: Vocabulary and Trade-offs
This chapter establishes the vocabulary for system design and the trade-offs every decision in the rest of the tutorial boils down to.
What System Design Is
System design is the art of arranging components so a whole thing works under load, under failure, and over time.
You don't usually sit down to "design a system" from scratch. You inherit one, extend one, or build a new service inside one. Every decision (pick this database, add a cache here, split this service, use a queue) is a trade-off against a few axes that rarely move together.
There's no right answer to most design questions. There are answers that fit the constraints and answers that don't. The goal is to choose knowingly.
The Vocabulary
Six terms that every design conversation uses. Learn them.
Latency
Time to complete one request. Usually measured in milliseconds. Often quoted at percentiles: "p99 latency of 100ms" means 99% of requests finish in 100ms or less.
- p50: the median. Half of requests are faster.
- p95: the 95th percentile. A useful operating target.
- p99: the 99th percentile. Tail latency.
- p999: the 99.9th percentile. Where the slow stuff hides.
The mean is almost useless. One slow request pulls the mean up; percentiles describe the distribution. Care about p95, p99, and when you have budget, p999.
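The gap between the mean and the percentiles is easy to demonstrate. A minimal sketch with synthetic latencies and a nearest-rank percentile (illustrative numbers, not a production metrics library):

```python
import random

# Synthetic workload: 99% fast requests, 1% very slow ones.
random.seed(0)
latencies_ms = (
    [random.uniform(5, 20) for _ in range(990)]
    + [random.uniform(500, 2000) for _ in range(10)]
)

def percentile(samples, p):
    """Nearest-rank percentile: the value at or below which p% of samples fall."""
    ranked = sorted(samples)
    k = max(0, min(len(ranked) - 1, round(p / 100 * len(ranked)) - 1))
    return ranked[k]

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean: {mean:6.1f} ms")  # dragged up by the 1% of slow requests
print(f"p50:  {percentile(latencies_ms, 50):6.1f} ms")
print(f"p99:  {percentile(latencies_ms, 99):6.1f} ms")
```

The mean lands roughly double the median here: one slow request per hundred is enough to make the average describe almost nobody's experience.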
Throughput
Requests (or bytes, or messages) per second. A service can have great p50 latency and terrible throughput if each request blocks a shared resource.
Latency and throughput aren't the same question. A call that takes 1 second can still do 1000/s if you run it on 1000 threads.
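That arithmetic is Little's law rearranged: sustained throughput ≈ concurrency ÷ per-request latency. A tiny sketch of the calculation (function name is illustrative):

```python
def max_throughput(concurrency, latency_seconds):
    """Little's law rearranged: requests/sec = in-flight requests / latency."""
    return concurrency / latency_seconds

# A 1-second call on a single thread: 1 request/sec.
print(max_throughput(1, 1.0))     # 1.0
# The same 1-second call across 1000 concurrent threads: 1000 requests/sec.
print(max_throughput(1000, 1.0))  # 1000.0
```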
Availability
The fraction of time the system answers correctly. Usually expressed in nines:
- 99%: 3.65 days of downtime per year (basic)
- 99.9%: 8.76 hours per year (good)
- 99.99%: 52 minutes per year (serious)
- 99.999%: 5 minutes per year (very serious)
Every extra nine costs roughly 10x more engineering. Pick the minimum your product requires.
Durability
The probability that committed data is not lost. S3 advertises 11 nines of durability. A single PostgreSQL primary with synchronous replication and regular backups is probably 4 or 5 nines.
Durability is not the same as availability. A disk can be unavailable for an hour but still durable (your data comes back); it can be "available" but corrupted (the service returns, but the data is wrong).
Consistency
How fresh reads are, and how they agree with each other. There's a whole vocabulary here (Chapter 7); for now know that "strong consistency" means every reader sees the latest write immediately, and "eventual consistency" means readers may see stale values for a while.
Fault Tolerance
The system's ability to keep working when things fail. Nodes crash. Disks fill. Networks partition. Fault tolerance is a property you build in with redundancy, retries, timeouts, and isolation (Chapter 8).
Trade-offs: Pick Two (At Most)
System design is mostly a trade-off exercise. A few that come up constantly:
Latency vs consistency. Stronger consistency usually means waiting for more nodes to agree. That's latency.
Latency vs durability. "Wait until the write is on two replicas before acknowledging" gives stronger durability. It also gives higher latency.
Consistency vs availability during a network partition. The CAP theorem in one sentence. You can't have both. Chapter 7 covers this properly.
Cost vs redundancy. Three-region active-active is more reliable. It also costs three times as much, minimum.
Simplicity vs flexibility. A microservice architecture is flexible. A monolith is simple. You don't get both.
A good design is explicit about which trade-offs it's making. A design that claims to be "highly scalable and strongly consistent and low-latency and cheap" is usually hand-waving past one of those.
Percentiles, Properly
Say each of your 10 microservices has a p99 latency of 100ms, and a request touches all of them. What's the end-to-end p99?
Not 100ms. Often closer to 1 second.
Tail latencies compound. If every service has a 1% chance of being slow, the chance that at least one of 10 is slow on a given request is about 10%, not 1% (1 − 0.99^10 ≈ 9.6%), and sequential slow hops stack. The p99 of the slowest component often dominates end-to-end latency.
Practical implications:
- Fewer hops is cheaper than optimizing each hop.
- Cut tail latency aggressively. Add timeouts, hedge requests, cache defensively.
- Measure end-to-end, not just per-service.
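The compounding itself is one line of probability. A sketch, assuming the hops land in their tails independently (real services are often correlated, so treat this as a lower bound on intuition, not a model):

```python
def p_any_slow(p_slow_one, hops):
    """Probability that at least one of `hops` independent calls is slow."""
    return 1 - (1 - p_slow_one) ** hops

# 10 services, each in its p99 tail 1% of the time:
print(f"{p_any_slow(0.01, 10):.1%}")  # 9.6% of requests hit at least one tail
```

So a chain of ten "p99 = 100ms" services delivers its advertised latency on barely 90% of requests; the other 10% are somewhere out in the tail.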
Back-of-Envelope Math
You're in a design review. Someone proposes storing every session in memory. Can we?
Assume 10M daily active users, 1KB per session. That's 10GB. One machine can hold it. Done.
Now assume 1B users, 10KB per session. That's 10TB. Not one machine.
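The same estimate in executable form (decimal units, which is close enough for a plausibility check; the function name is illustrative):

```python
def session_footprint_gb(users, bytes_per_session):
    """Total in-memory footprint if every user has a live session, in decimal GB."""
    return users * bytes_per_session / 1e9

print(session_footprint_gb(10_000_000, 1_000))      # 10.0 GB: one machine
print(session_footprint_gb(1_000_000_000, 10_000))  # 10000.0 GB = 10 TB: not one machine
```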
Back-of-envelope math answers "is this plausible?" in under a minute. Memorize these:
- 1M users × 1 KB = 1 GB (one machine, trivially)
- 100M users × 1 KB = 100 GB (one machine, carefully)
- 1B users × 1 KB = 1 TB (one machine, tight)
- 1B users × 1 KB × 365 days = 365 TB (cluster or cold storage)
- 1M requests/day ≈ 12 requests/sec (negligible)
- 1M requests/hour ≈ 280 requests/sec (one-box territory)
- 1K requests/sec sustained (a modest cluster)
- 1M requests/sec (big-tech scale)
- Main memory read: ~100 ns
- SSD random read: ~100 microseconds
- Datacenter round trip: ~500 microseconds
- Cross-region round trip: ~50 ms
- Global round trip: ~150 ms
Get comfortable multiplying these in your head. A system design conversation without numbers is a story, not a design.
The Six Questions
Before designing anything, answer these:
1. What's the read/write ratio?
2. What's the request rate at peak?
3. What's the data volume over a year?
4. What's the latency budget for the hot path?
5. What's the consistency requirement?
6. What happens when a component fails?
A read-heavy system wants caches and replicas. A write-heavy system wants partitioning. A 10 req/sec system wants a monolith. A 10M req/sec system doesn't. These six answers shape every subsequent decision.
What This Tutorial Isn't
This is a practical system design tutorial. The focus is patterns, trade-offs, and the machinery that actually runs in production.
It isn't:
- A distributed systems research course. The papers are linked; the formalism isn't.
- An interview cram guide. Chapter 11 does a worked example; the other chapters build the toolkit.
- A prescription. Every system is different; the patterns are starting points, not templates.
Next Steps
Continue to 02-building-blocks.md to meet the components every real system is made of.