System Design Tutorial
A practical tutorial on designing backend systems that survive scale and failure. Covers the building blocks (load balancers, caches, queues, databases), scaling patterns, consistency and consensus, reliability and observability, architecture patterns, and a worked design example.
Chapters
01 Introduction: Vocabulary and Trade-offs
02 Building Blocks: What Systems Are Made Of
03 Scaling: More Traffic Than One Box
04 Caching: The Oldest Trick
05 Databases: Choice, Replication, Sharding
06 Queues and Events: Async Work
07 Consistency and Consensus: The Theory That Bites
08 Reliability: When Things Fail
09 Observability: Logs, Metrics, Traces
10 Architecture Patterns: Shapes for the Whole System
11 Designing a System: A Worked Example
12 Best Practices: Habits and Anti-patterns
About this tutorial
A practical tour of designing backend systems that survive real scale and real failure, from load balancers to consensus to a full worked example.
Who This Is For
- Developers who've shipped services but never sat down with the distributed systems literature
- Engineers preparing for system design interviews who want the real engineering behind the whiteboard answers
- Anyone who has watched a production system melt and wants a mental toolkit for the next one
Contents
Fundamentals
- Introduction: Vocabulary (latency, throughput, availability), trade-offs, the shape of system design problems
- Building Blocks: Clients, DNS, CDN, load balancers, reverse proxies, API gateways
Core Concepts
- Scaling: Vertical vs horizontal, statelessness, sticky sessions, load balancing strategies
- Caching: Where to cache, invalidation, TTLs, cache stampedes, Redis patterns
- Databases: SQL vs NoSQL, replication, partitioning, sharding
- Queues and Events: Message queues, streams, pub/sub, event-driven patterns, backpressure
Advanced
- Consistency and Consensus: CAP, PACELC, quorum, Raft basics, eventual consistency
- Reliability: Failure modes, retries, timeouts, circuit breakers, idempotency, graceful degradation
- Observability: Logs, metrics, traces, SLOs, alerting that doesn't cry wolf
Ecosystem
- Architecture Patterns: Monolith, modular monolith, microservices, CQRS, event sourcing
Mastery
- Designing a System: Worked example (a URL shortener), back-of-envelope math, step-by-step
- Best Practices: Evaluation patterns, common traps, anti-patterns
How to Use This Tutorial
- Read sequentially for a complete learning path
- Sketch the diagrams. A whiteboard and a marker beat reading alone
- Tie each concept to a real system. Every pattern here runs in systems you use daily; name one when you meet a new term
Quick Reference
Back-of-Envelope Numbers
These numbers come up in nearly every system design conversation. They are order-of-magnitude approximations, not benchmarks. Learn them.
| Operation | Latency |
| --- | --- |
| L1 cache reference | 1 ns |
| L2 cache reference | 4 ns |
| Main memory reference | 100 ns |
| SSD random read | 100 microseconds |
| Round trip in same datacenter | 500 microseconds |
| Disk seek | 10 ms |
| Round trip intercontinental | 150 ms |

| Unit | Value |
| --- | --- |
| 1 byte | 1 B |
| 1 kilobyte | 10^3 B |
| 1 megabyte | 10^6 B |
| 1 gigabyte | 10^9 B |
| 1 terabyte | 10^12 B |
| 1 second | 10^9 nanoseconds |
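To show how these figures feed an estimate, here is a minimal Python sketch that adds up the latency of a request from the numbers above. The dictionary keys, the function names, and the example hot path (two same-datacenter round trips plus one SSD read) are illustrative assumptions, not part of the tutorial.

```python
# Latencies from the list above, expressed in nanoseconds.
NS = 1
US = 1_000 * NS
MS = 1_000 * US

LATENCY_NS = {
    "l1_cache": 1 * NS,
    "l2_cache": 4 * NS,
    "main_memory": 100 * NS,
    "ssd_random_read": 100 * US,
    "datacenter_round_trip": 500 * US,
    "disk_seek": 10 * MS,
    "intercontinental_round_trip": 150 * MS,
}


def sequential_ops_per_second(op: str) -> int:
    """How many times an operation fits in one second when run back to back."""
    return 1_000_000_000 // LATENCY_NS[op]


def request_latency_ms(ops: list[str]) -> float:
    """Total latency in milliseconds of operations performed one after another."""
    return sum(LATENCY_NS[op] for op in ops) / MS


# Hypothetical hot path: a cache lookup that misses (one round trip),
# then a database call (one round trip) that performs one SSD read.
hot_path = ["datacenter_round_trip", "datacenter_round_trip", "ssd_random_read"]

print(request_latency_ms(hot_path))            # 1.1
print(sequential_ops_per_second("disk_seek"))  # 100
```

The point of the exercise is the ratio, not the exact figure: the network hops dominate this path, so removing one round trip saves more than making the SSD read faster.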
Availability Cheat Sheet
| Availability | Downtime per year | Downtime per month |
| --- | --- | --- |
| 99% | 3.65 days | 7.2 hours |
| 99.9% | 8.76 hours | 43.2 minutes |
| 99.99% | 52.56 minutes | 4.32 minutes |
| 99.999% | 5.26 minutes | 26 seconds |
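The figures above follow from one formula: downtime = (1 - availability) × period. The sketch below reproduces them in Python, assuming a 365-day year and a 30-day month (the conventions that match the table); the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
YEAR = 365 * SECONDS_PER_DAY   # assumes a 365-day year
MONTH = 30 * SECONDS_PER_DAY   # assumes a 30-day month


def downtime_seconds(availability_percent: float, period_seconds: int) -> float:
    """Allowed downtime in seconds for a given availability over a period."""
    return (100.0 - availability_percent) / 100.0 * period_seconds


for pct in (99.0, 99.9, 99.99, 99.999):
    per_year_hours = downtime_seconds(pct, YEAR) / 3600
    per_month_minutes = downtime_seconds(pct, MONTH) / 60
    print(f"{pct}%: {per_year_hours:.2f} h/year, {per_month_minutes:.2f} min/month")
```

Each extra nine divides the downtime budget by ten, which is why the jump from 99.9% to 99.99% changes how a system is operated rather than just how it is configured.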
The Six Questions to Ask About Any Design
1. What's the read/write ratio?
2. What's the request rate at peak?
3. What's the data volume over a year?
4. What's the latency budget for the hot path?
5. What's the consistency requirement?
6. What happens when a component fails?
Answer these and most of the design draws itself.
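As a rough illustration of how answers to the first three questions turn into numbers, here is a back-of-envelope sketch in Python. Every input (write volume, read/write ratio, record size, peak factor) is a hypothetical value chosen only to show the arithmetic, and the function name is an assumption.

```python
# All inputs are hypothetical; substitute the answers for your own system.
NEW_RECORDS_PER_MONTH = 100_000_000  # assumed write volume
READS_PER_WRITE = 100                # assumed read/write ratio (question 1)
BYTES_PER_RECORD = 500               # assumed average stored size per record
PEAK_FACTOR = 2                      # assumed peak-to-average multiplier

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def estimate(new_per_month: int, reads_per_write: int,
             bytes_per_record: int, peak_factor: int) -> dict:
    """Derive average and peak request rates and one year of storage growth."""
    avg_write_qps = new_per_month / SECONDS_PER_MONTH
    avg_read_qps = avg_write_qps * reads_per_write
    return {
        "avg_write_qps": avg_write_qps,
        "avg_read_qps": avg_read_qps,
        "peak_read_qps": avg_read_qps * peak_factor,              # question 2
        "storage_bytes_per_year": new_per_month * 12 * bytes_per_record,  # question 3
    }


result = estimate(NEW_RECORDS_PER_MONTH, READS_PER_WRITE,
                  BYTES_PER_RECORD, PEAK_FACTOR)
for name, value in result.items():
    print(f"{name}: {value:,.0f}")
```

With these assumed inputs the system writes a few dozen records per second, reads several thousand per second at peak, and grows by 600 GB a year, which already suggests a read-heavy design where caching matters more than write throughput.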
Learning Path Suggestions
Daily fluency (roughly 8 hours)
- Chapters 01 to 06 for the core building blocks
- Chapter 08 for reliability
- Chapter 11 for the worked example
Interview prep (roughly 10 hours)
- All 12 chapters
- Chapter 11 twice. Then design three more systems on your own
- Practice the six questions above on a system you already know
Going deeper (roughly 15+ hours)
- All chapters plus the additional resources below
- Read the papers referenced in chapters 05 and 07
- Build a small but real distributed thing (a key-value store, a pub/sub broker)
Additional Resources
- Designing Data-Intensive Applications by Kleppmann: the one book to read
- System Design Primer: an extensive GitHub repo
- High Scalability: case studies of real production systems
- Papers We Love: distributed systems papers, curated
- The Raft paper ("In Search of an Understandable Consensus Algorithm" by Ongaro and Ousterhout): a readable introduction to consensus
- AWS Builders' Library: field notes from operating at scale
A Note on Examples
Examples throughout use concrete systems (PostgreSQL, Redis, Kafka, NGINX, Envoy) rather than generic "component X". The patterns transfer across providers, but pointing at specific tools makes the trade-offs real.