System Design Tutorial

A practical tour of designing backend systems that survive real scale and real failure, from load balancers to consensus to a full worked example.

Who This Is For

Developers who've shipped services but never sat down with the distributed systems literature
Engineers preparing for system design interviews who want the real engineering behind the whiteboard answers
Anyone who has watched a production system melt and wants a mental toolkit for the next one

Fundamentals

Introduction: Vocabulary (latency, throughput, availability), trade-offs, the shape of system design problems
Building Blocks: Clients, DNS, CDN, load balancers, reverse proxies, API gateways

Core Concepts

Scaling: Vertical vs horizontal, statelessness, sticky sessions, load balancing strategies
Caching: Where to cache, invalidation, TTLs, cache stampedes, Redis patterns
Databases: SQL vs NoSQL, replication, partitioning, sharding
Queues and Events: Message queues, streams, pub/sub, event-driven patterns, backpressure

Advanced

Consistency and Consensus: CAP, PACELC, quorum, Raft basics, eventual consistency
Reliability: Failure modes, retries, timeouts, circuit breakers, idempotency, graceful degradation
Observability: Logs, metrics, traces, SLOs, alerting that doesn't cry wolf

Ecosystem

Architecture Patterns: Monolith, modular monolith, microservices, CQRS, event sourcing

Mastery

Designing a System: Worked example (a URL shortener), back-of-envelope math, step-by-step
Best Practices: Evaluation patterns, common traps, anti-patterns

How to Use This Tutorial

Read sequentially for a complete learning path
Sketch the diagrams. A whiteboard and a marker beat reading alone
Tie each concept to a real system. Every pattern here runs in systems you use daily; name one when you meet a new term

Quick Reference

Back-of-Envelope Numbers

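Every system design conversation uses these. They are order-of-magnitude figures that vary with hardware and network, so learn the ratios rather than the exact values.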

L1 cache reference              1 ns
L2 cache reference              4 ns
Main memory reference           100 ns
SSD random read                 100 microseconds
Round trip in same datacenter   500 microseconds
Disk seek                       10 ms
Round trip intercontinental     150 ms

1 byte                          1 B
1 kilobyte                      10^3 B
1 megabyte                      10^6 B
1 gigabyte                      10^9 B
1 terabyte                      10^12 B

1 second                        10^9 nanoseconds
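
As a rough illustration of how these numbers get used, the sketch below turns the latency table into two quick estimates: how many sequential operations of one kind fit in a second, and what a request costs if its steps run one after another. The names `LATENCY_NS`, `ops_per_second`, and `sequential_cost_ms` and the sample request are assumptions made for this example, not part of the tutorial.

```python
# Latency figures copied from the table above, in nanoseconds.
NS_PER_SECOND = 10**9
LATENCY_NS = {
    "l1_cache": 1,
    "l2_cache": 4,
    "main_memory": 100,
    "ssd_random_read": 100_000,
    "datacenter_round_trip": 500_000,
    "disk_seek": 10_000_000,
    "intercontinental_round_trip": 150_000_000,
}


def ops_per_second(name: str) -> int:
    """How many of these operations fit in one second if run back to back."""
    return NS_PER_SECOND // LATENCY_NS[name]


def sequential_cost_ms(steps: dict[str, int]) -> float:
    """Total latency in milliseconds for steps executed one after another."""
    total_ns = sum(LATENCY_NS[name] * count for name, count in steps.items())
    return total_ns / 1_000_000


# Hypothetical request: three in-datacenter hops and two SSD reads.
request = {"datacenter_round_trip": 3, "ssd_random_read": 2}
print(sequential_cost_ms(request))   # 1.7
print(ops_per_second("disk_seek"))   # 100
```

The same request with a single intercontinental hop added would cost over 150 ms, which is why the ratios between rows matter more than any single value.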

Availability Cheat Sheet

Availability    Downtime per year   Downtime per month
99%             3.65 days           7.2 hours
99.9%           8.76 hours          43.2 minutes
99.99%          52.56 minutes       4.32 minutes
99.999%         5.26 minutes        26 seconds
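
The table can be reproduced with one line of arithmetic: allowed downtime is the unavailable fraction multiplied by the length of the period. The sketch below assumes a 365-day year and a 30-day month, which match the figures above; the function name `downtime_seconds` is chosen for this example.

```python
def downtime_seconds(availability_percent: float, period_days: float) -> float:
    """Allowed downtime in seconds for an availability target over a period."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * 24 * 60 * 60


# Assumption: 365-day year and 30-day month, as the table appears to use.
YEAR_DAYS = 365
MONTH_DAYS = 30

for pct in (99.0, 99.9, 99.99, 99.999):
    per_year_hours = downtime_seconds(pct, YEAR_DAYS) / 3600
    per_month_minutes = downtime_seconds(pct, MONTH_DAYS) / 60
    print(f"{pct}%: {per_year_hours:.2f} h/year, {per_month_minutes:.2f} min/month")
```

Each extra nine cuts the downtime budget by a factor of ten, so a target of 99.999% leaves under half a minute per month for every deploy, failover, and incident combined.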

The Six Questions to Ask About Any Design

1. What's the read/write ratio?
2. What's the request rate at peak?
3. What's the data volume over a year?
4. What's the latency budget for the hot path?
5. What's the consistency requirement?
6. What happens when a component fails?

Answer these and most of the design draws itself.
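
The first three questions reduce to arithmetic. The sketch below shows one way to turn rough answers into request rates and yearly storage. Every input value is a made-up illustration, and the `Workload` fields and the `estimate` function are assumptions for this example rather than anything the tutorial defines.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class Workload:
    writes_per_day: int      # raw write volume
    reads_per_write: int     # question 1: read/write ratio
    peak_factor: float       # question 2: peak rate divided by average rate
    bytes_per_record: int    # question 3: size of one stored record


def estimate(w: Workload) -> dict[str, float]:
    """Back-of-envelope request rates and yearly storage for a workload."""
    avg_write_qps = w.writes_per_day / SECONDS_PER_DAY
    avg_read_qps = avg_write_qps * w.reads_per_write
    return {
        "avg_write_qps": avg_write_qps,
        "avg_read_qps": avg_read_qps,
        "peak_read_qps": avg_read_qps * w.peak_factor,
        "storage_bytes_per_year": w.writes_per_day * 365 * w.bytes_per_record,
    }


# Hypothetical inputs: 1M writes/day, 100 reads per write, 3x peak, 500 B records.
result = estimate(Workload(1_000_000, 100, 3.0, 500))
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

With these inputs the write path averages about 12 requests per second, the read path peaks near 3,500, and a year of data is roughly 180 GB. Numbers at that scale fit on a single database node, so under these assumptions the harder parts of the design are questions 4 to 6.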

Learning Path Suggestions

Daily fluency (roughly 8 hours)

Chapters 01 to 06 for the core building blocks
Chapter 08 for reliability
Chapter 11 for the worked example

Interview prep (roughly 10 hours)

All 12 chapters
Chapter 11 twice. Then design three more systems on your own
Practice the six questions above on a system you already know

Going deeper (roughly 15+ hours)

All chapters plus the additional resources below
Read the papers referenced in chapters 05 and 07
Build a small but real distributed thing (a key-value store, a pub/sub broker)

Additional Resources

Designing Data-Intensive Applications by Kleppmann: the one book to read
System Design Primer: an extensive GitHub repo
High Scalability: case studies of real production systems
Papers We Love: distributed systems papers, curated
The Raft paper ("In Search of an Understandable Consensus Algorithm" by Ongaro and Ousterhout): a readable introduction to consensus
Amazon Builders' Library: field notes from operating very large services

A Note on Examples

Examples throughout use concrete systems (PostgreSQL, Redis, Kafka, NGINX, Envoy) rather than generic "component X". The patterns transfer across providers, but pointing at specific tools makes the trade-offs real.
