Building Blocks: What Systems Are Made Of

This chapter introduces the components every real system uses: clients, DNS, CDNs, load balancers, reverse proxies, and API gateways. If you've seen them named in job posts but never drawn one, this chapter is for you.

Following a Request

A user's app requests api.example.com/orders/42. Before your code runs, a lot has already happened.

Client
  ↓ (DNS: resolve api.example.com)
DNS
  ↓ (returns an IP, often of a load balancer or CDN edge)
CDN / Edge
  ↓ (if cached, answers here; else forwards)
Load Balancer
  ↓ (picks a backend)
Reverse Proxy / API Gateway
  ↓ (auth, rate limit, routing)
Application Server
  ↓ (your code runs)
Database / Cache

That chain exists in some form for every real system you've ever used. Meet each piece.

Clients

A client is anything that initiates a request: a browser, a mobile app, another backend service, a CLI. "Thick" clients (a mobile app) can do significant work locally (caching, validation, retries); "thin" clients (a plain HTML page) offload most work to the server.

The distinction matters for caching, for offline behavior, and for auth. Mobile apps can hold long-lived tokens and retry intelligently; a CLI calling your API from a cron job usually can't.

DNS

DNS translates api.example.com to an IP address (or a set of them).

The flow:

Client asks:    "What's api.example.com?"
Resolver asks:  root servers → TLD servers → authoritative DNS → answers
Client caches:  A records for TTL seconds

Two details that bite you in practice:

  • TTLs. A DNS record's TTL (time-to-live) tells resolvers how long to cache. Short TTLs (60s) let you rotate IPs fast, at the cost of more DNS lookups. Long TTLs (1 hour) are efficient but slow to change.
  • Propagation. Changing a DNS record isn't instant globally. Plan failovers accordingly.

DNS can itself act as a load balancer: return multiple A records and let clients pick one (many resolvers rotate the record order per query). "Round-robin DNS" is the simplest load balancer there is.
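Round-robin DNS can be sketched in a few lines. This is a simulation, not a resolver: the IPs are hypothetical, and `pick_ip` stands in for a client choosing among the returned records.

```python
import random
from collections import Counter

# Hypothetical A records returned for api.example.com.
a_records = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]

def pick_ip(records):
    """Simulate a client choosing among multiple A records.

    Real resolvers often rotate the record order per query, so a
    random choice approximates the aggregate effect across clients.
    """
    return random.choice(records)

# Over many "clients", traffic spreads roughly evenly.
hits = Counter(pick_ip(a_records) for _ in range(9000))
```

No health checks, no weights, no awareness of dead backends: that is the trade. You get load spreading for free, but a client can keep hammering an IP that stopped answering.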

CDN

A Content Delivery Network is a set of geographically distributed caches. When a client requests cdn.example.com/logo.png, the CDN serves from the closest edge node instead of hitting your origin.

CDNs are great for:

  • Static assets (images, CSS, JS, video).
  • API responses that can be cached (public leaderboards, exchange rates).
  • HTML pages that are cacheable per-country or per-user-segment.

They're not magic. A CDN cache miss is more expensive than a direct request, because the edge has to fetch from origin and then cache. Design for high hit rates: long TTLs, cache keys that don't vary per user unless necessary.
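The hit/miss logic at an edge node can be sketched as a dictionary with TTLs. This is a toy, assuming a single node and an invented `fetch_from_origin`; real CDNs add invalidation, stale-while-revalidate, and much more.

```python
import time

CACHE = {}  # cache_key -> (expires_at, body)
TTL_SECONDS = 3600

def fetch_from_origin(path):
    # Stand-in for the expensive round trip to the origin server.
    return f"body-of-{path}"

def edge_get(path, now=None):
    """Serve from cache if fresh; otherwise fetch from origin and cache.

    The cache key here is just the path. Keys that vary per user
    (cookies, auth headers) fragment the cache and kill hit rates.
    """
    now = time.monotonic() if now is None else now
    entry = CACHE.get(path)
    if entry and entry[0] > now:
        return entry[1], "HIT"
    body = fetch_from_origin(path)
    CACHE[path] = (now + TTL_SECONDS, body)
    return body, "MISS"
```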

Common providers: Cloudflare, Fastly, Akamai, CloudFront, Bunny.

Load Balancers

A load balancer sits in front of a pool of servers and distributes requests among them. Two flavors:

Layer 4 (L4) load balancer: operates at the transport layer (TCP/UDP). Doesn't understand HTTP. Very fast, very simple. Examples: AWS NLB, HAProxy in TCP mode.

Layer 7 (L7) load balancer: operates at the application layer (HTTP). Can route by URL path, header, or cookie. Can terminate TLS, rewrite headers, apply rate limits. Examples: AWS ALB, NGINX, Envoy.

Most applications want L7 at the edge (for routing and TLS) and sometimes L4 deeper inside (for speed).

Routing Algorithms

  • Round robin: each backend takes turns. Simple, fair if backends are identical.
  • Least connections: send to the backend with fewest active connections. Good for long-lived connections.
  • Weighted: backends get weights. Bigger box, bigger weight.
  • Consistent hashing: map requests to backends based on a key (user ID, session ID). Same key goes to same backend, mostly. Chapter 3 comes back to this.
  • IP hash: hash of client IP picks a backend. Poor man's stickiness.
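Consistent hashing is worth seeing in code. A minimal ring with virtual nodes, assuming MD5 as the hash and an arbitrary vnode count; the backend names are hypothetical:

```python
import bisect
import hashlib

def _hash(key):
    # Stable 64-bit hash (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Sketch of a consistent-hash ring with virtual nodes.

    The same key maps to the same backend until the pool changes, and
    adding or removing a backend remaps only the keys it owned.
    """

    def __init__(self, backends, vnodes=100):
        # Each backend gets `vnodes` points on the ring for smoother spread.
        self._ring = sorted(
            (_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def pick(self, key):
        # The key's owner is the next ring point clockwise (with wraparound).
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Compare with round robin: there, removing a backend reshuffles everything; here, only the departed backend's keys move.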

Health Checks

A load balancer continuously probes each backend. Backends that fail the health check are removed from the pool; when they recover, they're added back. Details matter: a too-strict check removes healthy backends during a spike; a too-lenient one keeps routing to dead boxes.
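The flapping problem suggests hysteresis: require several consecutive failures to eject a backend and several consecutive passes to re-admit it. A sketch, with thresholds picked arbitrarily:

```python
class BackendHealth:
    """Track a backend's health with consecutive-result thresholds.

    A backend must fail `unhealthy_after` checks in a row to be removed
    and pass `healthy_after` in a row to be re-added -- hysteresis that
    keeps one slow response from ejecting a healthy box.
    """

    def __init__(self, unhealthy_after=3, healthy_after=2):
        self.unhealthy_after = unhealthy_after
        self.healthy_after = healthy_after
        self.healthy = True
        self._streak = 0  # consecutive results contradicting current state

    def record(self, check_passed):
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with current state; reset
            return self.healthy
        self._streak += 1
        threshold = self.unhealthy_after if self.healthy else self.healthy_after
        if self._streak >= threshold:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy
```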

Reverse Proxies

"Reverse proxy" is often used interchangeably with "L7 load balancer", but it's a bit broader. A reverse proxy sits between clients and backend(s), and can:

  • Terminate TLS.
  • Compress responses.
  • Cache.
  • Rewrite requests.
  • Enforce limits.
  • Buffer slow clients.

Common reverse proxies: NGINX, Apache HTTPD (with mod_proxy), Envoy, Traefik, Caddy. You'll see at least one in every system.

A common pattern: NGINX terminates TLS and handles static files; requests for dynamic endpoints are proxied to an application server behind it.

Client → NGINX (TLS, static files, gzip) → uvicorn/puma/tomcat → app code

API Gateways

An API gateway is a reverse proxy that's been given a product manager. On top of L7 routing, it handles:

  • Authentication. Validate a JWT, check an API key.
  • Authorization. Can this user call this endpoint? Yes or no.
  • Rate limiting. 1000 requests per hour per API key.
  • Request transformation. Map external shapes to internal ones.
  • Metering. Log every call for billing.
  • Developer portal. Docs, keys, analytics.
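Rate limiting at the gateway is usually some variant of a token bucket. A minimal single-process sketch (real gateways keep one bucket per API key in shared storage like Redis; the class and its parameters here are illustrative, not any product's API):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter.

    Tokens refill continuously at `rate` per second up to `capacity`;
    each request spends one token. "1000 requests per hour" would be
    TokenBucket(rate=1000 / 3600, capacity=1000).
    """

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The capacity is the burst allowance: a client can spend saved-up tokens quickly, then gets throttled to the steady rate.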

Common products: Kong, Tyk, AWS API Gateway, Google Cloud API Gateway, Envoy-based solutions (Ambassador, Gloo).

The debate in many teams is whether you need a full API gateway or whether a reverse proxy plus an auth library in the application is enough. For an internal service, probably the latter. For a public developer API with paying customers, probably the former.

Web Server vs Application Server

Blurry in modern stacks, but the distinction still matters.

Web server: handles the HTTP protocol. NGINX, Apache, Caddy. Good at serving static files, terminating TLS, managing connections.

Application server: runs your code. Gunicorn or uvicorn (Python), Puma (Ruby), Tomcat (Java), a Node.js process. Implements the request/response cycle for dynamic content.

In deployment, the web server sits in front of the application server. Clients talk to NGINX; NGINX forwards dynamic requests to uvicorn; uvicorn runs your Python code. Production Python apps almost always look like this.

In some stacks they're combined. Go's net/http is both: a web server that runs your code. Same for many modern frameworks. The separation is still a useful mental model even when one binary does both.

Putting It Together

A typical request to a non-trivial system:

Browser
  ↓ DNS lookup (api.example.com → 203.0.113.42)
Cloudflare edge (CDN + WAF)
  ↓ cache miss
AWS ALB (L7 load balancer, TLS termination)
  ↓ route by path
API Gateway (auth, rate limit)
  ↓
NGINX (reverse proxy, gzip, static fallback)
  ↓
Application server (your code, runs request handler)
  ↓
Redis (cache lookup) → hit or miss
  ↓
PostgreSQL (source of truth)

Each hop adds latency. Each hop is a component that can fail. You add them deliberately, because each one solves a problem the previous layers can't.
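The Redis-then-PostgreSQL hop at the bottom is the cache-aside pattern: try the cache, fall back to the source of truth, populate the cache on the way out. A sketch with in-memory dictionaries standing in for both stores (keys and data are hypothetical):

```python
# In-memory stand-ins for Redis and PostgreSQL.
cache = {}
database = {"order:42": {"id": 42, "status": "shipped"}}

def get_order(order_id):
    """Cache-aside read: check the cache, fall back to the database,
    then populate the cache so the next read is a hit."""
    key = f"order:{order_id}"
    if key in cache:
        return cache[key], "cache"
    row = database.get(key)  # the expensive hop: a real query over the network
    if row is not None:
        cache[key] = row
    return row, "database"
```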

Common Pitfalls

Too many layers. Every proxy is a thing to operate, scale, and debug. If a "simple" system has 6 proxies in the chain, something is probably wrong.

TLS terminated too early. If the load balancer terminates TLS and forwards plaintext, anyone who sniffs inside your VPC sees everything. End-to-end TLS is cheap now; use it.

No rate limiting. A single misbehaving client can take down a service that forgot to enforce limits. Put limits at the gateway.

Confusing CDN with application caching. A CDN caches outside your origin. An application cache caches inside. They solve different problems; most systems want both.

Assuming load balancers are infinitely scalable. Load balancers have limits too (connections, bandwidth). At scale, the load balancer itself becomes the bottleneck; solutions include client-side load balancing and DNS-level load balancing.

Next Steps

Continue to 03-scaling.md to handle more traffic than one box can serve.