Architecture Patterns: Shapes for the Whole System

This chapter covers monoliths, modular monoliths, microservices, CQRS, and event sourcing, and when each one is the right shape.

The Monolith (Still the Default)

A monolith is one deployable artifact containing the whole application. One repo, one process, one database, one pipeline.

For years this was the default; then microservices took over the conference talks; now the industry has come back around to "a well-structured monolith is fine for most companies". Monoliths are:

  • Simple to build. Call a function. That's it.
  • Simple to operate. One service to deploy, one log stream to tail, one set of metrics.
  • Cheaper. One database, one infrastructure footprint.
  • Easy to test end-to-end. The whole thing starts in seconds.

Where they hurt:

  • Build times grow. A 20-minute build on every PR is painful.
  • Deploy coupling. One team's bug blocks another team's release.
  • Scale boundaries are coarse. You scale the whole thing, even if only one subsystem is hot.
  • Technology lock-in. Everything is one language, one framework.

Most of these pains are solvable short of microservices: faster CI, feature flags, a modular structure. Don't abandon the monolith until you've tried them.

Modular Monolith

A monolith with internal boundaries. Each module has its own public API, its own data model, and enforced rules against reaching into another module's internals.

/app
  /users
    public_api.py       <- only this is imported by other modules
    internal/
      models.py
      repository.py
  /orders
    public_api.py
    internal/
      models.py
      repository.py
  /billing
    ...

Cross-module calls go through the public API. Internal changes don't affect other modules.

Benefits:

  • Teams can own modules and move independently without deploy-time isolation.
  • Refactoring a module is local.
  • You can later extract a module to its own service if it genuinely needs isolation.

Most "we need microservices" arguments dissolve under "try a modular monolith first". This is the default for product companies now. Shopify famously runs on one.

Microservices

Multiple deployable artifacts, each owning a slice of the domain.

users-service         (auth, profiles)
orders-service        (cart, checkout)
inventory-service     (stock, fulfillment)
payments-service      (charges, refunds)
notifications-service (email, SMS, push)

Each service has its own database, its own deploy pipeline, its own team (ideally).

The Real Costs

Before you pick microservices, be honest about the costs:

  • Network calls replace function calls. Every call is potentially slow, failing, retrying. Chapter 8 exists.
  • Distributed transactions are painful. What was a database transaction becomes a saga (see below).
  • Testing gets harder. End-to-end tests require multiple services running. Integration tests multiply.
  • Observability is harder. Tracing across services is mandatory; service logs alone aren't enough.
  • Deploy coordination. Breaking changes to an API require coordinated deploys.
  • Operational load. Every service is a thing to deploy, monitor, alert on, secure, update.

You're paying ongoing costs forever. Make sure you get enough benefit to justify them.

When Microservices Actually Help

Reasons that justify the cost:

  • Independent scaling. A read-heavy service wants replicas; a write-heavy one wants partitioning. Breaking them up lets you scale each appropriately.
  • Team autonomy. 50+ engineers stepping on each other in one codebase is real pain; giving each team a service they own makes development faster.
  • Technology diversity. ML in Python, low-latency in Go, legacy in Java. Separate services let each use its best tool.
  • Fault isolation. A bug in reports doesn't take down payments.
  • Compliance boundaries. Payments runs on PCI-certified infrastructure; the rest doesn't need to.

If none of these apply, you don't need microservices. "The AWS blog post said..." is not a reason.

Start Monolith, Extract as You Grow

The mature path is: start monolithic, keep modules clean, extract a module to its own service when you have a specific reason. The first extraction is usually the hardest; after that, the pattern is known.

Extracting also tells you if you really need it. If the extraction takes three months and breaks nothing, maybe you didn't.

CQRS (Command Query Responsibility Segregation)

In a traditional app, the same data model serves reads and writes. CQRS splits them.

Writes → command handler → write model (normalized, transactional)
                             ↓ events
Reads  → query handler   → read models (denormalized, optimized per view)

Writes modify one source of truth. Reads come from projected views optimized for specific queries.

Benefits:

  • Read-heavy workloads scale independently.
  • Complex queries get dedicated read models (one per view: "user dashboard", "admin search", "analytics").
  • The write model stays clean and normalized.

Costs:

  • Consistency is eventual between write and read models.
  • Complexity: now there's more than one model of the same data.

CQRS is worth it for:

  • High read/write ratios (100:1 or more).
  • Views that require expensive joins or aggregations.
  • Systems where certain reads need to be globally fast (CDN-cacheable projections).

CQRS is overkill for:

  • Simple CRUD apps.
  • Teams that can live with one well-indexed database.

Event Sourcing

Instead of storing current state, store the events that produced it.

Traditional:
  accounts table: id=42, balance=100

Event-sourced:
  events table (append-only):
    {id: 1, type: "AccountCreated", account_id: 42}
    {id: 2, type: "Deposited",       account_id: 42, amount: 150}
    {id: 3, type: "Withdrawn",       account_id: 42, amount:  50}

  current balance = fold over events = 100

The event log is the source of truth. Current state is a projection.
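The fold can be written literally, using the events from the example above:

```python
from functools import reduce

# The append-only event log from the example above.
events = [
    {"id": 1, "type": "AccountCreated", "account_id": 42},
    {"id": 2, "type": "Deposited", "account_id": 42, "amount": 150},
    {"id": 3, "type": "Withdrawn", "account_id": 42, "amount": 50},
]


def apply(balance: int, event: dict) -> int:
    """One step of the fold: apply a single event to the running state."""
    if event["type"] == "AccountCreated":
        return 0
    if event["type"] == "Deposited":
        return balance + event["amount"]
    if event["type"] == "Withdrawn":
        return balance - event["amount"]
    raise ValueError(f"unknown event type: {event['type']}")


def current_balance(history: list[dict]) -> int:
    """Current state is literally a fold over the event history."""
    return reduce(apply, history, 0)
```

Running the fold over the full log yields 100; running it over a prefix of the log gives the balance at that point in history, which is the "time travel" benefit below.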

Benefits:

  • Full audit trail. Every change is recorded with context.
  • Time travel. Replay events to see state at any point in history.
  • Multiple projections. Build any read model from the same events.
  • Natural fit for CQRS.

Costs:

  • Schema evolution is hard. Old events persist forever; your code must still handle them.
  • Debugging is different. State isn't a row; it's a fold over history.
  • Deletion is awkward. GDPR "right to be forgotten" conflicts with "never lose events".
  • Ops complexity. Event stores (EventStoreDB, Axon, Kafka-as-event-store) are specialist tools.

Event sourcing is the right answer when audit is a business requirement (banking, trading, inventory ledgers). It's the wrong answer when you just want to log user actions; for that, an events table alongside normal state is enough.

Sagas: Long-Running Distributed Transactions

When one business operation spans multiple services (order → pay → fulfill → ship), you can't wrap it in a database transaction. Services may fail partway; the network may partition. You need a different pattern: the saga.

A saga is a sequence of steps, each with a compensating action. If step N fails, run the compensating actions for 1 through N-1 to roll back.

Place order
  1. Reserve inventory   (compensation: release inventory)
  2. Charge payment      (compensation: refund payment)
  3. Create shipment     (compensation: cancel shipment)
  4. Send email          (compensation: [no meaningful compensation])
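The orchestration style can be sketched in a few lines. The step names mirror the list above; the bodies are stubs that record what happened instead of calling real services, and the shipment failure is simulated:

```python
log: list[str] = []


def run_saga(steps) -> bool:
    """steps is a list of (action, compensation) pairs. Returns True if every
    action succeeds; otherwise compensates completed steps and returns False."""
    done = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for comp in reversed(done):  # roll back newest-first
                comp()
            return False
        done.append(compensation)
    return True


def create_shipment() -> None:
    raise RuntimeError("carrier API down")  # simulate a mid-saga failure


place_order = [
    (lambda: log.append("reserve inventory"), lambda: log.append("release inventory")),
    (lambda: log.append("charge payment"), lambda: log.append("refund payment")),
    (create_shipment, lambda: log.append("cancel shipment")),
]
```

When step 3 fails, the coordinator refunds the payment and then releases the inventory, in that order. Between the failure and the last compensation, the system is visibly in a partial state: that is the eventual consistency the section below describes.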

Two styles:

Orchestration: a central coordinator drives the saga. Easier to reason about. Single point of control.

Choreography: each service reacts to events from other services. Decoupled. Hard to follow.

Sagas don't give you atomicity (rollback is visible). They give you eventual consistency with compensation. Design with that in mind: the "refund" after a failed order is a real operation, not a no-op.

Strangler Fig

A migration pattern. Named after the strangler fig vine, which grows around a tree and eventually replaces it.

When replacing a legacy system, you don't shut it off on a weekend. You put a router in front; route some traffic to the new system and the rest to the old; incrementally grow the new system's footprint.

Clients → router
            ├── old monolith (80% of traffic)
            └── new service  (20%, growing)
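The router's core decision is small. One sketch, assuming percentage-based routing keyed on user ID (hashing makes the assignment sticky, so a given user doesn't flip between systems mid-session); the backend names are illustrative:

```python
import hashlib


def route(user_id: str, new_system_share: int = 20) -> str:
    """Return which backend serves this user. new_system_share is the
    percentage of users sent to the new system."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return "new-service" if bucket < new_system_share else "old-monolith"
```

Growing the new system's footprint means raising `new_system_share`; rolling back means setting it to 0. In practice this lives in a reverse proxy or API gateway config rather than application code, but the decision is the same.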

Benefits:

  • Risk is bounded. Roll back by routing traffic back.
  • Features migrate at the team's pace, not in one big-bang release.
  • Old and new can coexist indefinitely if needed.

Every legacy replacement should be a strangler fig. Big-bang rewrites fail at alarming rates; stranglers ship.

Picking a Shape

A decision tree:

  1. Is this a new service with unclear requirements? → Monolith. Iterate.
  2. Is the monolith getting painful (team coordination, scale, deploy cadence)? → Modular monolith.
  3. Are you blocked by shared deploy, scale, or technology? → Extract one or two services, not 30.
  4. Do you have real multi-team, multi-scale, multi-technology pressure at 50+ engineers? → Microservices.
  5. Does a read view need independent scale or complexity? → CQRS on one bounded context.
  6. Is audit or time travel a business requirement? → Event sourcing on the relevant entities.
  7. Migrating off a legacy system? → Strangler fig.

Most companies should stop between 1 and 3. Microservices are a tool for large organizations; adopting them at 10 engineers is a self-inflicted wound.

Common Pitfalls

Microservices on day 1. You haven't learned your domain boundaries yet. You'll get them wrong. Extract later.

"Distributed monolith". All the cost of microservices (network, ops) with none of the independence (one team coordinates all deploys). Usually the result of splitting too fine.

Shared database across services. Every service reads and writes the same tables. Now changes to the schema block everyone. Just run a modular monolith.

Event sourcing as a default. It's a niche pattern. Use it when audit matters, not because it sounded interesting.

Saga without compensation. Failures leave the system in limbo. Every step needs a compensating action, even if it's "manual review".

No strangler fig during a rewrite. Six months of parallel effort, then one terrifying cutover. Route traffic gradually.

Next Steps

Continue to 11-designing-a-system.md to walk through a complete design exercise.