Queues and Events: Async Work
This chapter covers message queues vs streams, pub/sub, event-driven patterns, and how backpressure keeps a busy system from burning down.
Why Async
The synchronous request/response cycle is the default, but it breaks for three reasons:
- Slow work. Generating a 200-page PDF shouldn't block an HTTP request.
- Spiky load. Black Friday traffic spikes to 100x normal; better to buffer than to scale up synchronously.
- Fan-out. One event triggers multiple downstream actions (email, analytics, webhooks).
Queues and streams decouple producers from consumers. The producer submits work and moves on; a consumer processes it later.
Queues vs Streams
Two mental models. Same surface, different semantics.
Queue (SQS, RabbitMQ, Redis Streams as a queue)
A message goes to one consumer. Once acknowledged, it's gone.
Producer → [msg1][msg2][msg3] → Consumer (pulls one at a time)
Good fit: "process this order", "send this email", "resize this image". Work items that should be done once.
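The queue model can be sketched in a few lines of Python with the standard library's `queue.Queue` (an in-process stand-in for a real broker, not a real broker API): each message is pulled by exactly one worker, and once acknowledged it is gone.

```python
import queue
import threading

# In-process stand-in for a task queue: each item goes to exactly one
# worker; task_done() plays the role of the ack.
tasks = queue.Queue()
for i in range(6):
    tasks.put(f"resize-image-{i}")

processed_by = {}          # task -> which worker handled it
lock = threading.Lock()

def worker(name):
    while True:
        try:
            task = tasks.get_nowait()   # pull one message
        except queue.Empty:
            return                      # queue drained; worker exits
        with lock:
            processed_by[task] = name   # "do the work"
        tasks.task_done()               # ack: the message is gone

workers = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(2)]
for t in workers:
    t.start()
for t in workers:
    t.join()
```

Both workers share the same queue, but every task lands in `processed_by` exactly once.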
Stream (Kafka, Kinesis, Redis Streams as a log)
A message is appended to a durable log. Multiple consumers read independently, each tracking its own position.
Producer → [msg1 msg2 msg3 msg4 ...] (append-only log)
↑ ↑
Consumer A Consumer B
(offset 1) (offset 3)
Good fit: events that multiple systems care about. "Order placed" → billing, analytics, email, fraud checks. Each consumer reads the whole log at its own pace.
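The stream model reduces to an append-only list plus a per-consumer offset. A toy sketch (the names `publish`/`poll` are illustrative, not any broker's API):

```python
# A stream as an append-only log: producers append, and each consumer
# tracks its own read position (offset) independently.
log = []                                      # the durable log
offsets = {"billing": 0, "analytics": 0}      # per-consumer positions

def publish(event):
    log.append(event)

def poll(consumer, max_events=10):
    """Return unread events for this consumer and advance its offset."""
    start = offsets[consumer]
    events = log[start:start + max_events]
    offsets[consumer] = start + len(events)
    return events

publish("order-1 placed")
publish("order-2 placed")

billing_batch = poll("billing")       # billing reads both events
analytics_batch = poll("analytics")   # analytics reads the same events

# Replay is just resetting the offset:
offsets["billing"] = 0
replayed = poll("billing")
```

Note that replay falls out for free: nothing is deleted on read, so rewinding an offset re-reads history.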
The key differences:
              Queue               Stream
Retention     Until acked         Configurable (days/weeks)
Replay        No                  Yes (by resetting offset)
Fan-out       Hard (copy msgs)    Native (many consumers)
Ordering      Partial             Per partition
Throughput    High                Very high
Pick the queue for task distribution; pick the stream for event distribution.
Pub/Sub
Publish/subscribe: the producer publishes to a topic; all subscribers receive a copy. It's fan-out without the producer knowing who's listening.
Implementations:
- Ephemeral pub/sub (Redis PUBSUB): real-time, no persistence. If a subscriber is offline, it misses messages.
- Persistent pub/sub (Kafka, Google Pub/Sub, AWS SNS+SQS): messages are buffered for offline subscribers.
For any important event, use persistent pub/sub. The cost of "missed the message" grows with the team.
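The difference between ephemeral and persistent pub/sub is whether the topic buffers a copy per subscriber. A toy persistent version (class and method names are illustrative, not a real client library):

```python
from collections import defaultdict, deque

# Toy persistent pub/sub: the topic keeps one buffer per subscriber, so a
# subscriber that is offline (or just slow) still receives every message
# when it next polls.
class Topic:
    def __init__(self):
        self.buffers = defaultdict(deque)   # subscriber -> pending messages

    def subscribe(self, name):
        self.buffers[name]                  # creates an empty buffer

    def publish(self, message):
        for buf in self.buffers.values():
            buf.append(message)             # every subscriber gets a copy

    def drain(self, name):
        """Return and clear this subscriber's pending messages."""
        buf = self.buffers[name]
        messages = list(buf)
        buf.clear()
        return messages

orders = Topic()
orders.subscribe("email")
orders.subscribe("analytics")
orders.publish("OrderPlaced:42")
# "email" hasn't polled yet; the message waits in its buffer regardless.
```

With ephemeral pub/sub, `publish` would instead push only to currently-connected subscribers and keep nothing.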
Delivery Guarantees
Three flavors, only two of which are real.
At-Most-Once
Each message is delivered 0 or 1 times. If the network flakes, the message is lost.
Use when: loss is cheaper than duplication (metrics samples, keep-alive pings).
At-Least-Once
Each message is delivered 1 or more times. If the consumer acks but the ack is lost, the message is redelivered. Duplicates happen.
Use when: loss matters. Build your consumer to be idempotent so duplicates are safe.
This is the default for Kafka, SQS, and RabbitMQ with acks enabled. Most systems live here.
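Why duplicates happen is easy to see in a sketch: the broker only removes a message when the ack arrives, so a consumer that processes a message but dies before acking gets it again.

```python
# At-least-once delivery in miniature: the broker redelivers until it sees
# an ack, so a crash between "process" and "ack" produces a duplicate.
pending = ["charge order 7"]     # the broker's in-flight message
delivered = []                   # what the consumer has actually processed

def deliver_once(crash_before_ack):
    msg = pending[0]
    delivered.append(msg)        # consumer processes the message
    if not crash_before_ack:
        pending.pop(0)           # ack reaches the broker; message removed

deliver_once(crash_before_ack=True)   # processed, but the ack is lost
deliver_once(crash_before_ack=False)  # broker redelivers; processed again
```

The message was charged twice, which is exactly why the consumer must be idempotent.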
Exactly-Once
"Each message is processed exactly once." Mostly a lie.
Exactly-once processing is real only when the consumer's side effects are idempotent, or when the consumer and the downstream system are in the same transaction (Kafka's transactional writes, for example).
When someone says "exactly-once", ask: "exactly-once what?" Delivery? Processing? Side effects? The honest answer is usually "at-least-once delivery, idempotent processing".
Idempotency (Again)
If your consumer is idempotent, "at-least-once delivery" becomes "exactly-once processing". Some ways to get there:
- Deduplication keys. Include a unique ID with each message; the consumer records processed IDs and skips duplicates.
- Idempotent operations. SET balance = 100 is idempotent; ADD 50 to balance is not. Model operations as state transitions where possible.
- Upsert at the sink. Database upserts (ON CONFLICT DO UPDATE) swallow duplicates.
Chapter 8 covers this in more depth.
Consumer Groups
Kafka's central idea. A consumer group is a set of consumers that share the work of reading a topic.
- Each partition of the topic is read by exactly one consumer in the group.
- Add consumers up to the number of partitions; work spreads automatically.
- Different consumer groups read independently.
Topic "orders" with 6 partitions
├── group "billing" (3 consumers, each reads 2 partitions)
└── group "analytics" (6 consumers, one per partition)
Consumer groups are how streams scale. Throughput is bounded by the number of partitions; adding partitions lets you add consumers.
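The assignment rule, "each partition is read by exactly one consumer in the group", can be sketched with a simple round-robin split (real brokers use their own assignment strategies, so this is illustrative only):

```python
# Consumer-group assignment sketch: spread partitions across the group's
# consumers so each partition belongs to exactly one consumer.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = list(range(6))

# Mirrors the diagram above: 3 billing consumers get 2 partitions each,
# 6 analytics consumers get 1 each, and the groups don't interact.
billing = assign(partitions, ["b1", "b2", "b3"])
analytics = assign(partitions, [f"a{i}" for i in range(6)])
```

A seventh consumer in the analytics group would sit idle: with 6 partitions there is nothing left to assign, which is why partition count bounds parallelism.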
Partition count is a critical design choice:
- Too few → limited parallelism, consumers are bottlenecks.
- Too many → overhead (metadata, coordination), cross-partition operations get painful.
- Start with enough partitions to handle 3 to 4 times your expected peak throughput.
Backpressure
Producers can outpace consumers. Without backpressure, the queue grows forever; eventually the broker, the database, or the disk runs out.
Strategies for applying backpressure:
Bounded Queue + Block
The producer blocks when the queue is full. Simple, effective. Pushes the problem upstream: the producer is now slow, which may push back further.
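A bounded `queue.Queue` shows the mechanism: `put()` blocks when the queue is full, so the producer is forced down to the consumer's pace instead of growing memory without limit.

```python
import queue
import threading
import time

# Bounded queue backpressure: with maxsize=2, the producer's put() blocks
# whenever two items are already in flight.
q = queue.Queue(maxsize=2)

def slow_consumer():
    for _ in range(4):
        q.get()
        time.sleep(0.05)   # the consumer is the bottleneck
        q.task_done()

t = threading.Thread(target=slow_consumer)
t.start()

start = time.monotonic()
for i in range(4):
    q.put(i)               # blocks once the queue is full
elapsed = time.monotonic() - start
t.join()
# The producer took roughly as long as the consumer, not ~0 seconds:
# the slowness propagated upstream.
```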
Drop Messages
When full, drop oldest (or newest, or random). Valid when loss is tolerable (metrics, logs) and cheaper than blocking. Alert on drops.
Load Shedding
Reject requests at the edge when queue depth exceeds a threshold. Better to fail fast than to queue work that will never be served.
Rate Limiting at the Producer
Limit how fast producers can enqueue. Token bucket, leaky bucket, fixed window.
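A token bucket is small enough to sketch in full: tokens refill at a fixed rate up to a burst capacity, and a producer may enqueue only if a token is available (the class name and interface here are illustrative).

```python
import time

# Minimal token-bucket rate limiter: allows bursts up to `capacity`,
# then throttles to `rate` tokens per second.
class TokenBucket:
    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start full
        self.now = now
        self.last = now()

    def allow(self):
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1        # spend a token for this message
            return True
        return False                # over budget: reject or delay

bucket = TokenBucket(rate=10, capacity=3)
burst = [bucket.allow() for _ in range(5)]  # 3 allowed, then rejected
```

Leaky bucket and fixed window follow the same shape: a small piece of state per producer, checked before every enqueue.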
Auto-Scaling the Consumer
Add consumers when queue depth grows. Bounded by partition count on streams; limited only by cost on queues.
The worst pattern: unbounded queue with no backpressure. The system looks healthy until the broker dies.
Dead Letter Queues (DLQ)
A message that fails repeatedly (bad data, poison pill) can block the queue. Move it to a dead letter queue after N retries; keep the main queue flowing.
Typical pattern:
main queue → consumer
on failure → retry with backoff
after N retries → send to dead letter queue → alert engineer
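The pattern above is a short loop in code. A sketch, with the backoff sleep and alert left as comments:

```python
# Retry-then-DLQ sketch: a handler gets MAX_RETRIES attempts; a message
# that keeps failing is parked in the DLQ so the main queue keeps flowing.
MAX_RETRIES = 3
dead_letter_queue = []

def consume(message, handler):
    for attempt in range(MAX_RETRIES):
        try:
            handler(message)
            return True
        except Exception:
            # Real code: sleep(base_backoff * 2 ** attempt), log the error.
            pass
    dead_letter_queue.append(message)   # poison pill: park it
    # Real code: fire an alert on DLQ depth here.
    return False

def handler(message):
    if message == "poison":
        raise ValueError("cannot parse message")

ok = consume("order-9", handler)   # succeeds on the first attempt
bad = consume("poison", handler)   # fails 3 times, lands in the DLQ
```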
Every production queue needs a DLQ and an alert on its depth. "The queue is processing fine" while 10,000 poison messages pile up in the DLQ is a real failure mode.
Event-Driven Architecture
Instead of services calling each other directly, they publish events and subscribe to events they care about.
Order service → "OrderPlaced" → Kafka
↓
Billing (charge card)
Inventory (decrement stock)
Email (send confirmation)
Analytics (record funnel)
Pros:
- Services are decoupled. Adding a new consumer doesn't require changes to producers.
- Natural audit log. The event stream is "what happened".
- Supports replay and reprocessing (fix a bug, reprocess the affected events).
Cons:
- Harder to trace. One event fans out to many places; debugging "why didn't the email go out" is multi-service.
- Eventual consistency by default. If billing fails after order is placed, the system must reconcile.
- Schema evolution is hard. Events persist for weeks; consumers must handle old shapes.
Event-driven is a powerful pattern at medium-to-large scale. For a small service, the complexity isn't worth it.
Common Brokers
RabbitMQ Classic queue. Rich routing, AMQP. Mature, great for task queues.
Kafka Durable log, high throughput, partitioned. Industry default for streams.
AWS SQS Managed queue. Simple, reliable, at-least-once with DLQs.
AWS SNS + SQS Pub/sub fan-out to queues. Composable.
AWS Kinesis Managed streams. Kafka-lite on AWS.
Google Pub/Sub Managed pub/sub. At-least-once, ordered on keys, scalable.
Redis Streams Lightweight log. Good for small-to-medium throughput, simple ops.
NATS Lightweight pub/sub, plus JetStream for durable streams. Low operational footprint.
Pick based on ops model (managed vs self-hosted), throughput, retention needs, and the existing stack. Don't mix three brokers unless each solves a distinct problem.
Common Pitfalls
Queue with no idempotency. Duplicates happen. Your consumer must handle them.
No DLQ, no alerting on queue depth. The first sign of trouble should be an alert, not a customer ticket.
Infinite retries in a loop. A poison message loops forever; the consumer spins; nothing else gets processed. Always cap retries.
Using a queue as a request/response. If you need a response, use RPC. Queues are for work; responses belong elsewhere.
Kafka for simple task queues. Kafka is great but operationally heavy. SQS or RabbitMQ is simpler for task distribution.
Ignoring message ordering. Most queues don't guarantee strict ordering across partitions. Design for out-of-order messages or partition by a key that preserves the order you need.
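Partitioning by key is usually just hashing the key to pick a partition, so every message for a given key lands in the same partition and keeps its relative order. A sketch (partition count and hash choice are illustrative; brokers use their own partitioners):

```python
import hashlib

# Partition-by-key sketch: hash the key, take it modulo the partition
# count. All messages for one key map to one partition, so they stay
# ordered relative to each other even though the topic as a whole is not.
NUM_PARTITIONS = 4

def partition_for(key):
    digest = hashlib.md5(key.encode()).digest()   # stable, not security-relevant
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Every event for user-42 deterministically hits the same partition.
p1 = partition_for("user-42")
p2 = partition_for("user-42")
```

The trade-off: a hot key concentrates load on one partition, so pick a key that spreads traffic while preserving the order you actually need.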
Next Steps
Continue to 07-consistency-and-consensus.md for the theory behind the trade-offs you've been making.