Message Queues and Pub/Sub

Updated 8 hours ago

Message queues and publish-subscribe exist because of a simple reality: in distributed systems, things fail. Services crash. Networks partition. Components run at wildly different speeds. If Service A needs to tell Service B something important, but B is down—what happens to that message?

These patterns answer that question differently. Understanding which answer you need determines which pattern to use.

The Core Problem

Direct communication between services is fragile. When Service A calls Service B synchronously, A has to wait. If B is slow, A is slow. If B is down, A fails. If A sends requests faster than B can handle them, something breaks.

Message queues and pub/sub both solve this by introducing indirection. Instead of A talking directly to B, A talks to an intermediary. The message sits between them, patient and durable, holding the request until someone is ready to fulfill it.

This indirection buys you three things: temporal decoupling (sender and receiver don't need to be available simultaneously), load leveling (bursts get absorbed into the queue), and failure isolation (one component's crash doesn't cascade).

Message Queues: One Message, One Consumer

A message queue implements point-to-point communication. Producers send messages to a queue. Consumers pull messages from the queue. Each message is processed by exactly one consumer.

Picture a bakery with a ticket dispenser. Customers take tickets (messages join the queue). When a baker is free, they call the next number. Once a baker takes a ticket, no other baker handles that customer. If more customers arrive than bakers can handle, the queue grows. If a baker is slow, customers wait longer—but nobody gets served twice, and nobody gets skipped.

This pattern is natural for work distribution. Background jobs. Order processing. Any task where each message should go to a single worker and you want to scale by adding more workers.
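As a rough illustration of competing consumers, here is a minimal in-process sketch using Python's standard `queue.Queue` and threads. It is not a real broker: the function name `run_workers` and the `order-N` task names are made up for the example.

```python
import queue
import threading

def run_workers(tasks, num_workers=3):
    """Distribute tasks among workers; return {worker_id: [tasks handled]}."""
    q = queue.Queue()
    for task in tasks:
        q.put(task)

    # Each worker appends only to its own list, so no extra locking is needed.
    handled = {i: [] for i in range(num_workers)}

    def worker(worker_id):
        while True:
            try:
                task = q.get_nowait()  # a given item is handed to only one caller
            except queue.Empty:
                return                 # queue drained, worker exits
            handled[worker_id].append(task)
            q.task_done()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled

handled = run_workers([f"order-{n}" for n in range(10)])
all_handled = sorted(t for tasks in handled.values() for t in tasks)
```

How the ten tasks split across the three workers varies from run to run, but every task ends up in one worker's list.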

Pub/Sub: One Message, Many Consumers

Publish-subscribe implements one-to-many communication. Publishers send messages to topics. Subscribers receive copies of messages from topics they've subscribed to. Each message can be processed by many consumers independently.

When a user changes their email address, multiple systems might care: billing needs to update invoices, marketing needs to update campaigns, support needs to update tickets. With pub/sub, you publish one "email changed" event. Each system subscribes and handles it independently. They don't know about each other. They don't coordinate. They just listen and react.

This is event-driven architecture. Systems become loosely coupled—connected by events rather than direct calls. Adding a new consumer means subscribing to the topic, not modifying the publisher.
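A minimal in-memory sketch of the email-change example might look like this. The `Broker` class and the topic name `user.email_changed` are hypothetical, and delivery here is a plain synchronous function call rather than anything networked or durable.

```python
from collections import defaultdict

class Broker:
    """Toy topic-based broker: every subscriber gets every message on its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # The publisher does not know who, if anyone, is listening.
        for handler in self._subscribers.get(topic, []):
            handler(message)

broker = Broker()
billing, marketing, support = [], [], []

# Each system registers its own handler; here a handler just records the event.
broker.subscribe("user.email_changed", billing.append)
broker.subscribe("user.email_changed", marketing.append)
broker.subscribe("user.email_changed", support.append)

event = {"user_id": 12345, "email": "new@example.com"}
broker.publish("user.email_changed", event)
```

Adding a fourth consumer is one more `subscribe` call; the `publish` call does not change.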

The Fundamental Difference

Queues distribute work. Pub/sub broadcasts events.

With a queue, consumers compete. Ten messages in the queue, three workers: each worker might get three or four messages. Each message goes to only one worker.

With pub/sub, consumers don't compete. Ten messages published, three subscribers—each subscriber gets all ten messages. Every subscriber processes every message independently.

Some systems blur this line. Kafka topics can be consumed by multiple consumer groups (pub/sub semantics), but within a consumer group, messages are distributed among members (queue semantics). This lets you have both: broadcast to multiple applications, distribute work within each application.
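The two levels can be sketched in a few lines. This is a simulation, not Kafka's client API: real Kafka assigns partitions to group members rather than dealing out individual messages, and the round-robin below is only a stand-in for that. The group and member names are invented.

```python
def deliver(messages, groups):
    """groups: {group_name: [member, ...]} -> {group_name: {member: [messages]}}"""
    received = {g: {m: [] for m in members} for g, members in groups.items()}
    for i, msg in enumerate(messages):
        # Broadcast: every group sees every message...
        for group, members in groups.items():
            # ...but inside a group only one member handles it.
            member = members[i % len(members)]
            received[group][member].append(msg)
    return received

messages = [f"event-{n}" for n in range(10)]
received = deliver(messages, {
    "billing": ["billing-1", "billing-2", "billing-3"],
    "analytics": ["analytics-1"],
})

billing_counts = [len(v) for v in received["billing"].values()]
analytics_count = len(received["analytics"]["analytics-1"])
```

Both groups receive all ten events. Within `billing` they are split across three members, while the single `analytics` member handles all ten on its own.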

Delivery Guarantees

What happens if a message gets lost? What if it gets delivered twice? These questions matter enormously, and different systems answer them differently.

At-most-once: Messages are sent once and never retried. If they get lost, they're gone. Fast and simple, but unreliable. Fine for metrics where occasional loss is acceptable.

At-least-once: Messages are retried until acknowledged. No message is lost, but duplicates are possible. The consumer must handle receiving the same message twice—typically by making processing idempotent (processing the same message twice produces the same result as processing it once).

Exactly-once: Each message is delivered exactly once, with no losses or duplicates. This is what everyone wants but it's genuinely hard to achieve. Systems that claim exactly-once semantics typically require careful configuration and come with performance costs.

Most production systems use at-least-once delivery with idempotent consumers. It's the practical sweet spot: reliable delivery without the complexity of exactly-once.
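One common way to make a consumer idempotent is to remember which message IDs it has already applied. The sketch below assumes every message carries a unique `id` field and keeps the seen set in memory. In a real system that set would need to live in durable storage and be updated atomically with the state change. The class and field names are illustrative.

```python
class IdempotentConsumer:
    """Applies each message ID at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.balance = 0
        self._seen = set()

    def handle(self, message):
        if message["id"] in self._seen:
            return False              # duplicate delivery, skip it
        self.balance += message["amount"]
        self._seen.add(message["id"])
        return True

consumer = IdempotentConsumer()
payment = {"id": "msg-001", "amount": 50}

first = consumer.handle(payment)    # applied
second = consumer.handle(payment)   # redelivered duplicate, ignored
```

After both deliveries the balance is 50, not 100.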

The Acknowledgment Problem

How does the messaging system know a message was successfully processed? It can't read the consumer's mind. The consumer has to tell it.

Auto-acknowledgment marks messages as processed the moment they're delivered to the consumer. Fast, but dangerous—if the consumer crashes after receiving the message but before processing it, the message is gone.

Manual acknowledgment requires the consumer to explicitly signal success after processing. If the consumer crashes before acknowledging, the message is redelivered to another consumer.

But here's the uncomfortable truth: manual acknowledgment isn't perfect either. What if the consumer successfully processes the message, but crashes before sending the acknowledgment? The message gets redelivered, processed again. You're back to needing idempotent processing.

There's no escaping this. The acknowledgment has to happen at some point, and whatever happens between processing and acknowledgment is a vulnerability. You choose where to put the risk.
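The crash-before-acknowledge case can be walked through with a toy queue. `AckQueue` and `requeue_unacked` are hypothetical names; `requeue_unacked` stands in for whatever a real broker does when an acknowledgment never arrives, such as a visibility timeout expiring or a connection dropping.

```python
from collections import deque

class AckQueue:
    """Toy queue with manual acknowledgment and redelivery of unacked messages."""

    def __init__(self, messages):
        self._ready = deque(messages)
        self._in_flight = {}

    def receive(self):
        msg = self._ready.popleft()
        self._in_flight[msg["id"]] = msg   # delivered but not yet acknowledged
        return msg

    def ack(self, message_id):
        self._in_flight.pop(message_id)

    def requeue_unacked(self):
        self._ready.extend(self._in_flight.values())
        self._in_flight.clear()

    def outstanding(self):
        return len(self._ready) + len(self._in_flight)

q = AckQueue([{"id": "m1", "body": "charge card"}])
processed = []

msg = q.receive()
processed.append(msg["body"])   # processing succeeds...
# ...but the consumer crashes here, before it can call q.ack(msg["id"]).
q.requeue_unacked()

msg = q.receive()               # the same message is delivered again
processed.append(msg["body"])
q.ack(msg["id"])
```

The card is charged twice unless the processing step itself is idempotent.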

Message Ordering

Do messages arrive in the order they were sent? It depends.

No ordering guarantee: Messages can arrive in any order. This allows maximum parallelism—any consumer can process any message—but makes sense only when messages are truly independent.

Partition ordering: Messages with the same key go to the same partition, and order is preserved within each partition. All events for user 12345 arrive in order; events for different users can be processed in parallel. This is Kafka's model and hits the practical sweet spot for most use cases.
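Key-based partitioning comes down to a stable hash of the key modulo the partition count. The sketch below uses CRC32 from Python's standard library purely for illustration. Real systems use their own hash functions, and the keys and event names here are made up.

```python
import zlib

def partition_for(key, num_partitions):
    """Map a key to a partition; the same key always lands on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

NUM_PARTITIONS = 4
events = [
    ("user-12345", "login"),
    ("user-67890", "login"),
    ("user-12345", "update_email"),
    ("user-12345", "logout"),
]

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, event in events:
    # Appending preserves arrival order inside each partition.
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, event))

user_partition = partitions[partition_for("user-12345", NUM_PARTITIONS)]
user_events = [event for key, event in user_partition if key == "user-12345"]
```

All three events for `user-12345` sit in one partition in their original order, whatever happens to other users' events.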

Strict global ordering: All messages arrive in order, globally. This requires all messages to flow through a single point, killing parallelism. Rarely needed, rarely worth the cost.

Persistence and Durability

Where do messages live?

In-memory only: Messages exist in RAM. Blazingly fast, but if the broker crashes, messages are gone. Acceptable for ephemeral data—real-time notifications where loss is tolerable.

Persistent: Messages are written to disk. Survives broker restarts. Slower than in-memory, but messages don't vanish.

Replicated: Messages are stored on multiple servers. Survives individual server failures. This is what you need for critical data.

The question is: can you afford to lose messages? If you're processing financial transactions, the answer is no. If you're tracking mouse movements for analytics, maybe losing a few is fine.

Common Queue Patterns

Work queues distribute tasks among workers. More workers means faster processing. Workers compete for messages.

Priority queues process important messages first. A password reset request jumps ahead of a weekly digest email.
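A small sketch with Python's `heapq` shows the idea. Lower numbers mean higher priority here, which is a convention chosen for the example, and the counter keeps messages of equal priority in arrival order.

```python
import heapq
import itertools

class PriorityQueue:
    """Min-heap queue: the lowest priority number is dequeued first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO within a priority

    def put(self, priority, message):
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def get(self):
        return heapq.heappop(self._heap)[2]

pq = PriorityQueue()
pq.put(5, "weekly digest for alice")
pq.put(0, "password reset for bob")
pq.put(5, "weekly digest for carol")

order = [pq.get() for _ in range(3)]
```

The password reset comes out first even though it was enqueued second.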

Delayed queues hold messages until a specified time. Send a reminder email in 24 hours. Retry a failed operation in 5 minutes, then wait exponentially longer after each further failure.
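Both ideas fit in a short sketch: a heap ordered by delivery time, and a backoff function that doubles the delay per attempt. The 5-minute base matches the example above. The 1-hour cap, the class name `DelayedQueue`, and the explicit `now` parameter (used instead of a real clock so the example is deterministic) are assumptions.

```python
import heapq

def backoff_delay(attempt, base_seconds=300, cap_seconds=3600):
    """Delay before retry number `attempt` (0-based): base * 2^attempt, capped."""
    return min(cap_seconds, base_seconds * 2 ** attempt)

class DelayedQueue:
    """Holds messages until their delivery time has passed."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker for messages due at the same time

    def put(self, message, deliver_at):
        heapq.heappush(self._heap, (deliver_at, self._seq, message))
        self._seq += 1

    def pop_due(self, now):
        """Return every message whose delivery time is <= now, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due

delays = [backoff_delay(attempt) for attempt in range(5)]

dq = DelayedQueue()
dq.put("reminder email", deliver_at=24 * 3600)
dq.put("retry payment", deliver_at=backoff_delay(0))

too_early = dq.pop_due(now=299)
after_five_minutes = dq.pop_due(now=300)
after_one_day = dq.pop_due(now=24 * 3600)
```

Production retry schemes often add random jitter to these delays so that many failed messages do not all retry at the same instant.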

Dead letter queues collect messages that repeatedly fail processing. After N retries, the message moves to a separate queue for manual inspection. This prevents poison messages from blocking the queue forever.
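The retry-then-park logic can be sketched as a loop. The function name `consume`, the limit of three attempts, and the `"poison"` message are all illustrative. Real brokers usually track the attempt count for you and let you configure the threshold.

```python
from collections import deque

def consume(messages, handler, max_attempts=3):
    """Process messages; move any that fail max_attempts times to a dead letter list."""
    main = deque((msg, 0) for msg in messages)
    done, dead_letters = [], []
    while main:
        msg, attempts = main.popleft()
        try:
            handler(msg)
            done.append(msg)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead_letters.append(msg)      # give up, park it for inspection
            else:
                main.append((msg, attempts))  # requeue for another try
    return done, dead_letters

def handler(msg):
    # Hypothetical handler that can never process one particular message.
    if msg == "poison":
        raise ValueError("cannot parse message")

done, dead_letters = consume(["a", "poison", "b"], handler)
```

The healthy messages `a` and `b` complete, and the poison message ends up in the dead letter list after its third failure instead of cycling forever.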

Common Pub/Sub Patterns

Fan-out broadcasts one event to many consumers. User signs up: send welcome email, create analytics event, notify sales team—all from one published event.

Topic-based routing lets subscribers filter by topic. Subscribe to orders.created but not orders.updated.

Content-based routing filters by message content, not just topic. Subscribe to all orders, but only where amount > 1000.
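Topic-based and content-based routing can be combined in one toy broker: match on the topic first, then apply an optional predicate to the message. `RoutingBroker` and its `where` parameter are hypothetical names for this sketch, not the API of any particular product.

```python
class RoutingBroker:
    """Delivers a message when the topic matches and the optional filter passes."""

    def __init__(self):
        self._subscriptions = []  # (topic, predicate or None, handler)

    def subscribe(self, topic, handler, where=None):
        self._subscriptions.append((topic, where, handler))

    def publish(self, topic, message):
        for sub_topic, where, handler in self._subscriptions:
            if sub_topic == topic and (where is None or where(message)):
                handler(message)

broker = RoutingBroker()
all_created, large_orders = [], []

# Topic-based: everything on orders.created.
broker.subscribe("orders.created", all_created.append)
# Content-based: orders.created, but only where amount > 1000.
broker.subscribe("orders.created", large_orders.append,
                 where=lambda order: order["amount"] > 1000)

broker.publish("orders.created", {"order_id": 1, "amount": 250})
broker.publish("orders.created", {"order_id": 2, "amount": 4000})
broker.publish("orders.updated", {"order_id": 1, "amount": 300})
```

The `orders.updated` event reaches nobody, because no one subscribed to that topic.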

Durable subscriptions store messages for offline subscribers. If a service is down for maintenance, it catches up on missed messages when it returns.
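One way to picture a durable subscription is an append-only log plus a saved read position per subscriber. The sketch below keeps both in memory, so it only illustrates the bookkeeping. `DurableTopic`, `poll`, and the subscriber name `search-indexer` are invented for the example.

```python
class DurableTopic:
    """Keeps every message and remembers how far each subscriber has read."""

    def __init__(self):
        self._log = []
        self._offsets = {}

    def subscribe(self, name):
        # A new subscriber starts at the end of the log; an existing one keeps its place.
        self._offsets.setdefault(name, len(self._log))

    def publish(self, message):
        self._log.append(message)

    def poll(self, name):
        """Return messages this subscriber has not seen yet and advance its offset."""
        start = self._offsets[name]
        self._offsets[name] = len(self._log)
        return self._log[start:]

topic = DurableTopic()
topic.subscribe("search-indexer")

topic.publish("product-1 updated")
first_batch = topic.poll("search-indexer")

# The subscriber goes down for maintenance while publishing continues.
topic.publish("product-2 updated")
topic.publish("product-3 updated")

catch_up = topic.poll("search-indexer")   # back online
nothing_new = topic.poll("search-indexer")
```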

Scaling

Partitioning spreads messages across multiple nodes. Topic X has 10 partitions across 10 servers. In the ideal case that gives up to ten times the throughput and storage, provided the load spreads evenly across partitions.

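Consumer groups (Kafka's model) let you scale consumers horizontally. Add more consumers to a group, and they divide the partitions among themselves. Each message is still delivered to only one consumer within the group. The partition count caps the useful group size: consumers beyond that number sit idle.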

Backpressure handles producers that outpace consumers. Queue size limits can block producers. Rate limiting can throttle them. Auto-scaling can spin up more consumers. Without these mechanisms, queues grow unbounded until something breaks.
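The simplest of these mechanisms, a size limit, can be demonstrated with a bounded `queue.Queue` from Python's standard library. This sketch rejects work when the queue is full. Calling `put()` without `_nowait` would block the producer instead. The limit of 2 is arbitrary.

```python
import queue

q = queue.Queue(maxsize=2)   # bounded: at most two messages waiting
accepted, rejected = [], []

for n in range(5):
    try:
        q.put_nowait(n)      # raises queue.Full instead of growing without bound
        accepted.append(n)
    except queue.Full:
        rejected.append(n)   # the producer must slow down, retry later, or drop
```

With no consumer draining the queue, only the first two messages are accepted and the producer finds out immediately that it is going too fast.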

Choosing a System

RabbitMQ: Full-featured message broker. Supports multiple protocols (AMQP, MQTT, STOMP). Flexible routing with exchanges and queues. Good for traditional messaging needs.

Apache Kafka: High-throughput distributed streaming. Messages are persisted and can be replayed. Consumer groups provide both pub/sub and queue semantics. Good for event sourcing, streaming data, and high-volume workloads.

Amazon SQS: Managed queue service. Scales automatically, with no servers to operate. Basic features, but you never have to think about infrastructure.

Google Cloud Pub/Sub: Managed pub/sub. Global distribution, high reliability. Good for event-driven architectures on Google Cloud.

Redis Pub/Sub: Lightweight, in-memory. Extremely low latency. No persistence—subscribers only get messages while connected.

NATS: High-performance, simple. Growing ecosystem with JetStream adding persistence. Good for microservices communication.

When to Use Which

Use message queues when:

Each piece of work should be handled by a single consumer
You need load leveling between components
Failed processing should be retried
Order matters for related messages
You're distributing tasks among workers

Use pub/sub when:

Multiple systems need the same data
You're broadcasting events
You're building loosely coupled, event-driven systems
You need real-time notifications to many recipients
You want to add consumers without changing producers

Many systems use both. Orders go through a queue for processing. Order events are published for other systems to consume. The patterns complement each other.

Monitoring What Matters

Queue depth: Unprocessed messages waiting. If it's growing, consumers can't keep up.

Message age: How long the oldest message has been waiting. Old messages mean delays are affecting users.

Consumer lag (Kafka): How far behind consumers are from the latest messages. Rising lag means you're falling behind.

Processing rate: Messages per second. Is it matching your publish rate?

Error rate: Failed processing attempts. Spikes indicate problems in consumer code or downstream services.

These metrics tell you if your system is healthy. Growing queue depth with stable processing rate means you need more consumers. High error rates mean something in the processing pipeline is broken. Message age tells you if your latency SLOs are at risk.
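Those rules of thumb can be turned into a simple check. The thresholds below (60 seconds for message age, 5% for error rate) and all field names are placeholders chosen for illustration. Real values depend on your SLOs and on what your broker actually exposes.

```python
from dataclasses import dataclass

@dataclass
class QueueMetrics:
    depth_now: int               # unprocessed messages right now
    depth_before: int            # unprocessed messages at the previous sample
    oldest_message_age_s: float
    publish_rate: float          # messages per second in
    processing_rate: float       # messages per second out
    error_rate: float            # failed attempts / total attempts

def diagnose(m, max_age_s=60.0, max_error_rate=0.05):
    """Return a list of human-readable findings; an empty list means healthy."""
    findings = []
    if m.depth_now > m.depth_before and m.processing_rate < m.publish_rate:
        findings.append("consumers falling behind: add consumers")
    if m.error_rate > max_error_rate:
        findings.append("high error rate: check consumer code and downstream services")
    if m.oldest_message_age_s > max_age_s:
        findings.append("oldest message too old: latency SLO at risk")
    return findings

backlogged = QueueMetrics(depth_now=5000, depth_before=1200,
                          oldest_message_age_s=95.0,
                          publish_rate=400.0, processing_rate=250.0,
                          error_rate=0.01)
healthy = QueueMetrics(depth_now=10, depth_before=12,
                       oldest_message_age_s=0.5,
                       publish_rate=400.0, processing_rate=400.0,
                       error_rate=0.0)

backlog_findings = diagnose(backlogged)
healthy_findings = diagnose(healthy)
```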
