CAP Theorem

Updated 8 hours ago

The CAP theorem describes an impossible choice that every distributed system faces.

Imagine you have two database nodes. A write arrives at Node A. Before Node A can tell Node B about it, the network between them fails. Now a read arrives at Node B.

Node B has two options:

Respond with the data it has (which is now stale)
Refuse to respond until it can verify it has current data

There is no third option. Node B cannot magically know what Node A received. The network is lying to it—that's what a partition is. Node B can't tell whether Node A is down, slow, or just unreachable.

This is CAP.

The Three Properties

Consistency: Every read receives the most recent write. All nodes see the same data at the same time.

Availability: Every request to a non-failing node receives a response. The system never refuses to answer.

Partition Tolerance: The system keeps operating when network failures prevent nodes from communicating.

Why You Can't Have All Three

The theorem states: during a network partition, you must choose between consistency and availability. You cannot have both.

This isn't a design flaw to be engineered away. It's a fundamental constraint of physics. Information cannot travel faster than the network allows. When the network fails, nodes cannot coordinate.

During a partition, your node has a request and a choice: respond with possibly wrong data, or don't respond at all. There is no third option. This is CAP.

CP Systems: Choose Consistency

CP systems refuse requests rather than return potentially stale data.

When a partition occurs, nodes that can't reach a quorum stop accepting writes. They'd rather be unavailable than inconsistent.

Examples:

HBase: Refuses writes when it can't reach enough replicas
Zookeeper: Refuses operations without majority quorum
MongoDB (with majority write concern): Requires majority acknowledgment

Choose CP when:

Wrong data causes serious problems (financial transactions, inventory)
Brief unavailability is acceptable
You need strong consistency guarantees

AP Systems: Choose Availability

AP systems respond to every request, accepting that different nodes might have different data.

When a partition occurs, all nodes keep accepting requests. They'll sort out the conflicts later.

Examples:

Cassandra: Accepts writes everywhere, resolves conflicts on read
DynamoDB: Offers eventual consistency by default
DNS: Different servers might temporarily return different answers

Choose AP when:

Availability matters more than immediate consistency
Your application can handle temporary inconsistency
The system must stay responsive during network issues

What About CA?

A system that's consistent and available but not partition-tolerant would require a network that never fails.

Networks fail. Cables get cut. Switches crash. Data centers lose connectivity. Any system distributed across a real network will experience partitions.

Single-node databases (PostgreSQL on one server) are "CA" only because there's nothing to partition. The moment you add a second node, you face CAP.

The Partition Is the Key

CAP only applies during partitions. When the network is healthy, systems provide both consistency and availability.

Normal operation: No partition. Consistency and availability both work.

During partition: Must choose. Consistency or availability.

After partition heals: System reconciles any divergence.

This means the choice isn't permanent. A system can be strongly consistent when healthy and degrade to availability-focused during failures—or vice versa.

PACELC: The Rest of the Story

CAP only describes partition scenarios. PACELC extends it:

Partition → choose Availability or Consistency Else → choose Latency or Consistency

Even without partitions, there's a trade-off. Synchronous replication guarantees consistency but adds latency (wait for all nodes to acknowledge). Asynchronous replication is faster but means reads might see stale data.

A system might be PA/EL (available during partitions, low latency otherwise) or PC/EC (consistent always, accepting latency costs).

Practical Decisions

Multi-region deployment: Geographic distribution guarantees partitions will happen. Decide now: refuse writes in partitioned regions, or accept divergent data?

Caching: Every cache is an AP system. It stays available but might serve stale data. That's the trade-off you accepted when you added caching.

Tunable consistency: Cassandra and DynamoDB let you choose per-operation. Require quorum for critical reads. Accept eventual consistency for others.

Conflict resolution: AP systems must handle conflicts when partitions heal:

Last-write-wins (simple, potentially lossy)
Version vectors (track causality)
CRDTs (mathematically guarantee convergence)

Common Misconceptions

"Pick any two": Misleading framing. Partitions will happen. The real choice is between consistency and availability during partitions.

"Eventual consistency is slow": "Eventual" can mean milliseconds. It just means "not guaranteed immediate."

"CP means total unavailability": Only minority partitions become unavailable. The majority partition keeps serving requests.

"CAP is about databases": CAP applies to any distributed system. Caches, message queues, microservices—anything distributed faces these trade-offs.

Testing Your System's Behavior

You need to know how your system actually behaves during partitions:

Simulate partitions: Tools like Jepsen test real partition scenarios
Verify consistency: During partitions, do reads return stale data?
Verify availability: Do all partitions accept writes, or do some refuse?
Test recovery: After healing, how does the system reconcile divergent data?

CAP isn't prescriptive—it doesn't say which choice is correct. It's descriptive—it reveals trade-offs that exist whether you acknowledge them or not. Understanding CAP means understanding what your system will actually do when the network lies to it.

Frequently Asked Questions About CAP Theorem

Was this page helpful?

😔

🤨

😃