The things I don't understand about the CAP theorem
Network partitions, disappearing documents, and the misunderstood parts of the fundamental theorem in distributed systems
Being the IT support of the family, I’m on call 24/7 to help with forgotten passwords, printers, online payments, bad UIs, and confusing error messages. A few weeks ago, I got paged because my mom had a problem with a web app she has to use at work.
“Hmm, when I create a new document, it says there are 6, but when I open the page, there are only 5.”
So she immediately clicks the button to create a document again.
“Wait a moment, this might be eventually consistent. Can you refresh again?”
And there it is. The document’s there.
“Why does this happen? Is it because my computer’s slow?”
The law of leaky abstractions doesn’t apply only to code, but to systems too, and because this product didn’t hide its architectural decisions well enough, I had to talk about consistency with my mom.
For the engineers and product people, prioritizing availability over consistency was an easy trade-off. But for anyone outside of their social circle, this behavior can be utterly confusing.
Why we need trade-offs
Up to a certain level of scale (organizational or technical), you can run everything on a single machine, but scale naturally leads to distribution.
I’ve written another article about that, so I won’t go into details here.
Increasing the number of REST API replicas may not be good enough if all your requests end up in the same underlying database. If your bottleneck isn’t in the processing, serving all your users from the same store, which is bound to a specific geographical region, isn’t optimal.
Some of your users will consistently experience higher latency than others because their data will have to travel longer or because the database got winded.
So, again, this leads to distribution.
We can create read replicas of our storage in different regions and serve data from multiple sources instead of one.
For the sake of simplicity, imagine that each replica can hold a copy of the entire database.
Write requests will go to the leader replica (also called the master in older resources), and they then get replicated to the read replicas. The traffic is split between them, and the latency is low.
But this replication needs to happen over an inherently unreliable network.
So the communication is bound to fail at some point.
Network problems aren’t frequent enough to cause constant disruptions because that would make any distributed system unusable, but they’re bound to happen. When they do, for a short period of time, one part of our system will become separated from the others. Our leader replica won’t be able to communicate with the followers and replicate data.
This is what we call a network partition.
Not to be mistaken with partitioning a database. When DBAs talk about partitions, they mean splitting data across multiple nodes for performance or storage reasons.
Writing data that can’t be replicated
The leader receives a request, commits the write, but then the network blips, and it can’t send that data to its replica. Both nodes are up and running, but the communication between them has failed.
And the moment this happens, the data in the read replica becomes stale. It doesn’t have the latest data.
Now, we’re forced to pick between two bad options.
We can keep serving read requests from the read replica (which is stale). Our application will continue to work, but some users will see old data.
We can refuse to serve reads from the read replica until it can talk to the leader again. We avoid returning stale data, but for some users, our application will randomly stop working.
This is the fundamental trade-off in distributed systems.
We cannot always have the latest data and be available to return a response.
We have to pick one.
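To make the choice concrete, here’s a minimal Python sketch (hypothetical names, not any real database’s API) of a read replica that can be configured for either side of the trade-off:

```python
from dataclasses import dataclass, field

@dataclass
class ReadReplica:
    data: dict = field(default_factory=dict)  # local (possibly stale) copy
    leader_reachable: bool = True             # False during a network partition
    prefer_availability: bool = True          # AP if True, CP if False

    def read(self, key):
        if self.leader_reachable or self.prefer_availability:
            # AP choice: answer with whatever we have, even if it's stale
            return self.data.get(key)
        # CP choice: refuse to answer rather than risk returning stale data
        raise ConnectionError("cannot confirm freshness: leader unreachable")

replica = ReadReplica(data={"doc_count": 5}, leader_reachable=False,
                      prefer_availability=True)
print(replica.read("doc_count"))  # 5 — stale, but the app keeps working
```

Flip `prefer_availability` to `False` and the same read raises an error during the partition: no stale data, but no response either.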
Engineers working on financial software would most likely prefer consistency because in their domain, the correctness of every digit is crucial. Those working on social media feeds would favor availability because the exact number of likes on a post isn’t that important.
The CAP theorem
In July 2000, Eric Brewer gave a keynote called “Towards Robust Distributed Systems”, introducing the CAP theorem.
At that time, I was still in kindergarten, the internet was a very different place, and downloading a song took a full day.
Brewer’s personal webpage looks just like you’d expect from someone with an enormous impact on the industry. When you see unstyled HTML, you know the person is for real. I’m not that good of an engineer, so I still need to write CSS for my blog.
In 2002, Seth Gilbert and Nancy Lynch published a formal proof of Brewer’s idea.
This paper is important for the software engineering field because it establishes the CAP trade-off as a theorem. It described this fundamental idea that many developers would’ve had to tackle on their own as the scale of software increased.
Brewer described the fundamental qualities of distributed systems:
Consistency. Every read request receives either the value of the most recent write or a response indicating failure.
Availability. Every request receives a non-error response, though not necessarily one reflecting the most recent write.
Partition Tolerance. The system continues to work when parts of it become unreachable due to network failures.
But because of the problem we described earlier, we cannot have all three at the same time.
We can only pick two.
If this is the case, can’t we decide to be both consistent and available?
While the CAP trade-off is frequently cited as a “pick 2 of 3” problem, we always need partition tolerance (P). If your application can’t handle network failures, it will only work until the first packet loss.
Partitions will happen.
So we have only two possible choices when we’re making this trade-off.
Our system can be:
AP - available, but shows stale data when the network fails.
CP - shows the latest data but occasionally stops responding.
These words don’t mean what you think
For the last 20 years, the CAP trade-off has cemented as a fundamental idea that anyone dabbling in distributed systems needs to be familiar with. But despite the focus on it, the terminology has been a source of confusion (at least to me).
It uses familiar terms with a different meaning than the one I was used to.
We already mentioned that the term partitioning has two separate meanings, but the same goes for consistency and availability.
CAP-Consistency ≠ ACID Consistency
When you hear “consistency”, you might think of the C in ACID - ensuring transactions move the database from one valid state to another.
CAP-consistency means linearizability.
“If operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state.”
Every read should return the most recent write. CAP-consistency is about the order of operations, not about transaction validity.
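As a toy illustration (a single key, with operations already laid out in completion order — checking real linearizability over concurrent histories is much harder), a history is linearizable only if every read returns the latest completed write:

```python
# A toy history: a list of ("write", value) and ("read", value) entries,
# ordered by completion time. Linearizability (for this simplified case)
# requires every read to return the value of the most recent write.
def is_linearizable_single_key(history):
    latest = None
    for op, value in history:
        if op == "write":
            latest = value
        elif op == "read" and value != latest:
            return False  # a stale read breaks linearizability
    return True

ok = is_linearizable_single_key([("write", 5), ("read", 5),
                                 ("write", 6), ("read", 6)])   # True
stale = is_linearizable_single_key([("write", 5), ("write", 6),
                                    ("read", 5)])              # False
```

The second history is exactly my mom’s bug: the write of the 6th document completed, but her read still saw 5.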
CAP-Availability ≠ High Availability
When we define a system as highly available, we usually mean that it’s up and running 99.9% of the time, or that it has failover and redundancy.
CAP-availability means that every request to a non-failing node must receive a response.
It doesn’t matter how fast the response is, whether it returns the latest data, or whether your system meets an SLA. A system can be “unavailable” in CAP terms while still meeting a 99.99% uptime SLA.
Single-Leader Replication
Databases are often described as CP or AP.
They’re either consistent or available. Single-leader setups (like Postgres) are often cited as being CP. You have one primary node (the leader) that accepts writes, and one or more read-only replicas (followers) that copy its changes.
The leader sends a WAL (Write-Ahead Log) that describes the way the data needs to be mutated to reach the same state.
This WAL is sent to the replicas, and when they acknowledge it, the write is committed, and the data becomes available. So, no matter which replica handles future requests, all of them have the same data.
And if there’s a network partition, you can decline writes.
But there are nuances.
This setup uses synchronous replication, but it can be configured to replicate asynchronously. When the leader receives a request, it immediately commits the write, and the replication happens in the background.
A follower may be a little behind, but it can still handle requests.
In the asynchronous replication case, our system will not be CAP-consistent (not linearizable), but it will be available.
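Here’s a rough sketch of that difference (simplified Python, not Postgres’s actual replication protocol): the synchronous leader fails writes during a partition, while the asynchronous one commits anyway and lets followers catch up later:

```python
class Follower:
    def __init__(self, reachable=True):
        self.reachable = reachable  # False while partitioned from the leader
        self.log = []

    def apply(self, entry):
        if not self.reachable:
            raise ConnectionError("network partition")
        self.log.append(entry)

class Leader:
    def __init__(self, followers, synchronous=True):
        self.followers = followers
        self.synchronous = synchronous
        self.log = []

    def write(self, entry):
        self.log.append(entry)
        for follower in self.followers:
            if self.synchronous:
                # CP-leaning: a partition makes the write fail
                follower.apply(entry)
            else:
                try:
                    follower.apply(entry)  # AP-leaning: best effort
                except ConnectionError:
                    pass  # follower is now behind; retry from the log later
        return "committed"
```

With `synchronous=True` and an unreachable follower, the write raises; with `synchronous=False` it succeeds, and that follower serves stale reads until replication catches up.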
This was one source of confusion for me. After reading about databases labeled as CP or AP, I found it confusing that the same database can behave as either. Even Martin Kleppmann believes that we shouldn’t describe them using these abbreviations.
Other Replication Philosophies
To get a better understanding of these trade-offs, it’s worth looking into alternative replication philosophies. While single-leader databases make CAP-consistency more easily achievable (or at least more intuitive), multi-leader databases lean more towards the other side of the trade-off, CAP-availability.
DynamoDB is one such database.
It’s a database running on multiple nodes, and each piece of data is replicated on a few of them (but not all). When we write to Dynamo, the write goes to a quorum of replicas. There isn’t a single entrypoint that initiates the replication. Any of the replicas that have the data can do it.
While these nodes sync, they will continue serving read requests, so some of them may return stale data while the updated data gets to them.
When we say stale data, this sounds like they’d be serving your old documents for a couple of days, but in reality, the inconsistency window is very brief.
Dynamo favors availability and offers what we call eventual consistency.
Sooner or later, all nodes will have the latest data, but if your request comes before they sync, they will give you whatever they have at the current point in time.
But much like single-leader databases can do asynchronous replication, Dynamo can be configured to offer stronger consistency guarantees.
“It depends on your settings. If you accept a single replica for reads and writes (R=W=1), they are indeed CAP-available. However, if you require quorum reads and writes (R+W>N), and you have a network partition, clients on the minority side of the partition cannot reach a quorum, so quorum operations are not CAP-available (at least temporarily, until the database sets up additional replicas on the minority side).”
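The quorum condition in that quote comes down to simple arithmetic: with N replicas, writes going to W of them and reads to R of them, R + W > N guarantees that every read quorum overlaps every write quorum, so at least one replica in any read set has the latest value:

```python
# With N replicas, a write goes to W of them and a read consults R of them.
# If R + W > N, the read set and write set must share at least one replica,
# so the read is guaranteed to see the most recent write.
def quorums_overlap(n, r, w):
    return r + w > n

print(quorums_overlap(n=3, r=2, w=2))  # True: reads always see the latest write
print(quorums_overlap(n=3, r=1, w=1))  # False: reads may be stale
```

This is also why the quote’s R=W=1 configuration is highly available but only eventually consistent: 1 + 1 is not greater than N.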
I’d love to write in detail about how eventual consistent systems sync data, but I don’t understand it well enough myself.
The CAP Spectrum
The more I read about these trade-offs, the more I see the wisdom in Kleppmann’s words. Labeling databases as CP or AP is an overly simplistic way to think about them.
Consistency and availability (in the context of CAP) are a spectrum.
“Even Eric Brewer admits that CAP is misleading and oversimplified. In 2000, it was meant to start a discussion about trade-offs in distributed data systems, and it did that very well. It wasn’t intended to be a breakthrough formal result, nor was it meant to be a rigorous classification scheme for data systems. 15 years later, we now have a much greater range of tools with different consistency and fault-tolerance models to choose from. CAP has served its purpose, and now it’s time to move on.”
The examples with Postgres and Dynamo show how both can offer various levels of consistency based on their configuration, and the same goes for other databases.
Amazon’s SimpleDB can provide both consistent and eventually consistent reads. If you want strong consistency, you will, however, experience higher read latency and a reduction in read throughput.
“In the research community over the past thirty years, a number of consistency models have been proposed for distributed and replicated systems. These offer consistency guarantees that lie somewhere in between strong consistency and eventual consistency. For example, a system might guarantee that a client sees data that is no more than 5 minutes out-of-date or that a client always observes the results of its own writes. Actually, some consistency models are even weaker than eventual consistency, but those I ignore as being less-than-useful.”
Some engineers go as far as challenging whether CAP is still a useful way to think about distributed systems nowadays, but I believe that it absolutely is.
It’s the fundamental idea that all these tools are built around.
PACELC
Whether we prioritize consistency or availability has consequences beyond network partitions.
If we want strong consistency, even when the system is working normally, latency will be higher because synchronous replication requires more time, and coordination between the nodes will make operations slower.
So it’s important to consider how these trade-offs affect a system even when there is no network partition.
The PACELC design principle is an extension of the CAP theorem that goes into these trade-offs.
“PACELC states that in case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C) (as per the CAP theorem), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and loss of consistency (C).”
So, during a network partition, you need to choose between consistency and availability. But when there’s no partition, you need to choose between consistency and low latency.
You’re constantly tuning how much consistency you’re willing to trade for how much availability and latency.
Again, engineers working on banking software would prefer consistency even at the cost of higher latency in normal conditions. And those working on user-facing products would prefer lower latency.
What this means for me
The truth is that I’ve never had to implement these trade-offs myself.
I’d love to get the chance to do this in a large-scale production environment one day when my nervous system can once again handle the pressure of working on such software.
But the reality is that I’ll probably never have to implement a consensus algorithm myself.
These trade-offs are put under beautiful YAML-controlled abstractions or hidden behind the magic of the cloud provider.
But you still need to make educated decisions when you’re building software and picking technologies. When your application scales and your architecture becomes distributed, you need to know how far to lean toward each side of the spectrum.
“That’s why your document doesn’t appear immediately, mom.”