Distributed Systems Series — Part 3.3: Replication, Consistency & Consensus
The Question Replication Cannot Answer Alone
Post 3.1 established why replication is necessary — availability, fault tolerance, durability, performance, and geographic distribution all require it. But replication immediately raises a question it cannot answer by itself.
When a client reads data from a replicated system, which value will it see?
If a write just happened on the primary and has not yet propagated to all replicas, a client reading from a replica might see the old value. Is that acceptable? Under what circumstances? For how long? And what guarantees does the system provide about when the new value will be visible?
These questions are answered by consistency models — formal contracts between the system and its clients that define what clients can expect to observe when they read data from a replicated system.
Consistency models are not implementation details. They are promises. Breaking them — even temporarily, even rarely — produces bugs that are extraordinarily difficult to reproduce and diagnose, because they depend on precise timing and network conditions that are hard to recreate in a test environment.
Consistency is about promises to users. Breaking a promise is often worse than a temporary failure, because clients build their own correctness on top of it.
A Spectrum, Not a Binary Choice
Consistency is commonly discussed as a binary — strong or eventual. This framing is useful but misleading. In reality, consistency exists on a spectrum with many distinct points, each offering different guarantees at different costs.
Understanding the full spectrum — and knowing which model fits which problem — is one of the most practically valuable skills in distributed systems engineering. It is what allows you to look at a system’s behaviour during an incident and immediately understand whether you are seeing expected behaviour for the chosen consistency model or a genuine bug.
Strong Consistency and Linearisability
Strong consistency is the guarantee most engineers intuitively expect from a database: every read returns the result of the most recent write, regardless of which replica serves the read. The system appears to have a single copy of the data even though it has many.
The precise technical term for the strongest form of this guarantee is linearisability. A system is linearisable if every operation appears to take effect instantaneously at some point between its invocation and its completion — and that point is consistent with a global ordering of all operations. In plain terms: if you write a value and then read it back, you always see the value you wrote. If two clients write concurrently, every subsequent read from any client sees one of those writes, and whichever value wins is the same value seen by all clients forever after.
Linearisability is what etcd and ZooKeeper provide. It is what PostgreSQL provides when running with synchronous replication. It is what Google Spanner provides globally using TrueTime. It is what you get when a system uses a single leader for all writes and routes all reads through that leader.
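The single-leader pattern mentioned above can be sketched in a few lines. This is an illustrative toy, not any real database's implementation: all reads and writes pass through one node under one lock, and a write is acknowledged only after every follower has applied it, so every read trivially observes the latest write.

```python
import threading

class LinearisableRegister:
    """Toy single-leader register: all operations go through one node
    under one lock, and writes are replicated synchronously, so every
    read returns the most recent write."""

    def __init__(self, replicas=None):
        self._lock = threading.Lock()
        self._value = None
        self._replicas = replicas or []  # followers receiving synchronous copies

    def write(self, value):
        with self._lock:
            self._value = value
            # Synchronous replication: acknowledge the client only after
            # every replica has applied the write. This is where the
            # latency cost of linearisability comes from.
            for replica in self._replicas:
                replica.apply(value)

    def read(self):
        with self._lock:
            return self._value
```

The lock stands in for the coordination a real system performs across the network; the latency of that coordination is exactly the cost described below.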
The cost is real. To provide linearisability, the system must coordinate writes across replicas before acknowledging them to clients. This coordination takes time — often tens to hundreds of milliseconds in cross-datacenter deployments. During a network partition, a linearisable system must choose between refusing writes (to preserve correctness) or accepting writes and risking inconsistency. The CAP theorem, which we will explore in Post 3.4, formalises exactly this trade-off.
Where strong consistency is required: financial transactions where double-spend must be impossible, distributed locks and leader election where exactly one winner must be guaranteed, configuration management where all nodes must see the same value simultaneously, and any system where correctness is more important than availability.
Eventual Consistency
Eventual consistency is a much weaker guarantee: if no new writes occur, all replicas will eventually converge to the same value. The system makes no promise about when convergence happens or what clients see in the interim.
This sounds alarming, but it is a deliberate and correct design choice for a large class of systems. The 2007 Amazon Dynamo paper — one of the most influential distributed systems papers ever published — describes exactly why Amazon chose eventual consistency for their shopping cart service. The design reasoning is instructive: Amazon observed that customers were more harmed by being unable to add items to their cart (availability failure) than by occasionally seeing a cart with a slightly stale item list (consistency anomaly). Eventual consistency, combined with a conflict resolution strategy that merged cart contents rather than discarding one version, produced better outcomes for the business than strong consistency would have.
DynamoDB, Cassandra, Riak, and CouchDB all implement eventual consistency as their default model. These systems achieve high availability and high write throughput precisely because they do not need to coordinate writes across replicas before acknowledging them. Each replica accepts writes independently. Replication happens asynchronously in the background. Replicas converge over time.
The challenges of eventual consistency are real and must be designed around explicitly. Clients can read stale values — a value written seconds ago may not be visible on a different replica for hundreds of milliseconds or longer. Concurrent writes to the same key on different replicas produce conflicts that must be resolved — either by the system (last-write-wins, vector clock merging) or by the application. Applications that cannot tolerate stale reads must implement their own mechanisms — reading from the primary, using quorum reads, or tracking their own write tokens.
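The cart-merge idea from the Dynamo paper can be sketched as a union of concurrently written versions. This is a deliberate simplification: the real system uses vector clocks to detect that two versions are actually concurrent before merging them.

```python
def merge_carts(version_a, version_b):
    """Merge two concurrently written cart versions by taking the union
    of their items, so no addition is ever lost. Deleted items can
    resurface under this scheme, a trade-off the Dynamo paper accepts."""
    return version_a | version_b

# Two replicas accepted writes independently during a partition:
replica_a = {"book", "headphones"}
replica_b = {"book", "charger"}
merged = merge_carts(replica_a, replica_b)
```

Note the asymmetry the merge encodes: losing an added item is a visible business failure, while a briefly resurrected item is a minor annoyance.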
Where eventual consistency is appropriate: social media feeds and timelines where seeing a post a few seconds late is acceptable, distributed caches where staleness is bounded by TTL, shopping carts and collaborative lists where merging concurrent edits is preferable to blocking, DNS resolution where propagation delay is expected and documented, and any system where availability and write throughput are more important than immediate read consistency.
Causal Consistency
Causal consistency sits between strong and eventual consistency on the spectrum. It provides a specific and important guarantee: writes that are causally related — where one write could have influenced another — are seen in causal order by all clients. Writes that are causally unrelated — concurrent, independent — may be seen in any order.
The concept of causality here is the same as the happens-before relationship introduced in Post 2.5 on Logical Clocks. If client A posts a comment and client B replies to that comment, B’s reply is causally dependent on A’s post. A causally consistent system guarantees that no client ever sees B’s reply without also seeing A’s post. The reply cannot appear before the original message.
This guarantee is stronger than eventual consistency but weaker than linearisability — and for many applications, it is exactly the right level. A messaging application does not need linearisability. But it does need to guarantee that if you send a message and then send a follow-up, no recipient sees the follow-up without also seeing the original message. Eventual consistency does not provide this. Causal consistency does.
Implementing causal consistency requires tracking causal dependencies — typically using vector clocks or similar mechanisms. Every write carries metadata describing what writes it causally depends on. Replicas delay delivering a write to clients until all its causal dependencies have already been delivered. This metadata overhead is the cost of the guarantee.
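The delay-until-dependencies-delivered rule can be sketched as follows. The write IDs and the `receive` interface are illustrative assumptions; real systems typically encode dependencies as vector clock entries rather than explicit ID sets.

```python
class CausalReplica:
    """Buffer each incoming write until every write it causally
    depends on has already been delivered to clients."""

    def __init__(self):
        self.delivered = set()   # IDs of writes now visible to clients
        self.buffer = []         # writes waiting on causal dependencies

    def receive(self, write_id, deps, value):
        self.buffer.append((write_id, frozenset(deps), value))
        self._drain()

    def _drain(self):
        # Keep delivering until no buffered write has all its
        # dependencies satisfied.
        progressed = True
        while progressed:
            progressed = False
            for entry in list(self.buffer):
                write_id, deps, value = entry
                if deps <= self.delivered:  # all dependencies visible
                    self.delivered.add(write_id)
                    self.buffer.remove(entry)
                    progressed = True
```

If the reply from the example above arrives before the original post, the replica simply holds it back; once the post arrives, both become visible, in causal order.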
MongoDB’s causally consistent sessions, introduced in version 3.6, are a production example. Within a causally consistent session, reads always reflect the writes that happened before them in the same session, and reads on different nodes respect the causal ordering of writes. This allows MongoDB to offer a meaningful consistency guarantee for multi-document operations without requiring the full coordination cost of linearisability.
Where causal consistency is appropriate: messaging and collaboration applications where message ordering matters, social feeds where replies must appear after the posts they reply to, any multi-user application where the order of related events is meaningful to users but global ordering of all events is not required.
Read-Your-Writes and Monotonic Reads
These are two weaker guarantees that are nonetheless critically important for user-facing applications — and frequently the source of confusing bugs when they are absent.
Read-your-writes consistency guarantees that a client always sees its own writes. If you update your profile photo, your next read of your profile always returns the new photo — regardless of which replica serves the read. This sounds like a minimal, obvious guarantee. It is not provided automatically by eventually consistent systems.
The canonical production failure mode: a user submits a form, the write goes to Replica A, the subsequent page load reads from Replica B which has not yet received the write, and the user sees the old value. The user assumes their submission failed and submits again. This is both a user experience failure and a correctness problem.
Facebook’s engineering team documented exactly this problem in their TAO system, which powers the social graph. Ensuring that users always see their own writes across a massively distributed system required specific architectural decisions — notably routing a user’s reads to the same replica their writes went to within a session window, and using version tokens to detect when a replica was too far behind to serve a consistent read.
Implementing read-your-writes in an eventually consistent system typically requires one of: routing all reads for a session to the same replica as writes, using write tokens that clients pass on subsequent reads (allowing replicas to detect if they have received the relevant write), or always reading from the primary for a configurable window after a write.
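The write-token approach can be sketched like this. The version numbers and the `read` signature are illustrative assumptions, not any specific database's API: the primary hands the client a version token with each write acknowledgment, and a replica refuses to serve a read until it has caught up to that version.

```python
class TokenCheckedReplica:
    """Replica that tracks the highest write version it has applied
    and refuses reads from clients whose write token is ahead of it."""

    def __init__(self):
        self.applied_version = 0
        self.data = {}

    def apply(self, key, value, version):
        # Called by asynchronous replication from the primary.
        self.data[key] = value
        self.applied_version = max(self.applied_version, version)

    def read(self, key, min_version):
        if self.applied_version < min_version:
            return None  # too far behind; caller retries on the primary
        return self.data.get(key)
```

The `None` return is the replica admitting it cannot yet honour read-your-writes; the client falls back to the primary rather than showing the user a stale value.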
Monotonic reads consistency guarantees that if a client reads a value, all subsequent reads by that client will see a value at least as recent. A client will never read a value older than one it has already seen. This prevents the disorienting experience of a value appearing to go backward — seeing a message, then refreshing and not seeing it, then seeing it again.
Monotonic reads is a weaker guarantee than read-your-writes but still significantly stronger than pure eventual consistency. It can be implemented by routing a client’s reads to a consistent replica (or set of replicas) across a session, ensuring that the replica the client reads from never falls behind where it was on the previous read.
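A minimal client-side sketch of the monotonic-reads guard, assuming values carry version numbers (an illustrative assumption): the client remembers the highest version it has seen per key and discards any response that would go backward.

```python
class MonotonicClient:
    """Client-side guard: remember the highest version observed per key
    and reject any replica response older than what was already seen."""

    def __init__(self):
        self.last_seen = {}  # key -> highest version observed

    def accept(self, key, value, version):
        if version < self.last_seen.get(key, 0):
            return None  # stale: a value this client has already moved past
        self.last_seen[key] = version
        return value
```

This is the same prevention mechanism as replica affinity, just enforced at the client: the message that was visible on one refresh can never vanish on the next.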
Where these guarantees matter: any user-facing application where a user can observe the results of their own actions — profile edits, message sends, order placements, settings changes. The absence of read-your-writes consistency is one of the most common sources of “I thought I saved that” bug reports in distributed applications.
The Consistency Spectrum: A Mental Model
Rather than treating these as isolated models, it helps to think of them as points on a spectrum ordered by strength of guarantee and cost of implementation:
| Model | What it guarantees | Cost | Production examples |
|---|---|---|---|
| Linearisability | All reads see the most recent write; operations appear instantaneous | High latency, reduced availability during partitions | etcd, ZooKeeper, Google Spanner, PostgreSQL (sync replication) |
| Causal consistency | Causally related writes seen in order by all clients | Metadata overhead for dependency tracking | MongoDB causally consistent sessions, some Cosmos DB modes |
| Read-your-writes | A client always sees its own writes | Session affinity or write tokens required | Facebook TAO, DynamoDB strongly consistent reads per session |
| Monotonic reads | A client never reads a value older than one it has seen | Replica affinity within a session | Many eventually consistent systems with session routing |
| Eventual consistency | Replicas will converge; no guarantee about when | Lowest — no coordination required | DynamoDB (default), Cassandra, Riak, DNS |
The models are not mutually exclusive. A system can provide eventual consistency by default and offer stronger guarantees — like read-your-writes or linearisable reads — as opt-in options with a performance cost. DynamoDB’s strongly consistent reads, Cassandra’s quorum reads, and MongoDB’s read concern levels all work this way.
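A per-request consistency choice can be sketched as a quorum read in the Cassandra style: the caller decides how many replicas must respond, and the freshest response wins. The `get(key)` replica method returning a `(value, timestamp)` pair is a hypothetical interface for illustration.

```python
def quorum_read(replicas, key, required):
    """Read from replicas until `required` have responded, then return
    the value with the highest timestamp. required=1 is a fast,
    possibly stale read; required=majority trades latency for freshness."""
    responses = []
    for replica in replicas:
        responses.append(replica.get(key))
        if len(responses) >= required:
            break
    return max(responses, key=lambda vt: vt[1])[0]
```

The same code path serves both ends of the spectrum; only the `required` parameter moves, which is exactly how per-request consistency levels work in practice.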
How to Choose the Right Consistency Model
The choice of consistency model is one of the most consequential architectural decisions in a distributed system. It cannot be changed easily after the fact, because it affects replication design, client behaviour, and the kinds of bugs the system will produce under stress.
Start by asking what the application can tolerate, not what it ideally wants. Every application ideally wants linearisability. The question is what it can actually live with when the cost of linearisability — higher latency, reduced availability during partitions — is taken into account.
Choose linearisability when: correctness is non-negotiable and the cost of a consistency anomaly is higher than the cost of unavailability. Financial systems, distributed locks, leader election, and configuration management fall into this category. A double-charge or a split-brain is worse than a brief outage.
Choose causal consistency when: the application involves users communicating with each other and the order of related messages matters, but global ordering of all events is not required. Messaging, collaborative editing, and social feeds are natural fits.
Choose read-your-writes when: users interact with the results of their own actions and the user experience breaks if they do not see their own writes immediately. This is the minimum viable consistency for most user-facing CRUD applications.
Choose eventual consistency when: availability and write throughput are the primary requirements, the application can tolerate stale reads or implement its own staleness detection, and conflict resolution is either unnecessary (append-only data) or can be handled by the application (last-write-wins, merge strategies). Caches, counters, DNS, and social feeds are natural fits.
Document your consistency model explicitly in system design documents and API contracts. Engineers building on top of your system need to know what to expect. Undocumented consistency behaviour is a bug waiting to happen — the next engineer to use your system will assume either more or less than you provide, and neither assumption will be correct.
Key Takeaways
- Consistency models are contracts between a system and its clients that define what clients can expect to observe when reading from a replicated system — breaking them produces bugs that are hard to reproduce and diagnose
- Linearisability is the strongest guarantee — all reads see the most recent write, operations appear instantaneous — but requires coordination that increases latency and reduces availability during partitions
- Eventual consistency is the weakest common guarantee — replicas converge eventually with no timing promise — but enables high availability and write throughput by eliminating cross-replica coordination on the write path
- Causal consistency provides a middle ground — causally related writes are seen in order — and is the natural fit for messaging, social feeds, and collaborative applications
- Read-your-writes and monotonic reads are weaker guarantees that are nonetheless critical for user-facing applications — their absence is one of the most common sources of confusing production bugs
- The right consistency model is the weakest model the application can tolerate — not the strongest model available — because stronger guarantees always carry latency and availability costs
- Consistency model choice is an architectural decision that must be made explicitly and documented — it cannot be easily changed after the fact and shapes every subsequent design decision about replication and failure handling
Frequently Asked Questions (FAQ)
What are consistency models in distributed systems?
Consistency models are formal contracts between a distributed system and its clients that define what clients can expect to observe when reading data from a replicated system. They specify rules about which writes are visible to which reads, in what order, and under what circumstances. Different models offer different trade-offs between correctness, availability, and performance.
What is the difference between strong consistency and eventual consistency?
Strong consistency (linearisability) guarantees that every read returns the result of the most recent write, as if the system had a single copy of the data. It requires coordination before acknowledging writes, which adds latency. Eventual consistency guarantees only that replicas will converge to the same value eventually — it makes no promise about when, or what clients see in the interim. It requires no cross-replica coordination on writes, enabling high availability and throughput at the cost of potentially stale reads.
What is linearisability?
Linearisability is the precise technical definition of strong consistency. A system is linearisable if every operation appears to take effect instantaneously at some point between when it was invoked and when it completed, in a way that is consistent with a global ordering of all operations. In practice: reads always return the most recent write, and concurrent operations produce results consistent with some valid sequential ordering. etcd, ZooKeeper, and Google Spanner are linearisable systems.
What is causal consistency?
Causal consistency guarantees that writes which are causally related — where one write could have influenced another — are seen in causal order by all clients. Writes that are causally unrelated may be seen in any order. This is stronger than eventual consistency (which provides no ordering guarantees) but weaker than linearisability (which requires a total global ordering of all operations). It is the natural consistency model for messaging and collaborative applications where the order of related events matters to users.
What is read-your-writes consistency and why does it matter?
Read-your-writes consistency guarantees that a client always sees its own writes — if you update a value, your next read always returns the updated value, regardless of which replica serves the read. This sounds minimal but is not provided automatically by eventually consistent systems. Its absence produces the common bug where a user submits a change, the page reloads, and the change appears not to have saved — because the read went to a replica that had not yet received the write.
How do I choose the right consistency model for my system?
Start with what the application can tolerate rather than what it ideally wants. Use linearisability when correctness is non-negotiable and a consistency anomaly is more harmful than an outage — financial transactions, distributed locks, configuration management. Use causal consistency for messaging and collaborative applications where related event ordering matters. Use read-your-writes as the minimum for user-facing CRUD applications. Use eventual consistency when availability and throughput are primary and the application can tolerate or handle stale reads. Document your choice explicitly — undocumented consistency behaviour is a reliability risk.
Can a system offer multiple consistency levels?
Yes, and many production systems do. DynamoDB offers eventually consistent reads (default, lower latency) and strongly consistent reads (opt-in, higher latency). Cassandra’s consistency levels — ONE, QUORUM, ALL — allow callers to choose per-request how many replicas must respond. MongoDB’s read concern and write concern settings control consistency per operation. This flexibility allows applications to use the strongest consistency they need where they need it and relax to weaker guarantees elsewhere for performance.
Continue the Series
Series home: Distributed Systems — Concepts, Design & Real-World Engineering
Part 3 — Replication, Consistency & Consensus
- 3.1 — Why Replication Is Necessary
- 3.2 — Replication Models: Leader-Based, Multi-Leader and Leaderless
- 3.3 — Consistency Models: Strong, Eventual, Causal and Session
- 3.4 — The CAP Theorem Correctly Understood
- 3.5 — Quorums and Voting in Distributed Systems
- 3.6 — Why Consensus Is Hard in Distributed Systems
- 3.7 — Paxos vs Raft: Consensus Algorithms Compared
- 3.8 — Performance Trade-offs in Replicated Systems
- 3.9 — Engineering Guidelines for Replication, Consistency and Consensus
Previous: ← 3.2 — Replication Models: Leader-Based, Multi-Leader and Leaderless
Next: 3.4 — The CAP Theorem Correctly Understood →
Haven't read Parts 1 or 2 yet?