Time Model – Why Ordering Is Harder Than It Looks


Distributed Systems Series — Part 1.5: Foundations

The Third Reason Distributed Systems Are Hard

Post 1.3 established that networks are unreliable — messages are lost, delayed, and duplicated. Post 1.4 established that nodes fail in partial and slow ways that are harder to handle than clean crashes. This post addresses the third foundational constraint: even if the network were perfect and nodes never crashed, distributed systems would still be hard — because of time.

More precisely, distributed systems struggle not with time itself but with agreeing on the order of events. The time model defines what a distributed system is allowed to assume about clocks and event ordering — and the correct answer is: very little. There is no global clock. Timestamps cannot be trusted for ordering. Events that appear simultaneous may be causally related in ways timestamps cannot reveal. This post establishes why, and what reliable ordering in distributed systems actually requires.

There Is No Global Clock

In a single-machine system, time feels simple. One clock governs all execution. Events happen in a clear sequence determined by that clock. “Before” and “after” are unambiguous. Developers naturally reach for timestamps, sequence numbers, and ordering based on wall-clock time.

Distributed systems have none of this. Each node has its own physical clock that runs at its own rate. Clock synchronisation via NTP is approximate and imperfect — nodes can disagree by milliseconds to seconds, and NTP adjustments can move a clock backward. Network delays mean that a timestamp attached to a message reflects the sender’s clock at the moment of sending, not the receiver’s clock at the moment of arrival. When two events on different nodes carry timestamps, the timestamps reflect what two independent clocks said at two different moments — not a shared, authoritative timeline.

The most important mental shift required to reason correctly about distributed systems is accepting this: there is no global clock, and no engineering technique can create one. Google’s Spanner uses GPS receivers and atomic clocks to keep clock uncertainty under roughly 7 milliseconds — among the tightest bounds achieved in production at global scale — and even then, Spanner’s TrueTime API exposes an uncertainty interval rather than a single timestamp, because perfect synchronisation is physically impossible.

Why Timestamps Lie

The intuitive response to the “no global clock” problem is to add timestamps to every event and sort events by time. This appears to solve the ordering problem. In practice, it creates a different one.

Consider two events on different nodes. Node A records an event at 10:00:01.000. Node B records an event at 10:00:00.950. Did B’s event happen first? Or was B’s clock running 50 milliseconds behind A’s? The timestamp cannot answer this question — it only says what each node’s local clock reported at the moment of recording. If B’s clock is 100ms slow, B’s “earlier” event actually happened 50ms after A’s “later” event. The timestamp ordering is the reverse of the causal ordering.
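The arithmetic is easy to check directly. A toy calculation in Python, using the numbers from the example above (timestamps as seconds since midnight):

```python
# Node A's clock is accurate; it records its event at 10:00:01.000.
ts_a = 10 * 3600 + 1.000

# Node B's clock is 100 ms slow; it records 10:00:00.950 locally.
ts_b_reported = 10 * 3600 + 0.950
skew = 0.100
ts_b_actual = ts_b_reported + skew   # 10:00:01.050 in real time

# Sorting by timestamp puts B's event first...
assert ts_b_reported < ts_a
# ...but in real (causal) time B's event happened after A's.
assert ts_b_actual > ts_a
```

The timestamp ordering and the real ordering are reversed by nothing more than a 100 ms skew — well within what NTP tolerates.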

Network delays make this worse. Even if two clocks were perfectly synchronised, a message sent at 10:00:01 on Node A may not arrive at Node B until 10:00:01.200 — after a message that Node B sent at 10:00:01.100 has already been processed. Arrival order does not equal execution order, and execution order does not equal the causal order that matters for correctness.

How Cassandra Learned to Distrust Timestamps

Apache Cassandra’s early versions used client-supplied timestamps to resolve write conflicts between replicas through Last Write Wins (LWW) semantics: when two writes conflict, the one with the higher timestamp wins. Simple, cheap, and correct when clocks are reliable.

In practice, it caused serious data loss in production. Nodes running slightly ahead of real time consistently won conflicts over nodes running slightly behind — regardless of which write actually happened last in causal terms. A write that arrived later could carry an earlier timestamp and be silently discarded. A write that arrived earlier could carry a later timestamp and overwrite newer data.
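The failure mode takes only a few lines to reproduce — a toy sketch with invented timestamps, not Cassandra’s actual code:

```python
def lww_merge(a, b):
    # Last-Write-Wins: keep the (value, timestamp) pair with the
    # higher timestamp, regardless of which write happened later.
    return a if a[1] >= b[1] else b

# Replica A's clock runs 100 ms fast; it writes first in real time.
write_old = ("v1", 10_000_100)   # real time 10_000_000, skewed +100 ms
# Replica B's clock is accurate; it writes 50 ms later in real time.
write_new = ("v2", 10_000_050)   # real time 10_000_050

winner = lww_merge(write_old, write_new)
assert winner[0] == "v1"   # the causally newer write "v2" is discarded
```

No error is raised and no write fails: the newer value simply loses the merge and disappears, which is exactly the silent data loss described below.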

Pinterest, running a large Cassandra deployment, documented cases where data written by users disappeared — not because of a crash or an application bug, but because a server with a 50-millisecond clock skew was systematically winning conflicts it should have lost. The data existed in the write path. It was overwritten by an older value carrying a newer timestamp.

The Cassandra team’s response was to move toward hybrid logical clocks in more recent versions and to add strong warnings against relying on LWW semantics for data where order matters. The lesson is precise: in distributed systems, the timestamp on an event is not evidence of when it happened. It is only evidence of what the local clock said at that moment — and local clocks lie.

Physical Time vs Logical Time

Because physical time is unreliable for establishing causal ordering, distributed systems frequently use logical time instead. The distinction is fundamental.

Physical time is based on wall-clock timestamps. It is useful for logging (correlating events across services by approximate time window), monitoring (calculating request latency, cache TTL expiry, rate limiting windows), and human-facing timestamps (when did this order ship). Physical time is unsafe for determining causality — which of two concurrent writes should win, whether a read reflects all writes that preceded it, whether a distributed transaction committed before another began.

Logical time is based on the relationships between events rather than when they occurred on a clock. It answers a different question: not “which event has a later timestamp?” but “could this event have influenced that event?” This shift turns out to be far more reliable for correctness because it does not depend on clock synchronisation that distributed systems cannot guarantee.

The Happens-Before Relationship

The foundational concept of logical time is the happens-before relationship, introduced by Leslie Lamport in his 1978 paper “Time, Clocks, and the Ordering of Events in a Distributed System.” Event A happens-before event B (written A → B) if any of three conditions hold:

  1. A and B occur on the same process and A occurs before B in that process’s execution order.
  2. A is the sending of a message and B is the receipt of that same message — sending always happens-before receiving.
  3. There exists an event C such that A → C and C → B (transitivity).

If neither A → B nor B → A, then A and B are concurrent — they have no causal relationship, and no meaningful ordering exists between them. This is not a failure of the system. It is the correct answer: some events are independent and genuinely have no ordering.

The happens-before relationship is the foundation on which all logical clock mechanisms are built.

Lamport Clocks, Vector Clocks and Hybrid Logical Clocks

Three logical clock mechanisms address the ordering problem at different levels of precision.

Lamport clocks implement a simple rule: each process maintains a counter. Before sending a message, the process increments its counter and attaches it to the message. On receiving a message, the process sets its counter to max(local counter, received counter) + 1. This guarantees that if A → B then timestamp(A) < timestamp(B). The converse does not hold — a higher Lamport timestamp does not imply happens-before. Lamport clocks give consistent ordering but cannot detect concurrency.
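The rule fits in a few lines of Python — a minimal illustration of the mechanism, not any particular library’s implementation:

```python
class LamportClock:
    """Single-counter logical clock (Lamport, 1978)."""

    def __init__(self):
        self.counter = 0

    def tick(self):
        # Any local event, including a send, increments the counter.
        self.counter += 1
        return self.counter

    def receive(self, received):
        # On receipt, jump past both the local and received timestamps.
        self.counter = max(self.counter, received) + 1
        return self.counter

# A sends a message to B: the receive is timestamped after the send.
a, b = LamportClock(), LamportClock()
sent = a.tick()          # timestamp attached to the outgoing message
got = b.receive(sent)
assert got > sent        # A -> B implies timestamp(A) < timestamp(B)
```

Note what the assertion does not say: two events on independent processes can end up with any pair of counter values, so comparing their timestamps tells you nothing about causality.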

Vector clocks extend Lamport clocks to detect concurrency. Each process maintains a vector of counters, one per process in the system. On sending, the process increments its own entry. On receiving, it takes the element-wise maximum of its vector and the received vector, then increments its own entry. Vector clocks capture the complete happens-before relationship: if every entry of clock(A) is ≤ the corresponding entry of clock(B) and at least one is strictly less, then A → B. If neither vector dominates the other, A and B are concurrent. Amazon’s Dynamo — the 2007 system whose design influenced DynamoDB — used vector clocks to track concurrent writes and expose conflicts for application-level resolution.
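Extending the same sketch to vectors — again an illustrative toy, assuming a fixed process count known up front:

```python
class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n

    def tick(self):
        # Local event or send: increment only our own entry.
        self.clock[self.pid] += 1
        return list(self.clock)

    def receive(self, received):
        # Element-wise max, then increment our own entry.
        self.clock = [max(x, y) for x, y in zip(self.clock, received)]
        self.clock[self.pid] += 1
        return list(self.clock)

def compare(va, vb):
    """Return 'before', 'after', 'equal', or 'concurrent'."""
    le = all(x <= y for x, y in zip(va, vb))
    ge = all(x >= y for x, y in zip(va, vb))
    if le and ge:
        return "equal"
    if le:
        return "before"
    if ge:
        return "after"
    return "concurrent"

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
va = p0.tick()                      # [1, 0]
vb = p1.tick()                      # [0, 1]
assert compare(va, vb) == "concurrent"   # neither vector dominates

msg = p0.tick()                     # [2, 0] attached to a message
vr = p1.receive(msg)                # [2, 2]
assert compare(msg, vr) == "before"      # send happens-before receive
```

The `compare` result "concurrent" is the capability Lamport clocks lack: it says the two events are genuinely independent, not merely unordered by the counter.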

Hybrid Logical Clocks (HLC) combine physical and logical time. They track wall-clock time but use a logical component to break ties and ensure that HLC timestamps are always at least as large as the physical clock reading. HLC provides the human-readable timestamp properties of physical time with the causal ordering guarantees of logical clocks. CockroachDB uses HLC for its globally distributed transactions. MongoDB’s cluster time uses a similar hybrid approach. HLC is the production answer for systems that need both human-readable timestamps and correct causal ordering — which is most real-world distributed databases. The full treatment of logical clocks in production coordination is in Post 2.5.
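A sketch of the HLC update rules — a simplified rendering of the published algorithm, not CockroachDB’s or MongoDB’s code; the injectable `now` function is there only to make the sketch testable:

```python
import time

class HybridLogicalClock:
    def __init__(self, now=time.time):
        self.now = now   # physical clock source, seconds
        self.l = 0.0     # physical component: max wall time seen so far
        self.c = 0       # logical component: breaks ties

    def local_or_send(self):
        pt = self.now()
        if pt > self.l:
            self.l, self.c = pt, 0   # physical clock moved forward
        else:
            self.c += 1              # it did not: advance logically
        return (self.l, self.c)

    def receive(self, remote):
        rl, rc = remote
        pt = self.now()
        new_l = max(self.l, rl, pt)
        if new_l == self.l and new_l == rl:
            self.c = max(self.c, rc) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        self.l = new_l
        return (self.l, self.c)

# Even with a frozen physical clock, timestamps strictly increase,
# and the HLC reading never falls below the physical reading.
hlc = HybridLogicalClock(now=lambda: 100.0)
t1 = hlc.local_or_send()    # (100.0, 0)
t2 = hlc.local_or_send()    # (100.0, 1)
assert t2 > t1 and t1[0] >= 100.0
```

The `l` component keeps timestamps human-readable; the `c` component guarantees causal monotonicity when physical clocks stall or run backward.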

Concurrency Is Not a Bug — It Is the Environment

Human intuition is built around linear time: one thing happens, then another, then another. Distributed systems do not work this way. Multiple events happen concurrently on different nodes. Some events are causally related — one influenced the other. Some events are genuinely independent — they happened without any knowledge of each other and no meaningful ordering exists between them.

The replicated database write conflict illustrates this concretely. Client A writes X=1 to Replica 1. Client B simultaneously writes X=2 to Replica 2. Both writes succeed locally. When the replicas attempt to synchronise, they discover conflicting values with no clear ordering. Which value should win? The system must choose a conflict resolution strategy — last-write-wins (unreliable for the reasons this post describes), application-level merge (correct but complex), CRDTs (correct for specific data shapes), or rejecting one write (consistent but reduces availability).
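Of those strategies, CRDTs deserve a concrete glimpse, because they sidestep the ordering question entirely for data shapes that support it. A grow-only counter (G-Counter) is the simplest example — a toy sketch, not a production implementation:

```python
# G-Counter: each replica increments only its own entry; merging takes
# the element-wise max, so concurrent increments never conflict or get lost.
def increment(state, replica_id):
    state = dict(state)
    state[replica_id] = state.get(replica_id, 0) + 1
    return state

def merge(a, b):
    # Commutative, associative, idempotent: sync order is irrelevant.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}

def value(state):
    return sum(state.values())

# Two replicas increment concurrently, then synchronise in either order.
r1 = increment({}, "r1")                    # one increment on replica 1
r2 = increment(increment({}, "r2"), "r2")   # two increments on replica 2
assert merge(r1, r2) == merge(r2, r1)
assert value(merge(r1, r2)) == 3            # no increment was lost
```

The trade-off named above is visible in the shape: this works because increments commute. A register holding an arbitrary value has no such merge, which is why LWW and application-level resolution still exist.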

This problem is not caused by poor design. It is caused by time ambiguity in a concurrent environment. The correct response is to design systems that handle concurrency explicitly rather than assuming it away. The consistency models that define how distributed systems handle concurrent writes — eventual consistency, causal consistency, linearisability — are covered in Post 3.3.

Closing Part 1: The Environment Distributed Systems Operate In

This post closes the five-post Foundations sequence. What the five posts establish together is not a set of problems to be solved but a set of permanent operating conditions to be designed around.

Networks are unreliable. Messages are lost, delayed, duplicated, and delivered out of order. Silence is always ambiguous. Every communication pattern in distributed systems exists because of this reality.

Nodes fail in partial and slow ways. Crash-recovery is the real failure model. Slow nodes are more dangerous than crashed ones. Gray failures pass health checks while producing incorrect results. Every redundancy and failure detection mechanism exists because of this reality.

Time cannot be trusted for ordering. There is no global clock. Timestamps reflect local clock readings, not causal truth. Logical time and the happens-before relationship are the reliable foundations for event ordering. Every consistency mechanism — quorums, consensus, conflict resolution — exists because of this reality.

Distributed systems are not difficult because engineers lack skill or tools. They are difficult because the world they operate in is inherently unreliable. The most important lesson from the five Foundation posts:

Distributed systems do not fail because something went wrong. They fail because reality showed up.

Strong systems do not fight this reality. They acknowledge it, design around it, and make trade-offs deliberately. That is what Parts 2 through 5 of this series are about.

Key Takeaways

  1. There is no global clock in a distributed system — each node has its own physical clock that drifts independently, and NTP synchronisation is approximate and imperfect, never producing perfectly synchronised time across nodes
  2. Timestamps cannot be trusted for causal ordering — a timestamp reflects only what the local clock said at the moment of recording, not when the event happened in relation to events on other nodes
  3. The happens-before relationship is the correct foundation for ordering — event A happens-before event B if A sent a message that B received, or A and B are on the same process and A executed first, or transitivity connects them through a third event
  4. Lamport clocks provide consistent ordering but cannot detect concurrency — vector clocks detect whether two events are causally related or genuinely concurrent, at the cost of tracking per-process counters
  5. Hybrid Logical Clocks (HLC) combine physical timestamp readability with logical clock causal correctness — used by CockroachDB and MongoDB as the production answer for systems that need both
  6. Concurrency is the environment, not an edge case — independent events on different nodes have no meaningful ordering and distributed systems must be designed to handle this rather than assume a total ordering that does not exist
  7. The three foundational constraints — unreliable networks, partial node failures, and untrusted time — are permanent operating conditions, not problems to be solved, and every mechanism in Parts 2 through 5 exists as a response to them

Frequently Asked Questions (FAQ)

What is the time model in distributed systems?

The time model is the set of assumptions a distributed system makes about clocks and event ordering. The correct assumption is that there is no global clock, that physical timestamps cannot reliably establish causal ordering between events on different nodes, and that systems must use logical time — based on message relationships and the happens-before relation — for any correctness guarantee that depends on ordering. Physical time remains useful for logging, monitoring, and TTL expiration, but not for determining which of two concurrent writes happened first or whether a read reflects all preceding writes.

Why can’t distributed systems use wall-clock timestamps for ordering?

Because physical clocks on different machines drift at different rates and are synchronised only approximately via NTP, which can also adjust clocks backward. A timestamp from one node cannot be reliably compared to a timestamp from another to determine which event happened first in causal terms. The Cassandra LWW and Pinterest data loss example illustrates this precisely: a 50-millisecond clock skew caused a server to systematically win write conflicts it should have lost, silently discarding data that users had written. The timestamp indicated one ordering; the causal reality was the reverse.

What is the happens-before relationship?

The happens-before relationship, introduced by Leslie Lamport in 1978, defines causal ordering between events without relying on clocks. Event A happens-before event B if: A and B are on the same process and A executed first; A sent a message that B received (sending always happens-before receiving); or there exists a third event C such that A happens-before C and C happens-before B. If neither A happens-before B nor B happens-before A, the events are concurrent — genuinely independent with no meaningful causal ordering between them. This is not ambiguity to be resolved but a correct description of the system’s state.

What is the difference between Lamport clocks and vector clocks?

Lamport clocks assign a single counter to each event using the rule: increment before sending, take max(local, received)+1 on receipt. They guarantee that if A happens-before B then Lamport(A) < Lamport(B) — but the converse does not hold. A higher Lamport timestamp does not imply happens-before, so Lamport clocks cannot detect concurrency. Vector clocks assign a vector of counters (one per process) and take the element-wise maximum on receipt. They fully capture the happens-before relationship: if vector(A) dominates vector(B) entry-wise, then A happened-before B; if neither dominates, they are concurrent. The trade-off is storage: vector clocks require O(N) space per event for a cluster of N processes.

What are Hybrid Logical Clocks and which production systems use them?

Hybrid Logical Clocks (HLC) combine physical wall-clock time with a logical component that ensures HLC timestamps are always at least as large as the physical clock reading and strictly increase even when the physical clock does not. This provides human-readable timestamp values (unlike pure logical clocks) while guaranteeing causal ordering correctness (unlike pure physical clocks). CockroachDB uses HLC for its globally distributed transactions — every write receives an HLC timestamp that is both machine-readable for ordering and human-readable for debugging. MongoDB uses a similar hybrid approach in its cluster time mechanism.

Can two events in a distributed system have no meaningful ordering?

Yes, and this is the correct answer, not a failure. Two events that are concurrent — neither happens-before the other — have no causal relationship. They occurred independently, without either event having the ability to influence the other. There is no fact of the matter about which happened first. Distributed systems must be designed to handle this reality rather than force a total ordering that does not exist. Conflict resolution strategies (last-write-wins, CRDTs, application-level merge) are different approaches to handling concurrent writes that have no inherent ordering — each making different trade-offs between simplicity, correctness, and availability.

What should I read next after completing Part 1?

Part 2 covers how distributed systems cope with the three constraints Part 1 established. Post 2.1 covers the communication patterns that work despite unreliable networks. Post 2.2 covers retries, idempotency, and circuit breakers — the mechanisms that make network communication safe. Post 2.5 covers logical clocks in production coordination systems, building directly on the Lamport and vector clock foundations established here. The complete Part 2 reading guide is at the Part 2 pillar page.


Why the time model is the constraint that surprises engineers most

The network model and the node failure model are hard to ignore — services crash, timeouts fire, requests fail. Engineers encounter them early and learn to design around them because the symptoms are visible. The time model is different. Its violations are silent. The data is wrong, but no error was logged. The conflict was resolved, but the wrong write won. The event was ordered, but the ordering was incorrect. You discover it not through an alert but through a user reporting that something that should exist does not, or that something they changed reverted to an earlier state.

The Pinterest Cassandra data loss incident in this post stayed with me specifically because of that silence. No crash. No error. No alert. A 50-millisecond clock skew, systematically winning write conflicts for weeks, discarding data that users had written. The users noticed. The monitoring did not. That is the characteristic signature of a time model assumption being violated — incorrect behaviour that looks like correct behaviour until someone looks at the output rather than the infrastructure.

At IDFC First Bank, the time model question that comes up most concretely is ordering in financial event streams. A mutual fund NAV update and an order placement that references that NAV need to be ordered correctly. A settlement and the position update that follows it need to be causally sequenced. We use HLC for transaction timestamps precisely because audit logs need to be human-readable and transaction ordering needs to be causally correct simultaneously. Physical timestamps alone cannot provide both. A pure logical clock cannot provide the first. HLC is the production answer — not an advanced option but the correct default for any financial system that needs both properties.

The closing line from the Part 1 synthesis — “distributed systems do not fail because something went wrong, they fail because reality showed up” — is the thing I come back to most when debugging production incidents. It reframes the question correctly. The question is not what broke. The question is which assumption about the environment turned out to be wrong. Was it the network assumption? The node assumption? The time assumption? Once you know which model assumption reality violated, the path to the fix is usually clear. Without that framing, production debugging in distributed systems often feels like searching for something that left no trace.

Part 2 starts with Post 2.0 — the bridge from these three foundational constraints to the mechanisms that address them. Read it as the bridge it is, not as standalone content. Every mechanism in Part 2 is a response to a specific constraint Part 1 established. The connection is deliberate and the sequence matters.


Series home: Distributed Systems — Concepts, Design & Real-World Engineering

Part 1 — Foundations (complete)

Previous: ← 1.4 — Node & Failure Model: Crashes, Slow Nodes and Partial Failure

Part 2 — Communication & Coordination starts here:

← Back to Part 1 Foundations overview and reading guide
