
Keep CALM and CRDT On
Shadaj Laddad∗
University of California, Berkeley
shadaj@cs.berkeley.edu
Conor Power∗
University of California, Berkeley
conorpower@cs.berkeley.edu
Mae Milano
University of California, Berkeley
mpmilano@cs.berkeley.edu
Alvin Cheung
University of California, Berkeley
akcheung@cs.berkeley.edu
Natacha Crooks
University of California, Berkeley
ncrooks@cs.berkeley.edu
Joseph M. Hellerstein
University of California, Berkeley
hellerstein@cs.berkeley.edu
ABSTRACT
Despite decades of research and practical experience, developers
have few tools for programming reliable distributed applications
without resorting to expensive coordination techniques. Conict-
free replicated datatypes (CRDTs) are a promising line of work
that enable coordination-free replication and oer certain eventual
consistency guarantees in a relatively simple object-oriented API.
Yet CRDT guarantees extend only to data updates; observations of
CRDT state are unconstrained and unsafe. We propose an agenda
that embraces the simplicity of CRDTs, but provides richer, more
uniform guarantees. We extend CRDTs with a query model that
reasons about which queries are safe without coordination by ap-
plying monotonicity results from the CALM Theorem, and lay out
a larger agenda for developing CRDT data stores that let developers
safely and eciently interact with replicated application state.
1 INTRODUCTION
Consistency is a central theme of distributed computing research,
with major implications for practitioners. Modern cloud-hosted
applications are frequently distributed to optimize for latency and
availability. When application state is replicated across the globe,
developers often face stark choices regarding replica consistency.
Strong consistency can be enforced in a general-purpose way at
the storage or memory layer via classical distributed coordination
(consensus, transactions, etc.), but this is often unattractive for la-
tency and availability reasons. Alternatively, application developers
can build on “weakly” consistent storage models that do not use
coordination; in this case developers must reason about consistency
at the application level.
The last decade has seen a surge of research interest in rea-
soning about application consistency, featuring everything from
complex formal invariants [
54
] to multi-tiered consistency annota-
tions [
19
,
40
] to explicit happens-before annotations on operations
[
12
]. In recent years, one approach has risen above the noise among
practitioners: Conict-Free Replicated Data Types [
52
]. CRDTs are
provided as an API by a few commercial platforms (e.g., Enterprise
Redis, Akka, Basho Riak [
7
,
27
,
48
]), and have been documented in
use by a number of products and services including PayPal, League
of Legends, and Soundcloud [
6
,
18
,
38
]. There is also are a growing
set of open source CRDT packages that have thousands of stars on
∗equal contribution
This work is licensed under a Creative Commons Attribution 4.0 International License.
GitHub [
25
,
26
,
43
], and blog posts explaining CRDTs to developers
in pragmatic, informal terms [13, 49, 57].
The attractiveness of CRDTs lies in their combination of (1) an
easy-to-explain API, and (2) the promise of formal safety guarantees.
Designing a CRDT centers around providing a function to merge
any two replicas, with the requirement that this single function
is associative, commutative and idempotent (ACI), and dening
atomic operations that clients can use to update a replica. From the
user’s perspective, the CRDT’s object-oriented API often mimics a
familiar collection; many of the CRDTs in the literature are simple
adaptations of well-known data types like Sets and Counters.
The formal safety properties of CRDTs, as originally phrased by
Shapiro et al, leverage “a well dened interface ... [with] mathe-
matically sound rules to guarantee state convergence” [
52
]. This
guarantee is achieved via the ACI properties of the merge function.
Classic anomalies in eventually consistent systems are caused by
reordered, duplicated, or late-arriving updates—none of which can
aect the result of an idempotent, commutative, and associative
function execution.
But this strong convergence guarantee addresses only state up-
dates and oers no APIs (or guarantees!) for visibility into the state
of a CRDT. Although useful queries are often included in the pre-
sentation of CRDT designs, these have no impact on the correctness
of the CRDT and are no safer to use than arbitrary queries executed
directly on the underlying state. In one of the precursor papers
to CRDTs that also proposes ACI merge functions, Helland and
Campbell go as far as noting ironically that READs are “annoying”
and may not commute with other actions [15].
Example 1 (The Potato and the Ferrari, a.k.a. Early Read).
A canonical CRDT is the Two-Phase Set (2P-Set) [
51
], which is a pair
of sets
(𝐴, 𝑅)
that track items to be added (
𝐴
) and removed (
𝑅
). The
merge function for two 2P-Sets is dened simply as the pairwise union,
(𝐴1∪𝐴2, 𝑅1∪𝑅2)
and is patently ACI. This scheme was used in the
well-known Amazon Dynamo shopping cart example [11].
Implicit in this design is a
query 𝑄=𝐴−𝑅
returning the intended
contents of the set. Consider a scenario where a shopper adds a potato
and a Ferrari to their cart, then removes the Ferrari, and “checks out”
by computing the query
𝑄
. In one or more replicas of the 2P-Set, the
checkout request could arrive before the removal of the sports car. This
truly expensive consistency bug arises when the query “reads” the
state of the 2P-Set “too early”, before all the removals have eventually
arrived. And there is no way to know that all the removals have indeed
arrived without the coordination that CRDTs supposedly do not need.
In practice, the soundness of state convergence in CRDTs does
not translate to predictable guarantees for computations that exam-
ine them. One might say that CRDTs provide Schrödinger consistency
arXiv:2210.12605v1 [cs.DB] 23 Oct 2022