Keep CALM and CRDT On

2025-05-06 0 0 842.97KB 8 页 10玖币
侵权投诉
Keep CALM and CRDT On
Shadaj Laddad
University of California, Berkeley
shadaj@cs.berkeley.edu
Conor Power
University of California, Berkeley
conorpower@cs.berkeley.edu
Mae Milano
University of California, Berkeley
mpmilano@cs.berkeley.edu
Alvin Cheung
University of California, Berkeley
akcheung@cs.berkeley.edu
Natacha Crooks
University of California, Berkeley
ncrooks@cs.berkeley.edu
Joseph M. Hellerstein
University of California, Berkeley
hellerstein@cs.berkeley.edu
ABSTRACT
Despite decades of research and practical experience, developers
have few tools for programming reliable distributed applications
without resorting to expensive coordination techniques. Conict-
free replicated datatypes (CRDTs) are a promising line of work
that enable coordination-free replication and oer certain eventual
consistency guarantees in a relatively simple object-oriented API.
Yet CRDT guarantees extend only to data updates; observations of
CRDT state are unconstrained and unsafe. We propose an agenda
that embraces the simplicity of CRDTs, but provides richer, more
uniform guarantees. We extend CRDTs with a query model that
reasons about which queries are safe without coordination by ap-
plying monotonicity results from the CALM Theorem, and lay out
a larger agenda for developing CRDT data stores that let developers
safely and eciently interact with replicated application state.
1 INTRODUCTION
Consistency is a central theme of distributed computing research,
with major implications for practitioners. Modern cloud-hosted
applications are frequently distributed to optimize for latency and
availability. When application state is replicated across the globe,
developers often face stark choices regarding replica consistency.
Strong consistency can be enforced in a general-purpose way at
the storage or memory layer via classical distributed coordination
(consensus, transactions, etc.), but this is often unattractive for la-
tency and availability reasons. Alternatively, application developers
can build on “weakly” consistent storage models that do not use
coordination; in this case developers must reason about consistency
at the application level.
The last decade has seen a surge of research interest in rea-
soning about application consistency, featuring everything from
complex formal invariants [
54
] to multi-tiered consistency annota-
tions [
19
,
40
] to explicit happens-before annotations on operations
[
12
]. In recent years, one approach has risen above the noise among
practitioners: Conict-Free Replicated Data Types [
52
]. CRDTs are
provided as an API by a few commercial platforms (e.g., Enterprise
Redis, Akka, Basho Riak [
7
,
27
,
48
]), and have been documented in
use by a number of products and services including PayPal, League
of Legends, and Soundcloud [
6
,
18
,
38
]. There is also are a growing
set of open source CRDT packages that have thousands of stars on
equal contribution
This work is licensed under a Creative Commons Attribution 4.0 International License.
GitHub [
25
,
26
,
43
], and blog posts explaining CRDTs to developers
in pragmatic, informal terms [13, 49, 57].
The attractiveness of CRDTs lies in their combination of (1) an
easy-to-explain API, and (2) the promise of formal safety guarantees.
Designing a CRDT centers around providing a function to merge
any two replicas, with the requirement that this single function
is associative, commutative and idempotent (ACI), and dening
atomic operations that clients can use to update a replica. From the
user’s perspective, the CRDT’s object-oriented API often mimics a
familiar collection; many of the CRDTs in the literature are simple
adaptations of well-known data types like Sets and Counters.
The formal safety properties of CRDTs, as originally phrased by
Shapiro et al, leverage “a well dened interface ... [with] mathe-
matically sound rules to guarantee state convergence” [
52
]. This
guarantee is achieved via the ACI properties of the merge function.
Classic anomalies in eventually consistent systems are caused by
reordered, duplicated, or late-arriving updates—none of which can
aect the result of an idempotent, commutative, and associative
function execution.
But this strong convergence guarantee addresses only state up-
dates and oers no APIs (or guarantees!) for visibility into the state
of a CRDT. Although useful queries are often included in the pre-
sentation of CRDT designs, these have no impact on the correctness
of the CRDT and are no safer to use than arbitrary queries executed
directly on the underlying state. In one of the precursor papers
to CRDTs that also proposes ACI merge functions, Helland and
Campbell go as far as noting ironically that READs are “annoying”
and may not commute with other actions [15].
Example 1 (The Potato and the Ferrari, a.k.a. Early Read).
A canonical CRDT is the Two-Phase Set (2P-Set) [
51
], which is a pair
of sets
(𝐴, 𝑅)
that track items to be added (
𝐴
) and removed (
𝑅
). The
merge function for two 2P-Sets is dened simply as the pairwise union,
(𝐴1𝐴2, 𝑅1𝑅2)
and is patently ACI. This scheme was used in the
well-known Amazon Dynamo shopping cart example [11].
Implicit in this design is a
query 𝑄=𝐴𝑅
returning the intended
contents of the set. Consider a scenario where a shopper adds a potato
and a Ferrari to their cart, then removes the Ferrari, and “checks out”
by computing the query
𝑄
. In one or more replicas of the 2P-Set, the
checkout request could arrive before the removal of the sports car. This
truly expensive consistency bug arises when the query “reads” the
state of the 2P-Set “too early”, before all the removals have eventually
arrived. And there is no way to know that all the removals have indeed
arrived without the coordination that CRDTs supposedly do not need.
In practice, the soundness of state convergence in CRDTs does
not translate to predictable guarantees for computations that exam-
ine them. One might say that CRDTs provide Schrödinger consistency
arXiv:2210.12605v1 [cs.DB] 23 Oct 2022
guarantees: they are guaranteed to be consistent only if they are
not viewed.
The weak consistency of CRDT queries is not a secret in the
research literature [
63
,
64
], and is mentioned in the initial pa-
pers [
51
,
52
]. At the same time, bloggers and other developer-facing
venues have latched onto the formal language of the initial pa-
pers (“principled” [
52
], “mathematically sound” [
52
], “theoretically
sound” [
51
]), and sometimes without caveats. For example, one
online article argues that CRDTs “allow multiple replicas in dier-
ent regions to mathematically resolve to the same state without
coordination ... multiple active copies present accurate views of
the shared datasets at low latencies” [
20
]. This dangerous misread
of CRDT guarantees suggests that more work is needed to ensure
developers use CRDTs safely.
We believe that the gaps in CRDT guarantees can be addressed on
two fronts: (a) dening more precisely what developers must reason
about when using CRDTs in their applications and (b) building
data systems for CRDTs that automatically manage replication
and query execution to deliver stronger consistency guarantees.
The unconstrained nature of queries in CRDTs raises an intriguing
question: can we develop a more formal query model that makes
it possible to precisely dene when execution on a single replica
yields consistent results?
In this paper, we explore how the CALM (Consistency As Logical
Monotonicity) theorem [
3
], originally formulated as a denition
for consistency in distributed logic programs, can be used as the
basis of a query model for CRDTs that delineates queries that can
be executed locally from those that require coordination among a
quorum. Because monotonicity can be identied as a static prop-
erty, this view of queries paves the path for a CRDT data system
that provides ecient and safe execution of queries. Guided by this
vision, we map out a research path that weaves together query op-
timization, storage abstractions, provenance, and more to bring the
coordination-free benets of CRDTs to developers while preserving
the consistency guarantees they expect.
2 AN OVERVIEW OF CRDTS
Most discussions of CRDTs begin by introducing two functionally
equivalent representations: state-based CRDTs (a.k.a. CvRDTs) and
op-based CRDTs (a.k.a. CmRDTs). Essentially, state-based CRDTs
gossip data, while op-based CRDTs gossip logical log records.
2.1 State-Based CRDTs
We begin by reviewing the denition of state-based CRDTs. CvRDTs
encapsulate the current
𝑠𝑡𝑎𝑡𝑒
of the replica; let the type of
𝑠𝑡𝑎𝑡𝑒
be
called
𝑇
. The API for state-based CRDTs contains three classes of
methods, all of which run locally on a single replica’s state:
Merge
:
merge
is a single, required method that takes a value
𝑣
of
type
𝑇
as input. It combines
𝑠𝑡𝑎𝑡𝑒
with
𝑣
to generate a value
𝑠𝑡𝑎𝑡𝑒
of type
𝑇
, and updates itself so that
𝑠𝑡𝑎𝑡𝑒 =𝑠𝑡𝑎𝑡𝑒
.Constraint: the
merge function must be ACI.
Operations
: these are methods that clients use to modify
𝑠𝑡𝑎𝑡𝑒
.
Constraint: operations must be monotonic with respect to the type
𝑇
.
Queries
: these are methods that do not modify
𝑠𝑡𝑎𝑡𝑒
, but return a
result that may be dependent on 𝑠𝑡𝑎𝑡𝑒.
merge(adds, removes) {
state.adds = union(state.adds, adds);
state.removes = union(state.removes, removes);
}
operation add(i) { state.adds = union(state.adds, Set(i)); }
operation remove(i) { state.removes = union(state.removes, Set(i)); }
query contents() { return diff(state.adds, state.removes); }
Figure 1: Pseudocode for a Two-Phase Set CRDT.
Figure 2: Hasse diagrams for G-Set and a cardinality counter,
and a monotone function between them (dashed lines).
An example of a 2P-Set CvRDT is shown in Figure 1. In CvRDTs,
nodes gossip their replicas to each other and apply
merge
upon
receipt of gossip. If we focus only on
𝑠𝑡𝑎𝑡𝑒
and
merge
, a CvRDT
is simply a new name for a classical mathematical construct: the
join semi-lattice. A join semi-lattice is dened in precisely the same
way: it is a pair
𝑆=(𝐷, ⊔)
where
𝐷
is a domain (i.e. type) and
is
an operation (called “join” or “least upper bound”) that is ACI.
A well-known property of a join semi-lattice
𝑆
is that it is iso-
morphic to a partial order
𝑆
on
𝐷
such that given two elements
𝑠, 𝑡 𝐷
,
𝑠𝑆𝑡
i
𝑠𝑡=𝑡
. (We will drop the subscript on
when
it is clear from context.) The familiar “Hasse diagrams” for semi-
lattices capture this by laying out the elements of
𝐷
on a y-axis
corresponding to the
ordering (Figure 2). Note that this order is
partial: two elements 𝑠, 𝑡 𝐷may be incomparable: 𝑠̸≤ 𝑡𝑡̸𝑠.
Operations are monotonic with respect to the partial order
:
if an operation replaces
𝑠𝑡𝑎𝑡𝑒
with
𝑠𝑡𝑎𝑡𝑒
, we require that
𝑠𝑡𝑎𝑡𝑒
𝑠𝑡𝑎𝑡𝑒
. A good way to enforce this is to forbid operations from
modifying
𝑠𝑡𝑎𝑡𝑒
directly, and instead require them to invoke
merge
to perform the state update—the ACI properties of
merge
then
ensure monotonic updates. Viewed through that lens, CvRDTs only
update their state through the ACI
merge
function, and are precisely
join semi-lattices [30].
Note that although queries are included in the API surface of
CvRDTs, they are unconstrained and may perform arbitrary com-
putations on the underlying state. Therefore, the choice of queries
included with a CRDT design have no eect on the safety of obser-
vations through them—their consistency guarantees are no stronger
than a single query that just emits the internal state of the CRDT.
2.2 Op-Based CRDTs and Compression
The idea of op-based CRDTs (CmRDTs) is to gossip logs of opera-
tions rather than state. Each given replica
𝑟
applies its operations
sequentially, so upon gossip we require that another replica
𝑟
ap-
plies the log records from
𝑟
in an order that produces the same
nal state as
𝑟
. In typical deployments of op-based CRDTs, this
restriction is imposed by requiring the network between replicas to
2
摘要:

KeepCALMandCRDTOnShadajLaddad∗UniversityofCalifornia,Berkeleyshadaj@cs.berkeley.eduConorPower∗UniversityofCalifornia,Berkeleyconorpower@cs.berkeley.eduMaeMilanoUniversityofCalifornia,Berkeleympmilano@cs.berkeley.eduAlvinCheungUniversityofCalifornia,Berkeleyakcheung@cs.berkeley.eduNatachaCrooksUniver...

展开>> 收起<<
Keep CALM and CRDT On.pdf

共8页,预览2页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!

相关推荐

分类:图书资源 价格:10玖币 属性:8 页 大小:842.97KB 格式:PDF 时间:2025-05-06

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 8
客服
关注