Keep CALM and CRDT On

2025-05-06 1 0 842.97KB 8 页 10玖币

侵权投诉

Shadaj Laddad∗

University of California, Berkeley

shadaj@cs.berkeley.edu

Conor Power∗

University of California, Berkeley

conorpower@cs.berkeley.edu

Mae Milano

University of California, Berkeley

mpmilano@cs.berkeley.edu

Alvin Cheung

University of California, Berkeley

akcheung@cs.berkeley.edu

Natacha Crooks

University of California, Berkeley

ncrooks@cs.berkeley.edu

Joseph M. Hellerstein

University of California, Berkeley

hellerstein@cs.berkeley.edu

ABSTRACT

Despite decades of research and practical experience, developers

have few tools for programming reliable distributed applications

without resorting to expensive coordination techniques. Conict-

free replicated datatypes (CRDTs) are a promising line of work

that enable coordination-free replication and oer certain eventual

consistency guarantees in a relatively simple object-oriented API.

Yet CRDT guarantees extend only to data updates; observations of

CRDT state are unconstrained and unsafe. We propose an agenda

that embraces the simplicity of CRDTs, but provides richer, more

uniform guarantees. We extend CRDTs with a query model that

reasons about which queries are safe without coordination by ap-

plying monotonicity results from the CALM Theorem, and lay out

a larger agenda for developing CRDT data stores that let developers

safely and eciently interact with replicated application state.

1 INTRODUCTION

Consistency is a central theme of distributed computing research,

with major implications for practitioners. Modern cloud-hosted

applications are frequently distributed to optimize for latency and

availability. When application state is replicated across the globe,

developers often face stark choices regarding replica consistency.

Strong consistency can be enforced in a general-purpose way at

the storage or memory layer via classical distributed coordination

(consensus, transactions, etc.), but this is often unattractive for la-

tency and availability reasons. Alternatively, application developers

can build on “weakly” consistent storage models that do not use

coordination; in this case developers must reason about consistency

at the application level.

The last decade has seen a surge of research interest in rea-

soning about application consistency, featuring everything from

complex formal invariants [

] to multi-tiered consistency annota-

tions [

] to explicit happens-before annotations on operations

[

]. In recent years, one approach has risen above the noise among

practitioners: Conict-Free Replicated Data Types [

]. CRDTs are

provided as an API by a few commercial platforms (e.g., Enterprise

Redis, Akka, Basho Riak [

]), and have been documented in

use by a number of products and services including PayPal, League

of Legends, and Soundcloud [

]. There is also are a growing

set of open source CRDT packages that have thousands of stars on

∗equal contribution

This work is licensed under a Creative Commons Attribution 4.0 International License.

GitHub [

], and blog posts explaining CRDTs to developers

in pragmatic, informal terms [13, 49, 57].

The attractiveness of CRDTs lies in their combination of (1) an

easy-to-explain API, and (2) the promise of formal safety guarantees.

Designing a CRDT centers around providing a function to merge

any two replicas, with the requirement that this single function

is associative, commutative and idempotent (ACI), and dening

atomic operations that clients can use to update a replica. From the

user’s perspective, the CRDT’s object-oriented API often mimics a

familiar collection; many of the CRDTs in the literature are simple

adaptations of well-known data types like Sets and Counters.

The formal safety properties of CRDTs, as originally phrased by

Shapiro et al, leverage “a well dened interface ... [with] mathe-

matically sound rules to guarantee state convergence” [

]. This

guarantee is achieved via the ACI properties of the merge function.

Classic anomalies in eventually consistent systems are caused by

reordered, duplicated, or late-arriving updates—none of which can

aect the result of an idempotent, commutative, and associative

function execution.

But this strong convergence guarantee addresses only state up-

dates and oers no APIs (or guarantees!) for visibility into the state

of a CRDT. Although useful queries are often included in the pre-

sentation of CRDT designs, these have no impact on the correctness

of the CRDT and are no safer to use than arbitrary queries executed

directly on the underlying state. In one of the precursor papers

to CRDTs that also proposes ACI merge functions, Helland and

Campbell go as far as noting ironically that READs are “annoying”

and may not commute with other actions [15].

Example 1 (The Potato and the Ferrari, a.k.a. Early Read).

A canonical CRDT is the Two-Phase Set (2P-Set) [

], which is a pair

of sets

(𝐴, 𝑅)

that track items to be added (

𝐴

) and removed (

𝑅

). The

merge function for two 2P-Sets is dened simply as the pairwise union,

(𝐴1∪𝐴2, 𝑅1∪𝑅2)

and is patently ACI. This scheme was used in the

well-known Amazon Dynamo shopping cart example [11].

Implicit in this design is a

query 𝑄=𝐴−𝑅

returning the intended

contents of the set. Consider a scenario where a shopper adds a potato

and a Ferrari to their cart, then removes the Ferrari, and “checks out”

by computing the query

𝑄

. In one or more replicas of the 2P-Set, the

checkout request could arrive before the removal of the sports car. This

truly expensive consistency bug arises when the query “reads” the

state of the 2P-Set “too early”, before all the removals have eventually

arrived. And there is no way to know that all the removals have indeed

arrived without the coordination that CRDTs supposedly do not need.

In practice, the soundness of state convergence in CRDTs does

not translate to predictable guarantees for computations that exam-

ine them. One might say that CRDTs provide Schrödinger consistency

arXiv:2210.12605v1 [cs.DB] 23 Oct 2022

guarantees: they are guaranteed to be consistent only if they are

not viewed.

The weak consistency of CRDT queries is not a secret in the

research literature [

], and is mentioned in the initial pa-

pers [

]. At the same time, bloggers and other developer-facing

venues have latched onto the formal language of the initial pa-

pers (“principled” [

], “mathematically sound” [

], “theoretically

sound” [

]), and sometimes without caveats. For example, one

online article argues that CRDTs “allow multiple replicas in dier-

ent regions to mathematically resolve to the same state without

coordination ... multiple active copies present accurate views of

the shared datasets at low latencies” [

]. This dangerous misread

of CRDT guarantees suggests that more work is needed to ensure

developers use CRDTs safely.

We believe that the gaps in CRDT guarantees can be addressed on

two fronts: (a) dening more precisely what developers must reason

about when using CRDTs in their applications and (b) building

data systems for CRDTs that automatically manage replication

and query execution to deliver stronger consistency guarantees.

The unconstrained nature of queries in CRDTs raises an intriguing

question: can we develop a more formal query model that makes

it possible to precisely dene when execution on a single replica

yields consistent results?

In this paper, we explore how the CALM (Consistency As Logical

Monotonicity) theorem [

], originally formulated as a denition

for consistency in distributed logic programs, can be used as the

basis of a query model for CRDTs that delineates queries that can

be executed locally from those that require coordination among a

quorum. Because monotonicity can be identied as a static prop-

erty, this view of queries paves the path for a CRDT data system

that provides ecient and safe execution of queries. Guided by this

vision, we map out a research path that weaves together query op-

timization, storage abstractions, provenance, and more to bring the

coordination-free benets of CRDTs to developers while preserving

the consistency guarantees they expect.

2 AN OVERVIEW OF CRDTS

Most discussions of CRDTs begin by introducing two functionally

equivalent representations: state-based CRDTs (a.k.a. CvRDTs) and

op-based CRDTs (a.k.a. CmRDTs). Essentially, state-based CRDTs

gossip data, while op-based CRDTs gossip logical log records.

2.1 State-Based CRDTs

We begin by reviewing the denition of state-based CRDTs. CvRDTs

encapsulate the current

𝑠𝑡𝑎𝑡𝑒

of the replica; let the type of

𝑠𝑡𝑎𝑡𝑒

called

𝑇

. The API for state-based CRDTs contains three classes of

methods, all of which run locally on a single replica’s state:

Merge

merge

is a single, required method that takes a value

𝑣

type

𝑇

as input. It combines

𝑠𝑡𝑎𝑡𝑒

with

𝑣

to generate a value

𝑠𝑡𝑎𝑡𝑒 ′

of type

𝑇

, and updates itself so that

𝑠𝑡𝑎𝑡𝑒 =𝑠𝑡𝑎𝑡𝑒 ′

.Constraint: the

merge function must be ACI.

Operations

: these are methods that clients use to modify

𝑠𝑡𝑎𝑡𝑒

Constraint: operations must be monotonic with respect to the type

𝑇

Queries

: these are methods that do not modify

𝑠𝑡𝑎𝑡𝑒

, but return a

result that may be dependent on 𝑠𝑡𝑎𝑡𝑒.

merge(adds, removes) {

state.adds = union(state.adds, adds);

state.removes = union(state.removes, removes);

}

operation add(i) { state.adds = union(state.adds, Set(i)); }

operation remove(i) { state.removes = union(state.removes, Set(i)); }

query contents() { return diff(state.adds, state.removes); }

Figure 1: Pseudocode for a Two-Phase Set CRDT.

Figure 2: Hasse diagrams for G-Set and a cardinality counter,

and a monotone function between them (dashed lines).

An example of a 2P-Set CvRDT is shown in Figure 1. In CvRDTs,

nodes gossip their replicas to each other and apply

merge

upon

receipt of gossip. If we focus only on

𝑠𝑡𝑎𝑡𝑒

and

merge

, a CvRDT

is simply a new name for a classical mathematical construct: the

join semi-lattice. A join semi-lattice is dened in precisely the same

way: it is a pair

𝑆=(𝐷, ⊔)

where

𝐷

is a domain (i.e. type) and

⊔

an operation (called “join” or “least upper bound”) that is ACI.

A well-known property of a join semi-lattice

𝑆

is that it is iso-

morphic to a partial order

≤𝑆

𝐷

such that given two elements

𝑠, 𝑡 ∈𝐷

𝑠≤𝑆𝑡

i

𝑠⊔𝑡=𝑡

. (We will drop the subscript on

≤

when

it is clear from context.) The familiar “Hasse diagrams” for semi-

lattices capture this by laying out the elements of

𝐷

on a y-axis

corresponding to the

≤

ordering (Figure 2). Note that this order is

partial: two elements 𝑠, 𝑡 ∈𝐷may be incomparable: 𝑠̸≤ 𝑡∧𝑡̸≤ 𝑠.

Operations are monotonic with respect to the partial order

≤

if an operation replaces

𝑠𝑡𝑎𝑡𝑒

with

𝑠𝑡𝑎𝑡𝑒 ′

, we require that

𝑠𝑡𝑎𝑡𝑒 ≤

𝑠𝑡𝑎𝑡𝑒 ′

. A good way to enforce this is to forbid operations from

modifying

𝑠𝑡𝑎𝑡𝑒

directly, and instead require them to invoke

merge

to perform the state update—the ACI properties of

merge

then

ensure monotonic updates. Viewed through that lens, CvRDTs only

update their state through the ACI

merge

function, and are precisely

join semi-lattices [30].

Note that although queries are included in the API surface of

CvRDTs, they are unconstrained and may perform arbitrary com-

putations on the underlying state. Therefore, the choice of queries

included with a CRDT design have no eect on the safety of obser-

vations through them—their consistency guarantees are no stronger

than a single query that just emits the internal state of the CRDT.

2.2 Op-Based CRDTs and Compression

The idea of op-based CRDTs (CmRDTs) is to gossip logs of opera-

tions rather than state. Each given replica

𝑟

applies its operations

sequentially, so upon gossip we require that another replica

𝑟′

ap-

plies the log records from

𝑟

in an order that produces the same

nal state as

𝑟

. In typical deployments of op-based CRDTs, this

restriction is imposed by requiring the network between replicas to

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

KeepCALMandCRDTOnShadajLaddad∗UniversityofCalifornia,Berkeleyshadaj@cs.berkeley.eduConorPower∗UniversityofCalifornia,Berkeleyconorpower@cs.berkeley.eduMaeMilanoUniversityofCalifornia,Berkeleympmilano@cs.berkeley.eduAlvinCheungUniversityofCalifornia,Berkeleyakcheung@cs.berkeley.eduNatachaCrooksUniver...

展开>> 收起<<

Keep CALM and CRDT On.pdf

共8页,预览2页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Keep CALM and CRDT On

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: