EEMARQ: Efficient Lock-Free Range Queries
with Memory Reclamation
Gali Sheffi
Department of Computer Science, Technion, Haifa, Israel
Pedro Ramalhete
Cisco Systems, Switzerland
Erez Petrank
Department of Computer Science, Technion, Haifa, Israel
Abstract
Multi-Version Concurrency Control (MVCC) is a common mechanism for achieving linearizable range queries
in database systems and concurrent data-structures. The core idea is to keep previous versions of nodes to serve
range queries, while still providing atomic reads and updates. Existing concurrent data-structure implementations
that support linearizable range queries are either slow, use locks, or rely on blocking reclamation schemes.
We present EEMARQ, the first scheme that uses MVCC with lock-free memory reclamation to obtain a fully
lock-free data-structure supporting linearizable inserts, deletes, contains, and range queries. Evaluation shows
that EEMARQ outperforms existing solutions across most workloads, with lower space overhead and while
providing full lock freedom.
2012 ACM Subject Classification Software and its engineering → Memory management; Theory of computation → Concurrency
Keywords and phrases safe memory reclamation, lock-freedom, snapshot, concurrency, range query
Digital Object Identifier 10.4230/LIPIcs.CVIT.2016.23
© G. Sheffi, P. Ramalhete and E. Petrank;
licensed under Creative Commons License CC-BY 4.0
42nd Conference on Very Important Topics (CVIT 2016).
Editors: John Q. Open and Joan R. Access; Article No. 23; pp. 23:1–23:46
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
arXiv:2210.17086v1 [cs.DB] 31 Oct 2022
1 Introduction
Online Analytical Processing (OLAP) transactions are typically long and may read data from a
large subset of the records in a database [52, 57]. As such, analytical workloads pose a significant
challenge in the design and implementation of efficient concurrency controls for database management
systems (DBMS). Two-Phase Locking (2PL) [64] is sometimes used, but locking each record before
it is read implies a high synchronization cost and, moreover, the inability to modify these records
over long periods. Another way to deal with OLAP queries is to use Optimistic Concurrency
Control [25], where records are not locked but must be validated at commit time to guarantee
serializability [47]. If any of these records is modified while the analytical transaction executes,
the analytical query has to abort and restart. Aborting can prevent long read-only queries from
ever completing.
DBMS designers typically address these obstacles using Multi-Version Concurrency Control
(MVCC). MVCC’s core idea is to keep previous versions of a record, allowing transactions to read
data from a fixed point in time. To manage the older versions, each record is associated with a list
of version records. Each version record contains a copy of an older version of the record, together
with a timestamp indicating its commit time. Each update of the record's values adds a new version
record at the head of the version list, and every read of an older version traverses the version list
until the relevant timestamp is reached.
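To make this mechanism concrete, the following sketch shows one record together with its version list. It is a minimal illustration only; the names (VersionRecord, Record, readAtTimestamp) are ours and do not correspond to any particular system discussed in this paper.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Illustrative sketch only: one record together with its list of older
// versions. The names (VersionRecord, Record, readAtTimestamp) are ours.
struct VersionRecord {
    uint64_t value;        // copy of the record's value at commit time
    uint64_t timestamp;    // commit timestamp of this version
    VersionRecord* older;  // next (older) version record in the list
};

struct Record {
    std::atomic<VersionRecord*> versions{nullptr};  // newest version first

    // An update installs a new version record at the head of the version list.
    void update(uint64_t newValue, uint64_t commitTs) {
        VersionRecord* v = new VersionRecord{newValue, commitTs, nullptr};
        v->older = versions.load();
        while (!versions.compare_exchange_weak(v->older, v)) {
            // a failed CAS refreshes v->older with the current head; retry
        }
    }

    // A read at timestamp ts traverses the version list until it finds the
    // newest version whose commit timestamp is at most ts.
    std::optional<uint64_t> readAtTimestamp(uint64_t ts) const {
        for (VersionRecord* v = versions.load(); v != nullptr; v = v->older)
            if (v->timestamp <= ts) return v->value;
        return std::nullopt;  // the record did not exist at time ts
    }
};
```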
Keeping the version lists relatively short is fundamental for high performance and low memory
footprint [11,19,34,35,37]. However, the version lists should be carefully pruned, as a missing version
record can be harmful to an ongoing range query. In the database setting, the common approach is
to garbage collect old versions once the version-list lengths exceed a certain threshold [11, 37, 53].
During garbage collection, the entire database is scanned for record instances with versions that are
no longer required by future scans. However, under update-intensive workloads, garbage collecting
old versions considerably slows down the entire system. An alternative approach is to use transactions
and associate each scan operation with an undo-log [6, 42]. But this requires allocating memory for
storing the undo logs and a safe memory reclamation scheme to recycle log entries.
In contrast to the DBMS approach, many concurrent in-memory data-structure implementations
do not provide an MVCC mechanism, and simply give up on range queries. Many map-based
data-structures provide linearizable [30] insertions, deletions and membership queries of single items.
Most data-structures that do provide range queries are blocking: they either use blocking MVCC
mechanisms [3, 12, 44], or rely on the (blocking) garbage collector of managed programming languages [7, 22,
51, 67]. It may seem like lock-free data-structures can simply employ a lock-free
reclamation scheme with an MVCC mechanism to obtain full lock-freedom, but interestingly this
poses a whole new challenge.
Safe manual reclamation (SMR) [41, 56, 59, 61, 66] algorithms rely on retire() invocations by
the program, announcing that a certain object has been unlinked from a data-structure. The task
of the SMR mechanism is to decide which retired objects can be safely reclaimed, making their
memory space available for re-allocation. Most SMR techniques [41, 56, 61, 66] heavily rely on the
fact that retired objects are no longer reachable for threads that concurrently traverse the data structure.
Typically, objects are retired when deleted from the data-structure. However, when using version
lists, if we retire objects when they are deleted from the current version of the data-structure, they
would still be reachable by range queries via the version list links. Therefore, it is not safe to recycle
old objects with existing memory reclamation schemes. Epoch-based reclamation (EBR) [15, 24] is
an exception to the rule because it only requires that an operation does not access nodes that were
retired before the operation started. Namely, an existing range query prohibits reclamation of any
deleted node, and subsequent range queries do not access these nodes, so they can be reclaimed safely.
Therefore, EBR can be used without making any MVCC-specific adjustments, and indeed EBR is
used in many in-memory solutions [3, 44, 65]. However, EBR is not robust [59, 61, 66]: a slow
thread may prevent the reclamation of an unbounded number of retired objects, which may degrade
performance and, in theory, block all new allocations.
One lock-free memory reclamation scheme that can be adopted to provide lock-free support for
MVCC is the VBR optimistic memory reclamation scheme [59, 60]. VBR allows a program to access
reclaimed space, but it raises a warning when accessed data has been re-allocated. This allows the
program to retry the access with refreshed data. The main advantages of VBR are that it is lock-free,
it is fast, and it has very low memory footprint. Any retired object can be immediately reclaimed
safely. The main drawback¹ is that reclaimed memory cannot be returned to the operating system,
but must be kept for subsequent node allocations, as program threads may still access reclaimed
nodes. Similarly to other schemes, the correctness of VBR depends on the inability of operations to
visit previously retired objects. In data structures that do not use multi-versioning, the deleting thread
simply retires a node after disconnecting it from the data structure. But in the MVCC setting,
disconnected nodes remain connected via the version list, foiling the correctness of VBR.

¹ VBR also necessitates type-preservation. However, this does not constitute a problem in our setting, as all allocated memory objects are of the same type. For more details, see Section 3.
In this paper we modify VBR to work correctly in the MVCC setting. It turns out that for the
specific case of old nodes in the version list, correctness can be obtained. VBR keeps a slow-ticking
epoch clock and it maintains a birth epoch field for each node. Interestingly, this birth epoch can
be used to tell whether a retired node in the version list has been re-allocated. As we show in this
paper, an invariant of nodes in the version list is that they have non-increasing birth-epoch numbers.
Moreover, if one of the nodes in the version list is re-allocated, then this node must foil the invariant.
Therefore, when a range query traverses a version list to locate the version to use for its traversal, it
can easily detect a node that has been re-allocated and whose data is irrelevant. When a range query
detects such re-allocation, it restarts. As shown in the evaluation, restarting happens infrequently with
VBR and the obtained performance is the best among existing schemes in the literature. Using the
modified VBR in the MVCC setting, a thread can delete a node and retire it after disconnecting it
from the data structure (and while it is still reachable via version list pointers).
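The following sketch illustrates our reading of the birth-epoch check described above; it is a hedged illustration, not the actual VBR or EEMARQ code, and the names (VNode, birthEpoch, findVersion, RestartQuery) are ours. A range query that traverses a version list restarts when it observes an increase in birth epochs, which signals a re-allocated node.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical node layout: each node carries a VBR-style birth epoch that is
// set when the node is (re)allocated, read from the slow-ticking epoch clock.
struct VNode {
    std::atomic<uint64_t> birthEpoch;
    uint64_t timestamp;               // commit timestamp of this version
    std::atomic<VNode*> olderVersion; // next (older) node in the version list
};

struct RestartQuery {};  // signals the enclosing range query to restart

// Invariant (as stated above): walking from newer to older versions, the birth
// epochs are non-increasing. A node that was reclaimed and re-allocated must
// violate this, so the traversal can detect it and restart the range query.
inline VNode* findVersion(VNode* head, uint64_t ts) {
    uint64_t prevBirth = head->birthEpoch.load();
    for (VNode* cur = head; cur != nullptr; cur = cur->olderVersion.load()) {
        uint64_t birth = cur->birthEpoch.load();
        if (birth > prevBirth)
            throw RestartQuery{};     // re-allocated node: its data is stale
        if (cur->timestamp <= ts)
            return cur;               // newest version visible at time ts
        prevBirth = birth;
    }
    return nullptr;                   // no version existed at time ts
}
```

A real implementation would typically signal the restart through a return value rather than an exception; the exception only keeps the sketch short.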
Recently, two novel papers [44, 65] presented efficient MVCC-based key-value stores. The main
new idea is to track and keep a version list of modified fields only and not of entire nodes. For many
data-structures, all fields are immutable except for one or two pointer fields. For such data-structures,
it is enough to keep a version list of pointers only. The first paper proposed a lock-free mechanism,
based on versioned CAS objects (vCAS) [65], and the second proposed a blocking mechanism,
based on Bundle objects (Bundles) [44]. While copying one field instead of the entire node reduces
the space overhead, the resulting indirection is harmful for performance: to dereference a pointer
during a traversal, one must first access the most recent node in the field's version list, and only then use
the pointer stored there to continue the traversal. As we show in Section 4, this indirection incurs high overheads in
read-intensive workloads. Bundles ameliorate this overhead by caching the most recent pointer in
the node to allow quick access for traversals that do not use older versions. However, range queries
still need to dereference twice as many references during a traversal. The vCAS approach presents
a more complicated optimization that completely eliminates indirection (which we further discuss
in Section 3). However, its applicability depends on assumptions on the original data-structure that
many data structures do not satisfy. Therefore, it is unclear how it can be integrated into general
state-of-the-art concurrent data-structures. In terms of memory reclamation, both schemes use the
EBR technique (that may block in theory, due to allocation heap exhaustion). This means that even
vCAS does not provide a lock-free range query mechanism. A subsequent paper on MVCC-specific
garbage collectors [8] provides a robust memory reclamation method for collecting the version
records, but it does not deal with the actual data-structure nodes.
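To make the indirection cost concrete, the sketch below illustrates the versioned-pointer idea in the spirit of vCAS [65]; the types and names are ours and do not reflect the actual vCAS interface. Dereferencing a node's mutable pointer field goes through the newest version object, i.e., two hops per link.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative types only; this is not the actual vCAS interface.
struct ListNode;

// One version of a mutable pointer field: the pointer value, the timestamp at
// which it was installed, and a link to the previous version.
struct VersionedPtr {
    ListNode* value;
    uint64_t timestamp;
    VersionedPtr* prev;
};

struct ListNode {
    uint64_t key;                     // immutable
    std::atomic<VersionedPtr*> next;  // the mutable field is versioned

    // Reading the current successor costs an extra hop through the newest
    // VersionedPtr object: the indirection discussed above.
    ListNode* currentNext() const { return next.load()->value; }

    // A snapshot read at timestamp ts walks back through older versions.
    ListNode* nextAt(uint64_t ts) const {
        for (VersionedPtr* v = next.load(); v != nullptr; v = v->prev)
            if (v->timestamp <= ts) return v->value;
        return nullptr;  // the link did not exist at time ts
    }
};
```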
In this paper we present EEMARQ (End-to-End lock-free MAp with Range Queries), a design
for a lock-free in-memory map with MVCC and a robust lock-free memory management scheme.
EEMARQ provides linearizable and high-performance inserts, deletes, searches, and range queries.
The design starts by applying vCAS to a linked-list. The linked-list is simple, and so it allows
applying vCAS easily. However, the linked-list does not satisfy the optimization assumptions required
by [65], and so a (non-trivial) extension is required to fit the linked-list. In the unoptimized variant,
there is a version node for each update (insert or delete), and this node is used to route scans
to the adequate timestamp. But this is costly, because traversals end up accessing twice the number of
nodes. A natural extension is to associate the list nodes themselves with the data that is originally
kept in the version nodes. We augment the linked-list construction to allow the optimization of [65]:
a mapping from the original version nodes to existing list nodes allows moving the data from version
nodes into list nodes, and then eliding the version nodes entirely. Traversals then need no extra memory
accesses to version nodes. This method is explained in Section 3.2.
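The following is a hedged illustration of the elided-version-node layout just described; the field names are ours and the concrete EEMARQ node format may differ. The timestamp and the link to the replaced successor live directly in the list node, so a scan at an older timestamp touches only list nodes.

```cpp
#include <atomic>
#include <cstdint>

// Hedged illustration of the elided-version-node layout: the data that the
// unoptimized variant keeps in a separate version node (a timestamp and a
// link to the previous version of the link) is stored in the list node
// itself, so traversals touch only list nodes.
struct Node {
    uint64_t key;                // immutable
    uint64_t ts;                 // timestamp at which this node became its
                                 // predecessor's successor (was linked in)
    std::atomic<Node*> next;     // current successor in the list
    std::atomic<Node*> replaced; // the node this one replaced as that
                                 // successor, i.e., the older version of
                                 // the incoming link
};

// A scan fixed at snapshot time snapTs follows 'next' pointers, but whenever
// the successor it reads was linked in after the snapshot was taken, it falls
// back to the replaced (older) successor until it reaches one visible at snapTs.
inline Node* successorAt(Node* n, uint64_t snapTs) {
    Node* succ = n->next.load();
    while (succ != nullptr && succ->ts > snapTs)
        succ = succ->replaced.load();
    return succ;
}
```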
Second, we extend VBR by adding support for reachable retired nodes on the version list. The
extended VBR allows keeping retired reachable version nodes in the data structure (which the original
VBR forbids) while maintaining high performance, lock-freedom, and robustness.
Finally, we deal with the inefficiency of the linked-list by adding fast indexing of the linked-list
nodes. A fast index can be obtained from a binary search tree or a skip list. The advantage we
get from separating the linked-list from the indexing mechanism is that we do not have to maintain
versions for the index (e.g., for the binary search tree or the skip list), but only for the underlying
linked-list. This separation between the design of the versioned linked-list and the non-versioned
index simplifies each of the sub-designs and also yields high performance, because operations on
the index do not need to maintain versions. Previous work separates the lower level of the skip list,
or the leaves of the tree, from the rest of the structure for various goals (e.g., [44, 69]).
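The sketch below illustrates this separation at a high level; Index and findStart are our hypothetical stand-ins for whichever unversioned index (skip list or tree) is used. The index only suggests a starting list node, and all version handling remains in the linked-list.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// 'Node' stands for the versioned linked-list node; only pointers to it are
// needed here, so a forward declaration suffices.
struct Node;

// Hedged illustration: the index maps keys to list nodes but keeps no version
// information. If an entry is stale (its node was already removed from the
// current list), the traversal simply starts a little earlier in the list.
struct Index {
    // Any unversioned map (lock-free skip list, tree, ...) would do here;
    // std::map keeps the sketch short but is of course not lock-free.
    std::map<uint64_t, Node*> entries;

    // Returns a list node whose key is the largest indexed key <= key,
    // i.e., a good starting point for the linked-list traversal.
    Node* findStart(uint64_t key) const {
        auto it = entries.upper_bound(key);
        if (it == entries.begin()) return nullptr;  // caller starts at the list head
        return std::prev(it)->second;
    }
};
```

Because the index stores no versions, index updates never interfere with ongoing range queries; at worst, a stale index entry lengthens the list walk.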
The combination of these three ideas, i.e., an optimized versioned linked-list, extended VBR, and
independent indexing, yields a highly performant data structure design with range queries. Evaluation
shows that EEMARQ outperforms both vCAS and Bundles (while providing full lock-freedom). A
proof of correctness is provided in the appendix.
2 Related Work
Existing work on linearizable range queries in the shared-memory setting includes many solutions
which are not based on MVCC techniques. Some data-structure interfaces originally include a tailor-
made range query operation. E.g., there are trees [12, 13, 22, 67], hash tries [55], queues [45, 46, 54],
skip lists [5] and graphs [31] with a built-in range query mechanism. Other and more general
solutions execute range queries by explicitly taking a snapshot of the whole data-structure, followed
by collecting the set of keys in the given range [1, 4]. The Snapcollector [51] forces the cooperation
of all executing threads while a certain thread is scanning the data-structure. Despite being lock-free
and general, the Snapcollector’s memory and performance overheads are high. The Snapcollector
was enhanced to support range queries that do not force a snapshot of the whole data-structure [16].
However, this solution still suffers from major time and space overheads.
Another way to implement range queries is to use transactional memory [23, 32, 49, 50]. Trans-
actions can either be implemented in software or in hardware, allowing range queries to take effect
atomically. Although transactions may seem like ideal candidates for long queries, software transactions
incur high performance overheads and hardware transactions frequently abort when accessing big
memory chunks. Read-log-update (RLU) [39] borrows software transaction techniques and extends
read-copy-update (RCU) [40] to support multiple updates. RLU yields relatively simple implementa-
tions, but its integration involves re-designing the entire data-structure. In addition, similarly to RCU,
it suffers from high overheads in write-extensive workloads.
Arbel-Raviv and Brown exploited the EBR manual reclamation to support range queries [3]. Their
technique uses EBR’s global epoch clock for associating data-structure items with their insertion and
deletion timestamps. EEMARQ uses a similar technique for associating data-structure modifications
with respective timestamps. They also took advantage of EBR’s retire-lists, in order to locate nodes
that were deleted during a range query scan. However, while their solution indeed avoids the extensive
helping of scans upon deleting an item from the data-structure (as imposed by the Snapcollector [51]),
it may still impose significant performance overheads (as shown in Section 4). EEMARQ minimizes
these overheads by keeping a reachable path to deleted nodes (for more details, see Section 3.2).
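As a rough illustration of the timestamping idea (associating each insertion and deletion with the value of a global epoch clock), consider the following sketch. The field and function names are hypothetical, and the snippet omits the synchronization details of the actual algorithms.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Global epoch clock; in [3] this role is played by EBR's epoch counter.
static std::atomic<uint64_t> epochClock{1};
constexpr uint64_t NOT_DELETED = UINT64_MAX;

// Hypothetical node: every key remembers when it was inserted and deleted.
struct TimedNode {
    uint64_t key;
    uint64_t insertTs;                            // epoch of the insertion
    std::atomic<uint64_t> deleteTs{NOT_DELETED};  // epoch of the deletion, if any
    std::atomic<TimedNode*> next{nullptr};
};

// A range query fixes a snapshot time and reports exactly the keys in [lo, hi]
// that were present at that time: inserted no later than snapTs and not yet
// deleted. This relies on deleted nodes remaining reachable during the scan.
std::vector<uint64_t> rangeQuery(TimedNode* head, uint64_t lo, uint64_t hi) {
    const uint64_t snapTs = epochClock.load();
    std::vector<uint64_t> result;
    for (TimedNode* n = head; n != nullptr; n = n->next.load()) {
        if (n->key < lo || n->key > hi) continue;
        if (n->insertTs <= snapTs && n->deleteTs.load() > snapTs)
            result.push_back(n->key);
    }
    return result;
}
```

Note that such a scan only works if deleted nodes remain reachable to it, which is the property EEMARQ maintains by keeping a reachable path to deleted nodes (Section 3.2).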
Multi-Version Concurrency Control. MVCC easily provides isolation between concurrent
reads and updates. I.e., range queries can work on a consistent view of the data, while not interfering
with update operations. This powerful technique is widely used in commercial systems [19, 21], as
well as in research-oriented DBMS [33,53], in-memory shared environments [44,65] and transactional
memory [23, 32, 36, 49, 50]. MVCC has been investigated in the DBMS setting for the last four
decades, both from a theoretical [9, 10, 35, 48, 68] and a practical [19, 33, 37, 42, 53] point of view.
A lot of effort has been put in designing MVCC implementations and addressing the unwanted
side-effects of long version lists. This issue is crucial both for accelerating version list scans and
for reducing the contention between updates and garbage collection. Accelerating DBMS version
list scans (independently of the constant need to prune them) has been investigated in [35]. Most
DBMS-related work that focuses on this issue, tries to minimize the problem by eagerly collecting
unnecessary versions [2, 11, 21, 34, 38, 42, 53].
Safe Memory Reclamation. Most existing in-memory environments that enable linearizable
range queries avoid managing their memory by relying on automatic (and blocking) garbage
collection of old versions [7, 22, 51, 67]. Solutions that do manually manage their allocated memory [3,
44, 65] use EBR for safe reclamation. In EBR, a shared epoch counter is incremented periodically,
and upon each operation invocation, the threads announce their observed epoch. The epoch clock
can be advanced when no executing thread has an announcement with a previous epoch. During
reclamation, only nodes that had been retired at least two epochs ago are reclaimed. EBR is safe for
MVCC because a running scan prevents the advance of the epoch clock, and also the reclamation of
any node in the data-structure that was not deleted before the scan.
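The following compact sketch captures the EBR rule just described; the names and the fixed thread bound are illustrative, and real EBR implementations differ in many details.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Illustrative EBR sketch; names and the fixed thread bound are ours.
constexpr int MAX_THREADS = 64;
constexpr uint64_t IDLE = UINT64_MAX;

struct Slot { std::atomic<uint64_t> epoch{IDLE}; };  // per-thread announcement
static Slot announced[MAX_THREADS];
static std::atomic<uint64_t> globalEpoch{2};

struct Retired { void* node; uint64_t retireEpoch; };
static thread_local std::vector<Retired> retireList;

// Upon each operation invocation the thread announces its observed epoch; on
// completion it announces IDLE so that it no longer pins the epoch clock.
void enterOperation(int tid) { announced[tid].epoch.store(globalEpoch.load()); }
void exitOperation(int tid)  { announced[tid].epoch.store(IDLE); }

void retireNode(void* node) { retireList.push_back({node, globalEpoch.load()}); }

// The epoch clock may advance only when no executing thread still announces an
// older epoch; a single stalled thread therefore blocks all further reclamation.
void tryAdvanceEpoch() {
    uint64_t e = globalEpoch.load();
    for (int i = 0; i < MAX_THREADS; i++) {
        uint64_t a = announced[i].epoch.load();
        if (a != IDLE && a < e) return;
    }
    globalEpoch.compare_exchange_strong(e, e + 1);
}

// Only nodes retired at least two epochs ago are reclaimed: no operation that
// could have observed them before they were unlinked can still be running.
// (Nodes are assumed to have been allocated with malloc.)
void reclaim() {
    uint64_t e = globalEpoch.load();
    for (auto it = retireList.begin(); it != retireList.end();) {
        if (it->retireEpoch + 2 <= e) { std::free(it->node); it = retireList.erase(it); }
        else ++it;
    }
}
```

The advance rule also makes EBR's lack of robustness visible: a thread that never announces IDLE pins the epoch and prevents any further reclamation.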
The MVCC-oriented garbage collector from [8] incorporates reference counting (RC) [17, 18, 28],
in order to maintain lock-freedom while safely reclaiming old versions. Any object can be immediately
reclaimed once its reference count reaches zero, without the need to invoke explicit retire() calls.
While RC simplifies reclamation, it incurs high performance overheads and does not guarantee a tight
bound on unreclaimed garbage (i.e., it is not robust).
Other reclamation schemes were also considered when designing EEMARQ. NBR [61] was one
of the strongest candidates, as it is fast and lock-free (under some hardware assumptions). However,
it is not clear whether NBR can be integrated into a skip list implementation (which serves as one of
our fast indexes). Pointer-based reclamation methods (e.g., Hazard Pointers [41]) allow threads to
protect specific objects (i.e., temporarily prevent their reclamation), by announcing their future access
to these objects, or publishing an announcement indicating the protection of a bigger set of objects
(e.g., Hazard Eras [56], Interval-Based Reclamation [66], Margin Pointers [62]). Although these
schemes are robust (as opposed to EBR), it is unclear how they can be used in MVCC environments.
They require that reaching a reclaimed node from a protected one would be impossible (even if the