EEMARQ: Efficient Lock-Free Range Queries
with Memory Reclamation
Gali Sheffi
Department of Computer Science, Technion, Haifa, Israel
Pedro Ramalhete
Cisco Systems, Switzerland
Erez Petrank
Department of Computer Science, Technion, Haifa, Israel
Abstract
Multi-Version Concurrency Control (MVCC) is a common mechanism for achieving linearizable range queries
in database systems and concurrent data-structures. The core idea is to keep previous versions of nodes to serve
range queries, while still providing atomic reads and updates. Existing concurrent data-structure implementations
that support linearizable range queries are either slow, use locks, or rely on blocking reclamation schemes.
We present EEMARQ, the first scheme that uses MVCC with lock-free memory reclamation to obtain a fully
lock-free data-structure supporting linearizable inserts, deletes, contains, and range queries. Evaluation shows
that EEMARQ outperforms existing solutions across most workloads, with lower space overhead and while
providing full lock freedom.
2012 ACM Subject Classification Software and its engineering → Memory management; Theory of computation → Concurrency
Keywords and phrases safe memory reclamation, lock-freedom, snapshot, concurrency, range query
Digital Object Identifier 10.4230/LIPIcs.CVIT.2016.23
© G. Sheffi, P. Ramalhete and E. Petrank;
licensed under Creative Commons License CC-BY 4.0
42nd Conference on Very Important Topics (CVIT 2016).
Editors: John Q. Open and Joan R. Access; Article No. 23; pp. 23:1–23:46
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
arXiv:2210.17086v1 [cs.DB] 31 Oct 2022
1 Introduction
Online Analytical Processing (OLAP) transactions are typically long and may read data from a
large subset of the records in a database [52, 57]. As such, analytical workloads pose a significant
challenge in the design and implementation of efficient concurrency controls for database management
systems (DBMS). Two-Phase Locking (2PL) [64] is sometimes used, but locking each record before
it is read implies a high synchronization cost and, moreover, the inability to modify these records
over long periods. Another way to deal with OLAP queries is to use Optimistic Concurrency
Control [25], where records are not locked but must be validated at commit time to guarantee
serializability [47]. If any of these records is modified while the analytical transaction executes,
the analytical query has to abort and restart. Aborting can prevent long read-only queries from
ever completing.
DBMS designers typically address these obstacles using Multi-Version Concurrency Control
(MVCC). MVCC’s core idea is to keep previous versions of a record, allowing transactions to read
data from a fixed point in time. To manage the older versions, each record is associated with a list
of version records. Each version record contains a copy of an older version of the record, together
with a timestamp indicating its commit time. Each update of the record's values adds a new version
record at the head of the version list, and every read of an older version traverses the version list
until the relevant timestamp is reached.
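To make this mechanism concrete, the following sketch shows one record together with its version list. It is a minimal illustration only; the names (VersionRecord, Record, readAtTimestamp) are ours and do not correspond to any particular system discussed in this paper.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Illustrative sketch only: one record together with its list of older
// versions. The names (VersionRecord, Record, readAtTimestamp) are ours.
struct VersionRecord {
    uint64_t value;        // copy of the record's value at commit time
    uint64_t timestamp;    // commit timestamp of this version
    VersionRecord* older;  // next (older) version record in the list
};

struct Record {
    std::atomic<VersionRecord*> versions{nullptr};  // newest version first

    // An update installs a new version record at the head of the version list.
    void update(uint64_t newValue, uint64_t commitTs) {
        VersionRecord* v = new VersionRecord{newValue, commitTs, nullptr};
        v->older = versions.load();
        while (!versions.compare_exchange_weak(v->older, v)) {
            // a failed CAS refreshes v->older with the current head; retry
        }
    }

    // A read at timestamp ts traverses the version list until it finds the
    // newest version whose commit timestamp is at most ts.
    std::optional<uint64_t> readAtTimestamp(uint64_t ts) const {
        for (VersionRecord* v = versions.load(); v != nullptr; v = v->older)
            if (v->timestamp <= ts) return v->value;
        return std::nullopt;  // the record did not exist at time ts
    }
};
```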
Keeping the version lists relatively short is fundamental for high performance and low memory
footprint [11,19,34,35,37]. However, the version lists should be carefully pruned, as a missing version
record can be harmful to an ongoing range query. In the database setting, the common approach is
to garbage collect old versions once the version-list lengths exceed a certain threshold [11, 37, 53].
During garbage collection, the entire database is scanned for record instances with versions that are
no longer required by future scans. However, under update-intensive workloads, garbage collecting
old versions considerably slows down the entire system. An alternative approach is to use transactions
and associate each scan operation with an undo-log [6, 42]. But this requires allocating memory for
storing the undo logs and a safe memory reclamation scheme to recycle log entries.
In contrast to the DBMS approach, many concurrent in-memory data-structure implementations
do not provide an MVCC mechanism, and simply give up on range queries. Many map-based
data-structures provide linearizable [30] insertions, deletions and membership queries of single items.
Most data-structures that do provide range queries are blocking: they either use blocking MVCC
mechanisms [3, 12, 44], or rely on the (blocking) garbage collector of managed programming languages [7, 22,
51, 67]. It may seem like lock-free data-structures can simply employ a lock-free
reclamation scheme with an MVCC mechanism to obtain full lock-freedom, but interestingly this
poses a whole new challenge.
Safe manual reclamation (SMR) [41, 56, 59, 61, 66] algorithms rely on retire() invocations by
the program, announcing that a certain object has been unlinked from a data-structure. The task
of the SMR mechanism is to decide which retired objects can be safely reclaimed, making their
memory space available for re-allocation. Most SMR techniques [41, 56, 61, 66] heavily rely on the
fact that retired objects are no longer reachable for threads that concurrently traverse the data structure.
Typically, objects are retired when deleted from the data-structure. However, when using version
lists, if we retire objects when they are deleted from the current version of the data-structure, they
would still be reachable by range queries via the version list links. Therefore, it is not safe to recycle
old objects with existing memory reclamation schemes. Epoch-based reclamation (EBR) [15, 24] is
an exception to the rule because it only requires that an operation does not access nodes that were
retired before the operation started. Namely, an existing range query prohibits reclamation of any
deleted node, and subsequent range queries do not access these nodes, so they can be reclaimed safely.
Therefore, EBR can be used without making any MVCC-specific adjustments, and indeed EBR is
used in many in-memory solutions [3, 44, 65]. However, EBR is not robust [59, 61, 66]: a slow
thread may prevent the reclamation of an unbounded number of retired objects, which may degrade
performance and, in theory, block all new allocations.
One lock-free memory reclamation scheme that can be adopted to provide lock-free support for
MVCC is the VBR optimistic memory reclamation scheme [59, 60]. VBR allows a program to access
reclaimed space, but it raises a warning when accessed data has been re-allocated. This allows the
program to retry the access with refreshed data. The main advantages of VBR are that it is lock-free,
it is fast, and it has very low memory footprint. Any retired object can be immediately reclaimed
safely. The main drawback¹ is that reclaimed memory cannot be returned to the operating system,
but must be kept for subsequent node allocations, as program threads may still access reclaimed
nodes. Similarly to other schemes, the correctness of VBR depends on the inability of operations to
visit previously retired objects. In data structures that do not use multi-versioning, the deleting thread
simply retires a node after disconnecting it from the data structure. But in the MVCC setting,
disconnected nodes remain connected via the version list, foiling the correctness of VBR.

¹ VBR also necessitates type-preservation. However, this does not constitute a problem in our setting, as all allocated memory objects are of the same type. For more details, see Section 3.
In this paper we modify VBR to work correctly in the MVCC setting. It turns out that for the
specific case of old nodes in the version list, correctness can be obtained. VBR keeps a slow-ticking
epoch clock and it maintains a birth epoch field for each node. Interestingly, this birth epoch can
be used to tell whether a retired node in the version list has been re-allocated. As we show in this
paper, an invariant of nodes in the version list is that they have non-increasing birth-epoch numbers.
Moreover, if one of the nodes in the version list is re-allocated, then this node must foil the invariant.
Therefore, when a range query traverses a version list to locate the version to use for its traversal, it
can easily detect a node that has been re-allocated and whose data is irrelevant. When a range query
detects such re-allocation, it restarts. As shown in the evaluation, restarting happens infrequently with
VBR and the obtained performance is the best among existing schemes in the literature. Using the
modified VBR in the MVCC setting, a thread can delete a node and retire it after disconnecting it
from the data structure (and while it is still reachable via version list pointers).
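The following sketch illustrates our reading of the birth-epoch check described above; it is a hedged illustration, not the actual VBR or EEMARQ code, and the names (VNode, birthEpoch, findVersion, RestartQuery) are ours. A range query that traverses a version list restarts when it observes an increase in birth epochs, which signals a re-allocated node.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical node layout: each node carries a VBR-style birth epoch that is
// set when the node is (re)allocated, read from the slow-ticking epoch clock.
struct VNode {
    std::atomic<uint64_t> birthEpoch;
    uint64_t timestamp;               // commit timestamp of this version
    std::atomic<VNode*> olderVersion; // next (older) node in the version list
};

struct RestartQuery {};  // signals the enclosing range query to restart

// Invariant (as stated above): walking from newer to older versions, the birth
// epochs are non-increasing. A node that was reclaimed and re-allocated must
// violate this, so the traversal can detect it and restart the range query.
inline VNode* findVersion(VNode* head, uint64_t ts) {
    uint64_t prevBirth = head->birthEpoch.load();
    for (VNode* cur = head; cur != nullptr; cur = cur->olderVersion.load()) {
        uint64_t birth = cur->birthEpoch.load();
        if (birth > prevBirth)
            throw RestartQuery{};     // re-allocated node: its data is stale
        if (cur->timestamp <= ts)
            return cur;               // newest version visible at time ts
        prevBirth = birth;
    }
    return nullptr;                   // no version existed at time ts
}
```

A real implementation would typically signal the restart through a return value rather than an exception; the exception only keeps the sketch short.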
Recently, two novel papers [44, 65] presented efficient MVCC-based key-value stores. The main
new idea is to track and keep a version list of modified fields only and not of entire nodes. For many
data-structures, all fields are immutable except for one or two pointer fields. For such data-structures,
it is enough to keep a version list of pointers only. The first paper proposed a lock-free mechanism,
based on versioned CAS objects (vCAS) [65], and the second proposed a blocking mechanism,
based on Bundle objects (Bundles) [44]. While copying one field instead of the entire node reduces
the space overhead, the resulting indirection is harmful for performance: to dereference a pointer
during a traversal, one must first access the most recent node in the field's version list, and only then use
the pointer stored there to continue the traversal. As we show in Section 4, this indirection incurs high overheads in
read-intensive workloads. Bundles ameliorate this overhead by caching the most recent pointer in
the node to allow quick access for traversals that do not use older versions. However, range queries
still need to dereference twice as many references during a traversal. The vCAS approach presents
a more complicated optimization that completely eliminates indirection (which we further discuss
in Section 3). However, its applicability depends on assumptions on the original data-structure that
many data structures do not satisfy. Therefore, it is unclear how it can be integrated into general
state-of-the-art concurrent data-structures. In terms of memory reclamation, both schemes use the
EBR technique (that may block in theory, due to allocation heap exhaustion). This means that even
vCAS does not provide a lock-free range query mechanism. A subsequent paper on MVCC-specific
garbage collectors [8] provides a robust memory reclamation method for collecting the version
records, but it does not deal with the actual data-structure nodes.
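To make the indirection cost concrete, the sketch below illustrates the versioned-pointer idea in the spirit of vCAS [65]; the types and names are ours and do not reflect the actual vCAS interface. Dereferencing a node's mutable pointer field goes through the newest version object, i.e., two hops per link.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative types only; this is not the actual vCAS interface.
struct ListNode;

// One version of a mutable pointer field: the pointer value, the timestamp at
// which it was installed, and a link to the previous version.
struct VersionedPtr {
    ListNode* value;
    uint64_t timestamp;
    VersionedPtr* prev;
};

struct ListNode {
    uint64_t key;                     // immutable
    std::atomic<VersionedPtr*> next;  // the mutable field is versioned

    // Reading the current successor costs an extra hop through the newest
    // VersionedPtr object: the indirection discussed above.
    ListNode* currentNext() const { return next.load()->value; }

    // A snapshot read at timestamp ts walks back through older versions.
    ListNode* nextAt(uint64_t ts) const {
        for (VersionedPtr* v = next.load(); v != nullptr; v = v->prev)
            if (v->timestamp <= ts) return v->value;
        return nullptr;  // the link did not exist at time ts
    }
};
```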
In this paper we present EEMARQ (End-to-End lock-free MAp with Range Queries), a design
for a lock-free in-memory map with MVCC and a robust lock-free memory management scheme.
EEMARQ provides linearizable and high-performance inserts, deletes, searches, and range queries.
The design starts by applying vCAS to a linked-list. The linked-list is simple, and so it allows
applying vCAS easily. However, the linked-list does not satisfy the optimization assumptions required
by [65], and so a (non-trivial) extension is required to fit the linked-list. In the unoptimized variant,
there is a version node for each update (insert or delete), and this node is used to route scans
to the adequate timestamp. But this is costly, because traversals end up accessing twice the number of
nodes. A natural extension is to associate the list nodes themselves with the data that is originally
kept in the version nodes. We augment the linked-list construction to allow the optimization of [65]:
a mapping from the original version nodes to existing list nodes allows moving the data from version
nodes into list nodes, and then eliding the version nodes entirely. Traversals then need no extra memory
accesses to version nodes. This method is explained in Section 3.2.
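The following is a hedged illustration of the elided-version-node layout just described; the field names are ours and the concrete EEMARQ node format may differ. The timestamp and the link to the replaced successor live directly in the list node, so a scan at an older timestamp touches only list nodes.

```cpp
#include <atomic>
#include <cstdint>

// Hedged illustration of the elided-version-node layout: the data that the
// unoptimized variant keeps in a separate version node (a timestamp and a
// link to the previous version of the link) is stored in the list node
// itself, so traversals touch only list nodes.
struct Node {
    uint64_t key;                // immutable
    uint64_t ts;                 // timestamp at which this node became its
                                 // predecessor's successor (was linked in)
    std::atomic<Node*> next;     // current successor in the list
    std::atomic<Node*> replaced; // the node this one replaced as that
                                 // successor, i.e., the older version of
                                 // the incoming link
};

// A scan fixed at snapshot time snapTs follows 'next' pointers, but whenever
// the successor it reads was linked in after the snapshot was taken, it falls
// back to the replaced (older) successor until it reaches one visible at snapTs.
inline Node* successorAt(Node* n, uint64_t snapTs) {
    Node* succ = n->next.load();
    while (succ != nullptr && succ->ts > snapTs)
        succ = succ->replaced.load();
    return succ;
}
```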
Second, we extend VBR by adding support for reachable retired nodes on the version list. The
extended VBR allows keeping retired reachable version nodes in the data structure (which the original
VBR forbids) while maintaining high performance, lock-freedom, and robustness.
Finally, we deal with the inefficiency of the linked-list by adding fast indexing of the linked-list
nodes. A fast index can be obtained from a binary search tree or a skip list. The advantage we
get from separating the linked-list from the indexing mechanism is that we do not have to maintain
versions for the index (e.g., for the binary search tree or the skip list), but only for the underlying
linked-list. This separation between the design of the versioned linked-list and the non-versioned
index simplifies each of the sub-designs and also yields high performance, because operations on
the index do not need to maintain versions. Previous work separates the lower level of the skip list,
or the leaves of the tree, from the rest of the structure for various goals (e.g., [44, 69]).
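The sketch below illustrates this separation at a high level; Index and findStart are our hypothetical stand-ins for whichever unversioned index (skip list or tree) is used. The index only suggests a starting list node, and all version handling remains in the linked-list.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// 'Node' stands for the versioned linked-list node; only pointers to it are
// needed here, so a forward declaration suffices.
struct Node;

// Hedged illustration: the index maps keys to list nodes but keeps no version
// information. If an entry is stale (its node was already removed from the
// current list), the traversal simply starts a little earlier in the list.
struct Index {
    // Any unversioned map (lock-free skip list, tree, ...) would do here;
    // std::map keeps the sketch short but is of course not lock-free.
    std::map<uint64_t, Node*> entries;

    // Returns a list node whose key is the largest indexed key <= key,
    // i.e., a good starting point for the linked-list traversal.
    Node* findStart(uint64_t key) const {
        auto it = entries.upper_bound(key);
        if (it == entries.begin()) return nullptr;  // caller starts at the list head
        return std::prev(it)->second;
    }
};
```

Because the index stores no versions, index updates never interfere with ongoing range queries; at worst, a stale index entry lengthens the list walk.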
The combination of these three ideas, i.e., an optimized versioned linked-list, extended VBR, and
independent indexing, yields a highly performant data structure design with range queries. Evaluation
shows that EEMARQ outperforms both vCAS and Bundles (while providing full lock-freedom). A
proof of correctness is provided in the appendix.
2 Related Work
Existing work on linearizable range queries in the shared-memory setting includes many solutions
which are not based on MVCC techniques. Some data-structure interfaces originally include a tailor-
made range query operation. E.g., there are trees [12, 13, 22, 67], hash tries [55], queues [45, 46, 54],
skip lists [5] and graphs [31] with a built-in range query mechanism. Other and more general
solutions execute range queries by explicitly taking a snapshot of the whole data-structure, followed
by collecting the set of keys in the given range [1, 4]. The Snapcollector [51] forces the cooperation
of all executing threads while a certain thread is scanning the data-structure. Despite being lock-free
and general, the Snapcollector’s memory and performance overheads are high. The Snapcollector
was enhanced to support range queries that do not force a snapshot of the whole data-structure [16].
However, this solution still suffers from major time and space overheads.
Another way to implement range queries is to use transactional memory [23, 32, 49, 50]. Trans-
actions can either be implemented in software or in hardware, allowing range queries to take effect
atomically. Although transactions may seem like ideal candidates for long queries, software transactions
incur high performance overheads and hardware transactions frequently abort when accessing big
memory chunks. Read-log-update (RLU) [39] borrows software transaction techniques and extends
read-copy-update (RCU) [40] to support multiple updates. RLU yields relatively simple implementa-
tions, but its integration involves re-designing the entire data-structure. In addition, similarly to RCU,
it suffers from high overheads in write-extensive workloads.
Arbel-Raviv and Brown exploited the EBR manual reclamation to support range queries [3]. Their
technique uses EBR’s global epoch clock for associating data-structure items with their insertion and
deletion timestamps. EEMARQ uses a similar technique for associating data-structure modifications
with respective timestamps. They also took advantage of EBR’s retire-lists, in order to locate nodes
that were deleted during a range query scan. However, while their solution indeed avoids the extensive
helping of scans upon deleting an item from the data-structure (as imposed by the Snapcollector [51]),
it may still impose significant performance overheads (as shown in Section 4). EEMARQ minimizes
these overheads by keeping a reachable path to deleted nodes (for more details, see Section 3.2).
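As a rough illustration of the timestamping idea (associating each insertion and deletion with the value of a global epoch clock), consider the following sketch. The field and function names are hypothetical, and the snippet omits the synchronization details of the actual algorithms.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Global epoch clock; in [3] this role is played by EBR's epoch counter.
static std::atomic<uint64_t> epochClock{1};
constexpr uint64_t NOT_DELETED = UINT64_MAX;

// Hypothetical node: every key remembers when it was inserted and deleted.
struct TimedNode {
    uint64_t key;
    uint64_t insertTs;                            // epoch of the insertion
    std::atomic<uint64_t> deleteTs{NOT_DELETED};  // epoch of the deletion, if any
    std::atomic<TimedNode*> next{nullptr};
};

// A range query fixes a snapshot time and reports exactly the keys in [lo, hi]
// that were present at that time: inserted no later than snapTs and not yet
// deleted. This relies on deleted nodes remaining reachable during the scan.
std::vector<uint64_t> rangeQuery(TimedNode* head, uint64_t lo, uint64_t hi) {
    const uint64_t snapTs = epochClock.load();
    std::vector<uint64_t> result;
    for (TimedNode* n = head; n != nullptr; n = n->next.load()) {
        if (n->key < lo || n->key > hi) continue;
        if (n->insertTs <= snapTs && n->deleteTs.load() > snapTs)
            result.push_back(n->key);
    }
    return result;
}
```

Note that such a scan only works if deleted nodes remain reachable to it, which is the property EEMARQ maintains by keeping a reachable path to deleted nodes (Section 3.2).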
Multi-Version Concurrency Control. MVCC easily provides isolation between concurrent
reads and updates. I.e., range queries can work on a consistent view of the data, while not interfering
with update operations. This powerful technique is widely used in commercial systems [19, 21], as
well as in research-oriented DBMS [33,53], in-memory shared environments [44,65] and transactional
memory [23, 32, 36, 49, 50]. MVCC has been investigated in the DBMS setting for the last four
decades, both from a theoretical [9, 10, 35, 48, 68] and a practical [19, 33, 37, 42, 53] point of view.
A lot of effort has been put in designing MVCC implementations and addressing the unwanted
side-effects of long version lists. This issue is crucial both for accelerating version list scans and
for reducing the contention between updates and garbage collection. Accelerating DBMS version
list scans (independently of the constant need to prune them) has been investigated in [35]. Most
DBMS-related work that focuses on this issue, tries to minimize the problem by eagerly collecting
unnecessary versions [2, 11, 21, 34, 38, 42, 53].
Safe Memory Reclamation. Most existing in-memory environments that enable linearizable
range queries avoid managing their memory by relying on automatic (and blocking) garbage
collection of old versions [7, 22, 51, 67]. Solutions that do manually manage their allocated memory [3,
44, 65] use EBR for safe reclamation. In EBR, a shared epoch counter is incremented periodically,
and upon each operation invocation, the threads announce their observed epoch. The epoch clock
can be advanced when no executing thread has an announcement with a previous epoch. During
reclamation, only nodes that had been retired at least two epochs ago are reclaimed. EBR is safe for
MVCC because a running scan prevents the advance of the epoch clock, and also the reclamation of
any node in the data-structure that was not deleted before the scan.
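The following compact sketch captures the EBR rule just described; the names and the fixed thread bound are illustrative, and real EBR implementations differ in many details.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Illustrative EBR sketch; names and the fixed thread bound are ours.
constexpr int MAX_THREADS = 64;
constexpr uint64_t IDLE = UINT64_MAX;

struct Slot { std::atomic<uint64_t> epoch{IDLE}; };  // per-thread announcement
static Slot announced[MAX_THREADS];
static std::atomic<uint64_t> globalEpoch{2};

struct Retired { void* node; uint64_t retireEpoch; };
static thread_local std::vector<Retired> retireList;

// Upon each operation invocation the thread announces its observed epoch; on
// completion it announces IDLE so that it no longer pins the epoch clock.
void enterOperation(int tid) { announced[tid].epoch.store(globalEpoch.load()); }
void exitOperation(int tid)  { announced[tid].epoch.store(IDLE); }

void retireNode(void* node) { retireList.push_back({node, globalEpoch.load()}); }

// The epoch clock may advance only when no executing thread still announces an
// older epoch; a single stalled thread therefore blocks all further reclamation.
void tryAdvanceEpoch() {
    uint64_t e = globalEpoch.load();
    for (int i = 0; i < MAX_THREADS; i++) {
        uint64_t a = announced[i].epoch.load();
        if (a != IDLE && a < e) return;
    }
    globalEpoch.compare_exchange_strong(e, e + 1);
}

// Only nodes retired at least two epochs ago are reclaimed: no operation that
// could have observed them before they were unlinked can still be running.
// (Nodes are assumed to have been allocated with malloc.)
void reclaim() {
    uint64_t e = globalEpoch.load();
    for (auto it = retireList.begin(); it != retireList.end();) {
        if (it->retireEpoch + 2 <= e) { std::free(it->node); it = retireList.erase(it); }
        else ++it;
    }
}
```

The advance rule also makes EBR's lack of robustness visible: a thread that never announces IDLE pins the epoch and prevents any further reclamation.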
The MVCC-oriented garbage collector from [8] incorporates reference counting (RC) [17, 18, 28],
in order to maintain lock-freedom while safely reclaiming old versions. Any object can be immediately
reclaimed once its reference count reaches zero, without the need to invoke explicit retire() calls.
While RC simplifies reclamation, it incurs high performance overheads and does not guarantee a tight
bound on unreclaimed garbage (i.e., it is not robust).
Other reclamation schemes were also considered when designing EEMARQ. NBR [61] was one
of the strongest candidates, as it is fast and lock-free (under some hardware assumptions). However,
it is not clear whether NBR can be integrated into a skip list implementation (which serves as one of
our fast indexes). Pointer-based reclamation methods (e.g., Hazard Pointers [41]) allow threads to
protect specific objects (i.e., temporarily prevent their reclamation), by announcing their future access
to these objects, or publishing an announcement indicating the protection of a bigger set of objects
(e.g., Hazard Eras [56], Interval-Based Reclamation [66], Margin Pointers [62]). Although these
schemes are robust (as opposed to EBR), it is unclear how they can be used in MVCC environments.
They require that reaching a reclaimed node from a protected one would be impossible (even if the