Offset-value coding in database query processing

2025-05-06 0 0 705.15KB 9 页 10玖币
侵权投诉
Oset-value coding in database query processing
Goetz Graefe
Google Inc
Madison, Wisconsin, USA
GoetzG@Google.com
Thanh Do
Celonis Inc
Madison, Wisconsin, USA
T.Do@Celonis.com
ABSTRACT
Recent work [
9
] shows how oset-value coding speeds up data-
base query execution, not only sorting but also duplicate removal
and grouping (aggregation) in sorted streams, order-preserving ex-
change (shue), merge join, and more. It already saves thousands
of CPUs in Google’s Napa and F1 Query systems, e.g., in grouping
algorithms and in log-structured merge-forests.
In order to realize the full benet of interesting orderings, however,
query execution algorithms must not only consume and exploit
oset-value codes but also produce oset-value codes for the next
operator in the pipeline. Our research has sought ways to pro-
duce oset-value codes without comparing successive output rows
one-by-one, column-by-column. This short paper introduces a new
theorem and, based on its proof and a simple corollary, describes in
detail how order-preserving algorithms (from lter to merge join
and even shue) can compute oset-value codes for their outputs.
These computations are surprisingly simple and very ecient.
This paper is the extended version of an EDBT 2023 publication [
21
].
1 INTRODUCTION
Tree-of-losers priority queues [
13
,
25
] and oset-value coding [
7
,
23
] enable the most ecient sort algorithmsfor large records. When
sorting a database table with
𝑁
rows, the provable lower bound
for row comparisons is
𝑙𝑜𝑔2(𝑁
!
) 𝑁𝑙𝑜𝑔2(𝑁/𝑒)
with
𝑒
2
.
718.
External merge sort using tree-of-losers priority queues for both
run generation and merging exceeds this lower bound by only 1-2%.
Oset-value coding truncates shared row prexes and turns prex
sizes into order-preserving surrogate keys. These surrogate keys
limit the count of column value comparisons for
𝑁
rows with
𝐾
key columns to
𝑁×𝐾
[
9
]. Importantly, there is no
𝑙𝑜𝑔(𝑁)
factor.
Oset-value codes alone decide the remainder of the
𝑙𝑜𝑔2(𝑁
!
)
row
comparisons.
Recent work shows how oset-value coding complements “inter-
esting orderings” [
32
] to speed up database query processing, not
only sorting but also duplicate removal and grouping (aggregation)
in sorted streams, order-preserving exchange (shue), merge join,
and more [
9
,
10
]. For the full benet of interesting orderings, all
execution algorithms relying on sort order must also exploit oset-
value coding in their comparisons of rows and columns. Moreover,
order-preserving query execution algorithms must not only con-
sume but also produce oset-value codes, to be consumed and
exploited by the next operator in the pipeline.
Nevertheless, computing oset-value codes in order-preserving
query execution algorithms has not received any attention, and the
Work done at Google Inc.
only method known to-date – comparing an operator’s output row-
by-row, column-by-column – is so expensive that it renders produc-
ing oset-value across database operators much less attractive or
even worthless. In contrast, new, simple techniques eciently com-
pute oset-value codes for order-preserving algorithms in database
query execution. The new techniques do not require comparisons
row-by-row, column-by-column; instead, oset-value codes in the
output depend only on oset-value codes in the inputs. There is
no need for additional column value comparisons beyond those
required in the operation itself, e.g., column value comparisons in
the merge logic of merge join. The new techniques are implemented
in Google’s Napa [1] and F1 Query [31, 33] systems.
The rest of the paper is structured as follows. We summarize
related prior work in Section 2 and explain oset-value coding
and tree-of-losers priority queues in Section 3. In Section 4, we
rst establish a solid foundation for ecient computation of oset-
value codes by introducing a new theorem, its proof, and a pertinent
corollary. Based on this theory, we then present how oset-value
codes can be eciently derived for lter, project, segmented sorting,
duplicate removal, grouping, merge join (inner, outer, and semi
joins), nested-loops join, order-preserving exchange, and more. We
present performance results in Section 6 and nally conclude the
paper in Section 7.
2 RELATED PRIOR WORK
The context of our work are Google’s Napa [
1
] and F1 Query [
31
,
33
]
systems. Napa is a data warehouse that maintains thousands of
materialized views in log-structured merge-forests [
29
]. F1 Query
is a federated query processing platform that executes SQL queries
over tables in various Google storage systems such as Spanner [
8
],
BigTable [
6
], and Napa. Both Napa and F1 Query employ sort order
for ecient data access and data manipulation.
Pioneering work on sorting established the benets of tree-of-
losers priority queues and of oset-value coding, both for run gen-
eration and external merge steps [
7
,
13
,
23
,
25
]. Pioneering work
in the database eld established the value of sort-based query exe-
cution, notably merge join but also duplicate removal and group-
ing [
4
,
11
,
32
]. Surveys on database query evaluation [
15
,
17
] cover
sorting and oset-value coding but fail to recognize oset-value
coding as a signicant opportunity throughout sort-based query
execution. Recent work [
9
,
10
] lls that gap but fails to oer e-
cient derivation of oset-value codes for output rows. The present
short paper lls this remaining gap.
3 BACKGROUND: OFFSET-VALUE CODING
AND TREE-OF-LOSERS PRIORITY QUEUES
A tree-of-losers priority queue [
13
,
25
], also known as a tournament
tree, embeds a balanced binary tree in an array, with the tree’s unary
arXiv:2210.00034v3 [cs.DB] 17 Feb 2023
Goetz Graefe and Thanh Do
Table 1: Oset-value codes in a sorted le or stream.
Rows and their Descending OVC Ascending OVC
column values oset 𝑑𝑜𝑚𝑎𝑖𝑛 𝑣𝑎𝑙𝑢𝑒 OVC arity oset 𝑣𝑎𝑙𝑢𝑒 OVC
57 3 9 0 95 95 4 5 405
5 7 3 12 3 88 388 1 12 112
584 6 1 92 192 3 8 308
592 7 1 91 191 3 9 309
5 9 2 7 4 - 400 0 - 0
5 9 34 2 97 297 2 3 203
5 9 3 73 93 393 1 7 107
Figure 1: Strings in a tree-of-losers priority queue.
root in array slot 0. It is ecient due to leaf-to-root passes with one
comparison per tree level; root-to-leaf passes with two comparisons
per tree level are not required. As in a sports tournament where
each round of competition eliminates one contestant, the principal
rules are that (i) two candidate key values compete at each binary
node in the tree and (ii) after a comparison of two candidates, the
loser remains in the node and the winner becomes a candidate in
the next tree level. Thus, in a priority queue with
𝑁
entries, a new
overall winner reaches the root after 𝑙𝑜𝑔2(𝑁)comparisons.
When merging runs, a xed pair of runs competes at each leaf
node. Run generation merges “sorted” runs of a single row each.
Run generation by replacement selection can try to extract longer
sorted runs from the unsorted input: one additional comparison
per input row doubles the expected run size, halves the run count,
and saves one comparison per row during merging.
Figure 1, adapted from [
25
], shows a tree-of-losers priority queue
merging
𝐹=
12 sorted runs. The left half with runs 0-7 is cropped
from Figure 1. The dashed boxes represent the current candidates
from runs to be merged; the solid boxes are tree nodes in a tree-
of-losers priority queue. The example node in the top-right corner
explains the numbers in each node.
The overall smallest key value, “061”, is in the tree’s root node.
Interestingly, key value “154” is above key value “087”. After “154”
emerged as the winner from the left subtree and “061” from the right
subtree, “154” was the loser in the “nal match” of this “tournament”
and was left behind. The runner-up of the right subtree, “087”, had
to remain within that subtree.
The next step in the merge logic moves key value “061” from the
tree root to the merge output. The following step gets a new key
value from input 9, the origin of key value “061”. This successor key
starts a new leaf-to-root pass at the the same leaf. This leaf-to-root
pass bubbles the next lowest key value to the root in
𝑙𝑜𝑔2(𝐹)
steps:
If that next key value is “092”, it wins over “503” but loses to “087”
and is left behind; then “087” wins over “154” and reaches the root.
With only leaf-to-root passes, run generation and merging with
tree-of-losers priority queues guarantee near-optimal comparison
counts. With excellent expected and worst-case run-time complex-
ity yet very limited implementation complexity, some hardware
supports tree-of-losers priority queues, e.g., the UPT “update tree”
instruction of IBM’s 370- and z-series mainframes [23, 30].
Oset-value coding [
7
] captures one row’s key value relative to
another key that is earlier in the sort sequence. Oset-value codes
are set after comparisons. A loser’s new oset is the position where
the keys rst dier, e.g., a column index; the value is the loser’s
data value at that oset. Equivalently, the oset is the size of the
shared prex. For example, a duplicate key shares the entire key
and hence has an oset equal to the key size.
Table 1 shows the derivation of descending and ascending oset-
value codes in a stream of rows in ascending order on all columns.
Each key is encoded relative to its predecessor. With four sort
columns, the arity of the sort key is 4; the domain of each column is
1. . . 99. Descending oset-value codes take the actual oset but the
negative of the column value. Ascending oset-value codes take
the negative oset but the actual column value. Table 1 ignores that
small key domains permit encoding multiple key columns together.
IBM’s CFC “compare and form codeword” instruction supports
oset-value coding for descending normalized keys using blocks of
bytes as values and counts of blocks as osets [23, 30].
If two rows and their key values
𝐴
and
𝐵
are encoded relative
to the same key
𝐶
, and if the osets of
𝐴
and
𝐵
dier, then the one
with the higher oset is earlier in the sort sequence. Otherwise,
if the two data values at that oset dier, then these data values
decide the comparison. Otherwise, additional data values in
𝐴
and
𝐵
must be compared. With osets and values combined as shown
in Table 1, often a single integer comparison can sort two keys
𝐴
and 𝐵encoded relative to the same key 𝐶.
If
𝐴
sorts lower (earlier) than
𝐵
, then
𝐵
is the loser in this com-
parison and can be encoded relative to the winner
𝐴
. If oset-value
codes decided the comparison, the oset-value code of
𝐵
relative
to
𝐶
is also the oset-value code of
𝐵
relative to
𝐴
. Otherwise, the
oset for
𝐵
increases by the count of column comparisons required
to determine winner and loser, and the value is taken at that new
oset. Similar rules apply if 𝐴is the loser in the comparison.
摘要:

Offset-valuecodingindatabasequeryprocessingGoetzGraefeGoogleIncMadison,Wisconsin,USAGoetzG@Google.comThanhDo∗CelonisIncMadison,Wisconsin,USAT.Do@Celonis.comABSTRACTRecentwork[9]showshowoffset-valuecodingspeedsupdata-basequeryexecution,notonlysortingbutalsoduplicateremovalandgrouping(aggregation)inso...

展开>> 收起<<
Offset-value coding in database query processing.pdf

共9页,预览2页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:9 页 大小:705.15KB 格式:PDF 时间:2025-05-06

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 9
客服
关注