Offset-value coding in database query processing

2025-05-06 0 0 705.15KB 9 页 10玖币

侵权投诉

Oset-value coding in database query processing

Goetz Graefe

Google Inc

Madison, Wisconsin, USA

GoetzG@Google.com

Thanh Do∗

Celonis Inc

Madison, Wisconsin, USA

T.Do@Celonis.com

ABSTRACT

Recent work [

] shows how oset-value coding speeds up data-

base query execution, not only sorting but also duplicate removal

and grouping (aggregation) in sorted streams, order-preserving ex-

change (shue), merge join, and more. It already saves thousands

of CPUs in Google’s Napa and F1 Query systems, e.g., in grouping

algorithms and in log-structured merge-forests.

In order to realize the full benet of interesting orderings, however,

query execution algorithms must not only consume and exploit

oset-value codes but also produce oset-value codes for the next

operator in the pipeline. Our research has sought ways to pro-

duce oset-value codes without comparing successive output rows

one-by-one, column-by-column. This short paper introduces a new

theorem and, based on its proof and a simple corollary, describes in

detail how order-preserving algorithms (from lter to merge join

and even shue) can compute oset-value codes for their outputs.

These computations are surprisingly simple and very ecient.

This paper is the extended version of an EDBT 2023 publication [

1 INTRODUCTION

Tree-of-losers priority queues [

] and oset-value coding [

] enable the most ecient sort algorithmsfor large records. When

sorting a database table with

𝑁

rows, the provable lower bound

for row comparisons is

𝑙𝑜𝑔2(𝑁

) ≈ 𝑁𝑙𝑜𝑔2(𝑁/𝑒)

with

𝑒≈

718.

External merge sort using tree-of-losers priority queues for both

run generation and merging exceeds this lower bound by only 1-2%.

Oset-value coding truncates shared row prexes and turns prex

sizes into order-preserving surrogate keys. These surrogate keys

limit the count of column value comparisons for

𝑁

rows with

𝐾

key columns to

𝑁×𝐾

[

]. Importantly, there is no

𝑙𝑜𝑔(𝑁)

factor.

Oset-value codes alone decide the remainder of the

𝑙𝑜𝑔2(𝑁

)

row

comparisons.

Recent work shows how oset-value coding complements “inter-

esting orderings” [

] to speed up database query processing, not

only sorting but also duplicate removal and grouping (aggregation)

in sorted streams, order-preserving exchange (shue), merge join,

and more [

]. For the full benet of interesting orderings, all

execution algorithms relying on sort order must also exploit oset-

value coding in their comparisons of rows and columns. Moreover,

order-preserving query execution algorithms must not only con-

sume but also produce oset-value codes, to be consumed and

exploited by the next operator in the pipeline.

Nevertheless, computing oset-value codes in order-preserving

query execution algorithms has not received any attention, and the

∗Work done at Google Inc.

only method known to-date – comparing an operator’s output row-

by-row, column-by-column – is so expensive that it renders produc-

ing oset-value across database operators much less attractive or

even worthless. In contrast, new, simple techniques eciently com-

pute oset-value codes for order-preserving algorithms in database

query execution. The new techniques do not require comparisons

row-by-row, column-by-column; instead, oset-value codes in the

output depend only on oset-value codes in the inputs. There is

no need for additional column value comparisons beyond those

required in the operation itself, e.g., column value comparisons in

the merge logic of merge join. The new techniques are implemented

in Google’s Napa [1] and F1 Query [31, 33] systems.

The rest of the paper is structured as follows. We summarize

related prior work in Section 2 and explain oset-value coding

and tree-of-losers priority queues in Section 3. In Section 4, we

rst establish a solid foundation for ecient computation of oset-

value codes by introducing a new theorem, its proof, and a pertinent

corollary. Based on this theory, we then present how oset-value

codes can be eciently derived for lter, project, segmented sorting,

duplicate removal, grouping, merge join (inner, outer, and semi

joins), nested-loops join, order-preserving exchange, and more. We

present performance results in Section 6 and nally conclude the

paper in Section 7.

2 RELATED PRIOR WORK

The context of our work are Google’s Napa [

] and F1 Query [

]

systems. Napa is a data warehouse that maintains thousands of

materialized views in log-structured merge-forests [

]. F1 Query

is a federated query processing platform that executes SQL queries

over tables in various Google storage systems such as Spanner [

BigTable [

], and Napa. Both Napa and F1 Query employ sort order

for ecient data access and data manipulation.

Pioneering work on sorting established the benets of tree-of-

losers priority queues and of oset-value coding, both for run gen-

eration and external merge steps [

]. Pioneering work

in the database eld established the value of sort-based query exe-

cution, notably merge join but also duplicate removal and group-

ing [

]. Surveys on database query evaluation [

] cover

sorting and oset-value coding but fail to recognize oset-value

coding as a signicant opportunity throughout sort-based query

execution. Recent work [

] lls that gap but fails to oer e-

cient derivation of oset-value codes for output rows. The present

short paper lls this remaining gap.

3 BACKGROUND: OFFSET-VALUE CODING

AND TREE-OF-LOSERS PRIORITY QUEUES

A tree-of-losers priority queue [

], also known as a tournament

tree, embeds a balanced binary tree in an array, with the tree’s unary

arXiv:2210.00034v3 [cs.DB] 17 Feb 2023

Goetz Graefe and Thanh Do

Table 1: Oset-value codes in a sorted le or stream.

Rows and their Descending OVC Ascending OVC

column values oset 𝑑𝑜𝑚𝑎𝑖𝑛 −𝑣𝑎𝑙𝑢𝑒 OVC arity −oset 𝑣𝑎𝑙𝑢𝑒 OVC

57 3 9 0 95 95 4 5 405

5 7 3 12 3 88 388 1 12 112

584 6 1 92 192 3 8 308

592 7 1 91 191 3 9 309

5 9 2 7 4 - 400 0 - 0

5 9 34 2 97 297 2 3 203

5 9 3 73 93 393 1 7 107

Figure 1: Strings in a tree-of-losers priority queue.

root in array slot 0. It is ecient due to leaf-to-root passes with one

comparison per tree level; root-to-leaf passes with two comparisons

per tree level are not required. As in a sports tournament where

each round of competition eliminates one contestant, the principal

rules are that (i) two candidate key values compete at each binary

node in the tree and (ii) after a comparison of two candidates, the

loser remains in the node and the winner becomes a candidate in

the next tree level. Thus, in a priority queue with

𝑁

entries, a new

overall winner reaches the root after 𝑙𝑜𝑔2(𝑁)comparisons.

When merging runs, a xed pair of runs competes at each leaf

node. Run generation merges “sorted” runs of a single row each.

Run generation by replacement selection can try to extract longer

sorted runs from the unsorted input: one additional comparison

per input row doubles the expected run size, halves the run count,

and saves one comparison per row during merging.

Figure 1, adapted from [

], shows a tree-of-losers priority queue

merging

𝐹=

12 sorted runs. The left half with runs 0-7 is cropped

from Figure 1. The dashed boxes represent the current candidates

from runs to be merged; the solid boxes are tree nodes in a tree-

of-losers priority queue. The example node in the top-right corner

explains the numbers in each node.

The overall smallest key value, “061”, is in the tree’s root node.

Interestingly, key value “154” is above key value “087”. After “154”

emerged as the winner from the left subtree and “061” from the right

subtree, “154” was the loser in the “nal match” of this “tournament”

and was left behind. The runner-up of the right subtree, “087”, had

to remain within that subtree.

The next step in the merge logic moves key value “061” from the

tree root to the merge output. The following step gets a new key

value from input 9, the origin of key value “061”. This successor key

starts a new leaf-to-root pass at the the same leaf. This leaf-to-root

pass bubbles the next lowest key value to the root in

𝑙𝑜𝑔2(𝐹)

steps:

If that next key value is “092”, it wins over “503” but loses to “087”

and is left behind; then “087” wins over “154” and reaches the root.

With only leaf-to-root passes, run generation and merging with

tree-of-losers priority queues guarantee near-optimal comparison

counts. With excellent expected and worst-case run-time complex-

ity yet very limited implementation complexity, some hardware

supports tree-of-losers priority queues, e.g., the UPT “update tree”

instruction of IBM’s 370- and z-series mainframes [23, 30].

Oset-value coding [

] captures one row’s key value relative to

another key that is earlier in the sort sequence. Oset-value codes

are set after comparisons. A loser’s new oset is the position where

the keys rst dier, e.g., a column index; the value is the loser’s

data value at that oset. Equivalently, the oset is the size of the

shared prex. For example, a duplicate key shares the entire key

and hence has an oset equal to the key size.

Table 1 shows the derivation of descending and ascending oset-

value codes in a stream of rows in ascending order on all columns.

Each key is encoded relative to its predecessor. With four sort

columns, the arity of the sort key is 4; the domain of each column is

1. . . 99. Descending oset-value codes take the actual oset but the

negative of the column value. Ascending oset-value codes take

the negative oset but the actual column value. Table 1 ignores that

small key domains permit encoding multiple key columns together.

IBM’s CFC “compare and form codeword” instruction supports

oset-value coding for descending normalized keys using blocks of

bytes as values and counts of blocks as osets [23, 30].

If two rows and their key values

𝐴

and

𝐵

are encoded relative

to the same key

𝐶

, and if the osets of

𝐴

and

𝐵

dier, then the one

with the higher oset is earlier in the sort sequence. Otherwise,

if the two data values at that oset dier, then these data values

decide the comparison. Otherwise, additional data values in

𝐴

and

𝐵

must be compared. With osets and values combined as shown

in Table 1, often a single integer comparison can sort two keys

𝐴

and 𝐵encoded relative to the same key 𝐶.

𝐴

sorts lower (earlier) than

𝐵

, then

𝐵

is the loser in this com-

parison and can be encoded relative to the winner

𝐴

. If oset-value

codes decided the comparison, the oset-value code of

𝐵

relative

𝐶

is also the oset-value code of

𝐵

relative to

𝐴

. Otherwise, the

oset for

𝐵

increases by the count of column comparisons required

to determine winner and loser, and the value is taken at that new

oset. Similar rules apply if 𝐴is the loser in the comparison.

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

Offset-valuecodingindatabasequeryprocessingGoetzGraefeGoogleIncMadison,Wisconsin,USAGoetzG@Google.comThanhDo∗CelonisIncMadison,Wisconsin,USAT.Do@Celonis.comABSTRACTRecentwork[9]showshowoffset-valuecodingspeedsupdata-basequeryexecution,notonlysortingbutalsoduplicateremovalandgrouping(aggregation)inso...

展开>> 收起<<

Offset-value coding in database query processing.pdf

共9页,预览2页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Offset-value coding in database query processing

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: