Efficient Selection of Informative Alternative Relational Query Plans for Database Education Hu Wang Hui Li Sourav S. Bhowmick Zihao Ma Jiangtao Cui

2025-05-03 0 0 8.07MB 17 页 10玖币

侵权投诉

Efﬁcient Selection of Informative Alternative

Relational Query Plans for Database Education

Hu Wang#, Hui Li#, Sourav S. Bhowmick†, Zihao Ma#, Jiangtao Cui#

#School of Computer Science and Technology, Xidian University, China

huwang@stu.xidian.edu.cn,hli@xidian.edu.cn,zhma@stu.xidian.edu.cn

†School of Computer Science and Engineering, Nanyang Technological University, Singapore

assourav@ntu.edu.sg

Abstract—Given an SQL query, a relational query engine

selects a query execution plan (QEP) for processing it. Off-the-

shelf RDBMS typically only expose the selected QEP to users

without revealing information about representative alternative

query plans (AQP) considered during QEP selection in a user-

friendly manner. Easy access to representative AQPs is useful

in several applications such as database education where it

can facilitate learners to comprehend the QEP selection process

during query optimization. In this paper, we present a novel

end-to-end and generic framework called ARENA that facilitates

exploration of informative alternative query plans of a given

SQL query to aid the comprehension of QEP selection. Under

the hood, ARENA addresses a novel problem called informative

plan selection problem (TIPS) which aims to discover a set of k

alternative plans from the underlying plan space so that the plan

interestingness of the set is maximized. Speciﬁcally, we explore

two variants of the problem, namely batch TIPS and incremental

TIPS, to cater to diverse set of users. Due to the computational

hardness of the problem, we present a 2-approximation algorithm

to address it efﬁciently. Exhaustive experimental study demon-

strates the superiority of ARENA to several baseline approaches.

As a representative use case of ARENA, we demonstrate its

effectiveness in supplementing database education through a user

study and academic assessment involving real-world learners.

Index Terms—Query processing and optimization

I. INTRODUCTION

A relational query engine selects a query execution plan

(QEP), which represents an execution strategy of an SQL query,

from many alternative query plans (AQPs) by comparing

their estimated costs. Major off-the-shelf relational database

systems (RDBMS) typically only expose the QEP to an end user.

They do not reveal a representative set of AQPs considered

by the underlying query optimizer during the selection of a

QEP in a user-friendly manner. Although an RDBMS (e.g.,

PostgreSQL) may allow one to manually pose SQL queries

with various constraints on conﬁguration parameters to view

the corresponding QEPs containing speciﬁc physical operators

(e.g., SET enable hashjoin = true), such strategy demands not

only familiarity of the syntax and semantics of conﬁguration

parameters, but also a clear idea of the AQPs that one is

interested. This is often impractical to assume (e.g., in educa-

tion environment where many learners are taking the database

systems course for the ﬁrst time). Consider the following

motivating scenario.

Example 1.1: Lena is currently enrolled in an undergraduate

database systems course. She wishes to explore some of the

representative AQPs for the following query from the TPC-H

benchmark after viewing its QEP as depicted in Fig. 1(a).

SELECT s_acctbal, s_name, n_name, s_address,

s_phone

FROM supplier, nation, region

WHERE s_acctbal>=5000 and s_nationkey =

n_nationkey and n_regionkey = r_regionkey;

Speciﬁcally, Lena is curious to know whether there are AQPs

that have similar (resp. different) structure and physical oper-

ators but very different (resp. similar) estimated cost? If there

are, then how they look like? Examples of such AQPs are

shown in Fig. 1(b)-(d).

Clearly, a user-friendly tool that can facilitate retrieval and

exploration of “informative” AQPs associated with a given

query can greatly aid in answering Lena’s questions. Unfortu-

nately, very scant attention has been given to build such tools

by the academic and industrial communities. Recent efforts

for technological support to augment learning of relational

query processing primarily focused on the natural language

descriptions of QEPs [28; 43] and visualization of the plan

space [16]. In this paper, we present an end-to-end frame-

work called ARENA (AlteRnative quEry plaN ExplorAtion)

to address this challenge. Our approach not only has positive

feedback from learners (Section VIII-A) but, more importantly,

eventually improves academic outcomes (Section VIII-B).

Selecting a set of representative AQPs is a technically chal-

lenging problem. Firstly, the plan space can be prohibitively

large [6]. Importantly, not all AQPs potentially support under-

standing of the QEP selection process. Hence, it is ineffective

to expose all plans. Instead, it is paramount to select those that

are informative. However, how do we quantify informativeness

in order to select plans? For instance, in Fig. 1(b)-(d), which

plans should be revealed to Lena? Certainly, any informative-

ness measure needs to consider the plans (including the QEP)

that an individual has already viewed for her query so that

highly similar plans are not exposed to them. It should also

be cognizant of individuals’ interests in this context. Secondly,

it can be prohibitively expensive to scan all AQPs to select

informative ones. How can we design efﬁcient techniques to

select informative plans? Importantly, this cannot be integrated

arXiv:2210.13722v4 [cs.DB] 16 Nov 2023

HashJoin

Cost: 657

TableScan

Cost: 220

supplier

HashJoin

Cost: 433

TableScan

Cost: 216

nation region

TableScan

Cost: 216

QEP

HashJoin

Cost: 1426

HashJoin

Cost: 1200

TableScan

Cost: 216

region nation

TableScan

Cost: 216

AP1

HashJoin

Cost: 663

HashJoin

Cost: 433

TableScan

Cost: 216

nation region

TableScan

Cost: 216

AP2

HashJoin

Cost: 1600

HashJoin

Cost: 1200

TableScan

Cost: 216

region nation

TableScan

Cost: 216

AP3

(a) QEP (c) Alternative plan 2 (AP2)(b) Alternative plan 1 (AP1) (d) Alternative plan 3 (AP3)

Filter

Cost: 222

TableScan

Cost: 220

supplier

Filter

Cost: 222

TableScan

Cost: 220

supplier

Filter

Cost: 222

TableScan

Cost: 220

supplier

Filter

Cost: 222

Fig. 1: Example of QEP and alternative query plans (AQPs).

into the enumeration step of the query optimizer as informative

AQPs need to be selected based on the QEP an individual has

seen.

Thirdly, at ﬁrst glance, it may seem that we can select

k > 1alternative query plans where kis a value speciﬁed by

a user. Although this is a realistic assumption for many top-

kproblems, our engagement with users reveal that they may

not necessarily be conﬁdent to specify the value of kalways

(detailed in Section VIII-A). One may prefer to iteratively

view one plan-at-a-time and only cease exploration once they

are satisﬁed with the understanding of the alternative plan

choices made by the optimizer for a speciﬁc query. Hence,

kmay not only be unknown apriori but also the selection

of an AQP at each iteration to enhance their understanding of

different plan choices depends on the plans viewed by them

thus far. This demands for a ﬂexible solution framework that

can select alternative query plans in the absence or presence

of kvalue. Lastly, any proposed solution should be generic

so that it can be easily realized on top of majority of existing

off-the-shelf RDBMS.

In this paper, we formalize the aforementioned challenges

into a novel problem called informative plan selection (TIPS)

problem and present ARENA to address it. Speciﬁcally, given

an SQL query and k≥1(when kis unspeciﬁed it is set to

1), it retrieves kalternative query plans (AQPs) that maximize

plan informativeness. Such query plans are referred to as

informative query plans. To this end, we introduce two variants

of TIPS, namely batch TIPS (B-TIPS) and incremental TIPS (I-

TIPS), to cater for scenarios where kis speciﬁed or unspeciﬁed

by a user, respectively. We introduce a novel concept called

plan informativeness to capture the aforementioned challenges

associated with the selection of informative AQPs. Speciﬁcally,

it embodies the preferences of real-world users (based on their

feedback) by mapping them to distance (computed by ex-

ploiting differences between plans w.r.t. structure, content and

estimated cost) and relevance of alternative plans with respect

to the QEP or the set of alternative plans viewed by a user

thus far. Next, given the computational hardness of the TIPS

problem, we present a 2-approximation algorithm to address

both the variants of the problem efﬁciently. Speciﬁcally, it has

a linear time complexity w.r.t. the number of candidate AQPs.

Given that the number of such candidates can be very large

for some queries. we propose two pruning strategies to prune

them efﬁciently. We adopt ORCA [39] to retrieve the candidate

plan space for a given SQL query in ARENA as it enables us

to build a generic framework that can easily be realized on

majority of existing RDBMS.

Lastly, we demonstrate the usefulness of ARENA by de-

ploying it in the database systems course at Xidian University.

Speciﬁcally, we analyze the feedback from real-world learners

and their academic performance w.r.t. the understanding of

the QEP selection process. Our analysis reveals the efﬁciency

and effectiveness of our solutions to help learners understand

alternative query plans in the context of QEP selection.

In summary, this paper makes the following key contribu-

tions.

•We adopt a learner-centered approach to characterize the

notion of informative query plans for database education

by engaging real-world learners (Section II), and present

the notion of plan informativeness (Section IV) to quan-

titatively model the informativeness of AQPs.

•We present a novel informative plan selection (TIPS)

problem to obtain a collection of alternative plans for

a given SQL query by maximizing plan informativeness

(Section III).

•We present approximate algorithms with quality guaran-

tee that exploit plan informativeness and two pruning

strategies to address the TIPS problem efﬁciently (Sec-

tion VI). Our algorithms underpin the proposed end-to-

end ARENA framework to enable users to retrieve and ex-

plore informative plans for their SQL queries (Section V).

•We undertake an exhaustive performance study to demon-

strate the superiority of ARENA to representative baselines

as well as justify the design choices (Section VII).

•We conduct a user study and academic assessment in-

volving real-world learners taking the database systems

course to demonstrate the usefulness and effectiveness

of ARENA for pedagogical support to supplement their

understanding of the QEP selection process (Section VIII).

II. INFORMATIVE ALTERNATIVE PLANS

A key question that any alternative query plan (alternative

plan for brevity) selection technique needs to address is as

TABLE I: The 8 alternative plan categories.

Plan category Structure Content Cost

APIsmall small small

APII small small large

APII I small large small

APIV large small small

APVsmall large large

APV I large small large

APV II large large small

APV II I large large large

40%

20%

16%

13%

11% G1

other

(a) Informative plans

45%

23%

20%

87B1

irrelevant

other

(b) Uninformative plans

Fig. 2: Survey results.

follows: What types of alternative plans are “interesting” or

“informative”? Reconsider Example 1.1. Fig. 1(b)-(d) depict

three alternative plans for the query where the physical oper-

ator/join order differences are highlighted with red rectangles

and signiﬁcant cost differences are shown using orange nodes.

Speciﬁcally, all three AQPs have different join orders w.r.t. the

QEP. However, AP1 has very similar structure as the QEP but

signiﬁcantly higher estimated cost. AP2, on the other hand,

has a very similar estimated cost as the QEP.AP3 consumes

similar high cost as AP1. Suppose k= 2. Then, among {AP1,

AP2},{AP1,AP3}, and {AP2,AP3}, which one is the best

choice? Intuitively, it is reasonable to choose the set that is

most “informative” to a learner.

For example, by viewing AP1 or AP3, Lena learned that the

HASH JOIN operator is sensitive to the order of the joined

tables. This reinforces the knowledge she has acquired from

her classroom lectures. On the other hand, AP2 reveals that

changing the join order may not always lead to signiﬁcantly

different estimated costs. Lena was not aware of this and

therefore found it informative. Hence, {AP1,AP2}or {AP2,

AP3}are potentially the most informative set for Lena.

Feedback from volunteers. The above example illustrates that

the content, structure, and cost of a query plan w.r.t. the QEP

and already-viewed plans play pivotal roles in determining

informativeness of AQPs. To further validate this, we survey 68

unpaid volunteers who have taken a database systems course

recently or currently taking it in an academic institution.

Speciﬁcally, they are either current undergraduate students in a

computer science degree program or junior database engineers

who have recently joined industry. Note that we focus on

this group of volunteers as a key application of ARENA is

in database education.

We choose AQPs of two SQL queries on the IMDb dataset

for the volunteers to provide feedback. These queries contain

3-4 joins and involve different table scan methods. Note that

randomly selecting a large number of alternative plans for

feedback is ineffective as they not only may insufﬁciently

cover diverse types of plans w.r.t. the QEPs of the queries

but also our engagement with the volunteers reveal that it

may deter them to give feedback as they have to browse a

large number of plans. Hence, for each query, we display the

QEP and 8 alternative plans, which are heuristically selected

by evaluating their differences w.r.t. the QEP based on three

dimensions, i.e., structure, content, and estimated cost (Ta-

ble I). For instance, we may choose an alternative plan with

small difference in structure or content from the QEP but large

differences in the other two dimensions. Observe that it covers

majority of the AQPs of a given query. Note that the heuristic

model may occasionally select some similar alternative plans.

We deliberately expose these plans to the volunteers to garner

feedback on similar alternative plans.

We ask the volunteers to choose the most and least infor-

mative plans given a QEP and provide justiﬁcations for their

choices. The aforementioned plan types were not revealed to

the volunteers. We received 132 feedbacks from the volunteers.

We map the feedbacks on informative and uninformative

plans to the plan types in Table I and count the number of

occurrences. Fig. 2(a) reports the feedback on plans considered

as informative. Speciﬁcally, G1represents APII ,G2and G3

represent plans with lower cost differences. Particularly, G2

are plans with large structure and content differences (APV II )

whereas G3are those that have small structural or content

differences (APIII ,APIV ). G4represents APV I and APV III .

There are other feedbacks that are too sparse to be grouped

as representative of the volunteers.

Fig. 2(b) reports the feedback on uninformative plans se-

lected by the volunteers. B1represents plans that are very

similar to the QEP (API) and hence voted as uninformative

as they do not convey any useful information. B2represents

APV I and APV III .B3are plans that are very similar to

other alternative plans viewed by the volunteers. In addition,

there are eight other feedbacks that are orthogonal to our

problem (i.e., feedback on the visual interface design) and

hence categorized as “irrelevant”.

Summary. Our engagement with volunteers reveal that major-

ity ﬁnd alternative plans of types APII −APIV and APV II

informative. However, API,APV,APV I , and APV III are

not. Observe that the number of feedbacks for the latter two

that deem them uninformative is signiﬁcantly more than those

that consider them informative (G4vs B2). Hence, we consider

them uninformative. Volunteers also ﬁnd similar alternative

plans uninformative. In the sequel, we shall use these insights

to guide the design of plan informativeness and pruning

strategies.

III. INFORMATIVE PLAN SELECTION (TIPS) PROBLEM

In this section, we formally deﬁne the TIPS problem. We

begin with a brief background on relational query plans.

A. Relational Query Plans

Given an arbitrary SQL query, an RDBMS generates a query

execution plan (QEP) to execute it. A QEP consists of a

collection of physical operators organized in form of a tree,

namely the physical operator tree (operator tree for brevity).

Fig. 1(a) depicts an example QEP. Each physical operator,

e.g., SEQUENTIAL SCAN,INDEX SCAN, takes as input one

or more data streams and produces an output one. A QEP

explicitly describes how the underlying RDBMS shall execute

the given query. Notably, given a SQL, there are many different

query plans, other than the QEP, for executing it. For instance,

there are several physical operators in a RDBMS that implement

a join, e.g., HASH JOIN,SORT MERGE,NESTED LOOP.

We refer to each of these different plans (other than the QEP)

as an alternative query plan (AQP). Fig. 1(b)-(d) depict three

examples of AQP. It is well-known that the plan space is non-

polynomial [6].

B. Problem Deﬁnition

The informative plan selection (TIPS) problem aims to

automatically select a small number of alternative plans that

maximize plan informativeness. These plans are referred to

as informative alternative plans (informative plans/ AQPsfor

brevity). The TIPS problem has two ﬂavors:

•Batch TIPS. A user may wish to view top-kinformative

plans beside the QEP;

•Incremental TIPS. A user may iteratively view an infor-

mative plan besides what have been already shown to

him/her;

Formally, we deﬁne these two variants of TIPS as follows.

Deﬁnition 3.1: Given a QEP π∗and a budget k, the batched

informative plan selection (B-TIPS) problem aims to ﬁnd top-

kalternative plans ΠOP T such that the plan informativeness

of the plans in ΠOP T ∪ {π∗}is maximized, that is

ΠOP T = argmax

U(Π∪{π∗})s.t. |Π|=kand Π⊆Π∗\{π∗}

where Π∗denotes the space of all possible plans and U(·)

denotes the plan informativeness of a set of query plans.

Deﬁnition 3.2: Given a query plan set Πa user has seen so

far, the iterative informative plan selection (I-TIPS) problem

aims to select an alternative plan πOP T such that the plan

informativeness of the plans in Π∪ {πOP T }is maximized,

that is

πOP T = argmax

U(Π ∪ {π})subject to π∈Π∗and π /∈Π

In the next section, we shall elaborate on the notion of plan

informativeness U(·)to capture informative plans.

Example 3.3: Reconsider Example 1.1. Suppose Lena has

viewed the QEP in Fig. 1(a). She may now wish to view

another two alternative plans (i.e., k= 2) that are informative.

The B-TIPS problem is designed to address it by selecting

these alternative plans from the entire plan space so that the

plan informativeness U(·)of the QEP and the selected plans

is maximized. On the other hand, suppose another learner is

unsure about how many alternative plans should be viewed

or is dissatisiﬁed with the result returned by B-TIPS. In this

Fig. 3: Plan Tree examples.

case, the I-TIPS problem enables him to iteratively select

“AQP-at-a-time” and provide a rating w.r.t. its informativeness

until satisﬁed. The plan informativeness is maximized by

considering the rating from the learner.

Given a plan informativeness measure U(·)and the plan

space Π∗, we can ﬁnd an alternative plan for the I-TIPS

problem in O(|Π∗|)time. However, plan space increases

exponentially with the increase in the number of joins and it

is NP-hard to ﬁnd the best join order [20]. Therefore, we need

efﬁcient solutions especially when there is a large number of

joined tables.

THEOREM 3.4: Given a plan informativeness measure U(·),

B-TIPS problem is NP-hard.

Proof: (Sketch). Let us consider the facility dispersion

problem which is NP-hard [35]. In that problem, one is given

a facility set S, a function fof distance and an integer k(k <

|F|). The objective is to ﬁnd a subset S′(|S′|=k)of S

so that the given function on fis maximum. Because facility

dispersion problem is a special case of B-TIPS where {π∗}=∅

and U(·) = f,B-TIPS is also NP-hard.

IV. QUANTIFYING PLAN INTERESTINGNESS

In this section, we describe the quantiﬁcation of plan

informativeness U(·)introduced in the preceding section by

exploiting the insights from learner-centered characterization

of AQPs in Section II.

A. Lessons from Characterization of AQPs

In Section II, feedbacks from volunteers reveal that the

selection of informative plans is inﬂuenced by the followings:

(a) The structure, content, and cost differences of an alternative

plan w.r.t. the QEP (and other alterative plans) can be exploited

to differentiate between informative and uninformative alter-

native plans. Hence, we need to quantify these differences

for informative plan selection. (b) Volunteers typically do not

prefer to view alternative plans that are very similar to the

QEP or other alternative plans already viewed by them. We

now elaborate on the quantiﬁcation approach to capture two

factors and then utilizing them to quantify plan interestingness.

B. Plan Differences

Broadly, U(·)should facilitate identifying speciﬁc alter-

native plans other than the QEP such that by putting them

together a user can acquire a comprehensive understanding of

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

EfficientSelectionofInformativeAlternativeRelationalQueryPlansforDatabaseEducationHuWang#,HuiLi#,SouravS.Bhowmick†,ZihaoMa#,JiangtaoCui##SchoolofComputerScienceandTechnology,XidianUniversity,Chinahuwang@stu.xidian.edu.cn,hli@xidian.edu.cn,zhma@stu.xidian.edu.cn†SchoolofComputerScienceandEngineering,...

展开>> 收起<<

Efficient Selection of Informative Alternative Relational Query Plans for Database Education Hu Wang Hui Li Sourav S. Bhowmick Zihao Ma Jiangtao Cui.pdf

共17页,预览4页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Efficient Selection of Informative Alternative Relational Query Plans for Database Education Hu Wang Hui Li Sourav S. Bhowmick Zihao Ma Jiangtao Cui

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: