Efficient Selection of Informative Alternative Relational Query Plans for Database Education Hu Wang Hui Li Sourav S. Bhowmick Zihao Ma Jiangtao Cui

2025-05-03 0 0 8.07MB 17 页 10玖币
侵权投诉
Efficient Selection of Informative Alternative
Relational Query Plans for Database Education
Hu Wang#, Hui Li#, Sourav S. Bhowmick, Zihao Ma#, Jiangtao Cui#
#School of Computer Science and Technology, Xidian University, China
huwang@stu.xidian.edu.cn,hli@xidian.edu.cn,zhma@stu.xidian.edu.cn
School of Computer Science and Engineering, Nanyang Technological University, Singapore
assourav@ntu.edu.sg
Abstract—Given an SQL query, a relational query engine
selects a query execution plan (QEP) for processing it. Off-the-
shelf RDBMS typically only expose the selected QEP to users
without revealing information about representative alternative
query plans (AQP) considered during QEP selection in a user-
friendly manner. Easy access to representative AQPs is useful
in several applications such as database education where it
can facilitate learners to comprehend the QEP selection process
during query optimization. In this paper, we present a novel
end-to-end and generic framework called ARENA that facilitates
exploration of informative alternative query plans of a given
SQL query to aid the comprehension of QEP selection. Under
the hood, ARENA addresses a novel problem called informative
plan selection problem (TIPS) which aims to discover a set of k
alternative plans from the underlying plan space so that the plan
interestingness of the set is maximized. Specifically, we explore
two variants of the problem, namely batch TIPS and incremental
TIPS, to cater to diverse set of users. Due to the computational
hardness of the problem, we present a 2-approximation algorithm
to address it efficiently. Exhaustive experimental study demon-
strates the superiority of ARENA to several baseline approaches.
As a representative use case of ARENA, we demonstrate its
effectiveness in supplementing database education through a user
study and academic assessment involving real-world learners.
Index Terms—Query processing and optimization
I. INTRODUCTION
A relational query engine selects a query execution plan
(QEP), which represents an execution strategy of an SQL query,
from many alternative query plans (AQPs) by comparing
their estimated costs. Major off-the-shelf relational database
systems (RDBMS) typically only expose the QEP to an end user.
They do not reveal a representative set of AQPs considered
by the underlying query optimizer during the selection of a
QEP in a user-friendly manner. Although an RDBMS (e.g.,
PostgreSQL) may allow one to manually pose SQL queries
with various constraints on configuration parameters to view
the corresponding QEPs containing specific physical operators
(e.g., SET enable hashjoin = true), such strategy demands not
only familiarity of the syntax and semantics of configuration
parameters, but also a clear idea of the AQPs that one is
interested. This is often impractical to assume (e.g., in educa-
tion environment where many learners are taking the database
systems course for the first time). Consider the following
motivating scenario.
Example 1.1: Lena is currently enrolled in an undergraduate
database systems course. She wishes to explore some of the
representative AQPs for the following query from the TPC-H
benchmark after viewing its QEP as depicted in Fig. 1(a).
SELECT s_acctbal, s_name, n_name, s_address,
s_phone
FROM supplier, nation, region
WHERE s_acctbal>=5000 and s_nationkey =
n_nationkey and n_regionkey = r_regionkey;
Specifically, Lena is curious to know whether there are AQPs
that have similar (resp. different) structure and physical oper-
ators but very different (resp. similar) estimated cost? If there
are, then how they look like? Examples of such AQPs are
shown in Fig. 1(b)-(d).
Clearly, a user-friendly tool that can facilitate retrieval and
exploration of “informative” AQPs associated with a given
query can greatly aid in answering Lena’s questions. Unfortu-
nately, very scant attention has been given to build such tools
by the academic and industrial communities. Recent efforts
for technological support to augment learning of relational
query processing primarily focused on the natural language
descriptions of QEPs [28; 43] and visualization of the plan
space [16]. In this paper, we present an end-to-end frame-
work called ARENA (AlteRnative quEry plaN ExplorAtion)
to address this challenge. Our approach not only has positive
feedback from learners (Section VIII-A) but, more importantly,
eventually improves academic outcomes (Section VIII-B).
Selecting a set of representative AQPs is a technically chal-
lenging problem. Firstly, the plan space can be prohibitively
large [6]. Importantly, not all AQPs potentially support under-
standing of the QEP selection process. Hence, it is ineffective
to expose all plans. Instead, it is paramount to select those that
are informative. However, how do we quantify informativeness
in order to select plans? For instance, in Fig. 1(b)-(d), which
plans should be revealed to Lena? Certainly, any informative-
ness measure needs to consider the plans (including the QEP)
that an individual has already viewed for her query so that
highly similar plans are not exposed to them. It should also
be cognizant of individuals’ interests in this context. Secondly,
it can be prohibitively expensive to scan all AQPs to select
informative ones. How can we design efficient techniques to
select informative plans? Importantly, this cannot be integrated
arXiv:2210.13722v4 [cs.DB] 16 Nov 2023
HashJoin
Cost: 657
TableScan
Cost: 220
supplier
HashJoin
Cost: 433
TableScan
Cost: 216
nation region
TableScan
Cost: 216
QEP
HashJoin
Cost: 1426
HashJoin
Cost: 1200
TableScan
Cost: 216
region nation
TableScan
Cost: 216
AP1
HashJoin
Cost: 663
HashJoin
Cost: 433
TableScan
Cost: 216
nation region
TableScan
Cost: 216
AP2
HashJoin
Cost: 1600
HashJoin
Cost: 1200
TableScan
Cost: 216
region nation
TableScan
Cost: 216
AP3
(a) QEP (c) Alternative plan 2 (AP2)(b) Alternative plan 1 (AP1) (d) Alternative plan 3 (AP3)
Filter
Cost: 222
TableScan
Cost: 220
supplier
Filter
Cost: 222
TableScan
Cost: 220
supplier
Filter
Cost: 222
TableScan
Cost: 220
supplier
Filter
Cost: 222
Fig. 1: Example of QEP and alternative query plans (AQPs).
into the enumeration step of the query optimizer as informative
AQPs need to be selected based on the QEP an individual has
seen.
Thirdly, at first glance, it may seem that we can select
k > 1alternative query plans where kis a value specified by
a user. Although this is a realistic assumption for many top-
kproblems, our engagement with users reveal that they may
not necessarily be confident to specify the value of kalways
(detailed in Section VIII-A). One may prefer to iteratively
view one plan-at-a-time and only cease exploration once they
are satisfied with the understanding of the alternative plan
choices made by the optimizer for a specific query. Hence,
kmay not only be unknown apriori but also the selection
of an AQP at each iteration to enhance their understanding of
different plan choices depends on the plans viewed by them
thus far. This demands for a flexible solution framework that
can select alternative query plans in the absence or presence
of kvalue. Lastly, any proposed solution should be generic
so that it can be easily realized on top of majority of existing
off-the-shelf RDBMS.
In this paper, we formalize the aforementioned challenges
into a novel problem called informative plan selection (TIPS)
problem and present ARENA to address it. Specifically, given
an SQL query and k1(when kis unspecified it is set to
1), it retrieves kalternative query plans (AQPs) that maximize
plan informativeness. Such query plans are referred to as
informative query plans. To this end, we introduce two variants
of TIPS, namely batch TIPS (B-TIPS) and incremental TIPS (I-
TIPS), to cater for scenarios where kis specified or unspecified
by a user, respectively. We introduce a novel concept called
plan informativeness to capture the aforementioned challenges
associated with the selection of informative AQPs. Specifically,
it embodies the preferences of real-world users (based on their
feedback) by mapping them to distance (computed by ex-
ploiting differences between plans w.r.t. structure, content and
estimated cost) and relevance of alternative plans with respect
to the QEP or the set of alternative plans viewed by a user
thus far. Next, given the computational hardness of the TIPS
problem, we present a 2-approximation algorithm to address
both the variants of the problem efficiently. Specifically, it has
a linear time complexity w.r.t. the number of candidate AQPs.
Given that the number of such candidates can be very large
for some queries. we propose two pruning strategies to prune
them efficiently. We adopt ORCA [39] to retrieve the candidate
plan space for a given SQL query in ARENA as it enables us
to build a generic framework that can easily be realized on
majority of existing RDBMS.
Lastly, we demonstrate the usefulness of ARENA by de-
ploying it in the database systems course at Xidian University.
Specifically, we analyze the feedback from real-world learners
and their academic performance w.r.t. the understanding of
the QEP selection process. Our analysis reveals the efficiency
and effectiveness of our solutions to help learners understand
alternative query plans in the context of QEP selection.
In summary, this paper makes the following key contribu-
tions.
We adopt a learner-centered approach to characterize the
notion of informative query plans for database education
by engaging real-world learners (Section II), and present
the notion of plan informativeness (Section IV) to quan-
titatively model the informativeness of AQPs.
We present a novel informative plan selection (TIPS)
problem to obtain a collection of alternative plans for
a given SQL query by maximizing plan informativeness
(Section III).
We present approximate algorithms with quality guaran-
tee that exploit plan informativeness and two pruning
strategies to address the TIPS problem efficiently (Sec-
tion VI). Our algorithms underpin the proposed end-to-
end ARENA framework to enable users to retrieve and ex-
plore informative plans for their SQL queries (Section V).
We undertake an exhaustive performance study to demon-
strate the superiority of ARENA to representative baselines
as well as justify the design choices (Section VII).
We conduct a user study and academic assessment in-
volving real-world learners taking the database systems
course to demonstrate the usefulness and effectiveness
of ARENA for pedagogical support to supplement their
understanding of the QEP selection process (Section VIII).
II. INFORMATIVE ALTERNATIVE PLANS
A key question that any alternative query plan (alternative
plan for brevity) selection technique needs to address is as
2
TABLE I: The 8 alternative plan categories.
Plan category Structure Content Cost
APIsmall small small
APII small small large
APII I small large small
APIV large small small
APVsmall large large
APV I large small large
APV II large large small
APV II I large large large
53
40%
27
20%
21
16%
17
13%
14
11% G1
G2
G3
G4
other
(a) Informative plans
57
45%
29
23%
25
20%
87B1
B2
B3
irrelevant
other
(b) Uninformative plans
Fig. 2: Survey results.
follows: What types of alternative plans are “interesting” or
“informative”? Reconsider Example 1.1. Fig. 1(b)-(d) depict
three alternative plans for the query where the physical oper-
ator/join order differences are highlighted with red rectangles
and significant cost differences are shown using orange nodes.
Specifically, all three AQPs have different join orders w.r.t. the
QEP. However, AP1 has very similar structure as the QEP but
significantly higher estimated cost. AP2, on the other hand,
has a very similar estimated cost as the QEP.AP3 consumes
similar high cost as AP1. Suppose k= 2. Then, among {AP1,
AP2},{AP1,AP3}, and {AP2,AP3}, which one is the best
choice? Intuitively, it is reasonable to choose the set that is
most “informative” to a learner.
For example, by viewing AP1 or AP3, Lena learned that the
HASH JOIN operator is sensitive to the order of the joined
tables. This reinforces the knowledge she has acquired from
her classroom lectures. On the other hand, AP2 reveals that
changing the join order may not always lead to significantly
different estimated costs. Lena was not aware of this and
therefore found it informative. Hence, {AP1,AP2}or {AP2,
AP3}are potentially the most informative set for Lena.
Feedback from volunteers. The above example illustrates that
the content, structure, and cost of a query plan w.r.t. the QEP
and already-viewed plans play pivotal roles in determining
informativeness of AQPs. To further validate this, we survey 68
unpaid volunteers who have taken a database systems course
recently or currently taking it in an academic institution.
Specifically, they are either current undergraduate students in a
computer science degree program or junior database engineers
who have recently joined industry. Note that we focus on
this group of volunteers as a key application of ARENA is
in database education.
We choose AQPs of two SQL queries on the IMDb dataset
for the volunteers to provide feedback. These queries contain
3-4 joins and involve different table scan methods. Note that
randomly selecting a large number of alternative plans for
feedback is ineffective as they not only may insufficiently
cover diverse types of plans w.r.t. the QEPs of the queries
but also our engagement with the volunteers reveal that it
may deter them to give feedback as they have to browse a
large number of plans. Hence, for each query, we display the
QEP and 8 alternative plans, which are heuristically selected
by evaluating their differences w.r.t. the QEP based on three
dimensions, i.e., structure, content, and estimated cost (Ta-
ble I). For instance, we may choose an alternative plan with
small difference in structure or content from the QEP but large
differences in the other two dimensions. Observe that it covers
majority of the AQPs of a given query. Note that the heuristic
model may occasionally select some similar alternative plans.
We deliberately expose these plans to the volunteers to garner
feedback on similar alternative plans.
We ask the volunteers to choose the most and least infor-
mative plans given a QEP and provide justifications for their
choices. The aforementioned plan types were not revealed to
the volunteers. We received 132 feedbacks from the volunteers.
We map the feedbacks on informative and uninformative
plans to the plan types in Table I and count the number of
occurrences. Fig. 2(a) reports the feedback on plans considered
as informative. Specifically, G1represents APII ,G2and G3
represent plans with lower cost differences. Particularly, G2
are plans with large structure and content differences (APV II )
whereas G3are those that have small structural or content
differences (APIII ,APIV ). G4represents APV I and APV III .
There are other feedbacks that are too sparse to be grouped
as representative of the volunteers.
Fig. 2(b) reports the feedback on uninformative plans se-
lected by the volunteers. B1represents plans that are very
similar to the QEP (API) and hence voted as uninformative
as they do not convey any useful information. B2represents
APV I and APV III .B3are plans that are very similar to
other alternative plans viewed by the volunteers. In addition,
there are eight other feedbacks that are orthogonal to our
problem (i.e., feedback on the visual interface design) and
hence categorized as “irrelevant”.
Summary. Our engagement with volunteers reveal that major-
ity find alternative plans of types APII APIV and APV II
informative. However, API,APV,APV I , and APV III are
not. Observe that the number of feedbacks for the latter two
that deem them uninformative is significantly more than those
that consider them informative (G4vs B2). Hence, we consider
them uninformative. Volunteers also find similar alternative
plans uninformative. In the sequel, we shall use these insights
to guide the design of plan informativeness and pruning
strategies.
III. INFORMATIVE PLAN SELECTION (TIPS) PROBLEM
In this section, we formally define the TIPS problem. We
begin with a brief background on relational query plans.
A. Relational Query Plans
Given an arbitrary SQL query, an RDBMS generates a query
execution plan (QEP) to execute it. A QEP consists of a
collection of physical operators organized in form of a tree,
namely the physical operator tree (operator tree for brevity).
Fig. 1(a) depicts an example QEP. Each physical operator,
3
e.g., SEQUENTIAL SCAN,INDEX SCAN, takes as input one
or more data streams and produces an output one. A QEP
explicitly describes how the underlying RDBMS shall execute
the given query. Notably, given a SQL, there are many different
query plans, other than the QEP, for executing it. For instance,
there are several physical operators in a RDBMS that implement
a join, e.g., HASH JOIN,SORT MERGE,NESTED LOOP.
We refer to each of these different plans (other than the QEP)
as an alternative query plan (AQP). Fig. 1(b)-(d) depict three
examples of AQP. It is well-known that the plan space is non-
polynomial [6].
B. Problem Definition
The informative plan selection (TIPS) problem aims to
automatically select a small number of alternative plans that
maximize plan informativeness. These plans are referred to
as informative alternative plans (informative plans/ AQPsfor
brevity). The TIPS problem has two flavors:
Batch TIPS. A user may wish to view top-kinformative
plans beside the QEP;
Incremental TIPS. A user may iteratively view an infor-
mative plan besides what have been already shown to
him/her;
Formally, we define these two variants of TIPS as follows.
Definition 3.1: Given a QEP πand a budget k, the batched
informative plan selection (B-TIPS) problem aims to find top-
kalternative plans ΠOP T such that the plan informativeness
of the plans in ΠOP T ∪ {π}is maximized, that is
ΠOP T = argmax
Π
U{π})s.t. |Π|=kand ΠΠ\{π}
where Πdenotes the space of all possible plans and U(·)
denotes the plan informativeness of a set of query plans.
Definition 3.2: Given a query plan set Πa user has seen so
far, the iterative informative plan selection (I-TIPS) problem
aims to select an alternative plan πOP T such that the plan
informativeness of the plans in Π∪ {πOP T }is maximized,
that is
πOP T = argmax
π
U∪ {π})subject to πΠand π /Π
In the next section, we shall elaborate on the notion of plan
informativeness U(·)to capture informative plans.
Example 3.3: Reconsider Example 1.1. Suppose Lena has
viewed the QEP in Fig. 1(a). She may now wish to view
another two alternative plans (i.e., k= 2) that are informative.
The B-TIPS problem is designed to address it by selecting
these alternative plans from the entire plan space so that the
plan informativeness U(·)of the QEP and the selected plans
is maximized. On the other hand, suppose another learner is
unsure about how many alternative plans should be viewed
or is dissatisified with the result returned by B-TIPS. In this
Fig. 3: Plan Tree examples.
case, the I-TIPS problem enables him to iteratively select
AQP-at-a-time” and provide a rating w.r.t. its informativeness
until satisfied. The plan informativeness is maximized by
considering the rating from the learner.
Given a plan informativeness measure U(·)and the plan
space Π, we can find an alternative plan for the I-TIPS
problem in O(|Π|)time. However, plan space increases
exponentially with the increase in the number of joins and it
is NP-hard to find the best join order [20]. Therefore, we need
efficient solutions especially when there is a large number of
joined tables.
THEOREM 3.4: Given a plan informativeness measure U(·),
B-TIPS problem is NP-hard.
Proof: (Sketch). Let us consider the facility dispersion
problem which is NP-hard [35]. In that problem, one is given
a facility set S, a function fof distance and an integer k(k <
|F|). The objective is to find a subset S(|S|=k)of S
so that the given function on fis maximum. Because facility
dispersion problem is a special case of B-TIPS where {π}=
and U(·) = f,B-TIPS is also NP-hard.
IV. QUANTIFYING PLAN INTERESTINGNESS
In this section, we describe the quantification of plan
informativeness U(·)introduced in the preceding section by
exploiting the insights from learner-centered characterization
of AQPs in Section II.
A. Lessons from Characterization of AQPs
In Section II, feedbacks from volunteers reveal that the
selection of informative plans is influenced by the followings:
(a) The structure, content, and cost differences of an alternative
plan w.r.t. the QEP (and other alterative plans) can be exploited
to differentiate between informative and uninformative alter-
native plans. Hence, we need to quantify these differences
for informative plan selection. (b) Volunteers typically do not
prefer to view alternative plans that are very similar to the
QEP or other alternative plans already viewed by them. We
now elaborate on the quantification approach to capture two
factors and then utilizing them to quantify plan interestingness.
B. Plan Differences
Broadly, U(·)should facilitate identifying specific alter-
native plans other than the QEP such that by putting them
together a user can acquire a comprehensive understanding of
4
摘要:

EfficientSelectionofInformativeAlternativeRelationalQueryPlansforDatabaseEducationHuWang#,HuiLi#,SouravS.Bhowmick†,ZihaoMa#,JiangtaoCui##SchoolofComputerScienceandTechnology,XidianUniversity,Chinahuwang@stu.xidian.edu.cn,hli@xidian.edu.cn,zhma@stu.xidian.edu.cn†SchoolofComputerScienceandEngineering,...

展开>> 收起<<
Efficient Selection of Informative Alternative Relational Query Plans for Database Education Hu Wang Hui Li Sourav S. Bhowmick Zihao Ma Jiangtao Cui.pdf

共17页,预览4页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:17 页 大小:8.07MB 格式:PDF 时间:2025-05-03

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 17
客服
关注