
TABLE I: The 8 alternative plan categories.
Plan category Structure Content Cost
APIsmall small small
APII small small large
APII I small large small
APIV large small small
APVsmall large large
APV I large small large
APV II large large small
APV II I large large large
53
40%
27
20%
21
16%
17
13%
14
11% G1
G2
G3
G4
other
(a) Informative plans
57
45%
29
23%
25
20%
87B1
B2
B3
irrelevant
other
(b) Uninformative plans
Fig. 2: Survey results.
follows: What types of alternative plans are “interesting” or
“informative”? Reconsider Example 1.1. Fig. 1(b)-(d) depict
three alternative plans for the query where the physical oper-
ator/join order differences are highlighted with red rectangles
and significant cost differences are shown using orange nodes.
Specifically, all three AQPs have different join orders w.r.t. the
QEP. However, AP1 has very similar structure as the QEP but
significantly higher estimated cost. AP2, on the other hand,
has a very similar estimated cost as the QEP.AP3 consumes
similar high cost as AP1. Suppose k= 2. Then, among {AP1,
AP2},{AP1,AP3}, and {AP2,AP3}, which one is the best
choice? Intuitively, it is reasonable to choose the set that is
most “informative” to a learner.
For example, by viewing AP1 or AP3, Lena learned that the
HASH JOIN operator is sensitive to the order of the joined
tables. This reinforces the knowledge she has acquired from
her classroom lectures. On the other hand, AP2 reveals that
changing the join order may not always lead to significantly
different estimated costs. Lena was not aware of this and
therefore found it informative. Hence, {AP1,AP2}or {AP2,
AP3}are potentially the most informative set for Lena.
Feedback from volunteers. The above example illustrates that
the content, structure, and cost of a query plan w.r.t. the QEP
and already-viewed plans play pivotal roles in determining
informativeness of AQPs. To further validate this, we survey 68
unpaid volunteers who have taken a database systems course
recently or currently taking it in an academic institution.
Specifically, they are either current undergraduate students in a
computer science degree program or junior database engineers
who have recently joined industry. Note that we focus on
this group of volunteers as a key application of ARENA is
in database education.
We choose AQPs of two SQL queries on the IMDb dataset
for the volunteers to provide feedback. These queries contain
3-4 joins and involve different table scan methods. Note that
randomly selecting a large number of alternative plans for
feedback is ineffective as they not only may insufficiently
cover diverse types of plans w.r.t. the QEPs of the queries
but also our engagement with the volunteers reveal that it
may deter them to give feedback as they have to browse a
large number of plans. Hence, for each query, we display the
QEP and 8 alternative plans, which are heuristically selected
by evaluating their differences w.r.t. the QEP based on three
dimensions, i.e., structure, content, and estimated cost (Ta-
ble I). For instance, we may choose an alternative plan with
small difference in structure or content from the QEP but large
differences in the other two dimensions. Observe that it covers
majority of the AQPs of a given query. Note that the heuristic
model may occasionally select some similar alternative plans.
We deliberately expose these plans to the volunteers to garner
feedback on similar alternative plans.
We ask the volunteers to choose the most and least infor-
mative plans given a QEP and provide justifications for their
choices. The aforementioned plan types were not revealed to
the volunteers. We received 132 feedbacks from the volunteers.
We map the feedbacks on informative and uninformative
plans to the plan types in Table I and count the number of
occurrences. Fig. 2(a) reports the feedback on plans considered
as informative. Specifically, G1represents APII ,G2and G3
represent plans with lower cost differences. Particularly, G2
are plans with large structure and content differences (APV II )
whereas G3are those that have small structural or content
differences (APIII ,APIV ). G4represents APV I and APV III .
There are other feedbacks that are too sparse to be grouped
as representative of the volunteers.
Fig. 2(b) reports the feedback on uninformative plans se-
lected by the volunteers. B1represents plans that are very
similar to the QEP (API) and hence voted as uninformative
as they do not convey any useful information. B2represents
APV I and APV III .B3are plans that are very similar to
other alternative plans viewed by the volunteers. In addition,
there are eight other feedbacks that are orthogonal to our
problem (i.e., feedback on the visual interface design) and
hence categorized as “irrelevant”.
Summary. Our engagement with volunteers reveal that major-
ity find alternative plans of types APII −APIV and APV II
informative. However, API,APV,APV I , and APV III are
not. Observe that the number of feedbacks for the latter two
that deem them uninformative is significantly more than those
that consider them informative (G4vs B2). Hence, we consider
them uninformative. Volunteers also find similar alternative
plans uninformative. In the sequel, we shall use these insights
to guide the design of plan informativeness and pruning
strategies.
III. INFORMATIVE PLAN SELECTION (TIPS) PROBLEM
In this section, we formally define the TIPS problem. We
begin with a brief background on relational query plans.
A. Relational Query Plans
Given an arbitrary SQL query, an RDBMS generates a query
execution plan (QEP) to execute it. A QEP consists of a
collection of physical operators organized in form of a tree,
namely the physical operator tree (operator tree for brevity).
Fig. 1(a) depicts an example QEP. Each physical operator,
3