
38:2 M. Raszyk, D. Basin, S. Krsti´
c, and D. Traytel Vol. 19:4
consider a brand suspicious if there is a user and a score such that all the brand’s products
were reviewed by that user with that score. An RC query computing suspicious brands is
Qsusp BB(b)∧ ∃u, s. ∀p. P(b, p)−→ S(p, u, s).
This query is domain independent and follows closely our informal description. It is not,
however, clear how to evaluate it because its second conjunct is domain dependent as it is
satisfied for every brand that does not occur in P. Finding suspicious brands using RA or
SQL is a challenge, which only the best students from an undergraduate database course
will accomplish. We give away an RA answer next (where
−
is the set difference operator
and ▷is the anti-join, also known as the generalized difference operator [AHV95]):
πbrand ((πuser ,score (S)×B)−πbrand,user,score ((πuser,score (S)×P)▷S)) ∪(B−πbrand (P)).
The highlighted expressions
πuser,score
(S) are called generators. They ensure that the
left operands of the anti-join and set difference operators include or have the same columns
(i.e., are union-compatible) as the corresponding right operands. (Following Codd [
Cod72
],
one could also use the active domain to obtain canonical, but far less efficient, generators.)
Van Gelder and Topor [
GT87
,
GT91
] present a translation from a decidable class of
domain-independent RC queries, called evaluable, to RA expressions. Their translation of
the evaluable
Qsusp
query would yield different generators, replacing both highlighted parts
by
πuser
(S)
×πscore
(S). That one can avoid this Cartesian product as shown above is subtle:
Replacing only the first highlighted generator with the product results in an inequivalent
RA expression.
Once we have identified suspicious brands, we may want to obtain the users whose scoring
made the brands suspicious. In RC, omitting u’s quantifier from Qsusp achieves just that:
Qsusp
user BB(b)∧ ∃s. ∀p. P(b, p)−→ S(p, u, s).
In contrast, RA cannot express the same property as it is domain dependent (hence also not
evaluable and thus out of scope for Van Gelder and Topor’s translation):
Qsusp
user
is satisfied
for every user if a brand has no products, i.e., it does not occur in P. Yet,
Qsusp
user
is satisfied
for finitely many users on every database instance where Pcontains at least one row for every
brand from the relation B, in other words
Qsusp
user
is relatively safe on such database instances.
How does one evaluate queries that are not evaluable or even domain dependent? The
main approaches from the literature (Section 2) are either to use variants of the active
domain semantics [
BL00
,
HS94
,
AGSS86
] or to abandon finite relations entirely and evaluate
queries using finite representations of infinite (but well-behaved) relations such as systems of
constraints [
Rev02
] or automatic structures [
BG04
]. These approaches favor expressiveness
over efficiency. But unlike query translations, they cannot benefit from decades of practical
database research and engineering.
In this work, we translate arbitrary RC queries to RA expressions under the assumption
of an infinite domain. To deal with queries that are domain dependent, our translation
produces two RA expressions, instead of a single equivalent one. The first RA expression
characterizes the original RC query’s relative safety, the decidable question of whether the
query evaluates to a finite relation for a given database, which can be the case even for
a domain-dependent query, e.g.,
Qsusp
user
. If the original query is relatively safe on a given
database, i.e., produces some finite result, then the second RA expression evaluates to the
same finite result. Taken together, the two RA expressions solve the query capturability
problem [
AH91
]: they allow us to enumerate the original RC query’s finite evaluation result,
or to learn that it would be infinite using RA operations on the unmodified database.