Unbalanced Triangle Detection and Enumeration Hardness for Unions of Conjunctive Queries

2025-05-06 0 0 522.63KB 33 页 10玖币
侵权投诉
Logical Methods in Computer Science
Volume 21, Issue 1, 2025, pp. 29:1–29:33
https://lmcs.episciences.org/
Submitted Feb. 23, 2023
Published Mar. 27, 2025
UNBALANCED TRIANGLE DETECTION AND ENUMERATION
HARDNESS FOR UNIONS OF CONJUNCTIVE QUERIES
KARL BRINGMANN aAND NOFAR CARMELI b
a
Saarland University and Max Planck Institute for Informatics, Saarland Informatics Campus,
Saarbr¨ucken, Germany
e-mail address: bringmann@cs.uni-saarland.de
bInria, LIRMM, University of Montpellier, CNRS, Montpellier, France
e-mail address: nofar.carmeli@inria.fr
Abstract.
We study the enumeration of answers to Unions of Conjunctive Queries (UCQs)
with optimal time guarantees. More precisely, we wish to identify the queries that can be
solved with linear preprocessing time and constant delay. Despite the basic nature of this
problem, it was shown only recently that UCQs can be solved within these time bounds
if they admit free-connex union extensions, even if all individual CQs in the union are
intractable with respect to the same complexity measure. Our goal is to understand whether
there exist additional tractable UCQs, not covered by the currently known algorithms.
As a first step, we show that some previously unclassified UCQs are hard using the
classic 3SUM hypothesis, via a known reduction from 3SUM to triangle listing in graphs. As
a second step, we identify a question about a variant of this graph task that is unavoidable
if we want to classify all self-join-free UCQs: is it possible to decide the existence of a
triangle in a vertex-unbalanced tripartite graph in linear time? We prove that this task is
equivalent in hardness to some family of UCQs. Finally, we show a dichotomy for unions
of two self-join-free CQs if we assume the answer to this question is negative.
In conclusion, this paper pinpoints a computational barrier in the form of a single
decision problem that is key to advancing our understanding of the enumeration complexity
of many UCQs. Without a breakthrough for unbalanced triangle detection, we have no
hope of finding an efficient algorithm for additional unions of two self-join-free CQs. On the
other hand, a sufficiently efficient unbalanced triangle detection algorithm can be turned
into an efficient algorithm for a family of UCQs currently not known to be tractable.
1. Introduction
Answering queries over relational databases is a fundamental problem in data management.
As the available data in the world grows bigger, so grows the importance of finding the best
possible complexity for solving this problem. Since the query itself is usually significantly
smaller than the size of the database, it is common to use data complexity [
Var82
]: we treat
the query as fixed, and examine the complexity of finding the answers to the given query
over the input database. As the number of answers to a query may be much larger than the
Key words and phrases: enumeration, fine-grained complexity, constant delay, union of conjunctive queries,
unbalanced triangle detection.
LOGICAL METHODS
IN COMPUTER SCIENCE DOI:10.46298/LMCS-21(1:29)2025
© K. Bringmann and N. Carmeli
CC
Creative Commons
29:2 K. Bringmann and N. Carmeli Vol. 21:1
size of the database itself, we cannot hope to generate all answers in linear time in the size
of the input. Instead, we use enumeration measures. Since we must read the entire input to
verify whether the query has answers, and we must print all answers, the measure of linear
preprocessing time and constant delay between two successive answers can be seen as the
optimal time complexity for answering queries. The class of queries that can be answered
within these time bounds is denoted
DelayClin
, and recent research asks which queries are in
this class [Dur20, BGS20].
Proving that a query belongs to the class
DelayClin
can be achieved by a variety of
algorithmic techniques, coupled with insights into the query structure. However, proving
that a query is unconditionally not contained in this class is beyond the state-of-the-art
lower bound techniques for the RAM model. Therefore, one must resort to conditional lower
bounds. That is, start from a hypothesis on the time complexity of a well-studied problem
and design a reduction from your problem of choice; this proves a lower bound that holds
conditional on the starting hypothesis. While such a conditional lower bound is no absolute
impossibility result, it identifies an algorithmic breakthrough that is necessary to find better
algorithms for your problem of choice, and thus it yields strong evidence that no better
algorithm exists. This paradigm has been successfully applied in the field of fine-grained
complexity theory to obtain tight conditional lower bounds for many different problems, see,
e.g., [
Wil18
,
Bri19
]. When searching for dichotomies (that characterize which problems in
a class admit efficient algorithms), research aiming for lower bounds (conditional or not)
has another advantage. The reductions showing hardness often succeed only in some cases.
This brings out the other cases, directing us to focus our attention where we have hope for
finding efficient algorithms without a major computational breakthrough. This approach
has been useful for finding tractable cases that were previously unknown [CK21].
When considering Conjunctive Queries (CQs) without self-joins, the tractability of
enumerating query answers with respect to
DelayClin
is well understood. The queries with a
free-connex structure are tractable [
BDG07
]; these are acyclic queries that remain acyclic
with the addition of an atom containing the free variables. This tractability result is
complemented by conditional lower bounds forming a dichotomy: a self-join-free CQ is in
DelayClin
if and only if it is free-connex [
BB13
,
BDG07
]. The hardness of cyclic CQs assumes
the hardness of finding hypercliques in a hypergraph [
BB13
], while the hardness of acyclic
non-free-connex CQs assumes the hardness of Boolean matrix multiplication [
BDG07
]. This
dichotomy assumes that the CQ does not contain self-joins (that is, every relation appears
in at most one atom of the query), which enables assigning different atoms with different
relations when reducing a hard problem to query answering. Not much is known regarding
the case with self-joins, but we do know that there are cases where self-joins affect the
complexity [BGS20, CS23].
The next natural class of queries to consider is Unions of Conjunctive Queries (UCQs),
which is equivalent to positive relational algebra. A union of tractable CQs is known to be
tractable [
DS11
]. However, when the union contains an intractable CQ, the picture gets
much more complex. Note that a union that contains an intractable CQ may be equivalent to
a union of tractable CQs; in which case, the UCQ is tractable [
CK21
]. This can happen for
example if the union is comprised of an intractable CQ
Q1
and a tractable CQ
Q2
subsuming
it; then the entire union is equivalent to
Q2
. Thus, it makes sense to consider non-redundant
UCQs. It was claimed that a non-redundant UCQ that contains an intractable CQ is
necessarily intractable [
CJN18
]. This claim was disproved in a surprising result showing that
a UCQ may be tractable even if it consists solely of intractable CQs [
CK21
]. Specifically,
Vol. 21:1 UNBALANCED TRIANGLE DETECTION AND ENUMERATION HARDNESS FOR UCQS 29:3
Carmeli and Kr¨oll showed that whenever each CQ in a union can become free-connex
(and thus tractable) via a so-called union extension, then the UCQ is in
DelayClin
[
CK21
].
Moreover, every UCQ that we currently know to be in
DelayClin
has a free-connex union
extension.
In the case of a union of two intractable CQs, known conditional lower bounds show that
these extensions capture all tractable queries [
CK21
]. These lower bounds rely on the same
hypotheses as those used for CQs, in addition to a hypothesis on the hardness of detecting a
4-clique in a graph. The case of a union of a tractable CQ and an intractable CQ is not
yet completely classified, and Carmeli and Kr¨oll [
CK21
] identified several open examples,
that is, specific unclassified queries for which the current techniques for an algorithm or a
conditional lower bound do not apply.
Our Contribution. Our aim is to understand whether there exist additional tractable
UCQs, not covered by the currently known algorithms. We start by showing that some
examples of UCQs left open in [
CK21
] are hard assuming the standard 3SUM conjecture
(given
n
integers, it is not possible to decide in subquadratic time whether any three of
them sum to 0). Our reductions go through an intermediate hypothesis that we call Vertex-
Unbalanced Triangle Listing (VUTL; listing all triangles in an unbalanced tripartite graph
requires super-linear time in terms of input and output size). Specifically, building on a
reduction by Fischer, Kaliciak, and Polak [
FKP24
], we show that the VUTL hypothesis
is implied by the 3SUM conjecture. We then use VUTL to show the hardness of some
previously unclassified UCQs.
When trying to reduce VUTL to further unclassified UCQs, we identified several issues.
This led us to introduce a similar hypothesis on Vertex-Unbalanced Triangle Detection
(VUTD; determining whether an unbalanced tripartite graph contains triangles requires
super-linear time in terms of input size).
1
The VUTD hypothesis implies the VUTL
hypothesis, and thus the former is easier to reduce to UCQs. For a discussion of why
VUTD is a reasonable hypothesis, we refer to Section 3. We show that VUTD exactly
captures the hardness of some family of UCQs that do not have free-connex union extensions:
The VUTD hypothesis holds if and only if no query in this family is in
DelayClin
. Thus,
determining whether the VUTD hypothesis holds is unavoidable if we want to classify all
self-join-free UCQs. Next, we focus on unions of two CQs. We show how, assuming the
VUTD hypothesis, we can conclude the hardness of any union of one tractable CQ and
one intractable CQ that does not have a free-connex union extension. Moreover, if VUTD
holds, previously known hardness results apply without assuming additional hypotheses.
This results in a dichotomy, which is our main result: a union of two self-join-free CQs is in
DelayClin
if and only if it has a free-connex union extension, assuming the VUTD hypothesis.
For these UCQs, we conclude that the currently known algorithms cover all tractable cases
that do not require a major breakthrough regarding VUTD.
The main conclusion from our paper is that to make progress in understanding the
enumeration complexity of UCQs, it suffices to study the single decision problem of detecting
triangles in unbalanced graphs. More precisely, if we ever find a linear-time algorithm for
unbalanced triangle detection, we will also get a breakthrough in UCQ evaluation in the
form of an algorithm for some UCQs that do not have a free-connex union extension. If on
the other hand, we assume that there is no linear-time algorithm for unbalanced triangle
1
Our triangle detection instances are vertex-unbalanced, in contrast to a recently formulated hypothesis
with the same name that is edge-unbalanced [KVW20], see Section 6.2 for a discussion.
29:4 K. Bringmann and N. Carmeli Vol. 21:1
detection, then for a large class of UCQs (namely, unions of two self-join-free CQs) the
currently known algorithms cover all tractable cases.
The paper is organized as follows. Section 2 provides basic definitions and results that
we use throughout the paper. In Section 3, we define VUTL and VUTD, discuss their
connections to other hypotheses, and use them to address some examples of UCQs that
were previously unclassified. In Section 4, we present a family of UCQs that is equivalent in
hardness to VUTD. Then, Section 5 shows the classification of UCQs that we can achieve
based on the VUTD hypothesis; this proves our main dichotomy result. The next section
contains discussions related to alternative hypotheses. We discuss how we can conclude the
hardness for relaxed time requirements (polylogarithmic instead of constant delay) based on
a strengthening of the VUTD hypothesis in Section 6.1, and in Section 6.2 we discuss the
difference between VUTD and a similar hypothesis for edge-unbalanced graphs that was
recently introduced. We conclude in Section 7.
2. Preliminaries
Databases and Queries. Arelation is a set of tuples of constants, where each tuple has
the same arity (length). A schema Sis a collection of relation symbols
R
, each with an
associated arity. A database
D
(over the schema S) associates with each relation symbol
R
a finite relation, which we denote by RD, whose arity is that of Rin S.
AConjunctive Query (CQ)
Q
over a schema Sis defined by an expression of the form
Q
(
x
)
:- R1
(
t1
)
, . . . , Rn
(
tn
), where each
Ri
is a relation symbol of S, each
ti
is a tuple of
variables and constants with the same arity as
Ri
, and
x
is a tuple of variables from
t1,...,
tn
.
We usually omit the explicit specification of the schema S, and assume that it consists of
the relation symbols that occur in the query at hand. We call
Q
(
x
) the head of
Q
, and
R1
(
t1
)
, . . . , Rn
(
tn
) the body of
Q
. Each
Ri
(
ti
) is an atom of
Q
, and the set of all atoms of
Q
is denoted
atoms
(
Q
). When the order of the variables in an atom is not important for our
discussion, we sometimes denote an atom
Ri
(
ti
) by
Ri
(
Ti
) where
Ti
is a set of variables. We
use
var
(
Q
) to denote the set of variables that occur in
Q
. We say that
Q
is self-join-free if
every relation symbol occurs in it at most once. If a CQ is self-join-free, we use
var
(
Ri
) to
denote the set of variables that occur in the atom containing
Ri
. The variables occurring in
the head are called the free variables and denoted by
free
(
Q
). The variables occurring in
the body but not in the head are called existential. A homomorphism
h
from a CQ
Q
to a
database
D
is a mapping of the variables in
Q
to the constants of
D
, such that for every
atom
Ri
(
ti
) of the CQ, we have that
h
(
ti
)
RD
. Each such homomorphism
h
yields an
answer h(x) to Q. We denote by Q(D) the set of all answers to Qon D.
AUnion of Conjunctive Queries (UCQ)
Q
is a finite set of CQs, denoted
Q
=
S
i=1 Qi
,
where
free
(
Qi
) is the same for all 1
i
. The set of answers to
Q
over a database
D
is
the union
Q
(
D
) =
S
i=1 Qi
(
D
). Let
Q1
and
Q2
be CQs. A body-homomorphism from
Q2
to
Q1
is a mapping
h
:
var
(
Q2
)
var
(
Q1
) such that for every atom
R
(
v
) of
Q2
, we have that
R
(
h
(
v
))
Q1
. A body-isomorphism from
Q2
to
Q1
is a bijective mapping
h
such that
h
is
a body-homomorphism from
Q2
to
Q1
and
h1
is a body-homomorphism from
Q1
to
Q2
.
We say that two CQs are body-isomorphic if there is a body-isomorphism between them. A
homomorphism from
Q2
to
Q1
is a body-homomorphism
h
such that
h
(
free
(
Q2
)) =
free
(
Q1
).
It is well known that
Q1
is contained in
Q2
(i.e.,
Q1
(
D
)
Q2
(
D
) on every input
D
) iff there
exists a homomorphism from Q2to Q1[CM77]. We say that a UCQ is non-redundant if it
Vol. 21:1 UNBALANCED TRIANGLE DETECTION AND ENUMERATION HARDNESS FOR UCQS 29:5
does not contain two different CQs such that there is a homomorphism from one to the other.
We often assume that UCQs are non-redundant; otherwise, an equivalent non-redundant
UCQ can be obtained by removing CQs.
Enumeration Complexity. An enumeration problem
P
is a collection of pairs (
I, Y
) where
I
is an input and
Y
is a finite set of answers for
I
, denoted by
P
(
I
). An enumeration
algorithm
A
for an enumeration problem
P
is an algorithm that consists of two phases:
preprocessing and enumeration. During preprocessing,
A
is given an input I, and it may
build data structures. During the enumeration phase,
A
can access the data structures built
during preprocessing, and it emits the answers
P
(
I
), one by one, without repetitions. The
time between printing any two answers during the enumeration phase is called delay. In
this paper, an enumeration problem refers to a query
Q
, the input is a database
D
, and
the answer set is
Q
(
D
). Such a problem is denoted
EnumQ
. We adopt data complexity,
where the query is treated as fixed, and the complexity is with respect to the size of the
representation of the database. We work on the common Random Access Machine (RAM)
model, where each memory cell stores Θ(
log |I|
) bits, and which supports lookup tables of
polynomial size that can be queried in constant time. The enumeration class
DelayClin
is
defined as the class of all enumeration problems that have an enumeration algorithm with
O
(
|I|
) preprocessing time and
O
(1) delay. Note that this class does not impose a restriction
on the memory used.
2
To ease notation, we identify the query with its corresponding
enumeration problem, and we denote QDelayClin to mean EnumQ⟩ ∈ DelayClin.
Hypergraphs. Ahypergraph
H
= (
V, E
) is a set
V
of vertices and a set
E
of non-empty
subsets of
V
called hyperedges (sometimes edges). Given
SV
, the induced subgraph
H
[
S
]
is (
S, E
) where
E
=
{eS|eE} \ {∅}
. Two vertices in a hypergraph are neighbors
if they appear in a common edge. A clique of a hypergraph is a set of vertices that are
pairwise neighbors in
H
. If every edge in
H
has exactly
k
vertices, we call
Hk
-uniform;
note that 2-uniform hypergraphs are just graphs. For any
ℓ>k
, an
-hyperclique in a
k
-uniform hypergraph
H
is a set
V
of
vertices, such that every subset of
V
of size
k
forms a hyperedge. A path of
H
is a sequence of vertices such that every two consecutive
vertices are neighbors. The length of a path
v1, . . . , vn
is
n
1. A simple path of
H
is a path
where every vertex appears at most once. A chordless path is a simple path in which no two
non-consecutive vertices are neighbors. A cycle is a path that starts and ends in the same
vertex. A simple cycle is a cycle of length 3 or more where every vertex appears at most once
(except for the first and last vertex). A chordless cycle is a simple cycle such that no two
non-consecutive vertices are neighbors and no edge contains all cycle vertices. A tetra of size
k
(where
k
3) is a set of
k
vertices such that every
k
1 of them are contained in an edge,
and no edge contains all
k
vertices. A hypergraph is cyclic if it contains a chordless cycle or a
tetra. A hypergraph that is not cyclic is called acyclic (this is known as
α
-acyclicity) [
BB16
].
Note that “a 2-uniform hypergraph is cyclic” is a different way of saying that “a graph has
a cycle”. A hypergraph is connected if for any two vertices
u, v
there is a path starting in
u
and ending in
v
. A tripartite graph is comprised of three sets of vertices (
V1, V2, V3
) and
three sets of edges
E1,2V1×V2
,
E2,3V2×V3
, and
E1,3V1×V3
. A triangle in a
2
Not much is known regarding the complexity of UCQ answering when the memory is restricted. See [
CK21
,
Section 6.3] for a related discussion.
摘要:

LogicalMethodsinComputerScienceVolume21,Issue1,2025,pp.29:1–29:33https://lmcs.episciences.org/SubmittedFeb.23,2023PublishedMar.27,2025UNBALANCEDTRIANGLEDETECTIONANDENUMERATIONHARDNESSFORUNIONSOFCONJUNCTIVEQUERIESKARLBRINGMANNaANDNOFARCARMELIbaSaarlandUniversityandMaxPlanckInstituteforInformatics,Saa...

展开>> 收起<<
Unbalanced Triangle Detection and Enumeration Hardness for Unions of Conjunctive Queries.pdf

共33页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!

相关推荐

分类:图书资源 价格:10玖币 属性:33 页 大小:522.63KB 格式:PDF 时间:2025-05-06

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 33
客服
关注