Unbalanced Triangle Detection and Enumeration Hardness for Unions of Conjunctive Queries

2025-05-06 0 0 522.63KB 33 页 10玖币

侵权投诉

Logical Methods in Computer Science

Volume 21, Issue 1, 2025, pp. 29:1–29:33

https://lmcs.episciences.org/

Submitted Feb. 23, 2023

Published Mar. 27, 2025

UNBALANCED TRIANGLE DETECTION AND ENUMERATION

HARDNESS FOR UNIONS OF CONJUNCTIVE QUERIES

KARL BRINGMANN aAND NOFAR CARMELI b

Saarland University and Max Planck Institute for Informatics, Saarland Informatics Campus,

Saarbr¨ucken, Germany

e-mail address: bringmann@cs.uni-saarland.de

bInria, LIRMM, University of Montpellier, CNRS, Montpellier, France

e-mail address: nofar.carmeli@inria.fr

Abstract.

We study the enumeration of answers to Unions of Conjunctive Queries (UCQs)

with optimal time guarantees. More precisely, we wish to identify the queries that can be

solved with linear preprocessing time and constant delay. Despite the basic nature of this

problem, it was shown only recently that UCQs can be solved within these time bounds

if they admit free-connex union extensions, even if all individual CQs in the union are

intractable with respect to the same complexity measure. Our goal is to understand whether

there exist additional tractable UCQs, not covered by the currently known algorithms.

As a ﬁrst step, we show that some previously unclassiﬁed UCQs are hard using the

classic 3SUM hypothesis, via a known reduction from 3SUM to triangle listing in graphs. As

a second step, we identify a question about a variant of this graph task that is unavoidable

if we want to classify all self-join-free UCQs: is it possible to decide the existence of a

triangle in a vertex-unbalanced tripartite graph in linear time? We prove that this task is

equivalent in hardness to some family of UCQs. Finally, we show a dichotomy for unions

of two self-join-free CQs if we assume the answer to this question is negative.

In conclusion, this paper pinpoints a computational barrier in the form of a single

decision problem that is key to advancing our understanding of the enumeration complexity

of many UCQs. Without a breakthrough for unbalanced triangle detection, we have no

hope of ﬁnding an eﬃcient algorithm for additional unions of two self-join-free CQs. On the

other hand, a suﬃciently eﬃcient unbalanced triangle detection algorithm can be turned

into an eﬃcient algorithm for a family of UCQs currently not known to be tractable.

1. Introduction

Answering queries over relational databases is a fundamental problem in data management.

As the available data in the world grows bigger, so grows the importance of ﬁnding the best

possible complexity for solving this problem. Since the query itself is usually signiﬁcantly

smaller than the size of the database, it is common to use data complexity [

Var82

]: we treat

the query as ﬁxed, and examine the complexity of ﬁnding the answers to the given query

over the input database. As the number of answers to a query may be much larger than the

Key words and phrases: enumeration, ﬁne-grained complexity, constant delay, union of conjunctive queries,

unbalanced triangle detection.

LOGICAL METHODS

IN COMPUTER SCIENCE DOI:10.46298/LMCS-21(1:29)2025

⃝Creative Commons

29:2 K. Bringmann and N. Carmeli Vol. 21:1

size of the database itself, we cannot hope to generate all answers in linear time in the size

of the input. Instead, we use enumeration measures. Since we must read the entire input to

verify whether the query has answers, and we must print all answers, the measure of linear

preprocessing time and constant delay between two successive answers can be seen as the

optimal time complexity for answering queries. The class of queries that can be answered

within these time bounds is denoted

DelayClin

, and recent research asks which queries are in

this class [Dur20, BGS20].

Proving that a query belongs to the class

DelayClin

can be achieved by a variety of

algorithmic techniques, coupled with insights into the query structure. However, proving

that a query is unconditionally not contained in this class is beyond the state-of-the-art

lower bound techniques for the RAM model. Therefore, one must resort to conditional lower

bounds. That is, start from a hypothesis on the time complexity of a well-studied problem

and design a reduction from your problem of choice; this proves a lower bound that holds

conditional on the starting hypothesis. While such a conditional lower bound is no absolute

impossibility result, it identiﬁes an algorithmic breakthrough that is necessary to ﬁnd better

algorithms for your problem of choice, and thus it yields strong evidence that no better

algorithm exists. This paradigm has been successfully applied in the ﬁeld of ﬁne-grained

complexity theory to obtain tight conditional lower bounds for many diﬀerent problems, see,

e.g., [

Wil18

Bri19

]. When searching for dichotomies (that characterize which problems in

a class admit eﬃcient algorithms), research aiming for lower bounds (conditional or not)

has another advantage. The reductions showing hardness often succeed only in some cases.

This brings out the other cases, directing us to focus our attention where we have hope for

ﬁnding eﬃcient algorithms without a major computational breakthrough. This approach

has been useful for ﬁnding tractable cases that were previously unknown [CK21].

When considering Conjunctive Queries (CQs) without self-joins, the tractability of

enumerating query answers with respect to

DelayClin

is well understood. The queries with a

free-connex structure are tractable [

BDG07

]; these are acyclic queries that remain acyclic

with the addition of an atom containing the free variables. This tractability result is

complemented by conditional lower bounds forming a dichotomy: a self-join-free CQ is in

DelayClin

if and only if it is free-connex [

BB13

BDG07

]. The hardness of cyclic CQs assumes

the hardness of ﬁnding hypercliques in a hypergraph [

BB13

], while the hardness of acyclic

non-free-connex CQs assumes the hardness of Boolean matrix multiplication [

BDG07

]. This

dichotomy assumes that the CQ does not contain self-joins (that is, every relation appears

in at most one atom of the query), which enables assigning diﬀerent atoms with diﬀerent

relations when reducing a hard problem to query answering. Not much is known regarding

the case with self-joins, but we do know that there are cases where self-joins aﬀect the

complexity [BGS20, CS23].

The next natural class of queries to consider is Unions of Conjunctive Queries (UCQs),

which is equivalent to positive relational algebra. A union of tractable CQs is known to be

tractable [

DS11

]. However, when the union contains an intractable CQ, the picture gets

much more complex. Note that a union that contains an intractable CQ may be equivalent to

a union of tractable CQs; in which case, the UCQ is tractable [

CK21

]. This can happen for

example if the union is comprised of an intractable CQ

and a tractable CQ

subsuming

it; then the entire union is equivalent to

. Thus, it makes sense to consider non-redundant

UCQs. It was claimed that a non-redundant UCQ that contains an intractable CQ is

necessarily intractable [

CJN18

]. This claim was disproved in a surprising result showing that

a UCQ may be tractable even if it consists solely of intractable CQs [

CK21

]. Speciﬁcally,

Vol. 21:1 UNBALANCED TRIANGLE DETECTION AND ENUMERATION HARDNESS FOR UCQS 29:3

Carmeli and Kr¨oll showed that whenever each CQ in a union can become free-connex

(and thus tractable) via a so-called union extension, then the UCQ is in

DelayClin

[

CK21

Moreover, every UCQ that we currently know to be in

DelayClin

has a free-connex union

extension.

In the case of a union of two intractable CQs, known conditional lower bounds show that

these extensions capture all tractable queries [

CK21

]. These lower bounds rely on the same

hypotheses as those used for CQs, in addition to a hypothesis on the hardness of detecting a

4-clique in a graph. The case of a union of a tractable CQ and an intractable CQ is not

yet completely classiﬁed, and Carmeli and Kr¨oll [

CK21

] identiﬁed several open examples,

that is, speciﬁc unclassiﬁed queries for which the current techniques for an algorithm or a

conditional lower bound do not apply.

Our Contribution. Our aim is to understand whether there exist additional tractable

UCQs, not covered by the currently known algorithms. We start by showing that some

examples of UCQs left open in [

CK21

] are hard assuming the standard 3SUM conjecture

(given

integers, it is not possible to decide in subquadratic time whether any three of

them sum to 0). Our reductions go through an intermediate hypothesis that we call Vertex-

Unbalanced Triangle Listing (VUTL; listing all triangles in an unbalanced tripartite graph

requires super-linear time in terms of input and output size). Speciﬁcally, building on a

reduction by Fischer, Kaliciak, and Polak [

FKP24

], we show that the VUTL hypothesis

is implied by the 3SUM conjecture. We then use VUTL to show the hardness of some

previously unclassiﬁed UCQs.

When trying to reduce VUTL to further unclassiﬁed UCQs, we identiﬁed several issues.

This led us to introduce a similar hypothesis on Vertex-Unbalanced Triangle Detection

(VUTD; determining whether an unbalanced tripartite graph contains triangles requires

super-linear time in terms of input size).

The VUTD hypothesis implies the VUTL

hypothesis, and thus the former is easier to reduce to UCQs. For a discussion of why

VUTD is a reasonable hypothesis, we refer to Section 3. We show that VUTD exactly

captures the hardness of some family of UCQs that do not have free-connex union extensions:

The VUTD hypothesis holds if and only if no query in this family is in

DelayClin

. Thus,

determining whether the VUTD hypothesis holds is unavoidable if we want to classify all

self-join-free UCQs. Next, we focus on unions of two CQs. We show how, assuming the

VUTD hypothesis, we can conclude the hardness of any union of one tractable CQ and

one intractable CQ that does not have a free-connex union extension. Moreover, if VUTD

holds, previously known hardness results apply without assuming additional hypotheses.

This results in a dichotomy, which is our main result: a union of two self-join-free CQs is in

DelayClin

if and only if it has a free-connex union extension, assuming the VUTD hypothesis.

For these UCQs, we conclude that the currently known algorithms cover all tractable cases

that do not require a major breakthrough regarding VUTD.

The main conclusion from our paper is that to make progress in understanding the

enumeration complexity of UCQs, it suﬃces to study the single decision problem of detecting

triangles in unbalanced graphs. More precisely, if we ever ﬁnd a linear-time algorithm for

unbalanced triangle detection, we will also get a breakthrough in UCQ evaluation in the

form of an algorithm for some UCQs that do not have a free-connex union extension. If on

the other hand, we assume that there is no linear-time algorithm for unbalanced triangle

Our triangle detection instances are vertex-unbalanced, in contrast to a recently formulated hypothesis

with the same name that is edge-unbalanced [KVW20], see Section 6.2 for a discussion.

29:4 K. Bringmann and N. Carmeli Vol. 21:1

detection, then for a large class of UCQs (namely, unions of two self-join-free CQs) the

currently known algorithms cover all tractable cases.

The paper is organized as follows. Section 2 provides basic deﬁnitions and results that

we use throughout the paper. In Section 3, we deﬁne VUTL and VUTD, discuss their

connections to other hypotheses, and use them to address some examples of UCQs that

were previously unclassiﬁed. In Section 4, we present a family of UCQs that is equivalent in

hardness to VUTD. Then, Section 5 shows the classiﬁcation of UCQs that we can achieve

based on the VUTD hypothesis; this proves our main dichotomy result. The next section

contains discussions related to alternative hypotheses. We discuss how we can conclude the

hardness for relaxed time requirements (polylogarithmic instead of constant delay) based on

a strengthening of the VUTD hypothesis in Section 6.1, and in Section 6.2 we discuss the

diﬀerence between VUTD and a similar hypothesis for edge-unbalanced graphs that was

recently introduced. We conclude in Section 7.

2. Preliminaries

Databases and Queries. Arelation is a set of tuples of constants, where each tuple has

the same arity (length). A schema Sis a collection of relation symbols

, each with an

associated arity. A database

(over the schema S) associates with each relation symbol

a ﬁnite relation, which we denote by RD, whose arity is that of Rin S.

AConjunctive Query (CQ)

over a schema Sis deﬁned by an expression of the form

(

⃗x

)

:- R1

(

⃗

)

, . . . , Rn

(

⃗

), where each

is a relation symbol of S, each

⃗

is a tuple of

variables and constants with the same arity as

, and

⃗x

is a tuple of variables from

⃗

t1,...,⃗

We usually omit the explicit speciﬁcation of the schema S, and assume that it consists of

the relation symbols that occur in the query at hand. We call

(

⃗x

) the head of

, and

(

⃗

)

, . . . , Rn

(

⃗

) the body of

. Each

(

⃗

) is an atom of

, and the set of all atoms of

is denoted

atoms

(

). When the order of the variables in an atom is not important for our

discussion, we sometimes denote an atom

(

⃗

) by

(

) where

is a set of variables. We

use

var

(

) to denote the set of variables that occur in

. We say that

is self-join-free if

every relation symbol occurs in it at most once. If a CQ is self-join-free, we use

var

(

) to

denote the set of variables that occur in the atom containing

. The variables occurring in

the head are called the free variables and denoted by

free

(

). The variables occurring in

the body but not in the head are called existential. A homomorphism

from a CQ

to a

database

is a mapping of the variables in

to the constants of

, such that for every

atom

(

⃗

) of the CQ, we have that

(

⃗

)

∈RD

. Each such homomorphism

yields an

answer h(⃗x) to Q. We denote by Q(D) the set of all answers to Qon D.

AUnion of Conjunctive Queries (UCQ)

is a ﬁnite set of CQs, denoted

Sℓ

i=1 Qi

where

free

(

) is the same for all 1

≤i≤ℓ

. The set of answers to

over a database

the union

(

) =

Sℓ

i=1 Qi

(

). Let

and

be CQs. A body-homomorphism from

is a mapping

var

(

)

→var

(

) such that for every atom

(

⃗v

) of

, we have that

(

⃗v

))

∈Q1

. A body-isomorphism from

is a bijective mapping

such that

a body-homomorphism from

and

h−1

is a body-homomorphism from

We say that two CQs are body-isomorphic if there is a body-isomorphism between them. A

homomorphism from

is a body-homomorphism

such that

(

free

(

)) =

free

(

It is well known that

is contained in

(i.e.,

(

)

⊆Q2

(

) on every input

) iﬀ there

exists a homomorphism from Q2to Q1[CM77]. We say that a UCQ is non-redundant if it

Vol. 21:1 UNBALANCED TRIANGLE DETECTION AND ENUMERATION HARDNESS FOR UCQS 29:5

does not contain two diﬀerent CQs such that there is a homomorphism from one to the other.

We often assume that UCQs are non-redundant; otherwise, an equivalent non-redundant

UCQ can be obtained by removing CQs.

Enumeration Complexity. An enumeration problem

is a collection of pairs (

I, Y

) where

is an input and

is a ﬁnite set of answers for

, denoted by

(

). An enumeration

algorithm

for an enumeration problem

is an algorithm that consists of two phases:

preprocessing and enumeration. During preprocessing,

is given an input I, and it may

build data structures. During the enumeration phase,

can access the data structures built

during preprocessing, and it emits the answers

(

), one by one, without repetitions. The

time between printing any two answers during the enumeration phase is called delay. In

this paper, an enumeration problem refers to a query

, the input is a database

, and

the answer set is

(

). Such a problem is denoted

Enum⟨Q⟩

. We adopt data complexity,

where the query is treated as ﬁxed, and the complexity is with respect to the size of the

representation of the database. We work on the common Random Access Machine (RAM)

model, where each memory cell stores Θ(

log |I|

) bits, and which supports lookup tables of

polynomial size that can be queried in constant time. The enumeration class

DelayClin

deﬁned as the class of all enumeration problems that have an enumeration algorithm with

(

|I|

) preprocessing time and

(1) delay. Note that this class does not impose a restriction

on the memory used.

To ease notation, we identify the query with its corresponding

enumeration problem, and we denote Q∈DelayClin to mean Enum⟨Q⟩ ∈ DelayClin.

Hypergraphs. Ahypergraph

= (

V, E

) is a set

of vertices and a set

of non-empty

subsets of

called hyperedges (sometimes edges). Given

S⊆V

, the induced subgraph

[

]

is (

S, E′

) where

E′

{e∩S|e∈E} \ {∅}

. Two vertices in a hypergraph are neighbors

if they appear in a common edge. A clique of a hypergraph is a set of vertices that are

pairwise neighbors in

. If every edge in

has exactly

vertices, we call

-uniform;

note that 2-uniform hypergraphs are just graphs. For any

ℓ>k

, an

ℓ

-hyperclique in a

-uniform hypergraph

is a set

V′

ℓ

vertices, such that every subset of

V′

of size

forms a hyperedge. A path of

is a sequence of vertices such that every two consecutive

vertices are neighbors. The length of a path

v1, . . . , vn

n−

1. A simple path of

is a path

where every vertex appears at most once. A chordless path is a simple path in which no two

non-consecutive vertices are neighbors. A cycle is a path that starts and ends in the same

vertex. A simple cycle is a cycle of length 3 or more where every vertex appears at most once

(except for the ﬁrst and last vertex). A chordless cycle is a simple cycle such that no two

non-consecutive vertices are neighbors and no edge contains all cycle vertices. A tetra of size

(where

k≥

3) is a set of

vertices such that every

k−

1 of them are contained in an edge,

and no edge contains all

vertices. A hypergraph is cyclic if it contains a chordless cycle or a

tetra. A hypergraph that is not cyclic is called acyclic (this is known as

-acyclicity) [

BB16

Note that “a 2-uniform hypergraph is cyclic” is a diﬀerent way of saying that “a graph has

a cycle”. A hypergraph is connected if for any two vertices

u, v

there is a path starting in

and ending in

. A tripartite graph is comprised of three sets of vertices (

V1, V2, V3

) and

three sets of edges

E1,2⊆V1×V2

E2,3⊆V2×V3

, and

E1,3⊆V1×V3

. A triangle in a

Not much is known regarding the complexity of UCQ answering when the memory is restricted. See [

CK21

Section 6.3] for a related discussion.

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

LogicalMethodsinComputerScienceVolume21,Issue1,2025,pp.29:1–29:33https://lmcs.episciences.org/SubmittedFeb.23,2023PublishedMar.27,2025UNBALANCEDTRIANGLEDETECTIONANDENUMERATIONHARDNESSFORUNIONSOFCONJUNCTIVEQUERIESKARLBRINGMANNaANDNOFARCARMELIbaSaarlandUniversityandMaxPlanckInstituteforInformatics,Saa...

展开>> 收起<<

Unbalanced Triangle Detection and Enumeration Hardness for Unions of Conjunctive Queries.pdf

共33页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Unbalanced Triangle Detection and Enumeration Hardness for Unions of Conjunctive Queries

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: