Hierarchical Categories in Colored Searching Peyman AfshaniRasmus KillmannKasper Green Larsen Abstract

2025-04-24 0 0 401.7KB 13 页 10玖币

侵权投诉

Hierarchical Categories in Colored Searching

Peyman Afshani∗Rasmus Killmann†Kasper Green Larsen‡

Abstract

In colored range counting (CRC), the input is a set of points where each point is assigned a

“color” (or a “category”) and the goal is to store them in a data structure such that the number of

distinct categories inside a given query range can be counted eﬃciently. CRC has strong motivations

as it allows data structure to deal with categorical data.

However, colors (i.e., the categories) in the CRC problem do not have any internal structure,

whereas this is not the case for many datasets in practice where hierarchical categories exists or

where a single input belongs to multiple categories. Motivated by these, we consider variants of the

problem where such structures can be represented. We deﬁne two variants of the problem called

hierarchical range counting (HCC) and sub-category colored range counting (SCRC) and consider

hierarchical structures that can either be a DAG or a tree. We show that the two problems on some

special trees are in fact equivalent to other well-known problems in the literature. Based on these,

we also give eﬃcient data structures when the underlying hierarchy can be represented as a tree. We

show a conditional lower bound for the general case when the existing hierarchy can be any DAG,

through reductions from the orthogonal vectors problem.

1 Introduction

Range searching is a broad area of computational geometry where the goal is to store a given set Pof

input points in a data structure such that one can eﬃciently report (range reporting) or count (range

counting) the points inside a geometric query region R. Sometimes, each point pi∈Pis associated with

a weight wi∈Rand the goal could be to ﬁnd the sum of the weights in R, or the maximum weight

in P∩ R (range max problem). This is a very broad area of research and there are many well-studied

variants. See a recent excellent survey by Agarwal for more information [1].

Colored (or categorical) range searching is an important variant where each input point is associated

with a category which conceptually is represented as a color; the goal of the query is then to report

or count the number of distinct colors inside the query range. Colored range searching has strong

motivations since colors allow us to represent nominal attributes such as brand, manufacturer and so on

and in practice a data set often contains a mix of nominal and ordinal attributes.

Colored range searching was introduced in 1993 by Janardan and Lopez [14] and it has received con-

siderable attention since then. However, classical colored range searching only models a “ﬂat” categorical

structure where categories have no inherent structure. We consider variants of colored range counting

to model such structures. We show that looking at colored range searching from this angle creates a

number of interesting questions with non-trivial connections to other already existing problems.

1.1 Problem Deﬁnitions and Motivations

We begin by formally deﬁning colored range counting.

Problem 1 (colored range counting).Given an input set Pof npoints in Rd, a set Cof colors (i.e.,

categories) and a function C:P→C, store them in a data structure such that given a query range R ⊂

Rd, it can count the number of distinct colors in R, i.e., the value |CR|where CR={C(p)|p∈P∩ R}.

∗Aarhus University, Denmark peyman@cs.au.dk. Supported by Independent Research Fund Denmark (DFF) grant ID

DFF−7014−00404.

†Aarhus University, Denmark killmann@cs.au.dk. Supported by Independent Research Fund Denmark (DFF) grant

ID DFF−7014−00404.

‡Aarhus University, Denmark larsen@cs.au.dk. Supported by Independent Research Fund Denmark (DFF) Sapere

Aude Research Leader grant No 9064-00068B.

arXiv:2210.05403v1 [cs.DS] 11 Oct 2022

In the weighted version, input also includes a weight function W:C→Rand the output of the query

is the weighted sum of the distinct colors in R, i.e., the value Pc∈CRW(c).

Colored range searching assumes that the colors are completely unstructured and that each point

receives exactly one color. However, these assumptions can be inadequate as hierarchical categories are

quite common. For example, biological classiﬁcation of living organisms are done through a tree where

inclusion in a category implies inclusion in all the ancestor categories. In fact, similar phenomena happen

with respect to most notions of classiﬁcation (e.g., classiﬁcation of industries). In other scenarios, a point

may have multiple categories (e.g., a car can have a “brand”, a “color”, “fuel type” and so on). While

it is possible to view the set of categories assigned to a point as one single category, doing so ignores a

lot of structure. For example, “a blue diesel car” is both a “blue car” and also a “diesel car” but by

considering “blue diesel” as a single category, this information is lost. We believe it is worthwhile to

study the notion of structures within categories from a theoretical point of view. We are not in fact the

ﬁrst in trying to do so and we will shortly discuss some of the previous attempts.

A natural way to represent hierarchical categories is to assume that vertices of a DAG Grepresent

the set of categories where an edge ~e = (u, v) from (a category) uto (a category) vmeans that uis a

sub-category of v. We call Gacategory DAG (or a category tree if it is a tree). For a vertex v∈ G, we

deﬁne G≤(v) as the subset of vertices of Gthat can reach v(i.e., “sub-categories” of v), G≥(v) as the

subset of vertices of Gthat vcan reach (i.e., “super-categories” of v). Similarly, for a subset H⊂V(G)

we deﬁne G≥(H) = ∪v∈HG≥(v), and G≤(H) = ∪v∈HG≤(v).

Category trees allows us to represent hierarchical categories. Category DAGs allow us to capture cases

where points can have multiple categories. Consider the car example again. We can deﬁne a category

DAG Gwhere a vertex u∈ G represents the compound category {diesel,blue}with edges to vertices

dand bthat represent “diesel” and “blue” categories respectively. Thus, the set G≤(d) represents all

the “diesel” cars and it includes the category u, the “blue diesel” category, and similarly, the set G≤(b)

represents all the “blue” cars which also includes the category u, the “blue diesel” category. Likewise,

G≥(u) includes both band dsince a “blue diesel” car is both a “blue car” and a “diesel car”.

We revisit colored range searching problems, using concepts of category DAGs or trees.

Problem 2 (sub-category range counting (SCRC)).Consider an input point set P⊂Rdof npoints, a

DAG Gwith O(n)edges, and a function C:P→ G. The goal is to store the input in a data structure,

such that given a query that consists of a query range R ⊂ Rdand a query vertex vq∈ G it can output

|CR∩ G≤(vq)|where CR={C(p)|p∈P∩ R}.

Problem 3 (hierarchical color counting (HCC)).Consider an input point set P⊂Rdof npoints, a

DAG Gwith O(n)edges, and a function C:P→ G. The goal is to store the input in a data structure,

such that given a query range R ⊂ Rdone can output |GR|where GRis the set of colors in R, i.e.,

GR=Sp∈R∩PG≥(C(p)).

In the weighted version of the problem, each vertex v(i.e., category) of Gis associated with a weight

w(v)and given the query R, the goal is to compute Pv∈GRw(v)instead.

Related problems. Very recently, there have been other attempts to consider the structure of

“colors” within the computational geometry community, e.g., He and Kazi [12] consider a problem very

similar to SCRC on a tree G; the only diﬀerence is that instead of |CR∩ G≤(v)|, the query outputs

|CR∩π(v, w)|where π(v, w) is a path between two query vertices v, w ∈ G. There are also other variants

available. See [12] for further references.

1.2 Previous and Other Related Results

The study of colored range counting and its variants began in 1993 [14] and since then it has received

considerable attention. See the survey on colored range searching and its variants [9]. The problem has

at least three main variants: color range counting (CRC), color range reporting, and “type 2” color range

counting (for every distinct color, count how many times it appears). Here, we only review colored range

counting results.

In 1D, one can solve the CRC problem using O(n) space and O(log n) query time by an elegant and

simple transformation [10] which turns the problem into the unweighted 2D orthogonal range counting

problem which can be solved within said bounds [5]. Interestingly, it is also possible to show an equiva-

lence between the two problems [16]. The problem, however, is diﬃcult for 2D and beyond. Kaplan et

al. [15] showed that answering mqueries on a set of npoints requires Ω(nω/2−o(1)) time where ωis the

boolean matrix multiplication exponent. Under some assumptions (e.g., the boolean matrix multiplica-

tion conjectures), this shows that P(n) + mQ(n)≥n3/2−o(1) where P(·) and Q(·) are the preprocessing

time and the query time of any data structure that solves the 2D CRC problem.

Note that the equivalence between 1D CRC and 2D range counting also applies to the weighted case

of both problems, however, the status of the weighted 2D range counting is still unresolved. It can be

solved with O(nlog n/ log log n) space and O(log n/ log log n) query time [13] but it is not known if both

space and query time can be improved simultaneously (it is possible to improve one at the expense of

the other). The only available lower bound is a query time lower bound of Ω(log n/ log log n) [17].

Some other interesting problems related to our results are deﬁned below.

Deﬁnition 1. The following problems are deﬁned for an input that consists of a set P⊂Rdof npoints.

The goal is to build a data structure to answer the following queries.

•(orthogonal range counting) Given a query axis-aligned rectangle R, count the number of points

in R. In the weighted version, the points have weights and the goal is to compute the sum of the

weights in the query.

•(dominance range counting) This is a special case of orthogonal range counting where the query

rectangle has the form (−∞, q1]× ···(−∞, qd]which is also known as a dominance range. Or-

thogonal range counting and dominance range counting are known to be equivalent if subtraction of

weights are allowed (e.g., integer weights).

•(3-sided color counting). The input is in the plane (d= 2) and each point is assigned a color from

a set Cand the query is a 3-sided range in the form of R= [q`, qr]×(−∞, qt]and the goal is to

count the number of colors in R. In the weighted version, the colors have weights and the goal is

to compute the sum of the weights of the colors.

•(3-sided distinct coordinate counting) This is a special case of 3-sided color counting where an input

point (xi, yi)has color yi; in other words, given the query R= [q`, qr]×(−∞, qt], we would like to

count the number of distinct Y-coordinates inside it.

•(range max) Given a weight function W:P→Ras part of the input, at the query time we would

like to ﬁnd the maximum weight inside a given query range R.

•(sum-max color counting) This a combination of range max and color counting queries. Assume the

points in Phave been assigned colors from a set Cand assume we have a weight function W:P→R

on the points. Given a query range R, we would like to compute the output Pc∈CXc(R)where

Xc(R)is the maximum weight of a point with color cinside R; if no point of color cexists in R,

then Xc(R) = 0.

A sum-max color counting query includes a number of the above problems as its special case: If all

points have the same weight, then it reduces to a color counting query. If all points have the same

color, then it reduces to a range max query. If all points have distinct colors, then it reduces to a

weighted range counting query.

1.3 Our Results

Clearly, sub-category range counting (SCRC) is at least as hard as CRC. We also observe that hierarchical

color counting (HCC) is also at least as hard. Thus, getting eﬃcient results for d≥2 seems hopeless.

Consequently, we focus on the 1D problem but for two diﬀerent important DAG’s: when Gis a tree and

also when Gis an arbitrary (sparse) DAG. Our main results are the following.

For the SCRC problem, ﬁrst, we observe that the following problems are equivalent:

1. SCRC when Gis a single path on a one-dimensional input P.

2. 3-sided distinct coordinate counting (for a planar point set P).

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

HierarchicalCategoriesinColoredSearchingPeymanAfshani*RasmusKillmannKasperGreenLarsenAbstractIncoloredrangecounting(CRC),theinputisasetofpointswhereeachpointisassignedacolor"(oracategory")andthegoalistostoretheminadatastructuresuchthatthenumberofdistinctcategoriesinsideagivenqueryrangecanbecount...

展开>> 收起<<

Hierarchical Categories in Colored Searching Peyman AfshaniRasmus KillmannKasper Green Larsen Abstract.pdf

共13页,预览3页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Hierarchical Categories in Colored Searching Peyman AfshaniRasmus KillmannKasper Green Larsen Abstract

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: