An Efﬁcient and Continuous Voronoi Density Estimator

2025-04-22 0 0 2.4MB 13 页 10玖币

侵权投诉

Giovanni Luca Marchetti Vladislav Polianskii Anastasiia Varava

Florian T. Pokorny Danica Kragic

School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology

Stockholm, Sweden

Abstract

We introduce a non-parametric density estima-

tor deemed Radial Voronoi Density Estimator

(RVDE). RVDE is grounded in the geometry of

Voronoi tessellations and as such beneﬁts from

local geometric adaptiveness and broad conver-

gence properties. Due to its radial deﬁnition

RVDE is continuous and computable in linear

time with respect to the dataset size. This amends

for the main shortcomings of previously studied

VDEs, which are highly discontinuous and com-

putationally expensive. We provide a theoretical

study of the modes of RVDE as well as an em-

pirical investigation of its performance on high-

dimensional data. Results show that RVDE out-

performs other non-parametric density estima-

tors, including recently introduced VDEs.

1 INTRODUCTION

The problem of estimating a Probability Density Function

(PDF) from a ﬁnite set of samples lies at the heart of statis-

tics and arises in several practical scenarios (Diggle 2013;

Scott 2015). Among density estimators, the non-parametric

ones aim to infer a PDF through a closed formula. Differ-

ently from parametric methods, they do not require opti-

mization and ideally provide an estimated PDF which is

simple, interpretable and computationally efﬁcient. Two

traditional examples of non-parametric density estimators

are the Kernel Density Estimator (KDE; Gramacki 2018,

Rosenblatt 1956) and histograms (Freedman et al. 1981,

Pearson 1894). KDE consists of a mixture of local copies

of a kernel around each datapoint while histograms parti-

tion the ambient space into local cells (‘bins’) where the

estimated PDF is constant.

Proceedings of the 26th International Conference on Artiﬁcial

Intelligence and Statistics (AISTATS) 2023, Valencia, Spain.

Figure 1: An example of a density estimated via RVDE.

The Voronoi tessellation is depicted in solid gray. The es-

timated density is deﬁned by the property that its conical

integral over the rays originating from the datapoints (or-

ange) is constant.

Both histograms and KDE suffer from bias due to the prior

choice of a local geometric structure i.e., the bins and the

kernel respectively. This bias gets exacerbated in high-

dimensional ambient spaces. The reason is that datasets

grow exponentially in terms of geometric complexity, mak-

ing a ﬁxed simple geometry unsuitable for estimating high-

dimensional densities. This has led to the introduction of

the Voronoi Density Estimator (VDE; Ord 1978). VDE re-

lies on the geometric adaptiveness of Voronoi cells, which

are convex polytopes deﬁned locally by the data (Okabe et

al. 2009). The PDF estimated by VDE is constant on such

cells, thus behaving as an adaptive version of histograms.

Due to its local geometric properties, VDE possesses con-

vergence guarantees to the ground-truth PDF which are

more general than the ones of KDE.

The geometric beneﬁts of VDE, however, come with a

number of shortcomings. First, the Voronoi cells and in

particular their volumes are computationally expensive to

arXiv:2210.03964v2 [stat.ME] 7 Feb 2023

An Efﬁcient and Continuous Voronoi Density Estimator

compute in high dimensions. Although this has been re-

cently attenuated by proposing Monte Carlo approxima-

tions (Polianskii et al. 2022), VDE falls behind methods

such as KDE in terms of computational complexity. Sec-

ond, VDE (together with its generalized version from Po-

lianskii et al. 2022 deemed CVDE) is highly discontin-

uous on the boundaries of Voronoi cells. The estimated

PDF consequently suffers from large variance and instabil-

ity with respect to the dataset. This is again in contrast to

KDE, which is continuous in its ambient space.

In this work, we propose a novel non-parametric den-

sity estimator deemed Radial Voronoi Density Estimator

(RVDE) which addresses the above challenges. Similarly

to VDE, RVDE integrates to a constant on Voronoi cells

and thus shares its local geometric advantages and conver-

gence properties. In contrast to VDE, RVDE is continuous

and computable in linear time with respect to the dataset

size. The central idea behind RVDE is to deﬁne the PDF

radially from the datapoints so that the (conic) integral over

the ray cast in the corresponding Voronoi cell is constant

(see Figure 1). This is achieved via a ‘radial bandwidth’

which is deﬁned implicitly by an integral equation. Intu-

itively, the radial approach reduces the high-dimensional

geometric challenge of deﬁning a Voronoi-based estima-

tor to a one-dimensional problem. This avoids the expen-

sive volume computations of the original VDE and guar-

antees continuity because of the fundamental properties of

Voronoi tessellations. Another important aspect of RVDE

is its geometric distribution of modes. We show that the

modes either coincide with the datapoints or lie along the

edges of the Gabriel graph (Gabriel et al. 1969) depending

on a hyperparameter analogous to the bandwidth in KDE.

We compare RVDE with CVDE, KDE and the adaptive

version of the latter in a series of experiments. RVDE out-

performs the baselines in terms of the quality of the esti-

mated density on a variety of datasets. Moreover, it runs

signiﬁcantly faster and with lower sampling variance com-

pared to CVDE. This empirically conﬁrms that the geomet-

ric and continuity properties of RVDE translate into bene-

ﬁts for the estimated density in a computationally efﬁcient

manner. We provide an implementation of RVDE (together

with baselines and experiments) in C++ at a publicly avail-

able repository 1. The code is parallelized via the OpenCL

framework and comes with a Python interface. In summary

our contributions include:

• A novel density estimator (RVDE) based on the geom-

etry of Voronoi tessellations which is continuous and

computationally efﬁcient.

• A complete study of the modes of RVDE and their

geometric distribution.

1https://github.com/giovanni-marchetti/

rvde

• An empirical investigation comparing RVDE to KDE

(together with its adaptive version) and previously

studied VDEs.

2 RELATED WORK

2.1 Non-parametric Density Estimation

Non-parametric methods for density estimation trace back

to the introduction of histograms (Pearson 1894). His-

tograms have been extended by considering bin geometries

beyond the canonical rectangular one, for example triangu-

lar (Scott 1988) and hexagonal (Carr et al. 1992) geome-

tries. Another popular density estimator is KDE, ﬁrst dis-

cussed by Rosenblatt 1956 and Parzen 1962. The estimated

density is a mixture of copies of a priorly chosen distribu-

tion (‘kernel’) centered at the datapoints. KDE has been

extended to the multivariate case (Izenman 1991; Dehnad

1987) and has seen improvements such as bandwidth se-

lection methods (Marron 1987; Wand et al. 1994) and al-

gorithms for adaptive bandwidths (Wang et al. 2007; Walt

et al. 2017). Applications of KDE include estimation of

trafﬁc incidents (Xie et al. 2008), of archaeological data

(Baxter et al. 1997) and of wind speed (Bo et al. 2017) to

name a few. As discussed in Section 1, both KDE and his-

tograms suffer from lack of geometric adaptiveness due to

the choice of prior local geometries.

Another class of non-parametric methods are the orthog-

onal density estimators (Vannucci 1995; Masry 1997).

Those consist of choosing a discretized orthonormal basis

of functions and computing the coefﬁcients of the ground-

truth density via Monte-Carlo integration over the dataset.

When the basis is the Fourier one, the estimator is referred

to as ‘wavelet estimator’. The core drawback is that orthog-

onal density estimators do not scale efﬁciently to higher

dimensions. When considering canonical tensor product

bases the complexity grows exponentially w.r.t. the dimen-

sionality (Walter 1995), making the estimator unfeasible to

compute.

2.2 Voronoi Density Estimators

The ﬁrst Voronoi Density Estimator (VDE) has been pi-

oneered by Ord 1978. The estimated density relies on

Voronoi tessellations in order to achieve local geometric

adaptiveness. This is the main advantage of VDE over

methods such as KDE. The original VDE has seen appli-

cations to real-world densities such as neurons in the brain

(Duyckaerts et al. 1994), photons (Ebeling et al. 1993)

and stars in a galaxy (Vavilova et al. 2021). However, the

method is not immediately extendable to high-dimensional

spaces because of unfeasible computational complexity of

volumes and abundance of unbounded Voronoi cells. This

has been only recently amended by Polianskii et al. 2022

by introducing approximate numerical algorithms and by

G. L. Marchetti, V. Polianskii, A. Varava, F. T. Pokorny, D. Kragic

shaping of the density via a kernel. In the present work, we

aim to design an alternative version of the original VDE

which is continuous and does not rely on volume compu-

tations. The resulting estimator is thus stable and compu-

tationally efﬁcient while still beneﬁting from the geometry

of Voronoi tessellations.

3 BACKGROUND

In this section we recall the class of non-parametric density

estimators which we will be interested in throughout the

present work. To this end, let P⊆Rnbe a ﬁnite set and

consider the following central notion from computational

geometry.

Deﬁnition 1. The Voronoi cell2of p∈Pis:

C(p) = {x∈Rn| ∀q∈P d(x, q)≥d(x, p)}.(1)

The Voronoi cells are convex polytopes that intersect at the

boundary and cover the ambient space Rn. The collection

{C(p)}p∈Pis referred to as Voronoi tessellation generated

by P. Note that although the Voronoi tessellations are de-

ﬁned in an arbitrary metric space, the resulting cells might

be non-convex for distances different from the Euclidean

one. Since convexity will be crucial for the following con-

structions, we stick to the Euclidean metric for the rest of

the work.

We call density estimator any mapping associating a proba-

bility density function fP∈L1(Rn)to a ﬁnite set P⊆Rn.

The following class of density estimators generalizes the

original one by Ord 1978.

Deﬁnition 2. AVoronoi Density Estimator (VDE) is a den-

sity estimator P7→ fPsuch that for each p∈P:

ZC(p)

fP(x)dx=1

|P|.(2)

VDEs stand out among density estimators for their geo-

metric properties. This is because the Voronoi cells are ar-

bitrary polytopes that are adapted to the local geometry of

data. For VDEs all the Voronoi cells have the same esti-

mated probability, making such estimators locally adaptive

from a geometric perspective. This is reﬂected, for exam-

ple, by the general convergence properties of VDEs. The

following is the main theoretical result from Polianskii et

al. 2022.

Theorem 3.1. Let P7→ fPbe a VDE and suppose that

Pis sampled from a probability density ρ∈L1(Rn)with

support in the whole Rn. For Pof cardinality mconsider

the probability measure Pm=fPdxwhich is random in

P. Then the sequence Pmconverges to P=ρdxin distri-

bution w.r.t. xand in probability w.r.t. P. Namely, for any

2Sometimes referred to as Dirichlet cell.

measurable set E⊆Rnthe sequence of random variables

Pm(E)converges in probability to the constant P(E).

In contrast, the convergence of other density estimators

such as KDE requires the kernel bandwidth to vanish

asymptotically (Devroye et al. 1979). The bandwidth van-

ishing is necessary in order to amend for the local geomet-

ric bias inherent in KDE as discussed in Section 1.

The following canonical construction of a VDE deemed

Compactiﬁed Voronoi Density Estimator (CVDE) is dis-

cussed by Polianskii et al. 2022. Given an integrable kernel

K:Rn×Rn→R>0the estimated density is deﬁned as

fP(x) = K(p, x)

|P|Volp(C(p)) (3)

where pis the closest point in Pto xand Volp(C(p)) =

RC(p)K(p, y)dy. The latter volumes are approximated

via Monte Carlo methods since they become unfeasible to

compute exactly as dimensions grow. The resulting density

inherits the same regularity as Kwhen restricted to each

Voronoi cell but jumps discontinuously when crossing the

boundary of Voronoi cells (see Figure 3). Motivated by

this, the goal of the present work is to introduce a continu-

ous and efﬁcient VDE.

4 METHOD

4.1 Radial Voronoi Density Estimator

In this section we outline a general way of constructing a

VDE with continuous density function. Our central idea is

to deﬁne the latter radially w.r.t. the datapoints p∈P. We

start by rephrasing the integral over a Voronoi cell (Equa-

tion 2) in spherical coordinates:

ZC(p)

fP(x)dx=

=ZSn−1Zl(p+σ)

tn−1fP(p+tσ)dt

| {z }

Conical Integral

dσ. (4)

Here Sn−1⊆Rndenotes the unit sphere and l(x)∈

[0,+∞]denotes the length of the segment contained in

C(p)of the ray cast from ppassing through xi.e.,

l(x) = sup t≥0|p+tx−p

d(x, p)∈C(p).(5)

We refer to Figure 2 for a visual illustration. Note that l(x)

is deﬁned for x6=pand is continuous in its domain since

l(x) = d(x, p) = d(x, q)for x∈C(p)∩C(q).

We aim to solve Equation 4 by forcing the conical integral

in Equation 4 to be constant. To this end, we ﬁx a contin-

uous and strictly decreasing function K:R>A →R≥0

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

AnEfcientandContinuousVoronoiDensityEstimatorGiovanniLucaMarchettiVladislavPolianskiiAnastasiiaVaravaFlorianT.PokornyDanicaKragicSchoolofElectricalEngineeringandComputerScience,KTHRoyalInstituteofTechnologyStockholm,SwedenAbstractWeintroduceanon-parametricdensityestima-tordeemedRadialVoronoiDensity...

展开>> 收起<<

An Efﬁcient and Continuous Voronoi Density Estimator.pdf

共13页,预览3页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

An Efﬁcient and Continuous Voronoi Density Estimator

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: