An Efficient and Continuous Voronoi Density Estimator

2025-04-22 0 0 2.4MB 13 页 10玖币
侵权投诉
An Efficient and Continuous Voronoi Density Estimator
Giovanni Luca Marchetti Vladislav Polianskii Anastasiia Varava
Florian T. Pokorny Danica Kragic
School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology
Stockholm, Sweden
Abstract
We introduce a non-parametric density estima-
tor deemed Radial Voronoi Density Estimator
(RVDE). RVDE is grounded in the geometry of
Voronoi tessellations and as such benefits from
local geometric adaptiveness and broad conver-
gence properties. Due to its radial definition
RVDE is continuous and computable in linear
time with respect to the dataset size. This amends
for the main shortcomings of previously studied
VDEs, which are highly discontinuous and com-
putationally expensive. We provide a theoretical
study of the modes of RVDE as well as an em-
pirical investigation of its performance on high-
dimensional data. Results show that RVDE out-
performs other non-parametric density estima-
tors, including recently introduced VDEs.
1 INTRODUCTION
The problem of estimating a Probability Density Function
(PDF) from a finite set of samples lies at the heart of statis-
tics and arises in several practical scenarios (Diggle 2013;
Scott 2015). Among density estimators, the non-parametric
ones aim to infer a PDF through a closed formula. Differ-
ently from parametric methods, they do not require opti-
mization and ideally provide an estimated PDF which is
simple, interpretable and computationally efficient. Two
traditional examples of non-parametric density estimators
are the Kernel Density Estimator (KDE; Gramacki 2018,
Rosenblatt 1956) and histograms (Freedman et al. 1981,
Pearson 1894). KDE consists of a mixture of local copies
of a kernel around each datapoint while histograms parti-
tion the ambient space into local cells (‘bins’) where the
estimated PDF is constant.
Proceedings of the 26th International Conference on Artificial
Intelligence and Statistics (AISTATS) 2023, Valencia, Spain.
PMLR: Volume 206. Copyright 2023 by the author(s).
Figure 1: An example of a density estimated via RVDE.
The Voronoi tessellation is depicted in solid gray. The es-
timated density is defined by the property that its conical
integral over the rays originating from the datapoints (or-
ange) is constant.
Both histograms and KDE suffer from bias due to the prior
choice of a local geometric structure i.e., the bins and the
kernel respectively. This bias gets exacerbated in high-
dimensional ambient spaces. The reason is that datasets
grow exponentially in terms of geometric complexity, mak-
ing a fixed simple geometry unsuitable for estimating high-
dimensional densities. This has led to the introduction of
the Voronoi Density Estimator (VDE; Ord 1978). VDE re-
lies on the geometric adaptiveness of Voronoi cells, which
are convex polytopes defined locally by the data (Okabe et
al. 2009). The PDF estimated by VDE is constant on such
cells, thus behaving as an adaptive version of histograms.
Due to its local geometric properties, VDE possesses con-
vergence guarantees to the ground-truth PDF which are
more general than the ones of KDE.
The geometric benefits of VDE, however, come with a
number of shortcomings. First, the Voronoi cells and in
particular their volumes are computationally expensive to
arXiv:2210.03964v2 [stat.ME] 7 Feb 2023
An Efficient and Continuous Voronoi Density Estimator
compute in high dimensions. Although this has been re-
cently attenuated by proposing Monte Carlo approxima-
tions (Polianskii et al. 2022), VDE falls behind methods
such as KDE in terms of computational complexity. Sec-
ond, VDE (together with its generalized version from Po-
lianskii et al. 2022 deemed CVDE) is highly discontin-
uous on the boundaries of Voronoi cells. The estimated
PDF consequently suffers from large variance and instabil-
ity with respect to the dataset. This is again in contrast to
KDE, which is continuous in its ambient space.
In this work, we propose a novel non-parametric den-
sity estimator deemed Radial Voronoi Density Estimator
(RVDE) which addresses the above challenges. Similarly
to VDE, RVDE integrates to a constant on Voronoi cells
and thus shares its local geometric advantages and conver-
gence properties. In contrast to VDE, RVDE is continuous
and computable in linear time with respect to the dataset
size. The central idea behind RVDE is to define the PDF
radially from the datapoints so that the (conic) integral over
the ray cast in the corresponding Voronoi cell is constant
(see Figure 1). This is achieved via a ‘radial bandwidth’
which is defined implicitly by an integral equation. Intu-
itively, the radial approach reduces the high-dimensional
geometric challenge of defining a Voronoi-based estima-
tor to a one-dimensional problem. This avoids the expen-
sive volume computations of the original VDE and guar-
antees continuity because of the fundamental properties of
Voronoi tessellations. Another important aspect of RVDE
is its geometric distribution of modes. We show that the
modes either coincide with the datapoints or lie along the
edges of the Gabriel graph (Gabriel et al. 1969) depending
on a hyperparameter analogous to the bandwidth in KDE.
We compare RVDE with CVDE, KDE and the adaptive
version of the latter in a series of experiments. RVDE out-
performs the baselines in terms of the quality of the esti-
mated density on a variety of datasets. Moreover, it runs
significantly faster and with lower sampling variance com-
pared to CVDE. This empirically confirms that the geomet-
ric and continuity properties of RVDE translate into bene-
fits for the estimated density in a computationally efficient
manner. We provide an implementation of RVDE (together
with baselines and experiments) in C++ at a publicly avail-
able repository 1. The code is parallelized via the OpenCL
framework and comes with a Python interface. In summary
our contributions include:
A novel density estimator (RVDE) based on the geom-
etry of Voronoi tessellations which is continuous and
computationally efficient.
A complete study of the modes of RVDE and their
geometric distribution.
1https://github.com/giovanni-marchetti/
rvde
An empirical investigation comparing RVDE to KDE
(together with its adaptive version) and previously
studied VDEs.
2 RELATED WORK
2.1 Non-parametric Density Estimation
Non-parametric methods for density estimation trace back
to the introduction of histograms (Pearson 1894). His-
tograms have been extended by considering bin geometries
beyond the canonical rectangular one, for example triangu-
lar (Scott 1988) and hexagonal (Carr et al. 1992) geome-
tries. Another popular density estimator is KDE, first dis-
cussed by Rosenblatt 1956 and Parzen 1962. The estimated
density is a mixture of copies of a priorly chosen distribu-
tion (‘kernel’) centered at the datapoints. KDE has been
extended to the multivariate case (Izenman 1991; Dehnad
1987) and has seen improvements such as bandwidth se-
lection methods (Marron 1987; Wand et al. 1994) and al-
gorithms for adaptive bandwidths (Wang et al. 2007; Walt
et al. 2017). Applications of KDE include estimation of
traffic incidents (Xie et al. 2008), of archaeological data
(Baxter et al. 1997) and of wind speed (Bo et al. 2017) to
name a few. As discussed in Section 1, both KDE and his-
tograms suffer from lack of geometric adaptiveness due to
the choice of prior local geometries.
Another class of non-parametric methods are the orthog-
onal density estimators (Vannucci 1995; Masry 1997).
Those consist of choosing a discretized orthonormal basis
of functions and computing the coefficients of the ground-
truth density via Monte-Carlo integration over the dataset.
When the basis is the Fourier one, the estimator is referred
to as ‘wavelet estimator’. The core drawback is that orthog-
onal density estimators do not scale efficiently to higher
dimensions. When considering canonical tensor product
bases the complexity grows exponentially w.r.t. the dimen-
sionality (Walter 1995), making the estimator unfeasible to
compute.
2.2 Voronoi Density Estimators
The first Voronoi Density Estimator (VDE) has been pi-
oneered by Ord 1978. The estimated density relies on
Voronoi tessellations in order to achieve local geometric
adaptiveness. This is the main advantage of VDE over
methods such as KDE. The original VDE has seen appli-
cations to real-world densities such as neurons in the brain
(Duyckaerts et al. 1994), photons (Ebeling et al. 1993)
and stars in a galaxy (Vavilova et al. 2021). However, the
method is not immediately extendable to high-dimensional
spaces because of unfeasible computational complexity of
volumes and abundance of unbounded Voronoi cells. This
has been only recently amended by Polianskii et al. 2022
by introducing approximate numerical algorithms and by
G. L. Marchetti, V. Polianskii, A. Varava, F. T. Pokorny, D. Kragic
shaping of the density via a kernel. In the present work, we
aim to design an alternative version of the original VDE
which is continuous and does not rely on volume compu-
tations. The resulting estimator is thus stable and compu-
tationally efficient while still benefiting from the geometry
of Voronoi tessellations.
3 BACKGROUND
In this section we recall the class of non-parametric density
estimators which we will be interested in throughout the
present work. To this end, let PRnbe a finite set and
consider the following central notion from computational
geometry.
Definition 1. The Voronoi cell2of pPis:
C(p) = {xRn| ∀qP d(x, q)d(x, p)}.(1)
The Voronoi cells are convex polytopes that intersect at the
boundary and cover the ambient space Rn. The collection
{C(p)}pPis referred to as Voronoi tessellation generated
by P. Note that although the Voronoi tessellations are de-
fined in an arbitrary metric space, the resulting cells might
be non-convex for distances different from the Euclidean
one. Since convexity will be crucial for the following con-
structions, we stick to the Euclidean metric for the rest of
the work.
We call density estimator any mapping associating a proba-
bility density function fPL1(Rn)to a finite set PRn.
The following class of density estimators generalizes the
original one by Ord 1978.
Definition 2. AVoronoi Density Estimator (VDE) is a den-
sity estimator P7→ fPsuch that for each pP:
ZC(p)
fP(x)dx=1
|P|.(2)
VDEs stand out among density estimators for their geo-
metric properties. This is because the Voronoi cells are ar-
bitrary polytopes that are adapted to the local geometry of
data. For VDEs all the Voronoi cells have the same esti-
mated probability, making such estimators locally adaptive
from a geometric perspective. This is reflected, for exam-
ple, by the general convergence properties of VDEs. The
following is the main theoretical result from Polianskii et
al. 2022.
Theorem 3.1. Let P7→ fPbe a VDE and suppose that
Pis sampled from a probability density ρL1(Rn)with
support in the whole Rn. For Pof cardinality mconsider
the probability measure Pm=fPdxwhich is random in
P. Then the sequence Pmconverges to P=ρdxin distri-
bution w.r.t. xand in probability w.r.t. P. Namely, for any
2Sometimes referred to as Dirichlet cell.
measurable set ERnthe sequence of random variables
Pm(E)converges in probability to the constant P(E).
In contrast, the convergence of other density estimators
such as KDE requires the kernel bandwidth to vanish
asymptotically (Devroye et al. 1979). The bandwidth van-
ishing is necessary in order to amend for the local geomet-
ric bias inherent in KDE as discussed in Section 1.
The following canonical construction of a VDE deemed
Compactified Voronoi Density Estimator (CVDE) is dis-
cussed by Polianskii et al. 2022. Given an integrable kernel
K:Rn×RnR>0the estimated density is defined as
fP(x) = K(p, x)
|P|Volp(C(p)) (3)
where pis the closest point in Pto xand Volp(C(p)) =
RC(p)K(p, y)dy. The latter volumes are approximated
via Monte Carlo methods since they become unfeasible to
compute exactly as dimensions grow. The resulting density
inherits the same regularity as Kwhen restricted to each
Voronoi cell but jumps discontinuously when crossing the
boundary of Voronoi cells (see Figure 3). Motivated by
this, the goal of the present work is to introduce a continu-
ous and efficient VDE.
4 METHOD
4.1 Radial Voronoi Density Estimator
In this section we outline a general way of constructing a
VDE with continuous density function. Our central idea is
to define the latter radially w.r.t. the datapoints pP. We
start by rephrasing the integral over a Voronoi cell (Equa-
tion 2) in spherical coordinates:
ZC(p)
fP(x)dx=
=ZSn1Zl(p+σ)
0
tn1fP(p+)dt
| {z }
Conical Integral
dσ. (4)
Here Sn1Rndenotes the unit sphere and l(x)
[0,+]denotes the length of the segment contained in
C(p)of the ray cast from ppassing through xi.e.,
l(x) = sup t0|p+txp
d(x, p)C(p).(5)
We refer to Figure 2 for a visual illustration. Note that l(x)
is defined for x6=pand is continuous in its domain since
l(x) = d(x, p) = d(x, q)for xC(p)C(q).
We aim to solve Equation 4 by forcing the conical integral
in Equation 4 to be constant. To this end, we fix a contin-
uous and strictly decreasing function K:R>A R0
摘要:

AnEfcientandContinuousVoronoiDensityEstimatorGiovanniLucaMarchettiVladislavPolianskiiAnastasiiaVaravaFlorianT.PokornyDanicaKragicSchoolofElectricalEngineeringandComputerScience,KTHRoyalInstituteofTechnologyStockholm,SwedenAbstractWeintroduceanon-parametricdensityestima-tordeemedRadialVoronoiDensity...

展开>> 收起<<
An Efficient and Continuous Voronoi Density Estimator.pdf

共13页,预览3页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:13 页 大小:2.4MB 格式:PDF 时间:2025-04-22

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 13
客服
关注