Revisiting Le Cams Equation Exact Minimax Rates over Convex Density Classes Shamindra Shrotriya Matey Neykov

2025-04-29 0 0 1.14MB 46 页 10玖币
侵权投诉
Revisiting Le Cam’s Equation: Exact Minimax Rates
over Convex Density Classes
Shamindra Shrotriya Matey Neykov
Retail Intelligence Department of Statistics and Data Science
Walmart Northwestern University
Bentonville, AR 72716 Evanston, IL 60208
shamindra.shrotriya@walmart.com mneykov@northwestern.edu
Abstract
We study the classical problem of deriving minimax rates for density estimation over convex
density classes. Building on the pioneering work of Birgé [1983], Birgé [1986], Le Cam [1973],
Wong and Shen [1995], Yang and Barron [1999], we determine the exact (up to constants)
minimax rate over any convex density class. This work thus extends these known results by
demonstrating that the local metric entropy of the density class always captures the minimax
optimal rates under such settings. Our bounds provide a unifying perspective across both
parametric and nonparametric convex density classes, under weaker assumptions on the richness
of the density class than previously considered. Our proposed ‘multistage sieve’ MLE applies to
any such convex density class. We further demonstrate that this estimator is also adaptive to
the true underlying density of interest. We apply our risk bounds to rederive known minimax
rates including bounded total variation, and Lipschitz density classes. We further illustrate the
utility of the result by deriving upper bounds for less studied classes, e.g., convex mixture of
densities.
1 Introduction
It is well known that (global) metric entropy often times determines the minimax rates for density
estimation. Specifically, the following equation sometimes informally referred to as the ‘Le Cam
equation’, is used to heuristically determine the minimax rate of convergence
log Mglo
F(ε)2,
where nis the sample size, log Mglo
F(ε)is the global metric entropy of the density set Fat a Hellinger
distance ε(see Definition 5), and ε2determines the order of the minimax rate. In this paper we
complement these known results, by establishing that local metric entropy always determines the
minimax rate for convex density classes, where the densities are assumed to be (uniformly) bounded
from above and below.
In detail, under the setting of density estimation just described, we suggest a small revision
to the Le Cam equation: namely, change the global entropy to local entropy, and the Hellinger
metric to the L2-metric. Furthermore, the same result holds when the convex density class contains
densities only (uniformly) bounded from above, and a single density which is bounded from below.
Unlike previous known results, our result unites minimax density estimation under both parametric
and nonparametric convex density classes. A further contribution is that our proposed ‘multistage
sieve’ maximum likelihood estimator (MLE) achieves these bounds regardless of the density class (as
long as it is convex). In addition, we demonstrate that this multistage sieve estimator is adaptive to
1
arXiv:2210.11436v2 [math.ST] 23 Oct 2023
the true density of interest. To the best of our knowledge we are not aware of any other estimators
with such a property, under such general settings.
We will now formally describe the setting we consider. To that end, we first define a general class
of bounded densities, i.e., F[α,β]
B. Later, we will assume that the true density of interest belongs to
a known convex subset of this general ambient density class.
Definition 1 (Ambient density class F[α,β]
B).Given constants 0< α < β < , for some fixed
dimension pN, and a common known (Borel measurable) compact support set BRp(with
positive measure), we then define the class of density functions, F[α,β]
B, as follows:
F[α,β]
B:=f:B[α, β]ZB
fdµ= 1, f measurable,(1)
where µis the dominating finite measure on B. We always take µto be a (normalized) probability
measure on B.
Furthermore, we can endow F[α,β]
Bwith the L2-metric. That is, for any two densities f, g ∈ F[α,β]
B,
we denote the L2-metric between them to be
fg2:=ZB
(fg)2dµ1
2.(2)
Remark 1.Qualitatively, we have that F[α,β]
Bis the class of all densities that are uniformly α-lower
bounded and β-upper bounded, on a common compact support BRp. Furthermore, Definition 1
implies that F[α,β]
Bforms a convex set, and that the metric space (F[α,β]
B,∥·∥2)is complete, bounded,
but may not be totally bounded1.
In this paper we will focus on the scenario where it is known that the true density f∈ F ⊂ F[α,β]
B,
where Fis a known convex set. The set Frepresents our knowledge on the true density, before
observing any data. With these mathematical preliminaries, we formalize our core density estimation
problem of interest as follows.
Core problem: Suppose that we observe nobservations X:= (X1, . . . , Xn)i.i.d.
f, for some (fixed but unknown) f∈ F. Here F ⊂ F[α,β]
Bis a convex set, which is
known to the observer. Can we propose a universal estimator for f, and derive the
exact (up to constants) squared L2-minimax rate of estimation, in expectation?
For convenience, we can illustrate the generating process for a univariate example of our density
estimation problem of interest in Figure 1. It will serve as a useful conceptual guide to later help
visualize our proposed estimator over such general convex class of densities F.
Now, without further ado, we will informally state our main result as a direct affirmative answer
to our core question of interest. Namely, there does exist a likelihood-based estimator (one can
think of it as a multistage sieve MLE), i.e., ν(X), which achieves the following rate of estimation
error
sup
f∈F
Eν(X)f2
2ε2diam2(F)2.(3)
1These fundamental (and additional) analytic properties of F[α,β]
Bare formally justified in Appendix A.1.
2
Figure 1: Illustrative generating process for a univariate density f∈ F ⊂ F[α,β]
B.
Here ε:= sup{ε:2log Mloc
F(ε, c)}, with log Mloc
F(ε, c)being the L2-local metric entropy of F
(see Definition 5). The quantity diam2(F), refers to the L2-diameter of F, which is finite by the
boundedness of F[α,β]
Bin our setting. In addition, the rate above is minimax optimal, as there is a
matching (up to constants) lower bound.
Remark 2.We will later see that we can largely relax the α-lower boundedness condition on F.
That is, the results we are about to derive can be readily generalized to convex subsets F ⊂ F[0]
B.
This is so as long as the class Fcontains a single density which is bounded away from 0.
Next, we turn our attention to reviewing some relevant literature.
1.1 Relevant Literature
Classical work
As noted, density estimation is a classical statistical estimation problem with a rich history.
Lively accounts of the key references, particularly for nonparametric density estimation as relevant
to our setting, are already covered in Yang and Barron [1999, Section 1] and Bilodeau et al. [2021,
Section 6.1]. We similarly begin with a brief panoramic overview of these references in regard to
minimax risk bounds for density estimation, before comparing and contrasting the results from the
most relevant references to our work.
In terms of minimax lower bounds on density estimation, Boyd and Steele [1978] prove a fun-
damental n1rate in the mean integrated pth power error (with p1), for any arbitrary density
estimator. Such generalized lower bounds on density estimation were also further studied in Devroye
[1983]. In the case of density estimation over classes with more assumed structure (e.g., smoothness,
or regularity assumptions) minimax lower bounds have been developed based on hypothesis testing
approaches coupled with information-theoretic techniques. We now provide brief highlights of such
key works in this direction.
In Bretagnolle and Huber [1979], the authors derive sharp lower bounds for Sobolev smooth
densities in Rd(dN), with risk measured with respect to a power of the Lq-metric (q1). In
Birgé [1986], sharp risk bounds for more general classes of such smooth families were provided using
3
metric entropy based methods, with an emphasis on the Hellinger loss. The work of Efro˘ımovich
and Pinsker [1982] provided precise (asymptotic) analysis for an ellipsoidal class of densities in
the L2-metric. Across a wide-ranging series of related and collaborative efforts Has’minski˘ı [1978],
Ibragimov and Has’minski˘ı [1977,1978], Ibragimov and Khas’minskij [1980] used Fano’s inequality
type arguments to establish lower bounds over a variety of density estimation settings. These
range from deriving lower bounds on nonparametric density estimation in the uniform metric, to
minimax risk bounds for the Gaussian white noise model, for example. The authors also develop
metric entropy based techniques in Has’minski˘ı and Ibragimov [1990] to derive minimax lower
bounds for a wide variety of density classes defined on Rd(dN), in Lq-loss (q1). Numerous
applications of optimal lower bounds using both Assouad’s and Fano’s lemma arguments for densities
on a compact support, are demonstrated in [Yu,1997, Section 29.3]. Later Yang and Barron
[1999] demonstrated that global metric entropy bounds capture minimax risk for sufficiently rich
density classes over a common compact support. Classical reference texts on minimax lower bound
techniques with an emphasis on nonparametric density estimation include Devroye [1987], Devroye
and Györfi [1985], Le Cam [1986]. More modern such references include Tsybakov [2009] and
Wainwright [2019, Chapter 15]. The latter in particular, also incorporates metric entropy based
lower bound techniques.
In addition, there is a large body of work in deriving upper bounds for specific density estimators
using metric entropy methods. This includes Barron and Cover [1991], Yatracos [1985], who employ
the minimum distance principle to derive density estimators and their metric entropy-based upper
bounds in the Hellinger and L1-metric, respectively. In a similar spirit to Birgé [1983], Birgé [1986],
van de Geer [1993] is also concerned with density estimation using Hellinger loss. However, its focus
is to use techniques from empirical process theory in order to specifically establish the Hellinger
consistency of the nonparametric MLE, over convex density classes. Upper bounds for density
estimation based on the ‘sieve’ MLE technique is studied in Wong and Shen [1995]. Recall that
a ‘sieve’ estimator effectively estimates the parameter of interest via an optimization procedure
(e.g., maximum likelihood) over a constrained subset of the parameter space [Grenander,1981,
Chapter 8]. In Birgé and Massart [1993] the authors study ‘minimum contrast estimators’ (MCEs),
which include the MLE, least squares estimators (LSEs) etc., and apply them to density estimation.
This is further developed in Birgé and Massart [1998] where they analyze convergence of MCEs
using sieve-based approaches.
Comparison to our work
By stating our main result early in the introduction, we now turn to contrasting it with the
most relevant results in the literature. These include both the aforementioned classical references,
and more recent work on convex density estimation, which have most directly inspired our efforts
in this work.
First we would like to comment on the closely related landmark papers [Birgé,1983,Birgé,
1986,Le Cam,1973]. These works consider very abstract settings and show upper bounds based
on Hellinger ball testing. Although widely believed that they do, whether these results lead to
bounds that are minimax optimal is unclear. Moreover, their estimator is quite involved and non-
constructive. In contrast, in this paper we offer a simple to state, constructive multistage sieve MLE
type of estimator, which is provably minimax optimal over any convex density class F. A crucial
difference is that we metrize the space Fwith the L2-metric as we mentioned above. Even though
in our instance the two distances are equivalent, in contrast to the Hellinger distance, the ε-local
metric entropy of the convex density class in the L2-metric can be shown to be monotonic in ε.
4
This key observation enables us to match the upper and lower bounds exactly.
Next, we will compare our work with the celebrated paper of Yang and Barron [1999], who inspect
a very similar problem. Yang and Barron [1999] demonstrate a lower and upper bound which need
not match in general but do match under certain sufficient conditions. Notably their bounds involve
only quantities depending on the global entropies of the set F(which is also assumed to be convex
for some results of Yang and Barron [1999]). This is convenient as often times global metric entropy
is easier to work with compared to local metric entropy, however under Yang and Barron [1999]’s
sufficient condition it can be seen that the two notions are equivalent. Hence, our work can be
thought of as removing the sufficient condition requirement from Yang and Barron [1999] and also
unifying parametric and nonparametric density estimation problems (over convex classes) for which
one typically needs to use different tools to obtain the accurate rates. Finally we would like to
mention Wong and Shen [1995]. In that paper the authors propose a sieve MLE estimator and
demonstrate that it is nearly minimax optimal under certain conditions. Our estimator is not the
same as the one considered by Wong and Shen [1995], and we can provably match the minimax rate
over whatever be the convex set F. A notable difference is that Wong and Shen [1995] work with
the Hellinger metric and KL divergence, which although equivalent to L2-metric in our problem,
are actually less practical in terms of matching the bounds exactly as we explained above. We will
now turn our attention to reviewing some further relevant literature.
Recent work
Our estimator and proof techniques thereof, are inspired by the recent work of Neykov [2022]
on the Gaussian sequence model. We would like to stress on the fact that the sequence model
is a very distinct problem from density estimation. In particular, our underlying metric space of
interest is (F,∥·∥2), as compared to2(Rn,∥·∥2)for the sequence model. Both of these metric
spaces differ vastly from each other in their underlying geometric structure. Furthermore, unlike our
setting, the sequence model contains additional Gaussian information on the underlying generating
process, which can be directly exploited for estimation purposes. As such, given that Neykov [2022]
provides a guiding template for our analysis, some resulting structural similarities to their work are
to be expected. However, all corresponding proofs, and estimators thereof, have to be non-trivially
adapted to our nonparametric density estimation setting. A notable example of such required
modifications, is that our estimator presented in this paper does not use proximity in Euclidean
norm, but is a likelihood-based estimator.
We additionally note that density estimation in both abstract and more concrete settings, con-
tinues to be an active area of research. It is not feasible to detail such a large and growing body
of references. However, we provide a selective overview of some interesting recent directions in
density estimation, to simply indicate the diversity of the research efforts thereof. For example,
Baldi et al. [2009], Cleanthous et al. [2020] study convergence properties of density estimators us-
ing wavelet-based methods. The papers Birgé [2014], Efromovich [2008], Goldenshluger and Lepski
[2014], Rigollet [2006], Rigollet and Tsybakov [2007], Samarov and Tsybakov [2007] study adaptive
minimax density estimation on Rd(d1) under Lp-loss (p1). Here, ‘adaptive’ refers to the fact
that the density class is defined by an unknown tuning hyperparameter, which must be explicitly
accounted for during the estimation process. Recently Wang and Marzouk [2022] used techniques
from optimal transport to study the convergence properties of various nonparametric density esti-
mators. Interestingly, Bilodeau et al. [2021] applied empirical (metric) entropy methods to establish
2Note that ∥ · ∥2here is the Euclidean metric on Rn.
5
摘要:

RevisitingLeCam’sEquation:ExactMinimaxRatesoverConvexDensityClassesShamindraShrotriyaMateyNeykovRetailIntelligenceDepartmentofStatisticsandDataScienceWalmartNorthwesternUniversityBentonville,AR72716Evanston,IL60208shamindra.shrotriya@walmart.commneykov@northwestern.eduAbstractWestudytheclassicalprob...

收起<<
Revisiting Le Cams Equation Exact Minimax Rates over Convex Density Classes Shamindra Shrotriya Matey Neykov.pdf

共46页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:46 页 大小:1.14MB 格式:PDF 时间:2025-04-29

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 46
客服
关注