
metric entropy based methods, with an emphasis on the Hellinger loss. The work of Efroĭmovich
and Pinsker [1982] provided a precise (asymptotic) analysis for an ellipsoidal class of densities in
the L2-metric. Across a wide-ranging series of related and collaborative efforts, Has’minskiĭ [1978],
Ibragimov and Has’minskiĭ [1977, 1978], and Ibragimov and Khas’minskij [1980] used Fano’s inequality
type arguments (recalled schematically below) to establish lower bounds in a variety of density estimation settings. These
range from lower bounds for nonparametric density estimation in the uniform metric to minimax
risk bounds for the Gaussian white noise model. The same authors developed
metric entropy based techniques in Has’minskiĭ and Ibragimov [1990] to derive minimax lower
bounds for a wide variety of density classes defined on R^d (d ∈ N), in Lq-loss (q ≥ 1). Numerous
applications of optimal lower bounds using both Assouad’s and Fano’s lemma arguments for densities
with compact support are demonstrated in [Yu, 1997, Section 29.3]. Later, Yang and Barron
[1999] demonstrated that global metric entropy bounds capture minimax risk for sufficiently rich
density classes over a common compact support. Classical reference texts on minimax lower bound
techniques with an emphasis on nonparametric density estimation include Devroye [1987], Devroye
and Györfi [1985], and Le Cam [1986]. More recent references include Tsybakov [2009] and
Wainwright [2019, Chapter 15]; the latter, in particular, also incorporates metric entropy based
lower bound techniques.
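Since Fano’s inequality type arguments recur throughout the works above, we recall a schematic version of the resulting minimax lower bound, in the form found in standard references such as Tsybakov [2009] or Wainwright [2019, Chapter 15]; the notation here is generic and not tied to any particular result in this paper. If f_1, ..., f_M ∈ F are 2δ-separated in a metric d, J is drawn uniformly from {1, ..., M}, and X denotes the data generated from f_J, then
$$
\inf_{\hat f}\,\sup_{f \in F}\,\mathbb{E}_f\bigl[d(\hat f, f)\bigr]
\;\ge\;
\delta\left(1 - \frac{I(X; J) + \log 2}{\log M}\right),
$$
where I(X; J) is the mutual information between the data and the randomly chosen index.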
In addition, there is a large body of work on deriving upper bounds for specific density estimators
using metric entropy methods. This includes Barron and Cover [1991] and Yatracos [1985], who employ
the minimum distance principle to derive density estimators and their metric entropy based upper
bounds in the Hellinger and L1-metric, respectively. In a similar spirit to Birgé [1983], Birgé [1986],
van de Geer [1993] is also concerned with density estimation under the Hellinger loss. However, its focus
is on using techniques from empirical process theory specifically to establish the Hellinger
consistency of the nonparametric MLE over convex density classes. Upper bounds for density
estimation based on the ‘sieve’ MLE technique are studied in Wong and Shen [1995]. Recall that
a ‘sieve’ estimator effectively estimates the parameter of interest via an optimization procedure
(e.g., maximum likelihood) over a constrained subset of the parameter space [Grenander, 1981,
Chapter 8]; a schematic form is displayed at the end of this paragraph. In Birgé and Massart [1993] the authors study ‘minimum contrast estimators’ (MCEs),
which include the MLE, least squares estimators (LSEs), etc., and apply them to density estimation.
This is developed further in Birgé and Massart [1998], where the convergence of MCEs is analyzed
using sieve-based approaches.
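For concreteness, a generic sieve MLE based on an i.i.d. sample X_1, ..., X_n can be written schematically as follows; the notation for the sieves is ours and is not tied to any specific construction in the references above:
$$
\hat f_n \;\in\; \operatorname*{arg\,max}_{f \in \mathcal{F}_n} \; \sum_{i=1}^n \log f(X_i),
\qquad
\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots \subseteq \mathcal{F},
$$
where each sieve F_n is a constrained (e.g., finite-dimensional or totally bounded) subset of the full class F that grows with the sample size, which keeps the maximization well posed even when the MLE over all of F is ill behaved.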
Comparison to our work
Having stated our main result early in the introduction, we now turn to contrasting it with the
most relevant results in the literature. These include both the aforementioned classical references
and more recent work on convex density estimation, which have most directly inspired our efforts
in this work.
First, we would like to comment on the closely related landmark papers [Birgé, 1983, Birgé,
1986, Le Cam, 1973]. These works consider very abstract settings and show upper bounds based
on Hellinger ball testing. Although it is widely believed that they do, it is unclear whether these results
lead to bounds that are minimax optimal. Moreover, their estimator is quite involved and non-
constructive. In contrast, in this paper we offer a simple-to-state, constructive multistage sieve MLE
type of estimator, which is provably minimax optimal over any convex density class F. A crucial
difference is that, as mentioned above, we metrize the space F with the L2-metric. Even though
in our setting the two distances are equivalent, the ε-local metric entropy of the convex density class
in the L2-metric, in contrast to that in the Hellinger distance, can be shown to be monotonic in ε.
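For orientation, one common way to formalize this notion is the following; the notation here is schematic, and the exact definition and constants used in this paper may differ. For a fixed constant c > 1, the ε-local metric entropy of F in the L2-metric is
$$
M^{\mathrm{loc}}_{\mathcal{F}}(\varepsilon)
\;=\;
\sup_{f \in \mathcal{F}} \log M\bigl(\varepsilon/c,\; B_{L_2}(f, \varepsilon) \cap \mathcal{F}\bigr),
$$
where M(δ, S) denotes the largest cardinality of a δ-packing of the set S in the L2-metric and B_{L_2}(f, ε) is the L2-ball of radius ε centered at f; the monotonicity referred to above is monotonicity of this quantity as a function of ε.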