
as well as scale mixtures of normals with $\sigma^2_X$ and $\sigma^2_Y$ random, and including multivariate Cauchy, Student, Laplace, Logistic distributions, and many others.
Remark 1.1. $L_1$ loss is a natural, appealing and widely used choice, which is also related to both:
(i) the ubiquitous total variation distance, through the identity $TV(f_1, f_2) = \sup_A \{P(W_1 \in A) - P(W_2 \in A)\} = \frac{1}{2}\int_{\mathbb{R}^d} |f_1(t) - f_2(t)|\, dt$, for random variables $W_1, W_2 \in \mathbb{R}^d$ with densities $f_1, f_2$;
(ii) the overlap coefficient (e.g., Weitzman, 1970) measuring the proximity of densities $f_1$ and $f_2$, and given by
\[
OVL(f_1, f_2) \;=\; \int_{\mathbb{R}^d} \min(f_1(t), f_2(t))\, dt \;=\; 1 - \frac{1}{2}\int_{\mathbb{R}^d} |f_1(t) - f_2(t)|\, dt\,, \tag{1.4}
\]
since $2\min(f_1, f_2) = f_1 + f_2 - |f_1 - f_2|$.
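As a quick numerical illustration of identity (1.4) (a sketch added here for convenience; the two densities and all numerical choices are arbitrary assumptions rather than anything taken from the text), one can check by quadrature that the overlap coefficient and $1 - \frac{1}{2}\int_{\mathbb{R}}|f_1(t) - f_2(t)|\,dt$ agree:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Two illustrative univariate densities (arbitrary choices): f1 ~ N(0, 1), f2 ~ N(1, 2).
f1 = lambda t: norm.pdf(t, loc=0.0, scale=1.0)
f2 = lambda t: norm.pdf(t, loc=1.0, scale=np.sqrt(2.0))

# Overlap coefficient and L1 distance, both computed by numerical quadrature.
ovl, _ = quad(lambda t: np.minimum(f1(t), f2(t)), -np.inf, np.inf)
l1, _ = quad(lambda t: np.abs(f1(t) - f2(t)), -np.inf, np.inf)

# Identity (1.4): OVL(f1, f2) = 1 - (1/2) * integral |f1 - f2|, up to quadrature error.
print(ovl, 1.0 - 0.5 * l1)
```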
There has been much interest in recent years, and many findings, related to the efficiency of predictive density estimators in a decision-theoretic framework (e.g., George et al., 2019) and, in particular, to relationships with shrinkage estimation techniques and Bayesian predictive densities. However, frequentist $L_1$ risk predictive analysis is most challenging, and the determination of Bayesian densities associated with loss (1.1), including the MRE predictive density obtained as a Bayes density with respect to the uniform measure $\pi(\theta) = 1$, remains for the most part elusive (but see Section 2 for exceptions). Recently, Nogales (2021) considered the loss $\{L(\theta, \hat{q})\}^2$ and showed that the Bayes predictive density quite generally matches the posterior predictive density (i.e., the conditional density of $Y$ given $X$ obtained by integrating out $\theta$). It is thus of interest to study such loss functions as well, and more generally losses of the form $\gamma(L(\theta, \hat{q}))$.
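As a point of reference (a standard computation, not reproduced from the works cited above), the posterior predictive density under the uniform measure is available in closed form in the normal case (1.3): with $\pi(\theta) = 1$, the posterior is $\theta \mid X = x \sim N_d(x, \sigma^2_X I_d)$ and, writing $N_d(\,\cdot\,; \mu, \Sigma)$ for the $N_d(\mu, \Sigma)$ density,
\[
\hat{q}_{\pi}(y \mid x) \;=\; \int_{\mathbb{R}^d} N_d\big(y;\, \theta, \sigma^2_Y I_d\big)\, N_d\big(\theta;\, x, \sigma^2_X I_d\big)\, d\theta \;=\; N_d\big(y;\, x, (\sigma^2_X + \sigma^2_Y) I_d\big),
\]
which coincides with the density $\hat{q}_U$ appearing below and, by the result of Nogales (2021) just recalled, is (quite generally) the Bayes predictive density under the loss $\{L(\theta, \hat{q})\}^2$.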
Kubokawa et al. (2017) considered the benchmark choice, for unimodal $q(\|y - \theta\|^2)$, of the plug-in predictive mle given by the density $q(\|y - X\|^2)$. Using an equivalence with a point estimation problem under a loss function which is a concave function of squared error loss, and shrinkage estimation techniques for such losses, they provided, for $d \geq 4$, general $p$, and unimodal $q$, plug-in predictive densities of the form $q(\|y - \hat{\theta}(X)\|^2)$, $y \in \mathbb{R}^d$, that dominate the predictive mle under $L_1$ loss. Their findings are quite general with respect to $p$ and $q$, but do require four dimensions or more. Applications for normal and scale mixtures of normal distributions are expanded on as well.
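To convey the flavor of such dominance results numerically, the following Monte Carlo sketch estimates the $L_1$ risk of the plug-in mle density and of a James-Stein-type shrinkage plug-in in the normal case (1.3) with $d = 4$; the shrinkage estimator, parameter value, and variances are illustrative assumptions only, and this is not the specific construction of Kubokawa et al. (2017). The sketch relies on the closed form of the $L_1$ distance between two normal densities sharing the covariance $\sigma^2_Y I_d$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Normal model (1.3): X ~ N_d(theta, sx^2 I_d), Y ~ N_d(theta, sy^2 I_d).
d, sx, sy = 4, 1.0, 1.0          # illustrative choices
theta = np.full(d, 0.5)          # illustrative parameter value

def l1_loss(theta_hat):
    # L1 distance between N_d(theta_hat, sy^2 I) and N_d(theta, sy^2 I):
    # integral |f1 - f2| = 2 * (2 * Phi(||theta_hat - theta|| / (2 * sy)) - 1).
    delta = np.linalg.norm(theta_hat - theta, axis=1)
    return 2.0 * (2.0 * norm.cdf(delta / (2.0 * sy)) - 1.0)

X = theta + sx * rng.standard_normal((200_000, d))

# Plug-in mle versus a positive-part James-Stein-type shrinkage plug-in.
shrink = np.maximum(0.0, 1.0 - (d - 2) * sx**2 / np.sum(X**2, axis=1, keepdims=True))
print("estimated L1 risk, plug-in mle:      ", l1_loss(X).mean())
print("estimated L1 risk, shrinkage plug-in:", l1_loss(shrink * X).mean())
```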
In the same paper, the authors provide and illustrate, for the univariate case ($d = 1$), scale expansion improvements under $L_1$ loss of the type $\frac{1}{c}\, q\!\left(\frac{|y - x|^2}{c^2}\right)$ for $c \in (1, c_0)$ on the plug-in predictive mle (i.e., $c = 1$), requiring log-concavity of $q$. For Kullback-Leibler loss, the potential of scale expansion improvements, and the potential inefficiency of a plug-in predictive density (also referred to as an estimative fit), can be traced back to the work of Aitchison (1975), and is well illustrated by the normal case (1.3), where the plug-in density $\hat{q}_1 \sim N_d(X, \sigma^2_Y I_d)$, as well as all equivariant densities with respect to changes in location, are dominated by the MRE density $\hat{q}_U \sim N_d(X, (\sigma^2_X + \sigma^2_Y) I_d)$, which indeed expands the scale.
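For the reader's convenience, here is a brief sketch, using only the standard Kullback-Leibler formula for normal densities (an illustrative computation, not reproduced from the sources cited), of the Kullback-Leibler risks behind this domination in the normal case (1.3):
\[
\begin{aligned}
E_\theta\Big[\mathrm{KL}\big(N_d(\theta, \sigma^2_Y I_d)\,\big\|\,\hat{q}_1\big)\Big]
&= E_\theta\!\left[\frac{\|X - \theta\|^2}{2\,\sigma^2_Y}\right]
= \frac{d}{2}\,\frac{\sigma^2_X}{\sigma^2_Y}\,,\\
E_\theta\Big[\mathrm{KL}\big(N_d(\theta, \sigma^2_Y I_d)\,\big\|\,\hat{q}_U\big)\Big]
&= \frac{d}{2}\left[\log\Big(1 + \frac{\sigma^2_X}{\sigma^2_Y}\Big)
+ \frac{\sigma^2_Y + \sigma^2_X}{\sigma^2_X + \sigma^2_Y} - 1\right]
= \frac{d}{2}\,\log\Big(1 + \frac{\sigma^2_X}{\sigma^2_Y}\Big),
\end{aligned}
\]
and since $\log(1 + r) < r$ for every $r > 0$, the (constant) risk of $\hat{q}_U$ is uniformly smaller than that of $\hat{q}_1$.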
As a slightly tangential remark, but one central to predictive density estimation findings over the past twenty years or so, we point out that $\hat{q}_U$ is, under Kullback-Leibler loss, minimax for all $d$, admissible for $d = 1, 2$, but inadmissible for $d \geq 3$, being dominated by various Bayesian predictive densities, with striking parallels to shrinkage estimation under squared error loss and normal observables (Komaki, 2001; George et al., 2006; Brown et al., 2008).
Returning to improvements by scale expansion, Fourdrinier et al. (2011) showed that any plug-in density $q(\|y - \hat{\theta}(X)\|^2)$ is dominated by a class of scale expansion variants for normal model $q$ and