Predictive density estimators with integrated $L_1$ loss
Pankaj Bhagwat & Éric Marchand
Université de Sherbrooke, Département de mathématiques, Sherbrooke, QC, CANADA, J1K 2R1
(e-mails: pankaj.uttam.bhagwat@usherbrooke.ca; eric.marchand@usherbrooke.ca)
Abstract
This paper addresses the problem of efficient predictive density estimation for the density $q(\|y-\theta\|^2)$ of $Y$ based on $X \sim p(\|x-\theta\|^2)$, for $y, x, \theta \in \mathbb{R}^d$. The chosen criteria are the integrated $L_1$ loss given by $L(\theta, \hat{q}) = \int_{\mathbb{R}^d} |\hat{q}(y) - q(\|y-\theta\|^2)| \, dy$, and the associated frequentist risk, for $\theta \in \Theta$. For absolutely continuous and strictly decreasing $q$, we establish the inevitability of scale expansion improvements $\hat{q}_c(y;X) = \frac{1}{c^d}\, q(\|y-X\|^2/c^2)$ over the plug-in density $\hat{q}_1$, for a subset of values $c \in (1, c_0)$. The finding is universal with respect to $p$, $q$, and $d \ge 2$, and is extended to loss functions $\gamma(L(\theta, \hat{q}))$ with strictly increasing $\gamma$. The finding is also extended to include scale expansion improvements of more general plug-in densities $q(\|y-\hat{\theta}(X)\|^2)$ when the parameter space $\Theta$ is a compact subset of $\mathbb{R}^d$. Numerical analyses illustrative of the dominance findings are presented and commented upon. As a complement, we demonstrate that the unimodality assumption on $q$ is necessary with a detailed analysis of cases where $Y|\theta$ is uniformly distributed on a ball centered about $\theta$. In such cases, we provide a univariate ($d = 1$) example where the best equivariant estimator is a plug-in estimator, and we obtain cases (for $d = 1, 3$) where the plug-in density $\hat{q}_1$ is optimal among all $\hat{q}_c$.
Keywords and phrases: Bayes estimation; Dominance; Frequentist risk; Inadmissibility; $L_1$ loss; Plug-in; Predictive density; Restricted parameter; Scale expansion; Spherical symmetry; Uniform distribution.
1. Introduction
We consider the problem of obtaining an efficient predictive density estimator $\hat{q}(y;X)$, $y \in \mathbb{R}^d$, of the density $q(\|y-\theta\|^2)$ of $Y$ based on the spherically symmetric $X \sim p(\|x-\theta\|^2)$. In this set-up, the densities are Lebesgue on $\mathbb{R}^d$, $p$ and $q$ are known but not necessarily equal, and $X|\theta$ and $Y|\theta$ are independently distributed. The observable $X$ may be a summary statistic arising from a sample. We evaluate the efficiency of the predictive density $\hat{q}(y;X)$ with integrated $L_1$ loss and risk
$$L(\theta, \hat{q}) = \int_{\mathbb{R}^d} \big| q(\|y-\theta\|^2) - \hat{q}(y) \big| \, dy, \qquad (1.1)$$
$$R(\theta, \hat{q}) = \int_{\mathbb{R}^d} L\big(\theta, \hat{q}(\cdot;x)\big) \, p(\|x-\theta\|^2) \, dx. \qquad (1.2)$$
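As a numerical aside (ours, not part of the paper), the loss (1.1) can be rewritten as $E_{Y|\theta}\,\big|1 - \hat{q}(Y)/q(\|Y-\theta\|^2)\big|$, which lends itself to Monte Carlo evaluation. A minimal sketch for the univariate normal case follows; the function and variable names are ours.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def l1_loss(q_true, q_hat, sample_true, n=100_000):
    # (1.1) in importance-sampling form: E_{Y ~ q_true} | 1 - q_hat(Y) / q_true(Y) |.
    y = sample_true(n)
    return np.mean(np.abs(1.0 - q_hat(y) / q_true(y)))

theta, x, sigma_Y = 0.0, 0.7, 1.0   # true centre, observed X, known scale
loss = l1_loss(q_true=lambda y: norm.pdf(y, theta, sigma_Y),
               q_hat=lambda y: norm.pdf(y, x, sigma_Y),   # plug-in density q_hat_1
               sample_true=lambda n: rng.normal(theta, sigma_Y, n))
print(f"L1 loss of the plug-in density at x = {x}: {loss:.3f}")
```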
Spherically symmetric models are prominent in statistical theory and practice, and inference for such models has a long history, including shrinkage estimation techniques (e.g., Fourdrinier et al., 2018). Our set-up includes the normal case with
$$X|\theta \sim N_d(\theta, \sigma_X^2 I_d) \quad \text{and} \quad Y|\theta \sim N_d(\theta, \sigma_Y^2 I_d), \qquad (1.3)$$
as well as scale mixtures of normals with $\sigma_X^2$ and $\sigma_Y^2$ random, including the multivariate Cauchy, Student, Laplace, and Logistic distributions, among many others.
Remark 1.1. $L_1$ loss is a natural, appealing, and widely used choice, which is also related to both:
(i) the ubiquitous total variation distance, through the identity $TV(f_1, f_2) = \sup_A \{P(W_1 \in A) - P(W_2 \in A)\} = \frac{1}{2}\int_{\mathbb{R}^d} |f_1(t) - f_2(t)|\,dt$, for random variables $W_1, W_2 \in \mathbb{R}^d$ with densities $f_1, f_2$;
(ii) the overlap coefficient (e.g., Weitzman, 1970), measuring the proximity of densities $f_1$ and $f_2$ and given by
$$OVL(f_1, f_2) = \int_{\mathbb{R}^d} \min(f_1(t), f_2(t))\,dt = 1 - \frac{1}{2}\int_{\mathbb{R}^d} |f_1(t) - f_2(t)|\,dt, \qquad (1.4)$$
since $2\min(f_1, f_2) = f_1 + f_2 - |f_1 - f_2|$.
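As a quick numerical sanity check (our addition, not from the paper), the identity (1.4) can be verified on a grid for two univariate normal densities:

```python
import numpy as np
from scipy.stats import norm

t = np.linspace(-12, 14, 200_001)   # fine grid covering both densities' support
dt = t[1] - t[0]
f1, f2 = norm.pdf(t, 0, 1), norm.pdf(t, 2, 1.5)

ovl_direct = np.sum(np.minimum(f1, f2)) * dt          # ∫ min(f1, f2) dt
ovl_via_l1 = 1 - 0.5 * np.sum(np.abs(f1 - f2)) * dt   # 1 - (1/2) ∫ |f1 - f2| dt
print(ovl_direct, ovl_via_l1)                         # both ≈ the same value
```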
There has been much interest in recent years in the efficiency of predictive density estimators in a decision-theoretic framework (e.g., George et al., 2019) and, in particular, in relationships with shrinkage estimation techniques and Bayesian predictive densities. However, frequentist $L_1$ risk predictive analysis is most challenging, and the determination of Bayesian densities associated with loss (1.1), including the MRE predictive density obtained as a Bayes density with respect to the uniform measure $\pi(\theta) = 1$, remains for the most part elusive (but see Section 2 for exceptions). Recently, Nogales (2021) considered the loss $\{L(\theta, \hat{q})\}^2$ and showed that the Bayes predictive density quite generally matches the posterior predictive density (i.e., the conditional density of $Y$ given $X$ obtained by integrating out $\theta$). It is thus of interest to study such loss functions as well, and more generally losses of the form $\gamma(L(\theta, \hat{q}))$.
Kubokawa et al. (2017) considered, as a benchmark choice for unimodal $q(\|y-\theta\|^2)$, the plug-in predictive mle given by the density $q(\|y-X\|^2)$. Using an equivalence with a point estimation problem under a loss function which is a concave function of squared error loss, together with shrinkage estimation techniques for such losses, they provided, for $d \ge 4$, general $p$, and unimodal $q$, plug-in predictive densities of the form $q(\|y-\hat{\theta}(X)\|^2)$, $y \in \mathbb{R}^d$, that dominate the predictive mle under $L_1$ loss. Their findings are quite general with respect to $p$ and $q$, but do require four dimensions or more. Applications to normal and scale mixtures of normal distributions are expanded on as well.
In the same paper, the authors provide and illustrate, for the univariate case ($d = 1$), scale expansion improvements under $L_1$ loss of the type $\frac{1}{c}\, q\!\left(\frac{|y-x|^2}{c^2}\right)$ for $c \in (1, c_0)$ over the plug-in predictive mle (i.e., $c = 1$), requiring log-concavity of $q$. For Kullback-Leibler loss, the potential of scale expansion improvements and the potential inefficiency of a plug-in predictive density (also referred to as an estimative fit) can be traced back to the work of Aitchison (1975), and is well illustrated by the normal case (1.3), where the plug-in density $\hat{q}_1 \sim N_d(X, \sigma_Y^2 I_d)$, as well as all equivariant densities with respect to changes in location, are dominated by the MRE density $\hat{q}_U \sim N_d(X, (\sigma_X^2 + \sigma_Y^2) I_d)$, which indeed expands the scale. As a slightly tangential remark, but one central to predictive density estimation findings over the past twenty years or so, we point out that under Kullback-Leibler loss $\hat{q}_U$ is minimax for all $d$, admissible for $d = 1, 2$, but inadmissible for $d \ge 3$, being dominated by various Bayesian predictive densities with striking parallels to shrinkage estimation under squared error loss and normal observables (Komaki, 2001; George et al., 2006; Brown et al., 2008).
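For concreteness, this dominance can be checked by a standard computation, which we add here (it is not spelled out in the excerpt above). Under (1.3), the Kullback-Leibler loss of the plug-in density is $\|X-\theta\|^2/(2\sigma_Y^2)$, so that its risk equals
$$R_{KL}(\theta, \hat{q}_1) = \frac{d\,\sigma_X^2}{2\sigma_Y^2},$$
while, writing $s^2 = \sigma_X^2 + \sigma_Y^2$ for the scale of $\hat{q}_U$ and averaging the closed-form normal-to-normal divergence over $X$,
$$R_{KL}(\theta, \hat{q}_U) = \frac{d}{2}\left(\frac{\sigma_Y^2}{s^2} - 1 + \log\frac{s^2}{\sigma_Y^2} + \frac{\sigma_X^2}{s^2}\right) = \frac{d}{2}\,\log\left(1 + \frac{\sigma_X^2}{\sigma_Y^2}\right).$$
Since $\log(1+r) < r$ for $r > 0$, the latter risk is strictly smaller; both risks are constant in $\theta$.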
Returning to improvements by scale expansion, Fourdrinier et al. (2011) showed that any plug-in density $q(\|y-\hat{\theta}(X)\|^2)$ is dominated by a class of scale expansion variants for a normal $q$ and Kullback-Leibler loss. Similar findings were obtained by L'Moudden & Marchand (2019) for normal models and $\alpha$-divergence loss, as well as by Kubokawa et al. (2015) for integrated $L_2$ loss. Given the $L_1$ results for $d = 1$ of Kubokawa et al. (2017) and their plug-in improvements on the predictive mle for $d \ge 4$, there remain the open questions of: (i) improvements for $d = 2, 3$ and $\hat{\theta}(X) = X$; and (ii) scale expansion improvements for $d \ge 2$ and $\hat{\theta}(X) = X$. We provide affirmative answers to these questions (i) and (ii), as well as to: (iii) scale expansion improvements on plug-in densities of the form $q(\|y-\hat{\theta}(X)\|^2)$ for choices of $\hat{\theta}(X)$ when $\theta \in \Theta$ with $\Theta$ compact, even when $\hat{\theta}(X)$ is adapted to the parameter space. Moreover, these results are established for the wider class of loss functions of the form $\gamma(L(\theta, \hat{q}))$ with strictly increasing $\gamma$.
The paper is organized as follows. In Section 2, we first expand on Bayesian predictive densities under $L_1$ loss and the general difficulty in determining a Bayesian solution, nevertheless recording an explicit solution for the univariate uniform distribution case. Section 3 contains the main dominance findings, which relate to predictive densities of the form
$$\hat{q}_{\hat{\theta},c}(y;X) = \frac{1}{c^d}\, q\!\left(\frac{\|y-\hat{\theta}(X)\|^2}{c^2}\right), \quad y \in \mathbb{R}^d. \qquad (1.5)$$
In Section 3.1, we study cases with $\hat{\theta}(X) = X$, general $p$ and unimodal $q$, and losses $\gamma(L(\theta, \hat{q}))$ with strictly increasing $\gamma$. With such densities having constant risk, we show that the optimal scale expansion value $c^*$ is such that $c^* > 1$. The proof is unified and applies quite generally, for arbitrary $(p, q, d, \gamma)$ such that $d \ge 2$ and $q$ is strictly decreasing on $\mathbb{R}_+$. Secondly, in Section 3.2, we consider situations with a compact parameter space restriction $\theta \in C$, such as balls of radius $m$, and show quite generally that a plug-in density $\hat{q}_{\hat{\theta},1}(\cdot;X)$ is necessarily dominated by a subclass of scale expansion variants $\hat{q}_{\hat{\theta},c}(\cdot;X)$ with $c \in (1, c_0)$. The finding is again unified for general $p$, unimodal $q$, $\gamma$, and $d \ge 2$. We also provide, in Section 3.3, cases where the plug-in density is optimal among the $\hat{q}_{\hat{\theta},c}$, namely for uniformly distributed $Y \sim U(\theta - B, \theta + B)$. We further explore such phenomena in the multivariate case, with $Y$ uniformly distributed on a ball of radius $m$ centered at $\theta$, aided by numerical evaluations and a definite result for $d = 3$ and $X$ uniformly distributed on a ball centered at $\theta$. Finally, numerical illustrations and comparisons, as well as concluding observations, are presented in Sections 4 and 5.
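As a preview of the Section 3.1 phenomenon, the following simulation sketch (ours, under the normal model (1.3) with $d = 2$ and $\sigma_X = \sigma_Y = 1$, a configuration chosen purely for illustration) estimates the constant risk of $\hat{q}_c$ over a grid of $c$:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(2)
d, sig_X, sig_Y = 2, 1.0, 1.0
theta = np.zeros(d)

def risk(c, n_x=400, n_y=5_000):
    # Monte Carlo estimate of R(theta, q_hat_c): average over draws of X
    # of E_Y |1 - q_hat_c(Y) / q_theta(Y)|, the importance-sampling form of (1.1).
    total = 0.0
    for _ in range(n_x):
        x = theta + sig_X * rng.standard_normal(d)
        y = theta + sig_Y * rng.standard_normal((n_y, d))
        q_true = mvn(theta, sig_Y**2 * np.eye(d)).pdf(y)
        q_hat = mvn(x, (c * sig_Y)**2 * np.eye(d)).pdf(y)
        total += np.mean(np.abs(1.0 - q_hat / q_true))
    return total / n_x

for c in [1.0, 1.1, 1.3, 1.5, 1.7, 2.0]:
    print(f"c = {c:.1f}: estimated L1 risk ≈ {risk(c):.3f}")
# For this configuration the estimates dip below the c = 1.0 value at moderate
# c > 1, consistent with the optimal scale expansion c* > 1 of Section 3.1.
```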
2. Bayesian predictive densities and $L_1$ loss
Despite the appeal of the $L_1$ divergence for reporting on the efficiency of estimated densities in parametric and non-parametric settings (e.g., DasGupta & Lahiri, 2012; Devroye & Györfi, 1985), its drawbacks include a challenging frequentist risk analysis and, above all, the difficulty of specifying a Bayesian predictive density. This difficulty includes the determination of the minimum risk equivariant density for location models or, equivalently, the Bayes predictive density $\hat{q}_{mre}$ with respect to the uniform density $\pi(\theta) = 1$. The equivalence follows from a general representation of the minimum risk equivariant estimator as the Bayes estimator associated with the corresponding Haar measure (e.g., Eaton, 1989), which here is the uniform density derived from the group of location changes. Moreover, given general results on equivariant procedures (Kiefer, 1957), such a density is minimax and thus constitutes an interesting benchmark predictive density.
In this section, we briefly expand on this difficulty, and it is particularly instructive to contrast Bayesian solutions for the Kullback-Leibler and $L_1$ loss functions. Moreover, we do provide a Bayes predictive density solution under $L_1$ loss in the uniform case with unknown location (Theorem 2.1). Interestingly, defining the loss as the square of the $L_1$ distance leads to tractable Bayesian solutions (Nogales, 2021).
For the sake of illustration, we suppose in this section that $X \sim p_\theta$ is to be observed and that we wish to obtain a predictive density $\hat{q}(y;X)$, $y \in \mathbb{R}^d$, for the density $q_\theta$ of $Y$. We assume that $p_\theta$ and $q_\theta$ are Lebesgue densities, and that $X|\theta$ and $Y|\theta$ are independently distributed. Finally, we assume a prior density $\pi$ for $\theta$ for which the posterior $\pi(\cdot|x)$, defined with respect to a $\sigma$-finite measure $\nu$, exists.
2.1. Kullback-Leibler loss
The familiar Kullback-Leibler loss associated with a density $\hat{q}$ as an estimate of $q_\theta$ is given by
$$L_{KL}(\theta, \hat{q}) = \int_{\mathbb{R}^d} q_\theta(y) \log\left\{\frac{q_\theta(y)}{\hat{q}(y)}\right\} dy.$$
A useful and equivalent representation is given by
$$L_{KL}(\theta, \hat{q}) = \int_{\mathbb{R}^d} q_\theta(y) \left\{ \frac{\hat{q}(y)}{q_\theta(y)} - \log\left(\frac{\hat{q}(y)}{q_\theta(y)}\right) - 1 \right\} dy. \qquad (2.6)$$
The above clearly represents the loss as a weighted (with respect to $q_\theta(y)$) average of a collection of distances between the estimates $\hat{q}(y)$ and the actual values $q_\theta(y)$, as measured by the point estimation loss $\rho\!\left(\frac{\hat{q}(y)}{q_\theta(y)}\right)$ with $\rho(z) = z - \log z - 1$.
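The equivalence between the two displays is easy to verify numerically; the short check below (our addition) integrates both forms on a grid for a pair of normal densities.

```python
import numpy as np
from scipy.stats import norm

t = np.linspace(-15, 15, 60_001)
dt = t[1] - t[0]
q_theta, q_hat = norm.pdf(t, 0, 1), norm.pdf(t, 0.5, 1.3)

kl_standard = np.sum(q_theta * np.log(q_theta / q_hat)) * dt
z = q_hat / q_theta
kl_rho_form = np.sum(q_theta * (z - np.log(z) - 1)) * dt   # representation (2.6)
print(kl_standard, kl_rho_form)   # the two values agree up to grid error
```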
Now, consider estimating the density $q_\theta(t)$ at a fixed value $y = t$, and refer to this as the local problem. The Bayes estimate minimizes in $\hat{q}(t)$ the expected posterior loss
$$E\left\{ q_\theta(t)\, \rho\!\left(\frac{\hat{q}(t)}{q_\theta(t)}\right) \Big|\, x \right\}.$$
Indeed, as a function of $a = \hat{q}(t)$, this expectation equals $a - E(q_\theta(t)|x)\log a$ plus a constant, and is therefore minimized at $a = E(q_\theta(t)|x)$; that is, the local Bayes estimate is given by
$$\hat{q}_\pi(t;x) = E(q_\theta(t)|x) = \int_\Theta q_\theta(t)\, \pi(\theta|x)\, d\nu(\theta). \qquad (2.7)$$
Now, for the global problem, a Bayesian predictive density $\hat{q}(y;x)$, $y \in \mathbb{R}^d$, minimizes among all densities the expected posterior loss which, from (2.6) and a change in the order of integration, amounts to minimizing
$$\int_{\mathbb{R}^d} E\left\{ q_\theta(y)\, \rho\!\left(\frac{\hat{q}(y)}{q_\theta(y)}\right) \Big|\, x \right\} dy. \qquad (2.8)$$
Finally, since $\hat{q}_\pi(y;x)$ minimizes, for every $y$, the expectation inside the above integral, and since $\hat{q}_\pi(\cdot;x)$ is actually a density on $\mathbb{R}^d$, it follows that $\hat{q}_\pi(\cdot;x)$ is the Bayes predictive density.
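To make (2.7) concrete, consider the univariate conjugate normal case with prior $\theta \sim N(0, \tau^2)$: the posterior is $N(m, v)$ with $m = \tau^2 x/(\tau^2 + \sigma_X^2)$ and $v = \sigma_X^2\tau^2/(\sigma_X^2 + \tau^2)$, and $\hat{q}_\pi(\cdot;x)$ is the $N(m, \sigma_Y^2 + v)$ density. The sketch below (our addition; the numerical values are arbitrary) checks this closed form against a direct numerical evaluation of the mixture integral in (2.7).

```python
import numpy as np
from scipy.stats import norm

sig_X, sig_Y, tau, x = 1.0, 1.0, 2.0, 1.5   # hypothetical values for illustration

# Conjugate normal model: theta | x ~ N(m, v).
v = (sig_X**2 * tau**2) / (sig_X**2 + tau**2)
m = (tau**2 / (tau**2 + sig_X**2)) * x

t = 0.8   # point at which the predictive density is evaluated

# Direct numerical integration of (2.7): q_pi(t; x) = ∫ q_theta(t) pi(theta | x) dtheta.
thetas = np.linspace(m - 10 * np.sqrt(v), m + 10 * np.sqrt(v), 20_001)
integrand = norm.pdf(t, thetas, sig_Y) * norm.pdf(thetas, m, np.sqrt(v))
q_numeric = np.sum(integrand) * (thetas[1] - thetas[0])

# Closed-form posterior predictive: N(m, sig_Y^2 + v) evaluated at t.
q_closed = norm.pdf(t, m, np.sqrt(sig_Y**2 + v))
print(q_numeric, q_closed)   # the two values agree up to grid error
```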