Optimal plug-in Gaussian processes for modelling derivatives
Zejian Liu* and Meng Li†
Department of Statistics, Rice University
Abstract
Derivatives are a key nonparametric functional in wide-ranging applications where the rate of change of an unknown function is of interest. In the Bayesian paradigm, Gaussian processes (GPs) are routinely used as a flexible prior for unknown functions, and are arguably one of the most popular tools in many areas. However, little is known about the optimal modelling strategy and theoretical properties when using GPs for derivatives. In this article, we study a plug-in strategy by differentiating the posterior distribution with GP priors for derivatives of any order. This practically appealing plug-in GP method has been previously perceived as suboptimal and degraded, but this is not necessarily the case. We provide posterior contraction rates for plug-in GPs and establish that they remarkably adapt to derivative orders. We show that the posterior measure of the regression function and its derivatives, with the same choice of hyperparameter that does not depend on the order of derivatives, converges at the minimax optimal rate up to a logarithmic factor for functions in certain classes. We analyze a data-driven hyperparameter tuning method based on empirical Bayes, and show that it satisfies the optimal rate condition while maintaining computational efficiency. This article, to the best of our knowledge, provides the first positive result for plug-in GPs in the context of inferring derivative functionals, and leads to a practically simple nonparametric Bayesian method with optimal and adaptive hyperparameter tuning for simultaneously estimating the regression function and its derivatives. Simulations show competitive finite sample performance of the plug-in GP method. A climate change application for analyzing the global sea-level rise is discussed.
1 Introduction
Consider the nonparametric regression model
\[ Y_i = f(X_i) + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \tag{1} \]
*zejian.liu@rice.edu
†meng@rice.edu
where the data $\mathcal{D}_n = \{X_i, Y_i\}_{i=1}^n$ are independent and identically distributed samples from a distribution $P_0$ on $\mathcal{X} \times \mathbb{R}$ that is determined by $P_X$, $f_0$, and $\sigma^2$, which are respectively the marginal distribution of $X_i$, the true regression function, and the noise variance that is possibly unknown. Let $p_X$ denote the density of $P_X$ with respect to the Lebesgue measure $\mu$. Here $\mathcal{X} \subset \mathbb{R}^p$ is a compact metric space for $p \ge 1$.
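To make the data-generating mechanism in (1) concrete, here is a minimal Python sketch that simulates a one-dimensional dataset; the particular regression function, the uniform design for $P_X$, and the noise level are illustrative assumptions rather than choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200        # sample size
sigma = 0.1    # noise standard deviation (illustrative)

def f0(x):
    # an arbitrary smooth "true" regression function, used only for illustration
    return np.sin(2 * np.pi * x) + 0.5 * x**2

X = rng.uniform(0.0, 1.0, size=n)            # covariates drawn i.i.d. from P_X = Uniform[0, 1]
Y = f0(X) + rng.normal(0.0, sigma, size=n)   # responses: Y_i = f0(X_i) + eps_i, eps_i ~ N(0, sigma^2)
```

This synthetic dataset is reused in the sketches that follow.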
We are interested in the inference on the derivative functions of $f$. Derivatives emerge as a key nonparametric quantity when the rate of change of an unknown surface is of interest. Examples include surface roughness for digital terrain models, temperature or rainfall slope in meteorology, and pollution curvature for environmental data. The importance of derivatives, either as a nonparametric functional or a localized characteristic of $f$, can also be found in efficient modelling of functional data (Dai et al., 2018), shape constrained function estimation (Riihimäki and Vehtari,
Gaussian processes (GPs) are a popular nonparametric Bayesian method in many areas such as spatially correlated data analysis (Stein, 2012; Gelfand et al., 2003; Banerjee et al., 2003), functional data analysis (Shi and Choi, 2011), and machine learning (Rasmussen and Williams, 2006); see also the excellent review article by Gelfand and Schliep (2016) which elaborates on the instrumental role GPs have played as a key ingredient in an extensive list of twenty years of modelling work. GPs not only provide a flexible process for unknown functions but also serve as a building block in hierarchical models for broader applications.
For function derivatives, the so-called plug-in strategy that directly differentiates the posterior distribution under GP priors is practically appealing, as it allows users to employ the same prior regardless of whether the inference goal is the regression function or its derivatives. However, this plug-in estimator has been perceived as suboptimal and degraded for a decade (Stein, 2012; Holsclaw et al., 2013) based on heuristics, while a theoretical understanding is lacking, partly owing to technical challenges posed by the irregularity and nonparametric nature of derivative functionals (see Section 2.2 for more details). As a result, plug-in GPs have received limited study ever since, and substantially more complicated methods, which hamper easy implementation and are often restricted to one particular derivative order, have been pursued.
In this article, we study the plug-in strategy with GPs for derivative functionals by characterizing large sample properties of the plug-in posterior measure with GP priors, and obtain the first positive result. We show that the plug-in posterior distribution, with the same choice of hyperparameter in the GP prior, concentrates at the derivative functionals of any order at a nearly minimax rate in specific examples, thus achieving a remarkable plug-in property for nonparametric functionals that has gained increasing attention recently (Yoo and Ghosal, 2016; Liu and Li, 2023). It is known that many commonly used nonparametric methods such as smoothing splines and local polynomials do not enjoy this property when estimating derivatives (Wahba and Wang, 1990; Charnigo et al., 2011), and the only nonparametric Bayesian method with an established plug-in property, to the best of our knowledge, is random series priors with B-splines (Yoo and Ghosal, 2016).
In recent years, the nonparametric Bayesian literature has seen remarkable adaptability of GP priors in various regression settings (van der Vaart and van Zanten, 2009; Bhattacharya et al., 2014; Jiang and Tokdar, 2021). Our findings contribute to this growing literature and indicate that the widely used GP priors offer more than inferring regression functions. In particular, the established theory supports the use of plug-in GPs for optimal modelling of derivatives, and further sheds light on hyperparameter tuning in the presence of varying derivative orders, for which we propose to use an empirical Bayes approach. Our analysis indicates that this data-driven hyperparameter tuning strategy attains theoretical optimality and adapts to the derivative order and the true function's smoothness level with an oversmooth kernel, while maintaining computational efficiency. Therefore, this article shows that the Bayes procedure using GP priors automatically adapts to the order of derivative, leading to a practically simple nonparametric Bayesian method with guided hyperparameter tuning for simultaneously estimating the regression function and its derivatives. These theoretical guarantees are complemented by competitive finite sample performance in simulations, as well as a climate change application to analyzing the global sea-level rise.
The following notation is used throughout this paper. We write $X = (X_1^T, \ldots, X_n^T)^T \in \mathbb{R}^{n \times p}$ and $Y = (Y_1, \ldots, Y_n)^T \in \mathbb{R}^n$. Let $\|\cdot\|$ be the Euclidean norm; for $f, g: \mathcal{X} \to \mathbb{R}$, let $\|f\|_\infty$ be the $L_\infty$ (supremum) norm, $\|f\|_2 = (\int_{\mathcal{X}} f^2 \, dP_X)^{1/2}$ the $L_2$ norm with respect to the covariate distribution $P_X$, and $\langle f, g \rangle_2 = \int_{\mathcal{X}} f g \, dP_X$ the inner product. The corresponding $L_2$ space relative to $P_X$ is denoted by $L^2_{p_X}(\mathcal{X})$; we write $L^2(\mathcal{X})$ for the $L_2$ space with respect to the Lebesgue measure $\mu$. Denote the space of all essentially bounded functions by $L^\infty(\mathcal{X})$. Let $\mathbb{N}$ be the set of all positive integers and write $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. We let $C(\mathcal{X})$ and $C(\mathcal{X}, \mathcal{X})$ denote the space of continuous functions and continuous bivariate functions. In the one-dimensional case, for $\Omega \subset \mathbb{R}$, a function $f: \Omega \to \mathbb{R}$, and $k \in \mathbb{N}$, we use $f^{(k)}$ to denote its $k$-th derivative as long as it exists, and $f^{(0)} = f$. Let $C^m(\Omega) = \{f: \Omega \to \mathbb{R} \mid f^{(k)} \in C(\Omega) \text{ for all } 1 \le k \le m\}$ denote the space of $m$-times continuously differentiable functions and $C^{2m}(\Omega, \Omega) = \{K: \Omega \times \Omega \to \mathbb{R} \mid \partial_x^k \partial_{x'}^k K(x, x') \in C(\Omega, \Omega) \text{ for all } 1 \le k \le m\}$ denote the space of $m$-times continuously differentiable bivariate functions, where $\partial_x^k = \partial^k / \partial x^k$. For two sequences $a_n$ and $b_n$, we write $a_n \lesssim b_n$ if $a_n \le C b_n$ for a universal constant $C > 0$, and $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $b_n \lesssim a_n$.
2 Main results
2.1 Plug-in Gaussian process for derivative functionals
We assign a Gaussian process prior $\Pi$ on the regression function such that $f \sim \mathrm{GP}(0, \sigma^2 (n\lambda)^{-1} K)$. Here $K(\cdot,\cdot): \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a continuous, symmetric and positive definite kernel function, and $\lambda > 0$ is a regularization parameter that possibly depends on the sample size $n$. The rescaling factor $\sigma^2 (n\lambda)^{-1}$ in the covariance kernel connects the posterior mean Bayes estimator with kernel ridge regression (Wahba, 1990; Cucker and Zhou, 2007); see also Theorem 11.61 in Ghosal and van der Vaart (2017) for more discussion on this connection.
It is not difficult to derive that the posterior distribution $\Pi_n(\cdot \mid \mathcal{D}_n)$ is also a GP: $f \mid \mathcal{D}_n \sim \mathrm{GP}(\hat{f}_n, \hat{V}_n)$, where the posterior mean $\hat{f}_n$ and posterior covariance $\hat{V}_n$ are given by
\[
\hat{f}_n(x) = K(x, X)[K(X, X) + n\lambda I_n]^{-1} Y,
\]
\[
\hat{V}_n(x, x') = \sigma^2 (n\lambda)^{-1} \{ K(x, x') - K(x, X)[K(X, X) + n\lambda I_n]^{-1} K(X, x') \}, \tag{2}
\]
for any $x, x' \in \mathcal{X}$. Here $K(X, X)$ is the $n \times n$ matrix $(K(X_i, X_j))_{i,j=1}^n$, $K(x, X)$ is the $1 \times n$ vector $(K(x, X_i))_{i=1}^n$, and $I_n$ is the $n \times n$ identity matrix.
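As a concrete illustration of (2), the following sketch evaluates the posterior mean and covariance on a grid. The squared-exponential kernel, its length-scale, and the values of $\lambda$ and $\sigma^2$ are illustrative assumptions; any kernel satisfying the stated conditions could be substituted.

```python
import numpy as np

def se_kernel(x, y, ell=0.2):
    """Squared-exponential kernel K(x, y) = exp(-(x - y)^2 / (2 ell^2))."""
    d = np.subtract.outer(x, y)
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(X, Y, x_new, lam, sigma2, kernel=se_kernel):
    """Posterior mean and covariance in (2) under the prior GP(0, sigma^2 (n lam)^{-1} K)."""
    n = X.shape[0]
    A = kernel(X, X) + n * lam * np.eye(n)   # K(X, X) + n*lam*I_n
    K_new = kernel(x_new, X)                 # rows are the vectors K(x, X), one per grid point x
    mean = K_new @ np.linalg.solve(A, Y)     # K(x, X) [K(X, X) + n*lam*I_n]^{-1} Y
    # sigma^2 (n*lam)^{-1} { K(x, x') - K(x, X) [K(X, X) + n*lam*I_n]^{-1} K(X, x') }
    cov = (sigma2 / (n * lam)) * (kernel(x_new, x_new) - K_new @ np.linalg.solve(A, K_new.T))
    return mean, cov

# illustrative usage on the synthetic data from the earlier sketch
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 200)
Y = np.sin(2 * np.pi * X) + 0.5 * X**2 + rng.normal(0.0, 0.1, X.size)
x_grid = np.linspace(0.0, 1.0, 101)
mean, cov = gp_posterior(X, Y, x_grid, lam=1e-3, sigma2=0.01)
```

Solving the linear system with $[K(X, X) + n\lambda I_n]$ directly, rather than forming its inverse, is the standard numerically stable way to evaluate (2).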
We now define the plug-in Gaussian process for differential operators. For simplicity we focus on the one-dimensional case where $\mathcal{X} = [0,1]$ throughout the paper, but remark that the studied plug-in strategy can be extended to multivariate cases straightforwardly, despite more complicated notation for high-order derivatives.
For any $k \in \mathbb{N}$, define the $k$-th differential operator $D_k: C^k[0,1] \to C[0,1]$ by $D_k(f) = f^{(k)}$. If $K \in C^{2k}([0,1],[0,1])$, then the posterior distribution of the derivative $f^{(k)} \mid \mathcal{D}_n$, denoted by $\Pi_{n,k}(\cdot \mid \mathcal{D}_n)$, is also a Gaussian process since differentiation is a linear operator. In particular, $f^{(k)} \mid \mathcal{D}_n \sim \mathrm{GP}(\hat{f}^{(k)}_n, \tilde{V}^k_n)$, where
\[
\hat{f}^{(k)}_n(x) = K_{k0}(x, X)[K(X, X) + n\lambda I_n]^{-1} Y,
\]
\[
\tilde{V}^k_n(x, x') = \sigma^2 (n\lambda)^{-1} \{ K_{kk}(x, x') - K_{k0}(x, X)[K(X, X) + n\lambda I_n]^{-1} K_{0k}(X, x') \}, \tag{3}
\]
with $K_{k0}(x, X) = (\partial_x^k K(x, X_i))_{i=1}^n$ and $K_{kk}(x, x') = \partial_x^k \partial_{x'}^k K(x, x')$. Then the nonparametric plug-in procedure for $D_k$ refers to using the plug-in posterior measure $\Pi_{n,k}(\cdot \mid \mathcal{D}_n)$ for inference on $D_k(f)$.
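The sketch below implements the plug-in posterior (3) for the first derivative ($k = 1$) using a squared-exponential kernel, for which the cross-derivatives $K_{10}$ and $K_{11}$ are available in closed form; the kernel, length-scale, $\lambda$, $\sigma^2$, and synthetic data are again illustrative assumptions.

```python
import numpy as np

ell = 0.2  # length-scale of the squared-exponential kernel (illustrative)

def K(x, y):
    d = np.subtract.outer(x, y)
    return np.exp(-0.5 * (d / ell) ** 2)

def K10(x, y):
    # K_{10}(x, y) = d/dx K(x, y) = -((x - y) / ell^2) K(x, y) for the SE kernel
    d = np.subtract.outer(x, y)
    return -(d / ell**2) * np.exp(-0.5 * (d / ell) ** 2)

def K11(x, y):
    # K_{11}(x, y) = d^2/(dx dy) K(x, y) = (1/ell^2 - (x - y)^2/ell^4) K(x, y)
    d = np.subtract.outer(x, y)
    return (1.0 / ell**2 - d**2 / ell**4) * np.exp(-0.5 * (d / ell) ** 2)

def plugin_derivative_posterior(X, Y, x_new, lam, sigma2):
    """Plug-in posterior mean and covariance in (3) for k = 1."""
    n = X.shape[0]
    A = K(X, X) + n * lam * np.eye(n)        # K(X, X) + n*lam*I_n
    K10_new = K10(x_new, X)                  # rows are K_{10}(x, X)
    mean = K10_new @ np.linalg.solve(A, Y)   # plug-in posterior mean of f'(x)
    # by symmetry of the SE kernel, K_{01}(X, x') equals K10_new.T
    cov = (sigma2 / (n * lam)) * (K11(x_new, x_new) - K10_new @ np.linalg.solve(A, K10_new.T))
    return mean, cov

# illustrative usage: estimate f'(x) on a grid from the synthetic data generated earlier
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 200)
Y = np.sin(2 * np.pi * X) + 0.5 * X**2 + rng.normal(0.0, 0.1, X.size)
x_grid = np.linspace(0.0, 1.0, 101)
dmean, dcov = plugin_derivative_posterior(X, Y, x_grid, lam=1e-3, sigma2=0.01)
```

For a higher derivative order $k$, one would replace $K_{10}$ and $K_{11}$ by the corresponding $k$-th order cross-derivatives $K_{k0}$ and $K_{kk}$, provided the kernel is smooth enough.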
The plug-in posterior measure $\Pi_{n,k}(\cdot \mid \mathcal{D}_n)$ has a closed-form expression for given $\lambda$ and $\sigma^2$, substantially facilitating its implementation in practice. The plug-in strategy is practically appealing but has been perceived as suboptimal for a decade (Stein, 2012; Holsclaw et al., 2013) based on heuristics. To the contrary, we will establish the optimality of plug-in GPs and uncover their adaptivity to derivative orders. Before we move on to studying large sample properties of $\Pi_{n,k}(\cdot \mid \mathcal{D}_n)$, in the next section we first take a detour to present the technical challenges in studying derivative functionals that have hampered theoretical development for this problem.
2.2 Nonparametric plug-in property and technical challenges
We note two technical challenges posed by derivative functionals: the irregularity of function derivatives at fixed points, and the nonparametric extension of such derivatives.
The first challenge is related to the "plug-in property" in the literature. The plug-in property proposed by Bickel and Ritov (2003) refers to the phenomenon that a rate-optimal nonparametric estimator also efficiently estimates some bounded linear functionals. A parallel concept has been studied in the Bayesian paradigm relying on posterior distributions and posterior contraction rates (Rivoirard and Rousseau, 2012; Castillo and Nickl, 2013; Castillo and Rousseau, 2015). However, function derivatives may not fall into the classical plug-in property framework. To see this, let $D_t(f) = f'(t)$ be the functional that maps $f$ to its derivative at a fixed point $t \in [0,1]$. While it is easy to see that $D_t$ is a linear functional, the following Proposition 1 (Conway, 1994, page 13) shows that $D_t$ is not bounded.

Proposition 1. Let $t \in [0,1]$ and define $D_t: C^1[0,1] \to \mathbb{R}$ by $D_t(f) = f'(t)$. Then, there is no bounded linear functional on $L^2[0,1]$ that agrees with $D_t$ on $C^1[0,1]$.
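For intuition, a standard illustration (not taken from the paper) of why $D_t$ admits no bounded extension is to take
\[
h_j(x) = j^{-1} \sin\{ j (x - t) \}, \qquad j \in \mathbb{N},
\]
so that $h_j \in C^1[0,1]$ and $\|h_j\|_2 \le j^{-1} \to 0$, while $D_t(h_j) = h_j'(t) = 1$ for every $j$; a bounded linear functional on $L^2[0,1]$ agreeing with $D_t$ would have to send $h_j$ to $0$.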
Therefore, it appears difficult to analyze function derivatives evaluated at a fixed point, as existing work on the plug-in property typically assumes the functional to be bounded (Bickel and Ritov, 2003; Castillo and Nickl, 2013; Castillo and Rousseau, 2015).
The second challenge is linked to the fact that the differential operator $D_k$ gives rise to function-valued functionals, or nonparametric functionals, as opposed to the real-valued functionals studied in the classical plug-in property literature. Hence, one needs to analyze the function-valued functionals uniformly over all points in the support. To distinguish the plug-in property for nonparametric functionals from its traditional counterpart for real-valued functionals, we term this property the nonparametric plug-in property.
We overcome these challenges by resting on an operator-theoretic framework (Smale and Zhou, 2005, 2007), the equivalent kernel technique, and a recent non-asymptotic analysis of nonparametric quantities (Liu and Li, 2023), and show that GPs enjoy the nonparametric plug-in property for differential operators.
2.3 Posterior contraction for function derivatives
Throughout this article, we assume the true regression function $f_0 \in C^k[0,1]$ and the covariance kernel $K \in C^{2k}([0,1],[0,1])$. Let $\{\mu_i\}_{i=1}^\infty$ and $\{\phi_i\}_{i=1}^\infty$ be the eigenvalues and eigenfunctions of the kernel $K$ such that $K(x, x') = \sum_{i=1}^\infty \mu_i \phi_i(x) \phi_i(x')$ for any $x, x' \in [0,1]$, where the eigenvalues satisfy $\mu_1 \ge \mu_2 \ge \cdots > 0$ and $\mu_i \to 0$, and the eigenfunctions form an orthonormal basis of $L^2_{p_X}[0,1]$. The existence of such an eigendecomposition is ensured by Mercer's theorem. It can also be seen that $\phi_i \in C^k[0,1]$ for all $i \in \mathbb{N}$ as $K \in C^{2k}([0,1],[0,1])$.
We make the following assumptions on the eigenfunctions of the covariance kernel.

Assumption (A). There exists $C_{k,\phi} > 0$ such that $\|\phi_i^{(k)}\|_\infty \le C_{k,\phi}\, i^k$ for any $i \in \mathbb{N}$.

Assumption (B). There exists $L_{k,\phi} > 0$ such that $|\phi_i^{(k)}(x) - \phi_i^{(k)}(x')| \le L_{k,\phi}\, i^{k+1} |x - x'|$ for all $i \in \mathbb{N}$ and any $x, x' \in [0,1]$.
We will make extensive use of the so-called equivalent kernel $\tilde{K}$ (Rasmussen and Williams, 2006, Chapter 7), which shares the same eigenfunctions as $K$ but with altered eigenvalues $\nu_i = \mu_i / (\lambda + \mu_i)$ for $i \in \mathbb{N}$, i.e., $\tilde{K}(x, x') = \sum_{i=1}^\infty \nu_i \phi_i(x) \phi_i(x')$. Note that $\tilde{K}$ is also a continuous, symmetric, and positive definite kernel.
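To make the equivalent kernel concrete, the short sketch below numerically approximates the Mercer eigenvalues of a kernel under a uniform design via a quadrature (Nyström-type) approximation and then applies the transformation $\nu_i = \mu_i/(\lambda + \mu_i)$; the squared-exponential kernel, the grid size, and the value of $\lambda$ are illustrative assumptions.

```python
import numpy as np

# Quadrature (Nystrom-type) approximation of the Mercer eigendecomposition of K on [0, 1],
# taking P_X = Uniform[0, 1]; the SE kernel, grid size, and lambda below are illustrative.
m = 400
grid = (np.arange(m) + 0.5) / m                   # quadrature nodes in [0, 1]
w = 1.0 / m                                       # quadrature weight under Uniform[0, 1]
ell = 0.2
Kmat = np.exp(-0.5 * (np.subtract.outer(grid, grid) / ell) ** 2)

# approximate eigenvalues of the integral operator f -> int_0^1 K(., x) f(x) dP_X(x)
mu = np.sort(np.linalg.eigvalsh(w * Kmat))[::-1]  # mu_1 >= mu_2 >= ...

# equivalent-kernel eigenvalues: nu_i = mu_i / (lambda + mu_i), all strictly below 1
lam = 1e-3
nu = mu / (lam + mu)
print(nu[:5])                                     # leading equivalent-kernel eigenvalues
```

The transformation shrinks eigenvalues with $\mu_i \ll \lambda$ toward zero while leaving those with $\mu_i \gg \lambda$ close to one, which underlies the role of $\tilde{K}$ in the analysis of the regularized posterior.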
Under Assumption (A), we define an $m$-th order analog of effective dimension of the kernel $K$ with