where the data $D_n = \{X_i, Y_i\}_{i=1}^n$ are independent and identically distributed samples from a distribution $P_0$ on $\mathcal{X} \times \mathbb{R}$ that is determined by $P_X$, $f_0$, and $\sigma^2$, which are respectively the marginal distribution of $X_i$, the true regression function, and the noise variance that is possibly unknown. Let $p_X$ denote the density of $P_X$ with respect to the Lebesgue measure $\mu$. Here $\mathcal{X} \subset \mathbb{R}^p$ is a compact metric space for $p \geq 1$.
We are interested in inference on the derivative functions of $f$. Derivatives emerge as a key nonparametric quantity when the rate of change of an unknown surface is of interest. Examples include surface roughness for digital terrain models, temperature or rainfall slope in meteorology, and pollution curvature for environmental data. The importance of derivatives, either as a nonparametric functional or a localized characteristic of $f$, can also be found in efficient modelling of functional data (Dai et al., 2018), shape-constrained function estimation (Riihimäki and Vehtari, 2010; Wang and Berger, 2016), and the detection of stationary points (Yu et al., 2022).
Gaussian processes (GPs) are a popular nonparametric Bayesian method in many areas such as spatially correlated data analysis (Stein, 2012; Gelfand et al., 2003; Banerjee et al., 2003), functional data analysis (Shi and Choi, 2011), and machine learning (Rasmussen and Williams, 2006); see also the excellent review article by Gelfand and Schliep (2016), which elaborates on the instrumental role GPs have played as a key ingredient across twenty years of modelling work. GPs not only provide a flexible process for unknown functions but also serve as a building block in hierarchical models for broader applications.
For function derivatives, the so-called plug-in strategy, which directly differentiates the posterior distribution under a GP prior, is practically appealing, as it allows users to employ the same prior regardless of whether the inference goal is the regression function or its derivatives. However, this plug-in estimator has been perceived as suboptimal and degraded for a decade (Stein, 2012; Holsclaw et al., 2013) based on heuristics, while a theoretical understanding has been lacking, partly owing to technical challenges posed by the irregularity and nonparametric nature of derivative functionals (see Section 2.2 for more details). As a result, plug-in GPs have received little study since, and substantially more complicated methods, which hamper easy implementation and are often restricted to one particular derivative order, have been pursued instead.
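To fix ideas, the following is a minimal numerical sketch of the plug-in strategy for the first derivative in one dimension: since differentiation is linear, differentiating the GP posterior mean of $f$ amounts to differentiating the kernel in its first argument. The squared-exponential kernel, the length-scale and noise values, and the simulated data below are illustrative assumptions for this sketch, not the setting analyzed in this article.

```python
# Hedged sketch of the plug-in strategy for the first derivative in
# one dimension. The squared-exponential kernel, its length-scale, the
# noise level, and the toy data are illustrative assumptions only.
import numpy as np

def se_kernel(x, z, ell=0.2):
    # k(x, z) = exp(-(x - z)^2 / (2 ell^2)), evaluated on grids x, z
    return np.exp(-((x[:, None] - z[None, :]) ** 2) / (2 * ell ** 2))

def se_kernel_dx(x, z, ell=0.2):
    # Partial derivative of k with respect to its first argument
    return -(x[:, None] - z[None, :]) / ell ** 2 * se_kernel(x, z, ell)

rng = np.random.default_rng(0)
n, sigma2 = 200, 0.01
X = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * X) + rng.normal(0.0, np.sqrt(sigma2), n)

# GP posterior mean of f at x is k(x, X) @ alpha; because differentiation
# is linear, the plug-in posterior mean of f' is d/dx k(x, X) @ alpha.
alpha = np.linalg.solve(se_kernel(X, X) + sigma2 * np.eye(n), y)

x_star = np.linspace(0.1, 0.9, 5)
plugin_deriv = se_kernel_dx(x_star, X) @ alpha  # plug-in estimate of f'
true_deriv = 2 * np.pi * np.cos(2 * np.pi * x_star)
print(np.round(plugin_deriv, 2))
print(np.round(true_deriv, 2))
```

The point of the sketch is that the derivative estimate reuses the same fitted GP, i.e., the same kernel hyperparameters and the same weights $\alpha$; nothing derivative-specific enters the prior.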
In this article, we study the plug-in strategy with GPs for derivative functionals by characterizing large sample properties of the plug-in posterior measure under GP priors, and we obtain the first positive result. We show that the plug-in posterior distribution, with the same choice of hyperparameter in the GP prior, concentrates at the derivative functionals of any order at a nearly minimax rate in specific examples, thus achieving a remarkable plug-in property for nonparametric functionals that has gained increasing attention recently (Yoo and Ghosal, 2016; Liu and Li, 2023). It is known that many commonly used nonparametric methods, such as smoothing splines and local polynomials, do not enjoy this property when estimating derivatives (Wahba and Wang, 1990; Charnigo et al., 2011), and the only nonparametric Bayesian method with an established plug-in property, to the best of our knowledge, is random series priors with B-splines (Yoo and Ghosal, 2016).
In recent years, the nonparametric Bayesian literature has seen remarkable adaptability of GP priors in various regression settings (van der Vaart and van Zanten, 2009; Bhattacharya et al., 2014;