Uncertainty Disentanglement with Non-stationary Heteroscedastic Gaussian Processes for Active Learning
Zeel B Patel
IIT Gandhinagar, India
Nipun Batra
IIT Gandhinagar, India
Kevin Murphy
Google, USA
Abstract
Gaussian processes are Bayesian non-parametric models used in many areas. In this work, we propose a Non-stationary Heteroscedastic Gaussian process model which can be learned with gradient-based techniques. We demonstrate the interpretability of the proposed model by separating the overall uncertainty into aleatoric (irreducible) and epistemic (model) uncertainty. We illustrate the usefulness of the derived epistemic uncertainty on active learning problems, and we demonstrate the efficacy of our model with various ablations on multiple datasets.
1 Introduction
Gaussian processes (GPs) are Bayesian non-parametric models useful for many real-world regression and classification problems. The key object required to define a GP is the kernel function $K(x, x')$, which measures the similarity of the input points. A common choice is the RBF kernel
$$K(x, x'; \theta) = \sigma \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right)$$
where $\ell$ is the length scale and $\sigma$ is the signal variance. In regression problems, we also often have observation noise with variance $\omega^2$. These three hyper-parameters, $\theta = (\ell, \sigma, \omega)$, are typically learned by minimizing the negative log marginal likelihood.
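For concreteness, a minimal numpy sketch of this stationary RBF kernel might look as follows (the function and argument names are ours, not from the paper):

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    """Stationary RBF kernel: sigma * exp(-||x - x'||^2 / (2 * l^2))."""
    # Pairwise squared Euclidean distances between rows of X1 and X2.
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-sq_dists / (2.0 * length_scale ** 2))
```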
However, this model uses a stationary kernel (which depends only on the distance between inputs) and homoscedastic noise (a constant noise variance $\omega^2$), and these assumptions might not hold in real-life applications such as environmental modeling [1, 2]. In particular, non-stationary kernels are necessary if the similarity of two inputs depends on their location in the input space. Similarly, heteroscedastic noise may be necessary if the quality of the measurements varies across space.
In this short paper, we provide a computationally efficient way to create GPs with non-stationary kernels and heteroscedastic noise by using a Gibbs kernel [3]. The Gibbs kernel can be considered a generalization of the RBF kernel in which the hyper-parameters are input-dependent, i.e., $\theta(x) = (\ell(x), \sigma(x), \omega(x))$. These three hyper-parameter functions are themselves represented by a "latent" GP, as in [4, 5]. In contrast to prior work, which uses EP or HMC, we use inducing-point approximations to speed up the computation of this latent GP (which is needed to evaluate the kernel). In addition, we show how modeling variation in all three hyper-parameters allows us to distinguish locations where the latent function value is uncertain (epistemic uncertainty) from locations where the observation noise is high (aleatoric uncertainty). This distinction is crucial for problems such as active learning and efficient sensor placement (cf. [6]).
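To see why this split falls out of the model: in exact GP regression, the predictive variance at a test point decomposes into the posterior variance of $f$ (epistemic, which shrinks as data accumulates nearby) plus the noise variance $\omega(x)^2$ (aleatoric, irreducible). A minimal numpy illustration, with names of our choosing:

```python
import numpy as np

def gp_predict_decomposed(K_nn, K_ns, k_ss_diag, y, noise_train, noise_test):
    """Exact GP prediction with variance split into epistemic + aleatoric.

    K_nn: (N, N) kernel among training inputs; K_ns: (N, S) train-vs-test;
    k_ss_diag: (S,) prior variances K_f(x*, x*); noise_*: omega(x)^2 values.
    """
    L = np.linalg.cholesky(K_nn + np.diag(noise_train))   # noisy train cov
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_ns.T @ alpha
    V = np.linalg.solve(L, K_ns)                          # (N, S)
    epistemic = k_ss_diag - np.sum(V**2, axis=0)          # posterior var of f(x*)
    aleatoric = noise_test                                # omega(x*)^2, irreducible
    return mean, epistemic, aleatoric                     # total var = their sum
```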
2022 NeurIPS Workshop on Gaussian Processes, Spatiotemporal Modeling, and Decision-making Systems.
2 Methods
2.1 Non-stationary Heteroscedastic Gaussian processes
Given observations $y \in \mathbb{R}^N$ at inputs $X \in \mathbb{R}^{N \times D}$, we assume the following model:
$$y(x) = f(x) + \varepsilon(x), \qquad \varepsilon(x) \sim \mathcal{N}(0, \omega(x)^2) \tag{1}$$
$$f(x) \sim \mathcal{GP}(0, K_f(x, x')) \tag{2}$$
where the kernel function is $K_f(x, x') = \operatorname{cov}(f(x), f(x'))$ and $\varepsilon(x)$ is zero-mean noise. We use the following non-stationary kernel function [3]:
$$K_f(x, x') = \sigma(x)\,\sigma(x') \sqrt{\frac{2\,\ell(x)\,\ell(x')}{\ell(x)^2 + \ell(x')^2}} \exp\left(-\frac{\|x - x'\|^2}{\ell(x)^2 + \ell(x')^2}\right) \tag{3}$$
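A direct numpy transcription of Eq. (3) might look as follows (scalar length scales for simplicity; `ell_fn` and `sigma_fn` are stand-ins for the latent hyper-functions described next):

```python
import numpy as np

def gibbs_kernel(X1, X2, ell_fn, sigma_fn):
    """Non-stationary Gibbs kernel, Eq. (3), with input-dependent l(x), sigma(x).

    ell_fn, sigma_fn: map an (N, D) array of inputs to (N,) positive values.
    """
    l1, l2 = ell_fn(X1)[:, None], ell_fn(X2)[None, :]     # (N, 1), (1, M)
    s1, s2 = sigma_fn(X1)[:, None], sigma_fn(X2)[None, :]
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    denom = l1**2 + l2**2
    return s1 * s2 * np.sqrt(2.0 * l1 * l2 / denom) * np.exp(-sq_dists / denom)
```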
We assume all hyper-parameters of the model, $\theta(x) = (\ell(x), \sigma(x), \omega(x))$, may be input-dependent, to allow for non-stationarity and heteroscedasticity. (Previous work [5] has shown that such a kernel is a valid positive semi-definite kernel.) We assume these "hyper-functions" $h(x)$ (where $h(\cdot)$ represents either $\ell(\cdot)$, $\sigma(\cdot)$, or $\omega(\cdot)$) are smooth, and we model them by a latent GP on the log scale:
$$\tilde{h}(x) \sim \mathcal{GP}(\mu_h, K_h(x, x'; \phi_h)) \tag{4}$$
where $\tilde{h}(\cdot) = \log h(\cdot)$. These latent GPs are characterized by a constant mean $\mu_h$ and RBF kernels with parameters $\phi_h = (\ell_h, \sigma_h)$. (We assume noise-free latent functions, so $\omega_h = 0$; in practice, we add a reasonably small jitter for numerical stability.)
To make learning the latent GPs efficient, we use a set of $M$ (shared) inducing points $\bar{X} \in \mathbb{R}^{M \times D}$, which we treat as additional hyper-parameters. Let $z_h = \tilde{h}(\bar{X})$ be the (log) outputs at these locations for hyper-function $h$. We can then infer the expected hyper-parameter value at any other location $x$ using the usual GP prediction formula for the mean:
$$\tilde{h}(x) = K_h(x, \bar{X})\, K_h(\bar{X}, \bar{X})^{-1} z_h \tag{5}$$
We can then compute $K_f(x, x')$ at any pair of inputs using Eq. (3), where $h(x) = e^{\tilde{h}(x)}$.
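A sketch of Eq. (5) in numpy (we add the small jitter mentioned above for numerical stability; a constant mean $\mu_h$ would be handled by subtracting it from $z_h$ and adding it back):

```python
import numpy as np

def predict_hyper(x, X_bar, z_h, K_h, jitter=1e-6):
    """Eq. (5): posterior mean of the log hyper-function, then exponentiate.

    X_bar: (M, D) inducing inputs; z_h: (M,) log hyper-values at X_bar;
    K_h: latent RBF kernel function. Returns h(x) on the positive scale.
    """
    K_mm = K_h(X_bar, X_bar) + jitter * np.eye(len(X_bar))
    h_tilde = K_h(x, X_bar) @ np.linalg.solve(K_mm, z_h)  # K_h(x,X)K_h(X,X)^-1 z_h
    return np.exp(h_tilde)                                # h(x) = exp(h_tilde(x))
```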
2.2 Learning the hyper-parameters
In total, we have up to $2MD + 2M + 9$ parameters: the $M$ $D$-dimensional inducing inputs $\bar{X}$, the $MD$-sized $\ell(\bar{X})$ (with ARD), the $M$-sized $\sigma(\bar{X})$ and $\omega(\bar{X})$, and the 9 latent-GP hyper-parameters $\{\mu_h, \phi_h\}$. We collect the latent-GP parameters as
$$\phi = (\mu_h, \phi_h, z_h) \tag{6}$$
and let $\phi_X = (\bar{X}, \phi)$ represent all the model parameters. We can compute a MAP type-II estimate of these parameters by minimizing the following objective using gradient descent:
$$\hat{\phi}_X = \arg\min_{\phi_X} \; -\big[\log p(y \mid X, \phi, \bar{X}) + \log p(\phi)\big] \tag{7}$$
where $p(y \mid X, \phi, \bar{X})$ is the marginal likelihood of the main GP (integrating out $f = [f(x_n)]_{n=1}^N$) and $p(\phi)$ is the prior. To encourage smoothly varying latent GPs with a large length scale and low variance, we use a Gamma(5, 1) prior for the latent length scales $\ell_h$ and a Gamma(0.5, 1) prior for the latent signal variances $\sigma_h$. To ease optimization, we use a non-centered parameterization of $z_h$: we learn a vector $\gamma_h$ with independent entries (with prior $\gamma_h \sim \mathcal{N}(0, I)$) and deterministically compute $z_h = L\gamma_h$, where $L$ is the Cholesky factor of $K_h(\bar{X}, \bar{X})$. We initialize $\phi$ by sampling from the respective priors, and initialize $\bar{X}$ by selecting $M$ points from the dataset.
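A minimal sketch of the non-centered parameterization and the Eq. (7) objective, assuming the noisy train covariance $A = K_f(X, X) + \mathrm{diag}(\omega(X)^2)$ has already been assembled from the pieces above (names are ours):

```python
import numpy as np
from scipy.stats import gamma, norm

def z_from_whitened(gamma_h, L):
    """Non-centered parameterization: z_h = L @ gamma_h with gamma_h ~ N(0, I),
    where L is the Cholesky factor of K_h(X_bar, X_bar)."""
    return L @ gamma_h

def neg_log_posterior(y, A, ell_h, sigma_h, gammas):
    """Eq. (7) objective: negative log marginal likelihood minus log priors.

    A: (N, N) noisy train covariance; ell_h, sigma_h: latent-GP
    hyper-parameters; gammas: the whitened vectors, one per hyper-function.
    """
    L = np.linalg.cholesky(A)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # NLML = 0.5 y^T A^-1 y + 0.5 log|A| + (N/2) log 2*pi
    nlml = (0.5 * y @ alpha + np.sum(np.log(np.diag(L)))
            + 0.5 * len(y) * np.log(2 * np.pi))
    log_prior = (gamma.logpdf(ell_h, a=5.0, scale=1.0)      # Gamma(5, 1) on l_h
                 + gamma.logpdf(sigma_h, a=0.5, scale=1.0)  # Gamma(0.5, 1) on sigma_h
                 + sum(norm.logpdf(g).sum() for g in gammas))  # N(0, I) on gamma_h
    return nlml - log_prior
```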
2.3 Active learning
Active learning (see, e.g., [7]) uses some measure of uncertainty to decide which points to label, so as to learn the underlying function as quickly as possible. This can also be useful for tasks such as efficient sensor placement.
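The section breaks off here in this excerpt, so the sketch below is a generic epistemic-uncertainty-sampling step, not necessarily the paper's exact acquisition rule: at each round, query the pool point with the largest epistemic variance, ignoring the aleatoric part, since no amount of extra labeling can reduce it.

```python
import numpy as np

def select_next_query(epistemic_var_pool):
    """Greedy active-learning step on epistemic uncertainty only.

    epistemic_var_pool: (P,) epistemic variances at the unlabeled pool points,
    e.g., from gp_predict_decomposed above. Returns the index to label next.
    """
    return int(np.argmax(epistemic_var_pool))
```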