Uncertainty Disentanglement with Non-stationary Heteroscedastic Gaussian Processes for Active Learning
Zeel B Patel
IIT Gandhinagar, India
Nipun Batra
IIT Gandhinagar, India
Kevin Murphy
Google, USA
Abstract
Gaussian processes are Bayesian non-parametric models used in many areas. In this work, we propose a Non-stationary Heteroscedastic Gaussian process model which can be learned with gradient-based techniques. We demonstrate the interpretability of the proposed model by separating the overall uncertainty into aleatoric (irreducible) and epistemic (model) uncertainty. We illustrate the usefulness of the derived epistemic uncertainty on active learning problems, and we demonstrate the efficacy of our model with various ablations on multiple datasets.
1 Introduction
Gaussian processes (GPs) are Bayesian non-parametric models useful for many real-world regression and classification problems. The key object required to define a GP is the kernel function $K(x, x')$, which measures the similarity of the input points. A common choice is the RBF kernel
$$K(x, x'; \theta) = \sigma \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right)$$
where $\ell$ is the length scale and $\sigma$ is the signal variance. In regression problems, we also often have observation noise with variance $\omega^2$. These three hyper-parameters, $\theta = (\ell, \sigma, \omega)$, are typically learned by minimizing the negative log marginal likelihood.
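For concreteness, a minimal numpy sketch of this stationary RBF kernel might look as follows (the function and argument names are ours, not from the paper):

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    """Stationary RBF kernel: sigma * exp(-||x - x'||^2 / (2 * l^2))."""
    # Pairwise squared Euclidean distances between rows of X1 and X2.
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-sq_dists / (2.0 * length_scale ** 2))
```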
However, this model uses a stationary kernel (which depends only on the distance between inputs) and homoscedastic noise (a constant noise variance $\omega^2$), and these assumptions might not hold in real-life applications such as environmental modeling [1, 2]. In particular, non-stationary kernels are necessary if the similarity of two inputs depends on their location in the input space. Similarly, heteroscedastic noise may be necessary if the quality of the measurements varies across space.
In this short paper, we provide a computationally efficient way to create GPs with non-stationary kernels and heteroscedastic noise by using a Gibbs kernel [3]. The Gibbs kernel can be considered a generalization of the RBF kernel in which the hyper-parameters are input-dependent, i.e., $\theta(x) = (\ell(x), \sigma(x), \omega(x))$. These three hyper-parameter functions are themselves represented by a "latent" GP, as in [4, 5]. In contrast to prior work, which uses EP or HMC, we use inducing-point approximations to speed up the computation of this latent GP (which is needed to evaluate the kernel). In addition, we show how modeling variation in all three hyper-parameters allows us to distinguish locations where the latent function value is uncertain (epistemic uncertainty) from locations where the observation noise is high (aleatoric uncertainty). This distinction is crucial for problems such as active learning and efficient sensor placement (cf. [6]).
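To see why this split falls out of the model: in exact GP regression, the predictive variance at a test point decomposes into the posterior variance of $f$ (epistemic, which shrinks as data accumulates nearby) plus the noise variance $\omega(x)^2$ (aleatoric, irreducible). A minimal numpy illustration, with names of our choosing:

```python
import numpy as np

def gp_predict_decomposed(K_nn, K_ns, k_ss_diag, y, noise_train, noise_test):
    """Exact GP prediction with variance split into epistemic + aleatoric.

    K_nn: (N, N) kernel among training inputs; K_ns: (N, S) train-vs-test;
    k_ss_diag: (S,) prior variances K_f(x*, x*); noise_*: omega(x)^2 values.
    """
    L = np.linalg.cholesky(K_nn + np.diag(noise_train))   # noisy train cov
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_ns.T @ alpha
    V = np.linalg.solve(L, K_ns)                          # (N, S)
    epistemic = k_ss_diag - np.sum(V**2, axis=0)          # posterior var of f(x*)
    aleatoric = noise_test                                # omega(x*)^2, irreducible
    return mean, epistemic, aleatoric                     # total var = their sum
```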
2022 NeurIPS Workshop on Gaussian Processes, Spatiotemporal Modeling, and Decision-making Systems.
2 Methods
2.1 Non-stationary Heteroscedastic Gaussian processes
Given observations $y \in \mathbb{R}^N$ at inputs $X \in \mathbb{R}^{N \times D}$, we assume the following model:
$$y(x) = f(x) + \varepsilon(x), \qquad \varepsilon(x) \sim \mathcal{N}(0, \omega(x)^2) \tag{1}$$
$$f(x) \sim \mathcal{GP}(0, K_f(x, x')) \tag{2}$$
where the kernel function is $K_f(x, x') = \operatorname{cov}(f(x), f(x'))$ and $\varepsilon(x)$ is zero-mean noise. We use the following non-stationary kernel function [3]:
$$K_f(x, x') = \sigma(x)\,\sigma(x') \sqrt{\frac{2\,\ell(x)\,\ell(x')}{\ell(x)^2 + \ell(x')^2}} \exp\left(-\frac{\|x - x'\|^2}{\ell(x)^2 + \ell(x')^2}\right) \tag{3}$$
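A direct numpy transcription of Eq. (3) might look as follows (scalar length scales for simplicity; `ell_fn` and `sigma_fn` are stand-ins for the latent hyper-functions described next):

```python
import numpy as np

def gibbs_kernel(X1, X2, ell_fn, sigma_fn):
    """Non-stationary Gibbs kernel, Eq. (3), with input-dependent l(x), sigma(x).

    ell_fn, sigma_fn: map an (N, D) array of inputs to (N,) positive values.
    """
    l1, l2 = ell_fn(X1)[:, None], ell_fn(X2)[None, :]     # (N, 1), (1, M)
    s1, s2 = sigma_fn(X1)[:, None], sigma_fn(X2)[None, :]
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    denom = l1**2 + l2**2
    return s1 * s2 * np.sqrt(2.0 * l1 * l2 / denom) * np.exp(-sq_dists / denom)
```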
We assume all hyper-parameters of the model, $\theta(x) = (\ell(x), \sigma(x), \omega(x))$, may be input-dependent, to allow for non-stationarity and heteroscedasticity. (Previous work [5] has shown that such a kernel is a valid positive semi-definite kernel.) We assume these "hyper-functions" $h(x)$ (where $h(\cdot)$ represents either $\ell(\cdot)$, $\sigma(\cdot)$, or $\omega(\cdot)$) are smooth, and we model them by a latent GP on the log scale:
$$\tilde{h}(x) \sim \mathcal{GP}(\mu_h, K_h(x, x'; \phi_h)) \tag{4}$$
where $\tilde{h}(\cdot) = \log h(\cdot)$. These latent GPs are characterized by a constant mean $\mu_h$ and RBF kernels with parameters $\phi_h = (\ell_h, \sigma_h)$. (We assume noise-free latent functions, so $\omega_h = 0$; in practice, we add a reasonably small jitter for numerical stability.)
To make learning the latent GPs efficient, we use a set of $M$ (shared) inducing points $\bar{X} \in \mathbb{R}^{M \times D}$, which we treat as additional hyper-parameters. Let $z_h = \tilde{h}(\bar{X})$ be the (log) outputs at these locations for hyper-function $h$. We can then infer the expected hyper-parameter value at any other location $x$ using the usual GP prediction formula for the mean:
$$\tilde{h}(x) = K_h(x, \bar{X})\, K_h(\bar{X}, \bar{X})^{-1} z_h \tag{5}$$
We can then compute $K_f(x, x')$ at any pair of inputs using Eq. (3), where $h(x) = e^{\tilde{h}(x)}$.
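A sketch of Eq. (5) in numpy (we add the small jitter mentioned above for numerical stability; a constant mean $\mu_h$ would be handled by subtracting it from $z_h$ and adding it back):

```python
import numpy as np

def predict_hyper(x, X_bar, z_h, K_h, jitter=1e-6):
    """Eq. (5): posterior mean of the log hyper-function, then exponentiate.

    X_bar: (M, D) inducing inputs; z_h: (M,) log hyper-values at X_bar;
    K_h: latent RBF kernel function. Returns h(x) on the positive scale.
    """
    K_mm = K_h(X_bar, X_bar) + jitter * np.eye(len(X_bar))
    h_tilde = K_h(x, X_bar) @ np.linalg.solve(K_mm, z_h)  # K_h(x,X)K_h(X,X)^-1 z_h
    return np.exp(h_tilde)                                # h(x) = exp(h_tilde(x))
```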
2.2 Learning the hyper-parameters
In total, we have up to $2MD + 2M + 9$ parameters: the $M$ $D$-dimensional inducing inputs $\bar{X}$, the $MD$-sized $\ell(\bar{X})$ (with ARD), the $M$-sized $\sigma(\bar{X})$ and $\omega(\bar{X})$, and the 9 latent-GP hyper-parameters $\{\mu_h, \phi_h\}$. We collect the latent-GP parameters as
$$\phi = (\mu_h, \phi_h, z_h) \tag{6}$$
and let $\phi_X = (\bar{X}, \phi)$ represent all the model parameters. We can compute a MAP type-II estimate of these parameters by minimizing the following objective using gradient descent:
$$\hat{\phi}_X = \arg\min_{\phi_X} \; -\big[\log p(y \mid X, \phi, \bar{X}) + \log p(\phi)\big] \tag{7}$$
where $p(y \mid X, \phi, \bar{X})$ is the marginal likelihood of the main GP (integrating out $f = [f(x_n)]_{n=1}^N$) and $p(\phi)$ is the prior. To encourage smoothly varying latent GPs with a large length scale and low variance, we use a Gamma(5, 1) prior for the latent length scales $\ell_h$ and a Gamma(0.5, 1) prior for the latent signal variances $\sigma_h$. To ease optimization, we use a non-centered parameterization of $z_h$: we learn a vector $\gamma_h$ with independent entries (with prior $\gamma_h \sim \mathcal{N}(0, I)$) and deterministically compute $z_h = L\gamma_h$, where $L$ is the Cholesky factor of $K_h(\bar{X}, \bar{X})$. We initialize $\phi$ by sampling from the respective priors, and initialize $\bar{X}$ by selecting $M$ points from the dataset.
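A minimal sketch of the non-centered parameterization and the Eq. (7) objective, assuming the noisy train covariance $A = K_f(X, X) + \mathrm{diag}(\omega(X)^2)$ has already been assembled from the pieces above (names are ours):

```python
import numpy as np
from scipy.stats import gamma, norm

def z_from_whitened(gamma_h, L):
    """Non-centered parameterization: z_h = L @ gamma_h with gamma_h ~ N(0, I),
    where L is the Cholesky factor of K_h(X_bar, X_bar)."""
    return L @ gamma_h

def neg_log_posterior(y, A, ell_h, sigma_h, gammas):
    """Eq. (7) objective: negative log marginal likelihood minus log priors.

    A: (N, N) noisy train covariance; ell_h, sigma_h: latent-GP
    hyper-parameters; gammas: the whitened vectors, one per hyper-function.
    """
    L = np.linalg.cholesky(A)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # NLML = 0.5 y^T A^-1 y + 0.5 log|A| + (N/2) log 2*pi
    nlml = (0.5 * y @ alpha + np.sum(np.log(np.diag(L)))
            + 0.5 * len(y) * np.log(2 * np.pi))
    log_prior = (gamma.logpdf(ell_h, a=5.0, scale=1.0)      # Gamma(5, 1) on l_h
                 + gamma.logpdf(sigma_h, a=0.5, scale=1.0)  # Gamma(0.5, 1) on sigma_h
                 + sum(norm.logpdf(g).sum() for g in gammas))  # N(0, I) on gamma_h
    return nlml - log_prior
```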
2.3 Active learning
Active learning (see, e.g., [7]) uses some measure of uncertainty to decide which points to label, so as to learn the underlying function as quickly as possible. This can also be useful for tasks such as efficient sensor placement.
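The section breaks off here in this excerpt, so the sketch below is a generic epistemic-uncertainty-sampling step, not necessarily the paper's exact acquisition rule: at each round, query the pool point with the largest epistemic variance, ignoring the aleatoric part, since no amount of extra labeling can reduce it.

```python
import numpy as np

def select_next_query(epistemic_var_pool):
    """Greedy active-learning step on epistemic uncertainty only.

    epistemic_var_pool: (P,) epistemic variances at the unlabeled pool points,
    e.g., from gp_predict_decomposed above. Returns the index to label next.
    """
    return int(np.argmax(epistemic_var_pool))
```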