Efficient Bayes Inference in Neural Networks through
Adaptive Importance Sampling
Yunshi Huanga, Emilie Chouzenouxb,∗, Víctor Elvirac, Jean-Christophe Pesquetb
aETS Montréal, Canada
bCVN, Inria Saclay, CentraleSupélec, Université Paris-Saclay, France
cUniversity of Edinburgh, UK
Abstract
Bayesian neural networks (BNNs) have received increasing interest in recent years.
In BNNs, a complete posterior distribution of the unknown weight and bias parame-
ters of the network is produced during the training stage. This probabilistic estimation
offers several advantages with respect to point-wise estimates, in particular, the ability
to provide uncertainty quantification when predicting new data. This feature, inherent
to the Bayesian paradigm, is useful in countless machine learning applications. It is
particularly appealing in areas where decision-making has a crucial impact, such as
medical healthcare or autonomous driving. The main challenge of BNNs is the com-
putational cost of the training procedure since Bayesian techniques often face a severe
curse of dimensionality. Adaptive importance sampling (AIS) is one of the most promi-
nent Monte Carlo methodologies, benefiting from sound convergence guarantees and
ease of adaptation. This work aims to show that AIS constitutes a successful approach
for designing BNNs. More precisely, we propose a novel algorithm named PMC-
net that includes an efficient adaptation mechanism, exploiting geometric information
on the complex (often multimodal) posterior distribution. Numerical results illustrate
the excellent performance and the improved exploration capabilities of the proposed
method for both shallow and deep neural networks.
Keywords: Bayesian neural networks, adaptive importance sampling, Bayesian
inference, deep learning, confidence intervals, uncertainty quantification.
∗Corresponding author
Email address: emilie.chouzenoux@centralesupelec.fr (Emilie Chouzenoux)
Preprint submitted to XXX April 14, 2023
arXiv:2210.00993v2 [cs.LG] 13 Apr 2023
1. Introduction
Deep neural networks (DNNs) are often the current state of the art for solving a
wide range of tasks in machine learning. They consist of a cascade of linear
and nonlinear operators that are usually optimized from large amounts of labeled data
using back-propagation techniques. However, this optimization procedure often relies
on ad-hoc machinery which may not lead to relevant local minima without good numer-
ical recipes. Furthermore, it provides no information regarding the uncertainty of the
obtained predictions. Yet, uncertainty is inherent in machine learning, stemming from
the noise in the data values, the statistical variability of the data distribution, the
sample selection procedure, or the imperfect nature of any developed model.
Quantifying this uncertainty is of paramount importance in a wide array of applied
fields such as self-driving cars, medicine, or forecasting. Bayesian neural network
(BNN) approaches offer a grounded theoretical framework to tackle model uncertainty
in the context of DNNs [1].
In the Bayesian inference framework, a statistical model is assumed between the
unknown parameters and the given data in order to build a posterior distribution of
those unknowns conditioned on the data. However, for most practical models, the pos-
terior distribution is not available in a closed form, mostly due to intractable integrals,
and approximations must be performed via Monte Carlo (MC) methods [2]. Impor-
tance sampling (IS) is a family of Monte Carlo methods that consists in simulating
random samples from a proposal distribution and weighting them properly with the
aim of building consistent estimators of the moments of the posterior distribution. The
performance of IS depends on the choice of the proposal distribution [3, 4, 5]. Adap-
tive IS (AIS) is an iterative version of IS where the proposal distributions are adapted
based on their performance at previous iterations [6]. In the last decade, many AIS
algorithms have been proposed in the literature [7, 8, 9, 10, 11, 12, 13]. However, two
main challenges still exist and need to be tackled. First, few AIS algorithms adapt
the scale parameter, which is problematic when the unknowns have different orders of
magnitude. For instance, the covariance matrix is adapted via robust moment matching
strategies in [13, 14]. Second, the use of the geometry of the target in the adaptation
rule has only been scarcely explored in the recent AIS literature [15, 16, 17]. On the
one hand, optimization-based schemes have been proposed to accelerate the convergence
of MCMC algorithms [18, 19, 20], such as the Metropolis-adjusted Langevin algorithm
(MALA), which combines an unadjusted Langevin algorithm (ULA) update with an
acceptance-rejection step. The performance of MALA can be further improved by a
preconditioning strategy [21, 22]. On the other hand, the recent SL-PMC algorithm [23]
is, to our knowledge, the only AIS-based method that exploits first- and second-order
information on the target to adapt both the location and scale parameters of the proposals.
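For concreteness, the following is a minimal sketch of these Langevin-type updates, assuming a generic differentiable log-target; the functions `log_target` and `grad_log_target`, and the step size `gamma`, are hypothetical placeholders and are not taken from the cited works.

```python
import numpy as np

def ula_step(theta, grad_log_target, gamma, rng):
    """One unadjusted Langevin (ULA) update: a gradient step on the
    log-target plus Gaussian noise of matching scale."""
    noise = rng.standard_normal(theta.shape)
    return theta + gamma * grad_log_target(theta) + np.sqrt(2.0 * gamma) * noise

def mala_step(theta, log_target, grad_log_target, gamma, rng):
    """One MALA update: a ULA proposal followed by a Metropolis-Hastings
    acceptance-rejection step correcting the discretization bias."""
    prop = ula_step(theta, grad_log_target, gamma, rng)

    # Log-density (up to constants) of the Gaussian ULA kernel q(x_to | x_from)
    def log_q(x_to, x_from):
        diff = x_to - x_from - gamma * grad_log_target(x_from)
        return -np.sum(diff ** 2) / (4.0 * gamma)

    log_alpha = (log_target(prop) + log_q(theta, prop)
                 - log_target(theta) - log_q(prop, theta))
    if np.log(rng.uniform()) < log_alpha:
        return prop   # accept the proposed move
    return theta      # reject and keep the current state
```

A preconditioning matrix, as discussed in [21, 22], would simply rescale both the gradient term and the injected noise in the proposal step.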
BNN inference is usually performed using the variational Bayesian technique [24,
25, 26], which consists in constructing a tractable approximation to the posterior distri-
bution (e.g., based on a mean field approximation). However, the results may be sen-
sitive to the approximation error and to initialization. Promising results have recently
been reached by using MC sampling strategies instead. Again, a key ingredient for
good performance lies in an efficient adaptation strategy, usually by relying on tools
from optimization. The stochastic gradient Langevin dynamics method from [27], a
mini-batched version of ULA, now appears able to reach state-of-the-art results
with reasonable computational cost, as illustrated in [28, 29]. One can also mention
the Hamiltonian MC sampler with local scale adaptation proposed in [30]. In [31],
dropout in the neural network is interpreted as approximate Bayesian inference in a
deep Gaussian process. In [32], the Sequential Anchored Ensembles method trains the
ensemble sequentially, starting from the previous solution, in order to reduce the
computational cost of the training process.
In this paper, we propose the first AIS algorithm for BNN inference. IS-based
methods have several advantages w.r.t. MCMC, e.g., all the generated samples are
employed in the estimation (i.e., there is no “burn-in” period) and the corresponding
adaptive schemes are more flexible (see the theoretical issues of adaptive MCMC in
[2, Section 7.6.3],[33]). In return, the challenge is to design adaptive mechanisms
for the proposal densities in order to iteratively improve the performance of the IS
estimators [6]. We develop a new strategy to efficiently adapt the proposal using a
scaled ULA step. The scaling matrix is adapted via robust covariance estimators, using
the weighted samples of AIS, thus avoiding the computation of a costly Hessian matrix.
Another novelty is the joint mean and covariance adaptation, offering the advantage
of fitting the proposal distributions locally, boosting the exploration and increasing the
performance. The most noteworthy feature of the proposed approach is its ability
to provide meaningful uncertainty quantification with a reasonable computation cost.
Numerical experiments on classification and regression problems illustrate the effi-
ciency of our method when compared to a state-of-the-art back-propagation procedure
and other BNN methods. The outline of the paper is as follows. Section 2 introduces
the problem and notation related to Bayesian inference in machine learning, and recalls
the principle of AIS with proposal adaptation. Section 3 presents the BNN inference
problem and the proposed AIS algorithm. Section 4 provides numerical results and
Section 5 concludes the paper.
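Before moving to the background material, the adaptation mechanism outlined above can be pictured as follows. This is only a rough, illustrative sketch under simplifying assumptions (a single Gaussian proposal, hypothetical step size `gamma` and regularization `eps`); it is not the PMC-net algorithm itself, which is specified in Section 3.

```python
import numpy as np

def adapt_proposal(mu, Sigma, samples, weights, grad_log_pi,
                   gamma=0.1, eps=1e-6, rng=None):
    """Illustrative joint mean/covariance adaptation of a Gaussian proposal.

    The location is moved by a scaled (preconditioned) ULA step, using the
    current covariance as preconditioner, while the scale matrix is
    re-estimated from the importance-weighted samples, so that no Hessian
    ever needs to be computed.
    """
    rng = rng or np.random.default_rng()
    d = mu.shape[0]

    # Scaled ULA move on the proposal location
    noise = rng.multivariate_normal(np.zeros(d), Sigma)
    mu_new = mu + gamma * Sigma @ grad_log_pi(mu) + np.sqrt(2.0 * gamma) * noise

    # Regularized weighted covariance estimate from the current samples
    w = weights / np.sum(weights)          # normalized importance weights
    mean_w = samples.T @ w                 # weighted mean of the samples
    centered = samples - mean_w
    Sigma_new = (centered * w[:, None]).T @ centered + eps * np.eye(d)

    return mu_new, Sigma_new
```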
2. Motivating framework and background
2.1. Bayesian inference in supervised machine learning
Supervised machine learning aims at estimating a vector of unknown parameters
$\theta \in \mathbb{R}^{d_\theta}$ from a training set of $N_{\mathrm{train}}$ input/output pairs of data
$\{x_0^{(n)}, y^{(n)}\}_{1 \le n \le N_{\mathrm{train}}} \subset \mathbb{R}^{d_x} \times \mathbb{R}^{d_y}$.
Let us denote by $X_0 \in \mathbb{R}^{d_x \times N_{\mathrm{train}}}$ and $Y \in \mathbb{R}^{d_y \times N_{\mathrm{train}}}$ the columnwise
concatenations of $\{x_0^{(n)}\}_{1 \le n \le N_{\mathrm{train}}}$ and $\{y^{(n)}\}_{1 \le n \le N_{\mathrm{train}}}$, respectively. The unknown $\theta$ is related
to $X_0$ and $Y$ through a statistical model given by the likelihood function $\ell(Y \mid \theta, X_0)$.
The prior probabilistic knowledge about the unknown is summarized in $p(\theta)$, $\theta$ being
assumed to be independent of $X_0$. In probabilistic machine learning, the goal is then to
infer the posterior distribution
$$p(\theta \mid X_0, Y) = \frac{\ell(Y \mid \theta, X_0)\, p(\theta)}{Z(X_0, Y)} =: \widetilde{\pi}(\theta) \propto \pi(\theta), \qquad (1)$$
where $\pi(\theta) := \ell(Y \mid \theta, X_0)\, p(\theta)$ and $Z = \int \pi(\theta)\, \mathrm{d}\theta$.¹

¹ We now drop $Y$ and $X_0$ in $Z$, $\pi(\theta)$, and $\widetilde{\pi}(\theta)$ to alleviate the notation.
Usually we are also interested in computing integrals of the form
$$I = \int h(\theta)\, \widetilde{\pi}(\theta)\, \mathrm{d}\theta, \qquad (2)$$
where $h$ is any integrable function w.r.t. $\widetilde{\pi}(\theta)$. However, realistic predictive models in
machine learning include non-linearities (e.g., sigmoid activation functions) and loss
functions corresponding to non-Gaussian potentials (e.g., cross-entropy). Hence, neither
Eq. (2) nor the normalizing constant $Z$ can be computed easily. In this case, we resort
to sampling methods to find approximations to the posterior distribution and get access
to the uncertainty in the estimation.
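To make the setting concrete, the sketch below shows what an unnormalized log-target $\log \pi(\theta) = \log \ell(Y \mid \theta, X_0) + \log p(\theta)$ could look like for a toy one-hidden-layer regression network with a Gaussian likelihood and an isotropic Gaussian prior; all dimensions, variances, and names are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def log_unnormalized_posterior(theta, X0, Y, sigma_lik=0.1, sigma_prior=1.0, hidden=10):
    """log pi(theta) = log likelihood(Y | theta, X0) + log prior(theta),
    for a toy one-hidden-layer regression network (illustrative only)."""
    dx, n = X0.shape
    dy = Y.shape[0]

    # Unpack the flat parameter vector theta into weights and biases
    idx = 0
    W1 = theta[idx:idx + hidden * dx].reshape(hidden, dx); idx += hidden * dx
    b1 = theta[idx:idx + hidden]; idx += hidden
    W2 = theta[idx:idx + dy * hidden].reshape(dy, hidden); idx += dy * hidden
    b2 = theta[idx:idx + dy]

    # Forward pass: tanh hidden layer, linear output
    pred = W2 @ np.tanh(W1 @ X0 + b1[:, None]) + b2[:, None]

    # Gaussian likelihood (up to additive constants) and Gaussian prior
    log_lik = -0.5 * np.sum((Y - pred) ** 2) / sigma_lik ** 2
    log_prior = -0.5 * np.sum(theta ** 2) / sigma_prior ** 2
    return log_lik + log_prior
```

Sampling methods only ever need to evaluate (and possibly differentiate) such an unnormalized quantity; the normalizing constant $Z$ is never required.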
2.2. Adaptive Importance Sampling
In the following, we briefly describe the basic importance sampling (IS) methodol-
ogy and state-of-the-art adaptive IS (AIS) algorithms.
2.2.1. Importance sampling (IS)
Importance sampling (IS) is a Monte Carlo methodology to approximate intractable
integrals. The standard IS implementation is composed of two steps. First, $K$ samples
are simulated from the so-called proposal distribution $q(\cdot)$, as $\theta_k \sim q(\theta)$, $k \in \{1, \dots, K\}$.
Second, each sample is assigned an importance weight computed as $w_k = \frac{\pi(\theta_k)}{q(\theta_k)}$,
$k \in \{1, \dots, K\}$. The targeted integral given by Eq. (2) can be approximated by
the self-normalized IS (SNIS) estimator given by
$$\widetilde{I} = \sum_{k=1}^{K} \bar{w}_k\, h(\theta_k), \qquad (3)$$
where $\bar{w}_k = w_k \big/ \sum_{j=1}^{K} w_j$ are the normalized weights. The key lies in the selection of
$q(\theta)$, which must be nonzero for every $\theta$ such that $h(\theta)\, \widetilde{\pi}(\theta) > 0$. For a generic $h(\theta)$
(or a set of such functions), a common strategy is to find the proposal $q(\theta)$ that minimizes
in some sense (e.g., the $\chi^2$ divergence [34]) the mismatch with the target $\widetilde{\pi}(\theta)$. However,
since it is usually impossible to know in advance the best proposal, adaptive mechanisms
are employed.
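The two IS steps and the SNIS estimator (3) translate almost directly into code. The sketch below assumes a Gaussian proposal and works with log-weights for numerical stability; the function names and interface are illustrative, not part of any particular library or of the proposed method.

```python
import numpy as np
from scipy.stats import multivariate_normal

def snis_estimate(log_pi, h, mu, Sigma, K=1000, rng=None):
    """Self-normalized importance sampling with a Gaussian proposal N(mu, Sigma).

    log_pi : unnormalized log-target log pi(theta)
    h      : function of theta whose posterior expectation is sought
    """
    rng = rng or np.random.default_rng()
    proposal = multivariate_normal(mean=mu, cov=Sigma)

    # Step 1: simulate K samples from the proposal
    thetas = proposal.rvs(size=K, random_state=rng)

    # Step 2: compute importance log-weights log w_k = log pi(theta_k) - log q(theta_k)
    log_w = np.array([log_pi(t) for t in thetas]) - proposal.logpdf(thetas)

    # Normalize the weights as in Eq. (3), subtracting the max for stability
    w_bar = np.exp(log_w - np.max(log_w))
    w_bar /= np.sum(w_bar)

    # SNIS estimate of I, the posterior expectation of h
    return np.sum(w_bar * np.array([h(t) for t in thetas]))
```

The quality of the estimator hinges entirely on how well $q$ matches the regions where $h(\theta)\,\widetilde{\pi}(\theta)$ is large, which is precisely what the adaptive schemes discussed next try to achieve.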