Efficient Bayes Inference in Neural Networks through
Adaptive Importance Sampling
Yunshi Huanga, Emilie Chouzenouxb,∗, Víctor Elvirac, Jean-Christophe Pesquetb
aETS Montréal, Canada
bCVN, Inria Saclay, CentraleSupélec, Université Paris-Saclay, France
cUniversity of Edinburgh, UK
Abstract
Bayesian neural networks (BNNs) have received increasing interest in recent years.
In BNNs, a complete posterior distribution of the unknown weight and bias parame-
ters of the network is produced during the training stage. This probabilistic estimation
offers several advantages with respect to point-wise estimates, in particular, the ability
to provide uncertainty quantification when predicting new data. This feature, inherent
to the Bayesian paradigm, is useful in countless machine learning applications. It is
particularly appealing in areas where decision-making has a crucial impact, such as
medical healthcare or autonomous driving. The main challenge of BNNs is the com-
putational cost of the training procedure since Bayesian techniques often face a severe
curse of dimensionality. Adaptive importance sampling (AIS) is one of the most promi-
nent Monte Carlo methodologies, benefiting from sound convergence guarantees and
ease of adaptation. This work aims to show that AIS constitutes a successful approach
for designing BNNs. More precisely, we propose a novel algorithm named PMC-
net that includes an efficient adaptation mechanism, exploiting geometric information
on the complex (often multimodal) posterior distribution. Numerical results illustrate
the excellent performance and the improved exploration capabilities of the proposed
method for both shallow and deep neural networks.
Keywords: Bayesian neural networks, adaptive importance sampling, Bayesian
inference, deep learning, confidence intervals, uncertainty quantification.
∗Corresponding author
Email address: emilie.chouzenoux@centralesupelec.fr (Emilie Chouzenoux)
Preprint submitted to XXX April 14, 2023
arXiv:2210.00993v2 [cs.LG] 13 Apr 2023
1. Introduction
Deep neural networks (DNNs) are often the current state of the art for solving a
wide range of tasks in machine learning. They consist of a cascade of linear
and nonlinear operators that are usually optimized from large amounts of labeled data
using back-propagation techniques. However, this optimization procedure often relies
on ad-hoc machinery which may not lead to relevant local minima without good numer-
ical recipes. Furthermore, it provides no information regarding the uncertainty of the
obtained predictions. Yet, uncertainty is inherent in machine learning, stemming from
the noise in the data values, the statistical variability of the data distribution, the
sample selection procedure, or the imperfect nature of any developed model.
Quantifying this uncertainty is of paramount importance in a wide array of applied
fields such as self-driving cars, medicine, or forecasting. Bayesian neural network
(BNN) approaches offer a grounded theoretical framework to tackle model uncertainty
in the context of DNNs [1].
In the Bayesian inference framework, a statistical model is assumed between the
unknown parameters and the given data in order to build a posterior distribution of
those unknowns conditioned on the data. However, for most practical models, the pos-
terior distribution is not available in a closed form, mostly due to intractable integrals,
and approximations must be performed via Monte Carlo (MC) methods [2]. Impor-
tance sampling (IS) is a family of Monte Carlo methods that consists in simulating
random samples from a proposal distribution and weighting them properly with the
aim of building consistent estimators of the moments of the posterior distribution. The
performance of IS depends on the choice of the proposal distribution [3, 4, 5]. Adap-
tive IS (AIS) is an iterative version of IS where the proposal distributions are adapted
based on their performance at previous iterations [6]. In the last decade, many AIS
algorithms have been proposed in the literature [7, 8, 9, 10, 11, 12, 13]. However, two
main challenges still exist and need to be tackled. First, few AIS algorithms adapt
the scale parameter, which is problematic when the unknowns have different orders of
magnitude. For instance, the covariance matrix is adapted via robust moment matching
strategies in [13, 14]. Second, the use of the geometry of the target in the adaptation
rule has only been scarcely explored in the recent AIS literature [15, 16, 17]. On the
one hand, optimization-based schemes have been proposed to accelerate the convergence
of MCMC algorithms [18, 19, 20], such as the Metropolis-adjusted Langevin algorithm
(MALA), which combines an unadjusted Langevin algorithm (ULA) update with an
acceptance-rejection step. The performance of MALA can be further improved by a
preconditioning strategy [21, 22]. On the other hand, the recent SL-PMC algorithm [23]
is, to our knowledge, the only AIS-based method that exploits first- and second-order
information on the target to adapt both the location and scale parameters of the proposals.
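For concreteness, the following is a minimal sketch of these Langevin-type updates, assuming a generic differentiable log-target; the functions `log_target` and `grad_log_target`, and the step size `gamma`, are hypothetical placeholders and are not taken from the cited works.

```python
import numpy as np

def ula_step(theta, grad_log_target, gamma, rng):
    """One unadjusted Langevin (ULA) update: a gradient step on the
    log-target plus Gaussian noise of matching scale."""
    noise = rng.standard_normal(theta.shape)
    return theta + gamma * grad_log_target(theta) + np.sqrt(2.0 * gamma) * noise

def mala_step(theta, log_target, grad_log_target, gamma, rng):
    """One MALA update: a ULA proposal followed by a Metropolis-Hastings
    acceptance-rejection step correcting the discretization bias."""
    prop = ula_step(theta, grad_log_target, gamma, rng)

    # Log-density (up to constants) of the Gaussian ULA kernel q(x_to | x_from)
    def log_q(x_to, x_from):
        diff = x_to - x_from - gamma * grad_log_target(x_from)
        return -np.sum(diff ** 2) / (4.0 * gamma)

    log_alpha = (log_target(prop) + log_q(theta, prop)
                 - log_target(theta) - log_q(prop, theta))
    if np.log(rng.uniform()) < log_alpha:
        return prop   # accept the proposed move
    return theta      # reject and keep the current state
```

A preconditioning matrix, as discussed in [21, 22], would simply rescale both the gradient term and the injected noise in the proposal step.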
BNN inference is usually performed using the variational Bayesian technique [24,
25, 26], which consists in constructing a tractable approximation to the posterior distri-
bution (e.g., based on a mean field approximation). However, the results may be sen-
sitive to the approximation error and to initialization. Promising results have recently
been reached by using MC sampling strategies instead. Again, a key ingredient for
good performance lies in an efficient adaptation strategy, usually by relying on tools
from optimization. The stochastic gradient Langevin dynamics method from [27], a
mini-batched version of ULA, now appears able to reach state-of-the-art results
with reasonable computational cost, as illustrated in [28, 29]. One can also mention
the Hamiltonian MC sampler with local scale adaptation proposed in [30]. In [31],
dropout in the neural network is interpreted as approximate Bayesian inference in a
deep Gaussian process. In [32], the Sequential Anchored Ensembles method trains the
ensemble sequentially, starting from the previous solution, in order to reduce the
computational cost of the training process.
In this paper, we propose the first AIS algorithm for BNN inference. IS-based
methods have several advantages w.r.t. MCMC, e.g., all the generated samples are
employed in the estimation (i.e., there is no “burn-in” period) and the corresponding
adaptive schemes are more flexible (see the theoretical issues of adaptive MCMC in
[2, Section 7.6.3],[33]). In return, the challenge is to design adaptive mechanisms
for the proposal densities in order to iteratively improve the performance of the IS
estimators [6]. We develop a new strategy to efficiently adapt the proposal using a
scaled ULA step. The scaling matrix is adapted via robust covariance estimators, using
the weighted samples of AIS, thus avoiding the computation of a costly Hessian matrix.
Another novelty is the joint mean and covariance adaptation, offering the advantage
of fitting the proposal distributions locally, boosting the exploration and increasing the
performance. The most noteworthy feature of the proposed approach is its ability
to provide meaningful uncertainty quantification with a reasonable computation cost.
Numerical experiments on classification and regression problems illustrate the effi-
ciency of our method when compared to a state-of-the-art back-propagation procedure
and other BNN methods. The outline of the paper is as follows. Section 2 introduces
the problem and notation related to Bayesian inference in machine learning, and recalls
the principle of AIS with proposal adaptation. Section 3 presents the BNN inference
problem and the proposed AIS algorithm. Section 4 provides numerical results and
Section 5 concludes the paper.
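Before moving to the background material, the adaptation mechanism outlined above can be pictured as follows. This is only a rough, illustrative sketch under simplifying assumptions (a single Gaussian proposal, hypothetical step size `gamma` and regularization `eps`); it is not the PMC-net algorithm itself, which is specified in Section 3.

```python
import numpy as np

def adapt_proposal(mu, Sigma, samples, weights, grad_log_pi,
                   gamma=0.1, eps=1e-6, rng=None):
    """Illustrative joint mean/covariance adaptation of a Gaussian proposal.

    The location is moved by a scaled (preconditioned) ULA step, using the
    current covariance as preconditioner, while the scale matrix is
    re-estimated from the importance-weighted samples, so that no Hessian
    ever needs to be computed.
    """
    rng = rng or np.random.default_rng()
    d = mu.shape[0]

    # Scaled ULA move on the proposal location
    noise = rng.multivariate_normal(np.zeros(d), Sigma)
    mu_new = mu + gamma * Sigma @ grad_log_pi(mu) + np.sqrt(2.0 * gamma) * noise

    # Regularized weighted covariance estimate from the current samples
    w = weights / np.sum(weights)          # normalized importance weights
    mean_w = samples.T @ w                 # weighted mean of the samples
    centered = samples - mean_w
    Sigma_new = (centered * w[:, None]).T @ centered + eps * np.eye(d)

    return mu_new, Sigma_new
```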
2. Motivating framework and background
2.1. Bayesian inference in supervised machine learning
Supervised machine learning aims at estimating a vector of unknown parameters
$\theta \in \mathbb{R}^{d_\theta}$ from a training set of $N_{\mathrm{train}}$ input/output pairs of data
$\{x_0^{(n)}, y^{(n)}\}_{1 \le n \le N_{\mathrm{train}}} \subset \mathbb{R}^{d_x} \times \mathbb{R}^{d_y}$.
Let us denote by $X_0 \in \mathbb{R}^{d_x \times N_{\mathrm{train}}}$ and $Y \in \mathbb{R}^{d_y \times N_{\mathrm{train}}}$ the columnwise
concatenations of $\{x_0^{(n)}\}_{1 \le n \le N_{\mathrm{train}}}$ and $\{y^{(n)}\}_{1 \le n \le N_{\mathrm{train}}}$, respectively. The unknown $\theta$ is related
to $X_0$ and $Y$ through a statistical model given by the likelihood function $\ell(Y \mid \theta, X_0)$.
The prior probabilistic knowledge about the unknown is summarized in $p(\theta)$, $\theta$ being
assumed to be independent of $X_0$. In probabilistic machine learning, the goal is then to
infer the posterior distribution
$$p(\theta \mid X_0, Y) = \frac{\ell(Y \mid \theta, X_0)\, p(\theta)}{Z(X_0, Y)} =: \widetilde{\pi}(\theta) \propto \pi(\theta), \qquad (1)$$
where $\pi(\theta) := \ell(Y \mid \theta, X_0)\, p(\theta)$ and $Z = \int \pi(\theta)\, \mathrm{d}\theta$.¹

¹ We now drop $Y$ and $X_0$ in $Z$, $\pi(\theta)$, and $\widetilde{\pi}(\theta)$ to alleviate the notation.
Usually we are also interested in computing integrals of the form
$$I = \int h(\theta)\, \widetilde{\pi}(\theta)\, \mathrm{d}\theta, \qquad (2)$$
where $h$ is any integrable function w.r.t. $\widetilde{\pi}(\theta)$. However, realistic predictive models in
machine learning include non-linearities (e.g., sigmoid activation functions) and loss
functions corresponding to non-Gaussian potentials (e.g., cross-entropy). Hence, neither
Eq. (2) nor the normalizing constant $Z$ can be computed easily. In this case, we resort
to sampling methods to find approximations to the posterior distribution and get access
to the uncertainty in the estimation.
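To make the setting concrete, the sketch below shows what an unnormalized log-target $\log \pi(\theta) = \log \ell(Y \mid \theta, X_0) + \log p(\theta)$ could look like for a toy one-hidden-layer regression network with a Gaussian likelihood and an isotropic Gaussian prior; all dimensions, variances, and names are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def log_unnormalized_posterior(theta, X0, Y, sigma_lik=0.1, sigma_prior=1.0, hidden=10):
    """log pi(theta) = log likelihood(Y | theta, X0) + log prior(theta),
    for a toy one-hidden-layer regression network (illustrative only)."""
    dx, n = X0.shape
    dy = Y.shape[0]

    # Unpack the flat parameter vector theta into weights and biases
    idx = 0
    W1 = theta[idx:idx + hidden * dx].reshape(hidden, dx); idx += hidden * dx
    b1 = theta[idx:idx + hidden]; idx += hidden
    W2 = theta[idx:idx + dy * hidden].reshape(dy, hidden); idx += dy * hidden
    b2 = theta[idx:idx + dy]

    # Forward pass: tanh hidden layer, linear output
    pred = W2 @ np.tanh(W1 @ X0 + b1[:, None]) + b2[:, None]

    # Gaussian likelihood (up to additive constants) and Gaussian prior
    log_lik = -0.5 * np.sum((Y - pred) ** 2) / sigma_lik ** 2
    log_prior = -0.5 * np.sum(theta ** 2) / sigma_prior ** 2
    return log_lik + log_prior
```

Sampling methods only ever need to evaluate (and possibly differentiate) such an unnormalized quantity; the normalizing constant $Z$ is never required.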
2.2. Adaptive Importance Sampling
In the following, we briefly describe the basic importance sampling (IS) methodol-
ogy and state-of-the-art adaptive IS (AIS) algorithms.
2.2.1. Importance sampling (IS)
Importance sampling (IS) is a Monte Carlo methodology to approximate intractable
integrals. The standard IS implementation is composed of two steps. First, $K$ samples
are simulated from the so-called proposal distribution $q(\cdot)$, as $\theta_k \sim q(\theta)$, $k \in \{1, \dots, K\}$.
Second, each sample is assigned an importance weight computed as $w_k = \frac{\pi(\theta_k)}{q(\theta_k)}$,
$k \in \{1, \dots, K\}$. The targeted integral given by Eq. (2) can be approximated by
the self-normalized IS (SNIS) estimator given by
$$\widetilde{I} = \sum_{k=1}^{K} \bar{w}_k\, h(\theta_k), \qquad (3)$$
where $\bar{w}_k = w_k \big/ \sum_{j=1}^{K} w_j$ are the normalized weights. The key lies in the selection of
$q(\theta)$, which must be nonzero for every $\theta$ such that $h(\theta)\, \widetilde{\pi}(\theta) > 0$. For a generic $h(\theta)$
(or a set of such functions), a common strategy is to find the proposal $q(\theta)$ that minimizes
in some sense (e.g., the $\chi^2$ divergence [34]) the mismatch with the target $\widetilde{\pi}(\theta)$. However,
since it is usually impossible to know in advance the best proposal, adaptive mechanisms
are employed.
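The two IS steps and the SNIS estimator (3) translate almost directly into code. The sketch below assumes a Gaussian proposal and works with log-weights for numerical stability; the function names and interface are illustrative, not part of any particular library or of the proposed method.

```python
import numpy as np
from scipy.stats import multivariate_normal

def snis_estimate(log_pi, h, mu, Sigma, K=1000, rng=None):
    """Self-normalized importance sampling with a Gaussian proposal N(mu, Sigma).

    log_pi : unnormalized log-target log pi(theta)
    h      : function of theta whose posterior expectation is sought
    """
    rng = rng or np.random.default_rng()
    proposal = multivariate_normal(mean=mu, cov=Sigma)

    # Step 1: simulate K samples from the proposal
    thetas = proposal.rvs(size=K, random_state=rng)

    # Step 2: compute importance log-weights log w_k = log pi(theta_k) - log q(theta_k)
    log_w = np.array([log_pi(t) for t in thetas]) - proposal.logpdf(thetas)

    # Normalize the weights as in Eq. (3), subtracting the max for stability
    w_bar = np.exp(log_w - np.max(log_w))
    w_bar /= np.sum(w_bar)

    # SNIS estimate of I, the posterior expectation of h
    return np.sum(w_bar * np.array([h(t) for t in thetas]))
```

The quality of the estimator hinges entirely on how well $q$ matches the regions where $h(\theta)\,\widetilde{\pi}(\theta)$ is large, which is precisely what the adaptive schemes discussed next try to achieve.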