Gradient-based Adaptive Importance Samplers
Víctor Elvira (a, ∗), Émilie Chouzenoux (b), Ömer Deniz Akyildiz (c), Luca Martino (d)
(a) School of Mathematics, University of Edinburgh, UK.
(b) CVN, INRIA Saclay, CentraleSupélec, France.
(c) Dept. of Mathematics, Imperial College London, UK.
(d) Universidad Rey Juan Carlos, Spain.
(∗) Corresponding author. Email address: victor.elvira@ed.ac.uk (Víctor Elvira)
Abstract
Importance sampling (IS) is a powerful Monte Carlo methodology for the approximation of intractable integrals, very often involving a target probability distribution. The performance of IS heavily depends on the appropriate selection of the proposal distributions from which the samples are simulated. In this paper, we propose an adaptive importance sampler, called GRAMIS, that iteratively improves the set of proposals. The algorithm exploits geometric information of the target to adapt the location and scale parameters of those proposals. Moreover, in order to allow for a cooperative adaptation, a repulsion term is introduced that favors a coordinated exploration of the state space. This translates into a more diverse exploration and a better approximation of the target via the mixture of proposals. We also provide a theoretical justification of the repulsion term. We show the good performance of GRAMIS in two problems where the target has a challenging shape and cannot be easily approximated by a standard uni-modal proposal.
Keywords: Adaptive importance sampling, Monte Carlo, Bayesian inference,
Langevin adaptation, Poisson field, Gaussian mixture
1. Introduction
The approximation of intractable integrals is a common statistical task in many
applications of science and engineering. A relevant example is the case of Bayesian
inference, arising for instance in statistical machine learning. A posterior distribution of unknown parameters is constructed by combining data (through a likelihood model) and prior information (through the prior distribution). Unfortunately, the posterior is often unavailable, typically due to the intractability of the marginal likelihood.
Monte Carlo methods have proved to be effective in those relevant problems, producing a set of random samples that can be used to approximate a target probability distribution and integrals related to it [1, 2, 3]. Importance sampling (IS) is one of the main Monte Carlo families, with solid theoretical guarantees [3, 4]. The vanilla version of IS simulates samples from the so-called proposal distribution. Then, each sample receives an associated weight, computed by taking into account the mismatch between the proposal probability density function (pdf) and the target pdf. While the IS mechanism is valid under very mild assumptions [3, 4], the efficiency of the estimators is driven by the choice of the proposal distribution. Unfortunately, this choice is a difficult task, even more so in contexts where one has access only to the evaluation of an unnormalized version of the posterior distribution, as is the case in Bayesian inference. The two main methodological directions to overcome the limitations of IS are the use of several proposals, which is known as multiple IS (MIS) [5], and the iterative adaptation of those proposals, known as adaptive importance sampling (AIS) [6].
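To make the vanilla IS mechanism concrete, the following is a minimal Python sketch, assuming a Gaussian proposal and a placeholder unnormalized target; the function names and constants are illustrative and not part of any specific algorithm in this paper.

```python
# Minimal sketch of vanilla (self-normalized) importance sampling.
# `log_target_unnorm` and the proposal below are illustrative placeholders.
import numpy as np
from scipy import stats

def log_target_unnorm(x):
    # Placeholder unnormalized log-target (a standard Gaussian here).
    return -0.5 * np.sum(x ** 2, axis=-1)

d, N = 2, 10_000
proposal = stats.multivariate_normal(mean=np.zeros(d), cov=4.0 * np.eye(d))

x = proposal.rvs(size=N, random_state=0)            # simulate from the proposal
log_w = log_target_unnorm(x) - proposal.logpdf(x)   # target/proposal mismatch
w = np.exp(log_w - np.max(log_w))                   # exponentiate stably
w_bar = w / np.sum(w)                               # self-normalized weights

# Estimate a target expectation, e.g., the mean of the first coordinate.
I_hat = np.sum(w_bar * x[:, 0])
```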
There exists a plethora of AIS algorithms, and we refer the interested reader to [6]. There are also strong theoretical guarantees for some subclasses of AIS algorithms, see, e.g., [7, 8, 9]. These AIS methods can arguably be divided into three main categories. The first category is based on sequential moment matching and includes algorithms that implement Rao-Blackwellization in the temporal estimators [10, 11], extend it to the MIS setting [12], or are based on a sequence of transformations that can be interpreted as a change of proposal [13]. A second AIS family comprises the population Monte Carlo (PMC) methods, which use resampling mechanisms to adapt the proposals [14, 15]. The PMC framework was first introduced in [16] and then extended by incorporating stochastic expectation-maximization mechanisms [17], clipping the importance weights [18, 19], improving the weighting and resampling mechanisms [20, 21], targeting the estimation of rare-event probabilities [22], or introducing optimization schemes [23].
Finally, a third category contains AIS methods with a hierarchical or layered structure. Examples of these algorithms are those that adapt the location parameters of the proposals using a Markov chain Monte Carlo (MCMC) mechanism [24, 25, 26, 27]. In this category, we also include methods that exploit geometric information about the target for the adaptation of the location parameters, leading to optimization-based adaptation schemes. In the layered mechanism, the past samples do not affect the proposal adaptation, which is instead driven by the geometric properties of the targeted density. However, there also exist hybrid mechanisms, e.g., the O-PMC framework, which implements resampling and also incorporates geometric information [23].
1.1. Contribution within the state of the art
In this paper, we propose the gradient-based adaptive multiple importance sampling (GRAMIS) method, which falls within the layered family of AIS algorithms. Its main feature is the exploitation of geometric information about the target by incorporating an optimization approach. It has been shown that geometry-based optimization mechanisms improve the convergence rate in MCMC algorithms (see the review in [28]) and in AIS methods (e.g., [23]). In the context of MCMC, the methods in [29, 30, 31] are called Metropolis-adjusted Langevin algorithms (MALA). The Langevin-based adaptation included in their MCMC proposal updates reads as a noisy gradient descent step (called the drift term) that favors the exploration of the target in areas with larger probability mass, resulting in a larger acceptance probability in the Metropolis-Hastings (MH) step. Preconditioning can be added to further improve the exploration. To do so, local curvature information about the target density is used to build a matrix that scales the drift term. The Fisher metric [32], the Hessian matrix [33, 34, 35], or a tractable approximation of it [36, 37, 38, 39] have been considered for that purpose. Within AIS, the algorithms in [40, 41] adapt the location parameters via several steps of the unadjusted Langevin algorithm (ULA) [42]. GRAMIS also connects with the Stein variational gradient descent (SVGD) algorithm [43] in the spirit of enhancing the exploratory behavior of the adaptive algorithm. One difference is that SVGD is a particle-based sampling algorithm, while GRAMIS belongs to the family of layered AIS algorithms. Thus, compared to SVGD,
GRAMIS requires less stringent convergence guarantees in the adaptive process for its
IS estimators to be consistent. Another difference is that the repulsion term in SVGD
builds upon reproducing kernel Hilbert space (RKHS) arguments, while in GRAMIS
we justify the repulsion by connecting it to Poisson fields, which also translates into a
different adaptive scheme.
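As a rough illustration of this family of updates, below is a sketch of a single Hessian-preconditioned unadjusted Langevin step on a location parameter, in the spirit of [40, 41]. It is not the GRAMIS update itself (which is described in Section 3 and additionally includes a repulsion term); the step size and the identity fallback are illustrative assumptions.

```python
# Sketch of one Hessian-preconditioned ULA step on a location parameter.
# Not the exact GRAMIS update; step size and the identity fallback are
# illustrative. `grad_log_target`/`hess_log_target` are user-supplied callables.
import numpy as np

def preconditioned_ula_step(mu, grad_log_target, hess_log_target,
                            step=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    g = grad_log_target(mu)              # drift: gradient of the log-target
    H = -hess_log_target(mu)             # negative Hessian of the log-target
    try:
        np.linalg.cholesky(H)            # fails unless H is positive definite
        P = np.linalg.inv(H)             # curvature-based preconditioner
    except np.linalg.LinAlgError:
        P = np.eye(mu.size)              # fall back to an unscaled drift
    noise = rng.multivariate_normal(np.zeros(mu.size), 2.0 * step * P)
    return mu + step * P @ g + noise     # noisy preconditioned ascent step
```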
A limitation present in most AIS algorithms is the lack of adaptation of the scale parameters of the proposals, e.g., the covariance matrices in the case of Gaussian proposals. However, suitable scale parameters are essential for an adequate performance of AIS algorithms, and the alternative to their adaptation is setting them a priori, which is in general a difficult task. The inefficiency of AIS algorithms that do not adapt the scale parameters is particularly damaging in high dimensions and when the target presents strong correlations that are unknown a priori. Covariance adaptation has been explored via robust moment-matching mechanisms in [44, 45], second-order information in [41, 23], and sample autocorrelation in [40]. The proposed GRAMIS algorithm adapts the covariance by using second-order information (when it yields a positive definite matrix). In particular, the covariance of each proposal is adapted by using the Hessian of the logarithm of the target, evaluated at the updated location parameter. The second-order information is also used to precondition the gradient in the adaptation of the location parameters.
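A minimal sketch of this kind of second-order covariance update follows; the positive-definiteness check and the fallback to the previous covariance are assumptions for illustration, as the paper's exact rule is specified in Section 3.

```python
# Sketch of a second-order covariance update: use the inverse negative
# Hessian of the log-target at the new location when it is positive
# definite. The fallback to the previous covariance is an assumption.
import numpy as np

def adapt_covariance(mu_new, hess_log_target, cov_prev):
    H = -hess_log_target(mu_new)     # negative Hessian of log-target at mu_new
    try:
        np.linalg.cholesky(H)        # raises LinAlgError if H is not pos. def.
        return np.linalg.inv(H)      # local curvature-matched covariance
    except np.linalg.LinAlgError:
        return cov_prev              # Hessian unusable: keep the old scale
```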
Another limitation of AIS algorithms is the lack of cooperation (or insufficient cooperation) among the multiple proposals at the adaptation stage. Some of the few algorithms that implement a type of cooperation can be found in [17], through a probabilistic clustering of all samples, and in [20], through a resampling that uses the deterministic mixture weights. In this paper, we implement an explicit repulsion between the proposals during the adaptation stage in order to improve the exploration capability of the algorithm.
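For intuition, below is a sketch of one possible pairwise repulsion term between proposal locations, using the Poisson-field-style decay anticipated in the footnote (pairwise direction scaled by the distance raised to the dimension); the `strength` constant is a hypothetical hyper-parameter, not the paper's exact rule.

```python
# Sketch of a pairwise repulsion between proposal locations, with a
# Poisson-field-style decay; `strength` is a hypothetical hyper-parameter.
import numpy as np

def repulsion_forces(mus, strength=1.0, eps=1e-12):
    N, d = mus.shape
    forces = np.zeros_like(mus)
    for i in range(N):
        diff = mus[i] - mus                  # vectors pointing away from mu_j
        dist = np.linalg.norm(diff, axis=1)  # pairwise distances (0 for j = i)
        mask = dist > eps                    # exclude the i = j term
        forces[i] = strength * np.sum(
            diff[mask] / dist[mask, None] ** d, axis=0
        )                                    # magnitude decays as 1/dist**(d-1)
    return forces
```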
Finally, GRAMIS implements the balance-heuristic (also called deterministic mixture) importance weights [46, 47, 48], which have been shown to be theoretically superior (for the unnormalized IS estimators) [5] and to deliver superior performance in other types of AIS algorithms (e.g., in [49, 20, 23]).¹
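For concreteness, below is a sketch of these deterministic-mixture weights, assuming an equally weighted mixture of proposals; the names and the use of frozen scipy distributions are illustrative conventions, not the paper's implementation.

```python
# Sketch of deterministic-mixture (balance heuristic) weights: each pooled
# sample is weighted against the full, equally weighted mixture of proposals
# rather than only against the proposal that generated it.
import numpy as np
from scipy.special import logsumexp

def dm_weights(x, log_target_unnorm, proposals):
    # x: (N, d) samples pooled from all proposals; proposals: list of frozen
    # scipy distributions q_1, ..., q_J.
    log_q = np.stack([q.logpdf(x) for q in proposals])           # (J, N)
    log_mix = logsumexp(log_q, axis=0) - np.log(len(proposals))  # log mixture
    return np.exp(log_target_unnorm(x) - log_mix)                # pi(x)/psi(x)
```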
1.2. Notation
We denote by $\|\mathbf{x}\| = \sqrt{\mathbf{x}^\top \mathbf{x}}$ the Euclidean norm of $\mathbf{x} \in \mathbb{R}^d$, where $\top$ denotes the transpose operation and $\mathbb{R}^d$ is the $d$-dimensional Euclidean space. $\nabla$ and $\nabla^2$ denote the first and second differential operators in $\mathbf{x}$, i.e., resulting in the gradient vector and the Hessian matrix, respectively. The notation $\boldsymbol{\Sigma} \succ 0$ is used to describe $\boldsymbol{\Sigma}$ as a positive definite matrix. $\mathbf{I}_d$ is the identity matrix of dimension $d$. Bold symbols are used for vectors and matrices. We use the $p$ notation for probability density functions (pdf) only when it is unambiguous.
1.3. Structure of the paper
The rest of the paper is structured as follows. Section 2 introduces the problem and
relevant background. In Section 3, we describe the GRAMIS algorithm. We provide
numerical examples in Section 4. Section 5 closes the paper with some conclusions.
2. Background in importance sampling
We motivate importance sampling (IS) by considering the Bayesian inference setup, where the goal is to characterize the posterior distribution
$$\widetilde{\pi}(\mathbf{x} \mid \mathbf{y}) = \frac{\ell(\mathbf{y} \mid \mathbf{x})\, \pi_0(\mathbf{x})}{Z(\mathbf{y})}, \qquad (1)$$
where $\ell(\mathbf{y} \mid \mathbf{x})$ is the likelihood, $\pi_0(\mathbf{x})$ is the prior, and $Z(\mathbf{y})$ is the marginal likelihood.
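Since $Z(\mathbf{y})$ is typically intractable, IS estimators in this setting are self-normalized; for reference, a standard form (generic self-normalized IS, not specific to GRAMIS) is
$$\widehat{I} = \sum_{n=1}^{N} \bar{w}_n\, f(\mathbf{x}_n), \qquad \bar{w}_n = \frac{w_n}{\sum_{j=1}^{N} w_j}, \qquad w_n = \frac{\ell(\mathbf{y} \mid \mathbf{x}_n)\, \pi_0(\mathbf{x}_n)}{q(\mathbf{x}_n)}, \qquad \mathbf{x}_n \sim q(\mathbf{x}),$$
where $q$ is the proposal pdf and $f$ is the integrand of interest.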
¹The GRAMIS algorithm builds upon the GAPIS algorithm, which was presented in the conference paper [50]. In GRAMIS, on top of updating the location and scale parameters of the proposals by incorporating a repulsion term and using first- and second-order information of the target, we go beyond our preliminary work in the three following ways. First, we precondition the gradient steps in the adaptation of the location parameters by using second-order information. In this way, we obtain a much more efficient adaptation, and we remove one hyper-parameter with respect to the GAPIS algorithm. Second, we find a strong connection with Poisson fields, which supports the theoretical justification of the algorithm and opens the door to further analysis and methodological improvements. Finally, based on this connection, we crucially modify the repulsion term between proposals (e.g., exponentiating the distance between proposals to the dimension of the latent space), which allows the method to scale with the dimension.