Gradient-based Adaptive Importance Samplers
Víctor Elvira (a, ∗), Émilie Chouzenoux (b), Ömer Deniz Akyildiz (c), Luca Martino (d)
(a) School of Mathematics, University of Edinburgh, UK.
(b) CVN, INRIA Saclay, CentraleSupélec, France.
(c) Dept. of Mathematics, Imperial College London, UK.
(d) Universidad Rey Juan Carlos, Spain.
(∗) Corresponding author. Email address: victor.elvira@ed.ac.uk (Víctor Elvira)
Abstract
Importance sampling (IS) is a powerful Monte Carlo methodology for the approximation of intractable integrals, very often involving a target probability distribution. The performance of IS heavily depends on the appropriate selection of the proposal distributions from which the samples are simulated. In this paper, we propose an adaptive importance sampler, called GRAMIS, that iteratively improves the set of proposals. The algorithm exploits geometric information of the target to adapt the location and scale parameters of those proposals. Moreover, in order to allow for a cooperative adaptation, a repulsion term is introduced that favors a coordinated exploration of the state space. This translates into a more diverse exploration and a better approximation of the target via the mixture of proposals. We also provide a theoretical justification of the repulsion term. We show the good performance of GRAMIS in two problems where the target has a challenging shape and cannot be easily approximated by a standard uni-modal proposal.
Keywords: Adaptive importance sampling, Monte Carlo, Bayesian inference,
Langevin adaptation, Poisson field, Gaussian mixture
1. Introduction
The approximation of intractable integrals is a common statistical task in many
applications of science and engineering. A relevant example is the case of Bayesian
inference, arising for instance in statistical machine learning. A posterior distribution of unknown parameters is constructed by combining data (through a likelihood model) and prior information (through the prior distribution). Unfortunately, the posterior is often unavailable, typically due to the intractability of the marginal likelihood.
Monte Carlo methods have proved to be effective in those relevant problems, producing a set of random samples that can be used to approximate a target probability distribution and integrals related to it [1, 2, 3]. Importance sampling (IS) is one of the main Monte Carlo families, with solid theoretical guarantees [3, 4]. The vanilla version of IS simulates samples from the so-called proposal distribution. Then, each sample receives an associated weight, computed by taking into account the mismatch between the proposal probability density function (pdf) and the target pdf. While the IS mechanism is valid under very mild assumptions [3, 4], the efficiency of the estimators is driven by the choice of the proposal distribution. Unfortunately, this choice is a difficult task, even more so in contexts where one has access only to the evaluation of an unnormalized version of the posterior distribution, as is the case in Bayesian inference. The two main methodological directions to overcome the limitations of IS are the use of several proposals, which is known as multiple IS (MIS) [5], and the iterative adaptation of those proposals, known as adaptive importance sampling (AIS) [6].
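To make the vanilla IS mechanism concrete, the following is a minimal Python sketch, assuming a Gaussian proposal and a placeholder unnormalized target; the function names and constants are illustrative and not part of any specific algorithm in this paper.

```python
# Minimal sketch of vanilla (self-normalized) importance sampling.
# `log_target_unnorm` and the proposal below are illustrative placeholders.
import numpy as np
from scipy import stats

def log_target_unnorm(x):
    # Placeholder unnormalized log-target (a standard Gaussian here).
    return -0.5 * np.sum(x ** 2, axis=-1)

d, N = 2, 10_000
proposal = stats.multivariate_normal(mean=np.zeros(d), cov=4.0 * np.eye(d))

x = proposal.rvs(size=N, random_state=0)            # simulate from the proposal
log_w = log_target_unnorm(x) - proposal.logpdf(x)   # target/proposal mismatch
w = np.exp(log_w - np.max(log_w))                   # exponentiate stably
w_bar = w / np.sum(w)                               # self-normalized weights

# Estimate a target expectation, e.g., the mean of the first coordinate.
I_hat = np.sum(w_bar * x[:, 0])
```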
There exists a plethora of AIS algorithms, and we refer the interested reader to [6]. There are also strong theoretical guarantees for some subclasses of AIS algorithms, see, e.g., [7, 8, 9]. These AIS methods can arguably be divided into three main categories. The first category is based on sequential moment matching and includes algorithms that implement Rao-Blackwellization in the temporal estimators [10, 11], extend it to the MIS setting [12], or are based on a sequence of transformations that can be interpreted as a change of proposal [13]. A second AIS family comprises the population Monte Carlo (PMC) methods, which use resampling mechanisms to adapt the proposals [14, 15]. The PMC framework was first introduced in [16] and then extended by incorporating stochastic expectation-maximization mechanisms [17], clipping the importance weights [18, 19], improving the weighting and resampling mechanisms [20, 21], targeting the estimation of rare-event probabilities [22], or introducing optimization schemes [23].
Finally, a third category contains AIS methods with a hierarchical or layered structure. Examples of these algorithms are those that adapt the location parameters of the proposals using a Markov chain Monte Carlo (MCMC) mechanism [24, 25, 26, 27]. In this category, we also include methods that exploit geometric information about the target for the adaptation of the location parameters, leading to optimization-based adaptation schemes. In the layered mechanism, the past samples do not affect the proposal adaptation, which is instead driven by the geometric properties of the targeted density. However, there also exist hybrid mechanisms, e.g., the O-PMC framework, which implements resampling and also incorporates geometric information [23].
1.1. Contribution within the state of the art
In this paper, we propose the gradient-based adaptive multiple importance sampling (GRAMIS) method, which falls within the layered family of AIS algorithms. Its main feature is the exploitation of geometric information about the target by incorporating an optimization approach. It has been shown that geometry-based optimization mechanisms improve the convergence rate in MCMC algorithms (see the review in [28]) and in AIS methods (e.g., [23]). In the context of MCMC, the methods in [29, 30, 31] are called Metropolis-adjusted Langevin algorithms (MALA). The Langevin-based adaptation included in their MCMC proposal updates reads as a noisy gradient descent step (called the drift term) that favors the exploration of the target in areas with larger probability mass, resulting in a larger acceptance probability in the Metropolis-Hastings (MH) step. Preconditioning can be added to further improve the exploration. To do so, local curvature information about the target density is used to build a matrix that scales the drift term. The Fisher metric [32], the Hessian matrix [33, 34, 35], or a tractable approximation of it [36, 37, 38, 39] have been considered for that purpose. Within AIS, the algorithms in [40, 41] adapt the location parameters via several steps of the unadjusted Langevin algorithm (ULA) [42]. GRAMIS also connects with the Stein variational gradient descent (SVGD) algorithm [43] in the spirit of enhancing the exploratory behavior of the adaptive algorithm. One difference is that SVGD is a particle-based sampling algorithm, while GRAMIS belongs to the family of layered AIS algorithms. Thus, compared to SVGD,
GRAMIS requires less stringent convergence guarantees in the adaptive process for its
IS estimators to be consistent. Another difference is that the repulsion term in SVGD
builds upon reproducing kernel Hilbert space (RKHS) arguments, while in GRAMIS
we justify the repulsion by connecting it to Poisson fields, which also translates into a
different adaptive scheme.
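As a rough illustration of this family of updates, below is a sketch of a single Hessian-preconditioned unadjusted Langevin step on a location parameter, in the spirit of [40, 41]. It is not the GRAMIS update itself (which is described in Section 3 and additionally includes a repulsion term); the step size and the identity fallback are illustrative assumptions.

```python
# Sketch of one Hessian-preconditioned ULA step on a location parameter.
# Not the exact GRAMIS update; step size and the identity fallback are
# illustrative. `grad_log_target`/`hess_log_target` are user-supplied callables.
import numpy as np

def preconditioned_ula_step(mu, grad_log_target, hess_log_target,
                            step=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    g = grad_log_target(mu)              # drift: gradient of the log-target
    H = -hess_log_target(mu)             # negative Hessian of the log-target
    try:
        np.linalg.cholesky(H)            # fails unless H is positive definite
        P = np.linalg.inv(H)             # curvature-based preconditioner
    except np.linalg.LinAlgError:
        P = np.eye(mu.size)              # fall back to an unscaled drift
    noise = rng.multivariate_normal(np.zeros(mu.size), 2.0 * step * P)
    return mu + step * P @ g + noise     # noisy preconditioned ascent step
```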
A limitation present in most AIS algorithms is the lack of adaptation of the scale parameters of the proposals, e.g., the covariance matrices in the case of Gaussian proposals. However, suitable scale parameters are essential for an adequate performance of AIS algorithms, and the alternative to their adaptation is setting them a priori, which is in general a difficult task. The inefficiency of AIS algorithms that do not adapt the scale parameters is particularly damaging in high dimensions and when the target presents strong correlations that are unknown a priori. Covariance adaptation has been explored via robust moment-matching mechanisms in [44, 45], second-order information in [41, 23], and sample autocorrelation in [40]. The proposed GRAMIS algorithm adapts the covariance by using second-order information (when it yields a positive definite matrix). In particular, the covariance of each proposal is adapted by using the Hessian of the logarithm of the target, evaluated at the updated location parameter. The second-order information is also used to precondition the gradient in the adaptation of the location parameters.
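A minimal sketch of this kind of second-order covariance update follows; the positive-definiteness check and the fallback to the previous covariance are assumptions for illustration, as the paper's exact rule is specified in Section 3.

```python
# Sketch of a second-order covariance update: use the inverse negative
# Hessian of the log-target at the new location when it is positive
# definite. The fallback to the previous covariance is an assumption.
import numpy as np

def adapt_covariance(mu_new, hess_log_target, cov_prev):
    H = -hess_log_target(mu_new)     # negative Hessian of log-target at mu_new
    try:
        np.linalg.cholesky(H)        # raises LinAlgError if H is not pos. def.
        return np.linalg.inv(H)      # local curvature-matched covariance
    except np.linalg.LinAlgError:
        return cov_prev              # Hessian unusable: keep the old scale
```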
Another limitation of AIS algorithms is the lack of cooperation (or insufficient cooperation) among the multiple proposals at the adaptation stage. Some of the few algorithms that implement a type of cooperation can be found in [17], through a probabilistic clustering of all samples, and in [20], through a resampling that uses the deterministic mixture weights. In this paper, we implement an explicit repulsion between the proposals during the adaptation stage in order to improve the exploration capability of the algorithm.
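For intuition, below is a sketch of one possible pairwise repulsion term between proposal locations, using the Poisson-field-style decay anticipated in the footnote (pairwise direction scaled by the distance raised to the dimension); the `strength` constant is a hypothetical hyper-parameter, not the paper's exact rule.

```python
# Sketch of a pairwise repulsion between proposal locations, with a
# Poisson-field-style decay; `strength` is a hypothetical hyper-parameter.
import numpy as np

def repulsion_forces(mus, strength=1.0, eps=1e-12):
    N, d = mus.shape
    forces = np.zeros_like(mus)
    for i in range(N):
        diff = mus[i] - mus                  # vectors pointing away from mu_j
        dist = np.linalg.norm(diff, axis=1)  # pairwise distances (0 for j = i)
        mask = dist > eps                    # exclude the i = j term
        forces[i] = strength * np.sum(
            diff[mask] / dist[mask, None] ** d, axis=0
        )                                    # magnitude decays as 1/dist**(d-1)
    return forces
```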
Finally, GRAMIS implements the balance-heuristic (also called deterministic mixture) importance weights [46, 47, 48], which have been shown to be theoretically superior (for the unnormalized IS estimators) [5] and to deliver superior performance in other types of AIS algorithms (e.g., in [49, 20, 23]).¹
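For concreteness, below is a sketch of these deterministic-mixture weights, assuming an equally weighted mixture of proposals; the names and the use of frozen scipy distributions are illustrative conventions, not the paper's implementation.

```python
# Sketch of deterministic-mixture (balance heuristic) weights: each pooled
# sample is weighted against the full, equally weighted mixture of proposals
# rather than only against the proposal that generated it.
import numpy as np
from scipy.special import logsumexp

def dm_weights(x, log_target_unnorm, proposals):
    # x: (N, d) samples pooled from all proposals; proposals: list of frozen
    # scipy distributions q_1, ..., q_J.
    log_q = np.stack([q.logpdf(x) for q in proposals])           # (J, N)
    log_mix = logsumexp(log_q, axis=0) - np.log(len(proposals))  # log mixture
    return np.exp(log_target_unnorm(x) - log_mix)                # pi(x)/psi(x)
```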
1.2. Notation
We denote by $\|\mathbf{x}\| = \sqrt{\mathbf{x}^\top \mathbf{x}}$ the Euclidean norm of $\mathbf{x} \in \mathbb{R}^d$, where $\top$ denotes the transpose operation and $\mathbb{R}^d$ is the $d$-dimensional Euclidean space. $\nabla$ and $\nabla^2$ denote the first and second differential operators in $\mathbf{x}$, i.e., resulting in the gradient vector and the Hessian matrix, respectively. The notation $\boldsymbol{\Sigma} \succ 0$ is used to describe $\boldsymbol{\Sigma}$ as a positive definite matrix. $\mathbf{I}_d$ is the identity matrix of dimension $d$. Bold symbols are used for vectors and matrices. We use the $p$ notation for probability density functions (pdf) only when it is unambiguous.
1.3. Structure of the paper
The rest of the paper is structured as follows. Section 2 introduces the problem and
relevant background. In Section 3, we describe the GRAMIS algorithm. We provide
numerical examples in Section 4. Section 5 closes the paper with some conclusions.
2. Background in importance sampling
We motivate importance sampling (IS) by considering the Bayesian inference setup, where the goal is to characterize the posterior distribution
$$\widetilde{\pi}(\mathbf{x} \mid \mathbf{y}) = \frac{\ell(\mathbf{y} \mid \mathbf{x})\, \pi_0(\mathbf{x})}{Z(\mathbf{y})}, \qquad (1)$$
where $\ell(\mathbf{y} \mid \mathbf{x})$ is the likelihood, $\pi_0(\mathbf{x})$ is the prior, and $Z(\mathbf{y})$ is the marginal likelihood.
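Since $Z(\mathbf{y})$ is typically intractable, IS estimators in this setting are self-normalized; for reference, a standard form (generic self-normalized IS, not specific to GRAMIS) is
$$\widehat{I} = \sum_{n=1}^{N} \bar{w}_n\, f(\mathbf{x}_n), \qquad \bar{w}_n = \frac{w_n}{\sum_{j=1}^{N} w_j}, \qquad w_n = \frac{\ell(\mathbf{y} \mid \mathbf{x}_n)\, \pi_0(\mathbf{x}_n)}{q(\mathbf{x}_n)}, \qquad \mathbf{x}_n \sim q(\mathbf{x}),$$
where $q$ is the proposal pdf and $f$ is the integrand of interest.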
¹The GRAMIS algorithm builds upon the GAPIS algorithm, which was presented in the conference paper [50]. In GRAMIS, on top of updating the location and scale parameters of the proposals by incorporating a repulsion term and using first- and second-order information of the target, we go beyond our preliminary work in the three following ways. First, we precondition the gradient steps in the adaptation of the location parameters by using second-order information. In this way, we obtain a much more efficient adaptation, and we remove one hyper-parameter with respect to the GAPIS algorithm. Second, we find a strong connection with Poisson fields, which supports the theoretical justification of the algorithm and opens the door to further analysis and methodological improvements. Finally, based on this connection, we crucially modify the repulsion term between proposals (e.g., exponentiating the distance between proposals to the dimension of the latent space), which allows the method to scale with the dimension.