MULTI-FIDELITY MONTE CARLO:
A PSEUDO-MARGINAL APPROACH
By Diana Cai and Ryan P. Adams
Princeton University
Markov chain Monte Carlo (MCMC) is an established approach
for uncertainty quantification and propagation in scientific applica-
tions. A key challenge in applying MCMC to scientific domains is
computation: the target density of interest is often a function of ex-
pensive computations, such as a high-fidelity physical simulation, an
intractable integral, or a slowly-converging iterative algorithm. Thus,
using MCMC algorithms with an expensive target density becomes
impractical, as these expensive computations need to be evaluated
at each iteration of the algorithm. In practice, these computations are
often approximated via a cheaper, low-fidelity computation, leading to
bias in the resulting target density. Multi-fidelity MCMC algorithms
combine models of varying fidelities in order to obtain an approximate
target density with lower computational cost. In this paper, we de-
scribe a class of asymptotically exact multi-fidelity MCMC algorithms
for the setting where a sequence of models of increasing fidelity can
be computed that approximates the expensive target density of inter-
est. We take a pseudo-marginal MCMC approach for multi-fidelity
inference that utilizes a cheaper, randomized-fidelity unbiased esti-
mator of the perfect-fidelity target density, constructed via random truncation of a
telescoping series of the low-fidelity sequence of models. Finally, we
discuss and evaluate the proposed multi-fidelity MCMC approach on
several applications, including log-Gaussian Cox process modeling,
Bayesian ODE system identification, PDE-constrained optimization,
and Gaussian process regression parameter inference.
1. Introduction.
Simulation and computational modeling play a key role in science, engineering, economics,
and many other areas. When these models are high-quality and accurate,
they are important for scientific discovery, design, and data-driven decision making. However,
the ability to accurately model complex physical phenomena often comes with a significant
cost—many models involve expensive computations that then need to be evaluated repeatedly in,
for instance, a sampling or optimization algorithm. Examples of model classes with expensive
computations include intractable integrals or sums, expensive quantum simulations (Troyer and
Wiese, 2005), expensive numerical simulations arising from partial differential equations (PDEs)
(Raissi et al., 2017), and large systems of ordinary differential equations (ODEs).
In many situations, one has the ability to trade off computational cost against fidelity or
accuracy in the result. Such a tradeoff might arise from the choice of discretization or the number
of basis functions when solving a PDE, or the number of quadrature points when estimating
an integral. It is often possible to leverage lower-fidelity models to help accelerate high-quality
solutions, e.g., by using multigrid methods (Hackbusch,2013) for spatial discretizations. More
generally, multi-fidelity methods combine multiple models of varying cost and fidelity to accelerate
computational algorithms and have been applied to solving inverse problems (Higdon et al.,2002;
Cui et al.,2015;Raissi et al.,2017), trust region optimization (Alexandrov et al.,1998;Arian
Keywords and phrases: Markov Chain Monte Carlo, multi-fidelity models, inverse modeling, simulation
arXiv:2210.01534v1 [stat.ML] 4 Oct 2022
Fig 1: Examples of low-fidelity sequences of models. (a) A sequence of trapezoid quadrature estimates I_k, where I_k is the trapezoid rule with 2^k trapezoids (k = 1, 2, 3). (b) Lotka-Volterra ODE solutions for prey u(t) (blue) and predator v(t) (red) using Euler's method with step size dt = 1/k.
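The hierarchy in Fig 1(a) is easy to reproduce. Below is a minimal sketch of the trapezoid sequence I_k, where each refinement doubles the number of trapezoids; the integrand sin on [0, π] is an arbitrary illustrative choice, not one used in the paper:

```python
import numpy as np

def trapezoid_estimate(f, a, b, k):
    """I_k: composite trapezoid rule with 2**k trapezoids on [a, b]."""
    x = np.linspace(a, b, 2 ** k + 1)          # 2**k subintervals
    y = f(x)
    h = (b - a) / 2 ** k
    return h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

# Fidelity increases with k: each I_k is cheap, and I_k -> integral as k grows.
estimates = [trapezoid_estimate(np.sin, 0.0, np.pi, k) for k in range(1, 9)]
errors = [abs(I - 2.0) for I in estimates]     # exact integral of sin on [0, pi] is 2
```

Each halving of the step size roughly quarters the error, giving a natural cost/fidelity ladder.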
et al.,2000;Fahl and Sachs,2003;Robinson et al.,2008;March and Willcox,2012), Bayesian
optimization (Jones et al.,1998;Gramacy and Lee,2009;Song et al.,2019;Wu et al.,2020;Li
et al.,2020;Brevault et al.,2020), Bayesian quadrature (Gessner et al.,2020;Xi et al.,2018),
and sequential learning (Gundersen et al.,2021;Palizhati et al.,2022).
One critically important tool for scientific and engineering computation is Markov chain Monte
Carlo (MCMC), which is widely used for uncertainty quantification, optimization, and integration.
MCMC methods are recipes for constructing a Markov chain with some desired target distribution
as the limiting distribution. Pseudo-random numbers are used to simulate transitions of the
Markov chain in order to produce samples from the target distribution. However, MCMC often
becomes impractical for high-fidelity models, where a single step of the Markov chain may, for
instance, involve a numerical simulation that takes hours or days to complete. Multi-fidelity
methods for MCMC focus on constructing Markov chain transition operators that are sometimes
able to use inexpensive low-fidelity evaluations instead of expensive high-fidelity evaluations.
The goal is to increase the effective number of samples generated by the algorithm, given a
constrained computational budget. A large focus of the multi-fidelity MCMC literature is on
two-stage Metropolis-Hastings (M-H) methods (Christen and Fox,2005;Efendiev et al.,2006),
which use a single low-fidelity model for early rejection of a proposed sample, thereby often
short-circuiting the evaluation of the expensive, high-fidelity model.
However, there are several limitations of two-stage multi-fidelity Monte Carlo. First, in many
applications, a hierarchy of cheaper, low-fidelity models is available; for instance, in the case of
integration, k-point quadrature estimates form a hierarchy of low-fidelity models, and in the case
of a PDE, discretizations of varying resolution do. Thus, the two-stage approach does not fully utilize the
availability of a hierarchy of fidelities and may be more suitable for settings where the high- and
low-fidelity models are not hierarchically related, e.g., semi-empirical methods vs. Hartree-Fock
in computational chemistry. In addition, in such applications, there is often a limiting model of
interest, such as a continuous function that the low-fidelity discretizations approximate. Two-stage
MCMC does not asymptotically sample from this limiting target density and will at best sample
from an approximation of the biased, high-fidelity posterior. Finally, the two-stage method is
unnatural to generalize to more sophisticated MCMC algorithms such as slice sampling and
Hamiltonian Monte Carlo (HMC).
We propose a class of multi-fidelity MCMC methods designed for applications with a hierarchy
of low-fidelity models available. More specifically, we assume access to a sequence of low-fidelity
models that converge to a “perfect-fidelity” model in the limit. Within an MCMC algorithm, we
can approximate the perfect-fidelity target density with an unbiased estimator constructed from a
randomized truncation of the infinite telescoping series of low-fidelity target densities. This class
of multi-fidelity MCMC is an example of a pseudo-marginal MCMC (PM-MCMC) algorithm—the
unbiased estimator essentially guarantees that the algorithm is asymptotically exact in that the
limiting distribution recovers the perfect-fidelity target distribution as its marginal distribution.
Our approach introduces the fidelity of a model as an auxiliary random variable that is evolved
separately from the target variable according to its own conditional target distribution; this
technique can be used in conjunction with any suitable MCMC update that leaves the conditional
update for the target variable of interest invariant, such as M-H, slice sampling, elliptical slice
sampling, or Hamiltonian Monte Carlo. We apply the pseudo-marginal multi-fidelity MCMC
approach to several problems, including log-Gaussian Cox process modeling, Bayesian ODE
system identification, PDE-constrained optimization, and Gaussian process parameter inference.
1.1. Related work. Multi-fidelity MCMC methods are commonly applied in a two-stage proce-
dure, where the goal is to reduce the computational cost of using a single expensive high-fidelity
model by using a cheap low-fidelity model as a low-pass filter for a delayed acceptance/rejection
algorithm (Christen and Fox,2005;Efendiev et al.,2006;Cui et al.,2015); see Peherstorfer
et al. (2018) for a survey. Higdon et al. (2002) propose coupling a high-fidelity Markov chain
with a low-fidelity Markov chain via a product chain. In contrast, our approach aims to sample
from a “perfect-fidelity” target density while reducing computational cost; two-stage MCMC
algorithms result in biased estimates with respect to this target density. A related class of methods
is multilevel Monte Carlo (Giles,2008,2013;Dodwell et al.,2015;Warne et al.,2021), which uses
a hierarchy of multi-fidelity models for Monte Carlo estimation by expressing the expectation of a
high-fidelity model as a telescoping sum of low-fidelity models. Dodwell et al. (2015) use the M-H
algorithm to form the multilevel Monte Carlo estimates, simulating from a separate Markov chain
for each level of the telescoping sum. In practice, multilevel Monte Carlo requires choosing a finite
number of fidelities, inducing bias in the estimator with respect to the (limiting) perfect-fidelity
model. In contrast, our method uses a randomized fidelity within a single Markov chain with the
perfect-fidelity model as the target.
Our approach applies pseudo-marginal MCMC to multi-fidelity problems. There is a rich
literature developing pseudo-marginal MCMC methods (Beaumont,2003;Andrieu and Roberts,
2009) for so-called “doubly-intractable” likelihoods, which are likelihoods that are intractable to
evaluate. Several approaches in the pseudo-marginal MCMC literature are particularly relevant to
our work. The first are the PM-MCMC methods introduced by Lyne et al. (2015), which describe
a class of pseudo-marginal M-H methods that use Russian roulette estimators to obtain unbiased
estimators of the likelihood. However, this method samples the variable of interest jointly with
the auxiliary randomness, which often leads to the chain sticking.
Alternatively, several methods have considered sampling the randomness separately. The idea of
clamping random numbers is explored in depth by Andrieu et al. (2010) and Murray and Graham
(2016); the latter applies this to pseudo-marginal slice sampling. In particular, our approach applies
these ideas to the specific setting of multi-fidelity models, where the random fidelity is treated as
an auxiliary variable. Finally, while our approach applies to doubly-intractable problems, we are
also motivated by a larger class of multi-fidelity problems studied in the computational sciences
that may not even be inference problems, such as quantum simulations and PDE-constrained
optimization.
2. Multi-fidelity MCMC.
Monte Carlo methods approximate integrals and sums that can be expressed as an expectation:

E_π[h(θ)] = ∫ h(θ) π(θ) dθ ≈ (1/T) ∑_{t=1}^{T} h(θ^(t)),  where θ^(t) ∼ π,  (1)

and where π : Θ → ℝ₊ is the target density, which may only be known up to a constant, h(θ) is a
function of interest, and {θ^(t)}_{t=1}^{T} are samples from π. Markov chain Monte Carlo methods are
then used to generate samples θ^(t) from π by simulating from a Markov chain with target π.
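As a minimal illustration of the estimate in Equation (1), with π a standard normal and h(θ) = θ² (both arbitrary choices for illustration), the sample average converges to the expectation; here π can be sampled directly, so no Markov chain is needed:

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo estimate of E_pi[h(theta)] with pi = N(0, 1) and h(theta) = theta**2.
samples = rng.normal(size=100_000)
estimate = np.mean(samples ** 2)               # approximates E[theta^2] = 1
```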
In many settings, pointwise evaluations of the target function π(θ) are expensive or even
intractable; from here on we will assume that the goal is to compute statistics of a quantity of
interest h(θ) with respect to a perfect-fidelity target density π(θ). In practice, the expectation in
Equation (1) is instead estimated using a cheaper, low-fidelity density π_k(θ), where k ∈ ℕ := {1, 2, . . .}.
In particular, we consider settings where there is a sequence of low-fidelity densities
available that converge to the target, i.e., π_k(θ) → π(θ) as k → ∞. We assume that as k increases, the
model becomes higher in fidelity (with respect to π) but more costly to evaluate, increasing in
expense super-linearly with k.
For instance, π could represent a target density that depends on an intractable integral, the
solution of a PDE, the solution of a large system of ODEs, the solution of a large system of
linear equations, or the minimizer of a function. Thus, a typical evaluation of π requires an
approximation at a fidelity k with a tolerable level of bias for a given computational budget.
Here, increasing k could correspond to finer discretizations of differential equations, increasing
numbers of quadrature points, or performing a larger number of iterations in a linear solver or
optimization routine.
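For the ODE case, the fidelity ladder of Fig 1(b) can be sketched with a forward-Euler solver whose step size dt plays the role of 1/k; the Lotka-Volterra parameters below are arbitrary illustrative choices:

```python
import numpy as np

def lotka_volterra_euler(u0, v0, alpha, beta, gamma, delta, t_end, dt):
    """Forward-Euler solution of du/dt = alpha*u - beta*u*v,
    dv/dt = delta*u*v - gamma*v. Halving dt doubles the cost but
    shrinks the discretization error."""
    u, v = u0, v0
    for _ in range(int(round(t_end / dt))):
        u, v = (u + dt * (alpha * u - beta * u * v),
                v + dt * (delta * u * v - gamma * v))
    return u, v

# A fidelity ladder over step sizes dt = 1/k (parameters are arbitrary).
solutions = [lotka_volterra_euler(1.5, 1.0, 0.6, 0.3, 0.3, 0.2, 5.0, 1.0 / k)
             for k in (1, 2, 4, 8, 16)]
```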
In the multi-fidelity setting, the goal is to combine several models of varying fidelity within an
MCMC algorithm to reduce the computational cost of estimating Equation (1). In this paper,
we describe a class of MCMC algorithms that leverages the sequence of low-fidelity models π_k.
Our strategy for multi-fidelity MCMC (MF-MCMC) will be to construct an unbiased estimator
of π(θ) using random choices of the fidelity K and then to include K in the Markov chain
as an auxiliary variable. By carefully constructing such a Markov chain, it will be possible to
asymptotically estimate the functional in Equation (1) as though the samples were taken from
the perfect-fidelity model; each step of the Markov chain will nevertheless only require a finite
amount of computation. Finally, our approach allows us to essentially plug in any valid MCMC
algorithm, and we apply this strategy to develop multi-fidelity variants of a number of MCMC
algorithms, such as M-H and slice sampling.
2.1. Pseudo-marginal MCMC for the multi-fidelity setting. Pseudo-marginal MCMC (Beaumont, 2003;
Andrieu and Roberts, 2009) is a class of auxiliary-variable MCMC algorithms that
replaces the target density π(θ) with an estimator π̂(θ) that is a function of a random variable.
If the estimator is nonnegative and unbiased, i.e., for all θ ∈ Θ, π̂(θ) ≥ 0 and E[π̂(θ)] = π(θ),
then MCMC transitions that use the estimator still have π(θ) as their invariant distribution.
This property is sometimes referred to as “exact-approximate” MCMC, as the transitions are
approximate but the limiting distribution is exact. Estimators can be constructed from a variety
of methods, including particle filtering (Andrieu and Roberts, 2009); our approach will use
randomized series truncations, which have been considered in pseudo-marginal MCMC methods such
as Lyne et al. (2015), Georgoulas et al. (2017), and Biron-Lattes et al. (2022).
We now apply the pseudo-marginal approach to the multi-fidelity setting. Here the target
density estimator arises from a random choice of the fidelity K ∈ ℕ that is governed by a
distribution µ on ℕ. We denote the estimator by π̂_K(θ) to make the dependence on the random
fidelity K explicit. The estimator is constructed such that it is unbiased with respect to µ, i.e.,

∑_{k=1}^{∞} µ(k) π̂_k(θ) = π(θ).  (2)
The distribution µ is also constructed by the user: ideally, the estimator π̂_K(θ) will prefer
smaller values of K while having sufficiently low variance as to allow the Markov chain to mix
effectively. Thus the simulations can be run at inexpensive low fidelities, while the estimates will
be as though the perfect-fidelity model were being used.
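One standard construction satisfying Equation (2) (the paper's general approach is described later; this is a common "single-term" randomized truncation, sketched here as an illustration) writes π as the telescoping series π_1 + ∑_k (π_{k+1} − π_k), draws K ~ µ, and weights the K-th increment by 1/µ(K). The toy scalar sequence π_k = 1 − 2^{-k} below is purely an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def pi_k(k):
    """Toy low-fidelity sequence (illustrative assumption): pi_k -> pi = 1."""
    return 1.0 - 0.5 ** k

def single_term_estimator(rng, r=0.6):
    """Draw K ~ mu with mu(k) = (1 - r) * r**(k - 1), and return the randomly
    truncated telescoping series pi_1 + (pi_{K+1} - pi_K) / mu(K); taking the
    expectation over K telescopes back to the limit pi = 1."""
    K = rng.geometric(1.0 - r)                 # support {1, 2, ...}
    mu_K = (1.0 - r) * r ** (K - 1)
    return pi_k(1) + (pi_k(K + 1) - pi_k(K)) / mu_K

draws = np.array([single_term_estimator(rng) for _ in range(50_000)])
# Each draw touches only finitely many fidelities, yet the estimator is unbiased
# for the perfect-fidelity value.
```

Nonnegativity holds for this toy sequence but is not automatic in general, which is exactly the difficulty discussed at the end of this section.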
The standard pseudo-marginal MCMC approach is to construct a Markov chain that has the
following joint density as its stationary distribution:
π(θ, K) = µ(K) π̂_K(θ).  (3)
Observe that while Equation (3) does not depend on the perfect-fidelity target density π, it
returns the desired marginal π via Equation (2). As a concrete example, a pseudo-marginal
M-H algorithm generates a new state θ′ and fidelity K′ jointly, using q(θ′; θ) as the proposal
for θ′ and q(K′; K) = µ(K′) as the proposal distribution for the fidelity, and accepts/rejects the state
according to

a = [π(θ′, K′) q(θ; θ′) q(K; K′)] / [π(θ, K) q(θ′; θ) q(K′; K)] = [π̂_{K′}(θ′) q(θ; θ′)] / [π̂_K(θ) q(θ′; θ)],  (4)

where the equality holds since the distribution terms for K and K′ cancel. Note that the right-hand
side of Equation (4) is the standard M-H ratio, but with the target density π replaced by the
estimator π̂_K.
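The accept step of Equation (4) can be sketched as follows. The unnormalized densities π̂_k here are a toy Gaussian hierarchy (an illustrative assumption, not one of the paper's applications); the point is the mechanics of proposing K′ ~ µ so that the µ terms cancel:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy unnormalized low-fidelity densities for a 1-D state: Gaussians whose
# variance converges as k grows (a stand-in for a real fidelity hierarchy).
def pi_hat(theta, k):
    var = 1.0 + 0.5 ** k
    return np.exp(-0.5 * theta ** 2 / var)

def pm_mh_step(theta, K, rng, step=1.0, r=0.5):
    """One joint pseudo-marginal M-H step (Equation 4). Proposing K' ~ mu with
    mu(k) = (1 - r) * r**(k - 1) makes the mu terms cancel in the ratio."""
    theta_prop = theta + step * rng.normal()   # symmetric proposal q(theta'; theta)
    K_prop = rng.geometric(1.0 - r)            # independent fidelity proposal
    a = pi_hat(theta_prop, K_prop) / pi_hat(theta, K)
    if rng.uniform() < a:
        return theta_prop, K_prop
    return theta, K

theta, K = 0.0, 1
thetas = []
for _ in range(10_000):
    theta, K = pm_mh_step(theta, K, rng)
    thetas.append(theta)
```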
However, standard pseudo-marginal MCMC using joint proposals of the state and fidelity
can “get stuck” when the estimator is noisy and fail to accept new states. Thus, we apply
the approach of Murray and Graham (2016), which augments the Markov chain to include the
randomness of the estimator via a separate update; here the randomness of the estimator arises
from the fidelity K. Concretely, we construct a Markov chain that simulates from Equation (3)
by alternating sampling between the conditional target densities π(K|θ) and π(θ|K) (steps 5 and
6 of Algorithm 1, respectively). We refer to this strategy as multi-fidelity MCMC (MF-MCMC),
since by conditioning on K = k, the update for the state θ becomes a standard deterministic
update applied to a low-fidelity model π̂_k(θ), and any appropriate MCMC update can be used
here, making it straightforward to use complex MCMC methods, such as slice sampling and
HMC. Similarly, any suitable MCMC update for the fidelity K can be used with the conditional
target π(K|θ).
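One MF-MCMC sweep of this alternating scheme can be sketched as follows, again with a toy Gaussian hierarchy and geometric µ as illustrative assumptions: first an independence M-H update of K given θ, with proposal µ so the µ terms cancel, then any standard M-H update of θ at the fixed fidelity K:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-ins (illustrative assumptions): pi_hat_k is a cheap unnormalized
# density at fidelity k, and mu(k) = (1 - r) * r**(k - 1) is geometric.
def pi_hat(theta, k):
    var = 1.0 + 0.5 ** k
    return np.exp(-0.5 * theta ** 2 / var)

def mf_mcmc_step(theta, K, rng, step=1.0, r=0.5):
    # Step 1: update K | theta. With an independence proposal K' ~ mu, the mu
    # terms in pi(K|theta), proportional to mu(K) * pi_hat_K(theta), cancel in
    # the M-H ratio.
    K_prop = rng.geometric(1.0 - r)
    if rng.uniform() < pi_hat(theta, K_prop) / pi_hat(theta, K):
        K = K_prop
    # Step 2: update theta | K with any standard kernel; here, random-walk M-H
    # targeting the fixed low-fidelity density pi_hat_K.
    theta_prop = theta + step * rng.normal()
    if rng.uniform() < pi_hat(theta_prop, K) / pi_hat(theta, K):
        theta = theta_prop
    return theta, K

theta, K = 0.0, 1
thetas, Ks = [], []
for _ in range(10_000):
    theta, K = mf_mcmc_step(theta, K, rng)
    thetas.append(theta)
    Ks.append(K)
```

Because step 2 conditions on a fixed fidelity, the random-walk kernel here could be swapped for slice sampling or HMC without changing the invariant joint distribution.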
Many techniques can be used to construct an unbiased estimator of π with randomness K;
we describe a general approach in the next section. However, it is generally difficult to guarantee
that the estimator is nonnegative, as required by pseudo-marginal MCMC. One technique, considered
by Lin et al. (2000) and Lyne et al. (2015), is to instead sample from the target distribution
induced by the absolute value of the estimator and apply a sign correction to the final Monte
Carlo estimate in Equation (1), an approach borrowed from the quantum Monte Carlo literature,
where it is necessary for modeling fermionic particles. This approach has been applied to the M-H