Approximate Methods for
Bayesian Computation
Radu V. Craiu,1 and Evgeny Levi,2
1Department of Statistical Sciences, University of Toronto, Toronto, Canada, M5G 1Z5; email:
radu.craiu@utoronto.ca
2Department of Statistical Sciences, University of Toronto, Toronto, Canada, M5G 1Z5
xxxxxx 0000. 00:1–20
Copyright © 0000 by Annual Reviews.
All rights reserved
Keywords
ABC, Bayesian synthetic likelihood, coresets, divide and conquer, Markov
chain Monte Carlo, subsampling
Abstract
Rich data generating mechanisms are ubiquitous in this age of information and require complex statistical models to draw meaningful inference. While Bayesian analysis has seen enormous development in the last 30 years, benefitting from the impetus given by the successful application of Markov chain Monte Carlo (MCMC) sampling, the combination of big data and complex models conspires to produce significant challenges for traditional MCMC algorithms. We review modern algorithmic developments addressing the latter and compare their performance using numerical experiments.
arXiv:2210.03243v2 [stat.CO] 19 Oct 2022
Contents
1. Introduction
2. Modern challenges for Bayesian computation: Massive Data
2.1. Divide and Conquer
2.2. Subsampling
3. Modern Challenges for Bayesian Computation: Intractable Likelihoods
3.1. Approximate Bayesian Computation (ABC)
3.2. Bayesian Synthetic likelihood
4. Double Jeopardy
5. Numerical experiments
5.1. Description of the simulation settings
5.2. Logistic Regression
5.3. Stochastic Volatility
6. Conclusion and future directions
1. Introduction
The data science revolution has led to multiple pressure points in statistics. A statistical sample from
a large population exhibits, in the 21st century, very different characteristics than what one would
have seen merely a few years ago. The ubiquitous and almost continuous recording of many of our
activities has made it relatively easy to collect enormous amounts of information that require analysis
and interpretation. This drastic increase in data volume imposes sober re-evaluations of most classical
approaches to statistical inference.
The computational side of a Bayesian statistician’s toolbox is perhaps most challenged by these
developments. The impetus that Bayesian statistics has experienced since the early 1990s was given by spectacular advances in computation, especially those around Markov chain Monte Carlo (MCMC) sampling. Thanks to methods in this class of algorithms, statisticians have been liberated to think freely about the Bayesian model components used for a given problem, without worrying about the mathematical intractability of the analysis.
Indeed, given a data set y, most of the pairings of a sampling density, f(y|θ), and a prior, p(θ),
result in a posterior distribution
$$\pi(\theta \mid y) = \frac{p(\theta)\, f(y \mid \theta)}{\int p(\theta)\, f(y \mid \theta)\, d\theta} \tag{1}$$
that cannot be analyzed directly, usually because the denominator in (1) cannot be computed analyti-
cally. The latter fact impedes the calculation of quantities of interest related to π, most of which can
be expressed as
$$I = \int h(\theta)\, \pi(\theta \mid y)\, d\theta, \tag{2}$$
for some function h that is determined by the question of interest. For instance, if θ is univariate and we let h(θ) = θ^r in (2), then I equals the r-th moment of π, while h(θ) = 1_{(−∞,t]}(θ) leads to the cumulative distribution function (cdf) of π at the point t.
The classical Monte Carlo method, devised by Von Neumann & Ulam (1951) in the middle of the twentieth century, relies on sampling {θ_1, ..., θ_m} independently from the distribution π and approximating I with
$$\hat{I} = \frac{1}{m} \sum_{k=1}^{m} h(\theta_k). \tag{3}$$
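To make (2)–(3) concrete, here is a minimal Python sketch based on a hypothetical example in which π is a standard normal density that can be sampled from directly; it estimates the second moment and the cdf at t = 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: pretend pi is the standard normal density, which we
# can sample from directly, so the classical Monte Carlo estimator (3) applies.
m = 100_000
theta = rng.standard_normal(m)    # independent draws theta_1, ..., theta_m

I_moment = np.mean(theta**2)      # h(theta) = theta^2: second moment, true value 1
I_cdf = np.mean(theta <= 1.0)     # h(theta) = 1_{(-inf, 1]}(theta): cdf at t = 1

print(I_moment, I_cdf)            # approximately 1.0 and 0.841
```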
However, the unknown constant in (1) creates a knowledge gap that an MCMC algorithm closes by
constructing and running a Harris-recurrent, π-irreducible, aperiodic Markov chain whose stationary
distribution is exactly the posterior distribution π(θ|y). The values taken by the chain make up the
samples θ_1, ..., θ_m. A couple of issues emerge immediately. First, because π is the chain's stationary distribution, the samples will be approximately distributed according to π only after the chain has entered its stationary regime. Second, due to the Markov property, the samples are typically positively correlated (although see Frigessi et al. 2000, Craiu & Meng 2005, for exceptions), which reduces the amount of information they contain about π. To see this, imagine the extreme case in which the m samples are perfectly correlated; they would then provide no more information about π than a single draw.
The success MCMC sampling had in boosting the use of Bayesian models is largely due to the
ease of implementation of some of its most popular algorithms. For instance, the Metropolis-Hastings
algorithm (henceforth, MH) (Metropolis et al. 1953, Hastings 1970) can be implemented using the
following recursive procedure.
Step 0. Initialize the Markov chain at θ_0 and choose a proposal density q(·|ζ), which may or may not depend on ζ.
Step t. At the t-th step (1 ≤ t ≤ m − 1) do:
PR. Draw a proposal ω_t from the density q(·|θ_t);
AR. Set
$$\theta_{t+1} = \begin{cases} \omega_t & \text{with probability } \alpha_t \\ \theta_t & \text{with probability } 1 - \alpha_t \end{cases}$$
where
$$\alpha_t = \min\left\{ 1, \, \frac{\pi(\omega_t \mid y)\, q(\theta_t \mid \omega_t)}{\pi(\theta_t \mid y)\, q(\omega_t \mid \theta_t)} \right\}. \tag{4}$$
Because of the form of the acceptance probability (4), its calculation is not prevented by the unknown denominator in (1): the normalizing constant appears in both the numerator and the denominator of the ratio and cancels. Nevertheless, computation of (4) hinges on the ability to calculate the sampling density f(y|θ) for any parameter value θ, and to do so m times. The challenges posed to Bayesian computation by the modern data and modelling environment have their roots in this implicit assumption.
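To make the procedure concrete, here is a minimal Python sketch of a random-walk MH sampler; the function name metropolis_hastings and its interface are ours, and log_target stands for the log of the unnormalized posterior p(θ)f(y|θ), which suffices because the normalizing constant cancels in (4).

```python
import numpy as np

def metropolis_hastings(log_target, theta0, m, step=0.5, seed=0):
    """Random-walk MH sketch. log_target returns the log of the unnormalized
    posterior p(theta) f(y|theta); the denominator of (1) cancels in (4)."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_target(theta)
    chain = np.empty((m, theta.size))
    for t in range(m):
        # PR: draw a proposal from a symmetric Gaussian q(.|theta_t),
        # so the q-ratio in (4) equals 1.
        omega = theta + step * rng.standard_normal(theta.size)
        lp_omega = log_target(omega)
        # AR: accept with probability alpha_t = min(1, pi(omega|y)/pi(theta|y)).
        if np.log(rng.uniform()) < lp_omega - lp:
            theta, lp = omega, lp_omega
        chain[t] = theta
    return chain

# Usage: sample from an unnormalized standard normal target.
draws = metropolis_hastings(lambda th: -0.5 * np.sum(th**2), theta0=0.0, m=5000)
```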
In very broad strokes, one can speak of two main challenges in modern Bayesian computation. The first concerns the computational price of calculating a likelihood when the data are massive, say of size N, with N of the order of hundreds of millions or even billions. Even in the tame case of iid observations, evaluating (4) requires computing a likelihood (or sampling density) that involves N terms, and this must be repeated each time a new MCMC sample is produced. The cumulative cost is unsustainable when a single MCMC iteration can take days. A second challenge emerges when the model's complexity keeps up with the data volume and yields an intractable likelihood, so that (4) simply cannot be computed analytically. Finally, a meta-challenge appears in Bayesian analyses that merge massive data with intractable models.
This paper discusses MCMC-adjacent methodology that is used to alleviate the pressure posed on
Bayesian computation by the above challenges. Space constraints impede the presentation of details and variants, but in all cases we present the main ideas and refer the interested reader to the relevant literature. To gauge their computational efficiency, we run numerical experiments in which the algorithms are implemented on two statistical models using publicly available software packages.
Sections 2 and 3 describe the above challenges in more detail and summarize some of the solutions proposed to address them. Numerical experiments meant to illustrate and compare different algorithms are reported in Section 5. The paper ends with comments and a discussion of future directions for research.
2. Modern challenges for Bayesian computation: Massive Data
Consider data y collected on N independent items, so that y = {y_1, ..., y_N} ∈ X^N, and denote by f(y|θ) the sampling distribution, which depends on the parameter θ ∈ Θ ⊆ R^d. At each iteration of the MH sampler, one needs to compute $f(y \mid \omega_t) = \prod_{k=1}^{N} f(y_k \mid \omega_t)$, where ω_t is the proposal in (4). Modern applications often rely on data that are large enough that the repeated calculation of f(y|ω_t) is impractical, even impossible. It is also not unusual for the data size to be so large as to prohibit storage on a single machine, so that computation of the likelihood also involves repeated communication between multiple machines, adding significantly to the computational burden.
Prompted by the obstacle of large data, computational Bayesians have designed a number of approaches to alleviate the problem. Two general ideas currently stand out in terms of popularity and usage: divide-and-conquer (DAC) strategies and subsampling with minimum loss of information.
2.1. Divide and Conquer
The DAC approach is based on partitioning the sample into a number of sub-samples, called batches, that are analyzed separately on a number of workers (CPUs, GPUs, servers, etc.). After the batch-
specific estimates about the parameter of interest are obtained, the results are combined so that the
analyst recovers a large part of, ideally all, the information that would have been available if the whole
sample were analyzed in the usual way, on a single machine. While this idea seems applicable in a wide
range of scenarios, there are a couple of constraints that restrict its generality. First, the procedure is
computationally effective if it is designed to minimize, preferably eliminate, communication between
the workers before combining the batch-specific results. Second, it is often difficult to produce an
accurate assessment of the resulting loss of information at the combining stage. Some of the first
proponents of DAC for MCMC sampling are Neiswanger et al. (2013), Scott et al. (2016), and Wang
& Dunson (2013). In their approach, the subposterior distribution corresponding to the j-th batch is defined as
$$\pi^{(j)}(\theta \mid y^{(j)}) \propto f(y^{(j)} \mid \theta)\, [p(\theta)]^{1/J}, \tag{5}$$
where f and p are as in (1), y^{(j)} is the data assigned to batch j, 1 ≤ j ≤ J, and J is the total number of batches. With this choice, one immediately gets that $\prod_{j=1}^{J} \pi^{(j)} \propto \pi(\theta \mid y)$. Both
Neiswanger et al. (2013) and Scott et al. (2016) consider ways to combine samples from the subposteriors $\pi^{(j)}(\theta)$, 1 ≤ j ≤ J, in situations in which all posteriors, batch-specific and full-data alike, are Gaussian or can be approximated by mixtures of Gaussians. In this case, one can demonstrate that a weighted average of samples from all the $\pi^{(j)}$'s has density π. The use of the Weierstrass transform for each posterior density, proposed in Wang & Dunson (2013), extends the range of theoretical validity beyond Gaussian distributions; the authors also establish error bounds between the approximation and the true posterior. Nemeth & Sherlock (2018) use a Gaussian process (GP) approximation of each
subposterior. Once again, the Gaussian nature of the approximation makes recombination possible
and relatively straightforward. Limitations of the method are strongly linked with those of GP-based
estimation. For instance, when the sub-posterior samplers are sluggish, large MCMC samples might be needed, which, in turn, makes the calculation of the GP-based approximation very expensive. The idea of using the values of the sub-posterior at each MCMC sample is also adopted by Changye & Robert (2019), who propose to define the subposteriors using $\pi^{(j)} \propto \{[p(\theta)]^{1/J} f(y^{(j)} \mid \theta)\}^{\lambda_j}$. The scale factor $\lambda_j$ is used to control the uncertainty in the subposterior. Alternative ways to define the sub-posteriors are proposed by Entezari et al. (2018), who use $\pi^{(j)} \propto p(\theta)\, [f(y^{(j)} \mid \theta)]^{J}$; the intuitive idea is to "match" the size of the original sample with that of the batch-specific one. Their approach has been applied successfully to BART (Chipman et al. 2010, Pratola 2016) models.
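To illustrate the combining stage, the following Python sketch implements a precision-weighted average of batch draws in the spirit of the consensus Monte Carlo approach of Scott et al. (2016); the function name consensus_combine is our illustrative choice, it assumes every worker returns the same number of draws from its subposterior (5), and the weighting is exact only when all subposteriors are Gaussian.

```python
import numpy as np

def consensus_combine(sub_draws):
    """Combine subposterior samples by precision-weighted averaging,
    in the spirit of consensus Monte Carlo (Scott et al. 2016).

    sub_draws: list of J arrays, each of shape (m, d), holding m draws
    from the j-th subposterior (5). Assumes every batch returns the same m;
    the weighting is exact when all subposteriors are Gaussian."""
    # Weight each batch by the inverse sample covariance of its draws.
    weights = [np.linalg.inv(np.atleast_2d(np.cov(d, rowvar=False)))
               for d in sub_draws]
    total = np.linalg.inv(sum(weights))              # (sum_j W_j)^{-1}
    # Average the k-th draw of every batch into one consensus draw:
    # theta_k = (sum_j W_j)^{-1} sum_j W_j theta_k^{(j)}.
    weighted_sum = sum(w @ d.T for w, d in zip(weights, sub_draws))
    return (total @ weighted_sum).T                  # shape (m, d)
```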