A CORRELATED PSEUDO-MARGINAL APPROACH TO DOUBLY
INTRACTABLE PROBLEMS

Yu Yang∗†   Matias Quiroz‡   Robert Kohn∗†   Scott A. Sisson§†

∗ School of Economics, UNSW Sydney, Australia
† UNSW Data Science Hub, UNSW Sydney, Australia
‡ Department of Statistics, Stockholm University, Sweden
§ School of Mathematics & Statistics, UNSW Sydney, Australia

arXiv:2210.02734v1 [stat.ME] 6 Oct 2022
ABSTRACT
Doubly intractable models are encountered in a number of fields, e.g. social networks, ecology and
epidemiology. Inference for such models requires the evaluation of a likelihood function, whose
normalising function depends on the model parameters and is typically computationally intractable.
The normalising constant of the posterior distribution and the additional normalising function of
the likelihood function result in a so-called doubly intractable posterior, for which it is difficult to
directly apply Markov chain Monte Carlo (MCMC) methods. We propose a signed pseudo-marginal
Metropolis-Hastings (PMMH) algorithm with an unbiased block-Poisson estimator to sample from
the posterior distribution of doubly intractable models. As the estimator can be negative, the algo-
rithm targets the absolute value of the estimated posterior and uses an importance sampling correc-
tion to ensure simulation-consistent estimates of the posterior mean of any function. An advantage
of our estimator over previous approaches is that its form is ideal for correlated pseudo-marginal
methods, which are well known to dramatically increase sampling efficiency. Moreover, we develop
analytically derived heuristic guidelines for optimally tuning the hyperparameters of the estimator.
We demonstrate the algorithm on the Ising model and a Kent distribution model for spherical data.
Keywords Block-Poisson estimator, Ising model, Spherical data.
1 Introduction
Markov chain Monte Carlo (MCMC) methods (see, e.g., Brooks et al., 2011, for an overview) sample from the poste-
rior distribution without evaluating the normalising constant, also known as the marginal likelihood. However, in some
settings, the likelihood function itself contains an additional normalising constant that depends on the model parame-
ters and the resulting so-called doubly intractable posterior distribution falls outside the standard MCMC framework.
To distinguish these normalisation quantities, we refer to the first as a normalising constant and the latter as a normal-
ising function. Many well-known models have doubly intractable posteriors, such as the exponential random graph
models for social networks (Hunter and Handcock, 2006) and non-Gaussian Markov random field models in spatial
statistics, including the Ising model and its variants (Hughes et al., 2011; Ising, 1925; Lenz, 1920).
Several algorithms are available to tackle the doubly intractable problem in Bayesian statistics; see Park and Haran
(2018) for a review. These algorithms are classified into two main categories, with some overlap between them.
The first category of methods introduce cleverly chosen auxiliary variables that cancel the normalising function when
carrying out the MCMC sampling and standard MCMC such as the Metropolis-Hastings (MH) algorithm (Hastings,
1970; Metropolis et al., 1953) can thus be applied. This approach is model dependent and cannot always be applied.
The second category of methods, which applies more generally, approximates the likelihood function (including the
normalising function) and substitutes the approximation in place of the exact likelihood in the estimation procedure.
The pseudo-marginal (PM) method (Andrieu and Roberts, 2009; Beaumont, 2003) is often used when a positive
and unbiased estimator of the likelihood is available through Monte Carlo simulation. However, in some problems,
including doubly intractable models, forming an unbiased estimator that is almost surely positive is prohibitively
expensive (Jacob and Thiery, 2015). The so-called Russian roulette (RR) estimator (Lyne et al., 2015) is an example
of a method that can be used to unbiasedly estimate the likelihood function in doubly intractable models, although the
estimate is not necessarily positive.
We propose a method for exact inference on posterior expectations in doubly intractable problems based on the ap-
proach in Lyne et al. (2015), where an unbiased, but not necessarily positive, estimator of the likelihood function is
used. The algorithm targets a posterior density that uses the absolute value of the likelihood, resulting in iterates from
a perturbed target density. We follow Lyne et al. (2015) and reweight the samples from the perturbed target density
using importance sampling to obtain simulation-consistent estimates of the expectation of any function of the param-
eters with respect to the true posterior density. While our method does not sample from the target of interest, we refer
to it as exact due to its simulation-consistent property.
Our main contribution is to explore the use of the block-Poisson (BP) estimator (Quiroz et al., 2021) in the con-
text of estimating doubly intractable models using the signed PMMH approach. Our method provides the following
advantages over the Russian roulette method. First, the BP estimator has a much simpler structure and is more compu-
tationally efficient. Second, the block form of our estimator makes it possible to correlate the estimators of the doubly
intractable posterior at the current and proposed draws in the MH algorithm. Introducing such correlation dramatically
improves the efficiency of PM algorithms (Deligiannidis et al., 2018; Tran et al., 2016). Finally, under simplifying
assumptions, some statistical properties of the logarithm of the absolute value of our estimator are derived and used to
obtain heuristic guidelines to optimally tune the hyperparameters of the estimator. We demonstrate empirically that our
method outperforms Lyne et al. (2015) when estimating the Ising model. To the best of our knowledge, our method,
that of Lyne et al. (2015) and its extensions are the only alternatives in the PM framework to perform exact inference
(in the sense of consistent estimates of posterior expectations) for general doubly intractable problems. Compared
with algorithms which use auxiliary variables to avoid evaluating the normalising function, signed PMMH algorithms
are more widely applicable and generic as they do not require exact sampling from the likelihood.
The rest of the paper is organised as follows. Section 2 formally introduces the doubly intractable problem and
discusses previous research. Section 3 introduces our methodology and establishes the guidelines for tuning the
hyperparameters of the estimator. Section 4 demonstrates the proposed method in two simulation studies: the Ising
model and the Kent distribution. Section 5 analyses four real-world datasets using the Kent distribution. Section 6
concludes and outlines future research. The paper has an online supplement that contains all proofs and details of
the simulation studies. The supplement also contains an additional example applying our method to a constrained
Gaussian process (GP), where the normalising function arises from the GP prior.
2 Doubly intractable problems
2.1 Doubly intractable posterior distributions
Let p(y|θ) denote the density of the data vector y, where θ is the vector of model parameters. Suppose p(y|θ) =
f(y|θ)/Z(θ), where f(y|θ) is computable while the normalising function Z(θ) is not. The reason that Z(θ) is
intractable may be that it is prohibitively expensive to evaluate numerically, or that it lacks a closed form. Two examples are
given below to demonstrate the intractability for both discrete and continuous observations y.
Example 1 (The Ising model (Ising, 1925)). Consider an L × L lattice with binary observation $y_{ij} \in \{-1, 1\}$ in row
i and column j. The likelihood of $\theta \in \mathbb{R}$ is
$$
p(y \mid \theta) = \frac{1}{Z(\theta)} \exp\left(\theta S(y)\right); \qquad
S(y) = \sum_{i=1}^{L} \sum_{j=1}^{L-1} y_{i,j}\, y_{i,j+1} + \sum_{i=1}^{L-1} \sum_{j=1}^{L} y_{i,j}\, y_{i+1,j}; \tag{1}
$$
with $Z(\theta) = \sum_{y} \exp(\theta S(y))$.
The normalising function Z(θ) in the Ising model is a sum over the $2^{L^2}$ possible configurations y of terms of the form exp(θS(y)), making it computationally
intractable even for moderate values of L. See Section 4.1 for a further discussion.
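To make this concrete, the brief sketch below (our illustration, not code from the paper) computes S(y) for a lattice stored as a NumPy array and evaluates log Z(θ) by enumerating all configurations; the enumeration is only feasible for very small L.

```python
# Illustration only: the sufficient statistic S(y) in (1) and a brute-force
# evaluation of log Z(theta), feasible only for very small lattices.
import itertools
import numpy as np

def ising_suff_stat(y):
    """S(y): sum of products of horizontally and vertically adjacent spins."""
    y = np.asarray(y)
    return np.sum(y[:, :-1] * y[:, 1:]) + np.sum(y[:-1, :] * y[1:, :])

def ising_log_Z_bruteforce(theta, L):
    """log Z(theta) by enumerating all 2**(L*L) spin configurations."""
    log_terms = [theta * ising_suff_stat(np.reshape(spins, (L, L)))
                 for spins in itertools.product([-1, 1], repeat=L * L)]
    return np.logaddexp.reduce(log_terms)   # log-sum-exp for numerical stability

# L = 4 already requires 2**16 = 65,536 configurations; L = 10 would need 2**100.
print(ising_log_Z_bruteforce(theta=0.3, L=4))
```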
Example 2 (The Kent distribution (Kent, 1982)). The density of the Kent distribution for $y \in \mathbb{R}^3$, $\|y\| = 1$, is
$$
f(y \mid \gamma_1, \gamma_2, \gamma_3, \beta, \kappa) = \frac{1}{c(\kappa, \beta)} \exp\left\{ \kappa\, \gamma_1^{\top} y + \beta\left[ (\gamma_2^{\top} y)^2 - (\gamma_3^{\top} y)^2 \right] \right\}; \tag{2}
$$
with
$$
c(\kappa, \beta) = 2\pi \sum_{j=0}^{\infty} \frac{\Gamma(j + 0.5)}{\Gamma(j + 1)}\, \beta^{2j} (0.5\kappa)^{-2j - 0.5}\, I_{2j+0.5}(\kappa),
$$
where $I_{\nu}(\cdot)$ is the modified Bessel function of the first kind and $\gamma_1, \gamma_2, \gamma_3$ form a set of 3-dimensional orthonormal
vectors. The normalising function c(κ, β) is an infinite sum. See Section 4.2 for further analysis.
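In practice c(κ, β) can be approximated by truncating the series; the sketch below (ours, with an arbitrary truncation level) evaluates the logarithm of the truncated sum using SciPy's modified Bessel function. For large κ the exponentially scaled routine scipy.special.ive would be preferable to avoid overflow.

```python
# Illustration only: truncate the infinite series defining c(kappa, beta) in (2).
# The truncation level n_terms is an arbitrary choice for this sketch.
import numpy as np
from scipy.special import gammaln, iv

def kent_log_norm_const(kappa, beta, n_terms=50):
    """Approximate log c(kappa, beta) by summing the first n_terms terms (beta > 0)."""
    j = np.arange(n_terms)
    log_terms = (gammaln(j + 0.5) - gammaln(j + 1.0)
                 + 2.0 * j * np.log(beta)
                 + (-2.0 * j - 0.5) * np.log(0.5 * kappa)
                 + np.log(iv(2.0 * j + 0.5, kappa)))
    return np.log(2.0 * np.pi) + np.logaddexp.reduce(log_terms)

print(kent_log_norm_const(kappa=10.0, beta=2.0))
```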
The doubly intractable posterior density of θ is
$$
\pi(\theta \mid y) = \frac{f(y \mid \theta)\, \pi(\theta)}{Z(\theta)\, p(y)}, \tag{3}
$$
where π(θ) is the prior for θ and
$$
p(y) = \int \frac{f(y \mid \theta)\, \pi(\theta)}{Z(\theta)}\, d\theta \tag{4}
$$
is the normalising constant. Suppose we devise a Metropolis-Hastings algorithm to sample from (3) with a proposal
density q(·|θ). The probability of accepting a proposed sample θ′ is
$$
\alpha(\theta', \theta) = \min\left\{ 1,\; \frac{\pi(\theta')\, f(y \mid \theta')/Z(\theta')}{\pi(\theta)\, f(y \mid \theta)/Z(\theta)} \times \frac{q(\theta \mid \theta')}{q(\theta' \mid \theta)} \right\}. \tag{5}
$$
The marginal likelihood in (4) cancels in (5), but the normalising function does not. Since Z(θ)/Z(θ′) is compu-
tationally intractable, (5) cannot be evaluated and thus MCMC sampling via the Metropolis-Hastings algorithm is
impossible.
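The fragment below (our illustration, not from the paper) spells out a textbook random-walk MH step for (3); the two log_Z evaluations mark exactly where the doubly intractable problem enters, since no such function is available for models like Examples 1 and 2.

```python
# Illustration: a random-walk MH step for (3). The calls to log_Z are the
# intractable part; the marginal likelihood p(y) never needs to be evaluated.
import numpy as np

def mh_step(theta, log_f, log_prior, log_Z, rng, step=0.1):
    theta_prop = theta + step * rng.standard_normal(theta.shape)  # symmetric proposal, q cancels
    log_alpha = (log_prior(theta_prop) + log_f(theta_prop) - log_Z(theta_prop)
                 - log_prior(theta) - log_f(theta) + log_Z(theta))
    return theta_prop if np.log(rng.uniform()) < log_alpha else theta
```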
2.2 Previous research
Previous research on doubly intractable problems is mainly divided into the auxiliary variable approach and the like-
lihood approximation approach; see Park and Haran (2018) for an excellent review of both approaches.
The auxiliary variable approach cleverly chooses the joint transition kernel of the parameters and the auxiliary variables
so that the normalising function cancels in the resulting MH acceptance ratio. The most well-known algorithms are the
exchange algorithm (Murray et al., 2006) and the auxiliary variable method (Møller et al., 2006). Both algorithms are
model dependent and rely on the sampling technique used to draw observations from the likelihood function. Perfect
sampling (Propp and Wilson, 1996) is often used to generate data samples from the model without knowing the
normalising function. However, for some complex models, such as the Ising model on a large grid, perfect sampling
is prohibitively expensive. To overcome this issue, Liang (2010) and Liang et al. (2016) relax the requirement of exact
sampling and propose the double MH sampler and the adaptive exchange algorithm. However, the former generates
inexact inference results and the latter suffers from memory issues as many intermediate variables need to be stored
within each iteration.
Likelihood approximation approaches are often simulation consistent. Atchadé et al. (2013) directly approximate
Z(θ) through multiple importance sampling. Their approach also depends on an auxiliary variable, but does not
require perfect sampling. The downside is similar to that of the adaptive exchange algorithm; a large memory is
usually required to store the intermediate variables generated in each iteration. An alternative method is to approximate
1/Z(θ) directly using the signed PMMH algorithm, which replaces the likelihood function by its unbiased estimator, as
proposed in Lyne et al. (2015). To obtain the unbiased estimator, 1/Z(θ) is expressed as a geometric series which
is truncated using an RR approach. The RR method first appeared in the physics literature (Carter and Cashwell,
1975) and is useful for obtaining an unbiased estimator through a finite stochastic truncation of an infinite series (a generic sketch is given at the end of this section). To
implement RR, a tight upper bound for Z(θ) is required; otherwise the convergence of the geometric series is slow,
which makes the algorithm inefficient. In practice, a tight upper bound is usually unavailable, which may lead to negative
estimates of the likelihood and thus a signed PMMH approach is necessary, although it inflates the asymptotic variance
of the MCMC chain (Andrieu and Vihola, 2016) compared to a standard PM approach, especially if the estimator
produces a significant proportion of negative estimates (Lyne et al., 2015). It is therefore crucial to quantify the
probability of a negative estimate when tuning the hyperparameters of the estimator, which is difficult for the RR
estimator. In contrast, our estimator is more tractable and the probability of a positive estimate is analytically derived
under simplifying assumptions. Besides the upper bound, a few other hyperparameters of the RR estimator need to
be determined for which guidelines are unavailable due to its intractability. Wei and Murray (2017) combine RR with
Markov chain coupling to produce an estimator with lower variance and a larger probability of producing positive
estimates. However, their estimator is insufficiently tractable to guarantee a positive estimate with a smaller variance,
making it difficult to derive optimal tuning guidelines. Cai and Adams (2022) propose a multi-fidelity MCMC method
to approximate the doubly intractable target density which, like the Russian roulette method, stochastically truncates
an infinite series and uses slice sampling (Murray and Graham, 2016). However, similarly to Lyne et al. (2015), the
method lacks guidelines for tuning the hyperparameters.
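To fix ideas about the stochastic truncation used by the RR estimator, the generic sketch below (ours; it uses a constant continuation probability and is not the specific scheme of Lyne et al., 2015) unbiasedly estimates an infinite sum from unbiased estimates of its terms. In the doubly intractable setting the terms involve factors of the form 1 − Ẑ(θ)/Z̄, which can be negative when Z̄ is not a genuine upper bound; this is the source of the negative likelihood estimates discussed above.

```python
# Generic Russian-roulette truncation (illustration, not the exact scheme of
# Lyne et al., 2015): unbiasedly estimate sum_{k>=0} a_k given a routine that
# returns an unbiased estimate of term k. With constant continuation probability
# q, term k is reached with probability q**k, so dividing by q**k keeps the
# estimate unbiased while only finitely many terms are ever evaluated.
import numpy as np

def russian_roulette_sum(term_estimator, q, rng):
    total, k, survival_prob = 0.0, 0, 1.0
    while True:
        total += term_estimator(k) / survival_prob
        if rng.uniform() > q:            # stop after this term with probability 1 - q
            return total
        k += 1
        survival_prob *= q

# Toy check: the geometric series sum_k 0.5**k = 2, using exact terms as "estimates".
rng = np.random.default_rng(0)
draws = [russian_roulette_sum(lambda k: 0.5**k, q=0.7, rng=rng) for _ in range(20000)]
print(np.mean(draws))                    # close to 2
```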
3 Methodology
3.1 The block-Poisson estimator
The block-Poisson estimator (Quiroz et al., 2021) is useful for estimating the likelihood unbiasedly given an unbiased
estimator of the log-likelihood obtained by data subsampling. The block-Poisson estimator consists of blocks of
Poisson estimators (Papaspiliopoulos, 2011). The Poisson estimator, like the block-Poisson estimator, is useful for
estimating exp(B) unbiasedly, assuming that there exists an unbiased estimator $\widehat{B}$ of B, i.e. $E(\widehat{B}) = B$. The idea
behind using blocks of Poisson estimators is to allow for correlation between successive iterates in the PM algorithm,
as described in Section 3.2. Similarly to the likelihood approximation approaches discussed above, the BP estimator
is implemented in combination with an auxiliary variable ν and an estimator of the normalising function. Omitting
details of the auxiliary variable method that are explained in Section 3.2, assume B(θ) = −νZ(θ). Given ν and an
unbiased estimator of Z(θ), the BP estimator produces an unbiased estimator of exp(−νZ(θ)). The BP estimator
requires a lower bound for B(θ) to guarantee its positiveness. The BP estimator is easier to keep positive than the
RR estimator, because we can derive the probability of the estimator being positive and tune the hyperparameters to
control this probability.
Definition 1 describes the BP estimator $\widehat{L}_B(\theta)$. Lemma 1 gives the expectation and variance of $\widehat{L}_B(\theta)$. Lemmas 2 and 3
establish useful results for tuning the hyperparameters of the estimator (see Section 3.3). Section A in the supplement
contains the proofs.
Definition 1. The block-Poisson estimator (Quiroz et al., 2021) is defined as
$$
\widehat{L}_B(\theta) := \prod_{l=1}^{\lambda} \exp(\xi_l(\theta)), \qquad
\exp(\xi_l(\theta)) = \exp(a/\lambda + m) \prod_{h=1}^{\chi_l} \frac{\widehat{B}^{(h,l)}(\theta) - a}{m\lambda}, \tag{6}
$$
where λ is the number of blocks, $\chi_l \sim \mathrm{Pois}(m)$, a Poisson distribution with mean m, a is an arbitrary constant and
m is the expected number of estimators used within each block.
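As a concrete illustration of (6), the sketch below (our code; B_hat stands for any user-supplied routine returning independent unbiased estimates of B(θ)) computes one realisation of the estimator, and averaging many realisations gives a quick numerical check of the unbiasedness stated in Lemma 1 below.

```python
# Sketch of the block-Poisson estimator in (6). B_hat() is assumed to return
# independent unbiased estimates of B(theta); the result may be negative.
import numpy as np

def block_poisson_estimate(B_hat, a, lam, m, rng):
    est = 1.0
    for _ in range(lam):                               # lambda blocks
        chi_l = rng.poisson(m)                         # number of factors in this block
        factors = (np.array([B_hat() for _ in range(chi_l)]) - a) / (m * lam)
        est *= np.exp(a / lam + m) * np.prod(factors)  # empty product equals 1
    return est

# Toy check of Lemma 1 (i): B = -2 and B_hat ~ N(B, 1).
rng = np.random.default_rng(1)
B, sigma_B, lam, m = -2.0, 1.0, 5, 2
a = B - m * lam                                        # choice motivated by Lemma 1 (iii)
draws = [block_poisson_estimate(lambda: rng.normal(B, sigma_B), a, lam, m, rng)
         for _ in range(50_000)]
print(np.mean(draws), np.exp(B))                       # the two should be close
```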
Lemma 1. Denote $\sigma_B^2 := \mathrm{Var}(\widehat{B}(\theta))$, and assume $\sigma_B^2 < \infty$ and $E(\widehat{B}(\theta)) = B(\theta)$. The following properties hold
for $\widehat{L}_B(\theta)$ in (6):

(i) $E(\widehat{L}_B(\theta)) = \exp(B(\theta))$.

(ii) $\mathrm{Var}(\widehat{L}_B(\theta)) = \exp\left( \dfrac{(B(\theta) - a)^2 + \sigma_B^2}{m\lambda} + 2a + m\lambda \right) - \exp(2B(\theta))$.

(iii) $\mathrm{Var}(\widehat{L}_B(\theta))$ is minimised at $a = B(\theta) - m\lambda$, given fixed m and λ.
Part (i) of Lemma 1 shows that, given an unbiased estimator $\widehat{B}(\theta)$ of B(θ), the BP estimator is unbiased for exp(B(θ)).
Part (iii) of Lemma 1 suggests choosing the lower bound $a = \widehat{B}(\theta) - m\lambda$, as B(θ) is unknown. Similarly
to the RR estimator, the BP estimator is not necessarily positive. By choosing a relatively large mλ, the sufficient
condition for $\widehat{L}_B(\theta) \geq 0$, i.e. $\widehat{B}(\theta) - a > 0$, is likely to be satisfied; however, it is computationally costly as a large
value of mλ implies many products in the BP estimator. We follow Quiroz et al. (2021) and advocate the use of a soft
lower bound, i.e., one that may lead to negative estimates, but still gives a $\Pr(\widehat{L}_B(\theta) \geq 0)$ close to one. Lemma 2
shows that the probability $\Pr(\widehat{L}_B(\theta) \geq 0)$ is analytically tractable. It is crucial to have this probability close to one
for the algorithm to be efficient.
Lemma 2.
$$
\Pr(\widehat{L}_B(\theta) \geq 0) = \frac{1}{2}\left[ 1 + \left(1 - 2\Psi(a, m, M)\right)^{\lambda} \right],
$$
with $\Psi(a, m, M) = \Pr(\xi < 0) = \frac{1}{2} \sum_{j=1}^{\infty} \left[1 - \left(1 - 2\Pr(A_m \leq 0)\right)^{j}\right] \Pr(\chi_l = j)$, where $\chi_l \sim \mathrm{Pois}(m)$ and $A_m =
[\widehat{B}(\theta) - B(\theta)]/(m\lambda) + 1$.
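For illustration, the probability in Lemma 2 is easy to evaluate numerically; the sketch below (ours) assumes the normal model for $\widehat{B}(\theta)$ used in Lemma 3 with $a = B(\theta) - m\lambda$, so that $\Pr(A_m \leq 0) = \Phi(-m\lambda/\sigma_B)$, and truncates the Poisson sum at a large j.

```python
# Numerical evaluation of Pr(L_hat_B >= 0) in Lemma 2 (our sketch), assuming
# B_hat ~ N(B, sigma_B^2) and a = B - m*lam, so Pr(A_m <= 0) = Phi(-m*lam/sigma_B).
import numpy as np
from scipy.stats import norm, poisson

def prob_positive(sigma_B, lam, m, j_max=500):
    p_neg = norm.cdf(-m * lam / sigma_B)                 # Pr(A_m <= 0) for one factor
    j = np.arange(1, j_max + 1)
    psi = 0.5 * np.sum((1.0 - (1.0 - 2.0 * p_neg) ** j) * poisson.pmf(j, m))
    return 0.5 * (1.0 + (1.0 - 2.0 * psi) ** lam)

print(prob_positive(sigma_B=1.0, lam=5, m=2))            # close to 1 when m*lam >> sigma_B
```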
Lemma 3 derives the variance of the logarithm of the absolute value of the block-Poisson estimator by assuming a
normal distribution for $\widehat{B}^{(h,l)}(\theta)$. Section 3.3 tunes the hyperparameters using this result.
Lemma 3. If $\widehat{B}^{(h,l)}(\theta) \overset{iid}{\sim} N(B(\theta), \sigma_B^2)$ for all h and l, and $a = B(\theta) - m\lambda$, then the variance of $\log|\widehat{L}_B|$ is
$$
\sigma^2_{\log|\widehat{L}_B|} = m\lambda\left(\nu_B^2 + \eta_B^2\right),
$$
where
$$
\eta_B = \log\!\left(\frac{\sigma_B}{m\lambda}\right) + \frac{1}{2}\left[\log 2 + E_J\!\left(\psi^{(0)}(0.5 + J)\right)\right]
$$
and
$$
\nu_B^2 = \frac{1}{4}\left[ E_J\!\left(\psi^{(1)}(0.5 + J)\right) + \mathrm{Var}_J\!\left(\psi^{(0)}(0.5 + J)\right)\right],
$$
with $J \sim \mathrm{Pois}\!\left((m\lambda)^2/(2\sigma_B^2)\right)$ and $\psi^{(q)}$ the polygamma function of order q.
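The quantities in Lemma 3 are straightforward to evaluate; the sketch below (ours) truncates the expectations over J and uses SciPy's polygamma function.

```python
# Evaluate eta_B, nu_B^2 and Var(log|L_hat_B|) from Lemma 3 (our sketch),
# truncating the expectation over J ~ Pois((m*lam)^2 / (2*sigma_B^2)).
import numpy as np
from scipy.special import polygamma
from scipy.stats import poisson

def var_log_abs_bp(sigma_B, lam, m, j_max=2000):
    d = (m * lam) ** 2 / (2.0 * sigma_B ** 2)            # Poisson mean of J
    j = np.arange(j_max + 1)
    w = poisson.pmf(j, d)
    e_psi0 = np.sum(w * polygamma(0, 0.5 + j))
    var_psi0 = np.sum(w * polygamma(0, 0.5 + j) ** 2) - e_psi0 ** 2
    e_psi1 = np.sum(w * polygamma(1, 0.5 + j))
    eta_B = np.log(sigma_B / (m * lam)) + 0.5 * (np.log(2.0) + e_psi0)
    nu2_B = 0.25 * (e_psi1 + var_psi0)
    return m * lam * (nu2_B + eta_B ** 2)

print(var_log_abs_bp(sigma_B=1.0, lam=5, m=2))
```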
3.2 Signed block PMMH with the BP estimator
Lyne et al. (2015) use an auxiliary variable ν to cancel the reciprocal of the normalising function in (3) and end up with
exp(−νZ(θ)) instead. Specifically, assume that ν ∼ Expon(Z(θ)). The joint density of θ and the auxiliary variable ν
given y is then proportional to f(y|θ) π(θ) exp(−νZ(θ)), so that 1/Z(θ) no longer appears.