A CORRELATED PSEUDO-MARGINAL APPROACH TO DOUBLY
INTRACTABLE PROBLEMS

Yu Yang∗†   Matias Quiroz‡   Robert Kohn∗†   Scott A. Sisson§†

∗ School of Economics, UNSW Sydney, Australia
† UNSW Data Science Hub, UNSW Sydney, Australia
‡ Department of Statistics, Stockholm University, Sweden
§ School of Mathematics & Statistics, UNSW Sydney, Australia

arXiv:2210.02734v1 [stat.ME] 6 Oct 2022
ABSTRACT
Doubly intractable models are encountered in a number of fields, e.g. social networks, ecology and
epidemiology. Inference for such models requires the evaluation of a likelihood function, whose
normalising function depends on the model parameters and is typically computationally intractable.
The normalising constant of the posterior distribution and the additional normalising function of
the likelihood function result in a so-called doubly intractable posterior, for which it is difficult to
directly apply Markov chain Monte Carlo (MCMC) methods. We propose a signed pseudo-marginal
Metropolis-Hastings (PMMH) algorithm with an unbiased block-Poisson estimator to sample from
the posterior distribution of doubly intractable models. As the estimator can be negative, the algo-
rithm targets the absolute value of the estimated posterior and uses an importance sampling correc-
tion to ensure simulation-consistent estimates of the posterior mean of any function. An advantage
of our estimator over previous approaches is that its form is ideal for correlated pseudo-marginal
methods, which are well known to dramatically increase sampling efficiency. Moreover, we develop
analytically derived heuristic guidelines for optimally tuning the hyperparameters of the estimator.
We demonstrate the algorithm on the Ising model and a Kent distribution model for spherical data.
Keywords Block-Poisson estimator, Ising model, Spherical data.
1 Introduction
Markov chain Monte Carlo (MCMC) methods (see, e.g., Brooks et al., 2011, for an overview) sample from the poste-
rior distribution without evaluating the normalising constant, also known as the marginal likelihood. However, in some
settings, the likelihood function itself contains an additional normalising constant that depends on the model parame-
ters and the resulting so-called doubly intractable posterior distribution falls outside the standard MCMC framework.
To distinguish these normalisation quantities, we refer to the first as a normalising constant and the latter as a normal-
ising function. Many well-known models have doubly intractable posteriors, such as the exponential random graph
models for social networks (Hunter and Handcock, 2006) and non-Gaussian Markov random field models in spatial
statistics, including the Ising model and its variants (Hughes et al., 2011; Ising, 1925; Lenz, 1920).
Several algorithms are available to tackle the doubly intractable problem in Bayesian statistics; see Park and Haran
(2018) for a review. These algorithms are classified into two main categories, with some overlap between them.
The first category of methods introduce cleverly chosen auxiliary variables that cancel the normalising function when
carrying out the MCMC sampling and standard MCMC such as the Metropolis-Hastings (MH) algorithm (Hastings,
1970; Metropolis et al., 1953) can thus be applied. This approach is model dependent and cannot always be applied.
The second category of methods, which applies more generally, approximates the likelihood function (including the
normalising function) and substitutes the approximation in place of the exact likelihood in the estimation procedure.
The pseudo-marginal (PM) method (Andrieu and Roberts, 2009; Beaumont, 2003) is often used when a positive
and unbiased estimator of the likelihood is available through Monte Carlo simulation. However, in some problems,
including doubly intractable models, forming an unbiased estimator that is almost surely positive is prohibitively
expensive (Jacob and Thiery, 2015). The so-called Russian roulette (RR) estimator (Lyne et al., 2015) is an example
of a method that can be used to unbiasedly estimate the likelihood function in doubly intractable models, although the
estimate is not necessarily positive.
We propose a method for exact inference on posterior expectations in doubly intractable problems based on the ap-
proach in Lyne et al. (2015), where an unbiased, but not necessarily positive, estimator of the likelihood function is
used. The algorithm targets a posterior density that uses the absolute value of the likelihood, resulting in iterates from
a perturbed target density. We follow Lyne et al. (2015) and reweight the samples from the perturbed target density
using importance sampling to obtain simulation-consistent estimates of the expectation of any function of the param-
eters with respect to the true posterior density. While our method does not sample from the target of interest, we refer
to it as exact due to its simulation-consistent property.
Our main contribution is to explore the use of the block-Poisson (BP) estimator (Quiroz et al., 2021) in the con-
text of estimating doubly intractable models using the signed PMMH approach. Our method provides the following
advantages over the Russian roulette method. First, the BP estimator has a much simpler structure and is more compu-
tationally efficient. Second, the block form of our estimator makes it possible to correlate the estimators of the doubly
intractable posterior at the current and proposed draws in the MH algorithm. Introducing such correlation dramatically
improves the efficiency of PM algorithms (Deligiannidis et al., 2018; Tran et al., 2016). Finally, under simplifying
assumptions, some statistical properties of the logarithm of the absolute value of our estimator are derived and used to
obtain heuristic guidelines to optimally tune the hyperparameters of the estimator. We demonstrate empirically that our
method outperforms Lyne et al. (2015) when estimating the Ising model. To the best of our knowledge, our method,
that of Lyne et al. (2015) and its extensions are the only alternatives in the PM framework to perform exact inference
(in the sense of consistent estimates of posterior expectations) for general doubly intractable problems. Compared
with algorithms which use auxiliary variables to avoid evaluating the normalising function, signed PMMH algorithms
are more widely applicable and generic as they do not require exact sampling from the likelihood.
The rest of the paper is organised as follows. Section 2 formally introduces the doubly intractable problem and
discusses previous research. Section 3 introduces our methodology and establishes the guidelines for tuning the
hyperparameters of the estimator. Section 4 demonstrates the proposed method in two simulation studies: the Ising
model and the Kent distribution. Section 5 analyses four real-world datasets using the Kent distribution. Section 6
concludes and outlines future research. The paper has an online supplement that contains all proofs and details of
the simulation studies. The supplement also contains an additional example applying our method to a constrained
Gaussian process (GP), where the normalising function arises from the GP prior.
2 Doubly intractable problems
2.1 Doubly intractable posterior distributions
Let p(y|θ) denote the density of the data vector y, where θ is the vector of model parameters. Suppose p(y|θ) =
f(y|θ)/Z(θ), where f(y|θ) is computable while the normalising function Z(θ) is not. The reason that Z(θ) is
intractable may be that it is prohibitively expensive to evaluate numerically, or that it lacks a closed form. Two examples are
given below to demonstrate the intractability for both discrete and continuous observations y.
Example 1 (The Ising model (Ising, 1925)). Consider an L × L lattice with binary observation $y_{ij} \in \{-1, 1\}$ in row
i and column j. The likelihood of $\theta \in \mathbb{R}$ is
$$
p(y \mid \theta) = \frac{1}{Z(\theta)} \exp\left(\theta S(y)\right); \qquad
S(y) = \sum_{i=1}^{L} \sum_{j=1}^{L-1} y_{i,j}\, y_{i,j+1} + \sum_{i=1}^{L-1} \sum_{j=1}^{L} y_{i,j}\, y_{i+1,j}; \tag{1}
$$
with $Z(\theta) = \sum_{y} \exp(\theta S(y))$.
The normalising function Z(θ) in the Ising model is a sum over the $2^{L^2}$ possible configurations y of terms of the form exp(θS(y)), making it computationally
intractable even for moderate values of L. See Section 4.1 for a further discussion.
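To make this concrete, the brief sketch below (our illustration, not code from the paper) computes S(y) for a lattice stored as a NumPy array and evaluates log Z(θ) by enumerating all configurations; the enumeration is only feasible for very small L.

```python
# Illustration only: the sufficient statistic S(y) in (1) and a brute-force
# evaluation of log Z(theta), feasible only for very small lattices.
import itertools
import numpy as np

def ising_suff_stat(y):
    """S(y): sum of products of horizontally and vertically adjacent spins."""
    y = np.asarray(y)
    return np.sum(y[:, :-1] * y[:, 1:]) + np.sum(y[:-1, :] * y[1:, :])

def ising_log_Z_bruteforce(theta, L):
    """log Z(theta) by enumerating all 2**(L*L) spin configurations."""
    log_terms = [theta * ising_suff_stat(np.reshape(spins, (L, L)))
                 for spins in itertools.product([-1, 1], repeat=L * L)]
    return np.logaddexp.reduce(log_terms)   # log-sum-exp for numerical stability

# L = 4 already requires 2**16 = 65,536 configurations; L = 10 would need 2**100.
print(ising_log_Z_bruteforce(theta=0.3, L=4))
```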
Example 2 (The Kent distribution (Kent, 1982)). The density of the Kent distribution for $y \in \mathbb{R}^3$, $\|y\| = 1$, is
$$
f(y \mid \gamma_1, \gamma_2, \gamma_3, \beta, \kappa) = \frac{1}{c(\kappa, \beta)} \exp\left\{ \kappa\, \gamma_1^{\top} y + \beta\left[ (\gamma_2^{\top} y)^2 - (\gamma_3^{\top} y)^2 \right] \right\}; \tag{2}
$$
with
$$
c(\kappa, \beta) = 2\pi \sum_{j=0}^{\infty} \frac{\Gamma(j + 0.5)}{\Gamma(j + 1)}\, \beta^{2j} (0.5\kappa)^{-2j - 0.5}\, I_{2j+0.5}(\kappa),
$$
where $I_{\nu}(\cdot)$ is the modified Bessel function of the first kind and $\gamma_1, \gamma_2, \gamma_3$ form a set of 3-dimensional orthonormal
vectors. The normalising function c(κ, β) is an infinite sum. See Section 4.2 for further analysis.
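In practice c(κ, β) can be approximated by truncating the series; the sketch below (ours, with an arbitrary truncation level) evaluates the logarithm of the truncated sum using SciPy's modified Bessel function. For large κ the exponentially scaled routine scipy.special.ive would be preferable to avoid overflow.

```python
# Illustration only: truncate the infinite series defining c(kappa, beta) in (2).
# The truncation level n_terms is an arbitrary choice for this sketch.
import numpy as np
from scipy.special import gammaln, iv

def kent_log_norm_const(kappa, beta, n_terms=50):
    """Approximate log c(kappa, beta) by summing the first n_terms terms (beta > 0)."""
    j = np.arange(n_terms)
    log_terms = (gammaln(j + 0.5) - gammaln(j + 1.0)
                 + 2.0 * j * np.log(beta)
                 + (-2.0 * j - 0.5) * np.log(0.5 * kappa)
                 + np.log(iv(2.0 * j + 0.5, kappa)))
    return np.log(2.0 * np.pi) + np.logaddexp.reduce(log_terms)

print(kent_log_norm_const(kappa=10.0, beta=2.0))
```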
The doubly intractable posterior density of θ is
$$
\pi(\theta \mid y) = \frac{f(y \mid \theta)\, \pi(\theta)}{Z(\theta)\, p(y)}, \tag{3}
$$
where π(θ) is the prior for θ and
$$
p(y) = \int \frac{f(y \mid \theta)\, \pi(\theta)}{Z(\theta)}\, d\theta \tag{4}
$$
is the normalising constant. Suppose we devise a Metropolis-Hastings algorithm to sample from (3) with a proposal
density q(·|θ). The probability of accepting a proposed sample θ′ is
$$
\alpha(\theta', \theta) = \min\left\{ 1,\; \frac{\pi(\theta')\, f(y \mid \theta')/Z(\theta')}{\pi(\theta)\, f(y \mid \theta)/Z(\theta)} \times \frac{q(\theta \mid \theta')}{q(\theta' \mid \theta)} \right\}. \tag{5}
$$
The marginal likelihood in (4) cancels in (5), but the normalising function does not. Since Z(θ)/Z(θ′) is compu-
tationally intractable, (5) cannot be evaluated and thus MCMC sampling via the Metropolis-Hastings algorithm is
impossible.
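The fragment below (our illustration, not from the paper) spells out a textbook random-walk MH step for (3); the two log_Z evaluations mark exactly where the doubly intractable problem enters, since no such function is available for models like Examples 1 and 2.

```python
# Illustration: a random-walk MH step for (3). The calls to log_Z are the
# intractable part; the marginal likelihood p(y) never needs to be evaluated.
import numpy as np

def mh_step(theta, log_f, log_prior, log_Z, rng, step=0.1):
    theta_prop = theta + step * rng.standard_normal(theta.shape)  # symmetric proposal, q cancels
    log_alpha = (log_prior(theta_prop) + log_f(theta_prop) - log_Z(theta_prop)
                 - log_prior(theta) - log_f(theta) + log_Z(theta))
    return theta_prop if np.log(rng.uniform()) < log_alpha else theta
```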
2.2 Previous research
Previous research on doubly intractable problems is mainly divided into the auxiliary variable approach and the like-
lihood approximation approach; see Park and Haran (2018) for an excellent review of both approaches.
The auxiliary variable approach cleverly chooses the joint transition kernel of the parameters and the auxiliary variables
so that the normalising function cancels in the resulting MH acceptance ratio. The most well-known algorithms are the
exchange algorithm (Murray et al., 2006) and the auxiliary variable method (Møller et al., 2006). Both algorithms are
model dependent and rely on the sampling technique used to draw observations from the likelihood function. Perfect
sampling (Propp and Wilson, 1996) is often used to generate data samples from the model without knowing the
normalising function. However, for some complex models, such as the Ising model on a large grid, perfect sampling
is prohibitively expensive. To overcome this issue, Liang (2010) and Liang et al. (2016) relax the requirement of exact
sampling and propose the double MH sampler and the adaptive exchange algorithm. However, the former generates
inexact inference results and the latter suffers from memory issues as many intermediate variables need to be stored
within each iteration.
Likelihood approximation approaches are often simulation consistent. Atchadé et al. (2013) directly approximate
Z(θ) through multiple importance sampling. Their approach also depends on an auxiliary variable, but does not
require perfect sampling. The downside is similar to that of the adaptive exchange algorithm; a large memory is
usually required to store the intermediate variables generated in each iteration. An alternative method is to approximate
1/Z(θ) directly using the signed PMMH algorithm, which replaces the likelihood function by its unbiased estimator, as
proposed in Lyne et al. (2015). To obtain the unbiased estimator, 1/Z(θ) is expressed as a geometric series which
is truncated using an RR approach. The RR method first appeared in the physics literature (Carter and Cashwell,
1975) and is useful for obtaining an unbiased estimator through a finite stochastic truncation of an infinite series (a generic sketch is given at the end of this section). To
implement RR, a tight upper bound for Z(θ) is required; otherwise the convergence of the geometric series is slow,
which makes the algorithm inefficient. In practice, a tight upper bound is usually unavailable, which may lead to negative
estimates of the likelihood and thus a signed PMMH approach is necessary, although it inflates the asymptotic variance
of the MCMC chain (Andrieu and Vihola, 2016) compared to a standard PM approach, especially if the estimator
produces a significant proportion of negative estimates (Lyne et al., 2015). It is therefore crucial to quantify the
probability of a negative estimate when tuning the hyperparameters of the estimator, which is difficult for the RR
estimator. In contrast, our estimator is more tractable and the probability of a positive estimate is analytically derived
under simplifying assumptions. Besides the upper bound, a few other hyperparameters of the RR estimator need to
be determined for which guidelines are unavailable due to its intractability. Wei and Murray (2017) combine RR with
Markov chain coupling to produce an estimator with lower variance and a larger probability of producing positive
estimates. However, their estimator is insufficiently tractable to guarantee a positive estimate with a smaller variance,
making it difficult to derive optimal tuning guidelines. Cai and Adams (2022) propose a multi-fidelity MCMC method
to approximate the doubly intractable target density which, like the Russian roulette method, stochastically truncates
an infinite series and uses slice sampling (Murray and Graham, 2016). However, similarly to Lyne et al. (2015), the
method lacks guidelines for tuning the hyperparameters.
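To fix ideas about the stochastic truncation used by the RR estimator, the generic sketch below (ours; it uses a constant continuation probability and is not the specific scheme of Lyne et al., 2015) unbiasedly estimates an infinite sum from unbiased estimates of its terms. In the doubly intractable setting the terms involve factors of the form 1 − Ẑ(θ)/Z̄, which can be negative when Z̄ is not a genuine upper bound; this is the source of the negative likelihood estimates discussed above.

```python
# Generic Russian-roulette truncation (illustration, not the exact scheme of
# Lyne et al., 2015): unbiasedly estimate sum_{k>=0} a_k given a routine that
# returns an unbiased estimate of term k. With constant continuation probability
# q, term k is reached with probability q**k, so dividing by q**k keeps the
# estimate unbiased while only finitely many terms are ever evaluated.
import numpy as np

def russian_roulette_sum(term_estimator, q, rng):
    total, k, survival_prob = 0.0, 0, 1.0
    while True:
        total += term_estimator(k) / survival_prob
        if rng.uniform() > q:            # stop after this term with probability 1 - q
            return total
        k += 1
        survival_prob *= q

# Toy check: the geometric series sum_k 0.5**k = 2, using exact terms as "estimates".
rng = np.random.default_rng(0)
draws = [russian_roulette_sum(lambda k: 0.5**k, q=0.7, rng=rng) for _ in range(20000)]
print(np.mean(draws))                    # close to 2
```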
3 Methodology
3.1 The block-Poisson estimator
The block-Poisson estimator (Quiroz et al., 2021) is useful for estimating the likelihood unbiasedly given an unbiased
estimator of the log-likelihood obtained by data subsampling. The block-Poisson estimator consists of blocks of
Poisson estimators (Papaspiliopoulos, 2011). The Poisson estimator, like the block-Poisson estimator, is useful for
estimating exp(B) unbiasedly, assuming that there exists an unbiased estimator $\widehat{B}$ of B, i.e. $E(\widehat{B}) = B$. The idea
behind using blocks of Poisson estimators is to allow for correlation between successive iterates in the PM algorithm,
as described in Section 3.2. Similarly to the likelihood approximation approaches discussed above, the BP estimator
is implemented in combination with an auxiliary variable ν and an estimator of the normalising function. Omitting
details of the auxiliary variable method that are explained in Section 3.2, assume B(θ) = −νZ(θ). Given ν and an
unbiased estimator of Z(θ), the BP estimator produces an unbiased estimator of exp(−νZ(θ)). The BP estimator
requires a lower bound for B(θ) to guarantee its positiveness. The BP estimator is easier to keep positive than the
RR estimator, because we can derive the probability of the estimator being positive and tune the hyperparameters to
control this probability.
Definition 1 describes the BP estimator $\widehat{L}_B(\theta)$. Lemma 1 gives the expectation and variance of $\widehat{L}_B(\theta)$. Lemmas 2 and 3
establish useful results for tuning the hyperparameters of the estimator (see Section 3.3). Section A in the supplement
contains the proofs.
Definition 1. The block-Poisson estimator (Quiroz et al., 2021) is defined as
$$
\widehat{L}_B(\theta) := \prod_{l=1}^{\lambda} \exp(\xi_l(\theta)), \qquad
\exp(\xi_l(\theta)) = \exp(a/\lambda + m) \prod_{h=1}^{\chi_l} \frac{\widehat{B}^{(h,l)}(\theta) - a}{m\lambda}, \tag{6}
$$
where λ is the number of blocks, $\chi_l \sim \mathrm{Pois}(m)$, a Poisson distribution with mean m, a is an arbitrary constant and
m is the expected number of estimators used within each block.
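As a concrete illustration of (6), the sketch below (our code; B_hat stands for any user-supplied routine returning independent unbiased estimates of B(θ)) computes one realisation of the estimator, and averaging many realisations gives a quick numerical check of the unbiasedness stated in Lemma 1 below.

```python
# Sketch of the block-Poisson estimator in (6). B_hat() is assumed to return
# independent unbiased estimates of B(theta); the result may be negative.
import numpy as np

def block_poisson_estimate(B_hat, a, lam, m, rng):
    est = 1.0
    for _ in range(lam):                               # lambda blocks
        chi_l = rng.poisson(m)                         # number of factors in this block
        factors = (np.array([B_hat() for _ in range(chi_l)]) - a) / (m * lam)
        est *= np.exp(a / lam + m) * np.prod(factors)  # empty product equals 1
    return est

# Toy check of Lemma 1 (i): B = -2 and B_hat ~ N(B, 1).
rng = np.random.default_rng(1)
B, sigma_B, lam, m = -2.0, 1.0, 5, 2
a = B - m * lam                                        # choice motivated by Lemma 1 (iii)
draws = [block_poisson_estimate(lambda: rng.normal(B, sigma_B), a, lam, m, rng)
         for _ in range(50_000)]
print(np.mean(draws), np.exp(B))                       # the two should be close
```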
Lemma 1. Denote $\sigma_B^2 := \mathrm{Var}(\widehat{B}(\theta))$, and assume $\sigma_B^2 < \infty$ and $E(\widehat{B}(\theta)) = B(\theta)$. The following properties hold
for $\widehat{L}_B(\theta)$ in (6):

(i) $E(\widehat{L}_B(\theta)) = \exp(B(\theta))$.

(ii) $\mathrm{Var}(\widehat{L}_B(\theta)) = \exp\left( \dfrac{(B(\theta) - a)^2 + \sigma_B^2}{m\lambda} + 2a + m\lambda \right) - \exp(2B(\theta))$.

(iii) $\mathrm{Var}(\widehat{L}_B(\theta))$ is minimised at $a = B(\theta) - m\lambda$, given fixed m and λ.
Part (i) of Lemma 1 shows that, given an unbiased estimator $\widehat{B}(\theta)$ of B(θ), the BP estimator is unbiased for exp(B(θ)).
Part (iii) of Lemma 1 suggests choosing the lower bound $a = \widehat{B}(\theta) - m\lambda$, as B(θ) is unknown. Similarly
to the RR estimator, the BP estimator is not necessarily positive. By choosing a relatively large mλ, the sufficient
condition for $\widehat{L}_B(\theta) \geq 0$, i.e. $\widehat{B}(\theta) - a > 0$, is likely to be satisfied; however, it is computationally costly as a large
value of mλ implies many products in the BP estimator. We follow Quiroz et al. (2021) and advocate the use of a soft
lower bound, i.e., one that may lead to negative estimates, but still gives a $\Pr(\widehat{L}_B(\theta) \geq 0)$ close to one. Lemma 2
shows that the probability $\Pr(\widehat{L}_B(\theta) \geq 0)$ is analytically tractable. It is crucial to have this probability close to one
for the algorithm to be efficient.
Lemma 2.
$$
\Pr(\widehat{L}_B(\theta) \geq 0) = \frac{1}{2}\left[ 1 + \left(1 - 2\Psi(a, m, M)\right)^{\lambda} \right],
$$
with $\Psi(a, m, M) = \Pr(\xi < 0) = \frac{1}{2} \sum_{j=1}^{\infty} \left[1 - \left(1 - 2\Pr(A_m \leq 0)\right)^{j}\right] \Pr(\chi_l = j)$, where $\chi_l \sim \mathrm{Pois}(m)$ and $A_m =
[\widehat{B}(\theta) - B(\theta)]/(m\lambda) + 1$.
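For illustration, the probability in Lemma 2 is easy to evaluate numerically; the sketch below (ours) assumes the normal model for $\widehat{B}(\theta)$ used in Lemma 3 with $a = B(\theta) - m\lambda$, so that $\Pr(A_m \leq 0) = \Phi(-m\lambda/\sigma_B)$, and truncates the Poisson sum at a large j.

```python
# Numerical evaluation of Pr(L_hat_B >= 0) in Lemma 2 (our sketch), assuming
# B_hat ~ N(B, sigma_B^2) and a = B - m*lam, so Pr(A_m <= 0) = Phi(-m*lam/sigma_B).
import numpy as np
from scipy.stats import norm, poisson

def prob_positive(sigma_B, lam, m, j_max=500):
    p_neg = norm.cdf(-m * lam / sigma_B)                 # Pr(A_m <= 0) for one factor
    j = np.arange(1, j_max + 1)
    psi = 0.5 * np.sum((1.0 - (1.0 - 2.0 * p_neg) ** j) * poisson.pmf(j, m))
    return 0.5 * (1.0 + (1.0 - 2.0 * psi) ** lam)

print(prob_positive(sigma_B=1.0, lam=5, m=2))            # close to 1 when m*lam >> sigma_B
```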
Lemma 3 derives the variance of the logarithm of the absolute value of the block-Poisson estimator by assuming a
normal distribution for $\widehat{B}^{(h,l)}(\theta)$. Section 3.3 tunes the hyperparameters using this result.
Lemma 3. If $\widehat{B}^{(h,l)}(\theta) \overset{iid}{\sim} N(B(\theta), \sigma_B^2)$ for all h and l, and $a = B(\theta) - m\lambda$, then the variance of $\log|\widehat{L}_B|$ is
$$
\sigma^2_{\log|\widehat{L}_B|} = m\lambda\left(\nu_B^2 + \eta_B^2\right),
$$
where
$$
\eta_B = \log\!\left(\frac{\sigma_B}{m\lambda}\right) + \frac{1}{2}\left[\log 2 + E_J\!\left(\psi^{(0)}(0.5 + J)\right)\right]
$$
and
$$
\nu_B^2 = \frac{1}{4}\left[ E_J\!\left(\psi^{(1)}(0.5 + J)\right) + \mathrm{Var}_J\!\left(\psi^{(0)}(0.5 + J)\right)\right],
$$
with $J \sim \mathrm{Pois}\!\left((m\lambda)^2/(2\sigma_B^2)\right)$ and $\psi^{(q)}$ the polygamma function of order q.
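The quantities in Lemma 3 are straightforward to evaluate; the sketch below (ours) truncates the expectations over J and uses SciPy's polygamma function.

```python
# Evaluate eta_B, nu_B^2 and Var(log|L_hat_B|) from Lemma 3 (our sketch),
# truncating the expectation over J ~ Pois((m*lam)^2 / (2*sigma_B^2)).
import numpy as np
from scipy.special import polygamma
from scipy.stats import poisson

def var_log_abs_bp(sigma_B, lam, m, j_max=2000):
    d = (m * lam) ** 2 / (2.0 * sigma_B ** 2)            # Poisson mean of J
    j = np.arange(j_max + 1)
    w = poisson.pmf(j, d)
    e_psi0 = np.sum(w * polygamma(0, 0.5 + j))
    var_psi0 = np.sum(w * polygamma(0, 0.5 + j) ** 2) - e_psi0 ** 2
    e_psi1 = np.sum(w * polygamma(1, 0.5 + j))
    eta_B = np.log(sigma_B / (m * lam)) + 0.5 * (np.log(2.0) + e_psi0)
    nu2_B = 0.25 * (e_psi1 + var_psi0)
    return m * lam * (nu2_B + eta_B ** 2)

print(var_log_abs_bp(sigma_B=1.0, lam=5, m=2))
```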
3.2 Signed block PMMH with the BP estimator
Lyne et al. (2015) use an auxiliary variable ν to cancel the reciprocal of the normalising function in (3) and end up with
exp(−νZ(θ)) instead. Specifically, assume that ν ∼ Expon(Z(θ)). The joint density of θ and the auxiliary variable ν
given y is then proportional to f(y|θ) π(θ) exp(−νZ(θ)), so that 1/Z(θ) no longer appears.