Sampling using adaptive regenerative processes

Hector McKimm^1, Andi Q. Wang^2, Murray Pollock^3, Christian P. Robert^{2,4}, Gareth O. Roberts^2

^1 Department of Mathematics, Imperial College London, London, United Kingdom. E-mail: h.mckimm@imperial.ac.uk
^2 Department of Statistics, University of Warwick, Coventry, United Kingdom. E-mail: andi.wang@warwick.ac.uk; C.A.M.Robert@warwick.ac.uk; Gareth.O.Roberts@warwick.ac.uk
^3 School of Mathematics, Newcastle University, Newcastle-upon-Tyne, United Kingdom. E-mail: murray.pollock@newcastle.ac.uk
^4 CEREMADE, Université Paris Dauphine PSL, Paris, France
Abstract: Enriching Brownian motion with regenerations from a fixed regeneration distribution µ at a particular regeneration rate κ results in a Markov process that has a target distribution π as its invariant distribution. For the purpose of Monte Carlo inference, implementing such a scheme requires firstly the selection of a regeneration distribution µ, and secondly the computation of a specific constant C. Both tasks can be very difficult to carry out well in practice. We introduce a method for adapting the regeneration distribution by adding point masses to it. This allows the process to be simulated with as few regenerations as possible and obviates the need to find the constant C. Moreover, the choice of a fixed µ is replaced with the choice of an initial regeneration distribution, which is considerably less difficult. We establish convergence of the resulting self-reinforcing process and explore its effectiveness at sampling from a number of target distributions. The examples show that adapting the regeneration distribution guards against poor choices of a fixed regeneration distribution and can reduce the error of Monte Carlo estimates of expectations of interest, especially when π is skewed.

Keywords and phrases: Adaptive algorithm, Markov process, MCMC, normalizing constant, regeneration distribution, Restore sampler, sampling, simulation.
1. Introduction
Bayesian statistical problems customarily require computing quantities of the form π[f] := E_π[f(X)], where X is a random variable with distribution π and f is a function; when R^d is the state space, this expectation is the integral ∫_{R^d} f(x)π(x) dx. For sophisticated models, it may be impossible to compute this integral analytically. Furthermore, it may be impractical to generate independent samples for use in Monte Carlo integration. In this case, Markov chain Monte Carlo (MCMC) methods (Robert and Casella, 2004) may be used to generate a Markov chain X_0, X_1, ... with limiting distribution π, and then approximate π[f] by n^{-1} Σ_{i=1}^{n} f(X_i).
The chain is constructed by repeatedly applying a collection of Markov transition kernels P_1, ..., P_m, each satisfying πP_i = π for i = 1, ..., m. The Metropolis–Hastings algorithm (Metropolis et al., 1953; Hastings, 1970) is normally used to construct and simulate from reversible π-invariant Markov transition kernels. A single kernel P may be used to represent P_1, P_2, ..., P_m, with form depending on whether a cycle is used (P = P_1 P_2 ··· P_m) or a mixture (P = [P_1 + P_2 + ··· + P_m]/m). Using multiple kernels allows different dynamics to be used, for example by making transitions on the local and global scales.
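For concreteness, here is a minimal sketch (our illustration, not code from the paper) of a cycle and a mixture of two π-invariant Metropolis–Hastings kernels for a one-dimensional standard-normal target; the particular kernels, step sizes and target are illustrative assumptions.

```python
import numpy as np

# Two pi-invariant MH kernels for pi = N(0, 1), combined as a cycle P1 P2
# or as a mixture (P1 + P2)/2. All tuning choices here are illustrative.
rng = np.random.default_rng(1)
log_pi = lambda x: -0.5 * x**2          # unnormalized log-density of N(0, 1)

def rw_kernel(x, step=0.5):
    """Random-walk Metropolis kernel (local moves)."""
    y = x + step * rng.standard_normal()
    return y if np.log(rng.uniform()) < log_pi(y) - log_pi(x) else x

def ind_kernel(x, scale=3.0):
    """Independence-sampler kernel with a N(0, scale^2) proposal (global moves)."""
    y = scale * rng.standard_normal()
    log_q = lambda z: -0.5 * (z / scale) ** 2   # proposal log-density up to a constant
    accept = np.log(rng.uniform()) < (log_pi(y) - log_q(y)) - (log_pi(x) - log_q(x))
    return y if accept else x

def cycle(x):                            # P = P1 P2: apply the kernels in sequence
    return ind_kernel(rw_kernel(x))

def mixture(x):                          # P = (P1 + P2)/2: pick a kernel at random
    return rw_kernel(x) if rng.uniform() < 0.5 else ind_kernel(x)

x, chain = 0.0, []
for _ in range(10_000):
    x = mixture(x)                       # or cycle(x)
    chain.append(x)
print(np.mean(chain))                    # estimate of pi[f] with f(x) = x
```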
The MCMC framework described above is restrictive. Firstly, each kernel must be π-invariant; for example, it is not possible for P_1 and P_2 to be individually non-π-invariant and yet somehow compensate for each other so that their combination P is π-invariant. To achieve this π-invariance, each kernel is designed to be reversible. This acts as a further restriction: by definition, reversible kernels satisfy detailed balance and thus have diffusive dynamics. That is, chains generated using reversible kernels show random-walk-like behaviour, which is inefficient. Recently, there has been increasing interest in the use of non-reversible Markov processes for MCMC (Bierkens, Fearnhead and Roberts, 2019; Bouchard-Côté, Vollmer and Doucet, 2018; Pollock et al., 2020).
A further restriction of the typical MCMC framework is that it is difficult to make use of regeneration. At regeneration times, a Markov chain effectively starts again: its future is independent of its past. Regeneration is useful from both theoretical and practical perspectives. Nummelin's splitting technique (Nummelin, 1978) may be used in MCMC algorithms to simulate regeneration events (Mykland, Tierney and Yu, 1995; Gilks, Roberts and Sahu, 1998). However, the technique scales poorly: regenerations become exponentially rarer with dimension; see the discussion in Gilks, Roberts and Sahu (1998).
An interesting direction to address these issues appeared in Wang et al. (2021). The authors introduced the Restore process, defined by enriching an underlying Markov process, which may not be π-invariant, with regenerations from some fixed regeneration distribution µ at a regeneration rate κ, so that the resulting Markov process is π-invariant. The state-dependent regeneration rate is given by κ = κ̃ + Cµ/π, where κ̃ is a functional of the derivatives of log π and C is a constant chosen so that κ ≥ 0 pointwise. The segments of the process between regeneration times, known as tours, are independent and identically distributed. When applied to Monte Carlo, we refer to this as the Restore sampler. The process provides a general framework for using non-reversible dynamics, local and global moves, as well as regeneration within an MCMC sampler. Sample paths of the continuous-time process are used to form a Monte Carlo sum to approximate π[f].
An issue with the Restore sampler is the choice of Cµ. For a given µ and π, it is difficult to compute how large C needs to be to guarantee κ ≥ 0. Furthermore, it is not obvious how to set µ in the first place. This issue is in fact crucial, since the Restore sampler is very sensitive to the choice of µ. One might consider adapting µ to become closer to π. Comparable discrete-time MCMC methods that use both local and global moves tend to resort to cycle kernels, for instance combining MALA (Roberts and Tweedie, 1996) and Independence Sampler kernels. Typically, the proposal distribution for the Independence Sampler is initialized as a simple distribution, then adapted based on samples from π to become closer to π. For example, Gabrié, Rotskoff and Vanden-Eijnden (2021) define the proposal distribution using a normalizing flow (Papamakarios et al., 2021), which is tuned to become closer to π. In simulating a Restore process, the strategy of adapting µ to become closer to π could only work if the initial choice of µ were a ‘good’ one that allows κ to be small enough for sampling to be possible. In Section 3 we give an example of κ being so large that, under a seemingly sensible choice of µ, sampling is not possible for numerical reasons. Furthermore, in Section 5 we demonstrate that even if sampling is numerically possible, using a fixed regeneration distribution may make it inefficient.
In this work, we consider an adaptive approach, so that a far smaller regeneration rate may be used. In particular, instead of using a fixed regeneration distribution µ, we use at time t a regeneration distribution µ_t, which is adapted so that it converges to the minimal regeneration distribution µ_+. This distribution typically has compact support and results in a considerably more robust algorithm, in a similar spirit to the Barker proposal of Livingstone and Zanella (2022). We call the novel Markov process an adaptive Restore process and the original Restore process the standard Restore process. An asymptotic analysis for large dimensions, justifying this strategy, is presented in Supplement D (McKimm et al., 2024). The regeneration distribution is initially a fixed parametric distribution µ_0; then, as the process is generated, point masses are added, so that µ_t is a mixture of a parametric distribution and point masses (see the sketch below). Throughout simulation, the minimal regeneration rate is used. This rate is defined implicitly by a particular constant C_+, which need not be known explicitly. Thus adaptive Restore obviates the need to find a suitably large constant.
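To illustrate only the mixture structure of µ_t, the following minimal sketch (ours, not the paper's implementation) maintains a regeneration distribution made up of a fixed parametric component µ_0 plus accumulated point masses. The rule for choosing which points to add, and the mixture weights, are specified in Section 3; the weight w0 and the atom locations below are purely hypothetical placeholders.

```python
import numpy as np

# Sketch of a regeneration distribution mu_t that mixes a parametric part
# mu_0 = N(0, scale^2 I) with the empirical measure of atoms added so far.
# The mixture weight w0 and the atoms themselves are assumptions for illustration.
rng = np.random.default_rng(0)

class AdaptiveRegenDist:
    def __init__(self, dim, w0=0.1, scale=2.0):
        self.dim, self.w0, self.scale = dim, w0, scale
        self.atoms = []                      # point masses added during simulation

    def add_atom(self, x):
        """Record a point mass at state x."""
        self.atoms.append(np.asarray(x, dtype=float))

    def sample(self):
        """Draw a regeneration state from the current mixture mu_t."""
        if not self.atoms or rng.uniform() < self.w0:
            return self.scale * rng.standard_normal(self.dim)   # parametric part mu_0
        return self.atoms[rng.integers(len(self.atoms))]        # uniform over atoms

mu_t = AdaptiveRegenDist(dim=2)
mu_t.add_atom([0.3, -1.2])                   # hypothetical atoms for illustration
mu_t.add_atom([1.1, 0.4])
print(mu_t.sample())
```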
Adaptive Restore differs from adaptive MCMC methods (Andrieu and Thoms, 2008; Roberts and Rosenthal, 2009; Haario et al., 2001), since the latter adapt the Markov transition kernel used in generating the Markov chain, whilst the former adapts the regeneration distribution.
Besides the methodological contributions of this work, from a theoretical perspective this paper presents a novel application of the stochastic approximation technique to establishing convergence of self-reinforcing processes, as previously utilised in, for example, Aldous, Flannery and Palacios (1988), Benaïm, Cloez and Panloup (2018) and Mailler and Villemonais (2020). In particular, we adapt the proof technique of Benaïm, Cloez and Panloup (2018), which applies to discrete-time Markov chains on a compact state space, to deduce the validity of our adaptive Restore process, which is a continuous-time Markov process on a noncompact state space. This is achieved by identifying a natural embedded discrete-time Markov chain, taking values on a compact subset, whose convergence implies convergence of the overall process. A theoretical summary and comparison are provided in Section 4.1.
A secondary contribution of this article is showing that it is possible to use a standard Restore process to estimate the normalizing constant of an unnormalized density.
The rest of the article is arranged as follows. Section 2 reviews standard Restore. Next, Section 3 introduces the adaptive Restore process and its use as a sampler. Section 4 is a self-contained section on the theory of adaptive Restore, where we prove its validity; see Section 4.1 for a summary of our theoretical contributions. Examples are then provided in Section 5 and Section 6 concludes.
2. The Restore process
This section describes the standard Restore process, as introduced in Wang et al.
(2021). We define the process, explain how it may be used to estimate normal-
izing constants, introduce the concept of minimal regeneration, and present the
case where the underlying process is Brownian motion.
2.1. Regeneration-enriched Markov processes
The Restore process is defined as follows. Let {Y_t}_{t≥0} be a diffusion or jump process on R^d. The regeneration rate κ : R^d → [0, ∞), which we will define shortly, is locally bounded and measurable. Define the tour length as
\[
\tau = \inf\Big\{ t \ge 0 : \int_0^t \kappa(Y_s)\,\mathrm{d}s \ge \xi \Big\}, \tag{1}
\]
for ξ ∼ Exp(1) independent of {Y_t}_{t≥0}. Let µ be some fixed distribution and ({Y_t^{(i)}}_{t≥0}, τ^{(i)}), i = 0, 1, ..., be i.i.d. realisations of ({Y_t}_{t≥0}, τ) with Y_0 ∼ µ. The regeneration times are T_0 = 0 and T_j = Σ_{i=0}^{j-1} τ^{(i)} for j = 1, 2, .... Then the Restore process {X_t}_{t≥0} is given by
\[
X_t = \sum_{i=0}^{\infty} \mathbb{1}_{[T_i, T_{i+1})}(t)\, Y^{(i)}_{t - T_i}.
\]
Let L_Y be the infinitesimal generator of {Y_t}_{t≥0}. Then the (formal) infinitesimal generator of {X_t}_{t≥0} is
\[
L_X f(x) = L_Y f(x) + \kappa(x) \int \big[ f(y) - f(x) \big]\, \mu(y)\,\mathrm{d}y.
\]
To use the Restore process for Monte Carlo integration, one chooses κ so that {X_t}_{t≥0} is π-invariant. Defining κ as
\[
\kappa(x) = \frac{L_Y^{*}\pi(x)}{\pi(x)} + C\,\frac{\mu(x)}{\pi(x)}, \tag{2}
\]
with L_Y^{*} denoting the formal adjoint of L_Y, it can be shown that ∫_{R^d} L_X f(x) π(x) dx = 0. Hence, {X_t}_{t≥0} is π-invariant. We will write equation (2) as
\[
\kappa(x) = \tilde{\kappa}(x) + C\,\frac{\mu(x)}{\pi(x)}. \tag{3}
\]
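As a concrete illustration of (2) and (3) (a sketch of ours, not code from the paper), the snippet below evaluates the partial rate κ̃ and the full rate κ in one dimension when the underlying process is standard Brownian motion, for which κ̃ = ½((log π)'' + ((log π)')²); the Brownian motion case is discussed later in this section. The Gaussian choices of π and µ, and the grid-based search for the smallest workable C, are illustrative assumptions only.

```python
import numpy as np

# Sketch of equations (2)-(3) in one dimension with Brownian motion as the
# underlying process, for which kappa_tilde = 0.5 * ((log pi)'' + ((log pi)')^2).
# Target pi = N(0, 1); regeneration distribution mu = N(0, 2^2); C is a guess.
grad_log_pi = lambda x: -x                 # (log pi)'(x) for N(0, 1)
lap_log_pi = lambda x: -np.ones_like(x)    # (log pi)''(x)
pi_pdf = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
mu_pdf = lambda x: np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))

def kappa_tilde(x):
    return 0.5 * (lap_log_pi(x) + grad_log_pi(x) ** 2)

def kappa(x, C):
    return kappa_tilde(x) + C * mu_pdf(x) / pi_pdf(x)

# On a grid, the smallest C keeping kappa nonnegative is the largest value of
# -kappa_tilde * pi / mu; a grid check of this kind gives no global guarantee.
x = np.linspace(-6, 6, 2001)
C_min_grid = np.max(-kappa_tilde(x) * pi_pdf(x) / mu_pdf(x))
print(C_min_grid, kappa(x, C_min_grid).min())
```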
We call κ̃ the partial regeneration rate, µ the regeneration distribution, C > 0 the regeneration constant, and Cµ the regeneration measure, which must be large enough so that κ(x) > 0 for all x ∈ R^d. The resulting Monte Carlo method is called the Restore sampler. Given π-invariance of {X_t}_{t≥0}, due to the regenerative structure of the process, we have
\[
\pi[f] = \frac{\mathbb{E}_{X_0 \sim \mu}\Big[ \int_0^{\tau^{(0)}} f(X_s)\,\mathrm{d}s \Big]}{\mathbb{E}_{X_0 \sim \mu}\big[ \tau^{(0)} \big]}
\]
and almost sure convergence of the ergodic averages: as t → ∞,
\[
\frac{1}{t} \int_0^t f(X_s)\,\mathrm{d}s \to \pi[f]. \tag{4}
\]
For i = 0, 1, ..., define Z_i := ∫_{T_i}^{T_{i+1}} f(X_s) ds. The Central Limit Theorem for Restore processes states that
\[
\sqrt{n}\left( \frac{\int_0^{T_n} f(X_s)\,\mathrm{d}s}{T_n} - \pi[f] \right) \to \mathcal{N}\big(0, \sigma_f^2\big),
\]
where convergence is in distribution and
\[
\sigma_f^2 := \frac{\mathbb{E}_{X_0 \sim \mu}\Big[ \big( Z_0 - \tau^{(0)}\, \pi[f] \big)^2 \Big]}{\mathbb{E}_{X_0 \sim \mu}\big[ \tau^{(0)} \big]^2}. \tag{5}
\]
Evidently the estimator’s variance depends on the expected tour length. This
is one motivation for choosing µ so that tours are on average reasonably long.
Indeed, this is the key motivation behind the minimal regeneration measure
described in Section 2.3.
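The following self-contained sketch (ours, not the paper's implementation) simulates Restore tours with standard Brownian motion as the underlying process, discretized by Euler-Maruyama, for a one-dimensional standard-normal target, and then forms the ratio estimator of π[f] from (4) together with a plug-in estimate of the asymptotic variance (5). The Gaussian µ, the constant C, the step size and the Brownian-motion formula for κ̃ are assumptions of the illustration, not prescriptions from the paper.

```python
import numpy as np

# Illustrative Restore sampler: Brownian motion enriched with regenerations,
# target pi = N(0, 1), regeneration distribution mu = N(0, 2^2), f(x) = x.
# For standard Brownian motion, kappa_tilde = 0.5 * ((log pi)'' + ((log pi)')^2).
rng = np.random.default_rng(2)

pi_pdf = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
mu_pdf = lambda x: np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))
mu_sample = lambda: 2.0 * rng.standard_normal()
C = 1.5                                     # assumed large enough that kappa >= 0

def kappa(x):
    kappa_tilde = 0.5 * (-1.0 + x**2)       # partial rate for pi = N(0, 1)
    return max(kappa_tilde + C * mu_pdf(x) / pi_pdf(x), 0.0)

f = lambda x: x                             # estimate pi[f] = E_pi[X] = 0
dt, n_tours = 1e-3, 500
Z, tau = np.zeros(n_tours), np.zeros(n_tours)   # per-tour f-integrals and lengths

for i in range(n_tours):
    x, clock, xi = mu_sample(), 0.0, rng.exponential()   # Exp(1) clock, cf. (1)
    while clock < xi:                       # run the tour until the clock rings
        Z[i] += f(x) * dt
        tau[i] += dt
        clock += kappa(x) * dt              # accumulate the integrated rate
        x += np.sqrt(dt) * rng.standard_normal()          # Brownian increment

est = Z.sum() / tau.sum()                   # ratio estimator of pi[f], cf. (4)
var_hat = np.mean((Z - tau * est) ** 2) / np.mean(tau) ** 2   # plug-in for (5)
print(est, np.sqrt(var_hat / n_tours))      # estimate and a rough standard error
```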
2.2. Estimating the normalizing constant
It is possible to use a Restore process to estimate the normalizing constant of an unnormalized target distribution. When the target distribution is the posterior of some statistical model, its normalizing constant is the marginal likelihood, also known as the evidence. Computing the evidence allows for model comparison via Bayes factors (Kass and Raftery, 1995). Standard MCMC methods draw dependent samples from π but cannot be used to calculate the evidence. Alternative methods such as importance sampling, thermodynamic integration (Neal, 1993; Ogata, 1989), Sequential Monte Carlo (Del Moral, Doucet and Jasra, 2006) or nested sampling (Skilling, 2006) must instead be used for computing the evidence (Gelman and Meng, 1998).
For Z the normalizing constant, let
\[
\pi(x) = \frac{\tilde{\pi}(x)}{Z}.
\]