Sampling using adaptive regenerative processes

Hector McKimm^1, Andi Q. Wang^2, Murray Pollock^3, Christian P. Robert^{2,4}, Gareth O. Roberts^2

^1 Department of Mathematics, Imperial College London, London, United Kingdom. E-mail: h.mckimm@imperial.ac.uk
^2 Department of Statistics, University of Warwick, Coventry, United Kingdom. E-mail: andi.wang@warwick.ac.uk; C.A.M.Robert@warwick.ac.uk; Gareth.O.Roberts@warwick.ac.uk
^3 School of Mathematics, Newcastle University, Newcastle-upon-Tyne, United Kingdom. E-mail: murray.pollock@newcastle.ac.uk
^4 CEREMADE, Université Paris Dauphine PSL, Paris, France
Abstract: Enriching Brownian motion with regenerations from a fixed regeneration distribution µ at a particular regeneration rate κ results in a Markov process that has a target distribution π as its invariant distribution. For the purpose of Monte Carlo inference, implementing such a scheme requires firstly the selection of a regeneration distribution µ, and secondly the computation of a specific constant C. Both tasks can be very difficult to carry out well in practice. We introduce a method for adapting the regeneration distribution by adding point masses to it. This allows the process to be simulated with as few regenerations as possible and obviates the need to find the constant C. Moreover, the choice of a fixed µ is replaced with the choice of an initial regeneration distribution, which is considerably less difficult. We establish convergence of the resulting self-reinforcing process and explore its effectiveness at sampling from a number of target distributions. The examples show that adapting the regeneration distribution guards against poor choices of a fixed regeneration distribution and can reduce the error of Monte Carlo estimates of expectations of interest, especially when π is skewed.

Keywords and phrases: Adaptive algorithm, Markov process, MCMC, normalizing constant, regeneration distribution, Restore sampler, sampling, simulation.
1. Introduction
Bayesian statistical problems customarily require computing quantities of the form π[f] := E_π[f(X)], where X is a random variable with distribution π and f is a function; when R^d is the state space, this expectation is the integral ∫_{R^d} f(x)π(x) dx. For sophisticated models, it may be impossible to compute this integral analytically. Furthermore, it may be impractical to generate independent samples for use in Monte Carlo integration. In this case, Markov chain Monte Carlo (MCMC) methods (Robert and Casella, 2004) may be used to generate a Markov chain X_0, X_1, ... with limiting distribution π, and then approximate π[f] by n^{-1} Σ_{i=1}^{n} f(X_i).
The chain is constructed by repeatedly applying a collection of Markov transition kernels P_1, ..., P_m, each satisfying πP_i = π for i = 1, ..., m. The Metropolis–Hastings algorithm (Metropolis et al., 1953; Hastings, 1970) is normally used to construct and simulate from reversible π-invariant Markov transition kernels. A single kernel P may be used to represent P_1, P_2, ..., P_m, with form depending on whether a cycle is used (P = P_1 P_2 ··· P_m) or a mixture (P = [P_1 + P_2 + ··· + P_m]/m). Using multiple kernels allows different dynamics to be used, for example by making transitions on the local and global scales.
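For concreteness, here is a minimal sketch (our illustration, not code from the paper) of a cycle and a mixture of two π-invariant Metropolis–Hastings kernels for a one-dimensional standard-normal target; the particular kernels, step sizes and target are illustrative assumptions.

```python
import numpy as np

# Two pi-invariant MH kernels for pi = N(0, 1), combined as a cycle P1 P2
# or as a mixture (P1 + P2)/2. All tuning choices here are illustrative.
rng = np.random.default_rng(1)
log_pi = lambda x: -0.5 * x**2          # unnormalized log-density of N(0, 1)

def rw_kernel(x, step=0.5):
    """Random-walk Metropolis kernel (local moves)."""
    y = x + step * rng.standard_normal()
    return y if np.log(rng.uniform()) < log_pi(y) - log_pi(x) else x

def ind_kernel(x, scale=3.0):
    """Independence-sampler kernel with a N(0, scale^2) proposal (global moves)."""
    y = scale * rng.standard_normal()
    log_q = lambda z: -0.5 * (z / scale) ** 2   # proposal log-density up to a constant
    accept = np.log(rng.uniform()) < (log_pi(y) - log_q(y)) - (log_pi(x) - log_q(x))
    return y if accept else x

def cycle(x):                            # P = P1 P2: apply the kernels in sequence
    return ind_kernel(rw_kernel(x))

def mixture(x):                          # P = (P1 + P2)/2: pick a kernel at random
    return rw_kernel(x) if rng.uniform() < 0.5 else ind_kernel(x)

x, chain = 0.0, []
for _ in range(10_000):
    x = mixture(x)                       # or cycle(x)
    chain.append(x)
print(np.mean(chain))                    # estimate of pi[f] with f(x) = x
```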
The MCMC framework described above is restrictive. Firstly, each kernel must be π-invariant; for example, it is not possible for P_1 and P_2 to be individually non-π-invariant and yet somehow compensate for each other so that their combination P is π-invariant. To achieve this π-invariance, each kernel is designed to be reversible. This acts as a further restriction: by definition, reversible kernels satisfy detailed balance and thus have diffusive dynamics. That is, chains generated using reversible kernels show random-walk-like behaviour, which is inefficient. Recently, there has been increasing interest in the use of non-reversible Markov processes for MCMC (Bierkens, Fearnhead and Roberts, 2019; Bouchard-Côté, Vollmer and Doucet, 2018; Pollock et al., 2020).
A further restriction of the typical MCMC framework is that it is difficult to make use of regeneration. At regeneration times, a Markov chain effectively starts again: its future is independent of its past. Regeneration is useful from both theoretical and practical perspectives. Nummelin's splitting technique (Nummelin, 1978) may be used in MCMC algorithms to simulate regeneration events (Mykland, Tierney and Yu, 1995; Gilks, Roberts and Sahu, 1998). However, the technique scales poorly: regenerations become exponentially rarer with dimension; see the discussion in Gilks, Roberts and Sahu (1998).
An interesting direction to address these issues appeared in Wang et al. (2021). The authors introduced the Restore process, defined by enriching an underlying Markov process, which may not be π-invariant, with regenerations from some fixed regeneration distribution µ at a regeneration rate κ, so that the resulting Markov process is π-invariant. The state-dependent regeneration rate is given by κ = κ̃ + Cµ/π, where κ̃ is a functional of the derivatives of log π and C is a constant chosen so that κ ≥ 0 pointwise. The segments of the process between regeneration times, known as tours, are independent and identically distributed. When applied to Monte Carlo, we refer to this as the Restore sampler. The process provides a general framework for using non-reversible dynamics, local and global moves, as well as regeneration within an MCMC sampler. Sample paths of the continuous-time process are used to form a Monte Carlo sum to approximate π[f].
An issue with the Restore sampler is the choice of Cµ. For a given µ and π, it is difficult to compute how large C needs to be to guarantee κ ≥ 0. Furthermore, it is not obvious how to set µ in the first place. This issue is in fact crucial, since the Restore sampler is very sensitive to the choice of µ. One might consider adapting µ to become closer to π. Comparable discrete-time MCMC methods that use both local and global moves tend to resort to cycle kernels, for instance combining MALA (Roberts and Tweedie, 1996) and Independence Sampler kernels. Typically, the proposal distribution for the Independence Sampler is initialized as a simple distribution, then adapted based on samples from π to become closer to π. For example, Gabrié, Rotskoff and Vanden-Eijnden (2021) define the proposal distribution using a normalizing flow (Papamakarios et al., 2021), which is tuned to become closer to π. In simulating a Restore process, the strategy of adapting µ to become closer to π could only work if the initial choice of µ were a ‘good’ one that allows κ to be small enough for sampling to be possible. In Section 3 we give an example of κ being so large that, under a seemingly sensible choice of µ, sampling is not possible for numerical reasons. Furthermore, in Section 5 we demonstrate that even if sampling is numerically possible, using a fixed regeneration distribution may make it inefficient.
In this work, we consider an adaptive approach, so that a far smaller regeneration rate may be used. In particular, instead of using a fixed regeneration distribution µ, we use at time t a regeneration distribution µ_t, which is adapted so that it converges to the minimal regeneration distribution µ_+. This distribution typically has compact support and results in a considerably more robust algorithm, in a similar spirit to the Barker proposal of Livingstone and Zanella (2022). We call the novel Markov process an adaptive Restore process and the original Restore process the standard Restore process. An asymptotic analysis for large dimensions, justifying this strategy, is presented in Supplement D (McKimm et al., 2024). The regeneration distribution is initially a fixed parametric distribution µ_0; then, as the process is generated, point masses are added, so that µ_t is a mixture of a parametric distribution and point masses (see the sketch below). Throughout simulation, the minimal regeneration rate is used. This rate is defined implicitly by a particular constant C_+, which need not be known explicitly. Thus adaptive Restore obviates the need to find a suitably large constant.
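To illustrate only the mixture structure of µ_t, the following minimal sketch (ours, not the paper's implementation) maintains a regeneration distribution made up of a fixed parametric component µ_0 plus accumulated point masses. The rule for choosing which points to add, and the mixture weights, are specified in Section 3; the weight w0 and the atom locations below are purely hypothetical placeholders.

```python
import numpy as np

# Sketch of a regeneration distribution mu_t that mixes a parametric part
# mu_0 = N(0, scale^2 I) with the empirical measure of atoms added so far.
# The mixture weight w0 and the atoms themselves are assumptions for illustration.
rng = np.random.default_rng(0)

class AdaptiveRegenDist:
    def __init__(self, dim, w0=0.1, scale=2.0):
        self.dim, self.w0, self.scale = dim, w0, scale
        self.atoms = []                      # point masses added during simulation

    def add_atom(self, x):
        """Record a point mass at state x."""
        self.atoms.append(np.asarray(x, dtype=float))

    def sample(self):
        """Draw a regeneration state from the current mixture mu_t."""
        if not self.atoms or rng.uniform() < self.w0:
            return self.scale * rng.standard_normal(self.dim)   # parametric part mu_0
        return self.atoms[rng.integers(len(self.atoms))]        # uniform over atoms

mu_t = AdaptiveRegenDist(dim=2)
mu_t.add_atom([0.3, -1.2])                   # hypothetical atoms for illustration
mu_t.add_atom([1.1, 0.4])
print(mu_t.sample())
```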
Adaptive Restore differs from adaptive MCMC methods (Andrieu and Thoms, 2008; Roberts and Rosenthal, 2009; Haario et al., 2001), since the latter adapt the Markov transition kernel used in generating the Markov chain, whilst the former adapts the regeneration distribution.
Besides the methodological contributions of this work, from a theoretical perspective this paper presents a novel application of the stochastic approximation technique to establishing convergence of self-reinforcing processes, as previously utilised in, for example, Aldous, Flannery and Palacios (1988), Benaïm, Cloez and Panloup (2018) and Mailler and Villemonais (2020). In particular, we adapt the proof technique of Benaïm, Cloez and Panloup (2018), which applies to discrete-time Markov chains on a compact state space, to deduce the validity of our adaptive Restore process, which is a continuous-time Markov process on a noncompact state space. This is achieved by identifying a natural embedded discrete-time Markov chain, taking values on a compact subset, whose convergence implies convergence of the overall process. A theoretical summary and comparison are provided in Section 4.1.
A secondary contribution of this article is showing that it is possible to use a standard Restore process to estimate the normalizing constant of an unnormalized density.
The rest of the article is arranged as follows. Section 2 reviews standard Restore. Next, Section 3 introduces the adaptive Restore process and its use as a sampler. Section 4 is a self-contained section on the theory of adaptive Restore, where we prove its validity; see Section 4.1 for a summary of our theoretical contributions. Examples are then provided in Section 5 and Section 6 concludes.
2. The Restore process
This section describes the standard Restore process, as introduced in Wang et al.
(2021). We define the process, explain how it may be used to estimate normal-
izing constants, introduce the concept of minimal regeneration, and present the
case where the underlying process is Brownian motion.
2.1. Regeneration-enriched Markov processes
The Restore process is defined as follows. Let {Y_t}_{t≥0} be a diffusion or jump process on R^d. The regeneration rate κ : R^d → [0, ∞), which we will define shortly, is locally bounded and measurable. Define the tour length as
\[
\tau = \inf\Big\{ t \ge 0 : \int_0^t \kappa(Y_s)\,\mathrm{d}s \ge \xi \Big\}, \tag{1}
\]
for ξ ∼ Exp(1) independent of {Y_t}_{t≥0}. Let µ be some fixed distribution and ({Y_t^{(i)}}_{t≥0}, τ^{(i)}), i = 0, 1, ..., be i.i.d. realisations of ({Y_t}_{t≥0}, τ) with Y_0 ∼ µ. The regeneration times are T_0 = 0 and T_j = Σ_{i=0}^{j-1} τ^{(i)} for j = 1, 2, .... Then the Restore process {X_t}_{t≥0} is given by
\[
X_t = \sum_{i=0}^{\infty} \mathbb{1}_{[T_i, T_{i+1})}(t)\, Y^{(i)}_{t - T_i}.
\]
Let L_Y be the infinitesimal generator of {Y_t}_{t≥0}. Then the (formal) infinitesimal generator of {X_t}_{t≥0} is
\[
L_X f(x) = L_Y f(x) + \kappa(x) \int \big[ f(y) - f(x) \big]\, \mu(y)\,\mathrm{d}y.
\]
To use the Restore process for Monte Carlo integration, one chooses κ so that {X_t}_{t≥0} is π-invariant. Defining κ as
\[
\kappa(x) = \frac{L_Y^{*}\pi(x)}{\pi(x)} + C\,\frac{\mu(x)}{\pi(x)}, \tag{2}
\]
with L_Y^{*} denoting the formal adjoint of L_Y, it can be shown that ∫_{R^d} L_X f(x) π(x) dx = 0. Hence, {X_t}_{t≥0} is π-invariant. We will write equation (2) as
\[
\kappa(x) = \tilde{\kappa}(x) + C\,\frac{\mu(x)}{\pi(x)}. \tag{3}
\]
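As a concrete illustration of (2) and (3) (a sketch of ours, not code from the paper), the snippet below evaluates the partial rate κ̃ and the full rate κ in one dimension when the underlying process is standard Brownian motion, for which κ̃ = ½((log π)'' + ((log π)')²); the Brownian motion case is discussed later in this section. The Gaussian choices of π and µ, and the grid-based search for the smallest workable C, are illustrative assumptions only.

```python
import numpy as np

# Sketch of equations (2)-(3) in one dimension with Brownian motion as the
# underlying process, for which kappa_tilde = 0.5 * ((log pi)'' + ((log pi)')^2).
# Target pi = N(0, 1); regeneration distribution mu = N(0, 2^2); C is a guess.
grad_log_pi = lambda x: -x                 # (log pi)'(x) for N(0, 1)
lap_log_pi = lambda x: -np.ones_like(x)    # (log pi)''(x)
pi_pdf = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
mu_pdf = lambda x: np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))

def kappa_tilde(x):
    return 0.5 * (lap_log_pi(x) + grad_log_pi(x) ** 2)

def kappa(x, C):
    return kappa_tilde(x) + C * mu_pdf(x) / pi_pdf(x)

# On a grid, the smallest C keeping kappa nonnegative is the largest value of
# -kappa_tilde * pi / mu; a grid check of this kind gives no global guarantee.
x = np.linspace(-6, 6, 2001)
C_min_grid = np.max(-kappa_tilde(x) * pi_pdf(x) / mu_pdf(x))
print(C_min_grid, kappa(x, C_min_grid).min())
```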
We call κ̃ the partial regeneration rate, µ the regeneration distribution, C > 0 the regeneration constant, and Cµ the regeneration measure, which must be large enough so that κ(x) > 0 for all x ∈ R^d. The resulting Monte Carlo method is called the Restore sampler. Given π-invariance of {X_t}_{t≥0}, due to the regenerative structure of the process, we have
\[
\pi[f] = \frac{\mathbb{E}_{X_0 \sim \mu}\Big[ \int_0^{\tau^{(0)}} f(X_s)\,\mathrm{d}s \Big]}{\mathbb{E}_{X_0 \sim \mu}\big[ \tau^{(0)} \big]}
\]
and almost sure convergence of the ergodic averages: as t → ∞,
\[
\frac{1}{t} \int_0^t f(X_s)\,\mathrm{d}s \to \pi[f]. \tag{4}
\]
For i = 0, 1, ..., define Z_i := ∫_{T_i}^{T_{i+1}} f(X_s) ds. The Central Limit Theorem for Restore processes states that
\[
\sqrt{n}\left( \frac{\int_0^{T_n} f(X_s)\,\mathrm{d}s}{T_n} - \pi[f] \right) \to \mathcal{N}\big(0, \sigma_f^2\big),
\]
where convergence is in distribution and
\[
\sigma_f^2 := \frac{\mathbb{E}_{X_0 \sim \mu}\Big[ \big( Z_0 - \tau^{(0)}\, \pi[f] \big)^2 \Big]}{\mathbb{E}_{X_0 \sim \mu}\big[ \tau^{(0)} \big]^2}. \tag{5}
\]
Evidently the estimator’s variance depends on the expected tour length. This
is one motivation for choosing µ so that tours are on average reasonably long.
Indeed, this is the key motivation behind the minimal regeneration measure
described in Section 2.3.
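The following self-contained sketch (ours, not the paper's implementation) simulates Restore tours with standard Brownian motion as the underlying process, discretized by Euler-Maruyama, for a one-dimensional standard-normal target, and then forms the ratio estimator of π[f] from (4) together with a plug-in estimate of the asymptotic variance (5). The Gaussian µ, the constant C, the step size and the Brownian-motion formula for κ̃ are assumptions of the illustration, not prescriptions from the paper.

```python
import numpy as np

# Illustrative Restore sampler: Brownian motion enriched with regenerations,
# target pi = N(0, 1), regeneration distribution mu = N(0, 2^2), f(x) = x.
# For standard Brownian motion, kappa_tilde = 0.5 * ((log pi)'' + ((log pi)')^2).
rng = np.random.default_rng(2)

pi_pdf = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
mu_pdf = lambda x: np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))
mu_sample = lambda: 2.0 * rng.standard_normal()
C = 1.5                                     # assumed large enough that kappa >= 0

def kappa(x):
    kappa_tilde = 0.5 * (-1.0 + x**2)       # partial rate for pi = N(0, 1)
    return max(kappa_tilde + C * mu_pdf(x) / pi_pdf(x), 0.0)

f = lambda x: x                             # estimate pi[f] = E_pi[X] = 0
dt, n_tours = 1e-3, 500
Z, tau = np.zeros(n_tours), np.zeros(n_tours)   # per-tour f-integrals and lengths

for i in range(n_tours):
    x, clock, xi = mu_sample(), 0.0, rng.exponential()   # Exp(1) clock, cf. (1)
    while clock < xi:                       # run the tour until the clock rings
        Z[i] += f(x) * dt
        tau[i] += dt
        clock += kappa(x) * dt              # accumulate the integrated rate
        x += np.sqrt(dt) * rng.standard_normal()          # Brownian increment

est = Z.sum() / tau.sum()                   # ratio estimator of pi[f], cf. (4)
var_hat = np.mean((Z - tau * est) ** 2) / np.mean(tau) ** 2   # plug-in for (5)
print(est, np.sqrt(var_hat / n_tours))      # estimate and a rough standard error
```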
2.2. Estimating the normalizing constant
It is possible to use a Restore process to estimate the normalizing constant of an unnormalized target distribution. When the target distribution is the posterior of some statistical model, its normalizing constant is the marginal likelihood, also known as the evidence. Computing the evidence allows for model comparison via Bayes factors (Kass and Raftery, 1995). Standard MCMC methods draw dependent samples from π but cannot be used to calculate the evidence. Alternative methods such as importance sampling, thermodynamic integration (Neal, 1993; Ogata, 1989), Sequential Monte Carlo (Del Moral, Doucet and Jasra, 2006) or nested sampling (Skilling, 2006) must instead be used for computing the evidence (Gelman and Meng, 1998).
For Z the normalizing constant, let
\[
\pi(x) = \frac{\tilde{\pi}(x)}{Z}.
\]