Adaptive Tuning for Metropolis Adjusted Langevin Trajectories
Lionel Riou-Durand Pavel Sountsov Jure Vogrinc
University of Warwick Google Research University of Warwick
Charles C. Margossian Sam Power
Flatiron Institute University of Bristol
Abstract
Hamiltonian Monte Carlo (HMC) is a widely used sampler for continuous probability distributions. In many cases, the underlying Hamiltonian dynamics exhibit a phenomenon of resonance which decreases the efficiency of the algorithm and makes it very sensitive to hyperparameter values. This issue can be tackled efficiently, either via the use of trajectory length randomization (RHMC) or via partial momentum refreshment. The second approach is connected to the kinetic Langevin diffusion, and has been mostly investigated through the use of Generalized HMC (GHMC). However, GHMC induces momentum flips upon rejections, causing the sampler to backtrack and waste computational resources. In this work we focus on a recent algorithm bypassing this issue, named Metropolis Adjusted Langevin Trajectories (MALT). We build upon recent strategies for tuning the hyperparameters of RHMC which target a bound on the Effective Sample Size (ESS) and adapt it to MALT, thereby enabling the first user-friendly deployment of this algorithm. We construct a method to optimize a sharper bound on the ESS and reduce the estimator variance. Easily compatible with parallel implementation, the resultant Adaptive MALT algorithm is competitive in terms of ESS rate and hits useful tradeoffs in memory usage when compared to GHMC, RHMC and NUTS.
Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023, TBD, USA. PMLR: Volume XXX. Copyright 2023 by the author(s).
1 INTRODUCTION
Hamiltonian Monte Carlo (HMC; Duane et al. (1987); Neal (2011)) is a widely-known MCMC sampler for distributions with differentiable densities. The use of multiple gradient-informed steps per proposal suppresses random walk behavior and makes HMC particularly powerful for high-dimensional numerical integration. Its implementation in probabilistic-programming languages (e.g. Carpenter et al., 2017; Bingham et al., 2018; Salvatier et al., 2016; Lao et al., 2020) leverages automatic differentiation and adaptive tuning with the no-U-turn sampler (NUTS; Hoffman and Gelman (2014)), which require minimal input from the practitioner.
Robust implementations of HMC rely on randomizing the length of Hamiltonian trajectories (Bou-Rabee and Sanz-Serna, 2017, RHMC) to avoid resonances. NUTS is well established and remains wildly popular, but recent studies suggest that optimally tuned RHMC samplers can perform better. Furthermore, NUTS' recursive, control-flow-heavy construction complicates its implementation when running many parallel chains (Radul et al., 2020; Lao and Dillon, 2019; Phan and Pradhan, 2019). To take better advantage of modern computing hardware such as GPUs, several schemes for tuning RHMC have been proposed (Hoffman et al., 2021; Sountsov and Hoffman, 2021), showing significant gains of efficiency compared to NUTS.
A second approach to mitigate resonances is based on the kinetic Langevin diffusion. Built upon its discretization, Generalized HMC (Horowitz, 1991, GHMC) alternates single adjusted steps with partial momentum refreshments. However, the original GHMC algorithm had not been proven competitive, its efficiency being restrained by the momentum flips induced upon rejections. To reduce the backtracking caused by these flips, Neal (2020) proposed a slice-sampling approach to cluster the event times of the rejections of a proposal. The efficiency of GHMC was significantly improved by this method, which was later turned into a tuning-free algorithm (Hoffman and Sountsov, 2022).
In this paper, we focus on a new numerical scheme for approximating the kinetic Langevin diffusion, recently introduced as Metropolis Adjusted Langevin Trajectories (Riou-Durand and Vogrinc, 2022, MALT). By construction, MALT fully erases the momentum flips and shares several desirable properties with RHMC. Compared to GHMC, some qualitative arguments in favour of MALT were made, but no quantitative comparison had been conducted. One objective of this work is to fill this gap.
Our general goal is to develop a tuning-free version of the MALT algorithm, compatible with running many parallel chains. In Section 3 we develop tuning heuristics and algorithmic solutions for each parameter. Built upon these principles, we present in Section 4 an adaptive sampler, applicable to multiple-chain as well as single-chain settings. In Section 5 we compare its efficiency numerically to several adaptive samplers (GHMC, RHMC, NUTS) on a set of benchmark models.
To summarize our contributions:
• We develop tuning heuristics for each of MALT's hyperparameters, building upon recent tuning strategies for RHMC (Hoffman et al., 2021; Sountsov and Hoffman, 2021) which target a bound on the Effective Sample Size (ESS).
• We propose a method to optimize a sharper bound on the ESS, for which we derive a gradient estimate with reduced variance. This variance reduction method is applicable to both MALT and RHMC and allows for speeding up the adaptation of the trajectory length.
• We construct an Adaptive MALT algorithm, easily compatible with, but not reliant on, running multiple chains in parallel.
• We show that Adaptive MALT is competitive in terms of ESS rate and hits useful tradeoffs in memory usage when compared to previous adaptive samplers (GHMC, RHMC, NUTS).
2 BACKGROUND
This section reviews the use of Hamiltonian dynamics to construct MCMC samplers targeting a differentiable density $\Pi(x) \propto \exp(-\Phi(x))$, for $x \in \mathbb{R}^d$. We introduce the notion of partial momentum refreshment and present its connection to the kinetic Langevin diffusion, thereby motivating GHMC. Unfortunately, the utility of GHMC is hindered by momentum flips, required to preserve detailed balance but statistically inefficient; we explain how MALT bypasses this issue. Finally, we review existing tuning strategies for HMC and GHMC samplers.
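For concreteness, the illustrative sketches in this section share a minimal Python setup: a standard Gaussian target with identity mass matrix, a hypothetical test case. All names (phi, grad_phi, M_chol, M_inv) are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Hypothetical running example: a standard Gaussian target
# Pi(x) ∝ exp(-Phi(x)) with potential Phi(x) = |x|^2 / 2 in d = 2.
d = 2
phi = lambda x: 0.5 * float(x @ x)   # potential Phi
grad_phi = lambda x: x               # gradient of Phi
M_chol = np.eye(d)                   # Cholesky factor of the mass matrix M = I
M_inv = np.eye(d)                    # inverse mass matrix M^{-1}
rng = np.random.default_rng(0)
```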
2.1 Hamiltonian Dynamics
To describe the motion of a particle $x \in \mathbb{R}^d$ and its velocity $v \in \mathbb{R}^d$ (a.k.a. momentum), Hamiltonian dynamics are defined for a positive definite mass matrix $M \in \mathbb{R}^{d \times d}$ as
$$\mathrm{d}x_t = M^{-1} v_t \,\mathrm{d}t, \qquad \mathrm{d}v_t = -\nabla\Phi(x_t)\,\mathrm{d}t. \tag{1}$$
By construction, Hamiltonian dynamics preserve the density $\Pi(x, v) \propto \exp(-\Phi(x) - v^{\top} M^{-1} v/2)$. In general, the ODE (1) does not have a closed form solution. Instead, the leapfrog method is commonly used to integrate this ODE numerically. It is defined for a time-step $h > 0$ as the following update $(x_1, v_1) = \mathrm{LEAPFROG}(x_0, v_0; M, h)$ such that
$$v_{1/2} = v_0 - (h/2)\nabla\Phi(x_0), \qquad x_1 = x_0 + h M^{-1} v_{1/2}, \qquad v_1 = v_{1/2} - (h/2)\nabla\Phi(x_1). \tag{2}$$
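Update (2) translates directly into code; a minimal sketch, reusing the Gaussian setup above:

```python
def leapfrog(x0, v0, grad_phi, M_inv, h):
    """One leapfrog step, i.e. update (2): half kick, drift, half kick."""
    v_half = v0 - 0.5 * h * grad_phi(x0)   # half step on the velocity
    x1 = x0 + h * (M_inv @ v_half)         # full step on the position
    v1 = v_half - 0.5 * h * grad_phi(x1)   # half step on the velocity
    return x1, v1
```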
The leapfrog integrator is time-reversible (Neal, 2011), i.e. $(x_0, -v_0) = \mathrm{LEAPFROG}(x_1, -v_1; M, h)$. This means that going backwards in time is equivalent to flipping the momenta. HMC is then built upon the following recursion. From a position $x_0 \in \mathbb{R}^d$ and a velocity $v_0 \sim \mathcal{N}_d(0_d, M)$, a Hamiltonian trajectory of length $\tau > 0$ is proposed by composing the leapfrog update for $L = \lceil \tau/h \rceil$ steps. The resultant proposal $(x_L, v_L)$ is faced with an accept-reject test (a.k.a. Metropolis correction) to compensate for the numerical error when solving (1). We note $|v|^2_{M^{-1}} = v^{\top} M^{-1} v$. With probability $1 - \exp(-\Delta^+)$, where $\Delta^+ = \max(0, \Delta)$ and
$$\Delta = \Phi(x_L) - \Phi(x_0) + \left(|v_L|^2_{M^{-1}} - |v_0|^2_{M^{-1}}\right)/2,$$
the proposal is rejected and its momentum is flipped, i.e. the output is $(x_0, -v_0)$. Flipping the momentum upon rejection is a technical requirement to ensure that the Metropolis correction preserves $\Pi$. Overall, the momentum flips are erased since $v_0$ is fully refreshed at each iteration.
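Putting the pieces together, one HMC iteration might be sketched as follows (an illustration under the setup above, not the authors' implementation). The rejection draw uses the fact that $P(Z < \Delta) = 1 - e^{-\Delta^+}$ for $Z \sim \mathrm{Exp}(1)$:

```python
def hmc_step(X, phi, grad_phi, M_chol, M_inv, h, L, rng):
    """One HMC iteration with Metropolis correction (illustrative sketch)."""
    v0 = M_chol @ rng.standard_normal(X.shape[0])  # v_0 ~ N_d(0_d, M)
    x, v = X, v0
    for _ in range(L):                             # compose L leapfrog steps
        x, v = leapfrog(x, v, grad_phi, M_inv, h)
    delta = phi(x) - phi(X) + 0.5 * (v @ M_inv @ v - v0 @ M_inv @ v0)
    if rng.exponential() < delta:                  # reject w.p. 1 - exp(-Delta^+)
        return X, -v0                              # momentum flip upon rejection
    return x, v                                    # accept the proposal (x_L, v_L)
```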
HMC prevents random walk behaviour by proposing trajectories with persistent momentum, thereby exploring the sampling space more efficiently. However, HMC exhibits a phenomenon of resonance when the target $\Pi$ has heterogeneous scales. Typically, fast mixing for one component can result in arbitrarily slow mixing for other components (Riou-Durand and Vogrinc, 2022). This issue makes the efficiency of HMC highly sensitive to the choice of its tuning parameters.
One solution consists in drawing $\tau$ at random at each iteration (RHMC), thereby smoothing the auto-correlations between successive Hamiltonian trajectories (Bou-Rabee and Sanz-Serna, 2017). The choice of the sampling distribution for $\tau$ is further discussed in Section 5. In this work, we consider another approach based on partial momentum refreshment.
2.2 Partial Momentum Refreshment
The resonances of Hamiltonian trajectories can be reduced by partially refreshing the velocity along their path. Kinetic Langevin dynamics leverage this property by augmenting (1) with a continuous momentum refreshment induced by a Brownian motion $(B_t)_{t \ge 0}$ on $\mathbb{R}^d$. These are defined for a damping parameter $\gamma \ge 0$ (a.k.a. friction) as
$$\mathrm{d}x_t = M^{-1} v_t \,\mathrm{d}t, \qquad \mathrm{d}v_t = -\left(\nabla\Phi(x_t) + \gamma v_t\right)\mathrm{d}t + \sqrt{2\gamma}\, M^{1/2}\,\mathrm{d}B_t. \tag{3}$$
The damping controls the refreshment speed. As $\gamma \to 0$, the SDE (3) reduces to the ODE (1) and exhibits resonances. These resonances vanish as $\gamma$ gets larger. Increasing $\gamma$ too much, however, decreases the persistence of the momentum and encourages random walk behavior. This tradeoff can be solved for a critical choice of damping. For log-concave distributions, quantitative mixing rates have been established by Eberle et al. (2019); Dalalyan and Riou-Durand (2020); Cao et al. (2019). Choosing a mass $M$ in (3) is equivalent to preconditioning the augmented space with the linear map $(x, v) \mapsto (M^{1/2}x, M^{-1/2}v)$.
Langevin trajectories can be approximated numerically by alternating between the leapfrog update (2) and a partial momentum refreshment, defined for $\eta = e^{-\gamma h}$ as
$$v' = \eta\, v + \sqrt{1 - \eta^2}\,\xi, \qquad \xi \sim \mathcal{N}_d(0_d, M).$$
This partial momentum refreshment leaves $\Pi$ invariant.
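A sketch of this refreshment step, under the same assumed setup:

```python
def partial_refresh(v, eta, M_chol, rng):
    """Partial momentum refreshment with eta = exp(-gamma * h)."""
    xi = M_chol @ rng.standard_normal(v.shape[0])  # xi ~ N_d(0_d, M)
    return eta * v + np.sqrt(1.0 - eta**2) * xi    # preserves N_d(0_d, M)
```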
To compensate for the numerical error, one strategy consists in applying the Metropolis correction to each leapfrog step. GHMC is built upon this principle (Horowitz, 1991), thereby inducing a momentum flip at every rejected leapfrog step; see Section 2.1. Unlike HMC, momentum flips are not erased with GHMC, which causes the sampler to backtrack and limits its efficiency (Kennedy and Pendleton, 2001). This also exacerbates the problem of tuning $h$, which must be small enough to prevent momentum flips from becoming overwhelming (Bou-Rabee and Vanden-Eijnden, 2010).
Several solutions have been proposed to improve GHMC's efficiency. Delayed rejection methods have been used to reduce the number of momentum flips at a higher computational cost (Mira, 2001; Campos and Sanz-Serna, 2015; Park and Atchadé, 2020). We highlight a recent approach based on slice sampling (Neal, 2020), which correlates the accept-reject coin tosses over time and encourages larger periods without momentum flips.
2.3 Metropolis Adjusted Langevin Trajectories
In this work, we focus on a sampler recently proposed by Riou-Durand and Vogrinc (2022) to bypass the problem caused by momentum flips. This method consists in applying the Metropolis correction to an entire numerical Langevin trajectory, rather than applying it to each of its $L = \lceil \tau/h \rceil$ leapfrog steps. The resultant sampler, named Metropolis Adjusted Langevin Trajectories (MALT), is defined by iterating Algorithm 1.
Algorithm 1 Metropolis Adjusted Langevin Trajectory
1: function MALT($X$; $M, \gamma, h, \tau$)
2: Set $L \leftarrow \lceil \tau/h \rceil$ and $\eta \leftarrow e^{-\gamma h}$
3: Draw independent $(\xi_i)_{i=0,\dots,L} \sim \mathcal{N}_d(0_d, M)$
4: Set $(x_0, v_0) \leftarrow (X, \xi_0)$ and $\Delta \leftarrow 0$
5: for $i = 1$ to $L$ do
6:   Refresh $v'_{i-1} \leftarrow \eta\, v_{i-1} + \sqrt{1 - \eta^2}\,\xi_i$
7:   Set $(x_i, v_i) \leftarrow \mathrm{LEAPFROG}(x_{i-1}, v'_{i-1}; M, h)$
8:   Update $\Delta \leftarrow \Delta + \left(|v_i|^2_{M^{-1}} - |v'_{i-1}|^2_{M^{-1}}\right)/2$
9: end for
10: Set $X' \leftarrow x_L$ and $\Delta \leftarrow \Delta + \Phi(x_L) - \Phi(x_0)$
11: Draw $Z \sim \mathrm{Exp}(1)$
12: if $Z < \Delta$ then set $X' \leftarrow X$
13: end if
14: return $X'$, $v_L$, $x_0$, $v'_0$, $\Delta$
15: end function
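A minimal Python sketch of Algorithm 1, reusing the leapfrog and partial_refresh helpers above and returning only the new state for brevity (an illustration, not the authors' implementation):

```python
def malt(X, phi, grad_phi, M_chol, M_inv, gamma, h, tau, rng):
    """One MALT iteration, following Algorithm 1 (illustrative sketch)."""
    L = int(np.ceil(tau / h))
    eta = np.exp(-gamma * h)
    sq = lambda v: float(v @ M_inv @ v)                 # |v|^2_{M^{-1}}

    x, v = X, M_chol @ rng.standard_normal(X.shape[0])  # full refresh: v_0 ~ N_d(0_d, M)
    delta = 0.0
    for _ in range(L):
        v_ref = partial_refresh(v, eta, M_chol, rng)    # partial momentum refreshment
        x, v = leapfrog(x, v_ref, grad_phi, M_inv, h)   # one leapfrog step
        delta += 0.5 * (sq(v) - sq(v_ref))              # kinetic part of the energy error
    delta += phi(x) - phi(X)                            # potential part of the energy error
    if rng.exponential() < delta:                       # reject w.p. 1 - exp(-Delta^+)
        return X
    return x                                            # accept: X' = x_L
```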
In this algorithm, the overall numerical error unfolds as $\Delta = \sum_{i=1}^{L} \delta_i$, where
$$\delta_i = \Phi(x_i) - \Phi(x_{i-1}) + \left(|v_i|^2_{M^{-1}} - |v'_{i-1}|^2_{M^{-1}}\right)/2$$
is the energy difference incurred by the $i$th leapfrog step. As a consequence, the probability of accepting an entire trajectory for MALT is higher than the probability of accepting $L$ consecutive leapfrog steps for GHMC. Indeed:
$$P^{\mathrm{accept}}_{\mathrm{GHMC}} = e^{-\sum_{i=1}^{L} \delta_i^+} \le e^{-\left(\sum_{i=1}^{L} \delta_i\right)^+} = P^{\mathrm{accept}}_{\mathrm{MALT}}.$$
Detailed balance and ergodicity of Algorithm 1 are proven in Riou-Durand and Vogrinc (2022). Notably, the momentum flips get erased by fully refreshing $v_0 \sim \mathcal{N}_d(0_d, M)$ at the start of each trajectory. This ensures reversibility of the sampler with respect to $\Pi$, and makes HMC a special case of MALT, obtained for $\gamma = 0$. As $\gamma$ gets larger, Langevin trajectories become ergodic, thereby preventing U-turns and resonances. Compared to GHMC, Algorithm 1 also exhibits an advantage for tuning the step-size, which is supported by optimal scaling results derived in Riou-Durand and Vogrinc (2022). The tuning of MALT's parameters is further discussed in Section 3.
2.4 Adaptive Tuning Strategies
To measure the efficiency of an MCMC sampler, we consider the problem of estimating $\mathbb{E}[\varphi(x)]$ as $x \sim \Pi$ for a square integrable function $\varphi : \mathbb{R}^d \to \mathbb{R}$. The Monte Carlo estimator, $\hat{\varphi}_N = N^{-1}\sum_{n=1}^{N} \varphi(X_n)$, estimates the desired expectation value, with $N$ the number of approximate samples. A scale-free measurement of $\hat{\varphi}_N$'s precision is given by the Effective Sample Size (ESS).