
L. Riou-Durand, P. Sountsov, J. Vogrinc, C.C. Margossian, S. Power
2.2 Partial Momentum Refreshment
The resonances of Hamiltonian trajectories can be reduced
by partially refreshing the velocity along their path. Kinetic
Langevin dynamics leverage this property by augmenting
(1) with a continuous momentum refreshment induced by
a Brownian motion (B_t)_{t≥0} on R^d. These dynamics are
defined, for a damping parameter γ ≥ 0 (a.k.a. friction), as

dx_t = M^{-1} v_t dt,
dv_t = −(∇Φ(x_t) + γ v_t) dt + √(2γ) M^{1/2} dB_t.   (3)
The damping controls the refreshment speed. As γ → 0,
the SDE (3) reduces to the ODE (1) and exhibits resonances.
These resonances vanish as γ gets larger. Increasing γ too
much, however, decreases the persistence of the momentum
and encourages random-walk behavior. This tradeoff can be
resolved by a critical choice of damping. For log-concave
distributions, quantitative mixing rates have been established
by Eberle et al. (2019); Dalalyan and Riou-Durand (2020);
Cao et al. (2019). Choosing a mass M in (3) is equivalent to
preconditioning the augmented space with the linear map
(x, v) ↦ (M^{1/2} x, M^{-1/2} v).
Langevin trajectories can be approximated numerically by
alternating between the leapfrog update (2) and a partial
momentum refreshment, defined for η = e^{−γh} as

v' = η v + √(1 − η²) ξ,   ξ ∼ N_d(0_d, M).

This partial momentum refreshment leaves Π^* invariant.
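As an illustration, this refreshment step can be sketched in NumPy. The function name `partial_refresh` and its interface are our own, not from the paper; `M_sqrt` denotes any matrix square root of the mass matrix M, so that `M_sqrt @ z` with standard normal z is distributed as N_d(0_d, M):

```python
import numpy as np

def partial_refresh(v, gamma, h, M_sqrt, rng):
    """Partial momentum refreshment: v' = eta * v + sqrt(1 - eta^2) * xi,
    with eta = exp(-gamma * h) and xi ~ N_d(0_d, M)."""
    eta = np.exp(-gamma * h)
    # xi ~ N(0, M) since Cov(M_sqrt z) = M_sqrt M_sqrt^T = M
    xi = M_sqrt @ rng.standard_normal(v.shape)
    return eta * v + np.sqrt(1.0 - eta**2) * xi
```

With γ = 0 (η = 1) the momentum is kept intact, recovering the deterministic regime between full refreshments; as γ grows, η → 0 and the momentum is drawn afresh. In both extremes, and in between, the update preserves N_d(0_d, M) as the marginal law of v.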
To compensate for the numerical error, one strategy consists
of applying the Metropolis correction to each leapfrog step.
GHMC is built upon this principle (Horowitz, 1991), thereby
inducing a momentum flip at every rejected leapfrog step;
see Section 2.1. Unlike HMC, momentum flips are not
erased with GHMC, which causes the sampler to backtrack
and limits its efficiency (Kennedy and Pendleton, 2001).
This also exacerbates the problem of tuning h, which must
be small enough to prevent momentum flips from becoming
overwhelming (Bou-Rabee and Vanden-Eijnden, 2010).
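To make the momentum-flip mechanism concrete, here is a minimal sketch of one GHMC step on a target with potential Φ, assuming an identity mass matrix; the helper names `leapfrog` and `ghmc_step` are ours, not from the paper:

```python
import numpy as np

def leapfrog(x, v, grad, h):
    # One leapfrog step for H(x, v) = Phi(x) + |v|^2 / 2 (identity mass)
    v = v - 0.5 * h * grad(x)
    x = x + h * v
    v = v - 0.5 * h * grad(x)
    return x, v

def ghmc_step(x, v, phi, grad, gamma, h, rng):
    """One GHMC step: partial refresh, a single Metropolis-corrected
    leapfrog step, and a momentum flip on rejection."""
    eta = np.exp(-gamma * h)
    v = eta * v + np.sqrt(1.0 - eta**2) * rng.standard_normal(v.shape)
    x_new, v_new = leapfrog(x, v, grad, h)
    # Energy difference of this leapfrog step
    delta = phi(x_new) - phi(x) + 0.5 * (v_new @ v_new - v @ v)
    # Accept with probability min(1, e^{-delta}): reject iff Z < delta, Z ~ Exp(1)
    if rng.exponential() < delta:
        return x, -v          # rejected: keep position, flip momentum
    return x_new, v_new       # accepted
```

On rejection the position is kept but the momentum is negated, so the chain initially backtracks along its previous path; this is precisely the behavior that limits GHMC's efficiency.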
Several solutions have been proposed to improve GHMC's
efficiency. Delayed rejection methods have been used to
reduce the number of momentum flips, at a higher compu-
tational cost (Mira, 2001; Campos and Sanz-Serna, 2015;
Park and Atchadé, 2020). We highlight a recent approach
based on slice sampling (Neal, 2020), which correlates the
accept-reject coin tosses over time and encourages longer
periods without momentum flips.
2.3 Metropolis Adjusted Langevin Trajectories
In this work, we focus on a sampler recently proposed
by Riou-Durand and Vogrinc (2022) to bypass the prob-
lem caused by momentum flips. This method consists of
applying the Metropolis correction to an entire numerical
Langevin trajectory, rather than to each of its L = ⌈τ/h⌉
leapfrog steps. The resultant sampler, named Metropolis
Adjusted Langevin Trajectories (MALT), is defined by
iterating Algorithm 1.
Algorithm 1 Metropolis Adjusted Langevin Trajectory
1: function MALT(X; M, γ, h, τ)
2:   Set L ← ⌈τ/h⌉ and η ← e^{−γh}
3:   Draw independent (ξ_i)_{i=0,…,L} ∼ N_d(0_d, M)
4:   Set (x_0, v_0) ← (X, ξ_0) and ∆ ← 0
5:   for i = 1 to L do
6:     Refresh v'_{i−1} ← η v_{i−1} + √(1 − η²) ξ_i
7:     Set (x_i, v_i) ← LEAPFROG(x_{i−1}, v'_{i−1}; M, h)
8:     Update ∆ ← ∆ + (|v_i|²_{M^{-1}} − |v'_{i−1}|²_{M^{-1}})/2
9:   end for
10:  Set X' ← x_L and ∆ ← ∆ + Φ(x_L) − Φ(x_0)
11:  Draw Z ∼ Exp(1)
12:  if Z < ∆ then set X' ← X
13:  end if
14:  return X', v_L, x_0, v'_0, ∆
15: end function
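A direct NumPy transcription of Algorithm 1, simplified to an identity mass matrix M = I_d and returning only the next state (the function names and this simplification are ours; a fuller version would also return the diagnostics listed on line 14 of the algorithm):

```python
import numpy as np

def leapfrog(x, v, grad, h):
    # One leapfrog step for H(x, v) = Phi(x) + |v|^2 / 2 (identity mass)
    v = v - 0.5 * h * grad(x)
    x = x + h * v
    v = v - 0.5 * h * grad(x)
    return x, v

def malt(X, phi, grad, gamma, h, tau, rng):
    """One MALT iteration (Algorithm 1, sketched with M = I_d)."""
    L = int(np.ceil(tau / h))
    eta = np.exp(-gamma * h)
    x, v = X, rng.standard_normal(X.shape)  # full refresh v_0 ~ N(0, I_d)
    x0, delta = x, 0.0
    for _ in range(L):
        # Partial refresh, leapfrog step, and kinetic-energy increment
        vp = eta * v + np.sqrt(1.0 - eta**2) * rng.standard_normal(X.shape)
        x, v = leapfrog(x, vp, grad, h)
        delta += 0.5 * (v @ v - vp @ vp)
    delta += phi(x) - phi(x0)
    # Accept the whole trajectory with probability min(1, e^{-delta})
    if rng.exponential() < delta:
        return X    # rejected: the momentum flip is erased at the next
                    # iteration by the full refreshment of v_0
    return x
```

Note that the Metropolis correction is applied once per trajectory, not once per leapfrog step, and that a rejection only resets the position: the fresh draw of v_0 at the next call is what erases the momentum flip.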
In this algorithm, the overall numerical error unfolds as
∆ = Σ_{i=1}^L δ_i, where

δ_i = Φ(x_i) − Φ(x_{i−1}) + (|v_i|²_{M^{-1}} − |v'_{i−1}|²_{M^{-1}})/2

is the energy difference incurred by the i-th leapfrog step.
As a consequence, the probability of accepting an entire
trajectory for MALT is higher than the probability of ac-
cepting L consecutive leapfrog steps for GHMC. Indeed:

P_GHMC^accept = e^{−Σ_{i=1}^L δ_i⁺} ≤ e^{−(Σ_{i=1}^L δ_i)⁺} = P_MALT^accept.
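The inequality holds because Σ_i δ_i⁺ ≥ (Σ_i δ_i)⁺, whatever the signs of the per-step errors: positive and negative energy errors cancel within a MALT trajectory, whereas GHMC pays for every positive one. A quick numerical sanity check on arbitrary, purely illustrative energy errors:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    d = rng.normal(size=10)                      # hypothetical per-step errors delta_i
    p_ghmc = np.exp(-np.maximum(d, 0.0).sum())   # e^{-sum_i delta_i^+}
    p_malt = np.exp(-max(d.sum(), 0.0))          # e^{-(sum_i delta_i)^+}
    assert p_ghmc <= p_malt + 1e-12
```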
Detailed balance and ergodicity of Algorithm 1 are proven
in Riou-Durand and Vogrinc (2022). Notably, the momen-
tum flips get erased by fully refreshing v_0 ∼ N_d(0_d, M)
at the start of each trajectory. This ensures reversibility
of the sampler with respect to Π, and makes HMC a spe-
cial case of MALT, obtained for γ = 0. As γ gets larger,
Langevin trajectories become ergodic, thereby preventing
U-turns and resonances. Compared to GHMC, Algorithm 1
also exhibits an advantage for tuning the step-size, which
is supported by optimal scaling results derived in Riou-
Durand and Vogrinc (2022). The tuning of MALT's pa-
rameters is further discussed in Section 3.
2.4 Adaptive Tuning Strategies
To measure the efficiency of an MCMC sampler, we con-
sider the problem of estimating E[φ(x)] for x ∼ Π, for a
square-integrable function φ: R^d → R. The Monte Carlo
estimator, φ̂_N = N^{-1} Σ_{n=1}^N φ(X_n), estimates the
desired expectation, with N the number of approximate
samples. A scale-free measurement of φ̂_N's precision is
given