
Molecular simulations are becoming increasingly important for studying complex phenomena in physics, chemistry, and biology. One of the long-standing challenges for molecular simulations is to efficiently sample the equilibrium Boltzmann distribution, which is often characterized by multiple metastable states. In this letter, we propose a novel sampling method that combines the well-established replica exchange method1,2 with a recent machine-learning technique known as normalizing flows.3–5 To this end, we first summarize the two original methods and discuss their strengths and weaknesses. We then introduce flow-based exchange moves and show with a few prototypical examples how they can outperform classical replica exchange.
Replica exchange (REX), also known as parallel tempering, is a popular enhanced sampling method.1,2 It uses a ladder of overlapping distributions to connect the target distribution one wishes to sample with an easier-to-sample probability distribution, for example at higher temperature, which we here call the prior distribution. We keep a general notation and denote the prior distribution by q(x) = e^{−u_q(x)}/Z_q and the target by p(x) = e^{−u_p(x)}/Z_p, where u_p and u_q are the respective reduced energies, Z_q = ∫ e^{−u_q(x)} dx and Z_p = ∫ e^{−u_p(x)} dx are the partition functions, and x is the system configuration. A set of M+1 replicas of the system is chosen to form a ladder of Boltzmann distributions p_i(x) ∝ e^{−u_i(x)}, from p_0(x) = p(x) to p_M(x) = q(x), such that each p_i(x) overlaps with its neighbors in configuration space. In the typical case of temperature expansion, one has u_i(x) = (k_B T_i)^{−1} U(x), where U(x) is the potential energy, k_B is the Boltzmann constant, and the temperatures T_i interpolate between the temperatures of the target and prior distributions, T_0 = T_low and T_M = T_high, respectively. Other kinds of expansions are also possible, such as solute tempering6 or alchemical transformations,7 generally known as Hamiltonian replica exchange. Each replica is sampled
with local moves, such as Markov chain Monte Carlo or molecular dynamics (MD), and at regular time intervals an exchange is proposed between the configurations of different replicas, x_i ↔ x_j, and accepted with probability

α_REX = min(1, [p_j(x_i) p_i(x_j)] / [p_i(x_i) p_j(x_j)])
      = min(1, e^{Δu_ij(x_i) − Δu_ij(x_j)}),    (1)
where x_i and x_j are configurations sampled from the corresponding distributions p_i and p_j, and Δu_ij(x) = u_i(x) − u_j(x) is the difference in reduced energy between replica i and replica j. The best choice of the intermediate p_i is not trivial even in the well-studied case of temperature REX, and several different approaches have been proposed to optimize it.2,7 The total number of replicas M required to allow exchanges increases both with the number of degrees of freedom of the system N, as M ∝ √N, and with the distance (e.g., in temperature) between prior and target.8
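As a concrete illustration, the exchange move of Eq. (1) for a temperature ladder can be sketched in a few lines of Python. The one-dimensional double-well potential, the geometric spacing of the ladder, and all function names below are illustrative assumptions, not part of the original method description.

```python
import numpy as np

rng = np.random.default_rng(0)

def potential(x):
    """Toy one-dimensional double-well potential U(x) (illustrative)."""
    return (x**2 - 1.0) ** 2

# Geometric ladder of M+1 temperatures from T_low (target, i = 0) to
# T_high (prior, i = M), with reduced energies u_i(x) = U(x)/(kB*T_i)
# and kB = 1 in reduced units.
T_low, T_high, M = 0.2, 2.0, 4
temps = T_low * (T_high / T_low) ** (np.arange(M + 1) / M)

def reduced_energy(x, i):
    return potential(x) / temps[i]

def exchange(x, i, j):
    """Propose swapping the configurations of replicas i and j, Eq. (1)."""
    du_i = reduced_energy(x[i], i) - reduced_energy(x[i], j)  # Δu_ij(x_i)
    du_j = reduced_energy(x[j], i) - reduced_energy(x[j], j)  # Δu_ij(x_j)
    if rng.random() < min(1.0, np.exp(du_i - du_j)):
        x[i], x[j] = x[j], x[i]
        return True
    return False

# One sweep of neighbor exchanges over the replica configurations
# (in a real simulation, local MC/MD moves would occur between sweeps).
x = rng.normal(size=M + 1)
accepted = [exchange(x, i, i + 1) for i in range(M)]
```

Note that exchanges are proposed only between neighboring replicas, since the acceptance in Eq. (1) decays rapidly when the two distributions no longer overlap.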
One of the main limitations of REX is that a large number of replicas might be necessary to connect the target and the prior distributions, making the method computationally too expensive. There are variants of REX that do not require a fixed number of parallel simulations, such as simulated tempering9 or expanded ensemble methods,10,11 but they share the same √N scaling of sampling efficiency as REX (see Fig. S2 of the Supporting Information). Several approaches have been proposed to mitigate this scaling, such as using nonequilibrium switches,12,13 but with limited practical success. Two popular strategies to reduce the total number of replicas are to apply the tempering only to part of the system,6 or to broaden the distributions with metadynamics.14 We will show how to combine REX with normalizing flows to avoid altogether the need to simulate intermediate replicas.
Normalizing flows (NF) are a class of invertible deep neural networks that can be used to generate samples according to a given target distribution, and are at the core of the recently proposed Boltzmann generators.15 A normalizing flow f is an invertible function that maps a configuration x drawn from a prior distribution, q(x), into a new configuration x′ = f(x) that samples the output distribution of the flow, q′(x), also called the mapped distribution. Exploit-