
Molecular simulations are becoming increasingly important for studying complex phenomena in physics, chemistry, and biology. One of the long-standing challenges for molecular simulations is to efficiently sample the equilibrium Boltzmann distribution, which is often characterized by multiple metastable states. In this letter, we propose a novel sampling method that combines the well-established replica exchange method1,2 with a recent machine-learning technique known as normalizing flows.3–5 To this end, we first summarize the two original methods and discuss their strengths and weaknesses. We then introduce flow-based exchange moves and show with a few prototypical examples how they can outperform classical replica exchange.
Replica exchange (REX), also known as parallel tempering, is a popular enhanced sampling method.1,2 It uses a ladder of overlapping distributions to connect the target distribution one wishes to sample with an easier-to-sample probability distribution, for example at higher temperature, which we here call the prior distribution. We keep a general notation and denote the prior distribution by q(x) = e^{−u_q(x)}/Z_q and the target by p(x) = e^{−u_p(x)}/Z_p, where u_p and u_q are the respective reduced energies, Z_q = ∫ e^{−u_q(x)} dx and Z_p = ∫ e^{−u_p(x)} dx are the partition functions, and x is the system configuration. A set of M+1 replicas of the system is chosen to form a ladder of Boltzmann distributions p_i(x) ∝ e^{−u_i(x)}, from p_0(x) = p(x) to p_M(x) = q(x), such that each p_i(x) overlaps with its neighbors in configuration space. In the typical case of temperature expansion, one has u_i(x) = (k_B T_i)^{−1} U(x), where U(x) is the potential energy, k_B is the Boltzmann constant, and the temperatures T_i interpolate between the temperatures of the target and prior distributions, T_0 = T_low and T_M = T_high, respectively. Other kinds of expansions are also possible, such as solute tempering6 or alchemical transformations,7 generally known as Hamiltonian replica exchange. Each replica is sampled
with local moves, such as Markov chain Monte Carlo or molecular dynamics (MD), and at regular time intervals an exchange is proposed between the configurations of different replicas, x_i ↔ x_j, and accepted with probability

α_REX = min(1, [p_j(x_i) p_i(x_j)] / [p_i(x_i) p_j(x_j)])
      = min(1, e^{Δu_ij(x_i) − Δu_ij(x_j)}),    (1)
where x_i and x_j are configurations sampled from the corresponding distributions p_i and p_j, and Δu_ij(x) = u_i(x) − u_j(x) is the difference in reduced energy between replica i and replica j. The best choice of the intermediate p_i is not trivial even in the well-studied case of temperature REX, and several different approaches have been proposed to optimize it.2,7 The total number of replicas M required to allow exchanges increases both with the number of degrees of freedom of the system N, as M ∝ √N, and with the distance (e.g., in temperature) between prior and target.8
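As a concrete illustration, the exchange move of Eq. (1) for a temperature ladder can be sketched in a few lines of Python. The one-dimensional double-well potential, the geometric spacing of the ladder, and all function names below are illustrative assumptions, not part of the original method description.

```python
import numpy as np

rng = np.random.default_rng(0)

def potential(x):
    """Toy one-dimensional double-well potential U(x) (illustrative)."""
    return (x**2 - 1.0) ** 2

# Geometric ladder of M+1 temperatures from T_low (target, i = 0) to
# T_high (prior, i = M), with reduced energies u_i(x) = U(x)/(kB*T_i)
# and kB = 1 in reduced units.
T_low, T_high, M = 0.2, 2.0, 4
temps = T_low * (T_high / T_low) ** (np.arange(M + 1) / M)

def reduced_energy(x, i):
    return potential(x) / temps[i]

def exchange(x, i, j):
    """Propose swapping the configurations of replicas i and j, Eq. (1)."""
    du_i = reduced_energy(x[i], i) - reduced_energy(x[i], j)  # Δu_ij(x_i)
    du_j = reduced_energy(x[j], i) - reduced_energy(x[j], j)  # Δu_ij(x_j)
    if rng.random() < min(1.0, np.exp(du_i - du_j)):
        x[i], x[j] = x[j], x[i]
        return True
    return False

# One sweep of neighbor exchanges over the replica configurations
# (in a real simulation, local MC/MD moves would occur between sweeps).
x = rng.normal(size=M + 1)
accepted = [exchange(x, i, i + 1) for i in range(M)]
```

Note that exchanges are proposed only between neighboring replicas, since the acceptance in Eq. (1) decays rapidly when the two distributions no longer overlap.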
One of the main limitations of REX is that a large number of replicas might be necessary to connect the target and the prior distributions, making the method computationally too expensive. There are variants of REX that do not require a fixed number of parallel simulations, such as simulated tempering9 or expanded ensemble methods,10,11 but they share the same √N scaling of sampling efficiency as REX (see Fig. S2 of the Supporting Information). Several approaches have been proposed to mitigate this scaling, such as using nonequilibrium switches,12,13 but with limited practical success. Two popular strategies to reduce the total number of replicas are to apply the tempering only to part of the system,6 or to broaden the distributions with metadynamics.14 We will show how to combine REX with normalizing flows to avoid altogether the need to simulate intermediate replicas.
Normalizing flows (NF) are a class of invertible deep neural networks that can be used to generate samples according to a given target distribution, and are at the core of the recently proposed Boltzmann generators.15 A normalizing flow f is an invertible function that maps a configuration x drawn from a prior distribution, q(x), into a new configuration x′ = f(x) that samples the output distribution of the flow, q′(x), also called the mapped distribution. Exploit-