Skipping the Replica Exchange Ladder
with Normalizing Flows
Michele Invernizzi,† Andreas Krämer,†
Cecilia Clementi,‡,¶,§ and Frank Noé‖,†,‡,¶
†Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
‡Department of Physics, Freie Universität Berlin, 14195 Berlin, Germany
¶Department of Chemistry, Rice University, 77005 Houston, United States
§Center for Theoretical Biological Physics, Rice University, 77005 Houston, United States
‖Microsoft Research AI4Science, 10178 Berlin, Germany
E-mail: michele.invernizzi@fu-berlin.de; franknoe@microsoft.com
Abstract
We combine replica exchange (parallel tempering) with normalizing flows, a class of deep generative models. These two sampling strategies complement each other, resulting in an efficient method for sampling molecular systems characterized by rare events, which we call learned replica exchange (LREX). In LREX, a normalizing flow is trained to map the configurations of the fastest-mixing replica into configurations belonging to the target distribution, allowing direct exchanges between the two without the need to simulate intermediate replicas. This can significantly reduce the computational cost compared to standard replica exchange. The proposed method also offers several advantages with respect to Boltzmann generators, which directly use normalizing flows to sample the target distribution. We apply LREX to some prototypical molecular dynamics systems, highlighting the improvements over previous methods.
Graphical TOC Entry
Keywords
parallel tempering, Boltzmann generators, enhanced sampling, machine learning, neural networks, molecular dynamics
Molecular simulations are becoming more and more important for studying complex phenomena in physics, chemistry, and biology. One of the long-standing challenges for molecular simulations is to efficiently sample the equilibrium Boltzmann distribution, which is often characterized by multiple metastable states. In this letter, we propose a novel sampling method that combines the well-established replica exchange method^{1,2} with a recent machine-learning technique known as normalizing flows.^{3-5} To this end, we first summarize the two original methods and discuss their strengths and weaknesses. We then introduce flow-based exchange moves and show with a few prototypical examples how they help outperform classical replica exchange.
Replica exchange (REX), also known as parallel tempering, is a popular enhanced sampling method.^{1,2} It uses a ladder of overlapping distributions to connect the target distribution one wishes to sample with an easier-to-sample probability distribution, for example at higher temperature, which we here call the prior distribution. We keep a general notation and indicate with $q(x) = e^{-u_q(x)}/Z_q$ the prior distribution and with $p(x) = e^{-u_p(x)}/Z_p$ the target, where $u_p$ and $u_q$ are the respective reduced energies, $Z_q = \int e^{-u_q(x)}\,\mathrm{d}x$ and $Z_p = \int e^{-u_p(x)}\,\mathrm{d}x$ are partition functions, and $x$ is the system configuration. A set of $M+1$ replicas of the system is chosen to form a ladder of Boltzmann distributions $p_i(x) \propto e^{-u_i(x)}$, from $p_0(x) = p(x)$ to $p_M(x) = q(x)$, such that each $p_i(x)$ overlaps with its neighbors in configuration space. In the typical case of temperature expansion, one has $u_i(x) = (k_B T_i)^{-1} U(x)$, where $U(x)$ is the potential energy, $k_B$ the Boltzmann constant, and the temperatures $T_i$ interpolate between the temperatures of the target and prior distributions, $T_0 = T_\mathrm{low}$ and $T_M = T_\mathrm{high}$, respectively. Other kinds of expansions are also possible, such as solute tempering^6 or alchemical transformations,^7 generally known as Hamiltonian replica exchange. Each replica is sampled with local moves, such as Markov chain Monte Carlo or molecular dynamics (MD), and at regular time intervals an exchange is proposed between the configurations of different replicas, $x_i \leftrightarrow x_j$, and accepted with probability

$$\alpha_\mathrm{REX} = \min\left\{1,\ \frac{p_j(x_i)}{p_i(x_i)}\,\frac{p_i(x_j)}{p_j(x_j)}\right\} = \min\left\{1,\ e^{\Delta u_{ij}(x_i) - \Delta u_{ij}(x_j)}\right\}, \qquad (1)$$
where $x_i$ and $x_j$ are configurations sampled from the corresponding distributions $p_i$ and $p_j$, and $\Delta u_{ij}(x) = u_i(x) - u_j(x)$ is the difference in reduced energy between replica $i$ and replica $j$. The best choice of the intermediate $p_i$ is not trivial even in the well-studied case of temperature REX, and several different approaches have been proposed to optimize it.^{2,7} The total number of replicas $M$ required to allow exchanges increases both with the number of degrees of freedom of the system $N$, $M \propto \sqrt{N}$, and with the distance (e.g. in temperature) between prior and target.^8
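As a concrete illustration of Eq. (1), a temperature-ladder exchange move takes only a few lines. The sketch below is a hypothetical minimal implementation, not code from this work: `potential_energy` stands in for an actual force field, and the reduced energies are $u_i(x) = \beta_i U(x)$ with $\beta_i = (k_B T_i)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def attempt_rex_swap(x_i, x_j, beta_i, beta_j, potential_energy):
    """Propose swapping configurations between replicas i and j, Eq. (1).

    With reduced energies u_i(x) = beta_i * U(x), the acceptance exponent
    du_ij(x_i) - du_ij(x_j) reduces to (beta_i - beta_j) * (U(x_i) - U(x_j)).
    Returns the (possibly swapped) pair and whether the move was accepted.
    """
    log_alpha = (beta_i - beta_j) * (potential_energy(x_i) - potential_energy(x_j))
    if np.log(rng.random()) < min(0.0, log_alpha):
        return x_j, x_i, True
    return x_i, x_j, False

# Toy usage: two replicas of a 1D harmonic oscillator, U(x) = x^2 / 2.
x_cold, x_hot = 0.1, 1.5
x_cold, x_hot, accepted = attempt_rex_swap(
    x_cold, x_hot, beta_i=1.0, beta_j=0.2, potential_energy=lambda x: 0.5 * x**2
)
```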
One of the main limitations of REX is that a large number of replicas might be necessary to connect the target and the prior distributions, making the method computationally too expensive. There are variants of REX that do not require a fixed number of parallel simulations, such as simulated tempering^9 or expanded ensemble methods,^{10,11} but they share the same scaling of sampling efficiency with system size $N$ as REX (see Fig. S2 of the Supporting Information). Several approaches have been proposed to mitigate this scaling, such as using nonequilibrium switches,^{12,13} but with limited practical success. Two popular strategies to reduce the total number of replicas are to apply the tempering only to part of the system^6 or to broaden the distributions with metadynamics.^{14} We will show how to combine REX with normalizing flows to avoid altogether the need to simulate intermediate replicas.
Normalizing flows (NF) are a class of invertible deep neural networks that can be used to generate samples according to a given target distribution, and are at the core of the recently proposed Boltzmann generators.^{15} A normalizing flow $f$ is an invertible function that maps a configuration $x$ drawn from a prior distribution $q(x)$ into a new configuration $x' = f(x)$ that samples the output distribution of the flow, $q'(x)$, also called the mapped distribution. Exploiting the invertibility of $f$, one can compute the output probability density as

$$q'(x') = q(x)\,|\det J_f(x)|^{-1}, \qquad (2)$$

where $\det J_f(x)$ is the determinant of the Jacobian of $f$, and $|\det J_f(x)|^{-1} = |\det J_{f^{-1}}(x')|$.
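As a minimal illustration of Eq. (2) (not the flow architecture used in this work), consider an element-wise affine map acting on a standard-normal prior; the change-of-variables density can then be checked against the known Gaussian result:

```python
import numpy as np

a, b = np.array([2.0, 0.7]), np.array([1.0, -1.0])   # affine flow f(x) = a*x + b

def log_det_jf(x):
    # The Jacobian of an element-wise affine map is diagonal: det J_f = prod(a).
    return np.sum(np.log(np.abs(a)))

def log_q(x):
    # Standard-normal prior density, log q(x).
    return -0.5 * np.sum(x**2) - 0.5 * x.size * np.log(2.0 * np.pi)

x = np.random.default_rng(1).standard_normal(2)
x_prime = a * x + b
log_q_prime = log_q(x) - log_det_jf(x)               # Eq. (2) in log space

# Analytic check: the mapped distribution is N(b, a^2) component-wise.
log_q_prime_exact = np.sum(
    -0.5 * ((x_prime - b) / a) ** 2 - np.log(np.abs(a)) - 0.5 * np.log(2.0 * np.pi)
)
assert np.isclose(log_q_prime, log_q_prime_exact)
```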
The aim of NF is to approximate the ideal map which transforms the prior into the target, such that $q'(x) = p(x)$. However, even without a perfect map, one can compute the importance weights $w_f(x)$ needed to reweight from $q'$ to $p$ the ensemble average of any observable $O(x)$, $\langle O(x)\rangle_p = \langle O(f(x))\,w_f(x)\rangle_q / \langle w_f(x)\rangle_q$, as discussed in Ref. 15. We choose to define the weights as

$$w_f(x) = e^{u_q(x) - u_p(f(x)) + \log|\det J_f(x)|}, \qquad (3)$$

so that $w_f(x) \propto p(x')/q'(x')$ with $x' = f(x)$, where the precise proportionality constant is irrelevant for the purpose of reweighting. In Boltzmann generators, one typically chooses as prior $q$ a normal or uniform distribution, so that independent and identically distributed samples can be used to estimate the ensemble averages $\langle\cdot\rangle_q$. The weights are also useful to assess how effective the mapping $f$ is in increasing the overlap between $q$ and $p$, for example through the Kish effective sample size,^{16}

$$n_\mathrm{eff} = \frac{\left[\sum_i^n w_f(x_i)\right]^2}{\sum_i^n \left[w_f(x_i)\right]^2}. \qquad (4)$$
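In practice the weights of Eq. (3), the reweighted averages, and the Kish effective sample size of Eq. (4) can all be computed from a set of prior samples in a few lines. The following sketch is generic, with hypothetical callables `u_q`, `u_p`, `f`, and `log_det_jf` standing in for the reduced energies and the flow:

```python
import numpy as np

def flow_log_weights(xs, u_q, u_p, f, log_det_jf):
    """log w_f(x) = u_q(x) - u_p(f(x)) + log|det J_f(x)|, Eq. (3)."""
    return np.array([u_q(x) - u_p(f(x)) + log_det_jf(x) for x in xs])

def kish_ess(log_w):
    """Kish effective sample size, Eq. (4), evaluated stably in log space."""
    w = np.exp(log_w - log_w.max())      # the overall scale cancels in the ratio
    return w.sum() ** 2 / (w ** 2).sum()

def reweighted_average(observable, xs, log_w, f):
    """<O>_p = <O(f(x)) w_f(x)>_q / <w_f(x)>_q, with self-normalized weights."""
    w = np.exp(log_w - log_w.max())
    values = np.array([observable(f(x)) for x in xs])
    return (values * w).sum() / w.sum()
```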
In NF, the mapping $f$ is implemented by an invertible deep neural network that by construction has an easy-to-calculate $\det J_f$. It can be trained to learn the target probability distribution $p(x)$ by minimizing the Kullback-Leibler divergence:

$$D_\mathrm{KL}(q'\,\|\,p) = -\int q'(x') \log\frac{p(x')}{q'(x')}\,\mathrm{d}x' = -\int q(x) \log w_f(x)\,\mathrm{d}x - \Delta F_{pq}, \qquad (5)$$

where the second equality has been obtained via the change of variables $x' = f(x)$, and $\Delta F_{pq} = \log\frac{Z_q}{Z_p}$ is the free energy difference between $q$ and $p$, which is unknown but independent of $f$. The loss function then simplifies to $\mathcal{L}_f = -\langle \log w_f(x)\rangle_q$, which in Ref. 15 is referred to as energy-based training, as opposed to maximum-likelihood training, which instead requires sampling the target distribution $p(x)$ and minimizing $D_\mathrm{KL}(p\,\|\,q')$. The loss function is an upper bound to the free energy difference, $\mathcal{L}_f \geq \Delta F_{pq}$, with equality holding only in the case of the ideal map.^{17}
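A minimal PyTorch sketch of one energy-based training step follows, assuming a hypothetical `flow` module that returns both $f(x)$ and $\log|\det J_f(x)|$ for a batch of prior samples; since $u_q(x)$ does not depend on the flow parameters, it only shifts the loss and can be dropped from the gradient:

```python
import torch

def energy_based_loss(flow, x_prior, u_p):
    """L_f = -<log w_f(x)>_q, estimated on a batch of prior samples.

    The u_q(x) term of Eq. (3) is parameter-independent, so minimizing
    <u_p(f(x)) - log|det J_f(x)|>_q gives the same gradients as L_f.
    """
    y, log_det = flow(x_prior)           # f(x) and log|det J_f(x)| per sample
    return (u_p(y) - log_det).mean()

# One optimization step (flow, optimizer, batch, and u_p assumed defined):
# loss = energy_based_loss(flow, batch, u_p)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```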
The main limitations of NF are that they might not be expressive enough to represent the complex map from the prior to the target, or that they can be hard to train in practice. In particular, energy-based training is characterized by mode-seeking behavior, and struggles to converge reliably when the target distribution is multimodal.^{18}
In this letter, we propose to use normalizing flows in a replica exchange setting to map the fast-mixing prior distribution to the target Boltzmann distribution, giving rise to the phase space overlap necessary for direct exchange, without the need to sample intermediate distributions. The idea of using a map to improve the overlap between distributions was first proposed by Jarzynski under the name of targeted free energy perturbation,^{19} and has recently been combined with NF in the learned free energy perturbation (LFEP) method.^{17,20-22} By analogy, we will refer to our method as learned replica exchange (LREX). Contrary to LFEP, in LREX we combine the NF mapping with local moves, which can greatly improve the sampling, as shown by several recent works.^{18,23-26} Our approach can also be seen as a type of Boltzmann generator,^{15} with two main differences: (i) the prior is a nontrivial distribution sampled via MD, and (ii) the target is directly sampled in a replica exchange setting, rather than only reconstructed via reweighting.
We now present the proposed LREX method in detail. To perform LREX, we first run a relatively short MD simulation to gather samples from the prior distribution $q(x)$, and use them to train the NF. Following Ref. 17, the NF is initialized to be the identity, thus $x' = x$ and $q'(x) = q(x)$. Ideally, the prior distribution should be easy to sample while still exhibiting the main features of the target, such as all the relevant metastable basins. This choice of prior allows us to avoid most of the problems related to energy-based training of normalizing flows,^{15,27} and significantly reduces the computational cost. Training in the LREX setup typically requires only a few epochs and does not suffer from mode collapse, nor from numerical instabilities linked to extremely high energies that can arise from atom clashes and other nonphysical configurations in the prior. While the NF is training, it is possible to estimate its efficiency in increasing the phase space overlap between prior and target by computing $n_\mathrm{eff}$, Eq. (4), over the training set and/or a validation set of $n$ prior samples. The sampling efficiency $n_\mathrm{eff}/n$ also provides an estimate of the frequency with which exchanges will be accepted in the final LREX run. Once the NF is trained, the prior and the target systems are simulated in parallel with MD and, at fixed time intervals, an exchange between mapped configurations is attempted according to the following probability:

$$\alpha_\mathrm{LREX} = \min\left\{1,\ \frac{p(x'_q)}{q'(x'_q)}\,\frac{q'(x_p)}{p(x_p)}\right\} = \min\left\{1,\ w_f(x_q)\,w_{f^{-1}}(x_p)\right\}, \qquad (6)$$

where $x_q$ and $x_p$ are the current configurations of the prior and target replicas, respectively, $x'_q = f(x_q)$, and $w_{f^{-1}}$ are the weights of the inverse mapping, defined as in Eq. (3). If the exchange is accepted, the MD simulation of the target continues from the configuration $x'_q = f(x_q)$, while the prior continues from $f^{-1}(x_p)$. New velocities are randomly assigned from the respective equilibrium distributions.
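A sketch of this exchange move, under the same hypothetical callables as above (with `log_w_f` and `log_w_finv` implementing Eq. (3) for $f$ and $f^{-1}$), might look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def attempt_lrex_swap(x_q, x_p, f, f_inv, log_w_f, log_w_finv):
    """Flow-based exchange between prior and target replicas, Eq. (6).

    x_q is the prior-replica configuration, x_p the target-replica one.
    Returns (new prior configuration, new target configuration, accepted).
    """
    log_alpha = log_w_f(x_q) + log_w_finv(x_p)
    if np.log(rng.random()) < min(0.0, log_alpha):
        # Target continues from f(x_q), prior from f^{-1}(x_p); fresh
        # velocities are then drawn from the equilibrium distributions.
        return f_inv(x_p), f(x_q), True
    return x_q, x_p, False
```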
To demonstrate the advantages of the LREX approach over standard REX and Boltzmann generators, we present three examples. The first system considered is a particle moving with Langevin dynamics in an $N$-dimensional double-well potential, shown in Fig. 1a. The first two dimensions, with coordinates $x_1$ and $x_2$, feel the 2D potential introduced in Ref. 28, while all the other $N-2$ are subject to a harmonic potential (details in the SI). Target and prior distributions are obtained from the same system, but at different temperatures. The target distribution is at reduced temperature $T_\mathrm{low} = 1$, where transitions between the two basins are extremely rare, while the prior has $T_\mathrm{high} = 5$, which instead can be sampled efficiently even with a short MD run.

Figure 1c shows a visual comparison of standard REX (top row) and the proposed LREX method (bottom row). In replica exchange, the number of intermediate temperatures grows with the system size, independently of the fact that the added degrees of freedom are trivial Gaussian noise. Moreover, as the number of replicas increases, the mixing time required to equilibrate the simulation also increases, further limiting the efficiency of the method.^{29} In LREX, instead, only the highest and lowest temperatures need to be sampled, because the NF easily learns a transformation that makes the high-temperature configuration space overlap with the low-temperature one. The NF architecture and training procedure are kept identical for all sizes, and are described in detail in the Supporting Information (SI). Although the size of the neural network increases with $N$, the slowdown in training time and efficiency is minor; thus the overall scaling of LREX with system size is more favorable than that of standard REX for all the systems we studied.^{30}

Figure 1b presents the sampling efficiency of REX and LREX as a function of system size $N$. Other expanded ensemble methods would have similar scaling to REX (see Fig. S2 in the SI). The sampling efficiency is estimated by considering the importance weights of all the replicas and is an upper bound estimate, strictly valid only in the infinite simulation limit (see SI). In a finite REX simulation, the efficiency can be further reduced by long time correlations due to a slow mixing time or a small acceptance rate (see Fig. S1 in the SI). We also do not include the training cost for LREX, which would shift the curve down by a fixed small amount, depending on the total simulation length.

Next, we consider alanine dipeptide in vacuum. This molecule has two long-lived metastable basins that can be identified by its $\phi$ torsion angle. The target is to sample the system at $T_\mathrm{low} = 300$ K, and as prior we consider a very high temperature, $T_\mathrm{high} = 1000$ K, at