Computing non-equilibrium trajectories by a deep learning approach
Eric Simonnet
INPHYNI, Université Côte d'Azur et CNRS, UMR 7010, 1361 route des Lucioles, 06560 Valbonne, France
Email address: eric.simonnet@inphyni.cnrs.fr (Eric Simonnet)
Abstract
Predicting the occurrence of rare and extreme events in complex systems is a well-known problem in
non-equilibrium physics. These events can have huge impacts on human societies. New approaches
have emerged in the last ten years, which better estimate tail distributions. They often use large
deviation concepts without the need to perform heavy direct ensemble simulations. In particular, a
well-known approach is to derive a minimum action principle and to find its minimizers.
The analysis of rare reactive events in non-equilibrium systems without detailed balance is notoriously difficult, both theoretically and computationally. They are described in the limit of small noise by the Freidlin-Wentzell action. We propose here a new method, called deep gMAM, which instead minimizes the geometrical action using neural networks. It relies on a natural and simple
machine-learning formulation of the classical gMAM approach. We give a detailed description of the
method as well as many examples. These include bimodal switches in complex stochastic (partial)
differential equations, quasi-potential estimates, and extreme events in Burgers turbulence.
Keywords: Freidlin-Wentzell large deviation theory, instantons, geometrical action, gMAM,
quasi-potential, Neural Networks
1. Introduction
Understanding rare and extreme events in complex models has become a cornerstone of modern
non-equilibrium physics. In many cases, the occurrence of such events has strong impacts on our
daily life and cannot be ignored despite their small probability. A well-known example is of course
climate change and its impact at the regional scale. At the level of biological systems, dramatic
phenomena can be tied to very small unexpected changes, for instance in biochemistry and molecular
dynamics with protein bindings/foldings. Remarkably, these events are predictable: they always
conspire following the same path, associated with some well-defined probability. In fact, small
fluctuations, either random or deterministic, can drive systems to different, unknown equilibria
provided some energy barrier is reached. This is the Arrhenius law. It is a very simple instance of a
large deviation principle (LDP), which expresses the property that a quantity (e.g. a probability)
behaves as $\epsilon \log \Pr_\epsilon \sim -V$ when some parameter $\epsilon$ becomes small. Such an LDP appears to be much
more general than the original Arrhenius law and is not restricted to the existence of a free energy
potential $V$. In general situations, $V$ is then called the quasi-potential or rate function. However,
understanding non-equilibrium systems which do not satisfy detailed balance turns out to be a very
challenging task, not only theoretically but also computationally.
The natural way to tackle these problems is large deviation theory, as it provides a general and
powerful framework for non-equilibrium statistical physics [67]. In the context of metastability, the
theory has been developed by Freidlin and Wentzell in the 70s-80s [19] as a large deviation approach
to perturbed dynamical systems, although these ideas were already known to physicists since
Onsager and Machlup [48]. Notably, there is another route to large deviations, which is called
the Martin-Siggia-Rose-Janssen-de Dominicis (MSRJD) formalism, developed in the 70s (see [27] for a
presentation). Once a minimum action principle is obtained, it remains to identify the minimizers.
This is in general a very difficult optimisation problem as it involves complex quantities related to
the nontrivial nonlinear interactions with noise. It turns out that minimizing the Freidlin-Wentzell
action directly is also very difficult due to the long time it takes for the fluctuations to build up
an optimal path history. Mathematically, this translates into an optimisation problem in the large-time
limit, which turns out to be ill-conditioned. A key step is therefore to identify an equivalent problem
where time has been reparametrized. This yields the so-called geometrical action [69] and has led many
authors to consider the gMAM approach, which brings essential advantages over the original
formulation. How to minimize this action efficiently using neural networks and machine-learning techniques
is the subject of this work.
This work is organized as follows. We first recall fundamental notions of the Freidlin-Wentzell
large deviation theory and the concept of geometrical action in Section 2. We then describe in Section
3 how to adapt these problems to the machine-learning context and give a detailed description of
the deep gMAM approach and what it brings compared to the classical one. Section 4 gives simple
introductory examples of stochastic differential equations (SDEs). It also illustrates as a byproduct
a very simple way to compute the quasi-potential for general SDEs. Section 5 considers more
challenging examples of stochastic partial differential equations (SPDEs). We then illustrate the
deep gMAM method in a very different context of extreme events in Burgers turbulence in Section
6. We conclude in Section 7. Appendix 8.1 discusses a striking example of a deterministic chaotic
system, and Appendix 8.2 provides a simple Julia code snippet of deep gMAM for the SDE case.
2. Definitions and generalities
2.1. Freidlin-Wentzell large deviation theory
We consider the following stochastic (partial) differential equation
$$du = F(u)\,dt + \sqrt{\epsilon}\,\sigma(u)\,dW_t, \qquad u \in H, \qquad (1)$$
where $\epsilon > 0$, $W_t$ is a Wiener process, $H$ is an ad-hoc Hilbert functional space, $u : (0,T) \times D \to \mathbb{R}^p$,
$(t,x) \mapsto u(t,x)$, with $D \subset \mathbb{R}^d$ the spatial domain, and $u$ a vector field $(u_1, \cdots, u_p)$. For simplicity,
we address scalar fields only, with $p = 1$. The correlation tensor (or diffusion matrix/operator) is
$$\chi = \sigma\sigma^\star.$$
The inner product is denoted $\langle \cdot, \cdot \rangle$ and the corresponding norm is $\|u\| = \langle u, u\rangle^{1/2}$. We are
interested in the probability $\Pr_\epsilon$ of observing a time-$T$ trajectory (possibly with $T \to \infty$) connecting
two distinct states $a$ and $b$ when $\epsilon \to 0$. It is well-known from Freidlin-Wentzell theory [19] that it
satisfies the large deviation principle (LDP)
$$\lim_{\epsilon \to 0} \epsilon \log \Pr_\epsilon = -\inf_{u \in \mathcal{C}} S_T[u], \qquad (2)$$
where the set of trajectories starting from $a$ and ending at $b$ at time $T$ is denoted
$$\mathcal{C} \equiv \mathcal{C}_{a,b,T} = \{u : (t,x) \mapsto u(t,x),\ u(0,x) = a(x),\ u(T,x) = b(x)\}, \qquad (3)$$
and $S_T$ is the so-called Freidlin-Wentzell action:
$$S_T[u] = \frac{1}{2}\int_0^T \big\langle \dot{u} - F(u),\ \chi^{-1}\big(\dot{u} - F(u)\big)\big\rangle\, dt. \qquad (4)$$
The argmin of the right-hand side of (2) is called in the literature an instanton path. In the context
where $a, b$ are local attractors for the deterministic dynamics ((1) with $\epsilon = 0$), the (global) argmin
solution is also often called a non-equilibrium transition or maximum-likelihood path. The minimum
action viewed as a function of $a$ and $b$ is called the quasi-potential [19, 8]; the definition in fact
involves another minimisation w.r.t. $T$:
$$V : H \times H \to \mathbb{R}, \qquad (a,b) \mapsto \inf_{T} \inf_{u \in \mathcal{C}_{a,b,T}} S_T[u]. \qquad (5)$$
Strictly speaking, the Freidlin-Wentzell theorem applies to finite-dimensional systems only and is very generic
under mild conditions, e.g. continuity hypotheses. More specifically, the diffusion matrix $\chi$ must be
uniformly non-degenerate, $\langle p, \chi p\rangle \ge c\,\|p\|^2$ with $c > 0$, for an LDP to apply (see [19], chap. 5.3). If not,
the boundary-value (BV) constraints (3) cannot be satisfied.
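As a simple illustration of (5) (a classical consequence of the theory, with the potential written $\Phi$ to avoid confusion with the quasi-potential $V$): in the gradient case $F = -\nabla\Phi$ with $\chi = \mathrm{Id}$, the minimizer connecting a local minimum $a$ of $\Phi$ to a point $b$ in its basin of attraction is the time-reversed relaxation path $\dot{u} = +\nabla\Phi(u)$, and
$$V(a,b) = 2\,\big(\Phi(b) - \Phi(a)\big),$$
which is the Arrhenius law mentioned in the introduction. For the one-dimensional double well $\Phi(u) = u^4/4 - u^2/2$, i.e. $F(u) = u - u^3$, the cost of reaching the saddle $u = 0$ from $a = -1$ is $2(\Phi(0) - \Phi(-1)) = 1/2$, and the subsequent descent towards $b = +1$ costs nothing, so that $V(-1, 1) = 1/2$.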
The case of SPDEs is mathematically much more difficult to handle. A well-known historical
example is the rigorous LDP proof for the Ginzburg-Landau 1-D PDE [18]. We thus consider all the
following at a formal level only. We will not write explicitly the dependency on $x$ unless needed. A
more general formulation is to consider the minimum action problem as
$$\inf_{u \in \mathcal{C}} \int_0^T L(u, \dot{u})\, dt \quad \text{with} \quad L(u,v) = \sup_{p}\,\big(\langle v, p\rangle - H(u,p)\big), \qquad (6)$$
where $H$ is the Hamiltonian. It can take different forms depending on the class of problems, and
$L$ is its Legendre transform. The advantage of this formulation is its generality, in particular when
the Lagrangian has no closed-form expression (see an example in [19], chap. 5.3, and also [33]).
Moreover, one does not need to invert $\chi$. In this work however, we focus on diffusion problems (1)
only, in which case the Hamiltonian takes the explicit form
$$H(u,p) = \langle F(u), p\rangle + \frac{1}{2}\langle p, \chi p\rangle. \qquad (7)$$
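For diffusions, the supremum in (6) can be computed explicitly: it is attained at $p$ such that $\chi p = v - F(u)$, and one recovers the Lagrangian of (4),
$$L(u,v) = \sup_p\Big(\big\langle v - F(u),\, p\big\rangle - \tfrac{1}{2}\langle p, \chi p\rangle\Big) = \tfrac{1}{2}\big\langle v - F(u),\, \chi^{-1}\big(v - F(u)\big)\big\rangle.$$
Note that the maximizing $p$ is exactly the field appearing in the constraint of (8) below, with $v = \dot{u}$.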
In the following, we propose an equivalent formulation of (6). Instead of considering the Lagrangian
(4), $L(u,\dot{u}) = \frac{1}{2}\langle \dot{u} - F(u), \chi^{-1}(\dot{u} - F(u))\rangle$, we consider the constrained problem:
$$\inf_{u \in \mathcal{C}} \frac{1}{2}\int_0^T \langle p, \chi p\rangle\, dt, \qquad \chi p = \dot{u} - F(u). \qquad (8)$$
The general idea of the proposed deep gMAM method is to solve (8) by a penalisation method. We
can therefore handle more complicated situations where $\chi^{-1}$ cannot be expressed easily, by enriching
the cost functional with an additional penalty term. We recall that the notation $\chi p$ means the operator
$\chi$ applied to the field $p$. This formulation is very natural; a similar approach can also be found in [59].
We will provide a highly nontrivial PDE example with a complicated $\chi$ in Section 6. We next
describe the so-called geometrical action.
2.2. Geometrical action
It is often the case that one is interested in minimizing $S_T$ w.r.t. $T$ as well. It is in general
equivalent to consider the limit $T \to \infty$. This situation typically happens in the context of transitions
from one equilibrium state (or, loosely speaking, a local attractor) to another: the transition takes
an infinite amount of time to escape from the starting equilibrium. The minimisation problem is
therefore
$$\inf_{T>0} \inf_{u \in \mathcal{C}} \frac{1}{2}\int_0^T \|\dot{u} - F(u)\|_{\chi^{-1}}^2\, dt,$$
with $\|f\|_{\chi^{-1}}^2 \equiv \langle f, \chi^{-1} f\rangle$ the induced covariance norm, see (4). In order to avoid the double minimi-
sation problem, it is possible to derive an equivalent formulation which removes the time constraint, as
shown by [33] (see also [20]). The idea is the following. Using $\|u\|^2 + \|v\|^2 \ge 2\|u\|\,\|v\|$ with equality
when $\|u\| = \|v\|$, one has
$$\frac{1}{2}\int_0^T \|\dot{u} - F(u)\|_{\chi^{-1}}^2\, dt \ \ge\ \int_0^T \Big(\|\dot{u}\|_{\chi^{-1}}\,\|F(u)\|_{\chi^{-1}} - \langle \dot{u}, F(u)\rangle_{\chi^{-1}}\Big)\, dt.$$
The remarkable observation is that this inequality is indeed an equality. One can choose a reparametriza-
tion of time, say $\gamma$ with $\dot{\gamma} \ge 0$, such that $\gamma(0) = 0$ and $\gamma(T) = T$; the change of variable $t = \gamma(\tau)$ yields
$$\int_0^T \Big(\|\dot{u}\|_{\chi^{-1}}\,\|F(u)\|_{\chi^{-1}} - \langle \dot{u}, F(u)\rangle_{\chi^{-1}}\Big)\, dt = \int_0^T \Big(\|u'\|_{\chi^{-1}}\,\|F(u)\|_{\chi^{-1}} - \langle u', F(u)\rangle_{\chi^{-1}}\Big)\, d\tau,$$
with $u' = du/d\tau$. The trick is now to choose $\tau$ such that $\|u'\|_{\chi^{-1}} = \|F(u)\|_{\chi^{-1}}$. This is always possible, by
taking $\dot{\gamma} = \|\dot{u}\|_{\chi^{-1}}/\|F(u)\|_{\chi^{-1}}$. The minimisation problem is equivalent to (changing the notation,
say $\dot{u} = du/d\tau$)
$$\inf_{u \in \mathcal{G}} \int_0^1 \Big(\|\dot{u}\|_{\chi^{-1}}\,\|F(u)\|_{\chi^{-1}} - \langle \dot{u}, F(u)\rangle_{\chi^{-1}}\Big)\, d\tau, \qquad \mathcal{G} \equiv \{u : \tau \mapsto u(\tau),\ u(0) = a,\ u(1) = b\}. \qquad (9)$$
It is therefore the location on the curve which prevails; the way it is parametrized does not matter.
A more rigorous proof can be found in [33].
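As a consistency check, consider again the gradient case $F = -\nabla\Phi$, $\chi = \mathrm{Id}$, and a path $u(\tau)$ from $a$ to $b$ along which $\dot{u}$ is everywhere parallel to $+\nabla\Phi(u)$ (the uphill path). Then $\|\dot{u}\|\,\|F(u)\| = \langle \dot{u}, \nabla\Phi(u)\rangle$ and the integrand of (9) becomes a total derivative:
$$\int_0^1 \Big(\|\dot{u}\|\,\|\nabla\Phi(u)\| + \langle \dot{u}, \nabla\Phi(u)\rangle\Big)\, d\tau = 2\int_0^1 \frac{d}{d\tau}\,\Phi(u(\tau))\, d\tau = 2\,\big(\Phi(b) - \Phi(a)\big),$$
independently of the parametrization, in agreement with the Arrhenius-type value $2(\Phi(b) - \Phi(a))$ of the quasi-potential for gradient systems.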
In practice, when $\chi$ has a complicated form, i.e. when $\chi^{-1}$ has no explicit formulation, we just
formulate the same problem as in (8), avoiding $\chi^{-1}$, namely solving the constrained problem
$$\inf_{u \in \mathcal{G}} \int_0^1 \Big(\|v\|_{\chi}\,\|w\|_{\chi} - \langle v, w\rangle_{\chi}\Big)\, d\tau, \qquad \chi v = \dot{u}, \quad \chi w = F(u), \qquad (10)$$
and its (quadratic) penalisation formulation:
$$\inf_{u \in \mathcal{G},\, v,\, w} \int_0^1 \Big(\|v\|_{\chi}\,\|w\|_{\chi} - \langle v, w\rangle_{\chi}\Big)\, d\tau + \gamma_v \int_0^1 \|\chi v - \dot{u}\|^2\, d\tau + \gamma_w \int_0^1 \|\chi w - F(u)\|^2\, d\tau. \qquad (11)$$
The penalisation parameters $\gamma_v, \gamma_w \gg 1$ must be large enough, but at the same time not too large,
to avoid dealing with an ill-conditioned functional. The norm involved in the geometrical action is
the $\chi$-norm, by contrast with the two constraints on $(v,w)$, which involve some user-defined one, e.g. a
weighted Euclidean norm. The treatment of (10) using (11) is not the only possibility; other strategies
are possible, e.g. augmented Lagrangian methods (see [35] in the machine-learning context applied
to elliptic and eigenvalue problems, and [59] using a classical approach). One can also consider
the original minimax problem (6) using adversarial networks. In the machine-learning context
however, (11) appears to be the simplest strategy giving very good results. This is explained in
detail in Section 3.
3. Deep gMAM method
We describe here in more detail how to use Neural Networks (NNs) for solving (11). The general
idea is to parametrize the argmin solution of (11) by some NNs. In doing so, we already make the
choice of working in the physical space-time domain $(t,x) \in (0,1) \times D$. Due to that, convolutional
neural networks are not appropriate in this context. It amounts to the fact that one has access
to a macroscopic description of the action and takes full advantage of it. The broad philosophy of
the proposed approach is therefore the same as for the so-called physics-informed neural networks
(PINNs) [52, 64, 72], except that the system to solve is more involved, with possibly much more
complicated constraints. We discuss the technical aspects below in subsections 3.1–3.6 and
summarize what the approach brings in the last subsection 3.7.
3.1. Neural Network parametrization and cost functional
We consider here one of the simplest neural architectures, the so-called fully-connected feedforward
NN. Let $\mathcal{N} : \mathbb{R}^{d+1} \times \mathbb{R}^{n_\theta} \to \mathbb{R}$, $(t, x, \theta) \mapsto \mathcal{N}(t,x;\theta)$, where $\theta$ is the vector of $n_\theta$ parameters. The
function $\mathcal{N}$ is defined as
$$\mathcal{N}(t,x;\theta) = \mathcal{N}_{\rm out} \circ \mathcal{N}_L \circ \cdots \circ \mathcal{N}_1 \circ \mathcal{N}_{\rm in}(t,x),$$
where $\circ$ is the composition and $\mathcal{N}_k(y) = \sigma_k(W_k y + b_k)$. The parameters $b_k \in \mathbb{R}^{l_k}$ are called NN
biases and the $(l_k \times c_k)$ matrices $W_k$ are called NN weights. The activation functions $\sigma_k$ must
be nonlinear. In the examples discussed later, the dimensions are chosen independent of $k$, with
$l_k = c_k = c$. The number of hidden layers $L$ is the depth, whereas $c$ is the network capacity. In the
following, we consider only swish activation functions [53]. They correspond to a non-monotonic
version of the classical ReLU function $\max(0,x)$, with $\sigma(x) = x/(1 + e^{-x})$. Other choices are of
course possible, such as ReLU or tanh, but in practice the swish nonlinearity gives very good results
and, at the same time, smoother representations. The last layer $\mathcal{N}_{\rm out}$ is in general linear. The depth
$L$ and the capacity $c$ are important parameters. The expressivity of $\mathcal{N}$ improves as $L$ and $c$ increase but,
at the same time, the network becomes harder to train, i.e. finding ad-hoc optimal values for $\theta$ becomes more difficult.
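As an illustration, such a network can be set up in a few lines of Julia (the language of the snippet in Appendix 8.2); the use of the Flux.jl library and the particular values of $L$ and $c$ below are illustrative choices, not prescriptions of the method.

```julia
using Flux   # the swish activation is re-exported from NNlib

# A fully-connected feedforward network N(t, x; θ): R^(d+1) -> R,
# with L hidden layers of capacity c, swish activations and a linear output layer.
make_network(d; L = 3, c = 40) =
    Chain(Dense(d + 1 => c, swish),                 # N_in
          [Dense(c => c, swish) for _ in 1:L]...,   # N_1, ..., N_L
          Dense(c => 1))                            # N_out (linear)

Nu    = make_network(1)               # d = 1: the input is the pair (t, x)
value = Nu(Float32[0.5, 0.1])[1]      # evaluate the scalar field at (t, x) = (0.5, 0.1)
```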
As stated above, we now replace the main and auxiliary fields $(u, v, w)$ in (11) by their NN parametriza-
tions $u \to \mathcal{N}_u$, $v \to \mathcal{N}_v$ and $w \to \mathcal{N}_w$ with parameters $\theta_u, \theta_v, \theta_w$. The cost functional is then
minimized with respect to these parameters: we have thus performed a nonlinear projection onto
the neural-network space. This is indeed a very general methodology in machine learning. The cost
functional is written as
$$C[\theta_u,\theta_v,\theta_w] = \gamma_g C_g[\theta_v,\theta_w] + \gamma_{\rm arc} C_{\rm arc}[\theta_u] + C_{\rm constraints} + (\gamma_{\rm bcs} C_{\rm BCs}). \qquad (12)$$
The first term corresponds to the geometrical action in its continuous form:
$$C_g[\theta_v,\theta_w] = \int_0^1 \Big(\|\mathcal{N}_v\|_{\chi}\,\|\mathcal{N}_w\|_{\chi} - \langle \mathcal{N}_v, \mathcal{N}_w\rangle_{\chi}\Big)\, d\tau, \qquad \|\mathcal{N}\|_{\chi} \equiv \left(\int_D \mathcal{N}(t,x)\,(\chi \mathcal{N})(t,x)\, dx\right)^{1/2}. \qquad (13)$$
An important issue is to ensure that when $\|\cdot\|_{\chi}$ is discretized, one still has the properties of a norm,
e.g. positive definiteness and the Cauchy-Schwarz inequality. It is in general trivial (e.g. $L^2 \to l^2$) but it can
be more tricky if $\chi$ is complicated: a nontrivial case is discussed in subsection 3.6. We now describe
the other functionals in the next subsections.
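To illustrate how the $\chi$-norm in (13) can be discretized while keeping these properties, here is a small Julia sketch; representing $\chi$ by a symmetric positive semi-definite matrix acting on grid values of a uniform grid of $D$ with spacing $\Delta x$ is an illustrative assumption.

```julia
using LinearAlgebra

# Discrete χ-inner product and χ-norm of (13) on a uniform spatial grid:
#   ⟨f, g⟩_χ ≈ Σ_j f_j (Chi g)_j Δx,   ‖f‖_χ = ⟨f, f⟩_χ^(1/2),
# where Chi is a symmetric positive semi-definite matrix representing the operator χ.
chi_dot(f, g, Chi, Δx) = dot(f, Chi * g) * Δx
chi_norm(f, Chi, Δx)   = sqrt(chi_dot(f, f, Chi, Δx))

# Integrand of C_g at one collocation time τ_i, given grid values of N_v and N_w:
cg_integrand(Nv, Nw, Chi, Δx) =
    chi_norm(Nv, Chi, Δx) * chi_norm(Nw, Chi, Δx) - chi_dot(Nv, Nw, Chi, Δx)
```

Since `Chi` is symmetric positive semi-definite, `chi_dot` defines a (semi-)inner product; the discrete Cauchy-Schwarz inequality then guarantees that `cg_integrand` is non-negative, which is the discrete analogue of the requirement stated above. Summing it over the $\tau$-grid with a quadrature rule yields $C_g$.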
3.2. Boundary value ansatz
The cost functional (12) must take into account the boundary value (BV) problem, namely that
$\mathcal{N}_u(0,x) = a(x)$ and $\mathcal{N}_u(1,x) = b(x)$ where $a, b$ are given. There are two strategies: either imposing
these constraints explicitly by penalisation, or considering some ansatz for $\mathcal{N}_u$ which automatically
includes them. The second choice means that one uses the general ansatz
$$\mathcal{U}(t,x;\theta_u) = \Lambda_a(t)\,a(x) + \Lambda_b(t)\,b(x) + \Lambda_u(t)\,\mathcal{N}_u(t,x;\theta_u), \qquad (14)$$
where $\Lambda_a(0) = 1$, $\Lambda_a(1) = 0$, $\Lambda_b(0) = 0$, $\Lambda_b(1) = 1$, $\Lambda_u(0) = \Lambda_u(1) = 0$. In addition, the zeros of the
functions $\Lambda$ must be only those required at the boundaries $\tau = 0$, $\tau = 1$.
We then replace $\mathcal{N}_u$ in (12) by $\mathcal{U}$ whenever it is explicitly needed. In practice, we use
$\Lambda_a(t) = 1 - t$, $\Lambda_b(t) = t$ and $\Lambda_u(t) = t(1-t)$, mimicking a double Taylor expansion. We do not claim
optimality here, as many other choices would work. This one gives very good results in
all the situations we have met. It is preferred over the first approach (BV penalisation) when it is
difficult for the system to relax onto $a$ and $b$.
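In Julia, the ansatz (14) with these choices is essentially one line; here `Nu` denotes the network of subsection 3.1 and, for brevity, the SDE case without spatial variable is shown.

```julia
# Boundary-value ansatz (14) with Λ_a = 1 - t, Λ_b = t, Λ_u = t(1 - t):
# U(0) = a and U(1) = b hold exactly, whatever the parameters of Nu.
U(t, a, b, Nu) = (1 - t) .* a .+ t .* b .+ t * (1 - t) .* Nu([t])
```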
3.3. Arclength condition
Although the geometrical action does not depend on the time parametrization chosen, due to its
homogeneity, it is important in practice to restrict the problem. As a matter of fact, in the cost-
functional landscape there is an infinity of possible solutions, each having its own parametrization.
Fixing an arclength condition, say $\|\dot{u}\| = c$, $s \in [0,1]$, is just a matter of convenience. But more
importantly, it stabilizes the gradient search, preventing the NNs from drifting towards ill-conditioned
parametrizations, especially in the context of PDEs. The use of a small penalisation parameter
$\gamma_{\rm arc} \ll 1$ is enough to prevent the NNs from exploring extreme landscape regions. The penalty constraint,
after straightforward algebra, is
$$C_{\rm arc}[\theta_u] = \int_0^1 \|\dot{\mathcal{U}}\|^2\, ds - \left(\int_0^1 \|\dot{\mathcal{U}}\|\, ds\right)^2 \ \ge\ 0, \qquad (15)$$
where $\mathcal{U}$ takes the form (14). Note that the norm used here is user-defined rather than the actual $\chi$ or
$\chi^{-1}$ norm.
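Putting the pieces together in the simplest SDE setting ($\chi = \mathrm{Id}$, no spatial variable, so that no auxiliary fields $v, w$ are needed), a minimal deep gMAM training sketch could look as follows. The drift, end states, grid size, finite-difference quadrature and optimiser settings are illustrative assumptions; the snippet actually used by the method is the one given in Appendix 8.2.

```julia
using Flux, LinearAlgebra, Statistics

# Deep gMAM sketch for an SDE with identity noise (χ = Id):
# geometric action (9) + arclength penalty (15), boundary conditions via the ansatz (14).
F(u) = u .- u .^ 3                      # double-well drift, Φ(u) = u^4/4 - u^2/2 (illustrative)
a, b = [-1.0f0], [1.0f0]                # the two attractors used as end states
M    = 200                              # number of collocation points in τ
τs   = range(0f0, 1f0; length = M)
γarc = 1f-2                             # small arclength penalty weight

Nu = Chain(Dense(1 => 40, swish), Dense(40 => 40, swish), Dense(40 => 1))

# Ansatz (14): U(τ) = (1-τ)a + τb + τ(1-τ) N_u(τ; θ_u), so U(0) = a, U(1) = b exactly.
ansatz(m, t) = (1 - t) .* a .+ t .* b .+ t * (1 - t) .* m([t])

function cost(m)
    Us   = [ansatz(m, t) for t in τs]
    dUs  = [(Us[i+1] .- Us[i]) .* (M - 1) for i in 1:M-1]    # finite-difference speed dU/dτ
    Umid = [(Us[i+1] .+ Us[i]) ./ 2       for i in 1:M-1]
    # geometric action (9) with χ = Id, midpoint quadrature in τ
    Sg = sum([norm(dUs[i]) * norm(F(Umid[i])) - dot(dUs[i], F(Umid[i]))
              for i in 1:M-1]) / (M - 1)
    # arclength penalty (15): the variance of the speed, zero iff the speed is constant
    speed = [norm(d) for d in dUs]
    Carc  = mean(speed .^ 2) - mean(speed)^2
    return Sg + γarc * Carc
end

opt = Flux.setup(Adam(1e-3), Nu)
for step in 1:2000
    gs = Flux.gradient(cost, Nu)        # gradient w.r.t. the network parameters θ_u
    Flux.update!(opt, Nu, gs[1])
end
```

At convergence, evaluating the ansatz on the $\tau$-grid gives the approximate instanton path, and the first term of the cost approaches the quasi-potential value, here $1/2$ for this double well since the barrier height is $\Delta\Phi = 1/4$.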