
following, we consider only swish activation functions [53]. They correspond to a non-monotonic version of the classical ReLU function max(0, x), with σ(x) = x/(1 + e^{-x}). Other choices are of course possible, such as ReLU or tanh, but in practice the swish nonlinearity gives very good results and, at the same time, smoother representations. The last layer N_out is in general linear. The depth L and c are important parameters. The expressivity of N improves as L and c increase, but at the same time the network becomes harder to train, i.e. finding near-optimal values of θ becomes more difficult.
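For concreteness, here is a minimal sketch of such a feedforward parametrization with swish activations, written in plain NumPy; the function names, layer sizes, initialization, and the reading of c as the hidden-layer width are illustrative assumptions rather than the implementation used in this work.

```python
import numpy as np

def swish(x):
    # swish nonlinearity: sigma(x) = x / (1 + exp(-x)),
    # a smooth, non-monotonic variant of ReLU(x) = max(0, x)
    return x / (1.0 + np.exp(-x))

def init_mlp(sizes, rng):
    # one (weights, bias) pair per layer; sizes = [d_in, c, ..., c, d_out],
    # where the number of hidden layers plays the role of the depth L and
    # c is taken here as the hidden-layer width (an assumption on notation)
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, z):
    # hidden layers use swish; the last layer N_out is linear
    for W, b in params[:-1]:
        z = swish(z @ W + b)
    W, b = params[-1]
    return z @ W + b

rng = np.random.default_rng(0)
params = init_mlp([2, 32, 32, 1], rng)       # input (t, x), scalar output
print(mlp(params, np.array([[0.5, 0.1]])))   # evaluate N(t, x; theta)
```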
As stated above, we now replace the main and auxiliary fields (u, v, w) in (11) by their NN parametrizations u → N_u, v → N_v and w → N_w, with parameters θ_u, θ_v, θ_w. The cost functional is then minimized with respect to these parameters: we have thus performed a nonlinear projection onto the neural network space. This is indeed a very general methodology in machine learning. The cost functional is written as
\[
C[\theta_u, \theta_v, \theta_w] = \gamma_g\, C_g[\theta_v, \theta_w] + \gamma_{\mathrm{arc}}\, C_{\mathrm{arc}}[\theta_u] + C_{\mathrm{constraints}} + \big(\gamma_{\mathrm{bcs}}\, C_{\mathrm{BCs}}\big). \tag{12}
\]
The first term corresponds to the geometrical action in its continuous form:
\[
C_g[\theta_v, \theta_w] = \int_0^1 \Big( \|N_v\|_\chi\, \|N_w\|_\chi - \langle N_v, N_w \rangle_\chi \Big)\, d\tau,
\qquad
\|N\|_\chi \equiv \left( \int_D N(t, x)\, (\chi N)(t, x)\, dx \right)^{1/2}. \tag{13}
\]
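For concreteness, a possible discretization of (13) is sketched below: the τ-integral is replaced by a simple quadrature on a uniform grid and χ by a symmetric positive-definite matrix acting on grid values, so that the discrete ⟨·,·⟩_χ remains a genuine inner product. The function names, grid sizes and the choice χ = identity in the toy example are assumptions made for illustration, not the discretization used here.

```python
import numpy as np

def inner_chi(f, g, chi, dx):
    # discrete counterpart of <f, g>_chi = int_D f(x) (chi g)(x) dx;
    # chi is a symmetric positive-definite matrix on grid values
    return dx * f @ (chi @ g)

def norm_chi(f, chi, dx):
    return np.sqrt(inner_chi(f, f, chi, dx))

def geometric_action(Nv, Nw, chi, dx, dtau):
    # Nv, Nw: arrays of shape (n_tau, n_x) holding the fields on a (tau, x)
    # grid.  Discrete version of (13); each integrand term
    # ||Nv|| ||Nw|| - <Nv, Nw> is >= 0 by the Cauchy-Schwarz inequality.
    integrand = np.array([
        norm_chi(v, chi, dx) * norm_chi(w, chi, dx) - inner_chi(v, w, chi, dx)
        for v, w in zip(Nv, Nw)
    ])
    return dtau * integrand.sum()

# toy example with chi = identity, i.e. the plain L2 -> l2 case
n_tau, n_x = 16, 32
dx, dtau = 1.0 / n_x, 1.0 / n_tau
rng = np.random.default_rng(1)
Nv = rng.normal(size=(n_tau, n_x))
Nw = rng.normal(size=(n_tau, n_x))
print(geometric_action(Nv, Nw, np.eye(n_x), dx, dtau))  # always >= 0
```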
An important issue is to ensure that when ||·||_χ is discretized, one still retains the properties of a norm, e.g. positive definiteness and the Cauchy–Schwarz inequality. This is in general trivial (e.g. L² → ℓ²), but it can be more delicate if χ is complicated: a nontrivial case is discussed in subsection 3.6. We now describe the other functionals in the next subsections.
3.2. Boundary value ansatz
The cost functional (12) must take into account the boundary value (BV) problem, namely that N_u(0, x) = a(x) and N_u(1, x) = b(x), where a, b are given. There are two strategies: either imposing these constraints explicitly by penalisation, or considering some ansatz for N_u which automatically includes them. The second choice means that one uses the general ansatz
\[
U(t, x; \theta_u) = \Lambda_a(t)\, a(x) + \Lambda_b(t)\, b(x) + \Lambda_u(t)\, N_u(t, x; \theta_u), \tag{14}
\]
where Λ_a(0) = 1, Λ_a(1) = 0, Λ_b(0) = 0, Λ_b(1) = 1 and Λ_u(0) = Λ_u(1) = 0. In addition, the zeros of the functions Λ must be only those required at the boundaries τ = 0 and τ = 1.
We then replace N_u in (12) by U whenever it is explicitly needed. In practice, we use Λ_a(t) = 1 − t, Λ_b(t) = t and Λ_u(t) = t(1 − t), mimicking a double Taylor expansion. We do not claim this choice is optimal, as many others would work, but it gives very good results in all the situations we have encountered. It is preferred over the first approach (BV penalisation) when it is difficult for the system to relax onto a and b.
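As an illustration, a minimal sketch of the ansatz (14) with the choice Λ_a(t) = 1 − t, Λ_b(t) = t, Λ_u(t) = t(1 − t) is given below; the callable Nu stands for any network parametrization (such as the MLP sketched earlier), and the boundary states a, b of the toy check are arbitrary.

```python
import numpy as np

def bv_ansatz(t, x, a, b, Nu):
    # ansatz (14): U(t, x) = Lambda_a(t) a(x) + Lambda_b(t) b(x)
    #                        + Lambda_u(t) Nu(t, x),
    # with Lambda_a = 1 - t, Lambda_b = t, Lambda_u = t (1 - t),
    # so that U(0, x) = a(x) and U(1, x) = b(x) hold exactly for any Nu
    return (1.0 - t) * a(x) + t * b(x) + t * (1.0 - t) * Nu(t, x)

# toy check of the boundary values
a = lambda x: np.sin(np.pi * x)        # left boundary state a(x)
b = lambda x: np.cos(np.pi * x)        # right boundary state b(x)
Nu = lambda t, x: np.ones_like(x)      # placeholder network output
x = np.linspace(0.0, 1.0, 11)
assert np.allclose(bv_ansatz(0.0, x, a, b, Nu), a(x))
assert np.allclose(bv_ansatz(1.0, x, a, b, Nu), b(x))
```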
3.3. Arclength condition
Although the geometrical action does not depend on the chosen time parametrization, due to its homogeneity, it is important in practice to restrict the problem. As a matter of fact, in the cost-functional landscape there is an infinity of possible solutions, each having its own parametrization. Fixing an arclength condition, say ||u̇|| = c for all s ∈ [0, 1], is just a matter of convenience. More importantly, it stabilizes the gradient search by preventing the NNs from drifting towards ill-conditioned parametrizations, especially in the context of PDEs. The use of a small penalisation parameter γ_arc ≪ 1 is enough to prevent the NNs from exploring extreme regions of the landscape. The penalty constraint, after straightforward algebra, is
\[
C_{\mathrm{arc}}[\theta_u] = \int_0^1 \|\dot U\|^2\, ds - \left( \int_0^1 \|\dot U\|\, ds \right)^2 \;\geq\; 0, \tag{15}
\]
where U takes the form (14); the nonnegativity in (15) follows from the Cauchy–Schwarz inequality applied to ||U̇|| and the constant function 1 on [0, 1], with equality exactly when ||U̇|| is constant. Note that the norm used here is user-defined rather than the actual χ or χ^{-1} norm.
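A possible discrete version of (15) is sketched below, with the user-defined norm taken to be a plain discrete l2 norm on the spatial grid and the s-derivative approximated by finite differences; names and grid choices are illustrative assumptions.

```python
import numpy as np

def trapz(y, dh):
    # trapezoidal rule on a uniform grid with spacing dh
    return dh * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

def arclength_penalty(U, ds, dx):
    # U: array of shape (n_s, n_x) holding the ansatz (14) on an (s, x) grid.
    # Discrete version of (15): int ||dU/ds||^2 ds - (int ||dU/ds|| ds)^2,
    # which is >= 0 by the Cauchy-Schwarz inequality and vanishes exactly
    # when the "speed" ||dU/ds|| is constant along the path.
    dUds = np.gradient(U, ds, axis=0)              # finite differences in s
    speed = np.sqrt(dx * np.sum(dUds**2, axis=1))  # ||dU/ds|| at each s
    return trapz(speed**2, ds) - trapz(speed, ds)**2

# toy check: ~0 for a constant-speed path, positive otherwise
n_s, n_x = 65, 33
s = np.linspace(0.0, 1.0, n_s)[:, None]
x = np.linspace(0.0, 1.0, n_x)[None, :]
ds, dx = 1.0 / (n_s - 1), 1.0 / (n_x - 1)
print(arclength_penalty(s * np.sin(np.pi * x), ds, dx))     # ~ 0
print(arclength_penalty(s**2 * np.sin(np.pi * x), ds, dx))  # > 0
```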