Neural networks for first order HJB equations and application to
front propagation with obstacle terms
Olivier Bokanowski†‡, Averil Prost§, Xavier Warin¶
April 7, 2025
Abstract
We consider a deterministic optimal control problem with a maximum running cost functional, in a
finite horizon context, and propose deep neural network approximations for Bellman’s dynamic program-
ming principle, corresponding also to some first order Hamilton-Jacobi-Bellman equation. This work
follows the lines of Huré et al. (SIAM J. Numer. Anal., vol. 59 (1), 2021, pp. 525-557), where algorithms
are proposed in a stochastic context. However, we need to develop a completely new approach in order
to deal with the propagation of errors in the deterministic setting, where no diffusion is present in the
dynamics. Our analysis gives precise error estimates in an average norm. The study is then illustrated
on several academic numerical examples related to front propagation models in the presence of obstacle
constraints, showing the relevance of the approach for average dimensions (e.g. from 2 to 8), even for
non-smooth value functions.
Keywords: neural networks, deterministic optimal control, dynamic programming principle, first order Hamilton-Jacobi-Bellman equation, state constraints
1 Introduction
In this work we are interested in the approximation of a deterministic optimal control problem with finite
horizon involving a maximum running cost, defined as
\[
v(t,x) = \inf_{a(\cdot)\in \mathcal{A}_{[t,T]}} \max\Big( \max_{\theta\in[t,T]} g\big(y^a_x(\theta)\big),\ \varphi\big(y^a_x(T)\big) \Big), \tag{1}
\]
where the state $x$ belongs to $\mathbb{R}^d$ and $t\in[0,T]$ for some $T\ge 0$. Here the trajectory $y(s)=y^a_x(s)$ obeys the following dynamics
\[
\dot y(s) = f(y(s), a(s)), \quad \text{a.e. } s\in[t,T], \tag{2}
\]
with initial condition $y(t)=x$, and control $a\in\mathcal{A}_{[t,T]} := L^\infty([t,T], A)$. It is assumed that $A$ is a non-empty
compact subset of $\mathbb{R}^\kappa$ ($\kappa\ge 1$) and that $(f,\varphi,g)$ are Lipschitz continuous. The value $v$ is the solution, in the viscosity sense (see for instance [14]), of the following Hamilton-Jacobi-Bellman (HJB) partial differential equation
\[
\min\Big( -v_t + \max_{a\in A}\big(-f(x,a)\cdot\nabla_x v\big),\ v - g(x) \Big) = 0, \quad t\in[0,T], \tag{3a}
\]
\[
v(T,x) = \max(\varphi(x), g(x)). \tag{3b}
\]
*This research benefited from the support of the FMJH Program PGMO and from the support to this program from EDF.
†Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), F-75013 Paris, France.
‡Sorbonne Université, CNRS, LJLL, F-75005 Paris, France. olivier.bokanowski@u-paris.fr
§INSA Rouen, LMI (EA 3226 - FR CNRS 3335), 76801 St Etienne du Rouvray, France. averil.prost@insa-rouen.fr
¶EDF R&D & FiME, 91120 Palaiseau, France. xavier.warin@edf.fr
arXiv:2210.04300v1 [math.OC] 9 Oct 2022
Tremendous numerical efforts have been made in order to propose efficient algorithms for solving problems
related to (1), or the corresponding HJB equation (3). Precise numerical methods have been developed, using
approximations on grids, such as Markov chain approximations [33], finite difference schemes (monotone
schemes [17], semi-Lagrangian schemes (see e.g. [18, 20]), ENO or WENO higher-order schemes [37, 38],
finite element methods (see [31]), discontinuous Galerkin methods [28, 35], and in particular [12, 13] for
(3), see also [40], or max-plus approaches [2]). However, grid-based methods are limited to low dimensions
because of the well-known curse of dimensionality. In order to tackle this difficulty, various approaches have been
studied, such as sparse grid methods [16, 21], tree structure approximation algorithms (see e.g. [3]), tensor
decomposition methods [19], and max-plus approaches [36].
In the deterministic context, problem (1) is motivated by deterministic optimal control with state constraints (see e.g. [14] and [4]). In [41], the HJB equation (3) is approximated by deep neural networks
(DNN) for solving state-constrained reachability control problems of dimension up to $d = 10$. In [15] or
in [6], formulation (1) is used to solve an abort landing problem (using different numerical approaches); in
[11], equations such as (3) are used to solve an aircraft payload optimization problem; a multi-vehicle safe
trajectory planning problem is considered in [8].
On the other hand, for stochastic control, DNN approximations were already used for gas storage optimization in [9], where the control approximated by a neural network was the amount of gas injected into or
withdrawn from the storage. This approach has been adapted and popularized recently for the resolution of
BSDEs in [26] (deep BSDE algorithm). For a convergence study of such algorithms in a more general context,
see [27].
In this work, we study some neural network approximations for (1). We are particularly interested
in obtaining a rigorous error analysis of such approximations. We follow the approach of [29] (and
its companion paper [7]), combining neural network approximations and Bellman's dynamic programming
principle. We obtain precise error estimates in an average norm.
Note that the work of [29] is developed in the stochastic context, where an error analysis is given. However,
this error analysis relies strongly on a diffusion assumption on the model (transition probabilities
with densities are assumed to exist). In our case, we would need to assume that the deterministic process
admits a density, which is not the case (see Remark 6.5). Therefore the proof of [29] does not apply to the
deterministic context. Here we propose a new approach for the convergence analysis, leading to new error
estimates. We chose to present the algorithm on a running cost optimal control problem, but the approach
can be generalized to Bolza or Mayer problems (see e.g. [4, 6]).
For the sake of completeness, let us note that the ideas of [29] are related to methods already proposed
in [23] and [10] for the resolution of backward stochastic differential equations (BSDEs), where the control
function is calculated by regression on a space of basis functions (the Hybrid-Now algorithm is related
to [23], and the performance iteration algorithm is related to an improved algorithm in [10]). For recent
developments, see [24] using classical linear regressions, and [30] and [22] for BSDE approximations using
neural networks.
From the numerical point of view, we illustrate our algorithms on some academic front propagation
problems with or without obstacles. We focus on a "Lagrangian scheme" (a deterministic equivalent of
the performance iteration scheme of [29]), and also compare it with other algorithms: a "semi-Lagrangian
scheme" (similar to the Hybrid-Now algorithm of [29]) and a hybrid algorithm combining the two previous ones,
involving successive projections of the value function on neural network spaces.
The plan of the paper is the following. In Section 2 we define a semi-discrete value approximation for (1),
with controlled error with respect to the continuous value, using piecewise constant controls. In Section 3,
equivalent reformulations of the problem are given using feedback controls and dynamic programming. In
Section 4, an approximation result for the discrete value function using Lipschitz continuous feedback
controls is given. In Section 5 we present three numerical schemes using neural network approximations
(for the approximation of the feedback controls and/or of the value), using general Runge-Kutta schemes for
the approximation of the controlled dynamics. Section 6 contains our main convergence result for one of the
proposed schemes (the Lagrangian scheme), which involves only approximations of the feedback controls, and
Section 7 focuses on the proof of our main result. Section 8 is devoted to some numerical academic examples
of front propagation problems with or without an obstacle term (state constraints), in average dimensions,
showing the potential of the proposed algorithms in this context, and also giving comparisons between the
different algorithms introduced. An appendix contains some details on computing reference solutions for
some of the considered examples.
Notations. Unless otherwise specified, the norm $|\cdot|$ on $\mathbb{R}^q$ ($q\ge 1$) is the max norm $|x| = \|x\|_\infty = \max_{1\le i\le q}|x_i|$. The notation $[\![p,q]\!] = \{p, p+1, \ldots, q\}$ is used for any integers $p\le q$. For any function
$\alpha:\mathbb{R}^p\to\mathbb{R}^q$ (for some $p,q\ge 1$), $[\alpha] := \sup_{y\ne x}\frac{|\alpha(y)-\alpha(x)|}{|y-x|}$ denotes the corresponding Lipschitz constant. We
also denote $a\vee b := \max(a,b)$ for any $a,b\in\mathbb{R}$. The set of "feedback" controls is defined as $\mathcal{A} := \{a:\mathbb{R}^d\to A,\ a(\cdot)\ \text{measurable}\}$.
2 Semi-discrete approximation with piecewise constant controls
In this section, we first aim to define a semi-discrete approximation of (1) in time.
Let the following assumptions hold on the set $A$ and the functions $f, g, \varphi$.
(H0) $A$ is a non-empty compact subset of $\mathbb{R}^\kappa$ ($\kappa\ge 1$), and is a convex set.
(H1) $f:\mathbb{R}^d\times A\to\mathbb{R}^d$ is Lipschitz continuous, and we denote by $[f]_1, [f]_2 \ge 0$ constants such that
\[
|f(x,a) - f(x',a')| \le [f]_1 |x-x'| + [f]_2 |a-a'|, \quad \forall (x,x')\in(\mathbb{R}^d)^2,\ \forall (a,a')\in A^2.
\]
(H2) $g:\mathbb{R}^d\to\mathbb{R}$ is Lipschitz continuous.
(H3) $\varphi:\mathbb{R}^d\to\mathbb{R}$ is Lipschitz continuous.
Let $T>0$ be the horizon, let $N\in\mathbb{N}^*$ be a number of iterations, and let $(t_k)_{k\in[\![0,N]\!]}\subset[0,T]$ be a time mesh
with $t_0=0$ and $t_N=T$. To simplify the presentation, we restrict ourselves to the uniform mesh $t_k=k\Delta t$
with $\Delta t=\frac{T}{N}$, but the arguments would carry over unchanged with a non-uniform time mesh.
Let us consider $F_h:\mathbb{R}^d\times A\to\mathbb{R}^d$ (for a given $h>0$), corresponding to some one-time-step approximation
of $y^a_x(h)$ (starting from $y^a_x(0)=x$). For instance, we may consider the Euler scheme $F_h(x,a) = x + h f(x,a)$,
or the Heun scheme $F_h(x,a) = x + \frac{h}{2}\big(f(x,a) + f(x + h f(x,a), a)\big)$, and so on. General Runge-Kutta schemes
are considered later on in Section 5.2. Assumptions on $F_h$ will be made precise when needed.
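As an illustration, the Euler and Heun one-step maps above can be sketched as follows (a minimal numpy sketch; the function names and the example dynamics are ours, not from the paper):

```python
import numpy as np

def euler_step(f, x, a, h):
    """One Euler step: F_h(x, a) = x + h f(x, a)."""
    return x + h * f(x, a)

def heun_step(f, x, a, h):
    """One Heun step: F_h(x, a) = x + (h/2) (f(x, a) + f(x + h f(x, a), a))."""
    k1 = f(x, a)
    k2 = f(x + h * k1, a)
    return x + 0.5 * h * (k1 + k2)

# Example: controlled rotation f(y, a) = a (-y_2, y_1) with a scalar control a.
f = lambda y, a: a * np.array([-y[1], y[0]])
x0 = np.array([1.0, 0.0])
# One Euler step of size h = 0.1 gives x0 + h f(x0, 1) = (1.0, 0.1).
```

Both maps satisfy the Lipschitz bound $\sup_{a\in A}[F_h(\cdot,a)] \le 1+ch$ used below, with $c$ depending on $[f]$.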
For a given sequence $a := (a_n, a_{n+1}, \ldots, a_{N-1}) \in A^{N-n}$ (with $n\in[\![0,N-1]\!]$), which corresponds to a
piecewise constant control approximation, and a given integer $p\ge 1$, we define two levels of approximations
for the trajectories.
The fine approximation involves the time step $h=\frac{\Delta t}{p}$, is denoted $Y^{a_k}_{j,x}$ (for a fixed control $a_k$, $0\le j\le p$),
and is defined recursively by
\[
Y^{a_k}_{0,x} = x, \tag{4a}
\]
\[
Y^{a_k}_{j+1,x} = F_h(Y^{a_k}_{j,x}, a_k), \quad j = 0,\ldots,p-1, \tag{4b}
\]
which also corresponds to $j$ iterates of $y\mapsto F_h(y, a_k)$, starting from $y=x$, with the same control $a_k$. This
fine level will be used to obtain approximations of the trajectory at the intermediate time steps $t_k + jh$, which lie
in $[t_k, t_{k+1}]$.
The coarse approximation, with time step $\Delta t$, is denoted $(X^a_{k,x})_{n\le k\le N}$ and is defined recursively by
\[
X^a_{n,x} := x, \tag{5a}
\]
\[
X^a_{k+1,x} = Y^{a_k}_{p,\, X^a_{k,x}}, \quad k = n,\ldots,N-1. \tag{5b}
\]
Notations. We will often use the notation, for a given $a\in A$, $F(\cdot,a) \equiv F^a(\cdot) := Y^a_{p,\cdot}$, and the fact that
(5b) can also be written
\[
X^a_{k+1,x} = F(X^a_{k,x}, a_k), \quad k = n,\ldots,N-1.
\]
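The two-level trajectory discretization (4)-(5) can be sketched as follows (an illustrative sketch with our own naming; `step` stands for any one-step map $F_h$):

```python
def fine_trajectory(step, x, a_k, p, h):
    """Iterates (4): Y_0 = x, Y_{j+1} = F_h(Y_j, a_k), j = 0, ..., p-1."""
    ys = [x]
    for _ in range(p):
        ys.append(step(ys[-1], a_k, h))
    return ys  # [Y_0, ..., Y_p], intermediate points within one coarse step

def coarse_trajectory(step, x, controls, p, dt):
    """Iterates (5): X_{k+1} = Y^{a_k}_{p, X_k}, i.e. p fine steps per coarse step."""
    h = dt / p
    xs = [x]
    for a_k in controls:
        xs.append(fine_trajectory(step, xs[-1], a_k, p, h)[-1])
    return xs  # [X_n, ..., X_N]

# Example: scalar dynamics f(y, a) = a with the Euler map (exact here).
euler = lambda y, a, h: y + h * a
X = coarse_trajectory(euler, 0.0, [1.0, 1.0], p=5, dt=0.5)
# X ends at 0 + 2 * 0.5 * 1.0 = 1.0
```

Only the endpoints $X^a_{k,x}$ are propagated from one coarse step to the next; the fine points are used to evaluate the running cost $g$ below.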
We can now define the following cost functional, for $x\in\mathbb{R}^d$, $a\in A^{N-n}$ and $n\in[\![0,N]\!]$:
\[
J_n(x,a) := \max_{n\le k<N}\ \max_{0\le j<p}\ g\big(Y^{a_k}_{j,\,X^a_{k,x}}\big) \ \vee\ (g\vee\varphi)(X^a_{N,x}), \quad x\in\mathbb{R}^d,\ a\in A^{N-n}, \tag{6}
\]
and the following semi-discrete version of (1), for $x\in\mathbb{R}^d$ and $n\in[\![0,N]\!]$:
\[
V_n(x) := \min_{a\in A^{N-n}} J_n(x,a). \tag{7}
\]
(For $n=N$, we have $V_N(x) = J_N(x) = g(x)\vee\varphi(x)$.) It will also be useful to introduce the following notation,
for $a\in A$ and $x\in\mathbb{R}^d$:
\[
G^a(x) := \max_{0\le j<p} g\big(Y^a_{j,x}\big). \tag{8}
\]
The values $(V_n)_{0\le n\le N}$ satisfy $V_N(x) = g(x)\vee\varphi(x)$ and the following dynamic programming principle
(DPP) for $n = 0,\ldots,N-1$:
\[
V_n(x) = \inf_{a\in A}\ G^a(x) \vee V_{n+1}\big(F(x,a)\big), \quad x\in\mathbb{R}^d. \tag{9}
\]
Let us notice that the case $p=1$ leads to the following simplifications: $h=\Delta t$, $F(x,a) = F_{\Delta t}(x,a)$,
$G^a(x) = g(x)$, $J_n(x,a) = \max_{n\le k\le N} g(X^a_{k,x}) \vee \varphi(X^a_{N,x})$, as well as $V_N(x) = g(x)\vee\varphi(x)$ and the DPP
\[
V_n(x) = \inf_{a\in A}\ g(x) \vee V_{n+1}\big(F_{\Delta t}(x,a)\big), \quad 0\le n\le N-1.
\]
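Before turning to neural networks, the backward DPP (9) can be brute-forced on a grid in low dimension. The following sketch is our own illustration (not one of the paper's schemes): it takes $p=1$, an Euler map, a discretized control set, and a 1-D state grid with linear interpolation; all names are assumptions.

```python
import numpy as np

def dpp_values(xs, controls, f, g, phi, dt, N):
    """Backward DPP (9) with p = 1: V_N = g v phi, then
    V_n(x) = min_a max(g(x), V_{n+1}(x + dt f(x, a))) on the 1-D grid xs."""
    V = np.maximum(g(xs), phi(xs))
    values = [V]
    for _ in range(N):
        best = np.full_like(xs, np.inf)
        for a in controls:
            x_next = xs + dt * f(xs, a)
            V_next = np.interp(x_next, xs, V)   # linear interpolation on the grid
            best = np.minimum(best, np.maximum(g(xs), V_next))
        V = best
        values.append(V)
    return values[::-1]                         # [V_0, ..., V_N]

# Example: f(x, a) = a with A = {-1, 0, 1}, inactive obstacle (g = -10),
# phi(x) = |x|, horizon T = N dt = 1. The steps land exactly on grid nodes,
# so e.g. V_0(0) = 0 and V_0(1.5) = 0.5 here.
xs = np.linspace(-2.0, 2.0, 401)
vals = dpp_values(xs, [-1.0, 0.0, 1.0],
                  lambda x, a: a + 0.0 * x,
                  lambda x: np.full_like(x, -10.0),
                  np.abs, dt=0.2, N=5)
```

This grid-based recursion is exactly what becomes infeasible in higher dimension, motivating the neural network approximations of Section 5.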
The motivation behind the introduction of the finer level of approximation $(Y^{a_k}_{j,x})_{0\le j\le p}$ is first numerical:
it enables a better evaluation of the running cost term $g(\cdot)$ along the trajectory, without the computational
cost of more intermediate controls. The numerical improvement is illustrated in the examples of Section 8.1.
Furthermore, from the theoretical point of view, the convergence analysis in our main result will rely strongly
on the fact that $x\mapsto F_h(x,a)$ is a change of variables for $h$ sufficiently small (i.e., $p$ sufficiently large).
We start by showing some uniform Lipschitz bounds.
Lemma 2.1. Assume (H0)-(H3), and the Lipschitz bound $\sup_{a\in A}[F_h(\cdot,a)] \le 1+ch$ for some constant $c\ge 0$.
(i) The function $J_n(\cdot,a)$ is Lipschitz for all $a\in A^{N-n}$, with the uniform bound $[J_n(\cdot,a)] \le [g]\vee[\varphi]\, e^{cT}$.
(ii) In particular, the uniform bound $\max_{0\le n\le N}[V_n] \le [g]\vee[\varphi]\, e^{cT}$ holds.
Proof. (i) Notice that $1+ch \le e^{ch}$. Then for $a\in A$ and for the $j$-th iterate $F_h^{(j)}(\cdot,a)$ of $F_h$, we obtain
$|Y^a_{j,x} - Y^a_{j,y}| = |F_h^{(j)}(x,a) - F_h^{(j)}(y,a)| \le e^{jch}|x-y| \le e^{c\Delta t}|x-y|$ for any $0\le j\le p$ (by recursion). Hence
also $\sup_{a\in A}[F(\cdot,a)] \le e^{c\Delta t}$, from which we deduce, for any $a = (a_n,\ldots,a_{N-1})\in A^{N-n}$ and $n\le k\le N$,
$|X^a_{k,x} - X^a_{k,y}| \le e^{c(k-n)\Delta t}|x-y| \le e^{cT}|x-y|$. The desired result follows from the definition of $J_n$ and
repeated use of $\max(a,b) - \max(c,d) \le \max(a-c,\, b-d)$.
(ii) As a direct consequence of (i) and the definition of $V_n$.
The following result shows that Vn(x) is a first order approximation of v(tn, x) in time.
Theorem 2.2. Assume (H0)-(H3), and that there exists $h_0>0$ such that:
- $F_h$ is consistent with the dynamics $f$ in the following sense:
\[
\exists C\ge 0,\ \forall (x,a,h)\in\mathbb{R}^d\times A\times(0,h_0),\quad |F_h(x,a) - (x + h f(x,a))| \le C h^2 (1+|x|), \tag{10}
\]
- for all $h\in(0,h_0]$, $\sup_{a\in A}[F_h(\cdot,a)] \le 1+ch$ for some constant $c\ge 0$ ($c$ may depend on $[f]$),
- $f(x,A)$ is convex for all $x\in\mathbb{R}^d$.
Let $h=\frac{\Delta t}{p} \le h_0$ (with $p\ge 1$). Then
\[
\max_{0\le n\le N} |V_n(x) - v(t_n,x)| \le \widetilde{C}\,\Delta t\,(1+|x|) \tag{11}
\]
for some constant $\widetilde{C}\ge 0$ independent of $x$, $\Delta t$, $p$ (and $N$).
Proof. This follows from the arguments of Theorem B.1. in [15].
Corollary 2.3. In particular, if $F_h$ is a consistent RK scheme (see Definition 5.2) and $f(x,A)$ is convex for
all $x$, then by Lemma 5.6, (10) holds and therefore the error estimate (11) also holds.
Our aim is now to propose numerical schemes for the approximation of $V_n(\cdot)$.
3 Reformulation with feedback controls
In this section, equivalent definitions of $V_n$ are given using feedback controls in $\mathcal{A}$ (the set of measurable
functions $a:\mathbb{R}^d\to A$). These formulations will lead to the numerical schemes.
First, for a given $a_k\in\mathcal{A}$, the fine approximation $Y^{a_k}_{j,x}$ (with time step $h=\frac{\Delta t}{p}$) is defined by $Y^{a_k}_{j,x} := Y^{a_k(x)}_{j,x}$
using definition (4). This also corresponds to
\[
Y^{a_k}_{0,x} = x, \tag{12}
\]
\[
Y^{a_k}_{j+1,x} = F_h\big(Y^{a_k}_{j,x},\, a_k(x)\big), \quad j = 0,\ldots,p-1 \tag{13}
\]
(that is, $j$ iterates of $y\mapsto F_h(y, a_k(x))$, starting from $y=x$, with the fixed control $a_k(x)$). Then, for a given
sequence $a := (a_n,\ldots,a_{N-1})\in\mathcal{A}^{N-n}$, the coarse approximation is defined by
\[
X^a_{n,x} = x, \tag{14a}
\]
\[
X^a_{k+1,x} = F^{a_k}(X^a_{k,x}) \equiv F\big(X^a_{k,x},\, a_k(X^a_{k,x})\big), \quad k = n,\ldots,N-1, \tag{14b}
\]
with the notation $F^{a_k}(x) = F(x, a_k(x))$. We also extend the definition of $J_n$ to the feedback space, for $x\in\mathbb{R}^d$
and $a\in\mathcal{A}^{N-n}$, as follows:
\[
J_n(x,a) := \max_{n\le k<N}\ G^{a_k}(X^a_{k,x}) \ \vee\ (g\vee\varphi)(X^a_{N,x}), \tag{15}
\]
where now, for a given control $a\in\mathcal{A}$, we extend the definition (8) by
\[
G^a(x) := \max_{0\le j<p} g\big(Y^{a(x)}_{j,x}\big) \tag{16}
\]
(this also corresponds to defining $G^a(x)$ as $G^{a(x)}(x)$). With these definitions, we have the following results.
Proposition 3.1. (i) $V_n(x)$ is the minimum of $J_n(x,\cdot)$ over feedback controls:
\[
V_n(x) = \min_{a\in\mathcal{A}^{N-n}} J_n(x,a), \quad x\in\mathbb{R}^d.
\]
(ii) For all $0\le n\le N-1$, $V_n$ satisfies the following dynamic programming principle over feedback
controls:
\[
V_n(x) = \min_{a\in\mathcal{A}}\ G^a(x) \vee V_{n+1}\big(F^a(x)\big), \quad x\in\mathbb{R}^d,\ n = 0,\ldots,N-1, \tag{17}
\]
and in particular, the infimum is attained by some $\bar a_n\in\mathcal{A}$.
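To make (17) concrete: for sampled states one can tabulate the pointwise argmin over a discretized control set; in the schemes of Section 5 this map is instead approximated by a trained neural network. A small numpy sketch of the pointwise version (all names are our assumptions):

```python
import numpy as np

def feedback_argmin(x_samples, controls, F, G, V_next):
    """Pointwise realization of (17): for each sample x, return the control
    minimizing G^a(x) v V_{n+1}(F(x, a)), and the resulting value V_n(x)."""
    vals = np.stack([np.maximum(G(x_samples, a), V_next(F(x_samples, a)))
                     for a in controls])       # shape (n_controls, n_samples)
    idx = np.argmin(vals, axis=0)
    return np.asarray(controls)[idx], vals.min(axis=0)

# Example: 1-D states, F(x, a) = x + 0.2 a (Euler step), inactive obstacle,
# V_{n+1}(x) = |x|: the resulting feedback pushes each sample toward 0,
# giving controls (-1, 1) and values approximately (0.3, 0.1).
a_bar, v_n = feedback_argmin(
    np.array([0.5, -0.3]), [-1.0, 0.0, 1.0],
    lambda x, a: x + 0.2 * a,
    lambda x, a: np.full_like(x, -10.0),
    np.abs)
```

The argmin map sampled here plays the role of the optimal feedback $\bar a_n$ of Proposition 3.1(ii).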