Neural networks for first order HJB equations and application to
front propagation with obstacle terms
Olivier Bokanowski†‡, Averil Prost§, Xavier Warin¶
April 7, 2025
Abstract
We consider a deterministic optimal control problem with a maximum running cost functional, in a
finite horizon context, and propose deep neural network approximations for Bellman’s dynamic program-
ming principle, corresponding also to some first order Hamilton-Jacobi-Bellman equation. This work
follows the lines of Huré et al. (SIAM J. Numer. Anal., vol. 59 (1), 2021, pp. 525-557), where algorithms
are proposed in a stochastic context. However, we need to develop a completely new approach in order
to deal with the propagation of errors in the deterministic setting, where no diffusion is present in the
dynamics. Our analysis gives precise error estimates in an average norm. The study is then illustrated
on several academic numerical examples related to front propagation models in the presence of obstacle
constraints, showing the relevance of the approach for average dimensions (e.g. from 2 to 8), even for
non-smooth value functions.
Keywords: neural networks, deterministic optimal control, dynamic programming principle, first order Hamilton-Jacobi-Bellman equation, state constraints
1 Introduction
In this work we are interested in the approximation of a deterministic optimal control problem with finite
horizon involving a maximum running cost, defined as
\[
v(t,x) = \inf_{a(\cdot)\in \mathcal{A}_{[t,T]}} \max\Big( \max_{\theta\in[t,T]} g\big(y^a_x(\theta)\big),\ \varphi\big(y^a_x(T)\big) \Big), \tag{1}
\]
where the state $x$ belongs to $\mathbb{R}^d$ and $t\in[0,T]$ for some $T\ge 0$. Here the trajectory $y(s)=y^a_x(s)$ obeys the following dynamics
\[
\dot y(s) = f(y(s), a(s)), \quad \text{a.e. } s\in[t,T], \tag{2}
\]
with initial condition $y(t)=x$, and control $a\in\mathcal{A}_{[t,T]} := L^\infty([t,T], A)$. It is assumed that $A$ is a non-empty
compact subset of $\mathbb{R}^\kappa$ ($\kappa\ge 1$) and that $(f,\varphi,g)$ are Lipschitz continuous. The value $v$ is the solution, in the viscosity sense (see for instance [14]), of the following Hamilton-Jacobi-Bellman (HJB) partial differential equation
\[
\min\Big( -v_t + \max_{a\in A}\big(-f(x,a)\cdot\nabla_x v\big),\ v - g(x) \Big) = 0, \quad t\in[0,T], \tag{3a}
\]
\[
v(T,x) = \max(\varphi(x), g(x)). \tag{3b}
\]
*This research benefited from the support of the FMJH Program PGMO and from the support to this program from EDF.
†Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), F-75013 Paris, France.
‡Sorbonne Université, CNRS, LJLL, F-75005 Paris, France. olivier.bokanowski@u-paris.fr
§INSA Rouen, LMI (EA 3226 - FR CNRS 3335), 76801 St Etienne du Rouvray, France. averil.prost@insa-rouen.fr
¶EDF R&D & FiME, 91120 Palaiseau, France. xavier.warin@edf.fr
arXiv:2210.04300v1 [math.OC] 9 Oct 2022
Tremendous numerical efforts have been made in order to propose efficient algorithms for solving problems
related to (1), or the corresponding HJB equation (3). Precise numerical methods have been developed, using
approximations on grids, such as Markov chain approximations [33], finite difference schemes (monotone
schemes [17], semi-Lagrangian schemes (see e.g. [18, 20]), ENO or WENO higher-order schemes [37, 38],
finite element methods (see [31]), discontinuous Galerkin methods [28, 35], and in particular [12, 13] for
(3), see also [40], or max-plus approaches [2]). However, grid-based methods are limited to low dimensions
because of the well-known curse of dimensionality. In order to tackle this difficulty, various approaches have been
studied, such as sparse grid methods [16, 21], tree structure approximation algorithms (see e.g. [3]), tensor
decomposition methods [19], and max-plus approaches [36].
In the deterministic context, problem (1) is motivated by deterministic optimal control with state constraints (see e.g. [14] and [4]). In [41], the HJB equation (3) is approximated by deep neural networks
(DNN) for solving state-constrained reachability control problems of dimension up to $d = 10$. In [15] or
in [6], formulation (1) is used to solve an abort landing problem (using different numerical approaches); in
[11], equations such as (3) are used to solve an aircraft payload optimization problem; a multi-vehicle safe
trajectory planning problem is considered in [8].
On the other hand, for stochastic control, DNN approximations were already used for gas storage optimization in [9], where the control approximated by a neural network was the amount of gas injected into or
withdrawn from the storage. This approach has been adapted and popularized recently for the resolution of
BSDEs in [26] (deep BSDE algorithm). For a convergence study of such algorithms in a more general context,
see [27].
In this work, we study some neural network approximations for (1). We are particularly interested
in obtaining a rigorous error analysis of such approximations. We follow the approach of [29] (and
its companion paper [7]), combining neural network approximations and Bellman's dynamic programming
principle. We obtain precise error estimates in an average norm.
Note that the work of [29] is developed in the stochastic context, where an error analysis is given. However,
this error analysis relies strongly on a diffusion assumption on the model (transition probabilities
with densities are assumed to exist). In our case, we would need to assume that the deterministic process
admits a density, which is not the case (see Remark 6.5). Therefore the proof of [29] does not apply to the
deterministic context. Here we propose a new approach for the convergence analysis, leading to new error
estimates. We chose to present the algorithm on a running cost optimal control problem, but the approach
can be generalized to Bolza or Mayer problems (see e.g. [4, 6]).
For the sake of completeness, let us note that the ideas of [29] are related to methods already proposed
in [23] and [10] for the resolution of backward stochastic differential equations (BSDEs), where the control
function is calculated by regression on a space of basis functions (the Hybrid-Now algorithm is related
to [23], and the performance iteration algorithm is related to an improved algorithm in [10]). For recent
developments, see [24] using classical linear regressions, and [30] and [22] for BSDE approximations using
neural networks.
From the numerical point of view, we illustrate our algorithms on some academic front propagation
problems with or without obstacles. We focus on a "Lagrangian scheme" (a deterministic equivalent of
the performance iteration scheme of [29]), and also compare it with other algorithms: a "semi-Lagrangian
scheme" (similar to the Hybrid-Now algorithm of [29]) and a hybrid algorithm combining the two previous ones,
involving successive projections of the value function on neural network spaces.
The plan of the paper is the following. In Section 2 we define a semi-discrete value approximation for (1),
with controlled error with respect to the continuous value, using piecewise constant controls. In Section 3,
equivalent reformulations of the problem are given using feedback controls and dynamic programming. In
Section 4, an approximation result for the discrete value function using Lipschitz continuous feedback
controls is given. In Section 5 we present three numerical schemes using neural network approximations
(for the approximation of the feedback controls and/or of the value), using general Runge-Kutta schemes for
the approximation of the controlled dynamics. Section 6 contains our main convergence result for one of the
proposed schemes (the Lagrangian scheme), which involves only approximations of the feedback controls, and
Section 7 focuses on the proof of our main result. Section 8 is devoted to some numerical academic examples
of front propagation problems with or without an obstacle term (state constraints), in average dimensions,
showing the potential of the proposed algorithms in this context, and also giving comparisons between the
different algorithms introduced. An appendix contains some details on computing reference solutions for
some of the considered examples.
Notations. Unless otherwise specified, the norm $|\cdot|$ on $\mathbb{R}^q$ ($q\ge 1$) is the max norm $|x| = \|x\|_\infty = \max_{1\le i\le q}|x_i|$. The notation $[\![p,q]\!] = \{p, p+1, \ldots, q\}$ is used for any integers $p\le q$. For any function
$\alpha:\mathbb{R}^p\to\mathbb{R}^q$ (for some $p,q\ge 1$), $[\alpha] := \sup_{y\ne x}\frac{|\alpha(y)-\alpha(x)|}{|y-x|}$ denotes the corresponding Lipschitz constant. We
also denote $a\vee b := \max(a,b)$ for any $a,b\in\mathbb{R}$. The set of "feedback" controls is defined as $\mathcal{A} := \{a:\mathbb{R}^d\to A,\ a(\cdot)\ \text{measurable}\}$.
2 Semi-discrete approximation with piecewise constant controls
In this section, we first aim to define a semi-discrete approximation of (1) in time.
Let the following assumptions hold on the set $A$ and the functions $f, g, \varphi$.
(H0) $A$ is a non-empty compact subset of $\mathbb{R}^\kappa$ ($\kappa\ge 1$), and is a convex set.
(H1) $f:\mathbb{R}^d\times A\to\mathbb{R}^d$ is Lipschitz continuous, and we denote by $[f]_1, [f]_2 \ge 0$ constants such that
\[
|f(x,a) - f(x',a')| \le [f]_1 |x-x'| + [f]_2 |a-a'|, \quad \forall (x,x')\in(\mathbb{R}^d)^2,\ \forall (a,a')\in A^2.
\]
(H2) $g:\mathbb{R}^d\to\mathbb{R}$ is Lipschitz continuous.
(H3) $\varphi:\mathbb{R}^d\to\mathbb{R}$ is Lipschitz continuous.
Let $T>0$ be the horizon, let $N\in\mathbb{N}^*$ be a number of iterations, and let $(t_k)_{k\in[\![0,N]\!]}\subset[0,T]$ be a time mesh
with $t_0=0$ and $t_N=T$. To simplify the presentation, we restrict ourselves to the uniform mesh $t_k=k\Delta t$
with $\Delta t=\frac{T}{N}$, but the arguments would carry over unchanged with a non-uniform time mesh.
Let us consider $F_h:\mathbb{R}^d\times A\to\mathbb{R}^d$ (for a given $h>0$), corresponding to some one-time-step approximation
of $y^a_x(h)$ (starting from $y^a_x(0)=x$). For instance, we may consider the Euler scheme $F_h(x,a) = x + h f(x,a)$,
or the Heun scheme $F_h(x,a) = x + \frac{h}{2}\big(f(x,a) + f(x + h f(x,a), a)\big)$, and so on. General Runge-Kutta schemes
are considered later on in Section 5.2. Assumptions on $F_h$ will be made precise when needed.
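As an illustration, the Euler and Heun one-step maps above can be sketched as follows (a minimal numpy sketch; the function names and the example dynamics are ours, not from the paper):

```python
import numpy as np

def euler_step(f, x, a, h):
    """One Euler step: F_h(x, a) = x + h f(x, a)."""
    return x + h * f(x, a)

def heun_step(f, x, a, h):
    """One Heun step: F_h(x, a) = x + (h/2) (f(x, a) + f(x + h f(x, a), a))."""
    k1 = f(x, a)
    k2 = f(x + h * k1, a)
    return x + 0.5 * h * (k1 + k2)

# Example: controlled rotation f(y, a) = a (-y_2, y_1) with a scalar control a.
f = lambda y, a: a * np.array([-y[1], y[0]])
x0 = np.array([1.0, 0.0])
# One Euler step of size h = 0.1 gives x0 + h f(x0, 1) = (1.0, 0.1).
```

Both maps satisfy the Lipschitz bound $\sup_{a\in A}[F_h(\cdot,a)] \le 1+ch$ used below, with $c$ depending on $[f]$.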
For a given sequence $a := (a_n, a_{n+1}, \ldots, a_{N-1}) \in A^{N-n}$ (with $n\in[\![0,N-1]\!]$), which corresponds to a
piecewise constant control approximation, and a given integer $p\ge 1$, we define two levels of approximations
for the trajectories.
The fine approximation involves the time step $h=\frac{\Delta t}{p}$, is denoted $Y^{a_k}_{j,x}$ (for a fixed control $a_k$, $0\le j\le p$),
and is defined recursively by
\[
Y^{a_k}_{0,x} = x, \tag{4a}
\]
\[
Y^{a_k}_{j+1,x} = F_h(Y^{a_k}_{j,x}, a_k), \quad j = 0,\ldots,p-1, \tag{4b}
\]
which also corresponds to $j$ iterates of $y\mapsto F_h(y, a_k)$, starting from $y=x$, with the same control $a_k$. This
fine level will be used to obtain approximations of the trajectory at the intermediate time steps $t_k + jh$, which lie
in $[t_k, t_{k+1}]$.
The coarse approximation, with time step $\Delta t$, is denoted $(X^a_{k,x})_{n\le k\le N}$ and is defined recursively by
\[
X^a_{n,x} := x, \tag{5a}
\]
\[
X^a_{k+1,x} = Y^{a_k}_{p,\, X^a_{k,x}}, \quad k = n,\ldots,N-1. \tag{5b}
\]
Notations. We will often use the notation, for a given $a\in A$, $F(\cdot,a) \equiv F^a(\cdot) := Y^a_{p,\cdot}$, and the fact that
(5b) can also be written
\[
X^a_{k+1,x} = F(X^a_{k,x}, a_k), \quad k = n,\ldots,N-1.
\]
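The two-level trajectory discretization (4)-(5) can be sketched as follows (an illustrative sketch with our own naming; `step` stands for any one-step map $F_h$):

```python
def fine_trajectory(step, x, a_k, p, h):
    """Iterates (4): Y_0 = x, Y_{j+1} = F_h(Y_j, a_k), j = 0, ..., p-1."""
    ys = [x]
    for _ in range(p):
        ys.append(step(ys[-1], a_k, h))
    return ys  # [Y_0, ..., Y_p], intermediate points within one coarse step

def coarse_trajectory(step, x, controls, p, dt):
    """Iterates (5): X_{k+1} = Y^{a_k}_{p, X_k}, i.e. p fine steps per coarse step."""
    h = dt / p
    xs = [x]
    for a_k in controls:
        xs.append(fine_trajectory(step, xs[-1], a_k, p, h)[-1])
    return xs  # [X_n, ..., X_N]

# Example: scalar dynamics f(y, a) = a with the Euler map (exact here).
euler = lambda y, a, h: y + h * a
X = coarse_trajectory(euler, 0.0, [1.0, 1.0], p=5, dt=0.5)
# X ends at 0 + 2 * 0.5 * 1.0 = 1.0
```

Only the endpoints $X^a_{k,x}$ are propagated from one coarse step to the next; the fine points are used to evaluate the running cost $g$ below.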
We can now define the following cost functional, for $x\in\mathbb{R}^d$, $a\in A^{N-n}$ and $n\in[\![0,N]\!]$:
\[
J_n(x,a) := \max_{n\le k<N}\ \max_{0\le j<p}\ g\big(Y^{a_k}_{j,\,X^a_{k,x}}\big) \ \vee\ (g\vee\varphi)(X^a_{N,x}), \quad x\in\mathbb{R}^d,\ a\in A^{N-n}, \tag{6}
\]
and the following semi-discrete version of (1), for $x\in\mathbb{R}^d$ and $n\in[\![0,N]\!]$:
\[
V_n(x) := \min_{a\in A^{N-n}} J_n(x,a). \tag{7}
\]
(For $n=N$, we have $V_N(x) = J_N(x) = g(x)\vee\varphi(x)$.) It will also be useful to introduce the following notation,
for $a\in A$ and $x\in\mathbb{R}^d$:
\[
G^a(x) := \max_{0\le j<p} g\big(Y^a_{j,x}\big). \tag{8}
\]
The values $(V_n)_{0\le n\le N}$ satisfy $V_N(x) = g(x)\vee\varphi(x)$ and the following dynamic programming principle
(DPP) for $n = 0,\ldots,N-1$:
\[
V_n(x) = \inf_{a\in A}\ G^a(x) \vee V_{n+1}\big(F(x,a)\big), \quad x\in\mathbb{R}^d. \tag{9}
\]
Let us notice that the case $p=1$ leads to the following simplifications: $h=\Delta t$, $F(x,a) = F_{\Delta t}(x,a)$,
$G^a(x) = g(x)$, $J_n(x,a) = \max_{n\le k\le N} g(X^a_{k,x}) \vee \varphi(X^a_{N,x})$, as well as $V_N(x) = g(x)\vee\varphi(x)$ and the DPP
\[
V_n(x) = \inf_{a\in A}\ g(x) \vee V_{n+1}\big(F_{\Delta t}(x,a)\big), \quad 0\le n\le N-1.
\]
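Before turning to neural networks, the backward DPP (9) can be brute-forced on a grid in low dimension. The following sketch is our own illustration (not one of the paper's schemes): it takes $p=1$, an Euler map, a discretized control set, and a 1-D state grid with linear interpolation; all names are assumptions.

```python
import numpy as np

def dpp_values(xs, controls, f, g, phi, dt, N):
    """Backward DPP (9) with p = 1: V_N = g v phi, then
    V_n(x) = min_a max(g(x), V_{n+1}(x + dt f(x, a))) on the 1-D grid xs."""
    V = np.maximum(g(xs), phi(xs))
    values = [V]
    for _ in range(N):
        best = np.full_like(xs, np.inf)
        for a in controls:
            x_next = xs + dt * f(xs, a)
            V_next = np.interp(x_next, xs, V)   # linear interpolation on the grid
            best = np.minimum(best, np.maximum(g(xs), V_next))
        V = best
        values.append(V)
    return values[::-1]                         # [V_0, ..., V_N]

# Example: f(x, a) = a with A = {-1, 0, 1}, inactive obstacle (g = -10),
# phi(x) = |x|, horizon T = N dt = 1. The steps land exactly on grid nodes,
# so e.g. V_0(0) = 0 and V_0(1.5) = 0.5 here.
xs = np.linspace(-2.0, 2.0, 401)
vals = dpp_values(xs, [-1.0, 0.0, 1.0],
                  lambda x, a: a + 0.0 * x,
                  lambda x: np.full_like(x, -10.0),
                  np.abs, dt=0.2, N=5)
```

This grid-based recursion is exactly what becomes infeasible in higher dimension, motivating the neural network approximations of Section 5.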
The motivation behind the introduction of the finer level of approximation $(Y^{a_k}_{j,x})_{0\le j\le p}$ is first numerical:
it enables a better evaluation of the running cost term $g(\cdot)$ along the trajectory, without the computational
cost of more intermediate controls. The numerical improvement is illustrated in the examples of Section 8.1.
Furthermore, from the theoretical point of view, the convergence analysis in our main result will rely strongly
on the fact that $x\mapsto F_h(x,a)$ is a change of variables for $h$ sufficiently small (i.e., $p$ sufficiently large).
We start by showing some uniform Lipschitz bounds.
Lemma 2.1. Assume (H0)-(H3), and the Lipschitz bound $\sup_{a\in A}[F_h(\cdot,a)] \le 1+ch$ for some constant $c\ge 0$.
(i) The function $J_n(\cdot,a)$ is Lipschitz for all $a\in A^{N-n}$, with the uniform bound $[J_n(\cdot,a)] \le [g]\vee[\varphi]\, e^{cT}$.
(ii) In particular, the uniform bound $\max_{0\le n\le N}[V_n] \le [g]\vee[\varphi]\, e^{cT}$ holds.
Proof. (i) Notice that $1+ch \le e^{ch}$. Then for $a\in A$ and for the $j$-th iterate $F_h^{(j)}(\cdot,a)$ of $F_h$, we obtain
$|Y^a_{j,x} - Y^a_{j,y}| = |F_h^{(j)}(x,a) - F_h^{(j)}(y,a)| \le e^{jch}|x-y| \le e^{c\Delta t}|x-y|$ for any $0\le j\le p$ (by recursion). Hence
also $\sup_{a\in A}[F(\cdot,a)] \le e^{c\Delta t}$, from which we deduce, for any $a = (a_n,\ldots,a_{N-1})\in A^{N-n}$ and $n\le k\le N$,
$|X^a_{k,x} - X^a_{k,y}| \le e^{c(k-n)\Delta t}|x-y| \le e^{cT}|x-y|$. The desired result follows from the definition of $J_n$ and
repeated use of $\max(a,b) - \max(c,d) \le \max(a-c,\, b-d)$.
(ii) As a direct consequence of (i) and the definition of $V_n$.
The following result shows that Vn(x) is a first order approximation of v(tn, x) in time.
Theorem 2.2. Assume (H0)-(H3), and that there exists $h_0>0$ such that:
- $F_h$ is consistent with the dynamics $f$ in the following sense:
\[
\exists C\ge 0,\ \forall (x,a,h)\in\mathbb{R}^d\times A\times(0,h_0),\quad |F_h(x,a) - (x + h f(x,a))| \le C h^2 (1+|x|), \tag{10}
\]
- for all $h\in(0,h_0]$, $\sup_{a\in A}[F_h(\cdot,a)] \le 1+ch$ for some constant $c\ge 0$ ($c$ may depend on $[f]$),
- $f(x,A)$ is convex for all $x\in\mathbb{R}^d$.
Let $h=\frac{\Delta t}{p} \le h_0$ (with $p\ge 1$). Then
\[
\max_{0\le n\le N} |V_n(x) - v(t_n,x)| \le \widetilde{C}\,\Delta t\,(1+|x|) \tag{11}
\]
for some constant $\widetilde{C}\ge 0$ independent of $x$, $\Delta t$, $p$ (and $N$).
Proof. This follows from the arguments of Theorem B.1. in [15].
Corollary 2.3. In particular, if $F_h$ is a consistent RK scheme (see Definition 5.2) and $f(x,A)$ is convex for
all $x$, then by Lemma 5.6, (10) holds and therefore the error estimate (11) also holds.
Our aim is now to propose numerical schemes for the approximation of $V_n(\cdot)$.
3 Reformulation with feedback controls
In this section, equivalent definitions of $V_n$ are given using feedback controls in $\mathcal{A}$ (the set of measurable
functions $a:\mathbb{R}^d\to A$). These formulations will lead to the numerical schemes.
First, for a given $a_k\in\mathcal{A}$, the fine approximation $Y^{a_k}_{j,x}$ (with time step $h=\frac{\Delta t}{p}$) is defined by $Y^{a_k}_{j,x} := Y^{a_k(x)}_{j,x}$
using definition (4). This also corresponds to
\[
Y^{a_k}_{0,x} = x, \tag{12}
\]
\[
Y^{a_k}_{j+1,x} = F_h\big(Y^{a_k}_{j,x},\, a_k(x)\big), \quad j = 0,\ldots,p-1 \tag{13}
\]
(that is, $j$ iterates of $y\mapsto F_h(y, a_k(x))$, starting from $y=x$, with the fixed control $a_k(x)$). Then, for a given
sequence $a := (a_n,\ldots,a_{N-1})\in\mathcal{A}^{N-n}$, the coarse approximation is defined by
\[
X^a_{n,x} = x, \tag{14a}
\]
\[
X^a_{k+1,x} = F^{a_k}(X^a_{k,x}) \equiv F\big(X^a_{k,x},\, a_k(X^a_{k,x})\big), \quad k = n,\ldots,N-1, \tag{14b}
\]
with the notation $F^{a_k}(x) = F(x, a_k(x))$. We also extend the definition of $J_n$ to the feedback space, for $x\in\mathbb{R}^d$
and $a\in\mathcal{A}^{N-n}$, as follows:
\[
J_n(x,a) := \max_{n\le k<N}\ G^{a_k}(X^a_{k,x}) \ \vee\ (g\vee\varphi)(X^a_{N,x}), \tag{15}
\]
where now, for a given control $a\in\mathcal{A}$, we extend the definition (8) by
\[
G^a(x) := \max_{0\le j<p} g\big(Y^{a(x)}_{j,x}\big) \tag{16}
\]
(this also corresponds to defining $G^a(x)$ as $G^{a(x)}(x)$). With these definitions, we have the following results.
Proposition 3.1. (i) $V_n(x)$ is the minimum of $J_n(x,\cdot)$ over feedback controls:
\[
V_n(x) = \min_{a\in\mathcal{A}^{N-n}} J_n(x,a), \quad x\in\mathbb{R}^d.
\]
(ii) For all $0\le n\le N-1$, $V_n$ satisfies the following dynamic programming principle over feedback
controls:
\[
V_n(x) = \min_{a\in\mathcal{A}}\ G^a(x) \vee V_{n+1}\big(F^a(x)\big), \quad x\in\mathbb{R}^d,\ n = 0,\ldots,N-1, \tag{17}
\]
and in particular, the infimum is attained by some $\bar a_n\in\mathcal{A}$.
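To make (17) concrete: for sampled states one can tabulate the pointwise argmin over a discretized control set; in the schemes of Section 5 this map is instead approximated by a trained neural network. A small numpy sketch of the pointwise version (all names are our assumptions):

```python
import numpy as np

def feedback_argmin(x_samples, controls, F, G, V_next):
    """Pointwise realization of (17): for each sample x, return the control
    minimizing G^a(x) v V_{n+1}(F(x, a)), and the resulting value V_n(x)."""
    vals = np.stack([np.maximum(G(x_samples, a), V_next(F(x_samples, a)))
                     for a in controls])       # shape (n_controls, n_samples)
    idx = np.argmin(vals, axis=0)
    return np.asarray(controls)[idx], vals.min(axis=0)

# Example: 1-D states, F(x, a) = x + 0.2 a (Euler step), inactive obstacle,
# V_{n+1}(x) = |x|: the resulting feedback pushes each sample toward 0,
# giving controls (-1, 1) and values approximately (0.3, 0.1).
a_bar, v_n = feedback_argmin(
    np.array([0.5, -0.3]), [-1.0, 0.0, 1.0],
    lambda x, a: x + 0.2 * a,
    lambda x, a: np.full_like(x, -10.0),
    np.abs)
```

The argmin map sampled here plays the role of the optimal feedback $\bar a_n$ of Proposition 3.1(ii).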