
The adjoint approach suggested with the introduction of Neural ODEs (Chen et al., 2018) is a modern variant of Control Vector Iteration (CVI) (Luus, 2009), a sequential indirect strategy that optimizes a fixed vector of parameters by relaxing just one of the necessary conditions of optimality². The relaxed condition is approximated iteratively over optimization rounds in which the dynamical and adjoint equations are always satisfied, as in feasible-path methods (Chachuat, 2007). An algorithmic improvement introduced by Chen et al. (2018) is the efficient use of reverse-mode Automatic Differentiation (AD) to calculate the vector-Jacobian products that appear in the differential equations defining the optimality conditions of the problem. This makes it possible to handle high-dimensional parameter problems efficiently, which is crucial for neural policies within continuous-time dynamical systems. Furthermore, it crucially avoids the symbolic derivations historically associated with indirect methods, including CVI, in the numerical optimal control literature (Biegler, 2010).
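As an illustration of this mechanism, the sketch below (an assumption about tooling, not the implementation used here) differentiates through an adjoint-based ODE solver in JAX, whose `odeint` backward pass assembles the required vector-Jacobian products along the adjoint equations; the policy, dynamics and cost are placeholders.

```python
# Hedged sketch: gradient of a terminal cost w.r.t. neural feedback policy
# parameters, obtained via reverse-mode AD over an adjoint-based ODE solver.
# Names, dynamics and cost are illustrative assumptions.
import jax
import jax.numpy as jnp
from jax.experimental.ode import odeint  # backward pass solves the adjoint ODE

def policy(params, x):
    """Small tanh network mapping state -> control (illustrative)."""
    W1, b1, W2, b2 = params
    h = jnp.tanh(W1 @ x + b1)
    return jnp.tanh(W2 @ h + b2)           # saturated control in [-1, 1]

def dynamics(x, t, params):
    """White-box environment: a controlled double integrator (illustrative)."""
    u = policy(params, x)
    return jnp.array([x[1], u[0]])

def terminal_cost(params, x0, ts):
    xs = odeint(dynamics, x0, ts, params)  # forward integration of the ODE
    return jnp.sum(xs[-1] ** 2)            # phi(x(t_f))

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = (0.1 * jax.random.normal(k1, (16, 2)), jnp.zeros(16),
          0.1 * jax.random.normal(k2, (1, 16)), jnp.zeros(1))
x0, ts = jnp.array([1.0, 0.0]), jnp.linspace(0.0, 2.0, 50)

# Reverse-mode AD: vector-Jacobian products are evaluated automatically
# while the adjoint system is integrated backwards in time.
grads = jax.grad(terminal_cost)(params, x0, ts)
```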
The use of neural networks within control systems was originally explored in the 1990s (Chen, 1989; Miller et al., 1990), where the focus was on discrete-time systems (Hunt et al., 1992). The use of neural control policies in this discrete vein has also attracted attention recently in nonlinear control (Rackauckas et al., 2020a; Adhau et al., 2021; Jin et al., 2020) and MPC (Amos et al., 2018; Karg and Lucia, 2020; Drgona et al., 2022). The gradients required for optimization are computed through direct application of AD over the evolution of a discrete system, a strategy coined differentiable control or, more generally, differentiable simulation. These strategies revive some of the original ideas that brought interest to neural networks in nonlinear control (Cao, 2005) with modern computational tooling for AD calculations (Baydin et al., 2017).
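A minimal sketch of this discrete-time, differentiable-simulation strategy is given below; the linear saturated policy, explicit Euler discretization and quadratic cost are illustrative assumptions rather than the setups of the cited works.

```python
# Hedged sketch of differentiable simulation in discrete time: the rollout is
# unrolled explicitly and reverse-mode AD is applied to the whole computation.
import jax
import jax.numpy as jnp

def rollout_cost(params, x0, dt=0.05, steps=100):
    def step(x, _):
        u = jnp.tanh(params["W"] @ x + params["b"])   # linear policy + saturation
        x_next = x + dt * jnp.array([x[1], u[0]])     # explicit Euler step
        return x_next, jnp.sum(x_next ** 2) * dt      # running cost contribution
    _, costs = jax.lax.scan(step, x0, None, length=steps)
    return jnp.sum(costs)

params = {"W": jnp.zeros((1, 2)), "b": jnp.zeros(1)}
grads = jax.grad(rollout_cost)(params, jnp.array([1.0, 0.0]))
```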
Reinforcement Learning (RL) commonly leverages neural parametrizations of policies within Markov decision processes (MDPs) (Sutton et al., 1992) and has seen rising success in scaling to high-dimensional problems thanks to deep learning (Schmidhuber, 2015). The adaptation of RL approaches to continuous-time scenarios based on Hamilton-Jacobi-Bellman formulations was originally explored in (Munos, 1997; Doya, 2000). A continuous-time actor-critic variation based on discrete-time data was analysed more recently in (Yildiz et al., 2021). The extension of the model-free policy gradient method to continuous time was studied by Munos (2006). In continuous-time settings with deterministic dynamics as an environment, neural policies may be trained with techniques from dynamic optimization such as CVI, avoiding the noisy sampling estimates of classic RL. A comparison of the training performance of deterministic model-based neural policies against model-free policy gradients was showcased in (Ainsworth et al., 2021).
...procedure that departs from the optimality conditions of the problem, also called indirect approaches. Since the adjoint equations are part of the optimality conditions in dynamic optimization, the classic indirect classification includes both variants in the ML literature. From an optimization perspective, there is no difference in how the gradients are calculated, since both do so through adjoints; the selection of the approach is merely practical, based on the peculiarities of each problem (Ma et al., 2021).
² This is covered in detail in Section 3.1.
With a view to practical applications, it is crucial to be able to satisfy constraints while also optimizing the policy performance. While this has been a major focus in optimal control since its inception (Bryson and Ho, 1975), little attention has been paid to general nonlinear scenarios. Most works assume either linear dynamics or fixed control profiles instead of state feedback policies. In model-free RL, constraint enforcement is an active area of research (Brunke et al., 2021). Recent work has explored variants of objective penalties (Achiam et al., 2017), Lyapunov functions (Chow et al., 2019) and chance-constraint satisfaction techniques (Petsagkourakis et al., 2022).
Here we develop a strategy that allows continuous-time policies to solve general nonlinear control problems while successfully satisfying constraints. Our approach is based on the deterministic calculation of the gradient of the cost functional with respect to the parameters of a static feedback policy, given a white-box dynamical system as the environment. Saturation built into the policy architecture enforces hard control constraints, while state constraints are enforced through relaxed logarithmic penalties and an adaptive barrier update strategy. We furthermore showcase how the inclusion of the feedback controller within the ODE definition shapes the whole phase space of the system. This quality of Neural ODEs is impossible to achieve in nonlinear systems with standard optimal control methods that only provide controls as a function of time.
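For concreteness, the sketch below shows one standard form of relaxed logarithmic barrier with a smooth quadratic extension; the precise relaxation and the adaptive update of the barrier weight used in this work may differ from this illustrative variant.

```python
# Illustrative relaxed logarithmic barrier for a state constraint g(x) <= 0.
# The exact relaxation and the (mu, delta) update schedule used in the paper
# may differ; this is one common quadratic-extension variant.
import jax.numpy as jnp

def relaxed_log_barrier(g, delta=0.1):
    """-log(-g) for g <= -delta, C1-continuous quadratic extension otherwise."""
    safe = -jnp.log(jnp.clip(-g, delta))                      # exact barrier branch
    relaxed = 0.5 * (((g + 2.0 * delta) / delta) ** 2 - 1.0) - jnp.log(delta)
    return jnp.where(g <= -delta, safe, relaxed)

def penalized_running_cost(running_cost, g_values, mu=1.0, delta=0.1):
    """Augment the running cost with weighted relaxed barriers; mu and delta
    can be decreased between training rounds to tighten the constraints."""
    return running_cost + mu * jnp.sum(relaxed_log_barrier(g_values, delta))
```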
2. PROBLEM STATEMENT
We consider continuous-time optimal control problems with fixed initial condition and fixed final time. The cost functional is in Bolza form, including both a running cost ($\ell$) and a terminal cost ($\phi$) in the objective functional ($J$) (Bryson and Ho, 1975):
\begin{equation}
\begin{aligned}
\min_{\theta} \quad & J = \int_{t_0}^{t_f} \ell\big(x(t), \pi_\theta(x)\big) \, dt + \phi\big(x(t_f)\big), \\
\text{s.t.} \quad & \dot{x}(t) = f\big(x(t), \pi_\theta(x), t\big), \\
& x(t_0) = x_0, \\
& g\big(x(t), \pi_\theta(x)\big) \leq 0,
\end{aligned}
\tag{1}
\end{equation}
where the time window $t \in [t_0, t_f]$ is fixed, $x(t) \in \mathbb{R}^{n_x}$ is the state, $\pi_\theta(x) : \mathbb{R}^{n_x} \to \mathbb{R}^{n_u}$ is the state feedback controller, and $\theta \in \mathbb{R}^{n_\theta}$ are its parameters.
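One common way to evaluate the Bolza objective in (1) numerically (an illustrative convention, not necessarily the exact implementation used here) is to augment the state with a quadrature variable that accumulates the running cost alongside the dynamics:

```python
# Hedged sketch: evaluating J of problem (1) by state augmentation.
import jax.numpy as jnp
from jax.experimental.ode import odeint

def bolza_objective(params, x0, ts, f, l, phi, policy):
    """J = integral of l(x, pi_theta(x)) dt + phi(x(t_f)), with u = pi_theta(x)."""
    def augmented_dynamics(aug, t, params):
        x = aug[:-1]                        # physical state x(t)
        u = policy(params, x)               # feedback control
        return jnp.concatenate([f(x, u, t), jnp.array([l(x, u)])])
    aug0 = jnp.concatenate([x0, jnp.zeros(1)])   # extra quadrature state starts at 0
    traj = odeint(augmented_dynamics, aug0, ts, params)
    x_f, running_integral = traj[-1, :-1], traj[-1, -1]
    return running_integral + phi(x_f)
```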
3. METHODOLOGY
The most common parametrization in trajectory optimization methods utilizes low-order polynomials with a predefined set of time intervals to approximate the controller as a function of time (Rao, 2009; Teo et al., 2021). In contrast, in our work the control function is posed as the output of a parametrized state feedback controller. The controller parameters are constant over the whole integration horizon (they statically define the nonlinear feedback controller) and are the only optimization variables of the problem. This transforms the original dynamic optimization problem into a parameter estimation one (Teo et al., 2021). The approach approximates an optimal closed-loop policy, which is a continuous function that is well-defined for states outside the optimal path. This quality allows