developed using Lyapunov analysis are used to guarantee the stability of the state estimation error dynamics [11], [12]. The
state estimates are then used in a model-based reinforcement learning (MBRL) framework to design a controller that ensures
the stability of the closed-loop system during learning.
This observer architecture is motivated by the observer design technique developed in [10], which introduces a third observer gain to cancel nonconvex terms in the semidefinite condition. Compared with the extended Luenberger observer in [13], which has only a linear correction term and a nonlinear injection term, the method developed in [10] applies to a broader class of nonlinear systems under less conservative conditions while ensuring uniform asymptotic convergence. This paper offers a modification of the observer structure in [10], [14] with fewer restrictions on the class of nonlinear systems. The observer is then combined with an MBRL-based controller to optimize a given performance objective.
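For context, one representative structure in this family of observers (the matrices $A$, $G$, $H$, the nonlinearity $\phi$, and the gains $L$, $K$ below are assumptions of this sketch rather than the exact design of [10] or [14]) corrects both the linear dynamics and the argument of the nonlinearity with the output error:
\begin{equation*}
% Hedged sketch: the gains L and K would be chosen via semidefinite (LMI)
% conditions so that the estimation error dynamics admit a Lyapunov certificate.
\dot{\hat{x}} \;=\; A\hat{x} \;+\; G\,\phi\!\big(H\hat{x} + K\,(y - C\hat{x})\big) \;+\; L\,(y - C\hat{x}), \qquad \hat{y} = C\hat{x}.
\end{equation*}
Setting $K = 0$ leaves only the linear correction term; the additional injection gain inside $\phi$ gives the semidefinite synthesis extra degrees of freedom, which is one way to interpret the relaxed restrictions discussed above.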
The goal of the MBRL method is to learn an approximately optimal controller by approximating the value function, and subsequently the optimal policy, for an input-constrained nonlinear system. While adaptive optimal control methods have been studied extensively in the literature to solve the online optimal control problem [2], [6]–[9], [15]–[19], most existing results require full-state feedback. In this paper, an output feedback problem is solved for systems with a linear measurement model. Furthermore,
unlike the actor-critic MBRL methods popular in the literature [3], [20], this paper presents a critic-only structure, which requires the identification of fewer free parameters, to provide an approximate solution of the Hamilton–Jacobi–Bellman (HJB) equation. Lyapunov methods are used to show that the states of the system, the state estimation error, and the critic weights are locally uniformly ultimately bounded (UUB).
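As a rough illustration of the critic-only structure (the weights $\hat{W}_c$, basis $\sigma$, saturation level $\bar{u}$, gain $k_c$, and running cost $r$ below are placeholders introduced for exposition, not the exact design developed in Section V), the critic approximates the value function and the saturated policy is recovered directly from it, so no separate actor weights need to be identified:
\begin{align*}
% Hedged sketch of a critic-only scheme for control-affine dynamics
% \dot{x} = f(x) + g(x)u with input constraint ||u|| <= \bar{u}.
\hat{V}(\hat{x}) &= \hat{W}_c^{\top}\sigma(\hat{x}), \\
\hat{u}(\hat{x}) &= -\bar{u}\,\tanh\!\Big(\tfrac{1}{2\bar{u}}\,R^{-1}g(\hat{x})^{\top}\nabla\sigma(\hat{x})^{\top}\hat{W}_c\Big), \\
\delta &= \hat{W}_c^{\top}\nabla\sigma(\hat{x})\big(f(\hat{x}) + g(\hat{x})\hat{u}\big) + r(\hat{x},\hat{u}), \\
\dot{\hat{W}}_c &= -k_c\,\frac{\omega}{1 + \omega^{\top}\omega}\,\delta, \qquad \omega = \nabla\sigma(\hat{x})\big(f(\hat{x}) + g(\hat{x})\hat{u}\big),
\end{align*}
where the dynamics are control-affine, $r$ includes a nonquadratic input penalty so that the $\tanh$ policy respects the input constraint, and the weight update is a normalized gradient step on the Bellman residual $\delta$ evaluated along the state estimate $\hat{x}$.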
The main contributions are as follows:
1) This paper uses an observer with a bounded Jacobian for fast state estimation and an online, critic-only RL architecture to learn a controller that keeps the input-constrained nonlinear system stable during the learning phase. This novel architecture differs from the existing NN observers in [17], [21]–[25], whose convergence analysis relies solely on negative terms that result from a σ-modification-like term added to the weight update laws; as a result, similar to adaptive control, convergence of the observer weights to their true values cannot be expected, and convergence of the state estimates to the true states is not robust to disturbances and approximation errors.
2) This paper proposes a robust output feedback RL method for a nonlinear control-affine system with a general C matrix. Unlike most NN observers in the literature [17], [23], [24], the method in this paper does not require restrictions on the form or rank of the C matrix. A drawback of existing state feedback control methods, such as [23], is that the substitution $x = C^{+}y$ implicitly restricts the proof to systems where the number of outputs is at least as large as the number of states, which defeats the purpose of output feedback control (see the illustration below).
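With fewer outputs than states, the Moore–Penrose pseudoinverse cannot recover the state, as the following toy example (numbers chosen purely for illustration) makes explicit:
\begin{equation*}
% n = 2 states, p = 1 output: C^{+}C is a projection, not the identity.
C = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad
C^{+} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad
C^{+}y = C^{+}Cx = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}x \neq x \quad \text{whenever } x_2 \neq 0.
\end{equation*}
In general, $C^{+}C = I_n$ (and hence $x = C^{+}y$) holds only if $\operatorname{rank}(C) = n$, which forces the number of outputs to be at least the number of states; the method in this paper avoids this substitution altogether.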
The rest of the paper is organized as follows: Section II contains the problem formulation, Section III introduces the state estimator/observer, Section IV presents the multiplier matrices and sector conditions, Section V develops the control design using MBRL methods, Section VI contains the stability analysis of the developed architecture, Section VII presents simulation results, and Section VIII concludes the paper.