Output Feedback Adaptive Optimal Control of Affine Nonlinear systems
with a Linear Measurement Model
Tochukwu Elijah Ogri1, S M Nahid Mahmud2, Zachary I. Bell3, Rushikesh Kamalapurkar1
Abstract
Real-world control applications in complex and uncertain environments require adaptability to handle model uncertainties
and robustness against disturbances. This paper presents an online, output-feedback, critic-only, model-based reinforcement
learning architecture that simultaneously learns and implements an optimal controller while maintaining stability during the
learning phase. Using multiplier matrices, a convenient way to search for observer gains is designed along with a controller that
learns from simulated experience to ensure stability and convergence of trajectories of the closed-loop system to a neighborhood
of the origin. Under mild excitation conditions, local uniform ultimate boundedness of the trajectories is established using a Lyapunov-based analysis and demonstrated through simulation results.
I. INTRODUCTION
Reinforcement learning (RL) has proven to be robust to modeling errors in dynamic systems and capable of addressing
parametric uncertainties, ensuring a fast convergence rate to the optimal solution while maintaining stability regardless of
disturbances in the system [1]–[3].
Model-based reinforcement learning (MBRL) offers real-time state estimation while maintaining system stability by using neural networks (NNs) to achieve fast approximation without complete knowledge of the system dynamics [4], [5]. In the absence of full state measurements, the MBRL controllers in [6]–[9] tend to perform poorly because the excitation conditions that guarantee closed-loop stability rely on the accuracy of the estimated model.
To address this, the semi-definite programming (SDP) observer design technique in [10] is augmented to provide an accurate estimated model for the MBRL controller, yielding a framework that is more robust to model uncertainties and disturbances and that achieves an optimal solution while guaranteeing the stability of the closed-loop system.
In this paper, an observer that uses semi-definite programming (SDP) to search for extended Luenberger observer gains is developed for state estimation in continuous-time nonlinear systems. The observer uses the multiplier matrix approach to obtain sufficient conditions for stability of the nonlinear control-affine system with partially constrained input. By placing bounds on the derivatives of the drift and control effectiveness functions of the system, sufficient conditions developed using Lyapunov analysis are used to guarantee the stability of the state estimation error dynamics [11], [12]. The state estimates are then used in a model-based reinforcement learning (MBRL) framework to design a controller that ensures the stability of the closed-loop system during learning.

*This research was supported in part by the Office of Naval Research under award number N00014-21-1-2481 and the Air Force Research Laboratories under contract number FA8651-19-2-0009. Any opinions, findings, or recommendations in this article are those of the author(s) and do not necessarily reflect the views of the sponsoring agencies.
1School of Mechanical and Aerospace Engineering, Oklahoma State University, e-mail: {tochukwu.ogri, rushikesh.kamalapurkar}@okstate.edu.
2School of Aeronautics and Astronautics, Purdue University, West Lafayette, IN 47907, USA, e-mail: mahmud7@purdue.edu.
3Air Force Research Laboratories, Florida, USA, e-mail: zachary.bell.10@us.af.mil.
This observer architecture is motivated by the observer design technique developed in [10], which introduces a third observer gain to cancel nonconvex terms in the semidefinite condition. Compared with the extended Luenberger observer in [13], which has only a linear correction term and a nonlinear injection term, the method developed in [10] extends to a broader class of nonlinear systems under less conservative conditions while ensuring uniform asymptotic convergence. This paper offers a modification of the observer structure in [10], [14] with fewer restrictions on the class of nonlinear systems. The observer is then combined with an MBRL-based controller to optimize a given performance objective.
The goal of the MBRL method is to learn an optimal controller by approximating the value function and, subsequently, the optimal policy for an input-constrained nonlinear system. While adaptive optimal control methods have been extensively studied in the literature to solve the online optimal control problem [2], [6]–[9], [15]–[19], most existing results require full state feedback. In this paper, an output feedback problem is solved for systems with a linear measurement model. Furthermore,
unlike actor-critic MBRL methods popular in the literature [3], [20], this paper presents a critic-only structure to provide
an approximate solution of the Hamilton–Jacobi–Bellman (HJB) equation that requires the identification of fewer free
parameters. Lyapunov methods are used to show that the states of the system, the state estimation error, and the critic
weights are locally uniformly ultimately bounded (UUB) for all time starting from any initial condition.
The main contributions are as follows:
1) This paper uses an observer with bounded Jacobian for fast state estimation and an online RL critic-only architecture to learn a controller that keeps the input-constrained nonlinear system stable during the learning phase. This novel architecture is different from the existing NN-based observers in [17], [21]–[25], whose convergence analysis relies solely on negative terms that result from a σ-modification-like term added to the weight update laws; as a result, similar to adaptive control, convergence of the observer weights to their true values cannot be expected, and convergence of the state estimates to the true states is not robust to disturbances and approximation errors.
2) This paper proposes a robust output feedback RL method for a nonlinear control-affine system with a general $C$ matrix. The method in this paper does not require restrictions on the form and rank of the $C$ matrix, unlike most NN observers in the literature [17], [23], [24]. A drawback of existing state feedback control methods, like [23], is that the substitution $x = C^+ y$ implicitly restricts the proof to systems where the number of outputs is at least as large as the number of states, which defeats the purpose of output feedback control (see the short numerical sketch after this list).
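As a brief, hypothetical illustration of the point in item 2 (not part of the paper's development), the following sketch shows that for a single-output system with two states, the substitution $x = C^+ y$ cannot recover the unmeasured state:

```python
# Hypothetical illustration: with q = 1 output and n = 2 states, x = C^+ y loses
# the unmeasured state, so the substitution only "works" when C has full column rank.
import numpy as np

C = np.array([[1.0, 0.0]])          # measures only the first state
x = np.array([1.0, 2.0])            # true state
y = C @ x                            # measured output
x_sub = np.linalg.pinv(C) @ y        # substitution x = C^+ y
print(x_sub)                         # [1. 0.], not [1. 2.]: the second state is lost
```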
The rest of the paper is organized as follows: Section II contains the problem formulation, Section III introduces the state estimator/observer, Section IV presents the multiplier matrices and sector conditions, Section V contains control design using MBRL methods, Section VI contains stability analysis of the developed architecture, Section VII contains simulation results, and Section VIII concludes the paper.
II. PROBLEM FORMULATION
This paper considers nonlinear dynamical systems of the form
$\dot{x} = f(x) + g(x)u, \quad y = Cx,$ (1)
where $x \in \mathbb{R}^n$ is the system state, $u \in \mathbb{R}^m$ is the control input, $C \in \mathbb{R}^{q \times n}$ is the output matrix, and $y \in \mathbb{R}^q$ is the measured output. The functions $f : \mathbb{R}^n \to \mathbb{R}^n$ and $g : \mathbb{R}^n \to \mathbb{R}^{n \times m}$ denote the drift and the control effectiveness matrix, respectively.
Assumption 1. The functions $f$ and $g$ are known, their derivatives exist on a compact set $\mathcal{C} \subset \mathbb{R}^n$, and satisfy the element-wise bounds
$(M_{f1})_{i,j} \leq \frac{d(f(x))_i}{d(x)_j} \leq (M_{f2})_{i,j},$ (2)
$(M_{g1})_{i,j,k} \leq \frac{d(g(x))_{i,k}}{d(x)_j} \leq (M_{g2})_{i,j,k},$ (3)
for all $x \in \mathcal{C}$, $i, j = 1, \ldots, n$, and $k = 1, \ldots, m$, where $(\cdot)_{i,j,k}$, $(\cdot)_{i,j}$, and $(\cdot)_i$ denote the element of the array $(\cdot)$ at the indices indicated by the subscript.
Remark 1. The conditions stated in Assumption 1 are commonly required in several observer design schemes (see, e.g., [10], [26]–[28]).
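As a rough numerical sketch (not from the paper) of how the element-wise bounds in Assumption 1 might be obtained for a given model, the following code samples the Jacobian of a hypothetical two-state drift function over a compact set and takes element-wise extrema; analytic bounds are preferable whenever they are available.

```python
# Minimal sketch: estimate element-wise Jacobian bounds (M_f1, M_f2) of a
# hypothetical drift f(x) = [-x1 + x2, -0.5*x2 + sin(x1)] over C = [-2, 2]^2
# by sampling on a grid.
import numpy as np

def f(x):
    return np.array([-x[0] + x[1], -0.5 * x[1] + np.sin(x[0])])

def jacobian(x, eps=1e-6):
    # Central finite differences for d(f(x))_i / d(x)_j.
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        d = np.zeros(n)
        d[j] = eps
        J[:, j] = (f(x + d) - f(x - d)) / (2 * eps)
    return J

grid = np.linspace(-2.0, 2.0, 41)
jacs = np.stack([jacobian(np.array([a, b])) for a in grid for b in grid])
M_f1, M_f2 = jacs.min(axis=0), jacs.max(axis=0)  # element-wise bounds of Assumption 1
print("M_f1 =\n", M_f1)
print("M_f2 =\n", M_f2)
```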
The objective is to design a uniformly asymptotically convergent observer to estimate the states online, using input-output measurements, and to simultaneously synthesize and utilize an estimate of a controller that minimizes the cost functional defined in (32), under the saturation constraint $|(u)_i| \leq \bar{\lambda}$, with $\bar{\lambda} > 0$, for $i = 1, \ldots, m$, while ensuring local uniform ultimate boundedness of the trajectories of the system in (1).
III. STATE ESTIMATOR
In this section, a state estimator inspired by the extended Luenberger observer is developed to generate estimates of x.
Let the nonlinear dynamics in (1) be expressed in the form below,
$\dot{x} = M_{f1}x + M_{g1}u\,x + \bar{f}(x) + \bar{g}_u(x, u),$ (4)
where
$\bar{f}(x) = -M_{f1}x + f(x),$ and (5)
$\bar{g}_u(x, u) = -M_{g1}u\,x + \sum_{i=1}^{m} g_i(x)(u)_i.$ (6)
The derivatives of $\bar{f}$ and $\bar{g}$ satisfy the element-wise inequalities
$0 \leq \frac{d(\bar{f}(x))_i}{d(x)_j} \leq (M_{f2})_{i,j} - (M_{f1})_{i,j},$ and (7)
$0 \leq \frac{d(\bar{g}_u(x, u))_{i,k}}{d(x)_j} \leq \left[(M_{g2})_{i,j,k} - (M_{g1})_{i,j,k}\right](u)_k,$ (8)
where $i, j = 1, \ldots, n$ and $k = 1, \ldots, m$. Thus, $\bar{M}_{f1} = 0_{n \times n}$, $\bar{M}_{f2} = M_{f2} - M_{f1}$, $\bar{M}_{g1} = 0_{n \times n \times m}$, and $\bar{M}_{g2} = M_{g2} - M_{g1}$.
To simplify the notation, $n \times n \times 1$ arrays are treated as $n \times n$ matrices in the following development.
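Continuing the hypothetical example above (again, only a sketch, not the paper's design), the shifted drift $\bar{f}$ in (5) and its bound in (7) can be checked numerically: with the example drift, $\bar{f}(x) = f(x) - M_{f1}x$ should have a Jacobian between $0$ and $M_{f2} - M_{f1}$ element-wise on the compact set.

```python
# Sketch: verify that f_bar(x) = f(x) - M_f1 x from (5) has a Jacobian in
# [0, M_f2 - M_f1] element-wise, as stated in (7), for the hypothetical drift above.
import numpy as np

def f(x):
    return np.array([-x[0] + x[1], -0.5 * x[1] + np.sin(x[0])])

# Analytic element-wise Jacobian bounds of f on C = [-2, 2]^2.
M_f1 = np.array([[-1.0, 1.0], [np.cos(2.0), -0.5]])
M_f2 = np.array([[-1.0, 1.0], [1.0, -0.5]])

def f_bar(x):
    return f(x) - M_f1 @ x           # shifted drift from (5)

def num_jac(fun, x, eps=1e-6):
    J = np.zeros((x.size, x.size))
    for j in range(x.size):
        d = np.zeros(x.size)
        d[j] = eps
        J[:, j] = (fun(x + d) - fun(x - d)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
ok = all(
    np.all(num_jac(f_bar, x) >= -1e-6)
    and np.all(num_jac(f_bar, x) <= (M_f2 - M_f1) + 1e-6)
    for x in rng.uniform(-2.0, 2.0, size=(500, 2))
)
print("Jacobian of f_bar within [0, M_f2 - M_f1] on all samples:", ok)
```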
Using the derivative bounds, a state estimator with three correction terms is designed as
$\dot{\hat{x}} = M_{f1}\hat{x} + M_{g1}u\,\hat{x} + \bar{f}\left[\hat{x} + H(y - C\hat{x})\right] + \bar{g}_u\left[\hat{x} + K(y - C\hat{x}), u\right] + L(y - C\hat{x}),$ (9)
where $\hat{x} \in \mathbb{R}^n$ is the estimate of $x$, $H \in \mathbb{R}^{n \times q}$, $K \in \mathbb{R}^{n \times q}$, and $L \in \mathbb{R}^{n \times q}$ are observer gains, $H(y - C\hat{x})$ and $K(y - C\hat{x})$ are nonlinear injection terms, and $L(y - C\hat{x})$ is a linear correction term. The estimation error is defined as
$e = x - \hat{x},$ (10)
and the estimation error dynamics are given by
$\dot{e} = (M_{f1} + M_{g1}u - LC)e + \bar{f}(x) + \bar{g}_u(x, u) - \bar{f}\left[\hat{x} + H(y - C\hat{x})\right] - \bar{g}_u\left[\hat{x} + K(y - C\hat{x}), u\right].$ (11)
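To make the structure of (9) concrete, the following sketch simulates the plant (1) and the observer (9) for the hypothetical system used above, with a constant control effectiveness $g = [0, 1]^\top$ (so that $M_{g1} = M_{g2} = 0$ and $\bar{g}_u$ reduces to $gu$) and with gains $H$ and $L$ chosen ad hoc rather than through the SDP design developed in this paper.

```python
# Sketch only: plant (1) vs. observer (9) for the hypothetical system
# f(x) = [-x1 + x2, -0.5*x2 + sin(x1)], g = [0, 1]^T (constant), C = [1, 0].
# Gains H and L are picked by hand here; the paper obtains them via SDP.
import numpy as np
from scipy.integrate import solve_ivp

M_f1 = np.array([[-1.0, 1.0], [np.cos(2.0), -0.5]])  # Jacobian lower bound of f
g = np.array([0.0, 1.0])
C = np.array([[1.0, 0.0]])
H = np.array([[0.5], [0.5]])                          # nonlinear injection gain (ad hoc)
L = np.array([[2.0], [1.0]])                          # linear correction gain (ad hoc)

def f(x):
    return np.array([-x[0] + x[1], -0.5 * x[1] + np.sin(x[0])])

def f_bar(x):
    return f(x) - M_f1 @ x

def u(t):
    return 0.5 * np.sin(t)                            # hypothetical bounded input

def dynamics(t, z):
    x, xh = z[:2], z[2:]
    innov = C @ x - C @ xh                            # y - C x_hat
    xdot = f(x) + g * u(t)                            # plant (1)
    # Observer (9): with constant g, the g_bar_u term reduces to g*u.
    xhdot = M_f1 @ xh + f_bar(xh + H @ innov) + g * u(t) + L @ innov
    return np.concatenate([xdot, xhdot])

sol = solve_ivp(dynamics, (0.0, 10.0), [1.0, -1.0, 0.0, 0.0], max_step=0.01)
print("final |e| =", np.linalg.norm(sol.y[:2, -1] - sol.y[2:, -1]))
```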
For convenience of notation, let $\bar{\phi}_f(t, e) := \bar{f}(x) - \bar{f}\left[\hat{x} + H(y - C\hat{x})\right]$, $\bar{\phi}_g(t, e, u) := \bar{g}_u(x, u) - \bar{g}_u\left[\hat{x} + K(y - C\hat{x}), u\right]$, $M_{ug} := M_g u$, $\bar{M}_{ug1} := 0_{n \times n \times m}$, and $\bar{M}_{ug2} := (M_{g2} - M_{g1})u$.
The differential mean value theorem (DMVT) in [29, Theorem 2.1] guarantees that the difference function $\bar{\phi}_f(t, e)$ is proportional to $x - \hat{x}$ and can be expressed as
$\bar{\phi}_f(t, e) = \bar{M}_f(I - HC)(x - \hat{x}),$ (12)
where $\bar{M}_f$ is a time-varying matrix that is always constrained to a compact set defined by $\bar{M}_{f1}$ and $\bar{M}_{f2}$ in (7). Similarly,
$\bar{\phi}_g(t, e, u) = \bar{M}_g u (I - KC)(x - \hat{x}),$ (13)
where the proportionality factor $\bar{M}_g$ is a time-varying three-dimensional array that is constrained to a compact set defined by $\bar{M}_{g1}$ and $\bar{M}_{g2}$ in (8).
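As a brief sanity check (not part of the paper), in the scalar case $n = q = 1$ the factorization in (12) is just the classical mean value theorem applied to $\bar{f}$: writing $z := \hat{x} + H(y - C\hat{x})$ and using $y = Cx$,
$\bar{\phi}_f(t, e) = \bar{f}(x) - \bar{f}(z) = \bar{f}'(\xi)(x - z) = \bar{f}'(\xi)(1 - HC)(x - \hat{x}),$
for some $\xi$ between $x$ and $z$, so that $\bar{M}_f = \bar{f}'(\xi) \in [0, \bar{M}_{f2}]$ by (7).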
IV. MULTIPLIER FORMULATION AND SECTOR CONDITIONS
In this section, the conditions sufficient for Lyapunov stability are derived by designing multiplier matrices that characterize the nonlinear functions $f$ and $g$. As shown in [11], using the multiplier matrix approach in the analysis and control of nonlinear systems, stability can be achieved if the conditions developed in this section are satisfied.
The DMVT implies that the difference functions $\bar{\phi}_f(t, e)$ and $\bar{\phi}_g(t, e, u)$ are bounded as
$\bar{M}_{f1}(I - HC)e \leq \bar{\phi}_f(t, e) \leq \bar{M}_{f2}(I - HC)e,$ (14)