developed using Lyapunov analysis are used to guarantee the stability of the state estimation error dynamics [11], [12]. The
state estimates are then used in a model-based reinforcement learning (MBRL) framework to design a controller that ensures
the stability of the closed-loop system during learning.
This observer architecture is motivated by the observer design technique developed in [10], which introduces a third observer gain to cancel nonconvex terms in the semidefinite condition. Compared with the extended Luenberger observer in [13], which has only a linear correction term and a nonlinear injection term, the method developed in [10] applies to a broader class of nonlinear systems under less conservative conditions while ensuring uniform asymptotic convergence. This paper offers a modification of the observer structure in [10], [14] with fewer restrictions on the class of nonlinear systems. The observer is then combined with an MBRL-based controller to optimize a given performance objective.
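For context, one representative structure in this family of observers (the matrices $A$, $G$, $H$, the nonlinearity $\phi$, and the gains $L$, $K$ below are assumptions of this sketch rather than the exact design of [10] or [14]) corrects both the linear dynamics and the argument of the nonlinearity with the output error:
\begin{equation*}
% Hedged sketch: the gains L and K would be chosen via semidefinite (LMI)
% conditions so that the estimation error dynamics admit a Lyapunov certificate.
\dot{\hat{x}} \;=\; A\hat{x} \;+\; G\,\phi\!\big(H\hat{x} + K\,(y - C\hat{x})\big) \;+\; L\,(y - C\hat{x}), \qquad \hat{y} = C\hat{x}.
\end{equation*}
Setting $K = 0$ leaves only the linear correction term; the additional injection gain inside $\phi$ gives the semidefinite synthesis extra degrees of freedom, which is one way to interpret the relaxed restrictions discussed above.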
The goal of the MBRL method is to learn an approximately optimal controller by approximating the value function, and subsequently the optimal policy, for an input-constrained nonlinear system. While adaptive optimal control methods have been studied extensively in the literature to solve the online optimal control problem [2], [6]–[9], [15]–[19], most existing results require full-state feedback. In this paper, an output feedback problem is solved for systems with a linear measurement model. Furthermore,
unlike the actor-critic MBRL methods popular in the literature [3], [20], this paper presents a critic-only structure, which requires the identification of fewer free parameters, to provide an approximate solution of the Hamilton–Jacobi–Bellman (HJB) equation. Lyapunov methods are used to show that the states of the system, the state estimation error, and the critic weights are locally uniformly ultimately bounded (UUB).
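As a rough illustration of the critic-only structure (the weights $\hat{W}_c$, basis $\sigma$, saturation level $\bar{u}$, gain $k_c$, and running cost $r$ below are placeholders introduced for exposition, not the exact design developed in Section V), the critic approximates the value function and the saturated policy is recovered directly from it, so no separate actor weights need to be identified:
\begin{align*}
% Hedged sketch of a critic-only scheme for control-affine dynamics
% \dot{x} = f(x) + g(x)u with input constraint ||u|| <= \bar{u}.
\hat{V}(\hat{x}) &= \hat{W}_c^{\top}\sigma(\hat{x}), \\
\hat{u}(\hat{x}) &= -\bar{u}\,\tanh\!\Big(\tfrac{1}{2\bar{u}}\,R^{-1}g(\hat{x})^{\top}\nabla\sigma(\hat{x})^{\top}\hat{W}_c\Big), \\
\delta &= \hat{W}_c^{\top}\nabla\sigma(\hat{x})\big(f(\hat{x}) + g(\hat{x})\hat{u}\big) + r(\hat{x},\hat{u}), \\
\dot{\hat{W}}_c &= -k_c\,\frac{\omega}{1 + \omega^{\top}\omega}\,\delta, \qquad \omega = \nabla\sigma(\hat{x})\big(f(\hat{x}) + g(\hat{x})\hat{u}\big),
\end{align*}
where the dynamics are control-affine, $r$ includes a nonquadratic input penalty so that the $\tanh$ policy respects the input constraint, and the weight update is a normalized gradient step on the Bellman residual $\delta$ evaluated along the state estimate $\hat{x}$.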
The main contributions are as follows:
1) This paper uses an observer with a bounded Jacobian for fast state estimation and an online, critic-only RL architecture to learn a controller that keeps the input-constrained nonlinear system stable during the learning phase. This novel architecture differs from the existing NN observers in [17], [21]–[25], whose convergence analysis relies solely on negative terms that result from a σ-modification-like term added to the weight update laws; as a result, similar to adaptive control, convergence of the observer weights to their true values cannot be expected, and convergence of the state estimates to the true states is not robust to disturbances and approximation errors.
2) This paper proposes a robust output feedback RL method for a nonlinear control-affine system with a general C matrix. Unlike most NN observers in the literature [17], [23], [24], the method in this paper does not require restrictions on the form or rank of the C matrix. A drawback of existing state feedback control methods, such as [23], is that the substitution $x = C^{+}y$ implicitly restricts the proof to systems where the number of outputs is at least as large as the number of states, which defeats the purpose of output feedback control (see the illustration below).
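With fewer outputs than states, the Moore–Penrose pseudoinverse cannot recover the state, as the following toy example (numbers chosen purely for illustration) makes explicit:
\begin{equation*}
% n = 2 states, p = 1 output: C^{+}C is a projection, not the identity.
C = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad
C^{+} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad
C^{+}y = C^{+}Cx = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}x \neq x \quad \text{whenever } x_2 \neq 0.
\end{equation*}
In general, $C^{+}C = I_n$ (and hence $x = C^{+}y$) holds only if $\operatorname{rank}(C) = n$, which forces the number of outputs to be at least the number of states; the method in this paper avoids this substitution altogether.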
The rest of the paper is organized as follows: Section II contains the problem formulation, Section III introduces the state estimator/observer, Section IV presents the multiplier matrices and sector conditions, Section V develops the control design using MBRL methods, Section VI contains the stability analysis of the developed architecture, Section VII presents simulation results, and Section VIII concludes the paper.