
Attitude Control of Highly Maneuverable
Aircraft Using an Improved Q-learning
Mohsen Zahmatkesh ∗ Seyyed Ali Emami ∗
Afshin Banazadeh ∗ Paolo Castaldi ∗∗
∗ Aerospace Engineering Department, Sharif University of Technology,
Tehran, Iran (e-mail: banazadeh@sharif.edu).
∗∗ Department of Electrical, Electronic and Information Engineering
"Guglielmo Marconi", University of Bologna, Via Dell'Università 50,
Cesena, Italy (e-mail: paolo.castaldi@unibo.it)
Abstract: Attitude control of a novel regional truss-braced wing aircraft with low stability
characteristics is addressed in this paper using Reinforcement Learning (RL). In recent years,
RL has been increasingly employed in challenging applications, particularly autonomous
flight control. However, a significant obstacle for discrete RL algorithms is the dimension
limitation of the state-action table, along with the difficulty of defining the elements of the
RL environment. To address these issues, a detailed mathematical model of the
aforementioned aircraft is first developed to serve as the RL environment. Subsequently,
Q-learning, the most prevalent discrete RL algorithm, is implemented in both the Markov
Decision Process (MDP) and Partially Observable Markov Decision Process (POMDP)
frameworks to control the longitudinal mode of the air vehicle. To eliminate the residual
fluctuations caused by discrete action selection, while simultaneously tracking variable pitch
angles, a Fuzzy Action Assignment (FAA) method is proposed that generates continuous
control commands from the trained Q-table. Accordingly, it is shown that, by defining an
accurate reward function and observing all crucial states (which is equivalent to satisfying the
Markov property), the performance of the proposed control system surpasses that of a
well-tuned Proportional-Integral-Derivative (PID) controller.
Keywords: Reinforcement Learning, Q-learning, Fuzzy Q-learning, Attitude Control,
Truss-braced Wing, Flight Control
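Although the paper's full training setup appears in later sections, a minimal sketch of the
tabular Q-learning update underlying the approach may help fix ideas. This is illustrative
Python, not the authors' code; the discretization sizes and hyperparameters (n_states,
n_actions, alpha, gamma, eps) are placeholder assumptions:

    import numpy as np

    # Placeholder discretization sizes and hyperparameters (not from the paper)
    n_states, n_actions = 500, 7
    alpha, gamma, eps = 0.1, 0.99, 0.1
    Q = np.zeros((n_states, n_actions))

    def epsilon_greedy(s):
        # Explore with probability eps; otherwise exploit the current Q-table
        if np.random.rand() < eps:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    def q_update(s, a, r, s_next):
        # Standard temporal-difference Q-learning update
        td_target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (td_target - Q[s, a])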
1. INTRODUCTION
The aviation industry is growing rapidly, driven by world demands such as reducing fuel
burn, emissions, and cost, as well as providing faster and safer flight. This motivates the
advent of new airplanes with novel configurations. In addition, scope clause agreements
limit the number of seats per aircraft and restrict flight outsourcing in order to protect
union pilot jobs. This has led to increased production of Modern Regional Jet (MRJ)
airplanes. In this context, safe flight becomes even more vital given increasingly crowded
airspace and new aircraft configurations capable of flying faster. The truss-braced wing
aircraft is one of the revived high-performance configurations, and it has attracted
significant attention from both academia (Li et al., 2022) and industry (Sarode, 2022) due
to its fuel-burn efficiency. As a result, there is a growing need for reliable modeling and
simulation, handling-quality analysis, and stability analysis for such configurations (Nguyen
and Xiong, 2022; Zavaree et al., 2021), while very few studies have addressed flight control
design for this aircraft.
Over the last decades, various classical methods for aircraft attitude control have been
developed to enhance control performance. However, the most significant deficiency of
these approaches is their limited capability to deal with unexpected flight conditions, while
they typically require a detailed dynamic model of the system.
Recently, the application of Reinforcement Learning (RL) has been extended to real-world
problems, particularly flight control design (Emami et al., 2022). Generally, there are two
main frameworks for incorporating RL in the control design process, namely high-level and
low-level control systems. In Xi et al. (2022), a Soft Actor-Critic (SAC) algorithm was
applied to a path planning problem for a long-endurance solar-powered UAV with
energy-consumption considerations. Another work (Bøhn et al., 2021) concentrated on the
inner-loop control of a Skywalker X8 using SAC and compared it with a PID controller. In
Yang et al. (2020), an ANN-based Q-learning horizontal trajectory tracking controller was
developed based on the MDP model of an airship with favorable stability characteristics. In
contrast, Proximal Policy Optimization (PPO) was utilized in Hu et al. (2022) for attitude
control of a conventional fixed-wing aircraft with strong dynamic coupling in stall
conditions; the PPO agent converged after 100,000 episodes. It is also worth noting that
PPO performance is adequate for optimizing PID controllers (Dammen, 2022).
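Complementing the abstract's description, one plausible sketch of the Fuzzy Action
Assignment (FAA) idea is given below: after training, the discrete actions are blended
according to their Q-values at the current state, yielding a continuous command instead of
switching between neighboring discrete deflections. The softmax-style weighting and the
elevator action grid are assumptions for illustration; the paper's actual membership
functions are not specified in this excerpt:

    import numpy as np

    # Hypothetical discrete elevator deflections in radians;
    # the paper's action set may differ
    elevator_actions = np.linspace(-0.35, 0.35, 7)

    def faa_command(q_row, beta=5.0):
        # Softmax-style fuzzy weights over the Q-values of the current state;
        # a larger beta approaches the purely greedy (discrete) choice
        w = np.exp(beta * (q_row - q_row.max()))
        w /= w.sum()
        # Continuous command: Q-weighted average of the discrete actions
        return float(np.dot(w, elevator_actions))

Given a trained table Q and a discretized state s, u = faa_command(Q[s]) produces a
smooth elevator command, which is the fluctuation-suppression effect the abstract
attributes to FAA.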