Attitude Control of Highly Maneuverable
Aircraft Using an Improved Q-learning
Mohsen Zahmatkesh ∗ Seyyed Ali Emami ∗
Afshin Banazadeh ∗ Paolo Castaldi ∗∗
∗ Aerospace Engineering Department, Sharif University of Technology,
Tehran, Iran (e-mail: banazadeh@sharif.edu).
∗∗ Department of Electrical, Electronic and Information Engineering
"Guglielmo Marconi", University of Bologna, Via Dell'Università 50,
Cesena, Italy (e-mail: paolo.castaldi@unibo.it)
Abstract: Attitude control of a novel regional truss-braced wing aircraft with low stability
characteristics is addressed in this paper using Reinforcement Learning (RL). In recent years,
RL has been increasingly employed in challenging applications, particularly autonomous
flight control. However, a significant predicament confronting discrete RL algorithms is the
dimension limitation of the state-action table and difficulties in defining the elements of the
RL environment. To address these issues, in this paper, a detailed mathematical model of the
mentioned aircraft is first developed to shape an RL environment. Subsequently, Q-learning,
the most prevalent discrete RL algorithm, will be implemented in both the Markov Decision
Process (MDP) and Partially Observable Markov Decision Process (POMDP) frameworks to
control the longitudinal mode of the air vehicle. In order to eliminate residual fluctuations
that are a consequence of discrete action selection, and simultaneously track variable pitch
angles, a Fuzzy Action Assignment (FAA) method is proposed to generate continuous control
commands using the trained Q-table. Accordingly, it will be proved that by defining an accurate
reward function, along with observing all crucial states (which is equivalent to satisfying the
Markov property), the performance of the introduced control system surpasses that of a well-tuned
Proportional–Integral–Derivative (PID) controller.
Keywords: Reinforcement Learning, Q-learning, Fuzzy Q-learning, Attitude Control,
Truss-braced Wing, Flight Control
1. INTRODUCTION
The aviation industry is growing rapidly due to
world demands such as reducing fuel burn, emissions, and
cost, as well as providing faster and safer flights. This
motivates the advent of new airplanes with novel config-
urations. In addition, the scope clause agreement limits
the number of seats per aircraft and restricts flight outsourcing
in order to protect union pilot jobs. This factor leads to
increased production of the Modern Regional Jet (MRJ)
airplane. In this regard, the importance of a safe flight
becomes more vital considering more crowded airspace
and new aircraft configurations having the ability to fly
faster. The truss-braced wing aircraft is one of the recently revived
high-performance configurations, which has attracted significant
attention from both academia (Li et al., 2022) and
industry (Sarode, 2022) due to its fuel-burn efficiency. As
a result, there is a growing need for reliable modeling
and simulation, handling-quality analysis, and stability
analysis for such configurations (Nguyen and
Xiong, 2022; Zavaree et al., 2021), while very few studies
have addressed flight control design for this aircraft.
In recent decades, various classical methods for aircraft
attitude control have been developed to enhance control
performance. However, the most significant deficiency of
these approaches is their insufficient capability to deal with
unexpected flight conditions, while they typically require a
detailed dynamic model of the system.
Recently, the application of Reinforcement Learning (RL)
has been extended to real problems, particularly flight
control design (Emami et al., 2022). Generally, there are
two main frameworks to incorporate RL in the control
design process, i.e., the high-level and low-level control
systems. In Xi et al. (2022), a Soft Actor-Critic (SAC)
algorithm was implemented in a path-planning problem
for a long-endurance solar-powered UAV with energy-consumption
considerations. Another work (Bøhn et al.,
2021) concentrated on the inner-loop control of a Skywalker
X8 using SAC and compared it with a PID controller.
troller. In Yang et al. (2020) a ANN based Q-learning hor-
izontal trajectory tracking controller was developed based
on the MDP model of an airship with fine stability charac-
teristics. Apart from the previous method, Proximal Policy
Optimization (PPO) was utilized in Hu et al. (2022) for
orientation control of a common strongly dynamic coupled
fixed-wing aircraft in the stall condition. The PPO was
successful to be converged after 100000 episodes. However,
useful to say that the PPO performance is adequate to
optimize PID controllers (Dammen, 2022).
There are several papers on maneuvering flight, such as landing-phase
control in both the inner and outer loops. For instance,
in Wang et al. (2018), Deep Q-learning (DQL) is used to
guide an aircraft to land on a desired field. In Yuan et al.
(2019), a Deep Deterministic Policy Gradient (DDPG) was
implemented on a UAV to control both path tracking
along the landing glide slope and the attitude during the landing
flare. Similarly, a DDPG method is used in Tang and Lai
(2020) to control the outer loop of a landing procedure
in the presence of wind disturbance. All of the works referred
to so far rely on ANNs in order to converge. However, to
the best of our knowledge, only limited research has addressed
attitude control using discrete RL without resorting to
ANNs. In Richter et al. (2022), a Q-learning algorithm was
implemented to control the longitudinal and lateral angles of a
general aviation aircraft (Cessna 172); that airplane, however,
benefits from suitable stability characteristics, and the desired
angles are zero. There are also some fuzzy adaptations of the work of Watkins and
Dayan (1992), such as Glorennec and Jouffe (1997), where
the Q-functions and the action selection strategy are inferred
from fuzzy rules. Also, Er and Deng (2004) proposed a
dynamic fuzzy Q-learning for online and continuous tasks
in mobile robots.
Motivated by the above discussions, the main contributions
of the current study can be summarized as follows:
a) A truss-braced wing aircraft (Chaka 50, Fig. 1) with poor
stability characteristics has been carefully selected for
attitude control, in response to the demands of the global aviation
community. b) It will be shown that the Q-learning
performance in control problems strictly depends on the reward
function and the problem definition, so that prosperous results
can be achieved even for poorly stable plants with many degrees of
freedom. c) The performance of Q-learning
will be examined in both MDP and POMDP
problem formulations. In addition, the learned Q-table is used
to generate continuous elevator deflections through a Fuzzy
Action Assignment (FAA) method, illustrating the capability of the
Q-table to track both fixed and variable desired pitch angles.
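For concreteness, the core of the tabular Q-learning scheme referenced above, together with the idea behind FAA, can be sketched as follows. This is a minimal illustration in Python; the discretization sizes, the hyperparameters, and the simplified FAA blending shown here are assumptions for exposition, not the values or exact fuzzy rules used in this study.

```python
import numpy as np

# Minimal tabular Q-learning sketch (sizes and hyperparameters are
# illustrative placeholders, not this paper's values).
n_states, n_actions = 500, 11           # e.g., discretized pitch-error states x elevator bins
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount factor, exploration rate

Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def select_action(s):
    """Epsilon-greedy selection over the discrete elevator commands."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def q_update(s, a, reward, s_next):
    """One-step Q-learning update (Watkins and Dayan, 1992)."""
    td_target = reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

def faa_action(mu, candidate_actions):
    """Conceptual Fuzzy Action Assignment: blend discrete candidate actions,
    weighted by fuzzy memberships mu, into one continuous elevator command
    (the paper's exact fuzzy rules are not reproduced here)."""
    mu = np.asarray(mu, dtype=float)
    return float(mu @ np.asarray(candidate_actions) / mu.sum())
```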
Fig. 1. Chaka MRJ Family (Zavaree et al., 2021)
2. MODELING AND SIMULATION
The nonlinear conservation equations of linear and angular
momentum are used for modeling and simulation, following
Zipfel (2014).
m"˙u
˙v
˙w#+m"0r q
r0p
q p 0#"u
v
w#="FAx+Tx
FAy+Ty
FAz+Tz#+m"gx
gy
gz#
(1)
"Ix0 0
0Iy0
0 0 Iz#"˙p
˙q
˙r#+"0r q
r0p
q p 0#"Ix0 0
0Iy0
0 0 Iz#"p
q
r#="LA+T
MA+T
NA+T#
(2)
where the moments produced by the thrust are assumed to be zero
and the thrust acts only in the $x$ direction; therefore,
$L_T = M_T = N_T = F_{T_y} = F_{T_z} = 0$. The aerodynamic
forces and moments in the body axes are as follows:
"FAx
FAz
MA#B
= ¯qS¯c
cL0cLαcL˙αcLucLqcLδE
cD0cDαcD˙αcDucDqcDδE
cm0cmαcm˙αcmucmqcmδE
1
α
˙α¯c
2VP1
u
VP1
q¯c
2VP1
δE
(3)
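As a minimal sketch, (3) reduces to a matrix-vector product once the 3x6 derivative matrix is assembled (e.g., from Table 1). The function below is illustrative; note that, as printed, all three rows of (3) are scaled by $\bar{q}S\bar{c}$, whereas in common practice the force rows would be scaled by $\bar{q}S$ only.

```python
import numpy as np

def aero_forces_moment(C, alpha, alpha_dot, u, q, delta_E, qbar, S, cbar, V_P1):
    """Evaluate Eq. (3): C is the 3x6 matrix of stability/control derivatives
    with rows ordered (lift, drag, pitching moment) as in the equation."""
    x = np.array([1.0,
                  alpha,
                  alpha_dot * cbar / (2.0 * V_P1),
                  u / V_P1,
                  q * cbar / (2.0 * V_P1),
                  delta_E])
    # As printed, every row is scaled by qbar*S*cbar; conventionally the force
    # rows would use qbar*S and only the moment row qbar*S*cbar.
    F_Ax, F_Az, M_A = qbar * S * cbar * (C @ x)
    return F_Ax, F_Az, M_A
```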
The gravity acceleration vector in the body axes, which appears
in (1), is as follows:
"gx
gy
gz#B
=(gsin(θ)
gcos(θ) sin(φ)
gcos(θ) cos(φ))(4)
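With the force ingredients of (3) and (4) available, the momentum equations (1)-(2) can be written as a state-derivative function. The sketch below is illustrative; the signature and names are assumptions, and the mass and inertia inputs are placeholders rather than the Chaka-50 data of Table 3.

```python
import numpy as np

def rigid_body_derivatives(uvw, pqr, F_body, M_body, m, I_diag):
    """Eqs. (1)-(2). F_body: total body-axis force (aerodynamic + thrust
    + m*gravity); M_body: total body-axis moment; I_diag: (Ix, Iy, Iz)."""
    p, q, r = pqr
    # Skew-symmetric matrix of the angular rates (the cross-product terms).
    omega_cross = np.array([[0.0,  -r,    q ],
                            [ r,   0.0,  -p ],
                            [-q,    p,   0.0]])
    # Eq. (1): m*(uvw_dot + omega x uvw) = F  =>  solve for the rates.
    uvw_dot = F_body / m - omega_cross @ np.asarray(uvw)
    # Eq. (2): I*pqr_dot + omega x (I*pqr) = M, with a diagonal inertia tensor.
    I = np.diag(I_diag)
    pqr_dot = np.linalg.solve(I, np.asarray(M_body) - omega_cross @ (I @ np.asarray(pqr)))
    return uvw_dot, pqr_dot
```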
In addition, the rotational kinematic equations are needed
to transform from the body to the inertial coordinates:
$$
\begin{bmatrix} \dot{\phi} \\ \dot{\theta} \\ \dot{\psi} \end{bmatrix}
= \begin{bmatrix}
1 & \sin\phi\tan\theta & \cos\phi\tan\theta \\
0 & \cos\phi & -\sin\phi \\
0 & \sin\phi/\cos\theta & \cos\phi/\cos\theta
\end{bmatrix}
\begin{bmatrix} p \\ q \\ r \end{bmatrix}
\tag{5}
$$
Using (1) and (5), the velocity vector in the inertial coordinates
can be obtained. For brevity, sin and cos are abbreviated as s and c, respectively:
"˙x
˙y
˙z#I
="cψcθcψsθsϕsψcϕcψsθcϕ+ sψsϕ
sψcθsψsθsϕ+ cψcϕsψsθcϕcψsϕ
sθcθsϕcθcϕ#"u
v
w#B
(6)
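Equations (4)-(6) translate directly into code. A minimal sketch (standard flat-Earth kinematics; the function names are chosen here for illustration) is:

```python
import numpy as np

def euler_rates(phi, theta, pqr):
    """Eq. (5): Euler-angle kinematics (singular at theta = +/-90 deg)."""
    p, q, r = pqr
    return np.array([p + (q * np.sin(phi) + r * np.cos(phi)) * np.tan(theta),
                     q * np.cos(phi) - r * np.sin(phi),
                     (q * np.sin(phi) + r * np.cos(phi)) / np.cos(theta)])

def body_to_inertial_velocity(phi, theta, psi, uvw):
    """Eq. (6): rotate the body-axis velocity into the inertial frame."""
    sph, cph = np.sin(phi), np.cos(phi)
    sth, cth = np.sin(theta), np.cos(theta)
    sps, cps = np.sin(psi), np.cos(psi)
    R = np.array([[cps * cth, cps * sth * sph - sps * cph, cps * sth * cph + sps * sph],
                  [sps * cth, sps * sth * sph + cps * cph, sps * sth * cph - cps * sph],
                  [-sth,      cth * sph,                   cth * cph]])
    return R @ np.asarray(uvw)
```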
The stability and control derivatives of the Chaka-50 are
reported in Zavaree et al. (2021) based on Computational
Fluid Dynamics (CFD); a summary of these derivatives
for two flight phases is given in Table 1. Before the six-degree-of-freedom
(6DoF) simulation using equations (1)-(2), the
trim conditions in a wings-level flight are calculated for
simulation verification based on the trim equations in Roskam
(1998). In the drag equation, the absolute values of $\delta_{E_1}$, $i_{H_1}$, and $\alpha_1$ are
considered, while the flight path angle $\gamma_1$, the motor installation
angle $\phi_T$, and the horizontal tail incidence angle $i_H$ are zero.
The elevator deflection $\delta_{E_1}$ and the required thrust $T_1$ for
trim flight are obtained and shown in Table 2; these values are
important for validating the 6DoF simulation.
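A hypothetical sketch of such a wings-level trim computation is given below: the trim angle of attack, elevator deflection, and thrust are found by driving the lift, drag, and pitching-moment residuals to zero. The coefficient model and every numerical value here are placeholders, not the Chaka-50 data of Tables 1-3.

```python
import numpy as np
from scipy.optimize import fsolve

def trim_residuals(x, m, g, qbar, S, cbar, c):
    """Longitudinal trim residuals for wings-level flight (Roskam-style)."""
    alpha, delta_E, T = x
    C_L = c['CL0'] + c['CLa'] * alpha + c['CLde'] * delta_E
    # Absolute values of alpha and delta_E in the drag build-up, as in the text.
    C_D = c['CD0'] + c['CDa'] * abs(alpha) + c['CDde'] * abs(delta_E)
    C_m = c['Cm0'] + c['Cma'] * alpha + c['Cmde'] * delta_E
    return [qbar * S * C_L - m * g,   # lift balances weight
            T - qbar * S * C_D,       # thrust balances drag
            qbar * S * cbar * C_m]    # zero pitching moment

coeffs = dict(CL0=0.3, CLa=5.0, CLde=0.4, CD0=0.03, CDa=0.3, CDde=0.05,
              Cm0=0.05, Cma=-1.2, Cmde=-1.5)   # illustrative values only
qbar = 0.5 * 0.4135 * 160.0**2                 # assumed air density and airspeed
alpha1, deltaE1, T1 = fsolve(trim_residuals, [0.05, 0.0, 2.0e4],
                             args=(45000.0, 9.81, qbar, 100.0, 3.0, coeffs))
```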
Atmospheric Disturbance and Sensor Measurement Noise

This research utilizes the Dryden atmospheric turbulence model
owing to its simple mathematical formulation:
$$
G_w(s) = \sigma_w \sqrt{\frac{L_w}{\pi u_1}} \;
\frac{1 + \sqrt{3}\,\dfrac{L_w}{u_1}\,s}{\left(1 + \dfrac{L_w}{u_1}\,s\right)^{2}}
\tag{7}
$$
This model is applied in the $w$ direction, where $L_w = h = 100$ m,
$\sigma_w = 10$, and $u_1 = 160$ m/s. In addition, the sensor
noise is defined as 10% of the sensor measurement. The
geometric, mass, and moment-of-inertia data are given in
Table 3.
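A sketch of how the shaping filter (7) and the stated sensor noise might be simulated is shown below, using the parameter values just listed; the sampling step and the use of scipy.signal are implementation assumptions.

```python
import numpy as np
from scipy import signal

# Dryden vertical-gust filter of Eq. (7) with the stated parameters.
L_w, sigma_w, u_1 = 100.0, 10.0, 160.0
K = sigma_w * np.sqrt(L_w / (np.pi * u_1))
tau = L_w / u_1
# G_w(s) = K * (1 + sqrt(3)*tau*s) / (1 + tau*s)^2, driven by white noise.
G_w = signal.TransferFunction([K * np.sqrt(3.0) * tau, K], [tau**2, 2.0 * tau, 1.0])

dt, T = 0.01, 10.0                                 # assumed sampling step and horizon
t = np.arange(0.0, T, dt)
rng = np.random.default_rng(0)
white = rng.standard_normal(len(t)) / np.sqrt(dt)  # band-limited white noise input
_, w_gust, _ = signal.lsim(G_w, white, t)          # vertical gust time history

def noisy_measurement(y):
    """Sensor noise modeled as 10% of the measured value, as stated above."""
    return y * (1.0 + 0.1 * rng.standard_normal())
```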