Communication-Enabled Deep Reinforcement Learning to Optimise
Energy-Efficiency in UAV-Assisted Networks
Babatunji Omoniwaᵃ,*, Boris Galkinᵇ and Ivana Dusparicᵃ
ᵃTrinity College Dublin, Dublin, Ireland
ᵇTyndall National Institute, Cork, Ireland
ARTICLE INFO
Keywords:
Deep Reinforcement Learning
Energy Efficiency
UAV networks
Wireless Connectivity
Abstract
Unmanned aerial vehicles (UAVs) are increasingly deployed to provide wireless connectivity to static
and mobile ground users in situations of increased network demand or points of failure in existing
terrestrial cellular infrastructure. However, UAVs are energy-constrained and experience the challenge
of interference from nearby UAV cells sharing the same frequency spectrum, thereby impacting the
system’s energy efficiency (EE). Recent approaches optimise the system’s EE via the trajectories
of UAVs serving only static ground users, neglecting mobile users. Several others
neglect the impact of interference from nearby UAV cells, assuming an interference-free network
environment. Furthermore, some works assume global spatial knowledge of ground users’ location
via a central controller (CC) that periodically scans the network perimeter and provides real-time
updates to the UAVs for decision-making. However, this assumption may be unsuitable in disaster
scenarios since it requires significant information exchange between the UAVs and CC. Moreover, it
may not be possible to track users’ locations in a disaster scenario. Despite growing research interest
in decentralised over centralised control of UAVs, direct collaboration among UAVs to improve
coordination while optimising the system’s EE has not been adequately explored. To address this,
we propose a direct collaborative communication-enabled multi-agent decentralised double deep Q-
network (CMAD–DDQN) approach. The CMAD–DDQN is a collaborative algorithm that allows
UAVs to explicitly share their telemetry via existing 3GPP guidelines by communicating with their
nearest neighbours. This allows the agent-controlled UAVs to optimise their 3D flight trajectories
by filling up knowledge gaps and converging to optimal policies. We account for the mobility of
ground users, the UAVs’ limited energy budget and interference in the environment. Our approach
can maximise the system’s EE without hampering performance gains in the network. Simulation
results show that the proposed approach outperforms existing baselines in terms of maximising the
system’s EE without degrading coverage performance in the network. The CMAD–DDQN approach
outperforms the MAD–DDQN that neglects direct collaboration among UAVs, the multi-agent deep
deterministic policy gradient (MADDPG) and random policy approaches that consider a 2D UAV
deployment design while neglecting interference from nearby UAV cells by about 15%, 65% and
85%, respectively.
1. Introduction
It is envisaged that machine-to-machine (M2M) connec-
tions will grow 2.4-fold, from 6.1 billion in 2018 to 14.7
billion by 2023 [1]. Unmanned aerial vehicles (UAVs) can
play a vital role in supporting the Internet-of-Things (IoT)
networks by providing ubiquitous connectivity to static and
mobile ground devices [2]. For instance, the deployment
of UAVs to provide wireless connectivity to ground users
is gaining significant research attention [3]–[13]. UAV deployments can complement cellular networks by accommodating the projected growth of connected things.
This work was supported, in part, by Science Foundation Ireland (SFI) Grants No. 16/SP/3804 (Enable) and 13/RC/2077_P2 (CONNECT Phase 2), and by the National Natural Science Foundation of China (NSFC) under the SFI-NSFC Partnership Programme, Grant Number 17/NSFC/5224. Dr. Galkin’s work was funded by SFI under the MISTRAL project, grant number 21/FIP/DO/9949, as well as the project "GUARD: Drug Interdiction Using Smart Drones", funded under Enterprise Ireland’s Disruptive Technologies Innovation Fund, Project ref. DT20200268A.
Corresponding author: omoniwab@tcd.ie (B. Omoniwa); boris.galkin@tyndall.ie (B. Galkin); ivana.dusparic@scss.tcd.ie (I. Dusparic).
ORCID(s): 0000-0003-3508-3689 (B. Omoniwa); 0000-0002-6755-7781 (B. Galkin); 0000-0003-0621-5400 (I. Dusparic).
Specifically, UAVs’ adjustable altitude and mobility make them suitable candidates for flexible deployment as aerial base
stations in the event of increased network demand, points-
of-failure in existing terrestrial infrastructure, or emergen-
cies [5,7]. For example, UAVs may be deployed to provide
coverage to users in post-disaster scenarios where existing
terrestrial infrastructures are damaged [14]. However, it is
challenging to conserve the energy of UAVs during pro-
longed coverage tasks, considering their limited onboard
battery capacity. UAVs may deplete energy during propul-
sion for flying and hovering, and during communication [8].
To derive the full benefit of UAV deployments, recent research has focused on addressing several key challenges, including 3D trajectory optimisation [11,15,16], energy efficiency (EE) optimisation [3,4], and coverage optimisation [4,5,16]. As energy-constrained UAVs fly
in the sky, they may encounter interference from nearby
UAV cells or other access points sharing the same frequency
band, thereby affecting the system’s EE [17]. There has been
significant research on optimising the EE in UAV-assisted
networks. However, many of these works neglect the impact
of interference on the system’s performance.
Omoniwa B., Galkin B. & Dusparic I.: Preprint submitted to Elsevier Page 1 of 16
arXiv:2210.00041v2 [cs.MA] 27 Jun 2023
Compared with a terrestrial cellular communication network, channel modelling for an airborne, UAV-assisted wireless system is more challenging due to the mobility and direct line-of-sight (LoS) communication links from nearby
UAVs [18]. Furthermore, the adoption of UAVs for com-
munication may require jointly finding the optimal 3D de-
ployment plan, energy and interference management strat-
egy [6]. Crucially, UAVs require robust strategies to provide
ubiquitous wireless coverage to static and mobile ground
users in this dynamic environment. Unlike previous work
that assumes global spatial knowledge of ground users
location through a central controller that periodically scans
the network perimeter and provides real-time updates to
the UAVs for decision-making, we focus on a decentralised
approach suitable in emergency scenarios where there may
be service outage due to failure in the controller, or loss
of UAVs’ control packets due to traffic congestion in the network. Moreover, in such scenarios, it is difficult to keep
track of the location of all ground users in real time. To
simplify the model, recent approaches that optimise the
system’s EE consider a 2D trajectory optimisation design of
UAVs serving static users in an interference-free network
environment. This may be based on the assumption that
each operating UAV is assigned a unique frequency band.
However, this assumption is impractical as radio spectrum
is a scarce resource. Hence, we assume that UAVs serving as aerial base stations may have to share the same frequency band, which introduces the challenge of interference in the shared network environment. Consequently, UAVs
require robust strategies to optimise their flight trajectory
while providing coverage to ground users in a dynamic en-
vironment. Multi-Agent Reinforcement Learning (MARL)
has been shown to perform well in decision-making tasks
in such a dynamic environment [3,4,15]. To improve the
performance of the decentralised control, several methods
have been studied [25]. In this work, we adopt a MARL
approach and propose a direct collaborative communication-
enabled multi-agent decentralised double deep Q-network
(CMAD–DDQN) algorithm to maximise the system’s EE
by optimising the 3D trajectory of each UAV, the energy
consumed and the number of connected static and mobile
ground users over a series of time-steps, while taking into
account the impact of interference from nearby UAV cells.
In our previous work [5], we considered a decentralised
MARL where there was no direct collaboration among
UAVs and other agents are treated as a part of the en-
vironment, with the reward of each agent reflecting the
coverage performance in its neighbourhood. However, the
approach [5] ignores the potential benefit of direct collabo-
ration among agents. Moreover, finding a globally optimal
solution for agents with partial information is known to be
intractable [20]. As an extension to our prior work [3], we
leverage agents’ capability to communicate with neighbours
to maximise the system’s EE by jointly optimising the
number of connected ground users and the energy con-
sumption in the network. The incorporation of collaborative
algorithms into MARL can allow the agents to assist each
other in filling the knowledge gaps by exchanging infor-
mation that could improve the decision-making of UAVs
over a series of time-steps [21]. However, several real-time
applications place considerable restrictions on communi-
cation, especially in terms of both throughput and latency.
Nevertheless, communication has extensively been used to
address the non-stationarity issue in the multi-agent learning
process [22].
Multi-agent learning is challenging in itself, requiring
agents to learn their policies while taking into account the
consequences of the actions of others. The authors in [4,
11,12] proposed a multi-agent deep deterministic policy
gradient (MADDPG) approach to improve the system’s EE
as UAVs hover at fixed altitudes while providing coverage
to static ground users in an interference-free network envi-
ronment. This problem becomes even more challenging in
an interference-limited network environment, where inter-
ference from nearby UAV cells impacts the system’s EE.
Hence, we propose a direct collaborative communication-
enabled multi-agent decentralised double deep Q-network
(CMAD–DDQN) approach where each agent relies on its
local observations, as well as the information it receives from
its nearby UAVs for decision-making. The communicated
information from the nearby UAVs will contain the number
of connected ground users, instantaneous energy value, and
distances from nearby UAVs in each time step. We propose
an approach where each agent executes actions based on
state information. We assume a two-way communication
link among neighbouring UAVs [23]. Although the 3GPP
system provides a methodology to set up and optimise neigh-
bour relations with little or no human intervention [24]
and to allow a 3rd party to request and obtain real-time
monitored status information (e.g., position, communication
link status, power consumption) of a UAV [23], to the best of
our knowledge this work is the first to investigate the impact of
collaborations on the system’s EE using the communication
mechanism based on the existing 3GPP standard [24]. This
paper has three main contributions given as follows:
• We propose a direct collaborative CMAD–DDQN approach that relies on local observations from each UAV
and the explicitly-communicated information from its
neighbours for decision-making. We adopt a collaborative
algorithm based on an existing 3GPP standard [24] that
allows agents to collaborate by exchanging information
with their nearest neighbours to improve the system’s EE
by jointly optimising each UAV’s 3D trajectory, the num-
ber of connected ground users, and the energy consumed
by the UAVs in a shared dynamic environment.
• We consider a realistic model of the agent’s environment,
taking into consideration the dynamic and interference-
limited nature of the wireless environment. Unlike in
previous work that consider the deployment of static
users [4] or fully synthetic ground users’ distribution [3],
we consider a real-world deployment of static and mobile
end-users in an area of Dublin, Ireland. Furthermore, we
leverage widely used mobility models (the random walk
(RW), random waypoint (RWP) and the Gauss–Markov
(GMM) mobility models) [38,39] to depict pedestrian
movements.
• We evaluated the proposed CMAD–DDQN approach by
comparing it with the MAD–DDQN [3] that ignores
direct collaboration among UAVs, the MADDPG [4] that
considers a 2D UAV deployment design while neglecting
interference from nearby UAV cells, and the random
policy. Results show that our proposed approach can
significantly improve the total system’s EE while jointly
optimising the 3D trajectory, number of connected users,
and the energy consumed by the UAVs serving ground
users under a strict energy budget.
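The neighbour information exchange underpinning CMAD–DDQN can be illustrated with a minimal sketch. The message fields follow the description above (number of connected ground users, instantaneous energy value, position for deriving inter-UAV distances), but the class and function names, the neighbourhood size, and the numeric values are our own illustrative choices, not part of the proposed algorithm's specification:

```python
import math
from dataclasses import dataclass

@dataclass
class Telemetry:
    """Per-time-step message a UAV shares with its nearest neighbours
    (fields follow the text: connected users, energy, position)."""
    uav_id: int
    connected_users: int      # number of ground users currently served
    energy_remaining: float   # instantaneous energy value (J)
    position: tuple           # (x, y, h) in metres

def distance(a, b):
    """Euclidean distance between two 3D positions."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def nearest_neighbours(me, others, k=2):
    """Return the k UAVs closest to `me`; each agent augments its local
    observation with their telemetry before selecting an action."""
    return sorted(others, key=lambda o: distance(me.position, o.position))[:k]

# Example: UAV 0 selects its two nearest neighbours to exchange state with.
uavs = [
    Telemetry(0, 12, 4.1e5, (0.0, 0.0, 100.0)),
    Telemetry(1, 9, 3.8e5, (300.0, 0.0, 120.0)),
    Telemetry(2, 15, 4.0e5, (800.0, 600.0, 90.0)),
    Telemetry(3, 7, 3.5e5, (250.0, 100.0, 110.0)),
]
me, others = uavs[0], uavs[1:]
print([n.uav_id for n in nearest_neighbours(me, others)])  # → [3, 1]
```

In the full algorithm this exchanged state feeds each agent's DDQN input, so the sketch only captures the "who talks to whom and what is sent" part of the design.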
The remainder of this work is organised as follows. In
Section II, we present related work. The environment model
is provided in Section III. We discuss the proposed decen-
tralised MARL approach for EE optimisation in Section IV.
In Section V, we present the simulation setup and evaluation
plan. We discuss and analyse the results in Section VI. Sec-
tion VII concludes the paper and outlines future directions.
2. Related Work
Energy efficiency (EE) optimisation in UAV-assisted
networks has been studied recently [3,4,12]. The works [27,
28,33] proposed classical optimisation techniques to opti-
mise the EE of a single UAV deployed to provide wireless
service to static ground users. A similar technique was
used in a relay scenario [34] to jointly optimise the energy
and trajectory of UAVs while transferring information from
source ground users to corresponding destination ground
users via the UAV relay. In [29], an iterative algorithm was
proposed to optimise the trajectory of a fixed-wing UAV base station deployed at a fixed altitude while optimising the
transmit power in each iteration. However, these single UAV
models may not be applicable in larger geographical areas
where multiple UAVs are deployed to serve ground users.
Moreover, it is anticipated that future UAV networks will
have multiple UAVs flying in the sky, and possibly sharing
the same frequency spectrum. Table 1shows a summary of
related work on multi-UAV deployments.
Recent research focuses on optimising EE in multi-UAV
networks [4,8,12]. An iterative algorithm was proposed
in [8] to minimise the energy consumption of UAV base
stations providing coverage to static ground users. Game
theory was proposed in [10] to optimise the system’s EE
while maximising the ground area covered by the UAVs
irrespective of the presence of ground users. However, this
work may only be suitable in scenarios with an unlimited en-
ergy budget or cost of UAVs deployment. Furthermore, these
works rely on a ground controller that supports the decision-
making of the UAVs, hence making emergency deployment
impractical due to the significant amount of information
exchanged between the UAVs and the controller. Moreover,
tracking ground user locations at each time step may be
difficult. In [31], a classical optimisation method was used
to minimise the energy consumption of static ground users
by optimizing the UAVs’ trajectory. As energy-constrained
UAVs fly in the sky, they may encounter interference from
nearby UAV cells or other access points sharing the same
frequency band, thereby affecting the system’s EE [17].
The adoption of machine learning to solve complex
multi-UAV deployment problems is gaining research atten-
tion [13]. Specifically, multi-agent reinforcement learning
(MARL) approaches have been used in several works to opti-
mise the system’s EE. The work [42] proposed a distributed
sense-and-send protocol, where the UAVs determine their
trajectories by selecting from a discrete set of tasks and a
continuous set of locations for sensing and transmission.
However, the UAVs relied on a feedback mechanism from
the central base station to execute the next task, thus leading
to significant communication overhead in the centralised
architecture. Furthermore, the authors did not consider the
impact of interference from neighbouring UAVs in this
shared multi-agent environment. Our prior work applied a
distributed Q-learning approach [5] to optimise the energy
utilisation of UAVs providing coverage to ground users with-
out taking into account the system’s EE. The work [5] con-
sidered a limited number of deployed independent learning
agent-controlled UAVs that have no mechanism for direct
collaboration. These UAVs were deployed to serve mobile
ground users that follow the RWP model. To address this, our recent work proposed a deep reinforcement learning (DRL) approach [17] for intelligent association of UAV cellular users to base stations, allowing a UAV flying
over an urban area to intelligently connect to underlying
base stations. A deep meta reinforcement learning-based
offloading algorithm was proposed in [19] to make fine-
grained offloading decisions in a dynamic environment. In
our prior work [26], a DRL-based approach was proposed
to optimise the EE of fixed-wing UAVs that move in circular orbits and are typically not able to hover like rotary-wing UAVs. Moreover, the focus was on UAVs
deployed to provide connectivity to static ground users. The
distributed DRL work in [4] improved on the centralised ap-
proach in [12], where all deployed UAVs are controlled by a
single autonomous agent. The authors in [4,11,12] proposed
a deep deterministic policy gradient (DDPG) approach to
optimise the system’s EE as the UAVs hover at fixed altitudes
while serving static ground users in an interference-free
network environment. Although the approaches [4,11,12]
improve the coverage performance of UAVs, they focus on
the 2D trajectory optimisation of the UAVs serving static
ground users. The authors in [16] also proposed a DDPG-
based solution to optimise the flight trajectory and coverage
performance of UAVs serving mobile users, with the mo-
bility of the ground users modelled to follow the RWP and
reference point group mobility (RPGM) models.
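Pedestrian mobility models such as these can be simulated in a few lines. As an illustrative sketch (our own code, not taken from any of the cited implementations, with assumed parameter values such as a 1.4 m/s mean walking speed), here is one step of the standard Gauss–Markov model, in which speed and direction are first-order autoregressive processes around mean values:

```python
import math
import random

def gauss_markov_step(x, y, speed, direction,
                      alpha=0.75, mean_speed=1.4, mean_dir=0.0,
                      sigma_speed=0.3, sigma_dir=0.5, dt=1.0):
    """One step of the Gauss-Markov mobility model.

    alpha tunes memory: alpha=1 gives straight-line motion, alpha=0 is
    memoryless (Brownian-like). mean_speed ~1.4 m/s is a typical
    pedestrian walking speed (our assumption).
    """
    root = math.sqrt(1.0 - alpha ** 2)
    new_speed = (alpha * speed + (1 - alpha) * mean_speed
                 + root * random.gauss(0.0, sigma_speed))
    new_dir = (alpha * direction + (1 - alpha) * mean_dir
               + root * random.gauss(0.0, sigma_dir))
    # Move using the previous speed/direction, then carry the updated ones.
    new_x = x + speed * math.cos(direction) * dt
    new_y = y + speed * math.sin(direction) * dt
    return new_x, new_y, new_speed, new_dir

# Trace one user for a few time steps.
state = (0.0, 0.0, 1.4, 0.0)
for _ in range(5):
    state = gauss_markov_step(*state)
```

Random walk and random waypoint differ only in how the next speed/direction (or waypoint) is drawn, so the same stepping loop accommodates all three models used in the evaluation.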
This paper extends the decentralised MARL approach
proposed in [3], where each agent relies on locally sensed in-
formation and makes decisions based on implicitly provided
neighbour connectivity information reflected in the agents
reward function [25] (i.e., no communication mechanism
Table 1
Related Work on Multiple UAVs Deployed as Aerial Base Stations.

Paper | Approach | Training | Execution | Control Overhead | Flight Trajectory | User Data
Liu C. H. et al. [12] | DDPG | Centralised | Centralised | Global | 2D | Synthetic
Liu C. H. et al. [4] | MADDPG | Centralised | Decentralised | Global | 2D | Synthetic
Wang L. et al. [11] | MADDPG | Centralised | Decentralised | Global | 2D | Synthetic
Liu X. et al. [15] | Cluster-based QL | Centralised | Centralised | Global | 3D | Synthetic
Oubbati O. S. et al. [16] | MADDPG | Centralised | Centralised | Global | 3D | Synthetic
Mozaffari M. et al. [8] | Iterative Search | Centralised | | Global | 3D | Synthetic
Omoniwa B. et al. [5] | Q-Learning | Decentralised | Decentralised | – | 3D | Synthetic
Omoniwa B. et al. [3] | MAD–DDQN | Decentralised | Decentralised | – | 3D | Synthetic
This work | CMAD–DDQN | Decentralised | Decentralised | Local | 3D | Synthetic + Real

Paper | Ground Users | CC | Partitioning | Interference | Objective
Liu C. H. et al. [12] | Static | | 𝐾-Cells | | EE, Coverage
Liu C. H. et al. [4] | Static | | 𝐾-Cells | | EE, Coverage
Wang L. et al. [11] | Static | | | | Energy, Fairness
Liu X. et al. [15] | Mobile (RW) | | 𝐾-Clusters | | Trajectory
Oubbati O. S. et al. [16] | Mobile (RWP, RPGM) | | | | Trajectory, Coverage
Mozaffari M. et al. [8] | Static | | 𝐾-Clusters | | Energy
Omoniwa B. et al. [5] | Mobile (RWP) | | | | Coverage, Energy
Omoniwa B. et al. [3] | Mobile (GMM) | | | | EE, Outage, Energy
This work | Static + Mobile (RW, RWP, GMM) | | | | EE, Coverage, Energy, Fairness
was provided on how agents can collaborate to optimise
the system’s EE in this dynamic network environment). In our previous work [3], we observed a significant drop in the system’s EE as the number of deployed UAVs in the network increased. The work [3] provided no
mechanism for direct collaboration which could significantly
impact the agent-controlled UAVs’ ability to coordinate
while improving the performance of the overall system. The
work [16] achieved global collaboration among UAVs via a
centralised training technique. However, this approach may
be impractical with an increase in the number of UAVs de-
ployed. In this work, we aim to achieve collaboration among
the agent-controlled UAVs locally via direct communication
with the UAVs’ neighbours, which in the long run will
enhance global coordination while improving the overall
system performance. We extend our evaluation to consider
real-world data of users’ distribution while investigating the
impact of various users’ mobility models on the overall
system’s EE. This motivates us to investigate novel collabo-
rative techniques that improve the total system’s EE. Hence,
we present a collaborative CMAD–DDQN approach, where
each agent makes decisions based on its local observation
and direct interaction via existing 3GPP guidelines [23,24]
with its nearest neighbours.
3. System Model
We consider a set of static and mobile ground users 𝜉 located in a given area, as in [3]. Each user 𝑖 ∈ 𝜉 at time 𝑡 is located at coordinate (𝑥ᵢᵗ, 𝑦ᵢᵗ) ∈ ℝ². In this work, we
assume connectivity service outages from the existing ter-
restrial infrastructure due to disasters or increased network
load. As such, a set 𝑈 of quad-rotor UAVs is deployed within the area to provide wireless coverage to the ground users. As an extension to our prior work in [3], we assume that the UAVs can exchange state information by communicating with nearby neighbours, as shown in Figure 1. Table 2 lists the notations used and their definitions.
Figure 1: System model for UAVs serving static and mobile ground users. The UAVs directly communicate with nearby neighbours.
3.1. Wireless channel model
A UAV 𝑗 ∈ 𝑈 providing wireless coverage to ground users at time 𝑡 is located at coordinate (𝑥ⱼᵗ, 𝑦ⱼᵗ, ℎⱼᵗ) ∈ ℝ³.
Without loss of generality, a guaranteed line-of-sight (LoS)
channel condition is assumed, due to the aerial positions
of the UAVs in the sky. Each user 𝑖 ∈ 𝜉 at time 𝑡 can be connected to a single UAV 𝑗 ∈ 𝑈, namely the one providing the strongest downlink signal-to-interference-plus-noise ratio (SINR). SINR is a measure of signal quality, defined as the ratio of the power of the signal of interest to the interference power from all other interfering signals plus the noise power.
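Concretely, SINR = (received power from the serving UAV) / (summed received power from all interfering UAVs + noise power), and each user associates with the UAV offering the strongest SINR. A minimal sketch follows; the free-space LoS path-loss model and the numeric constants (noise power, carrier frequency, transmit power) are illustrative assumptions on our part, not the paper's exact channel parameters:

```python
import math

NOISE_W = 1e-13  # noise power (W) -- illustrative value
FREQ_HZ = 2e9    # carrier frequency (Hz) -- assumed

def rx_power(tx_power_w, dist_m, freq_hz=FREQ_HZ):
    """Received power under free-space (Friis) LoS path loss."""
    wavelength = 3e8 / freq_hz
    return tx_power_w * (wavelength / (4 * math.pi * dist_m)) ** 2

def sinr(user_pos, serving, interferers, tx_power_w=1.0):
    """SINR = wanted power / (sum of interfering powers + noise)."""
    def d(u):
        return math.dist(user_pos, u)
    signal = rx_power(tx_power_w, d(serving))
    interference = sum(rx_power(tx_power_w, d(j)) for j in interferers)
    return signal / (interference + NOISE_W)

def associate(user_pos, uav_positions):
    """Connect the user to the UAV giving the strongest downlink SINR."""
    return max(range(len(uav_positions)),
               key=lambda j: sinr(user_pos,
                                  uav_positions[j],
                                  uav_positions[:j] + uav_positions[j + 1:]))

uavs = [(0.0, 0.0, 100.0), (500.0, 500.0, 120.0)]
print(associate((50.0, 20.0, 0.0), uavs))  # → 0 (nearest UAV wins under LoS)
```

Under a pure LoS model the strongest-SINR rule reduces to nearest-UAV association; the interference term is what couples the UAVs' trajectory decisions to one another in the shared spectrum.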