
Communication-Enabled DRL to Optimise EE in UAV-Assisted Networks
Compared with a terrestrial cellular communication
network, channel modelling for an airborne, UAV-assisted
wireless system is more challenging due to the mobility and
direct line-of-sight (LoS) communication links from nearby
UAVs [18]. Furthermore, the adoption of UAVs for
communication may require jointly finding the optimal 3D
deployment plan and an energy and interference management
strategy [6]. Crucially, UAVs require robust strategies to provide
ubiquitous wireless coverage to static and mobile ground
users in this dynamic environment. Unlike previous work
that assumes global spatial knowledge of ground users'
locations, obtained through a central controller that periodically
scans the network perimeter and provides real-time updates
to the UAVs for decision-making, we focus on a decentralised
approach suited to emergency scenarios, where there may
be a service outage due to controller failure, or loss
of UAVs' control packets due to traffic congestion in the
network. Moreover, in such scenarios, it is difficult to keep
track of the locations of all ground users in real time. To
simplify the model, recent approaches that optimise the
system's EE consider 2D trajectory designs for
UAVs serving static users in an interference-free network
environment. This may rest on the assumption that
each operating UAV is assigned a unique frequency band.
However, this assumption is impractical, as radio spectrum
is a scarce resource. Hence, we assume that UAVs serving
as aerial base stations may have to share the same frequency
band, which introduces the challenge of interference
in the shared network environment. UAVs therefore
require robust strategies to optimise their flight trajectories
while providing coverage to ground users in this dynamic,
interference-limited environment. Multi-Agent Reinforcement Learning (MARL)
has been shown to perform well in decision-making tasks
in such a dynamic environment [3,4,15]. To improve the
performance of decentralised control, several methods
have been studied [25]. In this work, we adopt a MARL
approach and propose a direct collaborative communication-
enabled multi-agent decentralised double deep Q-network
(CMAD–DDQN) algorithm to maximise the system’s EE
by optimising the 3D trajectory of each UAV, the energy
consumed and the number of connected static and mobile
ground users over a series of time-steps, while taking into
account the impact of interference from nearby UAV cells.
In our previous work [5], we considered a decentralised
MARL setting in which there was no direct collaboration among
UAVs and other agents were treated as part of the
environment, with the reward of each agent reflecting the
coverage performance in its neighbourhood. However, the
approach in [5] ignores the potential benefit of direct
collaboration among agents. Moreover, finding a globally optimal
solution for agents with partial information is known to be
intractable [20]. As an extension to our prior work [3], we
leverage agents’ capability to communicate with neighbours
to maximise the system’s EE by jointly optimising the
number of connected ground users and the energy con-
sumption in the network. The incorporation of collaborative
algorithms into MARL can allow the agents to assist each
other in filling the knowledge gaps by exchanging infor-
mation that could improve the decision-making of UAVs
over a series of time-steps [21]. However, several real-time
applications place considerable restrictions on communi-
cation, especially in terms of both throughput and latency.
Nevertheless, communication has extensively been used to
address the non-stationarity issue in the multi-agent learning
process [22].
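As background for the approach described next, the core learning update in a double deep Q-network decouples action selection from action evaluation, and each agent's state can be augmented with quantities communicated by its neighbours. The sketch below is hypothetical: plain functions stand in for the online and target networks, and the message fields are illustrative names.

```python
import numpy as np

# Hedged sketch: double-DQN target computation and a neighbour-augmented
# state vector, with all names and message fields illustrative.

def double_dqn_target(reward, next_state, done, q_online, q_target, gamma=0.99):
    """Double DQN: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    a_star = int(np.argmax(q_online(next_state)))  # select with online net
    return reward + gamma * float(q_target(next_state)[a_star])  # evaluate with target net

def augment_state(local_obs, neighbour_msgs):
    """Append each neighbour's reported connected-user count, energy
    consumption, and distance to the agent's local observation."""
    extras = [v for msg in neighbour_msgs
              for v in (msg["n_users"], msg["energy"], msg["distance"])]
    return np.concatenate([np.asarray(local_obs, dtype=float),
                           np.asarray(extras, dtype=float)])
```

Decoupling selection from evaluation mitigates the overestimation bias of vanilla Q-learning, while the augmented state gives each decentralised agent a partial view of its neighbours without a central controller.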
Multi-agent learning is challenging in itself, requiring
agents to learn their policies while taking into account the
consequences of the actions of others. The authors in [4,
11,12] proposed a multi-agent deep deterministic policy
gradient (MADDPG) approach to improve the system’s EE
as UAVs hover at fixed altitudes while providing coverage
to static ground users in an interference-free network envi-
ronment. This problem becomes even more challenging in
an interference-limited network environment, where inter-
ference from nearby UAV cells impacts the system’s EE.
Hence, we propose a direct collaborative communication-
enabled multi-agent decentralised double deep Q-network
(CMAD–DDQN) approach where each agent relies on its
local observations, as well as the information it receives from
its nearby UAVs for decision-making. The communicated
information from neighbouring UAVs contains each neighbour's
number of connected ground users, instantaneous energy
consumption, and distances to nearby UAVs at each time step. We propose
an approach where each agent executes actions based on
state information. We assume a two-way communication
link among neighbouring UAVs [23]. Although the 3GPP
system provides a methodology to set up and optimise neigh-
bour relations with little or no human intervention [24]
and to allow a 3rd party to request and obtain real-time
monitored status information (e.g., position, communication
link status, power consumption) of a UAV [23], to the best of
our knowledge this work is the first to investigate the impact of
collaboration on the system's EE using the communication
mechanism based on the existing 3GPP standard [24]. This
paper makes three main contributions:
•We propose a direct collaborative CMAD–DDQN ap-
proach that relies on local observations from each UAV
and the explicitly-communicated information from its
neighbours for decision-making. We adopt a collaborative
algorithm based on an existing 3GPP standard [24] that
allows agents to collaborate by exchanging information
with their nearest neighbours to improve the system’s EE
by jointly optimising each UAV’s 3D trajectory, the num-
ber of connected ground users, and the energy consumed
by the UAVs in a shared dynamic environment.
•We consider a realistic model of the agent’s environment,
taking into consideration the dynamic and interference-
limited nature of the wireless environment. Unlike in
previous work that consider the deployment of static
users [4] or fully synthetic ground users’ distribution [3],
we consider a real-world deployment of static and mobile
end-users in an area of Dublin, Ireland. Furthermore, we
leverage widely used mobility models (the random walk
Omoniwa B., Galkin B. & Dusparic I.: Preprint submitted to Elsevier Page 2 of 16