
Communication-Enabled DRL to Optimise EE in UAV-Assisted Networks
Compared with a terrestrial cellular communication
network, channel modelling for an airborne, UAV-assisted
wireless system is more challenging due to the mobility and
direct line-of-sight (LoS) communication links from nearby
UAVs [18]. Furthermore, the adoption of UAVs for
communication may require jointly finding the optimal 3D
deployment plan and an energy and interference management
strategy [6]. Crucially, UAVs require robust strategies to provide
ubiquitous wireless coverage to static and mobile ground
users in this dynamic environment. Unlike previous work
that assumes global spatial knowledge of ground users'
locations, obtained through a central controller that periodically
scans the network perimeter and provides real-time updates
to the UAVs for decision-making, we focus on a decentralised
approach suited to emergency scenarios, where there may
be a service outage due to controller failure, or loss
of UAVs' control packets due to traffic congestion in the
network. Moreover, in such scenarios, it is difficult to keep
track of the locations of all ground users in real time. To
simplify the model, recent approaches that optimise the
system's EE consider 2D trajectory designs for
UAVs serving static users in an interference-free network
environment. This may rest on the assumption that
each operating UAV is assigned a unique frequency band.
However, this assumption is impractical, as radio spectrum
is a scarce resource. Hence, we assume that UAVs serving
as aerial base stations may have to share the same frequency
band, which introduces the challenge of interference
in the shared network environment. UAVs therefore
require robust strategies to optimise their flight trajectories
while providing coverage to ground users in this dynamic,
interference-limited environment. Multi-Agent Reinforcement Learning (MARL)
has been shown to perform well in decision-making tasks
in such a dynamic environment [3,4,15]. To improve the
performance of decentralised control, several methods
have been studied [25]. In this work, we adopt a MARL
approach and propose a direct collaborative communication-
enabled multi-agent decentralised double deep Q-network
(CMAD–DDQN) algorithm to maximise the system’s EE
by optimising the 3D trajectory of each UAV, the energy
consumed and the number of connected static and mobile
ground users over a series of time-steps, while taking into
account the impact of interference from nearby UAV cells.
In our previous work [5], we considered a decentralised
MARL setting in which there was no direct collaboration among
UAVs and other agents were treated as part of the
environment, with the reward of each agent reflecting the
coverage performance in its neighbourhood. However, the
approach in [5] ignores the potential benefit of direct
collaboration among agents. Moreover, finding a globally optimal
solution for agents with partial information is known to be
intractable [20]. As an extension to our prior work [3], we
leverage agents’ capability to communicate with neighbours
to maximise the system’s EE by jointly optimising the
number of connected ground users and the energy con-
sumption in the network. The incorporation of collaborative
algorithms into MARL can allow the agents to assist each
other in filling the knowledge gaps by exchanging infor-
mation that could improve the decision-making of UAVs
over a series of time-steps [21]. However, several real-time
applications place considerable restrictions on communi-
cation, especially in terms of both throughput and latency.
Nevertheless, communication has extensively been used to
address the non-stationarity issue in the multi-agent learning
process [22].
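As background for the approach described next, the core learning update in a double deep Q-network decouples action selection from action evaluation, and each agent's state can be augmented with quantities communicated by its neighbours. The sketch below is hypothetical: plain functions stand in for the online and target networks, and the message fields are illustrative names.

```python
import numpy as np

# Hedged sketch: double-DQN target computation and a neighbour-augmented
# state vector, with all names and message fields illustrative.

def double_dqn_target(reward, next_state, done, q_online, q_target, gamma=0.99):
    """Double DQN: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    a_star = int(np.argmax(q_online(next_state)))  # select with online net
    return reward + gamma * float(q_target(next_state)[a_star])  # evaluate with target net

def augment_state(local_obs, neighbour_msgs):
    """Append each neighbour's reported connected-user count, energy
    consumption, and distance to the agent's local observation."""
    extras = [v for msg in neighbour_msgs
              for v in (msg["n_users"], msg["energy"], msg["distance"])]
    return np.concatenate([np.asarray(local_obs, dtype=float),
                           np.asarray(extras, dtype=float)])
```

Decoupling selection from evaluation mitigates the overestimation bias of vanilla Q-learning, while the augmented state gives each decentralised agent a partial view of its neighbours without a central controller.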
Multi-agent learning is challenging in itself, requiring
agents to learn their policies while taking into account the
consequences of the actions of others. The authors in [4,
11,12] proposed a multi-agent deep deterministic policy
gradient (MADDPG) approach to improve the system’s EE
as UAVs hover at fixed altitudes while providing coverage
to static ground users in an interference-free network envi-
ronment. This problem becomes even more challenging in
an interference-limited network environment, where inter-
ference from nearby UAV cells impacts the system’s EE.
Hence, we propose a direct collaborative communication-
enabled multi-agent decentralised double deep Q-network
(CMAD–DDQN) approach where each agent relies on its
local observations, as well as the information it receives from
its nearby UAVs for decision-making. The communicated
information from neighbouring UAVs contains each neighbour's
number of connected ground users, instantaneous energy
consumption, and distances to nearby UAVs at each time step. We propose
an approach where each agent executes actions based on
state information. We assume a two-way communication
link among neighbouring UAVs [23]. Although the 3GPP
system provides a methodology to set up and optimise neigh-
bour relations with little or no human intervention [24]
and to allow a 3rd party to request and obtain real-time
monitored status information (e.g., position, communication
link status, power consumption) of a UAV [23], to the best of
our knowledge this work is the first to investigate the impact of
collaboration on the system's EE using the communication
mechanism based on the existing 3GPP standard [24]. This
paper makes three main contributions:
•We propose a direct collaborative CMAD–DDQN ap-
proach that relies on local observations from each UAV
and the explicitly-communicated information from its
neighbours for decision-making. We adopt a collaborative
algorithm based on an existing 3GPP standard [24] that
allows agents to collaborate by exchanging information
with their nearest neighbours to improve the system’s EE
by jointly optimising each UAV’s 3D trajectory, the num-
ber of connected ground users, and the energy consumed
by the UAVs in a shared dynamic environment.
•We consider a realistic model of the agent’s environment,
taking into consideration the dynamic and interference-
limited nature of the wireless environment. Unlike in
previous work that consider the deployment of static
users [4] or fully synthetic ground users’ distribution [3],
we consider a real-world deployment of static and mobile
end-users in an area of Dublin, Ireland. Furthermore, we
leverage widely used mobility models (the random walk
Omoniwa B., Galkin B. & Dusparic I.: Preprint submitted to Elsevier Page 2 of 16