
energy consumed (Hühn [2013]). In principle, decreasing the power or raising the carrier sense threshold may help
to enhance spatial reuse. (Ma et al. [2008]) proposes a hybrid transmit power and carrier sense adaptation approach that differentiates congestion losses from interference losses. This work shows that, when the interference occurs before the data signal, fine-tuning the carrier sense threshold may completely remove interference-related losses, while power control avoids data signal losses caused by interference that arrives while the data signal is being transmitted. (Miao et al. [2017])
explores the trade-off between energy and latency and uses a real-time controller for the dynamic regulation of task
delivery in order to minimize energy consumption while meeting a deadline for each individual task. In particular, the
authors make use of the generalized critical task decomposition algorithm to identify critical tasks on an optimal sample
path.
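To make this distinction concrete, the sketch below illustrates the decision logic in a hedged form: the function name, thresholds, and margins are hypothetical and are not taken from (Ma et al. [2008]).

```python
# Illustrative sketch only: interference sensed *before* a transmission starts
# can be ignored by raising the carrier sense threshold (better spatial reuse),
# whereas interference that arrives *during* the transmission is countered by
# raising transmit power. All names and margins are hypothetical.

def adapt_cs_and_power(sensed_interference_dbm, expected_sinr_db, min_sinr_db,
                       cs_threshold_dbm, tx_power_dbm, max_power_dbm):
    """Return an updated (carrier sense threshold, transmit power) pair."""
    # Case 1: interference is already present before the data signal.
    # Raising the carrier sense threshold above the sensed level lets the
    # transmitter proceed instead of deferring, provided the link can still
    # meet its SINR requirement.
    if sensed_interference_dbm > cs_threshold_dbm and expected_sinr_db >= min_sinr_db:
        cs_threshold_dbm = sensed_interference_dbm + 1.0  # 1 dB margin (arbitrary)

    # Case 2: interference appears while the data signal is on the air.
    # Carrier sensing cannot help once the transmission has started, so the
    # transmitter compensates by increasing power, up to its limit.
    if expected_sinr_db < min_sinr_db:
        deficit_db = min_sinr_db - expected_sinr_db
        tx_power_dbm = min(tx_power_dbm + deficit_db, max_power_dbm)

    return cs_threshold_dbm, tx_power_dbm
```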
In addition to power and carrier sense management, rate control introduces a trade-off between spatial reuse and the transmission rate that can be sustained (Tobagi and Hira [2010]). A new idea,
spatial back-off, was introduced in (Jamieson et al. [2005]), which allows for dynamic tweaking of the carrier sensing
threshold in conjunction with the Auto-Rate Fallback (ARF) algorithm in order to achieve high throughput. In particular,
ARF shifts to a lower transmission rate if the measured losses exceed a threshold, then switches to a higher transmission
rate after a specified number of consecutive frames are successfully sent. According to (Yang and Vaidya [2007]), with discrete data rates and a sufficient number of power levels, power control offers several benefits over carrier sense control, in contrast to the case of a continuous data rate. In the authors' formulation, power and rate control regulates the power and rate of a transmitter according to the degree of interference perceived at the receiving end. The receiver must return this information to the transmitter, which may be accomplished through IEEE 802.11k (Kim et al. [2006]) but is not currently supported by available device driver versions.
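Since ARF is referenced repeatedly in what follows, a minimal sketch of its loop is given below; the rate set and the failure/success thresholds are illustrative parameters, not values prescribed by the cited works.

```python
# Minimal sketch of the Auto-Rate Fallback (ARF) loop described above.
RATES_MBPS = [6, 12, 24, 36, 48, 54]   # hypothetical discrete rate set
FAIL_THRESHOLD = 2                      # consecutive losses before stepping down
SUCCESS_THRESHOLD = 10                  # consecutive successes before stepping up


class ARF:
    def __init__(self):
        self.rate_idx = 0
        self.fails = 0
        self.successes = 0

    def current_rate(self):
        return RATES_MBPS[self.rate_idx]

    def on_frame_result(self, ack_received: bool):
        if ack_received:
            self.fails = 0
            self.successes += 1
            # Probe a higher rate after enough consecutive successes.
            if self.successes >= SUCCESS_THRESHOLD and self.rate_idx < len(RATES_MBPS) - 1:
                self.rate_idx += 1
                self.successes = 0
        else:
            self.successes = 0
            self.fails += 1
            # Fall back to a lower rate once losses exceed the threshold.
            if self.fails >= FAIL_THRESHOLD and self.rate_idx > 0:
                self.rate_idx -= 1
                self.fails = 0
```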
An adaptive rate and power control technique that is consistent with IEEE 802.11 operations is proposed in (Committee et al. [2016]), where Acknowledgments (ACKs) received from the receiver drive the optimization of the transmission rate. The scheme operates with two basic adaptive strategies: either the maximum possible rate is selected and then supported with the lowest possible power, or the lowest possible power is chosen first and the highest rate achievable at that power is then selected. In a related manner, Power-controlled Auto Rate Fallback (PARF) and Power-controlled Estimated Rate Fallback (PERF) were suggested in (Chevillat et al. [2005]), in which the authors extend ARF and Estimated Rate Fallback (ERF) to work with transmission power control. It is important to note that ERF is the SNR-based variant
of ARF, in which each packet carries the power level, the path loss, and noise estimate from the previous packet that
has been received. ERF senders estimate the SNR based on this information and establish the highest transmission
rate compatible with the estimated SNR. As they predicted, the authors of (Chevillat et al. [2005]) found that PARF did not work effectively when the receiver reduced the power used for ACK messages: when these ACK packets were not received, the transmitter made inaccurate power reduction choices. They obtained more reliable performance with PERF, which bases power and rate choices on SNR values. These findings are consistent with (Akella et al. [2005]), which demonstrates that SNR-based approaches are more resilient than loss-based protocols (Kim et al. [2006]). Despite this, they conclude that, in order to achieve such resilience, SNR-based methods require real-time training.
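A minimal sketch of this SNR-based selection is shown below, assuming a hypothetical rate/SNR table and feedback fields (transmit power, path loss, and noise estimate from the last received packet); it illustrates the idea behind ERF and PERF rather than the exact formulation of (Chevillat et al. [2005]).

```python
# Hedged sketch of SNR-based rate (ERF) and rate/power (PERF-style) selection.
# The minimum-SNR-per-rate table below is illustrative, not from the cited work.
RATE_SNR_TABLE = [(6, 5.0), (12, 8.0), (24, 13.0), (36, 18.0), (48, 22.0), (54, 25.0)]


def estimate_snr_db(tx_power_dbm, path_loss_db, noise_dbm):
    # SNR inferred from the power level, path loss, and noise estimate carried
    # back to the sender in the previously received packet.
    return tx_power_dbm - path_loss_db - noise_dbm


def erf_select_rate(tx_power_dbm, path_loss_db, noise_dbm):
    # ERF: pick the highest rate compatible with the estimated SNR.
    snr = estimate_snr_db(tx_power_dbm, path_loss_db, noise_dbm)
    feasible = [rate for rate, req in RATE_SNR_TABLE if snr >= req]
    return max(feasible) if feasible else min(rate for rate, _ in RATE_SNR_TABLE)


def perf_select_rate_and_power(max_power_dbm, path_loss_db, noise_dbm,
                               min_power_dbm=0.0, step_db=1.0):
    # PERF-style choice: keep the ERF rate, then back the power off while the
    # estimated SNR still supports that rate.
    rate = erf_select_rate(max_power_dbm, path_loss_db, noise_dbm)
    required_snr = dict(RATE_SNR_TABLE)[rate]
    power = max_power_dbm
    while (power - step_db >= min_power_dbm and
           estimate_snr_db(power - step_db, path_loss_db, noise_dbm) >= required_snr):
        power -= step_db
    return rate, power
```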
2 Related Work
Reinforcement Learning (RL) has been researched actively for the control of transmission power and the data rates
for 802.11 standards. (Camp and Knightly [2010]) presented a power allocation technique based on multi-agent
reinforcement learning. The paper minimizes the loss function through stochastic gradient descent, using a Deep Q-Network with many agents learning in parallel. The state description for each agent is the previous transmit power, which describes agent $i$'s potential contribution to the network, as well as the interfering neighbors' contributions to the network, based on observations from a set of $n$ transmitters with an SNR greater than a predefined threshold and a receiver with an SNR greater than the threshold. The actions are described as discretized steps of power within a specified power range shared by all actors (i.e., all agents have the same action space). The reward function is intended to reflect each agent's direct contribution to the network and its penalty for interfering with all other agents, defined and interpreted as how the action of agent $i$ in time slot $t$, i.e., $p_i(t)$, affects the weighted sum-rate of its own link and of its future interfered neighbors.
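As an illustration of this reward structure, the sketch below credits agent $i$ with its own achieved rate and penalizes it by the rate loss its transmit power $p_i(t)$ causes at neighboring receivers; the log2-based rate model, the channel-gain matrix, and all names are assumptions made for the sketch, not the exact definitions used in the cited work.

```python
import numpy as np


def sum_rate_reward(i, powers, gains, noise):
    """Illustrative per-agent reward.

    powers      : vector of transmit powers p_j(t) for all n agents
    gains[j, k] : (assumed) channel gain from transmitter j to receiver k
    noise       : receiver noise power
    """
    n = len(powers)

    def rate_at(k, p):
        # Spectral efficiency at receiver k under power vector p.
        signal = gains[k, k] * p[k]
        interference = sum(gains[j, k] * p[j] for j in range(n) if j != k) + noise
        return np.log2(1.0 + signal / interference)

    # Direct contribution: agent i's own rate.
    own_rate = rate_at(i, powers)

    # Penalty: rate lost at each neighbor because agent i transmits, measured
    # against a counterfactual power vector in which p_i(t) is set to zero.
    powers_off = np.array(powers, dtype=float)
    powers_off[i] = 0.0
    penalty = sum(rate_at(k, powers_off) - rate_at(k, powers)
                  for k in range(n) if k != i)

    return own_rate - penalty


# Example (hypothetical 3-link network):
# gains = np.array([[1.0, 0.1, 0.05], [0.1, 1.0, 0.1], [0.05, 0.1, 1.0]])
# r = sum_rate_reward(0, [0.5, 1.0, 0.8], gains, noise=0.01)
```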
The authors of (Nasir and Guo [2019]) build on (Camp and Knightly [2010]) and use an actor-critic algorithm to learn the optimal policy for distributed power control. In actor-critic algorithms, two neural networks are designed to learn from and update each other's weights based on the experiences and state-action pairs encountered in each episode. Each transmitter is designed to be a learning agent during system exploration, so the actors are a set of learning agents whose next state is conditioned on the joint actions of all agents acting as actors. The critic is a single network, described as the $Q_{\text{target}}$ network, used to update each learning agent's parameters after every episode. The presented work is a