OPTIMAL WIRELESS RATE AND POWER CONTROL IN THE
PRESENCE OF JAMMERS USING REINFORCEMENT LEARNING
Fadlullah Raji
Department of Computer Science
University of South Florida
Tampa, Florida, USA
fraji@usf.edu
Lei Miao
Department of Engineering Technology
Middle Tennessee State University
Murfreesboro, Tennessee, USA
lei.miao@mtsu.edu
ABSTRACT
Future wireless networks require high throughput and energy efficiency. This paper studies using
Reinforcement Learning (RL) to do transmission rate and power control for maximizing a joint reward
function consisting of both throughput and energy consumption. We design the system state to include
factors that reflect packet queue length, interference from other nodes, quality of the wireless channel,
battery status, etc. The reward function is normalized and does not involve unit conversion. It can be
used to train three different types of agents: throughput-critical, energy-critical, and throughput and
energy balanced. Using the NS-3 network simulation software, we implement and train these agents
in an 802.11ac network with the presence of a jammer. We then test the agents with two jamming
nodes interfering with the packets received at the receiver. We compare the performance of our RL
optimal policies with the popular Minstrel rate adaptation algorithm: our approach can achieve (i)
higher throughput when using the throughput-critical reward function; (ii) lower energy consumption
when using the energy-critical reward function; and (iii) higher throughput and slightly higher energy
when using the throughput and energy balanced reward function. Although our discussion is focused
on 802.11ac networks, our method is readily applicable to other types of wireless networks.
Keywords machine learning, reinforcement learning, wireless communications, wireless transmission control.
1 Introduction
Future communication networks need to provide high data rates to users in an energy efficient way. Wi-Fi is a
very popular type of wireless network, and there were 22.2 billion Wi-Fi devices in 2021 (Gadasin et al. [2020]).
Therefore, even a small improvement to Wi-Fi can significantly improve productivity and have a positive impact
on the environment. IEEE 802.11, the protocol that enables Wi-Fi, defines physical layers that can transmit data at a
variety of rates. Various channel access techniques, such as Orthogonal Frequency Division Multiplexing (OFDM) or
Direct Sequence Spread Spectrum (DSSS), and modulation schemes, such as Binary Phase Shift Keying (BPSK) or
variants of Quadrature Amplitude Modulation (QAM), may be used at different rates. Because effects like multipath
fading, shadowing, signal attenuation, and interference from other radio sources are tolerated differently by each of
these, using the fastest rate regardless of the channel circumstances is not the optimal solution.
For this reason, various rate control algorithms (either proprietary or open-source ones) that dynamically adjust the
transmission rate in response to changing channel circumstances have been designed to improve the performance of
wireless networks. In particular, these rate control algorithms (Mo and Shen [2008], Ye et al. [2014], Hedayati et al.
[2010], Lacage et al. [2004], Yin et al. [2012]) are primarily designed to identify the best rate and modulation scheme
that yield the highest throughput. Because reliable data transmission rates and interference levels are fundamentally
connected in wireless networks, transmission power control (Kim et al. [2014], Ho [2007]) has also been used to
reduce undesirable interference and to conserve energy for wireless devices, especially the battery-powered ones. Joint
transmission rate and power control has been explored to take into account the trade-off between the throughput and the
energy consumed (Hühn [2013]).
arXiv:2210.04976v1 [cs.NI] 10 Oct 2022
In principle, decreasing the power or raising the carrier sense threshold may help
to enhance spatial reuse. By differentiating congestion from interference losses, (Ma et al. [2008]) proposes a hybrid
transmit power and carrier sense adaptation approach. This work shows that, when interference occurs before the data
signal, fine-tuning the carrier-sense threshold can completely remove interference-related losses. In addition,
power control avoids data-signal loss caused by interference that arises while the data signal is being transmitted. (Miao et al. [2017])
explores the trade-off between energy and latency and uses a real-time controller for the dynamic regulation of task
delivery in order to minimize energy consumption while meeting a deadline for each individual task. In particular, the
authors make use of the generalized critical task decomposition algorithm to identify critical tasks on an optimal sample
path.
In addition to power and carrier sense management, when rate control is taken into consideration, it introduces a
trade-off between spatial reuse and the transmission rate that can be sustained (Tobagi and Hira [2010]). A new idea,
spatial back-off, was introduced in (Jamieson et al. [2005]), which allows for dynamic tweaking of the carrier sensing
threshold in conjunction with the Auto-Rate Fallback (ARF) algorithm in order to achieve high throughput. In particular,
ARF shifts to a lower transmission rate if the measured losses exceed a threshold, then switches to a higher transmission
rate after a specified number of consecutive frames are successfully sent. According to (Yang and Vaidya [2007]), when
dealing with discrete data rates and when there are a sufficient number of power levels, controlling the power gives
several benefits over carrier sensing control as compared to a continuous data rate. According to the authors, power
and rate control is a technique that regulates the power and rate of a transmitter depending on the perceived degree of
interference at the receiving end. It is necessary for the receiver to return this information to the transmitter, which
may be accomplished by IEEE 802.11k (Kim et al. [2006]), but is not currently supported by any of the device driver
versions.
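The ARF behavior referenced throughout this discussion can be sketched as a small state machine. The rate table and the loss/success thresholds below are illustrative assumptions, not values taken from the cited works.

```python
# Minimal sketch of Auto-Rate Fallback (ARF); thresholds are illustrative.
RATES_MBPS = [6, 9, 12, 18, 24, 36, 48, 54]  # 802.11a/g-style rate table

class ARF:
    def __init__(self, loss_threshold=2, success_threshold=10):
        self.idx = 0                      # start at the lowest rate
        self.losses = 0                   # consecutive failed frames
        self.successes = 0                # consecutive successful frames
        self.loss_threshold = loss_threshold
        self.success_threshold = success_threshold

    def rate(self):
        return RATES_MBPS[self.idx]

    def on_frame(self, delivered: bool):
        if delivered:
            self.successes += 1
            self.losses = 0
            # step up after enough consecutive successes
            if self.successes >= self.success_threshold:
                self.idx = min(self.idx + 1, len(RATES_MBPS) - 1)
                self.successes = 0
        else:
            self.losses += 1
            self.successes = 0
            # fall back once measured losses exceed the threshold
            if self.losses >= self.loss_threshold:
                self.idx = max(self.idx - 1, 0)
                self.losses = 0

arf = ARF()
for _ in range(10):
    arf.on_frame(True)
print(arf.rate())  # → 9, the rate steps up after ten consecutive successes
```

Because the step-up decision uses only consecutive frame successes, ARF reacts slowly to fast-changing channels, which is one motivation for the SNR-based variants discussed next.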
An adaptive rate and power control technique that is consistent with IEEE 802.11 operations is proposed in (Committee
et al. [2016]), where Acknowledgments (ACKs) from the receiver drive the optimization of the transmission rate. It
operates using two basic adaptive strategies: either the maximum possible rate is paired with the least power that can
sustain it, or the lowest possible power is chosen first and then the highest rate achievable at that power is selected.
In a related manner, Power-controlled Auto Rate Fallback (PARF) and Power-Enabled Rate
Fallback (PERF) were suggested in (Chevillat et al. [2005]), in which the authors extend ARF and Estimated Rate
Fallback (ERF) to work with transmission power control. It is important to note that ERF is the SNR-based variant
of ARF, in which each packet carries the power level, the path loss, and noise estimate from the previous packet that
has been received. ERF senders estimate the SNR based on this information and establish the highest transmission
rate compatible with the estimated SNR. The authors of (Chevillat et al. [2005]) discovered that PARF did not work
effectively when the receiver reduced the power used for ACK messages, as they predicted. In essence, this resulted in
inaccurate power-reduction choices at the transmitter when these ACK packets were not received. They obtained more reliable
performance with PERF, which makes power and rate choices based on SNR values. These findings are consistent with
(Akella et al. [2005]), which demonstrates that SNR-based treatments are more resilient when compared to loss-based
protocols (Kim et al. [2006]). Despite this, they conclude that in order to achieve such resilience, SNR-based methods
necessitate real-time training.
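The SNR-based selection used by ERF/PERF can be sketched as follows. The per-rate SNR thresholds and the simple link-budget estimate are illustrative assumptions, not values from the cited papers.

```python
# Sketch of an SNR-based rate choice in the spirit of ERF/PERF.
# The SNR thresholds per rate below are illustrative assumptions.
RATE_SNR_TABLE = [  # (rate in Mbps, minimum SNR in dB assumed to sustain it)
    (6, 5), (9, 8), (12, 10), (18, 13),
    (24, 16), (36, 20), (48, 24), (54, 27),
]

def highest_rate_for_snr(snr_db: float) -> int:
    """Return the highest rate whose SNR requirement the link meets."""
    best = RATE_SNR_TABLE[0][0]           # always allow the base rate
    for rate, min_snr in RATE_SNR_TABLE:
        if snr_db >= min_snr:
            best = rate
    return best

def estimate_snr(tx_power_dbm: float, path_loss_db: float, noise_dbm: float) -> float:
    """SNR estimate from the power level, path loss, and noise carried in a packet."""
    return (tx_power_dbm - path_loss_db) - noise_dbm

snr = estimate_snr(tx_power_dbm=15.0, path_loss_db=70.0, noise_dbm=-90.0)  # 35 dB
print(highest_rate_for_snr(snr))  # → 54
```

The table lookup is where the "real-time training" requirement enters: the SNR thresholds must be calibrated to the actual hardware and environment for the method to remain resilient.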
2 Related Work
Reinforcement Learning (RL) has been researched actively for the control of transmission power and data rates
in 802.11 standards. (Camp and Knightly [2010]) presented a power allocation technique based on multi-agent
reinforcement learning. The paper reduces the loss function through stochastic gradient descent using a Deep Q-
Network with many agents learning in parallel. The state description for each agent is the previous transmit power,
which describes agent i's potential contribution to the network as well as the interfering neighbors' contributions to
the network, based on observations from a set of n transmitters with an SNR greater than a predefined threshold and
a receiver with an SNR greater than the threshold. The actions are described as discretized steps of power within a
specified power range shared by all actors (i.e., all agents have the same action space). The reward function is intended
to reflect each agent's direct interference contribution to the network and its penalty for interfering with all other agents;
it is defined and interpreted as how the action of agent i in time slot t, i.e., p_i(t), affects the weighted sum-rate of its
own link and of future interfered neighbors.
The authors of (Nasir and Guo [2019]) build on (Camp and Knightly [2010]) and use an actor-critic algorithm to learn
the optimal policy for distributed power control. In actor-critic algorithms, two neural networks are designed to learn
and update each other's weights based on the experiences and state-action pairs encountered in each episode. Each
transmitter is designed to be a learning agent during system exploration, so the actors are a set of learning agents
whose next state is conditioned on the joint actions of all agents. The critic is a single network, the Q-target, used to
update each learning agent's parameters after every episode. The presented work is a
power allocation scheme for conventional wireless mobile networks that considers interference from other networks
and distributes learning of the optimal policy through an actor-critic agent.
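As a rough illustration of the actor-critic loop described above, the following toy example trains a softmax actor and a value critic on a single-state problem. It is a generic single-agent sketch with made-up rewards, not the multi-agent power-control scheme of the cited work.

```python
import math, random

# Toy single-state actor-critic update (softmax actor, value critic).
# The reward values and learning rates are illustrative assumptions.
random.seed(0)
prefs = [0.0, 0.0]        # actor: action preferences (softmax logits)
value = 0.0               # critic: value estimate of the single state
ALPHA_ACTOR, ALPHA_CRITIC = 0.1, 0.2

def softmax(p):
    e = [math.exp(x - max(p)) for x in p]
    s = sum(e)
    return [x / s for x in e]

def reward(action):       # action 1 pays more on average
    return 1.0 if action == 1 else 0.2

for _ in range(2000):
    probs = softmax(prefs)
    a = random.choices([0, 1], weights=probs)[0]
    r = reward(a)
    td_error = r - value                      # critic's TD error (no next state here)
    value += ALPHA_CRITIC * td_error          # critic update
    for i in range(2):                        # actor update: policy-gradient step
        grad = (1.0 if i == a else 0.0) - probs[i]
        prefs[i] += ALPHA_ACTOR * td_error * grad

print(softmax(prefs))  # the actor ends up strongly preferring action 1
```

The critic's TD error plays the role of the Q-target feedback above: it tells each actor update whether the chosen action did better or worse than expected.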
Rate adaptation simulation using the standard 802.11g with a finite state space (fewer than 100 states) was investigated
in (Nasir and Guo [2020]) using the SARSA algorithm with learned Q-values stored in a table. The states were defined
to simulate a standard Robust Rate Adaptation Algorithm (RRAA), which minimizes the loss rate, Rloss, to achieve the
desired transmission rate. This approach falls short because it considers standard Wi-Fi with lower data rates than later
802.11 amendments such as 802.11ac and 802.11ax, which have many more data rates encoded as Modulation and
Coding Schemes (MCS). Another rate adaptation algorithm simulated in (Peserico et al. [2020]) represents the state
observed by the agent as the Contention Window (CW) size of CSMA/CA on the 802.11a standard, which has 8 MCS
levels: {6, 9, 12, 18, 24, 36, 48, 54} Mbps as the action space. The sender node has local access to this observation.
Each CSMA/CA node operating in Distributed Coordination Function (DCF) mode chooses a random back-off time
depending on the current CW size, so that packets transmitted by other nodes do not overlap. When a node transmits a
packet for the first time in a typical CSMA/CA 802.11 protocol, it uses the minimum CW size, which is 15 in IEEE
802.11a. If the packet does not arrive at the receiver, the sender re-sends it with the CW doubled. This algorithm uses
the CW size to discretize its state-action pairs stored in a Q-value table, which is not a sufficient representation of the
channel: packets could be dropped not only due to interference but also due to the state of the channel itself.
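A tabular SARSA learner of the kind used in these works can be sketched as follows. The tiny "channel" model, the state and action sizes, and the reward values are illustrative assumptions, not the environment of the cited papers.

```python
import random

# Tabular SARSA sketch for rate adaptation. The toy environment below is an
# illustrative stand-in for the CW-based state of the surveyed algorithm.
random.seed(1)
STATES = range(4)          # e.g., discretized CW / channel-quality levels
ACTIONS = range(8)         # e.g., the 8 MCS indices of 802.11a
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def choose(s):             # epsilon-greedy action selection
    if random.random() < EPS:
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def step(s, a):
    """Toy channel: higher MCS succeeds only in better states (assumption)."""
    success = a <= 2 * s + 1
    r = float(a + 1) if success else -1.0
    return random.choice(list(STATES)), r

s, a = 0, choose(0)
for _ in range(20000):
    s2, r = step(s, a)
    a2 = choose(s2)                     # SARSA is on-policy: use the next action
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s2, a2)] - Q[(s, a)])
    s, a = s2, a2

# Greedy rate per state after training: better states support higher MCS.
print([max(ACTIONS, key=lambda x: Q[(s, x)]) for s in STATES])
```

Note the weakness the text identifies: because the table is indexed only by a coarse state, the learner cannot distinguish losses caused by interference from losses caused by a poor channel.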
This paper proposes a new paradigm that combines the control of transmission speed and the transmission power of
the 802.11 Wi-Fi protocol using the recently developed 802.11ac standard. Specifically, we explore the use of an RL
algorithm to observe the channel state and develop an optimal policy for transmitting packets of information in the
presence of jamming nodes that deliberately disrupt the delivery of packets to the receiver. Compared with
other related works in the literature, our contributions are three-fold: (i) our state space includes multiple elements,
including packet queue length, ACKs from the receiver, battery level, CW, and back-off slots; (ii) we jointly control
the transmission power and data rates of a transmitter under jamming and incorporate both energy and throughput
into the reward function; and (iii) we show in simulation that our method outperforms the widely used Minstrel rate
adaptation algorithm. It is worth noting that, unlike the surveyed literature, our methodology combines two reward
functions and offers users a flexible way to select which factor is more important (i.e., either maximizing the
throughput of the system or minimizing the energy consumption of the device).
The organization of the rest of the paper is as follows: the methodology is presented in Section 3; Section 4 discusses
the agents’ training and testing results; the conclusion and future work are discussed in Section 5.
3 Methodology
3.1 State observation
The state is represented by a collection of characteristics derived from local measurements taken at the Transmitting
node (Tx). These characteristics should give sufficient information about the transmitter’s performance and the wireless
channel. In particular, the state is defined as a tuple (Nt, Cw, Bfs, Rp, Bl):

- Nt (packet queue length): the percentage of the packet queue occupied by packets that are ready to be
transmitted or re-transmitted. The maximum number of packets that can be queued is 5000. The queue uses a
First-In-First-Out (FIFO) policy, and it is full when the number of packets in the queue reaches 5000. To reduce
the state space, we divide the queue length into 10 discrete levels, i.e., Nt can only be a multiple of 10 between 10
and 100.

- Cw (CW size): this defines a period of time in which the network is operating in contention mode. The larger
the contention window, the larger the average back-off value, and the lower the likelihood of collisions. For
Best Effort (BE) packet delivery, the contention window doubles its current value when there is a collision; the
minimum contention value, Cw(min), is 15, and the maximum contention value, Cw(max), is 1023. That is, there
are 7 possible values for the contention window between 15 and 1023, each one less than a power of two.

- Bfs (back-off slots): this is the value returned by the back-off algorithm for collision resolution, which signals
collisions and triggers re-transmission of packets when collisions occur during the transmission schedule. When
a station enters the back-off state, it waits for an additional, randomly selected number of time slots (the
random number is larger than 0 and less than the current CW maximum value). The possible slot values range
from 0 to 1023, inclusive, i.e., 1024 values. This range is discretized into 128 categories by the equation below:

Bfs = int(Bf / 8).    (1)
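The discretization steps above (queue levels, CW doubling, and Eq. (1)) can be sketched as follows; the helper names are ours, and the rounding direction for the queue levels is an assumption, since the text only fixes the set of levels.

```python
import math

# Sketch of the state-component discretization described above.
QUEUE_CAPACITY = 5000
CW_VALUES = [15, 31, 63, 127, 255, 511, 1023]   # the 7 allowed CW sizes

def discretize_queue(packets_queued: int) -> int:
    """Map queue occupancy to a level in {10, 20, ..., 100} percent.
    Rounding up is an assumption; the paper only fixes the 10 levels."""
    pct = 100.0 * packets_queued / QUEUE_CAPACITY
    return min(100, max(10, 10 * math.ceil(pct / 10)))

def discretize_backoff(bf: int) -> int:
    """Eq. (1): map a back-off slot count in [0, 1023] to one of 128 bins."""
    assert 0 <= bf <= 1023
    return bf // 8          # int(Bf / 8)

def next_cw(cw: int) -> int:
    """CW doubling on collision, capped at Cw(max) = 1023."""
    return min(2 * cw + 1, 1023)    # 15 -> 31 -> 63 -> ... -> 1023

print(discretize_queue(2500))   # → 50, a half-full queue
print(discretize_backoff(1023)) # → 127, the last of the 128 bins
print(next_cw(15))              # → 31
```

Together these give a compact discrete state: 10 queue levels × 7 CW values × 128 back-off bins for the components defined so far, which keeps a tabular or small-network agent tractable.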