
energy consumed (Hühn [2013]). In principle, decreasing the power or raising the carrier sense threshold may help
to enhance spatial reuse. (Ma et al. [2008]) proposes a hybrid transmit power and carrier sense adaptation approach that differentiates congestion losses from interference losses. This work shows that, when the interference occurs before the data signal, fine-tuning the carrier sense threshold may completely remove interference-related losses, while power control avoids data signal losses caused by interference that arrives while the data signal is being transmitted. (Miao et al. [2017])
explores the trade-off between energy and latency and uses a real-time controller for the dynamic regulation of task
delivery in order to minimize energy consumption while meeting a deadline for each individual task. In particular, the
authors make use of the generalized critical task decomposition algorithm to identify critical tasks on an optimal sample
path.
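To make this distinction concrete, the sketch below illustrates the decision logic in a hedged form: the function name, thresholds, and margins are hypothetical and are not taken from (Ma et al. [2008]).

```python
# Illustrative sketch only: interference sensed *before* a transmission starts
# can be ignored by raising the carrier sense threshold (better spatial reuse),
# whereas interference that arrives *during* the transmission is countered by
# raising transmit power. All names and margins are hypothetical.

def adapt_cs_and_power(sensed_interference_dbm, expected_sinr_db, min_sinr_db,
                       cs_threshold_dbm, tx_power_dbm, max_power_dbm):
    """Return an updated (carrier sense threshold, transmit power) pair."""
    # Case 1: interference is already present before the data signal.
    # Raising the carrier sense threshold above the sensed level lets the
    # transmitter proceed instead of deferring, provided the link can still
    # meet its SINR requirement.
    if sensed_interference_dbm > cs_threshold_dbm and expected_sinr_db >= min_sinr_db:
        cs_threshold_dbm = sensed_interference_dbm + 1.0  # 1 dB margin (arbitrary)

    # Case 2: interference appears while the data signal is on the air.
    # Carrier sensing cannot help once the transmission has started, so the
    # transmitter compensates by increasing power, up to its limit.
    if expected_sinr_db < min_sinr_db:
        deficit_db = min_sinr_db - expected_sinr_db
        tx_power_dbm = min(tx_power_dbm + deficit_db, max_power_dbm)

    return cs_threshold_dbm, tx_power_dbm
```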
In addition to power and carrier sense management, rate control introduces a trade-off between spatial reuse and the transmission rate that can be sustained (Tobagi and Hira [2010]). A new idea,
spatial back-off, was introduced in (Jamieson et al. [2005]), which allows for dynamic tweaking of the carrier sensing
threshold in conjunction with the Auto-Rate Fallback (ARF) algorithm in order to achieve high throughput. In particular,
ARF shifts to a lower transmission rate if the measured losses exceed a threshold, then switches to a higher transmission
rate after a specified number of consecutive frames are successfully sent. According to (Yang and Vaidya [2007]), with discrete data rates and a sufficient number of power levels, power control offers several benefits over carrier sense control, in contrast to the case of a continuous data rate. In the authors' formulation, power and rate control regulates the power and rate of a transmitter according to the degree of interference perceived at the receiving end. The receiver must return this information to the transmitter, which may be accomplished through IEEE 802.11k (Kim et al. [2006]) but is not currently supported by available device driver versions.
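Since ARF is referenced repeatedly in what follows, a minimal sketch of its loop is given below; the rate set and the failure/success thresholds are illustrative parameters, not values prescribed by the cited works.

```python
# Minimal sketch of the Auto-Rate Fallback (ARF) loop described above.
RATES_MBPS = [6, 12, 24, 36, 48, 54]   # hypothetical discrete rate set
FAIL_THRESHOLD = 2                      # consecutive losses before stepping down
SUCCESS_THRESHOLD = 10                  # consecutive successes before stepping up


class ARF:
    def __init__(self):
        self.rate_idx = 0
        self.fails = 0
        self.successes = 0

    def current_rate(self):
        return RATES_MBPS[self.rate_idx]

    def on_frame_result(self, ack_received: bool):
        if ack_received:
            self.fails = 0
            self.successes += 1
            # Probe a higher rate after enough consecutive successes.
            if self.successes >= SUCCESS_THRESHOLD and self.rate_idx < len(RATES_MBPS) - 1:
                self.rate_idx += 1
                self.successes = 0
        else:
            self.successes = 0
            self.fails += 1
            # Fall back to a lower rate once losses exceed the threshold.
            if self.fails >= FAIL_THRESHOLD and self.rate_idx > 0:
                self.rate_idx -= 1
                self.fails = 0
```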
An adaptive rate and power control technique that is consistent with IEEE 802.11 operations is proposed in (Committee et al. [2016]), where Acknowledgments (ACKs) received from the receiver drive the optimization of the transmission rate. The scheme operates with two basic adaptive strategies: either the maximum possible rate is selected and then supported with the lowest possible power, or the lowest possible power is chosen first and the highest rate achievable at that power is then selected. In a related manner, Power-controlled Auto Rate Fallback (PARF) and Power-controlled Estimated Rate Fallback (PERF) were suggested in (Chevillat et al. [2005]), in which the authors extend ARF and Estimated Rate Fallback (ERF) to work with transmission power control. It is important to note that ERF is the SNR-based variant
of ARF, in which each packet carries the power level, the path loss, and noise estimate from the previous packet that
has been received. ERF senders estimate the SNR based on this information and establish the highest transmission
rate compatible with the estimated SNR. As they predicted, the authors of (Chevillat et al. [2005]) found that PARF did not work effectively when the receiver reduced the power used for ACK messages: when these ACK packets were not received, the transmitter made inaccurate power reduction choices. They obtained more reliable performance with PERF, which bases power and rate choices on SNR values. These findings are consistent with (Akella et al. [2005]), which demonstrates that SNR-based approaches are more resilient than loss-based protocols (Kim et al. [2006]). Despite this, they conclude that, in order to achieve such resilience, SNR-based methods require real-time training.
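A minimal sketch of this SNR-based selection is shown below, assuming a hypothetical rate/SNR table and feedback fields (transmit power, path loss, and noise estimate from the last received packet); it illustrates the idea behind ERF and PERF rather than the exact formulation of (Chevillat et al. [2005]).

```python
# Hedged sketch of SNR-based rate (ERF) and rate/power (PERF-style) selection.
# The minimum-SNR-per-rate table below is illustrative, not from the cited work.
RATE_SNR_TABLE = [(6, 5.0), (12, 8.0), (24, 13.0), (36, 18.0), (48, 22.0), (54, 25.0)]


def estimate_snr_db(tx_power_dbm, path_loss_db, noise_dbm):
    # SNR inferred from the power level, path loss, and noise estimate carried
    # back to the sender in the previously received packet.
    return tx_power_dbm - path_loss_db - noise_dbm


def erf_select_rate(tx_power_dbm, path_loss_db, noise_dbm):
    # ERF: pick the highest rate compatible with the estimated SNR.
    snr = estimate_snr_db(tx_power_dbm, path_loss_db, noise_dbm)
    feasible = [rate for rate, req in RATE_SNR_TABLE if snr >= req]
    return max(feasible) if feasible else min(rate for rate, _ in RATE_SNR_TABLE)


def perf_select_rate_and_power(max_power_dbm, path_loss_db, noise_dbm,
                               min_power_dbm=0.0, step_db=1.0):
    # PERF-style choice: keep the ERF rate, then back the power off while the
    # estimated SNR still supports that rate.
    rate = erf_select_rate(max_power_dbm, path_loss_db, noise_dbm)
    required_snr = dict(RATE_SNR_TABLE)[rate]
    power = max_power_dbm
    while (power - step_db >= min_power_dbm and
           estimate_snr_db(power - step_db, path_loss_db, noise_dbm) >= required_snr):
        power -= step_db
    return rate, power
```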
2 Related Work
Reinforcement Learning (RL) has been researched actively for the control of transmission power and the data rates
for 802.11 standards. (Camp and Knightly [2010]) presented a power allocation technique based on multi-agent
reinforcement learning. The paper minimizes the loss function through stochastic gradient descent, using a Deep Q-Network with many agents learning in parallel. The state description for each agent is the previous transmit power, which describes agent $i$'s potential contribution to the network, as well as the interfering neighbors' contributions to the network, based on observations from a set of $n$ transmitters with an SNR greater than a predefined threshold and a receiver with an SNR greater than the threshold. The actions are described as discretized steps of power within a specified power range shared by all actors (i.e., all agents have the same action space). The reward function is intended to reflect each agent's direct contribution to the network and its penalty for interfering with all other agents, defined and interpreted as how the action of agent $i$ in time slot $t$, i.e., $p_i(t)$, affects the weighted sum-rate of its own link and of its future interfered neighbors.
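As an illustration of this reward structure, the sketch below credits agent $i$ with its own achieved rate and penalizes it by the rate loss its transmit power $p_i(t)$ causes at neighboring receivers; the log2-based rate model, the channel-gain matrix, and all names are assumptions made for the sketch, not the exact definitions used in the cited work.

```python
import numpy as np


def sum_rate_reward(i, powers, gains, noise):
    """Illustrative per-agent reward.

    powers      : vector of transmit powers p_j(t) for all n agents
    gains[j, k] : (assumed) channel gain from transmitter j to receiver k
    noise       : receiver noise power
    """
    n = len(powers)

    def rate_at(k, p):
        # Spectral efficiency at receiver k under power vector p.
        signal = gains[k, k] * p[k]
        interference = sum(gains[j, k] * p[j] for j in range(n) if j != k) + noise
        return np.log2(1.0 + signal / interference)

    # Direct contribution: agent i's own rate.
    own_rate = rate_at(i, powers)

    # Penalty: rate lost at each neighbor because agent i transmits, measured
    # against a counterfactual power vector in which p_i(t) is set to zero.
    powers_off = np.array(powers, dtype=float)
    powers_off[i] = 0.0
    penalty = sum(rate_at(k, powers_off) - rate_at(k, powers)
                  for k in range(n) if k != i)

    return own_rate - penalty


# Example (hypothetical 3-link network):
# gains = np.array([[1.0, 0.1, 0.05], [0.1, 1.0, 0.1], [0.05, 0.1, 1.0]])
# r = sum_rate_reward(0, [0.5, 1.0, 0.8], gains, noise=0.01)
```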
The authors of (Nasir and Guo [2019]) build on (Camp and Knightly [2010]) and use an actor-critic algorithm to learn the optimal policy for distributed power control. In actor-critic algorithms, two neural networks are designed to learn from and update each other's weights based on the experiences and state-action pairs encountered in each episode. Each transmitter is designed to be a learning agent during system exploration, so the actors are a set of learning agents whose next state is conditioned on the joint actions of all agents acting as actors. The critic is a single network, described as the $Q_{\text{target}}$ network, used to update each learning agent's parameters after every episode. The presented work is a