Detecting Irregular Network Activity with
Adversarial Learning and Expert Feedback
Gopikrishna Rathinavel
Virginia Tech
Blacksburg, VA
rgopikrishna@vt.edu
Nikhil Muralidhar
Stevens Institute of Technology
Hoboken, NJ
nmurali1@stevens.edu
Timothy O’Shea
DeepSig Inc & Virginia Tech
Arlington, VA
tim@deepsig.io
Naren Ramakrishnan
Virginia Tech
Arlington, VA
naren@cs.vt.edu
Abstract—Anomaly detection is a ubiquitous and challeng-
ing task, relevant across many disciplines. With the vital role
communication networks play in our daily lives, the security
of these networks is imperative for the smooth functioning of
society. To this end, we propose a novel self-supervised deep
learning framework CAAD for anomaly detection in wireless
communication systems. Specifically, CAAD employs contrastive
learning in an adversarial setup to learn effective representations
of normal and anomalous behavior in wireless networks. We
conduct rigorous performance comparisons of CAAD with several
state-of-the-art anomaly detection techniques and verify that
CAAD yields a mean performance improvement of 92.84%.
Additionally, we augment CAAD, enabling it to systematically
incorporate expert feedback through a novel contrastive learning
feedback loop to improve the learned representations and thereby
reduce prediction uncertainty (CAAD-EF). We view CAAD-EF as a novel, holistic, and widely applicable solution to anomaly
detection. Our source code and data are available online.1
Index Terms—anomaly detection, generative neural networks,
wireless, self-supervised learning, contrastive learning, expert
feedback
I. INTRODUCTION
Wireless communications systems form an essential compo-
nent of cyber-physical systems in urban environments along
with the electric grid and the transportation network. These
wireless communication systems and networks enable us to
access the internet, and connect with others remotely, thereby
serving as a vital means for human interaction. Further, they
connect hundreds or thousands of sensors, applications, indus-
trial networks, critical communications systems, and other in-
frastructure. Hence, state monitoring and detection of irregular
activity in wireless networks are essential to ensuring robust
and resilient system operational capabilities.
The electromagnetic spectrum (simply referred to as ‘the
spectrum’) is the information highway through which most
modern forms of electronic communication occur. Parts of the
spectrum are grouped into ‘bands’ (based on the wavelength)
which can be thought of as analogous to lanes on the highway.
Specific regions (i.e., lanes) of the spectrum are reserved for
specific types of communication (e.g., radio communication,
broadcast television) based on frequency. The entire spectrum
ranges from 3 Hz to 300 EHz, and the typical range used for
wireless communication today is 30 kHz to 28 GHz.
1https://github.com/rgopikrishna-vt/CAAD
[Figure: spectrum activity plot with axes labeled Frequency, Bandwidth, and Packet Counts; "Normal" and "Anomaly" regions and an antenna are annotated.]
Fig. 1. Irregular Activity in Wireless Communication Systems. [1]
Spectrum access activity in wireless systems carries rich
information which can indicate underlying activity of phys-
ical device presence, activity and behaviors corresponding
to security threats and intrusions, jamming attempts, device
malfunctioning, interference, illicit transmissions, and a host
of other activities (see Fig. 1). Data corresponding to spectrum
access activity information has been explored in wireless
intrusion detection systems (WIDS) in a very limited context
and most of the systems in use today for detecting anomalous
network activity are highly application-specific, focusing
on specialized feature engineering, detector engineering, and
signal-specific digital signal processing (DSP) engineering.
Such systems are not generalizable, are highly sensitive to
minor variations in system characteristics, and are costly to
maintain due to their reliance on rich feature engineering.
Hence in this work, we have developed a generic and
powerful unsupervised anomaly detection framework and
demonstrated its prowess in the context of wireless net-
work anomalies. Specifically, we propose a novel solution
to anomaly detection (AD), Contrastive Adversarial Anomaly
Detection (CAAD ) which applies contrastive learning (CL)
in an adversarial setup. We also augment CAAD with the
ability to incorporate expert feedback (EF) to improve the
quality of its learned representations for AD. We call this
model CAAD-EF (CAAD with expert feedback). To the best
of our knowledge, we are the first to propose such a powerful
yet flexible AD framework that applies contrastive learning
paradigms in an adversarial setup with the ability to incor-
porate expert feedback via contrastive learning to improve its
learned representations and reduce prediction uncertainty.
Our contributions are as follows:
arXiv:2210.02841v2 [cs.CR] 15 Oct 2022
• We propose CAAD, a novel method for AD which utilizes contrastive learning and generative adversarial networks (GANs). We demonstrate that our proposed model significantly outperforms state-of-the-art (SOTA) models on AD in wireless networks and on standard datasets. To the best of our knowledge, CAAD is the first model to use a combination of CL and adversarial learning for AD.
• We propose CAAD-EF, a novel model supplemental to CAAD, which further enables us to incorporate expert feedback via contrastive learning and uncertainty quantification using Monte Carlo dropout. To the best of our knowledge, our framework is the first successful undertaking to utilize contrastive learning to incorporate expert feedback.
• Finally, we highlight the importance of various facets of CAAD-EF through rigorous qualitative, quantitative, and ablation analyses.
II. RELATED WORK
Many ML approaches have been developed for anomaly
detection across diverse applications. The recent resurgence
of deep learning techniques demonstrating their effectiveness
across a wide variety of domains has led to the develop-
ment of many novel and powerful modeling paradigms like
generative adversarial networks (GAN) [2], self-supervised
representation learning [3] and contrastive learning (CL) [4].
Contrastive Learning (CL) imposes structure on the latent
space by encouraging similarity in representations learned
for related instances and dissimilarity in representations for
unrelated instances. Such techniques have proven effective,
especially when combined with self-supervised learning [5],
[6] and also with labeled data [7]. CL has demonstrated
promising results in image recognition tasks. However, most
of these efforts focus on improving representation learning
performance on traditional classification tasks and do not
specifically focus on AD. Generative Adversarial Networks
(GANs) [2] are a powerful generative learning paradigm
grounded in an adversarial training setup. However, they
are fraught with training instability. Recently, improvements
have been proposed to stabilize the GAN training setup by
employing Wasserstein distance functions [8] and gradient
penalties on the learned weights.
Deep Learning for Anomaly Detection: The aforementioned
developments in deep learning have led to techniques such as
autoencoders and GANs being employed for the ubiquitous
and challenging problem of AD. Specifically, in [9], a deep
robust autoencoder (Robust AE) model is proposed, inspired
by the Robust Principal Component Analysis technique, for
AD with noisy training data. However, this methodology by
design requires knowledge of a subset of anomalies during
model training and may be considered semi-supervised, and
is not directly related to our context of unsupervised AD.
Recently, another line of AD research [10] proposes employing
DCGAN [2] for unsupervised AD. The authors then build
upon their previous work to propose fAnoGAN [11], a two-
step encoder-decoder architecture based on DCGANs where
the encoder (trained separately) learns to invert the mapping
learned by the Generator (i.e., decoder) of the DCGAN model.
We employ fAnoGAN as one of the baselines for empirical
comparison.
Contrastive Learning for Anomaly Detection: There are
multiple reports of contrastive learning being utilized for AD.
Masked Contrastive Learning [12] is a supervised method that
varies the weights of different classes in the contrastive loss
function to produce good representations that separate each
class. Even though this method shows promise, it requires
knowledge of anomaly labels. Contrasting Shifted Instances
(CSI) [13] and Mean Shifted Contrastive Loss [14] are two
unsupervised AD methods based on CL. CSI investigates the
power of self-supervised CL for detecting out-of-distribution
(OOD) data by using distributionally shifted variations of input
data. We employ CSI as one of our baselines. Mean Shifted
Contrastive Loss applies a contrastive loss, modified using
the mean representation, to representations generated by
models pre-trained on ImageNet data. However, this model is
not useful for wireless AD, as it is pre-trained on a particular
kind of data. Also, none of these methods provide a means to
incorporate expert feedback.
Incorporating Expert Feedback: The solutions presented
in [15]–[19] all employ human feedback in various ways.
Active Anomaly Discovery (AAD) [15] is designed to op-
erate in an anomaly exploration loop where the algorithm
selects data to be presented to experts and also provides
a means to incorporate feedback into the model. However,
its performance depends on the number of feedback
loops that can be afforded. Hence, such a method cannot
be applied to wireless AD, where the volume of input
data is extremely high. RAMODO [17] combines representation
learning and outlier detection in a single objective function.
It utilizes pseudo labels generated by other state-of-the-art
outlier detection methods and Chebyshev’s inequality. This
dependence on other methods to generate pseudo labels can
be unreliable in cases where those state-of-the-art outlier
detection methods themselves perform poorly. SAAD [16], DevNet [18]
and DPLAN [19] are semi-supervised methods, all of which
require minimal labeled anomalies and are not suitable for our
problem.
The advantage of using contrastive learning for AD is that
it can be utilized in a self-supervised setup. That is, we can
augment the training samples to generate anomalous samples
that are very close to the training distribution and utilize them
as negative samples in contrastive loss. This allows our model
to detect unseen anomalies effectively. Also, the penultimate
layer of the GAN discriminator has recently been shown
to yield good representations of the input data [11], [20]–
[22]. Hence, the combination of these two powerful techniques,
CL and GANs, serves our AD task well. None of the related
approaches outlined above have developed AD techniques that
combine the aforementioned techniques for AD. Also, none of
the state-of-the-art related AD approaches provide a means to
incorporate expert feedback via contrastive learning.
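The paper's specific negative transformations are defined later in Section IV; purely to illustrate the idea of generating near-distribution negatives via augmentation (the block-shuffling choice here is hypothetical, not the authors' transformation), a minimal NumPy sketch could look like:

```python
import numpy as np

def negative_transform(x, n_blocks=4, rng=None):
    """Illustrative negative augmentation for self-supervised AD.

    Produces a sample that stays close to the training distribution
    (same values, same shape) but is structurally 'anomalous', by
    splitting the input into blocks and shuffling their order. Such
    samples can serve as negatives in a contrastive loss.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    blocks = np.array_split(x, n_blocks)        # split into contiguous blocks
    order = rng.permutation(len(blocks))        # random block ordering
    return np.concatenate([blocks[i] for i in order])
```

Because the output is a rearrangement of the original values, it lies near the training distribution while violating its temporal/spectral structure, which is exactly the property a contrastive negative needs.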
III. BACKGROUND
We propose CAAD and CAAD-EF which employ tech-
niques such as adversarial learning, contrastive learning (CL)
and uncertainty quantification (UQ). We shall now briefly
introduce these concepts before detailing the full CAAD-
EF framework in Section IV.
A. Generative Adversarial Networks (GAN)
GANs are a class of generative models where the learning
problem is formulated as a game between two neural networks,
namely the generator (G) and the discriminator (D). The
problem setup of GANs comprises the generator learning to
transform inputs sampled from a noise distribution into a
distribution $P_f$ such that it resembles the true data distribution
$P_r$. Essentially, the generator is trained to fool the discrim-
inator while the discriminator is tasked with distinguishing
between fake samples $\tilde{x} \sim P_f$ generated by $G$ and real samples
$x \sim P_r$. The traditional GAN [2] setup minimizes the Jensen-
Shannon (JS) divergence between $P_r$ and $P_f$. However, this
divergence is not continuous with respect to the parameters
of G, leading to training instabilities. Wasserstein GAN [8]
(WGAN) was proposed to address this issue. WGAN employs
Earth-Mover distance (instead of the JS divergence), which
under mild assumptions is continuous everywhere and
differentiable almost everywhere. Consider the discriminator
(also termed the critic2 in WGAN) $D$ parameterized by $w$ and
the generator $G$ parameterized by $\theta$. Eq. 1 depicts the WGAN loss
function, where $D_w$ is constrained to lie in $\mathcal{B}$, the
set of 1-Lipschitz functions.

$$L_w = \min_{\theta} \max_{D_w \in \mathcal{B}} \; \mathbb{E}_{x \sim P_r}[D_w(x)] - \mathbb{E}_{\tilde{x} \sim P_f}[D_w(\tilde{x})] \qquad (1)$$
Enforcing the 1-Lipschitz constraint on $D$ has been found
to be challenging. Based on the property that a differentiable function
is 1-Lipschitz if and only if its gradients have norm no greater than
1 everywhere, [23] proposed a solution of augmenting the
WGAN loss with a soft constraint enforcing that the norm
of the discriminator gradients (w.r.t. the inputs) be 1. The ob-
jective function of the WGAN with this updated soft constraint
(termed a gradient penalty) is shown in Eq. 2.
$$L_{gp} = L_w + \lambda \, \mathbb{E}_{\check{x} \sim P_i}\left[\left(\|\nabla D(\check{x})\|_2 - 1\right)^2\right] \qquad (2)$$

Each sample $\check{x} \sim P_i$ is generated as a convex combination
of points from $P_r$ and $P_f$ (i.e., sampled from the line connecting
points from $P_r$ and $P_f$). $\lambda$ enforces the strictness of the gradient
penalty.
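To make the interpolation sampling and penalty of Eq. 2 concrete, here is a minimal NumPy sketch (illustrative only, not the paper's implementation): it uses a toy linear critic $D(x) = w \cdot x$, whose input gradient is simply $w$ everywhere, so the penalty can be evaluated without automatic differentiation.

```python
import numpy as np

def gradient_penalty(w, x_real, x_fake, lam=10.0):
    """Gradient penalty of Eq. 2 for a toy linear critic D(x) = w . x.

    Each interpolate x_hat is a convex combination of a real and a
    fake sample (a draw from P_i); for a linear critic, the gradient
    of D w.r.t. its input is w regardless of x_hat.
    """
    eps = np.random.rand(x_real.shape[0], 1)       # per-sample mixing weights
    x_hat = eps * x_real + (1.0 - eps) * x_fake    # samples from P_i
    grad = np.tile(w, (x_hat.shape[0], 1))         # dD/dx = w for linear D
    grad_norm = np.linalg.norm(grad, axis=1)       # ||grad D(x_hat)||_2
    return lam * np.mean((grad_norm - 1.0) ** 2)
```

For a critic with unit-norm weights (e.g., `w = [0.6, 0.8]`) the penalty is essentially zero, since the gradient norm already satisfies the 1-Lipschitz condition; in a real WGAN-GP, the gradient would instead be obtained via automatic differentiation.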
B. Contrastive Learning (CL)
The paradigm of contrastive learning (CL) has recently
demonstrated highly effective results across a diverse set
of disciplines and tasks, especially in computer vision [4],
[6], [24]. The goal of CL is to impose structure on latent
representations learned by a model (M). This is often achieved
using soft penalties (e.g., additional loss terms) that influence
2The terms ‘critic’ and ‘discriminator’ are used interchangeably in this paper.
representations generated by M to be structured such that repre-
sentations of related instances lie closer together than those of
instances known to be unrelated. Most CL losses are
set in a self-supervised context where relatedness is generated
via augmentations of an instance and two distinct instances
are considered to be unrelated.
Recently, [7] proposed supervised contrastive learning
(SupCon), which is an extension of the CL paradigm to super-
vised (classification) settings. A model trained with SupCon
on a labeled dataset learns latent representations grouped by
class labels while also forcing separation in representations
between instances belonging to different classes (i.e., low
intra-class separation and high inter-class separation of latent
representations).
Consider a dataset of instances $\mathcal{D} = \{(x_1, y_1), \ldots, (x_m, y_m)\}$
such that $x_i \in \mathbb{R}^{b \times l}$ and $y_i \in \mathcal{C}$ is the label of $x_i$, where $\mathcal{C}$ is
the set of class labels. Then, the supervised contrastive loss is
defined by Eq. 3.

$$L_{sup} = \sum_{x_i \in \mathcal{D}} \frac{-1}{|Pos(i)|} \sum_{x_k \in Pos(i)} \log \frac{\exp(z_i \cdot z_k / \tau)}{\sum_{x_j \in Q(i)} \exp(z_i \cdot z_j / \tau)} \qquad (3)$$

Here, $z_i \in \mathbb{R}^{h \times 1}$ is the latent representation of $x_i$ generated by
model M. $Pos(i) = \{x_k \in \mathcal{D} \mid y_k = y_i,\, k \neq i\}$ is the set of
instances that form the ‘positive set’ for $x_i$, and $Q(i) = \mathcal{D} \setminus \{x_i\}$.
$\tau \in \mathbb{R}^+$ is a temperature hyperparameter. We employ Eq. 3 for CL, but
with labels generated in a self-supervised manner.
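As a minimal, illustrative NumPy sketch (not the authors' implementation; the function name and toy inputs are hypothetical), the supervised contrastive loss of Eq. 3 can be written directly from its definition:

```python
import numpy as np

def supcon_loss(z, y, tau=0.1):
    """Supervised contrastive loss (Eq. 3), minimal NumPy sketch.

    z : (m, h) array of L2-normalized latent representations
    y : length-m sequence of integer class labels
    """
    m = z.shape[0]
    sim = z @ z.T / tau                         # pairwise z_i . z_j / tau
    loss = 0.0
    for i in range(m):
        pos = [k for k in range(m) if y[k] == y[i] and k != i]   # Pos(i)
        if not pos:
            continue                            # no positives for this anchor
        q = [j for j in range(m) if j != i]     # Q(i) = D \ {x_i}
        denom = np.sum(np.exp(sim[i, q]))
        loss += -1.0 / len(pos) * sum(
            np.log(np.exp(sim[i, k]) / denom) for k in pos)
    return loss
```

Representations that cluster by class label yield a lower loss than representations where classes are intermixed, which is the structure SupCon is designed to impose.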
C. Uncertainty Quantification (UQ)
Quantifying decision uncertainty is critical to the success
of real-world machine learning (ML) frameworks. It is of
special relevance in the current setting of anomaly detection
wherein the confidence of a model in its decision addition-
ally indicates the urgency of a potential alert issued by the
model. While traditional ML models yield point predictions,
Bayesian ML provides a framework for capturing model
uncertainty. One such UQ approach [25] can be viewed as
approximating Gaussian processes with neural network models.
This approach, termed Monte Carlo Dropout, entails
Monte Carlo sampling (during inference) from a trained model by
randomly masking a subset of the model's learned weights each
time (i.e., dropout [26]). This is akin to sampling from the
approximate posterior which leads to uncovering the model’s
predictive distribution. Inferring the predictive distribution
is one of the methods to quantify model uncertainty with
Bayesian neural networks.
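As an illustrative sketch (a hypothetical toy linear model, not the CAAD-EF network), Monte Carlo Dropout inference can be emulated by masking weights over repeated forward passes and reading off the predictive mean and standard deviation:

```python
import numpy as np

def mc_dropout_predict(W, x, p_drop=0.5, T=200, rng=None):
    """Monte Carlo Dropout for a toy linear model y = W x (NumPy sketch).

    Runs T stochastic forward passes, each with an independent random
    dropout mask on W (inverted-dropout rescaling keeps the expected
    output unchanged), and returns the mean prediction and its
    standard deviation, the latter serving as an uncertainty estimate.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    preds = []
    for _ in range(T):
        mask = rng.random(W.shape) >= p_drop            # keep w.p. 1 - p_drop
        preds.append((W * mask) @ x / (1.0 - p_drop))   # one stochastic pass
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)
```

The spread of the T sampled predictions approximates the model's predictive distribution; in an AD setting, a wide spread flags a low-confidence alert.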
IV. PROBLEM FORMULATION
We shall now describe the various facets of our novel
human-in-the-loop anomaly detection framework CAAD-EF .
Fig. 2 details the overall architecture of CAAD-EF .
A. Self-Supervised Anomaly Detection with Negative Transformations
The core of the proposed framework is the Contrastive
Adversarial Anomaly Detection (CAAD ) model. The struc-
ture of the CAAD model resembles a WGAN with gradient