Detecting Irregular Network Activity with
Adversarial Learning and Expert Feedback
Gopikrishna Rathinavel
Virginia Tech
Blacksburg, VA
rgopikrishna@vt.edu
Nikhil Muralidhar
Stevens Institute of Technology
Hoboken, NJ
nmurali1@stevens.edu
Timothy O’Shea
DeepSig Inc & Virginia Tech
Arlington, VA
tim@deepsig.io
Naren Ramakrishnan
Virginia Tech
Arlington, VA
naren@cs.vt.edu
Abstract—Anomaly detection is a ubiquitous and challeng-
ing task, relevant across many disciplines. With the vital role
communication networks play in our daily lives, the security
of these networks is imperative for the smooth functioning of
society. To this end, we propose a novel self-supervised deep
learning framework CAAD for anomaly detection in wireless
communication systems. Specifically, CAAD employs contrastive
learning in an adversarial setup to learn effective representations
of normal and anomalous behavior in wireless networks. We
conduct rigorous performance comparisons of CAAD with several
state-of-the-art anomaly detection techniques and verify that
CAAD yields a mean performance improvement of 92.84%.
Additionally, we augment CAAD, enabling it to systematically
incorporate expert feedback through a novel contrastive learning
feedback loop to improve the learned representations and thereby
reduce prediction uncertainty (CAAD-EF). We view CAAD-EF as a novel, holistic, and widely applicable solution to anomaly
detection. Our source code and data are available online.1
Index Terms—anomaly detection, generative neural networks,
wireless, self-supervised learning, contrastive learning, expert
feedback
I. INTRODUCTION
Wireless communications systems form an essential compo-
nent of cyber-physical systems in urban environments along
with the electric grid and the transportation network. These
wireless communication systems and networks enable us to
access the internet, and connect with others remotely, thereby
serving as a vital means for human interaction. Further, they
connect hundreds or thousands of sensors, applications, indus-
trial networks, critical communications systems, and other in-
frastructure. Hence, state monitoring and detection of irregular
activity in wireless networks are essential to ensuring robust
and resilient system operational capabilities.
The electromagnetic spectrum (simply referred to as ‘the
spectrum’) is the information highway through which most
modern forms of electronic communication occur. Parts of the
spectrum are grouped into ‘bands’ (based on the wavelength)
which can be thought of as analogous to lanes on the highway.
Specific regions (i.e., lanes) of the spectrum are reserved for
specific types of communication (e.g., radio communication,
broadcast television) based on frequency. The entire spectrum
ranges from 3 Hz to 300 EHz, and the typical range used for
wireless communication today is 30 kHz to 28 GHz.
1https://github.com/rgopikrishna-vt/CAAD
[Figure: spectrum activity plot with axes labeled Frequency, Bandwidth, and Packet Counts; "Normal" and "Anomaly" regions and an antenna are annotated.]
Fig. 1. Irregular Activity in Wireless Communication Systems. [1]
Spectrum access activity in wireless systems carries rich
information which can indicate underlying activity of phys-
ical device presence, activity and behaviors corresponding
to security threats and intrusions, jamming attempts, device
malfunctioning, interference, illicit transmissions, and a host
of other activities (see Fig. 1). Data corresponding to spectrum
access activity information has been explored in wireless
intrusion detection systems (WIDS) in a very limited context
and most of the systems in use today for detecting anomalous
network activity are highly application-specific, focusing
on specialized feature engineering, detector engineering, and
signal-specific digital signal processing (DSP) engineering.
Such systems are not generalizable, are highly sensitive to
minor variations in system characteristics, and are costly to
maintain due to their reliance on rich feature engineering.
Hence in this work, we have developed a generic and
powerful unsupervised anomaly detection framework and
demonstrated its prowess in the context of wireless net-
work anomalies. Specifically, we propose a novel solution
to anomaly detection (AD), Contrastive Adversarial Anomaly
Detection (CAAD ) which applies contrastive learning (CL)
in an adversarial setup. We also augment CAAD with the
ability to incorporate expert feedback (EF) to improve the
quality of its learned representations for AD. We call this
model CAAD-EF (CAAD with expert feedback). To the best
of our knowledge, we are the first to propose such a powerful
yet flexible AD framework that applies contrastive learning
paradigms in an adversarial setup with the ability to incor-
porate expert feedback via contrastive learning to improve its
learned representations and reduce prediction uncertainty.
Our contributions are as follows:
arXiv:2210.02841v2 [cs.CR] 15 Oct 2022
• We propose CAAD, a novel method for AD which utilizes contrastive learning and generative adversarial networks (GANs). We demonstrate that our proposed model significantly outperforms state-of-the-art (SOTA) models on AD in wireless networks and on standard datasets. To the best of our knowledge, CAAD is the first model to use a combination of CL and adversarial learning for AD.
• We propose CAAD-EF, a novel model supplemental to CAAD, which further enables us to incorporate expert feedback via contrastive learning and uncertainty quantification using Monte Carlo dropout. To the best of our knowledge, our framework is the first successful undertaking to utilize contrastive learning to incorporate expert feedback.
• Finally, we highlight the importance of various facets of CAAD-EF through rigorous qualitative, quantitative, and ablation analyses.
II. RELATED WORK
Many ML approaches have been developed for anomaly
detection across diverse applications. The recent resurgence
of deep learning techniques demonstrating their effectiveness
across a wide variety of domains has led to the develop-
ment of many novel and powerful modeling paradigms like
generative adversarial networks (GAN) [2], self-supervised
representation learning [3] and contrastive learning (CL) [4].
Contrastive Learning (CL) imposes structure on the latent
space by encouraging similarity in representations learned
for related instances and dissimilarity in representations for
unrelated instances. Such techniques have proven effective,
especially when combined with self-supervised learning [5],
[6] and also with labeled data [7]. CL has demonstrated
promising results in image recognition tasks. However, most
of these efforts focus on improving representation learning
performance on traditional classification tasks and do not
specifically focus on AD. Generative Adversarial Networks
(GANs) [2] are a powerful generative learning paradigm
grounded in an adversarial training setup. However, they
are fraught with training instability. Recently, improvements
have been proposed to stabilize the GAN training setup by
employing Wasserstein distance functions [8] and gradient
penalties on the learned weights.
Deep Learning for Anomaly Detection: The aforementioned
developments in deep learning have led to techniques such as
autoencoders and GANs being employed for the ubiquitous
and challenging problem of AD. Specifically, in [9], a deep
robust autoencoder (Robust AE) model is proposed, inspired
by the Robust Principal Component Analysis technique, for
AD with noisy training data. However, this methodology by
design requires knowledge of a subset of anomalies during
model training and may be considered semi-supervised, and
is not directly related to our context of unsupervised AD.
Recently, another line of AD research [10] proposes employing
DCGAN [2] for unsupervised AD. The authors then build
upon their previous work to propose fAnoGAN [11], a two-
step encoder-decoder architecture based on DCGANs where
the encoder (trained separately) learns to invert the mapping
learned by the Generator (i.e., decoder) of the DCGAN model.
We employ fAnoGAN as one of the baselines for empirical
comparison.
Contrastive Learning for Anomaly Detection: There are
multiple reports of contrastive learning being utilized for AD.
Masked Contrastive Learning [12] is a supervised method that
varies the weights of different classes in the contrastive loss
function to produce good representations that separate each
class. Even though this method shows promise, it requires
knowledge of anomaly labels. Contrasting Shifted Instances
(CSI) [13] and Mean Shifted Contrastive Loss [14] are two
unsupervised AD methods based on CL. CSI investigates the
power of self-supervised CL for detecting out-of-distribution
(OOD) data by using distributionally shifted variations of input
data. We employ CSI as one of our baselines. Mean Shifted
Contrastive Loss applies a contrastive loss, modified using
the mean representation, to representations generated by
models pre-trained on ImageNet data. However, this model is
not useful for wireless AD, as it is pre-trained on a particular
kind of data. Also, none of these methods provide a means to
incorporate expert feedback.
Incorporating Expert Feedback: The solutions presented
in [15]–[19] all employ human feedback in various ways.
Active Anomaly Discovery (AAD) [15] is designed to op-
erate in an anomaly exploration loop where the algorithm
selects data to be presented to experts and also provides
a means to incorporate feedback into the model. However,
its performance depends on the number of feedback
loops that can be afforded. Hence, such a method cannot
be applied to wireless AD, where the volume of input
data is extremely high. RAMODO [17] combines representation
learning and outlier detection in a single objective function.
It utilizes pseudo labels generated by other state-of-the-art
outlier detection methods and Chebyshev’s inequality. This
dependence on other methods to generate pseudo labels can
be unreliable in cases where those state-of-the-art outlier
detection methods themselves perform poorly. SAAD [16], DevNet [18]
and DPLAN [19] are semi-supervised methods, all of which
require minimal labeled anomalies and are not suitable for our
problem.
The advantage of using contrastive learning for AD is that
it can be utilized in a self-supervised setup. That is, we can
augment the training samples to generate anomalous samples
that are very close to the training distribution and utilize them
as negative samples in contrastive loss. This allows our model
to detect unseen anomalies effectively. Also, the penultimate
layer of the GAN discriminator has recently been shown
to yield good representations of the input data [11], [20]–
[22]. Hence, the combination of these two powerful techniques,
CL and GANs, serves our AD task well. None of the related
approaches outlined above have developed AD techniques that
combine the aforementioned techniques for AD. Also, none of
the state-of-the-art related AD approaches provide a means to
incorporate expert feedback via contrastive learning.
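The paper's specific negative transformations are defined later in Section IV; purely to illustrate the idea of generating near-distribution negatives via augmentation (the block-shuffling choice here is hypothetical, not the authors' transformation), a minimal NumPy sketch could look like:

```python
import numpy as np

def negative_transform(x, n_blocks=4, rng=None):
    """Illustrative negative augmentation for self-supervised AD.

    Produces a sample that stays close to the training distribution
    (same values, same shape) but is structurally 'anomalous', by
    splitting the input into blocks and shuffling their order. Such
    samples can serve as negatives in a contrastive loss.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    blocks = np.array_split(x, n_blocks)        # split into contiguous blocks
    order = rng.permutation(len(blocks))        # random block ordering
    return np.concatenate([blocks[i] for i in order])
```

Because the output is a rearrangement of the original values, it lies near the training distribution while violating its temporal/spectral structure, which is exactly the property a contrastive negative needs.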
III. BACKGROUND
We propose CAAD and CAAD-EF which employ tech-
niques such as adversarial learning, contrastive learning (CL)
and uncertainty quantification (UQ). We shall now briefly
introduce these concepts before detailing the full CAAD-
EF framework in Section IV.
A. Generative Adversarial Networks (GAN)
GANs are a class of generative models where the learning
problem is formulated as a game between two neural networks,
namely the generator (G) and the discriminator (D). The
problem setup of GANs comprises the generator learning to
transform inputs sampled from a noise distribution into a
distribution $P_f$ such that it resembles the true data distribution
$P_r$. Essentially, the generator is trained to fool the discrim-
inator while the discriminator is tasked with distinguishing
between fake samples $\tilde{x} \sim P_f$ generated by $G$ and real samples
$x \sim P_r$. The traditional GAN [2] setup minimizes the Jensen-
Shannon (JS) divergence between $P_r$ and $P_f$. However, this
divergence is not continuous with respect to the parameters
of G, leading to training instabilities. Wasserstein GAN [8]
(WGAN) was proposed to address this issue. WGAN employs
Earth-Mover distance (instead of the JS divergence), which
under mild assumptions is continuous everywhere and
differentiable almost everywhere. Consider the discriminator
(also termed the critic2 in WGAN) $D$ parameterized by $w$ and
the generator $G$ parameterized by $\theta$. Eq. 1 depicts the WGAN loss
function, where $D_w$ is constrained to lie in $\mathcal{B}$, the
set of 1-Lipschitz functions.

$$L_w = \min_{\theta} \max_{D_w \in \mathcal{B}} \; \mathbb{E}_{x \sim P_r}[D_w(x)] - \mathbb{E}_{\tilde{x} \sim P_f}[D_w(\tilde{x})] \qquad (1)$$
Enforcing the 1-Lipschitz constraint on $D$ has been found
to be challenging. Based on the property that a differentiable function
is 1-Lipschitz if and only if its gradients have norm no greater than
1 everywhere, [23] proposed a solution of augmenting the
WGAN loss with a soft constraint enforcing that the norm
of the discriminator gradients (w.r.t. the inputs) be 1. The ob-
jective function of the WGAN with this updated soft constraint
(termed a gradient penalty) is shown in Eq. 2.
$$L_{gp} = L_w + \lambda \, \mathbb{E}_{\check{x} \sim P_i}\left[\left(\|\nabla D(\check{x})\|_2 - 1\right)^2\right] \qquad (2)$$

Each sample $\check{x} \sim P_i$ is generated as a convex combination
of points from $P_r$ and $P_f$ (i.e., sampled from the line connecting
points from $P_r$ and $P_f$). $\lambda$ enforces the strictness of the gradient
penalty.
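To make the interpolation sampling and penalty of Eq. 2 concrete, here is a minimal NumPy sketch (illustrative only, not the paper's implementation): it uses a toy linear critic $D(x) = w \cdot x$, whose input gradient is simply $w$ everywhere, so the penalty can be evaluated without automatic differentiation.

```python
import numpy as np

def gradient_penalty(w, x_real, x_fake, lam=10.0):
    """Gradient penalty of Eq. 2 for a toy linear critic D(x) = w . x.

    Each interpolate x_hat is a convex combination of a real and a
    fake sample (a draw from P_i); for a linear critic, the gradient
    of D w.r.t. its input is w regardless of x_hat.
    """
    eps = np.random.rand(x_real.shape[0], 1)       # per-sample mixing weights
    x_hat = eps * x_real + (1.0 - eps) * x_fake    # samples from P_i
    grad = np.tile(w, (x_hat.shape[0], 1))         # dD/dx = w for linear D
    grad_norm = np.linalg.norm(grad, axis=1)       # ||grad D(x_hat)||_2
    return lam * np.mean((grad_norm - 1.0) ** 2)
```

For a critic with unit-norm weights (e.g., `w = [0.6, 0.8]`) the penalty is essentially zero, since the gradient norm already satisfies the 1-Lipschitz condition; in a real WGAN-GP, the gradient would instead be obtained via automatic differentiation.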
B. Contrastive Learning (CL)
The paradigm of contrastive learning (CL) has recently
demonstrated highly effective results across a diverse set
of disciplines and tasks, especially in computer vision [4],
[6], [24]. The goal of CL is to impose structure on latent
representations learned by a model (M). This is often achieved
using soft penalties (e.g., additional loss terms) that influence
2The terms ‘critic’ and ‘discriminator’ are used interchangeably in this paper.
representations generated by M to be structured such that repre-
sentations of related instances lie closer together than those of
instances known to be unrelated. Most CL losses are
set in a self-supervised context where relatedness is generated
via augmentations of an instance and two distinct instances
are considered to be unrelated.
Recently, [7] proposed supervised contrastive learning
(SupCon), which is an extension of the CL paradigm to super-
vised (classification) settings. A model trained with SupCon
on a labeled dataset learns latent representations grouped by
class labels while also forcing separation in representations
between instances belonging to different classes (i.e., low
intra-class separation and high inter-class separation of latent
representations).
Consider a dataset of instances $\mathcal{D} = \{(x_1, y_1), \ldots, (x_m, y_m)\}$
such that $x_i \in \mathbb{R}^{b \times l}$ and $y_i \in \mathcal{C}$ is the label of $x_i$, where $\mathcal{C}$ is
the set of class labels. Then, the supervised contrastive loss is
defined by Eq. 3.

$$L_{sup} = \sum_{x_i \in \mathcal{D}} \frac{-1}{|Pos(i)|} \sum_{x_k \in Pos(i)} \log \frac{\exp(z_i \cdot z_k / \tau)}{\sum_{x_j \in Q(i)} \exp(z_i \cdot z_j / \tau)} \qquad (3)$$

Here, $z_i \in \mathbb{R}^{h \times 1}$ is the latent representation of $x_i$ generated by
model M. $Pos(i) = \{x_k \in \mathcal{D} \mid y_k = y_i,\, k \neq i\}$ is the set of
instances that form the ‘positive set’ for $x_i$, and $Q(i) = \mathcal{D} \setminus \{x_i\}$.
$\tau \in \mathbb{R}^+$ is a temperature hyperparameter. We employ Eq. 3 for CL, but
with labels generated in a self-supervised manner.
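As a minimal, illustrative NumPy sketch (not the authors' implementation; the function name and toy inputs are hypothetical), the supervised contrastive loss of Eq. 3 can be written directly from its definition:

```python
import numpy as np

def supcon_loss(z, y, tau=0.1):
    """Supervised contrastive loss (Eq. 3), minimal NumPy sketch.

    z : (m, h) array of L2-normalized latent representations
    y : length-m sequence of integer class labels
    """
    m = z.shape[0]
    sim = z @ z.T / tau                         # pairwise z_i . z_j / tau
    loss = 0.0
    for i in range(m):
        pos = [k for k in range(m) if y[k] == y[i] and k != i]   # Pos(i)
        if not pos:
            continue                            # no positives for this anchor
        q = [j for j in range(m) if j != i]     # Q(i) = D \ {x_i}
        denom = np.sum(np.exp(sim[i, q]))
        loss += -1.0 / len(pos) * sum(
            np.log(np.exp(sim[i, k]) / denom) for k in pos)
    return loss
```

Representations that cluster by class label yield a lower loss than representations where classes are intermixed, which is the structure SupCon is designed to impose.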
C. Uncertainty Quantification (UQ)
Quantifying decision uncertainty is critical to the success
of real-world machine learning (ML) frameworks. It is of
special relevance in the current setting of anomaly detection
wherein the confidence of a model in its decision addition-
ally indicates the urgency of a potential alert issued by the
model. While traditional ML models yield point predictions,
Bayesian ML provides a framework for capturing model
uncertainty. One such UQ approach [25] can be viewed as
approximating Gaussian processes with neural network models.
This approach, termed Monte Carlo Dropout, entails
Monte Carlo sampling (during inference) from a trained model by
randomly masking a subset of the model's learned weights each
time (i.e., dropout [26]). This is akin to sampling from the
approximate posterior which leads to uncovering the model’s
predictive distribution. Inferring the predictive distribution
is one of the methods to quantify model uncertainty with
Bayesian neural networks.
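As an illustrative sketch (a hypothetical toy linear model, not the CAAD-EF network), Monte Carlo Dropout inference can be emulated by masking weights over repeated forward passes and reading off the predictive mean and standard deviation:

```python
import numpy as np

def mc_dropout_predict(W, x, p_drop=0.5, T=200, rng=None):
    """Monte Carlo Dropout for a toy linear model y = W x (NumPy sketch).

    Runs T stochastic forward passes, each with an independent random
    dropout mask on W (inverted-dropout rescaling keeps the expected
    output unchanged), and returns the mean prediction and its
    standard deviation, the latter serving as an uncertainty estimate.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    preds = []
    for _ in range(T):
        mask = rng.random(W.shape) >= p_drop            # keep w.p. 1 - p_drop
        preds.append((W * mask) @ x / (1.0 - p_drop))   # one stochastic pass
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)
```

The spread of the T sampled predictions approximates the model's predictive distribution; in an AD setting, a wide spread flags a low-confidence alert.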
IV. PROBLEM FORMULATION
We shall now describe the various facets of our novel
human-in-the-loop anomaly detection framework CAAD-EF .
Fig. 2 details the overall architecture of CAAD-EF .
A. Self-Supervised Anomaly Detection with Negative Transformations
The core of the proposed framework is the Contrastive
Adversarial Anomaly Detection (CAAD ) model. The struc-
ture of the CAAD model resembles a WGAN with gradient