INQMAD: INCREMENTAL QUANTUM MEASUREMENT
ANOMALY DETECTION
Joseph A. Gallego-Mejia , Oscar A. Bustos-Brinez , Fabio A. González
MindLab
Universidad Nacional de Colombia
Bogotá, Colombia
{jagallegom,oabustosb,fagonzalezo}@unal.edu.co
ABSTRACT
Streaming anomaly detection refers to the problem of detecting anomalous data samples in streams of data. This problem poses challenges that classical and deep anomaly detection methods are not designed to cope with, such as concept drift and continuous learning. State-of-the-art streaming anomaly detection methods rely on a fixed memory based on hash functions or nearest neighbors, which may not be able to constrain high-frequency values as a moving average would, or to remove outliers seamlessly, and they cannot be trained in an end-to-end deep learning architecture. We present a new incremental anomaly detection method that performs continuous density estimation based on random Fourier features and the mechanism of quantum measurements and density matrices, and that can be viewed as an exponential moving average density estimator. It can process potentially endless data, and its update complexity is constant, $O(1)$. A systematic evaluation against 12 state-of-the-art streaming anomaly detection algorithms on 12 streaming datasets is presented.
Keywords: incremental learning · anomaly detection · density matrix · random Fourier features · kernel density estimation · approximations of kernel density estimation · quantum machine learning
1 Introduction
Anomaly detection is a well-studied problem [9, 30, 35]. The main idea is to detect data points, or groups of data points, that deviate from 'normality' in a specific context (note that normality here is not related to the Gaussian normal distribution). This problem arises in several domains, such as network security [26], telecommunications [12], the retail industry [29], network traffic [40], financial transactions [2], and wired and wireless sensors [41]. In recent years, particular interest has been given to methods that can deal with problems where data is continuously generated as a stream rather than as a batch of data points. This behavior is natural in credit card fraud detection [32], network intrusion detection [26], camera surveillance [37], and Internet of Things (IoT) device problems [25], among others. Data in this streaming environment is rapidly generated, potentially infinite, has tremendous volume, and can exhibit concept drift.
Classical anomaly detection algorithms in batch environments are based on density estimation, such as kernel density estimation [20]; classification, such as one-class support vector machines [23]; and distance, such as the isolation forest [22]. Another recent family of solutions is based on deep neural networks, which have shown good properties in the anomaly detection task. They are mainly based on variational autoencoders [3], deep belief networks [19], one-class deep networks [8] and adversarial autoencoders [10]. These methods assume that all training data points are available in the training phase. However, in a streaming context the data arrives continuously. Some of these methods have poor adaptability and extensibility, or are unable to detect new anomalies continuously, because of their high model update cost and/or slow update speed.
To pave the way, several methods have been developed in the last decade. Methods such as iForestASD [11], RCF [16], xStream [24] and Ex. IF [17] present modifications of the Isolation Forest batch anomaly detection method. One of the problems of these methods is their inference complexity, which is related to the depth of the trees, typically $O(\log n)$ where $n$ is the number of data points. To address this, HS-Tree [38], RS-Hash [36] and MStream [5] use a hash structure to avoid traversing each tree. Other state-of-the-art methods use a modification of the well-studied local outlier factor (LOF) [7]. Methods such as LODA [31], DILOF [28] and MemStream [6] use a modification of the $k$-nearest neighbor algorithm, the basis of LOF, to score the outlierness of data points. However, some of them are based on a memory that stores $m$ data points and, viewed as a moving average, may not be able to detect high-frequency points or possible outliers.
In this paper we present a novel method, incremental quantum measurement anomaly detection (InQMAD). This method uses adaptive Fourier features to map the original space to a Hilbert space. In this space, a density matrix is used to capture the density of the dataset. A new point passes through each stage and produces a score that is used as the anomaly score in the final stage. An important feature of the method is that it builds a density estimation model that is continuously updated with the incoming data and gives more importance to recent samples, similarly to an exponential moving average. The method works in a streaming setting, is unsupervised, and can be retrained continuously. The proposed method requires constant memory to process new data points, processes data in a single pass, and can handle potentially endless streaming data or even massive datasets. It can update the model in constant time, with complexity $O(1)$.
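The two building blocks described above can be illustrated with the following minimal NumPy sketch (an illustration under stated assumptions, not the authors' released implementation): a random Fourier feature map into a Hilbert space and a density matrix updated as an exponential moving average, with the expectation value used as a density score. The parameter names (n_rff, gamma, alpha), the RBF-kernel choice and the maximally mixed initialization are assumptions of this sketch.

import numpy as np

class StreamingDensityMatrix:
    """Sketch of an RFF + density-matrix streaming density estimator."""

    def __init__(self, d, n_rff=256, gamma=1.0, alpha=0.05, seed=0):
        rng = np.random.default_rng(seed)
        # Random Fourier features approximating a Gaussian (RBF) kernel.
        self.W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_rff))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_rff)
        self.n_rff = n_rff
        self.alpha = alpha                      # forgetting rate (assumed name)
        self.rho = np.eye(n_rff) / n_rff        # density matrix, maximally mixed start

    def _phi(self, x):
        # Feature map into the RFF space, normalized to unit norm.
        z = np.sqrt(2.0 / self.n_rff) * np.cos(x @ self.W + self.b)
        return z / np.linalg.norm(z)

    def score(self, x):
        # Expectation value <phi(x)| rho |phi(x)>; low values suggest anomalies.
        phi = self._phi(x)
        return float(phi @ self.rho @ phi)

    def update(self, x):
        # Exponential-moving-average update: constant cost per point,
        # independent of how many points have been seen so far.
        phi = self._phi(x)
        self.rho = (1.0 - self.alpha) * self.rho + self.alpha * np.outer(phi, phi)

In a single pass over the stream, each point is scored before the density matrix is updated, so the model adapts to drift while the contribution of older observations decays geometrically.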
In summary, the contributions of the present work are:
A novel streaming anomaly detection method: the new method works in a streaming, potentially infinite environment with possible concept drift.
A systematic evaluation of the proposed method: the algorithm is evaluated on 12 streaming datasets and compared against 12 state-of-the-art streaming anomaly detection methods.
An ablation study analyzing the new method: a systematic evaluation of each component of the method is performed.
Reproducibility: the code used in this paper is released as open source and can be found at https://github.com/Joaggi/Incremental-Anomaly-Detection-using-Quantum-Measurements/ and on Zenodo at https://doi.org/10.5281/zenodo.7183564.
The outline of the paper is as follows: in Section 2, we describe streaming anomaly detection and present the benchmark methods against which we compare our algorithm. In Section 3, we present the new method, explaining all the stages of the algorithm. In Section 4, we systematically evaluate the proposed method against state-of-the-art streaming anomaly detection algorithms. In Section 5, we state conclusions and outline future lines of research.
2 Background and Related Work
2.1 Streaming Anomaly Detection
An anomaly can be broadly defined as an observation or data that deviates significantly from some kind of normality.
Several types of anomalies can occur in real datasets, such as point anomalies, group anomalies, contextual anomalies,
low-level texture anomalies, and high-level semantic anomalies. Classical methods have been used to solve this problem,
but they struggle with high-dimensional datasets [19]. Recent deep learning methods have attracted a lot of attention for their good properties, such as automatic feature extraction; however, their training and streamwise inference times are prohibitively long [6].
Stream anomaly detection can be viewed as a generalization of the typical anomaly detection problem where data grows
infinitely. Therefore, it is impractical, impossible or unnecessary to store every data point that arrives as a stream. In this
context, the method has to distinguish between normal and anomalous data points, where concept drift can occur and anomalies are scarce compared to normal data [11]. Several types of concept drift can arise in streaming datasets, such as sudden, gradual, incremental or recurrent drift. Concept drift occurs in data streams where old data is usually less important than new data: the data evolves, and recent behavior should carry greater weight than older patterns [31]. To cope with this, a method must use constant memory and nearly constant inference time, and therefore it must process the data in a single pass. In Subsection 2.3, 12 methods for streaming anomaly detection are presented.
2.2 Streaming Anomaly Detection Problem
In streaming anomaly detection, the data points $X = \{x_1, \cdots, x_t\}$ arrive as a sequence of $d$-dimensional feature vectors. This sequence can be, for example, the sequence of transactions for a given credit card or the temperature readings of an IoT sensor. The challenge in this setting is that the concept of "normality" evolves over time, meaning that a point that was once considered normal behavior can drift and become anomalous behavior. To solve this problem, there is a need to develop algorithms that can learn on-line with high speed and low memory consumption.
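These constraints (a single pass, bounded memory, on-line adaptation) can be made concrete with the following small Python sketch; the interface names (score, update, run_stream) and the thresholding convention are assumptions of this illustration, not notation from the paper.

from typing import Iterable, Iterator, Protocol
import numpy as np

class StreamingDetector(Protocol):
    # Minimal interface implied by the setting above: one score and one
    # constant-time update per arriving point.
    def score(self, x: np.ndarray) -> float: ...
    def update(self, x: np.ndarray) -> None: ...

def run_stream(detector: StreamingDetector,
               stream: Iterable[np.ndarray],
               threshold: float) -> Iterator[bool]:
    # Single pass: flag each point with the current model, then adapt to it.
    for x in stream:
        # Assumes the detector returns an anomaly score (larger = more anomalous).
        yield detector.score(x) > threshold
        detector.update(x)

The key design point is that the detector never revisits past points: everything it needs to remember must fit in its internal, constant-size state.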
2.3 Streaming Anomaly Detection Baseline Methods
STORM [4]: The method comprises a Stream Manager and a Query Manager. The former is an indexed stream buffer whose job is to store the number of successive neighbors and the identifiers of the most recent preceding neighbors. The latter is a procedure that efficiently answers queries about whether a certain data point is an inlier or an outlier.
HS-Tree [38]: This method constructs complete binary trees where each tree has at most $2^{h+1} - 1$ nodes. Each subtree is constructed by randomly selecting a feature and splitting its range in half. A random perturbation of each subtree is performed to create diverse subtrees. Each point has to traverse each binary tree to capture the mass profile. Finally, a scoring function is computed from the mass of the nodes that a query data point passes through.
iForestASD [11]: This algorithm has its roots in the Isolation Forest method. It collects a window of the streaming data and sends it to an Isolation Forest. An abnormality score is calculated from the average depth of the point over the trees of the forest. If the point is normal, it is added to the Isolation Forest.
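For context (this formula comes from the original Isolation Forest work and is not restated by the authors here), the depth-based score that this family of methods builds on is $s(x, n) = 2^{-E[h(x)]/c(n)}$, where $h(x)$ is the depth at which $x$ is isolated in a tree, $E[h(x)]$ is its average over the forest, $n$ is the subsample size, and $c(n) = 2H(n-1) - \frac{2(n-1)}{n}$ (with $H(i)$ the $i$-th harmonic number) is the expected path length used for normalization; scores close to 1 indicate likely anomalies.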
RS-Hash [36]: This method also has the Isolation Forest method at its roots. A tree is constructed as a sequence of randomly selected features. Anomalous points are those that are easily separated from the normal data points. With the scoring function given by the Isolation Forest, an anomaly score over sliding windows is computed to detect concept drift.
RCF [16]: The algorithm constructs a robust randomized cut tree (rrct) using a random selection of a dimension weighted by its range. A uniform distribution is used to cut the selected dimension. This process is repeated $n$ times, with $n$ being the maximum depth. A forest is constructed from several rrcts. A new point is classified as an anomaly by comparing the complexity of its insertion and deletion.
LODA [31]: The algorithm uses a set of histograms, one per projection dimension. Each histogram is built in a projection space defined by $w$ parameters that capture the importance of each feature. The log-likelihood in the projection space is then calculated. An anomalous data point is expected to have a lower log-likelihood.
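As an illustration of the projection-histogram idea behind LODA, the following simplified batch sketch (based on the description above, not the original on-line implementation; the sparsity rule and parameter names are assumptions) projects the data onto sparse random directions, builds a one-dimensional histogram per projection, and scores a point by its negative mean log-likelihood.

import numpy as np

class LodaSketch:
    """Simplified LODA-style scorer: sparse projections + 1-D histograms."""

    def __init__(self, d, k=20, bins=30, seed=0):
        rng = np.random.default_rng(seed)
        # Sparse random projection vectors, about sqrt(d) non-zeros each.
        self.W = rng.normal(size=(k, d)) * (rng.random((k, d)) < 1.0 / np.sqrt(d))
        self.k, self.bins = k, bins
        self.hists, self.edges = [], []

    def fit(self, X):
        Z = X @ self.W.T                           # (n, k) projected data
        for j in range(self.k):
            h, e = np.histogram(Z[:, j], bins=self.bins, density=True)
            self.hists.append(h + 1e-12)           # smoothing to avoid log(0)
            self.edges.append(e)
        return self

    def score(self, x):
        z = self.W @ x
        idx = [np.clip(np.searchsorted(self.edges[j], z[j]) - 1, 0, self.bins - 1)
               for j in range(self.k)]
        # Negative mean log-likelihood across projections: higher = more anomalous.
        return -np.mean([np.log(self.hists[j][idx[j]]) for j in range(self.k)])

In the streaming variant the histograms are maintained on-line, but the scoring rule is the same.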
Kitsune [26]: Kitsune is an algorithm that uses an ensemble of autoencoders to provide an anomaly score. Features are sent to $l$ autoencoders of three layers each. The reconstruction error of each, computed as the root mean square error (RMSE), is sent to a final three-layer autoencoder. Finally, the RMSE of this last reconstruction is calculated and used as the anomaly score.
DILOF [28]: The method uses $k$-nearest-neighbor information as in the Local Outlier Factor (LOF) but adapts it to the case of streaming data. The algorithm has two phases: a detection phase and a summarization phase. In the first phase it decides whether a point is anomalous or not. In the second, it uses an approximation algorithm with a sampling strategy to update the memory of the $k$ nearest neighbors.
xStream [24]: The method uses a hash structure to reduce the dimensionality of the data points, which allows features to evolve in the stream. After the reduction, the method partitions the space into so-called half-space chains. These partitions capture the density estimate at different granularities. The outlier score is calculated from this density estimate. For streaming, the previous window is used to score outliers.
MStream [5]: This algorithm uses two locality-sensitive hash functions: feature hashing and record hashing.
The former computes a hash for each feature in the data point. The latter computes a hash for all features
simultaneously. The anomaly score is calculated using a chi-square density function. The algorithm is
combined with an autoencoder to reduce the dimensionality of the data.
Ex. IF [17]: This algorithm cuts the space with hyperplanes that have a random slope and a random intercept. This differs from the Isolation Forest, which chooses random features and generates a random cut on each chosen feature. The method shows better results than the Isolation Forest for anomaly detection.
MemStream [6]: The method proposes a nearest-neighbor, memory-based algorithm. Each data point passes through a shallow autoencoder to reduce the dimensionality of the original space. A memory of the encoded representations is maintained and used to score new points by their distance to the stored elements.