INQMAD: INCREMENTAL QUANTUM MEASUREMENT
ANOMALY DETECTION
Joseph A. Gallego-Mejia , Oscar A. Bustos-Brinez , Fabio A. González
MindLab
Universidad Nacional de Colombia
Bogotá, Colombia
{jagallegom,oabustosb,fagonzalezo}@unal.edu.co
ABSTRACT
Streaming anomaly detection refers to the problem of detecting anomalous data samples in streams of data. This problem poses challenges that classical and deep anomaly detection methods are not designed to cope with, such as concept drift and continuous learning. State-of-the-art streaming anomaly detection methods rely on a fixed memory based on hash functions or nearest neighbors, which may not be able to constrain high-frequency values as a moving average would, or to remove outliers seamlessly, and they cannot be trained in an end-to-end deep learning architecture. We present a new incremental anomaly detection method that performs continuous density estimation based on random Fourier features and the mechanism of quantum measurements and density matrices, and that can be viewed as an exponential moving average density estimator. It can process potentially endless data, and its update complexity is constant, $O(1)$. A systematic evaluation against 12 state-of-the-art streaming anomaly detection algorithms on 12 streaming datasets is presented.
Keywords: incremental learning · anomaly detection · density matrix · random Fourier features · kernel density estimation · approximations of kernel density estimation · quantum machine learning
1 Introduction
Anomaly detection is a well-studied problem [9, 30, 35]. The main idea is to detect data points, or groups of data points, that deviate from 'normality' in a specific context (note that normality here is not related to the Gaussian normal distribution). This problem arises in several domains, such as network security [26], telecommunications [12], the retail industry [29], network traffic [40], financial transactions [2], and wired and wireless sensors [41]. In recent years, particular interest has been given to methods that can deal with problems where data is continuously generated as a stream rather than as a batch of data points. This behavior is natural in credit card fraud detection [32], network intrusion detection [26], camera surveillance [37], and Internet of Things (IoT) device problems [25], among others. Data in this streaming environment is rapidly generated, potentially infinite, has tremendous volume, and can exhibit concept drift.
Classical anomaly detection algorithms in batch environments are based on density estimation, such as kernel density estimation [20]; classification, such as one-class support vector machines [23]; and distance, such as the isolation forest [22]. Another recent family of solutions is based on deep neural networks, which have shown good properties in the anomaly detection task. They are mainly based on variational autoencoders [3], deep belief networks [19], one-class deep networks [8] and adversarial autoencoders [10]. These methods assume that all training data points are available in the training phase. However, in a streaming context the data arrives continuously. Some of these methods have poor adaptability and extensibility, or are unable to detect new anomalies continuously, because of their high model update cost and/or slow update speed.
To pave the way, several methods have been developed in the last decade. Methods such as iForestASD [11], RCF [16], xStream [24] and Ex. IF [17] present modifications of the Isolation Forest batch anomaly detection method. One of the problems of these methods is their inference complexity, which is related to the depth of the trees, typically $O(\log n)$ where $n$ is the number of data points. To address this, HS-Tree [38], RS-Hash [36] and MStream [5] use a hash structure to avoid traversing each tree. Other state-of-the-art methods use a modification of the well-studied local outlier factor (LOF) [7]. Methods such as LODA [31], DILOF [28] and MemStream [6] use a modification of the $k$-nearest neighbor algorithm, the basis of LOF, to score the outlierness of data points. However, some of them are based on a memory that stores $m$ data points and, viewed as a moving average, may not be able to detect high-frequency points or possible outliers.
In this paper we present a novel method, incremental quantum measurement anomaly detection (InQMAD). This method uses adaptive Fourier features to map the original space to a Hilbert space. In this space, a density matrix is used to capture the density of the dataset. A new point passes through each stage and produces a score that is used as the anomaly score in the final stage. An important feature of the method is that it builds a density estimation model that is continuously updated with the incoming data and gives more importance to recent samples, similarly to an exponential moving average. The method works in a streaming setting, is unsupervised, and can be retrained continuously. The proposed method requires constant memory to process new data points, processes data in a single pass, and can handle potentially endless streaming data or even massive datasets. It can update the model in constant time, with complexity $O(1)$.
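The two building blocks described above can be illustrated with the following minimal NumPy sketch (an illustration under stated assumptions, not the authors' released implementation): a random Fourier feature map into a Hilbert space and a density matrix updated as an exponential moving average, with the expectation value used as a density score. The parameter names (n_rff, gamma, alpha), the RBF-kernel choice and the maximally mixed initialization are assumptions of this sketch.

import numpy as np

class StreamingDensityMatrix:
    """Sketch of an RFF + density-matrix streaming density estimator."""

    def __init__(self, d, n_rff=256, gamma=1.0, alpha=0.05, seed=0):
        rng = np.random.default_rng(seed)
        # Random Fourier features approximating a Gaussian (RBF) kernel.
        self.W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_rff))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_rff)
        self.n_rff = n_rff
        self.alpha = alpha                      # forgetting rate (assumed name)
        self.rho = np.eye(n_rff) / n_rff        # density matrix, maximally mixed start

    def _phi(self, x):
        # Feature map into the RFF space, normalized to unit norm.
        z = np.sqrt(2.0 / self.n_rff) * np.cos(x @ self.W + self.b)
        return z / np.linalg.norm(z)

    def score(self, x):
        # Expectation value <phi(x)| rho |phi(x)>; low values suggest anomalies.
        phi = self._phi(x)
        return float(phi @ self.rho @ phi)

    def update(self, x):
        # Exponential-moving-average update: constant cost per point,
        # independent of how many points have been seen so far.
        phi = self._phi(x)
        self.rho = (1.0 - self.alpha) * self.rho + self.alpha * np.outer(phi, phi)

In a single pass over the stream, each point is scored before the density matrix is updated, so the model adapts to drift while the contribution of older observations decays geometrically.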
In summary, the contributions of the present work are:
A novel streaming anomaly detection method: the new method works in a streaming, potentially infinite environment with possible concept drift.
A systematic evaluation of the proposed method: the algorithm is evaluated on 12 streaming datasets and compared against 12 state-of-the-art streaming anomaly detection methods.
An ablation study analyzing the new method: a systematic evaluation of each component of the method is performed.
Reproducibility: the code used in this paper is released as open source and can be found at https://github.com/Joaggi/Incremental-Anomaly-Detection-using-Quantum-Measurements/ and on Zenodo at https://doi.org/10.5281/zenodo.7183564.
The outline of the paper is as follows: in Section 2, we describe streaming anomaly detection and present the benchmark methods against which we compare our algorithm. In Section 3, we present the new method, explaining all the stages of the algorithm. In Section 4, we systematically evaluate the proposed method against state-of-the-art streaming anomaly detection algorithms. In Section 5, we state conclusions and outline future lines of research.
2 Background and Related Work
2.1 Streaming Anomaly Detection
An anomaly can be broadly defined as an observation or data that deviates significantly from some kind of normality.
Several types of anomalies can occur in real datasets, such as point anomalies, group anomalies, contextual anomalies,
low-level texture anomalies, and high-level semantic anomalies. Classical methods have been used to solve this problem,
but they struggle with high-dimensional datasets [19]. Recent deep learning methods have attracted a lot of attention for their good properties, such as automatic feature extraction; however, their training and streamwise inference times are prohibitively long [6].
Stream anomaly detection can be viewed as a generalization of the typical anomaly detection problem where data grows
infinitely. Therefore, it is impractical, impossible or unnecessary to store every data point that arrives as a stream. In this
context, the method has to distinguish between normal and anomalous data points, where concept drift can occur and anomalies are scarce compared to normal data [11]. Several types of concept drift can arise in streaming datasets, such as sudden, gradual, incremental or recurrent drift. Concept drift occurs in data streams where old data is usually less important than new data: the data evolves, and recent behavior should carry greater weight than older patterns [31]. To cope with this, a method must use constant memory and nearly constant inference time, and therefore it must process the data in a single pass. In Subsection 2.3, 12 methods for streaming anomaly detection are presented.
2.2 Streaming Anomaly Detection Problem
In streaming anomaly detection, the data points $X = \{x_1, \cdots, x_t\}$ arrive as a sequence of $d$-dimensional feature vectors. This sequence can be, for example, the sequence of transactions for a given credit card or the temperature readings of an IoT sensor. The challenge in this setting is that the concept of "normality" evolves over time, meaning that a point that was once considered normal behavior can drift and become anomalous behavior. To solve this problem, there is a need to develop algorithms that can learn on-line with high speed and low memory consumption.
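These constraints (a single pass, bounded memory, on-line adaptation) can be made concrete with the following small Python sketch; the interface names (score, update, run_stream) and the thresholding convention are assumptions of this illustration, not notation from the paper.

from typing import Iterable, Iterator, Protocol
import numpy as np

class StreamingDetector(Protocol):
    # Minimal interface implied by the setting above: one score and one
    # constant-time update per arriving point.
    def score(self, x: np.ndarray) -> float: ...
    def update(self, x: np.ndarray) -> None: ...

def run_stream(detector: StreamingDetector,
               stream: Iterable[np.ndarray],
               threshold: float) -> Iterator[bool]:
    # Single pass: flag each point with the current model, then adapt to it.
    for x in stream:
        # Assumes the detector returns an anomaly score (larger = more anomalous).
        yield detector.score(x) > threshold
        detector.update(x)

The key design point is that the detector never revisits past points: everything it needs to remember must fit in its internal, constant-size state.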
2.3 Streaming Anomaly Detection Baseline Methods
STORM [4]: The method comprises a Stream Manager and a Query Manager. The former is an indexed stream buffer whose job is to store the number of successive neighbors and the identifiers of the most recent preceding neighbors. The latter is a procedure that efficiently answers queries about whether a certain data point is an inlier or an outlier.
HS-Tree [38]: This method constructs complete binary trees where each tree has at most $2^{h+1} - 1$ nodes. Each subtree is constructed by randomly selecting a feature and splitting its range in half. A random perturbation of each subtree is performed to create diverse subtrees. Each point has to traverse each binary tree to capture the mass profile. Finally, a scoring function is computed from the mass of the nodes that a query data point passes through.
iForestASD [11]: This algorithm has its roots in the Isolation Forest method. It collects a window of the streaming data and sends it to an Isolation Forest. An abnormality score is calculated from the average depth of the point over the trees of the forest. If the point is normal, it is added to the Isolation Forest.
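For context (this formula comes from the original Isolation Forest work and is not restated by the authors here), the depth-based score that this family of methods builds on is $s(x, n) = 2^{-E[h(x)]/c(n)}$, where $h(x)$ is the depth at which $x$ is isolated in a tree, $E[h(x)]$ is its average over the forest, $n$ is the subsample size, and $c(n) = 2H(n-1) - \frac{2(n-1)}{n}$ (with $H(i)$ the $i$-th harmonic number) is the expected path length used for normalization; scores close to 1 indicate likely anomalies.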
RS-Hash [36]: This method also has the Isolation Forest method at its roots. A tree is constructed as a sequence of randomly selected features. Anomalous points are those that are easily separated from the normal data points. With the scoring function given by the Isolation Forest, an anomaly score over sliding windows is computed to detect concept drift.
RCF [16]: The algorithm constructs a robust randomized cut tree (rrct) using a random selection of a dimension weighted by its range. A uniform distribution is used to cut the selected dimension. This process is repeated $n$ times, with $n$ being the maximum depth. A forest is constructed from several rrcts. A new point is classified as an anomaly by comparing the complexity of its insertion and deletion.
LODA [31]: The algorithm uses a set of histograms, one per projection dimension. Each histogram is built in a projection space defined by $w$ parameters that capture the importance of each feature. The log-likelihood in the projection space is then calculated. An anomalous data point is expected to have a lower log-likelihood.
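As an illustration of the projection-histogram idea behind LODA, the following simplified batch sketch (based on the description above, not the original on-line implementation; the sparsity rule and parameter names are assumptions) projects the data onto sparse random directions, builds a one-dimensional histogram per projection, and scores a point by its negative mean log-likelihood.

import numpy as np

class LodaSketch:
    """Simplified LODA-style scorer: sparse projections + 1-D histograms."""

    def __init__(self, d, k=20, bins=30, seed=0):
        rng = np.random.default_rng(seed)
        # Sparse random projection vectors, about sqrt(d) non-zeros each.
        self.W = rng.normal(size=(k, d)) * (rng.random((k, d)) < 1.0 / np.sqrt(d))
        self.k, self.bins = k, bins
        self.hists, self.edges = [], []

    def fit(self, X):
        Z = X @ self.W.T                           # (n, k) projected data
        for j in range(self.k):
            h, e = np.histogram(Z[:, j], bins=self.bins, density=True)
            self.hists.append(h + 1e-12)           # smoothing to avoid log(0)
            self.edges.append(e)
        return self

    def score(self, x):
        z = self.W @ x
        idx = [np.clip(np.searchsorted(self.edges[j], z[j]) - 1, 0, self.bins - 1)
               for j in range(self.k)]
        # Negative mean log-likelihood across projections: higher = more anomalous.
        return -np.mean([np.log(self.hists[j][idx[j]]) for j in range(self.k)])

In the streaming variant the histograms are maintained on-line, but the scoring rule is the same.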
Kitsune [26]: Kitsune is an algorithm that uses an ensemble of autoencoders to provide an anomaly score. Features are sent to $l$ autoencoders of three layers each. The reconstruction error of each, computed as the root mean square error (RMSE), is sent to a final three-layer autoencoder. Finally, the RMSE of this last reconstruction is calculated and used as the anomaly score.
DILOF [28]: The method uses $k$-nearest-neighbor information as in the Local Outlier Factor (LOF) but adapts it to the case of streaming data. The algorithm has two phases: a detection phase and a summarization phase. In the first phase it decides whether a point is anomalous or not. In the second, it uses an approximation algorithm with a sampling strategy to update the memory of the $k$ nearest neighbors.
xStream [24]: The method uses a hash structure to reduce the dimensionality of the data points, which allows features to evolve in the stream. After the reduction, the method partitions the space into so-called half-space chains. These partitions capture the density estimate at different granularities. The outlier score is calculated from this density estimate. For streaming, the previous window is used to score outliers.
MStream [5]: This algorithm uses two locality-sensitive hash functions: feature hashing and record hashing.
The former computes a hash for each feature in the data point. The latter computes a hash for all features
simultaneously. The anomaly score is calculated using a chi-square density function. The algorithm is
combined with an autoencoder to reduce the dimensionality of the data.
Ex. IF [17]: This algorithm cuts the space with hyperplanes that have a random slope and a random intercept. This differs from the Isolation Forest, which chooses random features and generates a random cut on each chosen feature. The method shows better results than the Isolation Forest for anomaly detection.
MemStream [6]: The method proposes a nearest-neighbor, memory-based algorithm. Each data point passes through a shallow autoencoder to reduce the dimensionality of the original space. A memory of the encoded representations is maintained and used to score new points by their distance to the stored elements.