InQMAD: Incremental Quantum Measurement Anomaly Detection
Several methods have been developed over the last decade to address this problem. Methods such as iForestASD [11], RCF [16], xStream [24] and Ex. IF [17] modify the batch Isolation Forest anomaly detection method. One drawback of these methods is their inference complexity, which is tied to the depth of the trees, typically log(n), where n is the number of data points. To address this, HS-Tree [38], RS-Hash [36] and
MStream [5] use a hash structure to avoid traversing each tree. Other state-of-the-art methods use a modification of
the well-studied local outlier factor (LOF) [7]. Methods such as LODA [31], DILOF [28] and MemStream [5] use a
modification of the k-nearest neighbor algorithm, the basis of LOF, to score the outlierness of data points. However, some of them rely on a memory that stores the last m data points; viewed as a moving average, such a memory may fail to detect high-frequency points or possible outliers.
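To illustrate the moving-average behavior of such a fixed-size memory, the following sketch (a generic illustration with hypothetical parameters, not any of the cited methods) scores a point by its mean distance to its k nearest neighbors in a FIFO memory of m recent points; old points are evicted, so the score tracks only the most recent window of the stream:

```python
import numpy as np
from collections import deque

def knn_outlier_score(x, memory, k=5):
    """Score a point by its mean distance to the k nearest stored points."""
    dists = np.linalg.norm(np.asarray(memory) - x, axis=1)
    return float(np.sort(dists)[:k].mean())

# Fixed-size memory of m points, updated FIFO: appending past the maxlen
# evicts the oldest point, so only the most recent window is kept.
rng = np.random.default_rng(0)
memory = deque(maxlen=100)          # m = 100 stored points (illustrative)
for point in rng.normal(0.0, 1.0, size=(200, 2)):
    memory.append(point)

inlier = knn_outlier_score(np.zeros(2), memory)       # near the bulk of the data
outlier = knn_outlier_score(np.full(2, 8.0), memory)  # far from the bulk
```

A point resembling the recent window gets a low score, while a distant point gets a high one; anomalies that resemble the *recent* past, however, would be scored as normal, which is the limitation noted above.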
In this paper we present incremental quantum measurement anomaly detection (InQMAD), a novel method that uses adaptive Fourier features to map the original space to a Hilbert space, in which a density matrix captures the density of the dataset. A new point passes through each stage of the pipeline, and the final stage produces its anomaly score. An important feature of the method is that it builds a density estimation model that is continuously updated with incoming data and, similarly to an exponential moving average, gives more importance to recent samples. The method operates on streams, is unsupervised, and can be retrained continuously. It requires constant memory, processes data in a single pass, and can handle potentially endless streams or even massive datasets. Model updates take constant, O(1), time.
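This pipeline can be sketched as follows. This is a minimal illustration only: it uses plain random Fourier features as a stand-in for the paper's adaptive Fourier features, and the dimensions, bandwidth, and update rate are illustrative choices, not the authors' settings:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 2, 64      # input dimension, feature dimension (illustrative)
gamma = 0.5       # RBF-kernel bandwidth (illustrative)

# Random Fourier features approximating an RBF kernel; the paper uses
# *adaptive* Fourier features, which are trained rather than sampled.
W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(x):
    z = np.sqrt(2.0 / D) * np.cos(W @ x + b)
    return z / np.linalg.norm(z)     # unit-norm feature vector

class DensityMatrixDetector:
    def __init__(self, D, alpha=0.05):
        self.rho = np.eye(D) / D     # density matrix, uniform start
        self.alpha = alpha           # EMA rate: recent samples weigh more

    def update_and_score(self, x):
        z = phi(x)
        # Quadratic form z^T rho z estimates density; low density = anomalous.
        score = 1.0 - float(z @ self.rho @ z)
        # Exponential-moving-average update; O(D^2) per point, constant in n.
        self.rho = (1.0 - self.alpha) * self.rho + self.alpha * np.outer(z, z)
        return score

det = DensityMatrixDetector(D)
for x in rng.normal(0.0, 1.0, size=(500, d)):   # stream of normal points
    det.update_and_score(x)

normal_score = det.update_and_score(np.zeros(d))
anomaly_score = det.update_and_score(np.full(d, 6.0))
```

Because the density matrix has trace one and the features are unit-norm, the score stays in [0, 1], and the EMA update discounts old data exponentially, matching the constant-memory, single-pass behavior described above.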
In summary, the contributions of the present work are:
• A novel streaming anomaly detection method: the new method works in streaming environments with potentially infinite data and possible concept drift.
• A systematic evaluation of the proposed method: the algorithm is evaluated on 12 streaming datasets and compared against 12 state-of-the-art streaming anomaly detection methods.
• An ablation study of the new method: each component of the method is systematically evaluated.
Reproducibility: the code used in this paper is released as open source and can be found at
https://github.com/Joaggi/Incremental-Anomaly-Detection-using-Quantum-Measurements/ and on Zenodo at
https://doi.org/10.5281/zenodo.7183564
The outline of the paper is as follows: in Section 2, we describe anomaly detection in streaming and present the
benchmark methods with which we will compare our algorithm. In Section 3, we present the new method, explaining
all the stages of the algorithm. In Section 4, we systematically evaluate the proposed method against state-of-the-art
streaming anomaly detection algorithms. In Section 5, we state conclusions and outline future lines of research.
2 Background and Related Work
2.1 Streaming Anomaly Detection
An anomaly can be broadly defined as an observation or data that deviates significantly from some kind of normality.
Several types of anomalies can occur in real datasets, such as point anomalies, group anomalies, contextual anomalies,
low-level texture anomalies, and high-level semantic anomalies. Classical methods have been applied to this problem, but they struggle with high-dimensional datasets [19]. Recent deep learning methods have attracted attention for desirable properties such as automatic feature extraction; however, their training time and per-sample streaming inference are prohibitively long [6].
Stream anomaly detection can be viewed as a generalization of the typical anomaly detection problem where data grows
infinitely. Therefore, it is impractical, impossible or unnecessary to store every data point that arrives as a stream. In this
context, the method has to distinguish between normal and anomalous data points, where concept drift can occur and
the number of anomalies is scarce compared to normal data [11]. Several types of concept drift can arise in streaming
datasets, such as sudden, gradual, incremental or recurrent drift. In data streams, old data is usually less important than new data: the data evolves, and recent behavior should carry more weight than older patterns [31]. To cope with this setting, a method must use constant memory and nearly constant inference time, which implies processing the data in a single pass. In the following subsection 2.3, 12 methods for streaming anomaly detection are presented.
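To make these requirements concrete, the following generic sketch (not one of the benchmarked methods; the decay rate is a hypothetical choice) keeps only an exponentially weighted mean and variance, giving constant memory, single-pass processing, and O(1) updates that weight recent data more heavily:

```python
import numpy as np

class StreamingZScore:
    """Single-pass 1-D detector: exponentially weighted mean/variance, O(1) memory."""
    def __init__(self, alpha=0.02):
        self.alpha = alpha   # decay rate: larger = faster forgetting (illustrative)
        self.mean = 0.0
        self.var = 1.0

    def score_and_update(self, x):
        # Score the point against the current model before updating it.
        z = abs(x - self.mean) / np.sqrt(self.var)
        # Exponentially weighted updates: old data decays, recent data dominates.
        self.mean = (1.0 - self.alpha) * self.mean + self.alpha * x
        self.var = (1.0 - self.alpha) * self.var + self.alpha * (x - self.mean) ** 2
        return z

rng = np.random.default_rng(0)
det = StreamingZScore()
for x in rng.normal(5.0, 1.0, size=2000):   # stream centered at 5
    det.score_and_update(x)

normal_score = det.score_and_update(5.0)    # typical point: low score
anomaly_score = det.score_and_update(25.0)  # far from recent behavior: high score
```

Because the state is just two scalars, the detector naturally tracks gradual concept drift: if the stream's mean shifts, the model follows it at a rate set by alpha.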