AD-DMKDE: Anomaly Detection through Density Matrices and Fourier Features
process called “Adaptive Fourier Features”), and is able to calculate the best threshold using percentiles from a validation
data set.
The outline of the paper is as follows: in Section 2, we describe anomaly detection and density estimation in more
depth, and present the benchmark methods to which we will compare our algorithm. In Section 3, we present the novel
method, explaining the stages of the algorithm and how it uses Fourier features and density matrices. In Section 4, we
systematically compare the proposed algorithm with state-of-the-art anomaly detection algorithms. In Section 5, we
state the conclusions of this work and sketch future research directions.
2 Background and Related Work
2.1 Anomaly Detection
The main mechanism of anomaly detection algorithms is the construction of a model that determines a degree of
“normality” for the data points, and then detects anomalies as points that deviate from this model. The importance
of anomaly detection lies in the fact that recognizing and understanding anomalous behavior allows one to make
decisions and predictions. For example, anomalous traffic patterns on a network could mean that sensitive data is
being transmitted across it, or that there is an intruder attempting to attack a system [3]. Anomalous user behavior
in payment transactions could signify credit card fraud and give some clues as to how to avoid it in the future [23].
Anomaly detection should be distinguished from outlier detection, the purpose of which is data cleaning: to locate
outliers (mainly normal data affected by noise) and subsequently remove them from the data set.
Let r be the ratio of anomaly samples with respect to all samples. When r is high, the most common approach to
anomaly detection is to use a supervised classification method. However, when r is low (which is usually the case), the
best approach is to use semi-supervised or unsupervised AD algorithms. According to [16], classical AD algorithms can
be classified into six main types: classification-based approaches (such as the well-known one-class SVM), probabilistic
models (such as histograms and density estimation methods like KDE [8]), clustering models, information-based
methods, spectral analysis (where dimensionality reduction methods such as PCA are found), and distance-based
methods (including isolation forest and nearest neighbors).
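As a concrete instance of the probabilistic family, density-estimation-based detection can be sketched as follows: a density estimator is fitted to the (mostly normal) training data, and points whose estimated density falls below a percentile determined by the anomaly ratio are flagged. The snippet below is only an illustration using scikit-learn's KernelDensity with arbitrary data and bandwidth, not the estimator proposed in this paper:

import numpy as np
from sklearn.neighbors import KernelDensity

r = 0.05                                   # assumed anomaly ratio
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))        # hypothetical (mostly normal) data
X_test = rng.normal(size=(100, 2))

kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X_train)
log_density = kde.score_samples(X_test)    # log-density acts as a "normality" score
tau = np.percentile(kde.score_samples(X_train), 100.0 * r)
is_anomaly = log_density < tau             # low-density points are flagged as anomalies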
In recent years, these classical models have been progressively replaced by deep learning-based algorithms, showing
high performance in anomaly detection tasks [25]. These algorithms can efficiently model the inherent features of
complex data through their distributed and hierarchical architectures, i.e., they can perform implicit feature engineering,
especially when a large amount of data is available [20]. For this reason, they are commonly used for anomaly detection
in scenarios involving very large data sets with high-dimensional instances, such as speech recognition, image/video
analysis, or natural language processing [14].
2.2 Anomaly Detection Baseline Methods
In this paper, we select 11 state-of-the-art anomaly detection methods that cover the most common families of
anomaly detection approaches. All of these algorithms require the proportion of anomalies in the data as a
parameter for setting their threshold values; other algorithm-specific parameters are mentioned in later
sections.
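To illustrate how such an anomaly proportion translates into a decision threshold, the following sketch (our own illustration, not taken from any of the cited implementations; the names anomaly scores and r are assumptions) places the threshold at the percentile of the validation scores implied by the contamination ratio:

import numpy as np

def threshold_from_contamination(val_scores: np.ndarray, r: float) -> float:
    """Pick a threshold so that roughly a fraction r of the validation points
    (those with the highest anomaly scores) are flagged as anomalies."""
    # Scores above the (1 - r) percentile are treated as anomalies.
    return np.percentile(val_scores, 100.0 * (1.0 - r))

rng = np.random.default_rng(0)
val_scores = rng.normal(size=1000)         # hypothetical anomaly scores on a validation split
r = 0.05                                   # assumed anomaly proportion
tau = threshold_from_contamination(val_scores, r)
is_anomaly = val_scores > tau              # boolean mask over the validation points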
We select five well-known methods that are based on classic mathematical approaches to anomaly detection. These
methods include One Class SVM [24], a kernel-based algorithm that encloses normal data within a boundary that leaves
anomalies outside of it; Covariance Estimator, which finds the smallest ellipsoid that wraps the normal data; Isolation
Forest [12], which tries to separate points using decision trees, so that the points that are easier to isolate are the outliers; Local
Outlier Factor (LOF) [4], based on a distance measure from each point to a set of its neighbors; and K-nearest neighbor
(KNN) [18], which scores each point based on its distance to the k-th nearest neighbor.
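As a rough illustration (not the exact configurations used in our experiments), most of these classical baselines are available in scikit-learn, where the anomaly proportion enters through the nu or contamination parameters; KNN scoring can be obtained from the k-th nearest-neighbor distance. The data matrix X and the ratio r below are placeholders:

import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

r = 0.05                                    # assumed anomaly proportion
X = np.random.default_rng(0).normal(size=(500, 2))   # hypothetical data

detectors = {
    "One Class SVM": OneClassSVM(nu=r),                         # boundary around normal data
    "Covariance Estimator": EllipticEnvelope(contamination=r),  # smallest enclosing ellipsoid
    "Isolation Forest": IsolationForest(contamination=r),       # easy-to-isolate points are outliers
    "LOF": LocalOutlierFactor(contamination=r),                 # density relative to neighbors
}

for name, det in detectors.items():
    # fit_predict returns +1 for inliers and -1 for outliers in all four cases.
    labels = det.fit_predict(X)
    print(name, "flagged", int(np.sum(labels == -1)), "points")

# KNN scoring: distance to the k-th nearest neighbor used as the anomaly score.
knn_dist, _ = NearestNeighbors(n_neighbors=6).fit(X).kneighbors(X)
knn_scores = knn_dist[:, -1]                # k = 5, excluding the point itself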
We also include methods developed in the last decade that do not use neural networks in their architectures,
but instead take other approaches. These methods include SOS [7], which builds an affinity matrix between all
points that acts as a similarity measure; COPOD [10], which builds univariate probability functions for each dimension
and then joins them into a unified multivariate function that models the true distribution of the data; and LODA [15], which
combines classifiers in low dimensions and uses histograms to detect anomalies.
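These three detectors share a common fit/score interface in the PyOD library; the sketch below is again only illustrative, with a hypothetical data matrix X and contamination ratio r, and does not reflect the hyperparameters used in our experiments:

import numpy as np
from pyod.models.sos import SOS
from pyod.models.copod import COPOD
from pyod.models.loda import LODA

r = 0.05                                   # assumed anomaly proportion
X = np.random.default_rng(0).normal(size=(500, 2))   # hypothetical data

for det in (SOS(contamination=r), COPOD(contamination=r), LODA(contamination=r)):
    det.fit(X)
    scores = det.decision_scores_          # anomaly score of each training point
    labels = det.labels_                   # 0 = inlier, 1 = outlier (top r fraction)
    print(type(det).__name__, "flagged", int(labels.sum()), "points")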
Finally, we complete the overview of baseline methods with some of the proposals that use neural
networks as their central elements. These three models include VAE-Bayes [9], built around a variational autoencoder
that constructs a latent space from the data, in which the probability distribution of the data can be recovered under Bayesian
assumptions; DSVDD [22], in which a neural network transforms the data into a latent space where normal
data can be encompassed by a hypersphere whose features are neurally optimized; and LAKE [13], which includes a