Abstract—Anomaly detection, or outlier detection, recognizes
unusual or rare events in large-scale systems. Doing so in a
time-sensitive manner is critical in many industries, with
applications such as bank fraud, glitches in critical systems,
medical alerts, and malfunctioning equipment.
Large-scale systems often grow in size and complexity over
time, and anomaly detection algorithms need to adapt to the
changing structures. A hierarchical approach can take
advantage of the implicit relationships in complex systems and
capture anomalies based on context. Furthermore, the features
in complex systems may vary drastically in data distribution,
capturing different aspects from multiple data sources, and
when put together can provide a more complete view of the
entire system. Two main datasets are considered: the first
consists of varied system metrics from machines running on
a cloud service, and the second of application metrics from a
complex distributed software system with inherent hierarchies
and interconnections amongst numerous system nodes.
A comparison of algorithms run in a hierarchical manner,
covering the changepoint-based PELT algorithm, cognitive
learning-based Hierarchical Temporal Memory algorithms,
Support Vector Machines, and Conditional Random Fields,
provides a basis for proposing a Hierarchical Global-Local
Conditional Random Field approach to accurately capture
anomalies in complex systems and across various features.
Hierarchical algorithms can learn both the intricacies of
lower-level or specific features, and utilize these in the global
abstracted representation to detect anomalous patterns
robustly across multi-source feature data and distributed
systems. A graphical network analysis on complex systems can
further fine-tune datasets to mine relationships based on
available features, which can benefit hierarchical models.
Furthermore, hierarchical solutions can adapt well to changes
at a localized level, learning on new data and changing
environments when parts of a system are overhauled, and
translate these learnings to a global view of the system over
time.
Keywords—anomaly detection, hierarchical learning, complex
systems, conditional random fields, enterprise systems,
hierarchical conditional random fields
I. INTRODUCTION
Anomalous behavior is inherent to large-scale, enterprise
software systems which power a variety of industries from
security and IT to energy and healthcare. Anomalies are
instances when the behavior of the system is significantly
different from the usual and may indicate a problem or
unusual activities in the system. In order to predict
anomalies, specific patterns of behavior leading up to the
anomaly must be identified and used for future prediction.
Anomaly detection on real-time streaming data from
systems enables corrective action to be taken in critical
scenarios, thereby saving time, money, and person-hours.
With the advent of the cloud, these software systems span
multiple machines and networks within large-scale data
centers, logging large volumes of real-time performance
data. The data is often aggregated from different
components of the system at regular intervals, taking the
form of a time series dataset. The immense amount of data
makes it difficult for humans, even experts, to identify
anomalies early on.
Among machine learning approaches, sequence learning
models play an important role in identifying patterns in the
dataset which lead to an anomaly. In this paper, hierarchical
machine learning approaches are explored to address the
large scale of data originating from multiple sources. An
especially interesting hierarchical model, Hierarchical
Temporal Memory, lies in the field of cognitive learning
algorithms and is compared to hierarchical approaches using
traditional machine learning and sequence learning models.
Applying hierarchical learning to the problem of large-scale
multi-source datasets from enterprise systems, a novel
hierarchical approach using a Local-Global Conditional
Random Field (CRF) model is proposed as a solution for
anomaly detection. Conditional Random Fields are robust
for sequence learning and the Local-Global method allows
the model to locally learn the idiosyncrasies of each data
source as well as globally generalize across the sources and
identify anomalies in the system.
The rest of the paper is organized as follows. Section II
describes related work in the field of anomaly detection. In
Section III, current approaches and models are discussed
and compared. Section IV focuses on the proposed
approach, detailing the Global-Local CRF model and the
motivation behind it. Section V discusses the experimental
approach with the nature of the dataset and the evaluation
metrics employed. The results of the models are evaluated
and compared with the proposed approach, augmented with
the network analysis, in Section VI. Section VII concludes
with the outcomes of the proposed approach and significant
findings from the comparative study.
II. RELATED WORK
Previous approaches to anomaly detection include both
supervised methods, such as support vector machines,
regression models, and decision trees [1,2,3], and
unsupervised methods (e.g. clustering). However, these are
yet to be adapted to multi-source, real-time time series datasets.
Dimensionality-based methods such as variants of PCA
[4,5] are primarily used for high-dimensional, multivariate
data streams that can be projected onto a low dimensional
space. However, these methods impose strict constraints on
the data, which hinders their adoption in real-world
anomaly detection scenarios. Statistical methods such as
multivariate statistics [6], Bayesian analysis [7], and
frequency and simple significance tests [8] have also been
used for anomaly detection. These methods, however,
do not adapt well across multi-source datasets, and their
results degrade as the dataset becomes larger.
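As a point of reference for the simple significance tests mentioned above, the following minimal sketch flags points in a single metric stream that deviate from a trailing window by more than a fixed number of standard deviations. It is an illustration only and is not the method of [8]; the function name `zscore_anomalies`, the `window` and `threshold` parameters, and the sample `readings` are all assumptions made for this example.

```python
from statistics import mean, stdev


def zscore_anomalies(series, window=20, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` sample standard deviations.

    Hypothetical helper for illustration; not taken from the paper.
    """
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if series[i] != mu:
                flagged.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Assumed sample data: a steady metric with one injected spike.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1, 9.9, 10.0,
            10.2, 10.1, 9.9, 10.0, 10.1, 9.8, 10.2, 10.0, 9.9, 10.1,
            10.0, 25.0, 10.1, 9.9]
print(zscore_anomalies(readings))
```

A test of this kind treats each stream in isolation with one fixed threshold, so it has no way to account for differing distributions across sources, which is consistent with the limitation noted above.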
Time series analysis such as the ARIMA (Autoregressive
integrated moving average) method uses a combination of