Improved Anomaly Detection by Using the Attention-Based
Isolation Forest
Lev V. Utkin, Andrey Y. Ageev and Andrei V. Konstantinov
Peter the Great St.Petersburg Polytechnic University
St.Petersburg, Russia
e-mail: lev.utkin@gmail.com, andreyageev1@mail.ru, andrue.konst@gmail.com
Abstract
A new modification of Isolation Forest called Attention-Based Isolation Forest (AB-
IForest) for solving the anomaly detection problem is proposed. It incorporates the
attention mechanism in the form of the Nadaraya-Watson regression into the Isolation
Forest for improving solution of the anomaly detection problem. The main idea under-
lying the modification is to assign attention weights to each path of trees with learnable
parameters depending on instances and trees themselves. The Huber’s contamination
model is proposed to be used for defining the attention weights and their parameters.
As a result, the attention weights are linearly depend on the learnable attention parame-
ters which are trained by solving the standard linear or quadratic optimization problem.
ABIForest can be viewed as the first modification of Isolation Forest, which incorporates
the attention mechanism in a simple way without applying gradient-based algorithms.
Numerical experiments with synthetic and real datasets illustrate outperforming results
of ABIForest. The code of proposed algorithms is available.
Keywords: anomaly detection, attention mechanism, Isolation Forest, Nadaraya-Watson
regression, quadratic programming, contamination model
1 Introduction
One of the important machine learning problems is the novelty or anomaly detection problem,
which aims to detect abnormal or anomalous instances. This problem is challenging
because there is no rigorous definition of an anomalous instance, and what constitutes an anomaly
depends on the application. Another difficulty is that anomalies occur rarely,
which leads to highly imbalanced training sets. Moreover, it is difficult to define a boundary
between normal and anomalous observations [1]. Due to the importance of the anomaly detection
problem in many applications, a huge number of papers covering anomaly detection tasks and
studying various aspects of anomaly detection have been published in the last decades. Many
approaches to solving the anomaly detection problem are analyzed in comprehensive survey
papers [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11].
arXiv:2210.02558v1 [cs.LG] 5 Oct 2022
According to [1, 12], anomalies, also referred to as abnormalities, deviants, or outliers, can
be viewed as data points located far away from the bulk of the data points, which are
referred to as normal data.
Various approaches to solving the anomaly detection problem can be divided into several
groups [10]. The first group consists of probabilistic and density estimation models.
It includes classic density estimation models, energy-based models, and neural generative
models [10]. The second large group deals with one-class classification models. This
group includes the well-known one-class classification SVMs [13, 14, 15]. The third group
includes reconstruction-based models, which detect anomalies by reconstructing the data
instances. The best-known models from this group are autoencoders, which reconstruct
anomalous instances poorly, so that the distance between an instance and its reconstruction
is larger than a predefined threshold, usually regarded as a hyperparameter of the
model.
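The reconstruction-based criterion can be sketched as follows (our minimal illustration, not code from the paper; the `reconstruct` argument stands in for any trained autoencoder's encode-decode pass, and the toy projection below is purely hypothetical):

```python
import numpy as np

def reconstruction_anomaly_scores(X, reconstruct, threshold):
    """Flag instances whose reconstruction error exceeds a threshold.

    `reconstruct` stands in for a trained autoencoder's encode-decode pass;
    `threshold` is the hyperparameter mentioned in the text.
    """
    errors = np.linalg.norm(X - reconstruct(X), axis=1)
    return errors, errors > threshold

# Toy usage: an "autoencoder" that projects onto the first coordinate axis,
# so points far from that axis get large reconstruction errors.
X = np.array([[1.0, 0.1], [2.0, -0.2], [0.5, 5.0]])
project = lambda Z: np.column_stack([Z[:, 0], np.zeros(len(Z))])
errors, flags = reconstruction_anomaly_scores(X, project, threshold=1.0)
# The third point, far from the axis, is the only one flagged as anomalous.
```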
The next group contains distance-based anomaly detection models. One of the most
popular and effective models in this group is the Isolation Forest (iForest) [16, 17], which is
a model for detecting anomalous points relative to a certain data distribution. According to
iForest, anomalies are detected using isolation, which measures how far an instance is from
the rest of the instances. iForest can be regarded as a tool implementing isolation. It has
linear time complexity and works well with large amounts of data. The core idea behind
iForest is the tendency for anomalous instances in a dataset to be more easily separated from
the rest of the sample (isolated) compared to normal instances. To isolate a data point, the
algorithm recursively creates sample partitions by randomly choosing an attribute and then
randomly choosing a split value for the attribute between the minimum and maximum values
allowed for that attribute. The recursive partition can be represented by a tree structure
called an isolation tree, while the number of partitions needed to isolate a point can be
interpreted as the length of the path within the tree to the end node, starting from the root.
Anomalous instances are those with a shorter path length in the tree [16, 17].
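The isolation procedure described above can be sketched as follows (a simplified single-tree illustration under our own assumptions, not the authors' code; the name `path_length` and the recursion cap are ours):

```python
import random
import numpy as np

def path_length(x, X, depth=0, max_depth=10):
    """Count the random splits needed to isolate point x within sample X.

    Each step picks a random attribute and a random split value between that
    attribute's minimum and maximum, as in the iForest tree construction.
    """
    if len(X) <= 1 or depth >= max_depth:
        return depth
    attr = random.randrange(X.shape[1])
    lo, hi = X[:, attr].min(), X[:, attr].max()
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    # Recurse into the side of the partition that contains x.
    side = X[:, attr] < split if x[attr] < split else X[:, attr] >= split
    return path_length(x, X[side], depth + 1, max_depth)

# Anomalous (easily isolated) points tend to get shorter paths on average.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)), [[8.0, 8.0]]])
random.seed(0)
avg = lambda x: np.mean([path_length(x, X) for _ in range(100)])
# avg(X[-1]) (the far-away point) is typically much smaller than avg(X[0]).
```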
In order to improve iForest, we propose to modify it by using the attention mechanism,
which can automatically distinguish the relative importance of instances and weigh them to
improve the overall accuracy of iForest. The attention mechanism has been successfully
applied in many areas, including natural language processing and computer vision.
Comprehensive surveys of properties and forms of the attention mechanism
and transformers can be found in [18, 19, 20, 21, 22].
The idea to apply the attention mechanism to iForest stems from the attention-based
random forest (ABRF) models proposed in [23] where attention is implemented in the form
of the Nadaraya-Watson regression [24, 25] by assigning attention weights to leaves of trees in
a specific way such that the weights depend on trees and instances. The attention learnable
parameters in ABRF are trained by solving the standard quadratic optimization problem
with linear constraints. It turns out that this idea of treating the random forest as the
Nadaraya-Watson regression [24, 25] can be extended to iForest, taking into account the
peculiarities that distinguish iForest from the random forest. According to the original iForest,
the isolation measure is estimated as the mean value of the path lengths over all trees in the
forest. However, we can replace the averaging of the path lengths with the Nadaraya-Watson
regression where the path length of an instance in each tree can be regarded as a prediction
in the regression (the value in terms of the attention mechanism [27]), and weights (the
attention weights) depend on the corresponding tree and the instance (the query in terms of
the attention mechanism [27]). In other words, the final prediction of the expected path length
in accordance with the Nadaraya-Watson regression is a weighted sum of path lengths over all
trees. Weights of path lengths have learnable parameters (the learnable attention parameters)
which can be computed by minimizing a loss function of a specific form. We aim to reduce
the optimization problem to a quadratic or linear programming problem, for which many
solution algorithms exist. In order to achieve this aim, Huber's
ε-contamination model [26] is proposed for computing the learnable attention
parameters. The contamination model allows us to represent the attention weights as
a linear combination of the softmax operation and learnable parameters with a contamination
parameter ε, where the weights can be viewed as probabilities. As a result, the loss function for computing
the learnable parameters is linear, with linear constraints on the parameters as probabilities. After
adding an L2 regularization term, the optimization problem for computing the attention weights
becomes a quadratic one.
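The effect of this contamination parametrization on the weights can be sketched as follows (our illustrative code, not the paper's implementation; the names `softmax_part` and `learnable` are hypothetical):

```python
import numpy as np

def contamination_weights(softmax_part, learnable, eps):
    """Mix fixed softmax weights with learnable probabilities, in the style
    of Huber's eps-contamination model: w = (1 - eps) * p + eps * q.

    Both inputs are probability vectors, so the result is one as well,
    and it depends linearly on the learnable parameters q.
    """
    softmax_part = np.asarray(softmax_part)
    learnable = np.asarray(learnable)
    return (1.0 - eps) * softmax_part + eps * learnable

# With eps = 0 the weights reduce to the softmax part; with eps = 1 they
# are fully learnable. Any convex mixture still sums to one.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.1, 0.8])
w = contamination_weights(p, q, eps=0.25)
# w sums to one and lies between p and q componentwise.
```

Because `w` is linear in `q`, a loss that is linear in the weights stays linear in the learnable parameters, which is what makes the linear/quadratic programming reduction possible.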
Our contributions can be summarized as follows:
1. A new modification of iForest called the Attention-Based Isolation Forest (ABIForest),
incorporating the attention mechanism in the form of the Nadaraya-Watson regression
to improve the solution of the anomaly detection problem, is proposed.
2. The computation of the attention weights is reduced to solving linear or
quadratic programming problems by applying Huber's ε-contamination model.
Moreover, we propose to use the hinge loss function to simplify the optimization
problem. The contamination parameter ε is regarded as a tuning hyperparameter.
3. Numerical experiments with synthetic and real datasets are performed to study
ABIForest. They demonstrate outperforming results for most datasets. The code
of the proposed algorithms can be found at https://github.com/AndreyAgeev/Attention-based-isolation-forest.
The paper is organized as follows. Related work can be found in Section 2. Brief
introductions to the attention mechanism, the Nadaraya-Watson regression, and iForest are
given in Section 3. The proposed ABIForest model is considered in Section 4. Numerical
experiments with synthetic and real datasets illustrating the peculiarities of ABIForest and its
comparison with iForest are provided in Section 5. Concluding remarks discussing the advantages
and disadvantages of ABIForest can be found in Section 6.
2 Related work
Attention mechanism. The attention mechanism can be viewed as an effective method
for improving the performance of a large variety of machine learning models. Therefore,
there are many different types of attention mechanisms depending on their applications and
models where attention mechanisms are incorporated. The term “attention” was introduced
by Bahdanau et al. [27]. Following this paper, a huge number of models based on the
attention mechanism can be found in the literature. There are also several types of attention
mechanisms [28], including soft and hard attention [29], local and global
attention [30], self-attention [31], multi-head attention [31], and hierarchical attention [32]. It
is difficult to consider all papers devoted to attention mechanisms and their applications.
Comprehensive surveys [18, 19, 20, 21, 22, 33] cover a large part of the available models and
modifications of attention mechanisms.
Most attention models are implemented as parts of neural networks. In order to extend
the set of attention models, several random forest models incorporating the attention
mechanism were proposed in [23, 34, 35]. A gradient boosting machine extended with the
attention mechanism was presented in [36].
Anomaly detection with attention. A wide range of machine learning tasks includes
anomaly detection problems. Therefore, many methods and models have been developed
to address them [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. One of the tools for solving anomaly
detection problems is the attention mechanism. Monotonic attention-based autoencoders
were proposed in [37] as an unsupervised learning technique to detect false data injection
attacks. An anomaly detection method based on a Siamese network with an attention
mechanism for dealing with small datasets was proposed in [38]. The so-called residual
attention network, which employs the attention mechanism and residual learning to improve
classification efficiency and accuracy, was presented in [39]. A graph anomaly detection
algorithm based on attention-based deep learning to assist the audit process was provided
in [40]. Madan et al. [41] presented a novel self-supervised masked convolutional
transformer block that comprises reconstruction-based functionality. Integration of the
reconstruction-based functionality into a novel self-supervised predictive architectural
building block was considered in [42]. Huang et al. [43] improved the efficiency and effectiveness
of anomaly detection and localization at inference by using a progressive mask refinement
approach that progressively uncovers the normal regions and finally locates the anomalous
regions. A novel self-supervised framework for multivariate time-series anomaly detection via
a graph attention network was proposed in [44]. It can be seen from the above works that
the idea of applying attention in models solving the anomaly detection problem has been
successfully implemented. However, attention was used in the form of components of neural
networks. There are no forest-based anomaly detection models that use the attention mechanism.
iForest. iForest [16, 17] can be viewed as one of the important and effective methods
for solving novelty and anomaly detection problems. Therefore, many modifications of the
method have been developed [5] to improve it. A weighted iForest and a Siamese Gated
Recurrent Unit algorithm architecture, which provide a more accurate and efficient method
for outlier detection, are considered in [45]. Hariri et al. [46] proposed an extension
of the iForest, named Extended Isolation Forest, which resolves issues with assignment of
anomaly score to given data points. A theoretical framework that describes the effectiveness
of isolation-based approaches from a distributional viewpoint was studied in [47]. Lesouple
et al. [48] presented a generalized isolation forest algorithm which generates trees without
any empty branches, significantly improving execution times. The k-Means-Based
iForest was developed by Karczmarek et al. [49]. This modification of iForest allows one to build
a search tree with many branches, in contrast to the only two considered in the original
method. Another modification, called the Fuzzy Set-Based Isolation Forest, was proposed
in [50]. A probabilistic generalization of iForest was proposed in [51], which is based on a
nonlinear dependence of a segment-cumulated probability on the length of the segment. A
robust anomaly detection method called the similarity-measured isolation forest was developed
by Li et al. [52] to detect abnormal segments in monitoring data. A novel hyperspectral
anomaly detection method with kernel Isolation Forest was proposed in [53]. The method
is based on the assumption that anomalies, rather than the background, are more susceptible
to isolation in the kernel space. An improved computational framework that allows one to
seek the most separable attributes and spot corresponding optimized split points effectively
was presented in [54]. Staerman et al. [55] introduced the so-called Functional Isolation
Forest which generalizes iForest to the infinite dimensional context, i.e., the model deals with
functional random variables that take their values in a space of functions. Xu et al. [56]
proposed the Deep Isolation Forest which is based on an isolation method with arbitrary
(linear/non-linear) partition of data implemented by using neural networks.
The works above are only a part of the many extensions and modifications of iForest developed
due to the excellent properties of the method. However, to the best of our knowledge, there are
no works considering approaches to incorporating the attention mechanism into iForest.
3 Preliminaries
3.1 Attention mechanism as the Nadaraya-Watson regression
If we consider the attention mechanism as a method for enhancing the accuracy of iForest in
solving the anomaly detection problem, then it allows us to automatically distinguish the
relative importance of features, instances, and isolation trees. According to [18, 57], the
original idea of attention can be understood from the statistical point of view by applying the
Nadaraya-Watson kernel regression model [24, 25].
Given $n$ instances $D=\{(\mathbf{x}_1, y_1), ..., (\mathbf{x}_n, y_n)\}$, in which $\mathbf{x}_i = (x_{i1}, ..., x_{im}) \in \mathbb{R}^m$ is a
feature vector involving $m$ features and $y_i \in \mathbb{R}$ represents the regression output, the task
of regression is to construct a regressor $f: \mathbb{R}^m \rightarrow \mathbb{R}$ which can predict the output value $\tilde{y}$
of a new observation $\mathbf{x}$, using the available data $D$. A similar task can be formulated for the
classification problem.
The original idea behind the attention mechanism is to replace the simple average of
outputs $\tilde{y} = n^{-1} \sum_{i=1}^{n} y_i$ for estimating the regression output $y$, corresponding to a new input
feature vector $\mathbf{x}$, with the weighted average, in the form of the Nadaraya-Watson regression
model [24, 25]:
$$\tilde{y} = \sum_{i=1}^{n} \alpha(\mathbf{x}, \mathbf{x}_i) y_i, \qquad (1)$$
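A minimal numerical illustration of the Nadaraya-Watson estimate (our sketch; the Gaussian kernel and bandwidth are one common choice of weighting, not prescribed by the paper at this point):

```python
import numpy as np

def nadaraya_watson(x, X, y, bandwidth=1.0):
    """Nadaraya-Watson estimate: a weighted average of outputs y_i with
    weights alpha(x, x_i) given by a normalized Gaussian kernel."""
    d2 = np.sum((X - x) ** 2, axis=1)
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))
    alpha = k / k.sum()  # attention weights, summing to one
    return float(alpha @ y)

# Training points on the line y = 2x; the estimate at x = 1 is a locally
# weighted average of the outputs, symmetric around the query point.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 2.0, 4.0])
est = nadaraya_watson(np.array([1.0]), X, y, bandwidth=0.5)
# est equals 2.0 by symmetry of the kernel weights around x = 1
```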