Improved Anomaly Detection by Using the Attention-Based
Isolation Forest
Lev V. Utkin, Andrey Y. Ageev and Andrei V. Konstantinov
Peter the Great St.Petersburg Polytechnic University
St.Petersburg, Russia
e-mail: lev.utkin@gmail.com, andreyageev1@mail.ru, andrue.konst@gmail.com
Abstract
A new modification of Isolation Forest called Attention-Based Isolation Forest (AB-
IForest) for solving the anomaly detection problem is proposed. It incorporates the
attention mechanism in the form of the Nadaraya-Watson regression into the Isolation
Forest for improving solution of the anomaly detection problem. The main idea under-
lying the modification is to assign attention weights to each path of trees with learnable
parameters depending on instances and trees themselves. The Huber’s contamination
model is proposed to be used for defining the attention weights and their parameters.
As a result, the attention weights are linearly depend on the learnable attention parame-
ters which are trained by solving the standard linear or quadratic optimization problem.
ABIForest can be viewed as the first modification of Isolation Forest, which incorporates
the attention mechanism in a simple way without applying gradient-based algorithms.
Numerical experiments with synthetic and real datasets illustrate outperforming results
of ABIForest. The code of proposed algorithms is available.
Keywords: anomaly detection, attention mechanism, Isolation Forest, Nadaraya-Watson
regression, quadratic programming, contamination model
1 Introduction
One of the important machine learning problems is the novelty or anomaly detection problem,
which aims to detect abnormal or anomalous instances. This problem is challenging
because there is no rigorous definition of an anomalous instance, and what constitutes an anomaly
depends on the application. Another difficulty is that anomalies occur rarely,
which leads to highly imbalanced training sets. Moreover, it is difficult to define a boundary
between normal and anomalous observations [1]. Due to the importance of the anomaly detection
problem in many applications, a huge number of papers covering anomaly detection tasks and
studying various aspects of anomaly detection have been published in the last decades. Many
approaches to solving the anomaly detection problem are analyzed in comprehensive survey
papers [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11].
arXiv:2210.02558v1 [cs.LG] 5 Oct 2022
According to [1, 12], anomalies, also referred to as abnormalities, deviants, or outliers, can
be viewed as data points located far away from the bulk of the data points, which are
referred to as normal data.
Various approaches to solving the anomaly detection problem can be divided into several
groups [10]. The first group consists of probabilistic and density estimation models.
It includes classic density estimation models, energy-based models, and neural generative
models [10]. The second large group deals with one-class classification models. This
group includes the well-known one-class classification SVMs [13, 14, 15]. The third group
includes reconstruction-based models, which detect anomalies by reconstructing the data
instances. The best-known models from this group are autoencoders, which reconstruct
anomalous instances poorly, so that the distance between an instance and its reconstruction
is larger than a predefined threshold, usually regarded as a hyperparameter of the
model.
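The reconstruction-based criterion can be sketched as follows (our minimal illustration, not code from the paper; the `reconstruct` argument stands in for any trained autoencoder's encode-decode pass, and the toy projection below is purely hypothetical):

```python
import numpy as np

def reconstruction_anomaly_scores(X, reconstruct, threshold):
    """Flag instances whose reconstruction error exceeds a threshold.

    `reconstruct` stands in for a trained autoencoder's encode-decode pass;
    `threshold` is the hyperparameter mentioned in the text.
    """
    errors = np.linalg.norm(X - reconstruct(X), axis=1)
    return errors, errors > threshold

# Toy usage: an "autoencoder" that projects onto the first coordinate axis,
# so points far from that axis get large reconstruction errors.
X = np.array([[1.0, 0.1], [2.0, -0.2], [0.5, 5.0]])
project = lambda Z: np.column_stack([Z[:, 0], np.zeros(len(Z))])
errors, flags = reconstruction_anomaly_scores(X, project, threshold=1.0)
# The third point, far from the axis, is the only one flagged as anomalous.
```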
The next group contains distance-based anomaly detection models. One of the most
popular and effective models in this group is the Isolation Forest (iForest) [16, 17], which is
a model for detecting anomalous points relative to a certain data distribution. According to
iForest, anomalies are detected using isolation, which measures how far an instance is from
the rest of the instances. iForest can be regarded as a tool implementing isolation. It has
linear time complexity and works well with large amounts of data. The core idea behind
iForest is the tendency for anomalous instances in a dataset to be more easily separated from
the rest of the sample (isolated) compared to normal instances. To isolate a data point, the
algorithm recursively creates sample partitions by randomly choosing an attribute and then
randomly choosing a split value for the attribute between the minimum and maximum values
allowed for that attribute. The recursive partition can be represented by a tree structure
called an isolation tree, while the number of partitions needed to isolate a point can be
interpreted as the length of the path within the tree to the end node, starting from the root.
Anomalous instances are those with a shorter path length in the tree [16, 17].
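The isolation procedure described above can be sketched as follows (a simplified single-tree illustration under our own assumptions, not the authors' code; the name `path_length` and the recursion cap are ours):

```python
import random
import numpy as np

def path_length(x, X, depth=0, max_depth=10):
    """Count the random splits needed to isolate point x within sample X.

    Each step picks a random attribute and a random split value between that
    attribute's minimum and maximum, as in the iForest tree construction.
    """
    if len(X) <= 1 or depth >= max_depth:
        return depth
    attr = random.randrange(X.shape[1])
    lo, hi = X[:, attr].min(), X[:, attr].max()
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    # Recurse into the side of the partition that contains x.
    side = X[:, attr] < split if x[attr] < split else X[:, attr] >= split
    return path_length(x, X[side], depth + 1, max_depth)

# Anomalous (easily isolated) points tend to get shorter paths on average.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)), [[8.0, 8.0]]])
random.seed(0)
avg = lambda x: np.mean([path_length(x, X) for _ in range(100)])
# avg(X[-1]) (the far-away point) is typically much smaller than avg(X[0]).
```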
In order to improve iForest, we propose to modify it by using the attention mechanism,
which can automatically distinguish the relative importance of instances and weigh them to
improve the overall accuracy of iForest. The attention mechanism has been successfully
applied in many areas, including natural language processing and computer vision.
Comprehensive surveys of properties and forms of the attention mechanism
and transformers can be found in [18, 19, 20, 21, 22].
The idea to apply the attention mechanism to iForest stems from the attention-based
random forest (ABRF) models proposed in [23] where attention is implemented in the form
of the Nadaraya-Watson regression [24, 25] by assigning attention weights to leaves of trees in
a specific way such that the weights depend on trees and instances. The attention learnable
parameters in ABRF are trained by solving the standard quadratic optimization problem
with linear constraints. It turns out that this idea of treating the random forest as the
Nadaraya-Watson regression [24, 25] can be extended to iForest, taking into account the
peculiarities that distinguish iForest from the random forest. According to the original iForest,
the isolation measure is estimated as the mean value of the path lengths over all trees in the
forest. However, we can replace the averaging of the path lengths with the Nadaraya-Watson
regression where the path length of an instance in each tree can be regarded as a prediction
in the regression (the value in terms of the attention mechanism [27]), and weights (the
attention weights) depend on the corresponding tree and the instance (the query in terms of
the attention mechanism [27]). In other words, the final prediction of the expected path length
in accordance with the Nadaraya-Watson regression is a weighted sum of path lengths over all
trees. Weights of path lengths have learnable parameters (the learnable attention parameters)
which can be computed by minimizing a loss function of a specific form. We aim to reduce
the optimization problem to a quadratic or linear programming problem, for which many
solution algorithms exist. In order to achieve this aim, Huber's
ε-contamination model [26] is proposed for computing the learnable attention
parameters. The contamination model allows us to represent the attention weights as
a linear combination of the softmax operation and learnable parameters with a contamination
parameter ε, where the weights can be viewed as probabilities. As a result, the loss function for computing
the learnable parameters is linear, with linear constraints on the parameters as probabilities. After
adding an L2 regularization term, the optimization problem for computing the attention weights
becomes a quadratic one.
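The effect of this contamination parametrization on the weights can be sketched as follows (our illustrative code, not the paper's implementation; the names `softmax_part` and `learnable` are hypothetical):

```python
import numpy as np

def contamination_weights(softmax_part, learnable, eps):
    """Mix fixed softmax weights with learnable probabilities, in the style
    of Huber's eps-contamination model: w = (1 - eps) * p + eps * q.

    Both inputs are probability vectors, so the result is one as well,
    and it depends linearly on the learnable parameters q.
    """
    softmax_part = np.asarray(softmax_part)
    learnable = np.asarray(learnable)
    return (1.0 - eps) * softmax_part + eps * learnable

# With eps = 0 the weights reduce to the softmax part; with eps = 1 they
# are fully learnable. Any convex mixture still sums to one.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.1, 0.8])
w = contamination_weights(p, q, eps=0.25)
# w sums to one and lies between p and q componentwise.
```

Because `w` is linear in `q`, a loss that is linear in the weights stays linear in the learnable parameters, which is what makes the linear/quadratic programming reduction possible.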
Our contributions can be summarized as follows:
1. A new modification of iForest called the Attention-Based Isolation Forest (ABIForest),
incorporating the attention mechanism in the form of the Nadaraya-Watson regression
to improve the solution of the anomaly detection problem, is proposed.
2. The computation of the attention weights is reduced to solving linear or
quadratic programming problems by applying Huber's ε-contamination model.
Moreover, we propose to use the hinge loss function to simplify the optimization
problem. The contamination parameter ε is regarded as a tuning hyperparameter.
3. Numerical experiments with synthetic and real datasets are performed to study
ABIForest. They demonstrate outperforming results for most datasets. The code
of the proposed algorithms can be found at https://github.com/AndreyAgeev/Attention-based-isolation-forest.
The paper is organized as follows. Related work can be found in Section 2. Brief
introductions to the attention mechanism, the Nadaraya-Watson regression, and iForest are
given in Section 3. The proposed ABIForest model is considered in Section 4. Numerical
experiments with synthetic and real datasets illustrating the peculiarities of ABIForest and its
comparison with iForest are provided in Section 5. Concluding remarks discussing the advantages
and disadvantages of ABIForest can be found in Section 6.
2 Related work
Attention mechanism. The attention mechanism can be viewed as an effective method
for improving the performance of a large variety of machine learning models. Therefore,
there are many different types of attention mechanisms depending on their applications and
models where attention mechanisms are incorporated. The term “attention” was introduced
by Bahdanau et al. [27]. Following this paper, a huge number of models based on the
attention mechanism can be found in the literature. There are also several types of attention
mechanisms [28], including soft and hard attention [29], local and global
attention [30], self-attention [31], multi-head attention [31], and hierarchical attention [32]. It
is difficult to consider all papers devoted to attention mechanisms and their applications.
Comprehensive surveys [18, 19, 20, 21, 22, 33] cover a large part of the available models and
modifications of attention mechanisms.
Most attention models are implemented as parts of neural networks. In order to extend
the set of attention models, several random forest models incorporating the attention
mechanism were proposed in [23, 34, 35]. A gradient boosting machine extended with the
attention mechanism was presented in [36].
Anomaly detection with attention. A wide range of machine learning tasks includes
anomaly detection problems. Therefore, many methods and models have been developed
to address them [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. One of the tools for solving anomaly
detection problems is the attention mechanism. Monotonic attention-based autoencoders
were proposed in [37] as an unsupervised learning technique to detect false data injection
attacks. An anomaly detection method based on a Siamese network with an attention
mechanism for dealing with small datasets was proposed in [38]. The so-called residual
attention network, which employs the attention mechanism and residual learning to improve
classification efficiency and accuracy, was presented in [39]. A graph anomaly detection
algorithm based on attention-based deep learning to assist the audit process was provided
in [40]. Madan et al. [41] presented a novel self-supervised masked convolutional
transformer block that comprises reconstruction-based functionality. Integration of the
reconstruction-based functionality into a novel self-supervised predictive architectural
building block was considered in [42]. Huang et al. [43] improved the efficiency and effectiveness
of anomaly detection and localization at inference by using a progressive mask refinement
approach that progressively uncovers the normal regions and finally locates the anomalous
regions. A novel self-supervised framework for multivariate time-series anomaly detection via
a graph attention network was proposed in [44]. It can be seen from the above works that
the idea of applying attention in models solving the anomaly detection problem has been
successfully implemented. However, attention was used in the form of components of neural
networks. There are no forest-based anomaly detection models that use the attention mechanism.
iForest. iForest [16, 17] can be viewed as one of the important and effective methods
for solving novelty and anomaly detection problems. Therefore, many modifications of the
method have been developed [5] to improve it. A weighted iForest and a Siamese Gated
Recurrent Unit algorithm architecture, which provide a more accurate and efficient method
for outlier detection, are considered in [45]. Hariri et al. [46] proposed an extension
of the iForest, named Extended Isolation Forest, which resolves issues with assignment of
anomaly score to given data points. A theoretical framework that describes the effectiveness
of isolation-based approaches from a distributional viewpoint was studied in [47]. Lesouple
et al. [48] presented a generalized isolation forest algorithm which generates trees without
any empty branches, significantly improving execution times. The k-Means-Based
iForest was developed by Karczmarek et al. [49]. This modification of iForest allows one to build
a search tree with many branches, in contrast to the only two considered in the original
method. Another modification, called the Fuzzy Set-Based Isolation Forest, was proposed
in [50]. A probabilistic generalization of iForest was proposed in [51], which is based on a
nonlinear dependence of a segment-cumulated probability on the length of the segment. A
robust anomaly detection method called the similarity-measured isolation forest was developed
by Li et al. [52] to detect abnormal segments in monitoring data. A novel hyperspectral
anomaly detection method with kernel Isolation Forest was proposed in [53]. The method
is based on the assumption that anomalies, rather than the background, are more susceptible
to isolation in the kernel space. An improved computational framework that allows one to
seek the most separable attributes and spot corresponding optimized split points effectively
was presented in [54]. Staerman et al. [55] introduced the so-called Functional Isolation
Forest which generalizes iForest to the infinite dimensional context, i.e., the model deals with
functional random variables that take their values in a space of functions. Xu et al. [56]
proposed the Deep Isolation Forest which is based on an isolation method with arbitrary
(linear/non-linear) partition of data implemented by using neural networks.
The works above are only a part of the many extensions and modifications of iForest developed
due to the excellent properties of the method. However, to the best of our knowledge, there are
no works considering approaches to incorporating the attention mechanism into iForest.
3 Preliminaries
3.1 Attention mechanism as the Nadaraya-Watson regression
If we consider the attention mechanism as a method for enhancing the accuracy of iForest in
solving the anomaly detection problem, then it allows us to automatically distinguish the
relative importance of features, instances, and isolation trees. According to [18, 57], the
original idea of attention can be understood from the statistical point of view by applying the
Nadaraya-Watson kernel regression model [24, 25].
Given $n$ instances $D=\{(\mathbf{x}_1, y_1), ..., (\mathbf{x}_n, y_n)\}$, in which $\mathbf{x}_i = (x_{i1}, ..., x_{im}) \in \mathbb{R}^m$ is a
feature vector involving $m$ features and $y_i \in \mathbb{R}$ represents the regression output, the task
of regression is to construct a regressor $f: \mathbb{R}^m \rightarrow \mathbb{R}$ which can predict the output value $\tilde{y}$
of a new observation $\mathbf{x}$, using the available data $D$. A similar task can be formulated for the
classification problem.
The original idea behind the attention mechanism is to replace the simple average of
outputs $\tilde{y} = n^{-1} \sum_{i=1}^{n} y_i$ for estimating the regression output $y$, corresponding to a new input
feature vector $\mathbf{x}$, with the weighted average, in the form of the Nadaraya-Watson regression
model [24, 25]:
$$\tilde{y} = \sum_{i=1}^{n} \alpha(\mathbf{x}, \mathbf{x}_i) y_i, \qquad (1)$$
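A minimal numerical illustration of the Nadaraya-Watson estimate (our sketch; the Gaussian kernel and bandwidth are one common choice of weighting, not prescribed by the paper at this point):

```python
import numpy as np

def nadaraya_watson(x, X, y, bandwidth=1.0):
    """Nadaraya-Watson estimate: a weighted average of outputs y_i with
    weights alpha(x, x_i) given by a normalized Gaussian kernel."""
    d2 = np.sum((X - x) ** 2, axis=1)
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))
    alpha = k / k.sum()  # attention weights, summing to one
    return float(alpha @ y)

# Training points on the line y = 2x; the estimate at x = 1 is a locally
# weighted average of the outputs, symmetric around the query point.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 2.0, 4.0])
est = nadaraya_watson(np.array([1.0]), X, y, bandwidth=0.5)
# est equals 2.0 by symmetry of the kernel weights around x = 1
```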