Unsupervised Graph Outlier Detection Problem Revisit New Insight and Superior Method Yihong Huangy Liping Wangy Fan Zhangz Xuemin Linx

2025-05-06 0 0 1.78MB 15 页 10玖币
侵权投诉
Unsupervised Graph Outlier Detection: Problem
Revisit, New Insight, and Superior Method
Yihong Huang, Liping Wang, Fan Zhang, Xuemin Lin§
East China Normal University, Guangzhou University, §Shanghai Jiao tong University
hyh957947142@gmail.com, lipingwang@sei.ecnu.edu.cn
zhangf@gzhu.edu.cn, lxue@cse.unsw.edu.au
Abstract—A large number of studies on Graph Outlier De-
tection (GOD) have emerged in recent years due to its wide
applications, in which Unsupervised Node Outlier Detection
(UNOD) on attributed networks is an important area. UNOD
focuses on detecting two kinds of typical outliers in graphs: the
structural outlier and the contextual outlier. Most existing works
conduct experiments based on datasets with injected outliers.
However, we find that the most widely-used outlier injection
approach has a serious data leakage issue. By only utilizing
such data leakage, a simple approach can achieve state-of-the-art
performance in detecting outliers. In addition, we observe that
existing algorithms have a performance drop with the mitigated
data leakage issue. The other major issue is on balanced detection
performance between the two types of outliers, which has not
been considered by existing studies.
In this paper, we analyze the cause of the data leakage issue
in depth since the injection approach is a building block to
advance UNOD. Moreover, we devise a novel variance-based
model to detect structural outliers, which outperforms existing
algorithms significantly and is more robust at kinds of injection
settings. On top of this, we propose a new framework, Variance-
based Graph Outlier Detection (VGOD), which combines our
variance-based model and attribute reconstruction model to
detect outliers in a balanced way. Finally, we conduct extensive
experiments to demonstrate the effectiveness and efficiency of
VGOD. The results on 5 real-world datasets validate that VGOD
achieves not only the best performance in detecting outliers but
also a balanced detection performance between structural and
contextual outliers.
Index Terms—Graph Outlier Detection; Graph Neural Net-
work; Unsupervised Graph learning; Attributed Networks
I. INTRODUCTION
Graph Outlier Detection [1] (GOD, a.k.a. graph anomaly
detection) is a fundamental graph mining task. It has various
applications in high-impact domains and complex systems,
such as financial fraudster identification [2]. The detection
objects of GOD can be classified into different levels like
node, edge, community, and graph [3]. For example, detecting
abnormal users in a social network is the node-level GOD
task while detecting abnormal molecules can be regarded as a
graph-level GOD task.
Due to the high cost or unavailability of manually la-
beling the ground truth outliers, a large number of existing
GOD approaches are carried out in an unsupervised manner
[4, 5], which aims to detect the instances that significantly
deviate from the majority of instances in the graph [6].
Attributed networks (a.k.a. attributed graphs) are a powerful
data representation for many real-world complex systems (e.g.
a social network with user profiles), in which entities can
be represented as nodes with their attribute information; the
interaction or relationship between entities can be represented
as edges [7]. In recent years, the study of Unsupervised Node
Outlier Detection (UNOD) on attributed networks has been
blooming due to its wide applications [3, 8, 9]. Different
from traditional global outlier detection and time series outlier
detection, it defines two new typical types of outliers on
attributed networks, namely, structural outlier and contextual
outlier [4].
(a) Structural Outlier (b) Contextual Outlier
Normal Node Outlier Node Node Attributes
community A
community B
Fig. 1. An example of structural and contextual outliers in UNOD.
In Fig 1, there is a toy example for these two kinds
of basic outliers. Particularly, structural outliers are those
nodes structurally connected to different communities, i.e.,
their structural neighborhood is inconsistent. In other words,
a structural outlier has normal attributes while it may have
several abnormal links. For example, those people from dif-
ferent communities but have a strong connection with each
other can be regarded as structural outliers. As shown in Fig
1(a), there are two communities outlined with orange circles
and structural outliers are those nodes with abnormal links
to other communities. On the other hand, a contextual outlier
has a consistent neighborhood structure while its attributes are
corrupted, noisy, or significantly different from its neighbor
nodes’ attributes. For example, in Fig 1(b), suppose that the
node in red is a football player while the nodes in green
are music teachers. In this case, the node in red is regarded
as a contextual outlier since it has a vast difference from
its neighbors. In the real world, datasets are much more
complicated than this toy example and it is difficult to measure
the degree of inconsistency among nodes.
arXiv:2210.12941v3 [cs.LG] 9 Jan 2023
There have been various methods proposed to solve UNOD
[9]. They can be roughly divided into two categories, namely,
non-deep and deep-learning-based methods. Non-deep meth-
ods usually leverage traditional machine learning methods
such as matrix factorization [10], density-based clustering
[11], and relational learning [12] to encode the graph informa-
tion and detect outliers. However, these methods fail to address
the computational challenge with high-dimension data [13].
With the rapid prevalence of Graph Neural Networks (GNNs)
[14], more and more methods are based on deep learning
[15] and GNNs nowadays [3]. For example, DOMINANT [4]
employs two GNN autoencoders to reconstruct the attribute
information and structure information. According to the results
reported in [9], most deep-learning-based methods have a
much better performance than non-deep methods in detecting
injected outliers. To unify the outlier injection process, [9]
adopts the outlier injection approach from [4] as the standard
injection approach.
Challenge. Although the recent deep-learning-based methods
have achieved an excellent performance in UNOD, we find
that the most widely-used outlier injection approach, which
is adopted by [4, 9, 16, 17, 18, 19, 20, 21, 22, 23, 24],
will cause a serious data leakage issue. Here, we refer to
the data leakage [25] in machine learning, which means the
information strongly associated with the labels is leaked to
the training dataset. After employing this approach to inject
outliers, structural outliers will have a larger node degree than
the average while attribute vectors of contextual outliers will
have a larger L2-norm (a.k.a. Euclidean norm) than expected.
As a result, a simple solution only utilizing node degree or
L2-norm of attribute vectors as the outlier score to detect the
corresponding type of outliers can acquire a quite satisfying
performance as shown in Fig 2. The metric of AUC [26] is
adopted here to measure the detection performance. Under
such a serious data leakage issue in injected datasets, most
existing algorithms cannot have a better performance than the
simple solution. In addition, it is observed that existing algo-
rithms have a performance drop in varied injection settings, in
which the data leakage issue caused by the current injection
approach is alleviated. Therefore, it is urgent to find out the
cause of data leakage and reduce its impact. On the other hand,
it is also necessary to exploit an effective UNOD algorithm,
which has a superior performance and is robust to the data
leakage issue. Moreover, the balance between structural and
contextual outliers detection performance is little considered
in existing works [9]. An algorithm with unbalanced detection
may only have detection ability for a certain type of outliers.
It is found that existing algorithms focus more on contextual
outliers than structural outliers when detecting them. To gain
more feasible algorithms, comprehensive metrics for balance
evaluation should be devised.
Our Solution. In this paper, we are devoted to analyzing the
cause of data leakage and devising a superior outlier detection
method for UNOD to achieve better performance in both the
current injection setting and varied injection settings.
Fig. 2. After injecting outliers in four datasets, node degree is employed
to detect structural outliers while L2-norm is employed to detect contextual
outliers. Both of them, compared to the random detector, achieve unexpectedly
high scores.
In particular, we propose a novel variance-based model
to detect structural outliers, which adopts the variance of
representations of neighbor nodes to detect structural outliers.
To the best of our knowledge, it is the first time to employ
the neighbor variance to detect outliers. Existing algorithms
are based on either reconstruction of adjacency matrix [4, 18]
or contrastive learning [16, 23] to detect structural outliers.
According to the definition, the essence of a structural outlier
is its inconsistent neighbors that come from different commu-
nities. However, existing algorithms cannot directly capture
this essence. In this case, we devise a deep graph model to
measure the inconsistency among neighbors by the variance
of latent representations of neighbors, which captures the
essence and gives a better utility for detection. On top of this,
a new framework, Variance-based Graph Outlier Detection
(VGOD), is also proposed to detect two types of outliers with
a variance-based model and an attribute reconstruction model.
To address the balance issue, we separately train two models
to avoid overtraining and normalize two types of outlier scores
to eliminate the scale difference. To evaluate the detection
balance on two outlier types, we introduce a new metric to
measure the gap in the performance score. The experiments are
conducted on 5 real-world datasets and the results demonstrate
that VGOD achieves the best detection performance.
Contribution. Our major contributions are as follows.
1) To the best of our knowledge, we are the first to identify
the data leakage issue in the most widely-used outlier
injection approach.
2) We analyze the cause of data leakage in depth, give
suggestions for the future design of the outlier injection
approach, and purpose a new approach for injecting
structural outliers.
3) We propose a novel variance-based model and a new
VGOD framework, which outperforms existing algo-
rithms in detecting outliers and alleviates the issue of
the balance of detection.
4) Extensive experiments are conducted to demonstrate that
our approach achieves the best detection performance
and the detection balance.
II. RELATED WORK
A. Graph Neural Network
GNNs [14] are a group of neural network models which
utilize the graph structure for network representation learning
and various tasks. Among GNNs, GCN [27] is one of the most
influential models, which extends the convolutional operation
in sequence or grid data to graph-structured data. Furthermore,
to aggregate messages from neighbors more flexibly, GAT [28]
introduces an attention mechanism to learn the importance
of each neighbor node. On the other hand, GraphSage [29]
adopts the sampling-based method to aggregate the neighbor
information to work in large-scale graphs. From a topological
learning perspective, GIN [30] is a more expressive model
than GCN and can achieve the same discriminative power
as the 1-WL graph isomorphism test [31]. In our proposed
framework, GNN plays a vital role in the network embedding
representation of nodes. Generally, the GNN module in our
framework can be set to any type of the above-mentioned
GNNs.
B. Unsupervised Node Outlier Detection on Attributed Net-
works
UNOD on attributed networks has attracted considerable
research interest in recent years due to its wide application in
complex systems. Radar [32] utilizes the residuals of attribute
information and its coherence graph structure for outlier de-
tection. ANOMALOUS [33] conducts attribute selection and
outlier detection jointly based on CUR decomposition and
residual analysis. However, these methods have computation
limitations in high-dimension attributes due to their shallow
mechanisms.
Quite a few studies based on the deep-learning technique
have emerged recently [3]. Dominant [4] builds deep autoen-
coders on top of GCN layers to reconstruct the adjacency
and attribute matrices. AnomalyDAE [18] employs dual au-
toencoder architecture with cross-modality interactions and
the attention mechanism to reconstruct the adjacency and
attribute matrices. CoLA [16] and SL-GAD [23] perform
the UNOD task via contrastive self-supervised learning and
random walk to embed nodes. AEGIS [17] studies UNOD in
the inductive setting by utilizing generative adversarial ideas to
generate potential outliers. DONE [34] employs deep unsuper-
vised autoencoders to generate the network embedding which
eliminates the effects of outliers at the same time. CONAD
[19] adopts four data augmentation strategies and contrastive
learning for outlier detection. GUIDE [21] replaces adjacency
reconstruction with higher-order structure reconstruction to
detect structural outliers. Under the manner of outlier injection,
all these above deep methods show superior performance than
non-deep methods in detecting these two types of outliers. To
evaluate UNOD algorithms, [9] adopts the most widely-used
outlier injection approach from [4] as the standard injection
method and provides unified benchmarks for UNOD, which
facilitates fairness for comparing different methods.
Current UNOD methods have achieved an excellent per-
formance. However, as demonstrated in Fig 2, the widely-
used outlier injection approach exists a data leakage issue.
To our surprise, simply using the combination of L2-norm
and node degree to detect outliers can achieve state-of-the-
art performance. Therefore, our work focuses on analyzing
the cause of the data leakage issue and designing a superior
method. In addition, as mentioned in [9] that no current
method has a balanced detection performance on two outlier
types, we also consider the balance issue in our method.
III. PRELIMINARY
In this section, we formally present some concepts which
are used throughout this paper and define the problem. We
use lowercase letters (e.g. a), bold lowercase letters (e.g. x),
uppercase letters (e.g. X), and calligraphic fonts (e.g. V) to
denote scalars, vectors, matrices, and sets, respectively.
A. Graph Neural Network
GNNs stack L layers of message-passing layers. Each layer
performs a message passing through the given graph structure.
After the initial node feature h0Rd0is transformed by L
layers, the vector representation hLRdLis learned for each
node v. Most message-passing layers can be expressed using
the following rule:
h(l)
v=σ(l)(AGG({Φ(l)(h(l1)
u), u ∈ Nv∪ {v}}))) (1)
where σ(·)is the active function, Ψ(l)(·)and Φ(l)(·)de-
note differentiable functions such as Multi-Layer Perceptrons
(MLP). AGG(·)denotes a differentiable, permutation invariant
function (e.g. sum, mean, max) and Nvdenotes node vs
direct linked neighbors.
Here, we introduce three commonly used GNNs, namely
GCN, GAT, and GIN.
Graph Convolution Network (GCN) [27] is the most
widely-used GNN module, which adopts the propagation rule:
H(l+1) =σ(ˆ
AH(l)W(l))(2)
where ˆ
Ais the symmetric normalized adjacency matrix, H(l)
is the lth hidden layer node representation, and W(l)is the
parameters in the lth hidden layer.
Graph Attention Network (GAT) [28] flexibly aggregates
messages from neighbors with calculated weight αij (vs.
average weight adopted by GCN) of each edge hi, jias
αij =exp(LeakyReLU (aT[Whi||Whj]))
Pk∈Niexp(LeakyReLU (aT[Whi||Whk])) (3)
where aand Ware the learnable weights. Layer mask (l)is
omitted for simplicity.
Graph Isomorphism Network (GIN) [30] is the expres-
sively more powerful GNN model, which follows the rule to
propagate messages as
H(l)=σ(l)(A+ (1 + )·I)H(l1))(4)
where can be a fixed or learnable scalar parameter, and Iand
Ais the identity matrix and adjacency matrix, respectively.
摘要:

UnsupervisedGraphOutlierDetection:ProblemRevisit,NewInsight,andSuperiorMethodYihongHuangy,LipingWangy,FanZhangz,XueminLinxyEastChinaNormalUniversity,zGuangzhouUniversity,xShanghaiJiaotongUniversityhyh957947142@gmail.com,lipingwang@sei.ecnu.edu.cnzhangf@gzhu.edu.cn,lxue@cse.unsw.edu.auAbstract—Alarge...

展开>> 收起<<
Unsupervised Graph Outlier Detection Problem Revisit New Insight and Superior Method Yihong Huangy Liping Wangy Fan Zhangz Xuemin Linx.pdf

共15页,预览3页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:15 页 大小:1.78MB 格式:PDF 时间:2025-05-06

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 15
客服
关注