DAGAD: Data Augmentation for Graph Anomaly
Detection
Fanzhen Liu*, Xiaoxiao Ma*, Jia Wu, Jian Yang, Shan Xue, Amin Beheshti, Chuan Zhou,
Hao Peng§, Quan Z. Sheng, and Charu C. Aggarwal
School of Computing, Macquarie University, Sydney, Australia
School of Computing and Information Technology, University of Wollongong, Wollongong, Australia
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
§Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, China
IBM T. J. Watson Research Center, Yorktown, NY, USA
{fanzhen.liu, xiaoxiao.ma2}@hdr.mq.edu.au, {jia.wu, jian.yang, amin.beheshti, michael.sheng}@mq.edu.au,
sxue@uow.edu.au, zhouchuan@amss.ac.cn, penghao@buaa.edu.cn, charu@us.ibm.com
Abstract—Graph anomaly detection in this paper aims to
distinguish abnormal nodes that behave differently from the
benign ones, which account for the majority of graph-structured
instances. Although the task is receiving increasing attention from
both academia and industry, existing research still suffers from
two critical issues when learning informative anomalous behavior
from graph data. First, anomalies are usually hard
to capture because of their subtle abnormal behavior and the
shortage of background knowledge about them, which causes
severe anomalous sample scarcity. Second, the overwhelming
majority of objects in real-world graphs are normal, which brings
the class imbalance problem as well. To bridge these gaps, this
paper devises a novel Data Augmentation-based Graph Anomaly
Detection (DAGAD) framework for attributed graphs, equipped
with three specially designed modules: 1) an information fusion
module employing graph neural network encoders to learn
representations, 2) a graph data augmentation module that enriches
the training set with generated samples, and 3) an imbalance-tailored
learning module to discriminate the distributions of the
minority (anomalous) and majority (normal) classes. A series of
experiments on three datasets shows that DAGAD outperforms
ten state-of-the-art baseline detectors on various commonly
used metrics, and an extensive ablation study validates
the strength of the proposed modules.
Index Terms—Anomaly detection, graph mining, data augmen-
tation, anomalous sample scarcity, class imbalance, graph neural
networks, semi-supervised learning
I. INTRODUCTION
Anomalies appear as objects that deviate from other ref-
erence members [1], [2]. In various real-world scenarios,
they could be fake news [3], telecommunication fraudsters
[4], and spammers [5], which bring serious security and
economic problems to our society. Benefiting from the
power of graph modeling to characterize complicated inter-
actions/relationships as connections among real-world objects
[6], [7], graph anomaly detection demonstrates its advantages
in exposing anomalies by means of graph mining techniques
[4], [8], providing a comprehensive solution to dealing with
complex graph-structured data. In this way, real-world anomalies
can be depicted as anomalous nodes representing single
objects like fraudsters [9], anomalous edges denoting interactions
like illegal transactions [10], and abnormal subgraphs
revealing groups of interconnected malevolent objects, such
as fraud groups [11]. This work concentrates on detecting
anomalous nodes, which appear most frequently in real scenarios.
[Figure 1 panel labels: Original Samples, Embedding, Data Augmentation, Augmented Samples, Graph anomaly detector; legend: Normal, Anomalous.]
Fig. 1. A toy example of data augmentation for graph anomaly detection. With
very few labeled samples, a graph anomaly detector that only exploits
original data misidentifies some anomalies that are not easily exposed. Carefully
augmenting training samples based on node embeddings/representations
can complement the information in the original samples to learn more
effective graph anomaly detectors.
*Equal contribution: Fanzhen Liu and Xiaoxiao Ma.
Existing studies on graph anomaly detection have made
efforts to discover anomalous objects using graph
topological information and rich features, but they remain, to some
extent, vulnerable to two issues inherent in the data:
anomalous sample scarcity and class imbalance. First,
real-world anomalies are not easy to
observe. For instance, around 90% of victims in e-commerce
scenarios did not report through payment platforms like Alipay
(www.alipay.com), so only a small number of anomalies can
be captured [11]. Second, anomalous objects are far
less numerous than benign ones [12]. As a result,
graph anomaly detection faces a severely skewed
distribution of anomalies versus benign nodes.
arXiv:2210.09766v1 [cs.LG] 18 Oct 2022
Unsupervised detectors fail to utilize the knowledge of even a
limited number of anomalies, which to some extent sacrifices
the capability to distinguish between normal and anomalous
objects. Most of these unsupervised methods are built on
autoencoders [13] and rest on the assumption that minimizing
the reconstruction error separates
unusual items from normal ones in a new
low-dimensional feature space. Others study graph anomaly
detection assisted by labeled nodes [14], [15], but they do not
address the class imbalance issue, which leads to subpar anomaly
detection performance.
In pursuit of better performance in anomaly detection, we
develop a novel Data Augmentation-based Graph Anomaly
Detection framework called DAGAD, with three specially
designed modules that work in tandem to address the
above two issues: 1) an information
fusion module encodes node attributes and graph topology
information into low-dimensional vectors, a.k.a. node
embeddings/representations, to represent fused features on nodes in
a unified way; 2) a data augmentation module enriches the
training set by generating additional training samples from
original nodes based on their representations, which alleviates
anomalous sample scarcity, as shown in
Fig. 1; and 3) an imbalance-tailored learning module introduces
a class-wise loss function to alleviate the class imbalance
issue. Taking advantage of graph neural networks (GNNs)
in attributed graph learning [16], [17], DAGAD integrates
the above modules into a GNN-aided learning framework
to acquire an effective graph anomaly detector by extracting
discriminative representations for anomalies and normal nodes.
Most importantly, DAGAD is designed to make the most of
a very limited amount of labeled data to distinguish
anomalies.
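To make the idea of a class-wise loss concrete, the following is a minimal sketch and not DAGAD's actual loss, whose exact form is not given in this section. It assumes a binary setting and averages the cross-entropy within each class before averaging across classes, so the few labeled anomalies weigh as much as the many normal nodes. The function names `class_balanced_cross_entropy` and `plain_cross_entropy` and the toy probabilities are illustrative assumptions.

```python
import math


def plain_cross_entropy(probs, labels):
    """Standard binary cross-entropy averaged over all labeled nodes.

    probs  -- predicted probability of the anomalous class per node
    labels -- 1 for anomalous, 0 for normal
    """
    eps = 1e-12  # guards against log(0)
    losses = [
        -math.log(max(p if y == 1 else 1.0 - p, eps))
        for p, y in zip(probs, labels)
    ]
    return sum(losses) / len(losses)


def class_balanced_cross_entropy(probs, labels):
    """Hypothetical class-wise loss: average per class, then across classes."""
    eps = 1e-12
    per_class_losses = {0: [], 1: []}
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        per_class_losses[y].append(-math.log(max(p_true, eps)))
    # Averaging inside each class first gives both classes equal weight,
    # regardless of how many samples each one has.
    class_means = [sum(v) / len(v) for v in per_class_losses.values() if v]
    return sum(class_means) / len(class_means)


# Toy input: nine normal nodes predicted well, one anomaly predicted badly.
probs = [0.1] * 9 + [0.2]
labels = [0] * 9 + [1]
plain = plain_cross_entropy(probs, labels)
balanced = class_balanced_cross_entropy(probs, labels)
```

On this toy input the class-wise average is larger than the plain average, because the single poorly predicted anomaly is no longer diluted by the nine well-predicted normal nodes.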
Contributions. This paper contributes to graph anomaly detection
in the following ways:
• The investigated graph data augmentation technique
generates additional samples derived from the original
training set in the embedding space. Augmented samples
together with original samples are leveraged by two classifiers
in a complementary manner to learn discriminative
representations for the anomalous and normal classes.
• The representation-based data augmentation module in
our framework provides a comprehensive solution to
the scarcity of anomalous training samples in anomaly
detection. This module is also extendable to other graph
learning tasks that rely on learning features from a very
limited number of labeled instances.
• A simple but effective imbalance-tailored learning module
is employed to alleviate class
imbalance by utilizing a specially designed class-wise
loss, which can be easily integrated into other
semi-supervised graph anomaly detectors.
• Extensive experiments on three datasets as well as an
ablation study demonstrate DAGAD's superiority and the proposed
modules' effectiveness under diverse evaluation criteria.
II. RELATED WORK
This paper focuses on the anomalous node detection prob-
lem, which aims to identify the nodes that significantly deviate
from others in the graph. For completeness, we investigate
recent studies on graph anomaly detection as well as data
augmentation and class-imbalanced learning.
A. Graph Anomaly Detection
To date, various graph anomaly detection studies have been
conducted to identify potential anomalies (e.g., fraudsters and
network intruders) in real-world networks [4], [8]. These
studies explore the graph topology or non-structured node
features from different perspectives to characterize the patterns
of nodes and then identify anomalies that exhibit different
patterns. Owing to advances in deep graph data
representation, especially graph neural networks [16], and
their efficacy in graph analysis, uncovering graph anomalies
with deep learning techniques has been extensively studied in
contemporary works [18]–[21]. Unlike conventional machine
learning-based graph anomaly detection techniques, which rely
heavily on expert knowledge and human-recognized statistical
features [22], [23], deep learning-based detectors deliver superior
performance in a wide range of applications, from finance
to network security.
Most deep learning-based graph techniques stem from the
motivation to encode rich graph data into high-level node
representations [17]. Graph anomalies can then be identified
in an unsupervised manner by assigning anomaly scores based
on the reconstruction loss introduced by each node [19],
[21], [24], [25] or on the distance to the majority of nodes [20], or
through semi-supervised/supervised learning [26], [27] to train
deep classifiers. This line of research relies heavily on the
informativeness of node representations, and advanced graph
neural network models such as GCN [28], GAT [29], and
GraphSAGE [30] are therefore widely adopted for extracting
node representations. However, existing works largely fail to
capitalize on the very limited number of anomalies in
the training set, and the majority of them follow the
assumption that anomalies manifest themselves in reconstruction
error in an unsupervised setting. Even though a few
works adopt semi-supervised learning settings [14], [15], it is
difficult for them to effectively confront the challenges of the
scarcity of labeled anomalies and class imbalance associated
with anomaly detection. Further efforts to bridge these gaps
are in great demand for better anomaly detection solutions.
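The reconstruction-based scoring described above can be illustrated with a small sketch. It is not any cited detector: the autoencoder is replaced by a deliberately simple stand-in in which each node's attribute vector is reconstructed as the mean of its neighbors' attributes, and the anomaly score is the Euclidean reconstruction error. The function name `reconstruction_scores` and the toy graph are assumptions made for illustration.

```python
import math


def reconstruction_scores(adjacency, features):
    """Score each node by how poorly its neighbors' mean reconstructs it.

    adjacency -- dict mapping node id to a list of neighbor ids
    features  -- dict mapping node id to a list of attribute values
    """
    scores = {}
    for node, neighbors in adjacency.items():
        x = features[node]
        if not neighbors:
            scores[node] = 0.0  # isolated node: nothing to reconstruct from
            continue
        # Stand-in "decoder": mean attribute vector of the neighbors.
        recon = [
            sum(features[n][d] for n in neighbors) / len(neighbors)
            for d in range(len(x))
        ]
        scores[node] = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, recon)))
    return scores


# Toy attributed graph: node 3 has attributes far from those of its neighbors.
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
features = {0: [1.0, 1.0], 1: [1.1, 0.9], 2: [0.9, 1.0], 3: [5.0, -3.0]}
scores = reconstruction_scores(adjacency, features)
ranked = sorted(scores, key=scores.get, reverse=True)  # most anomalous first
```

Node 3 receives the highest score here, but its normal neighbors (nodes 0 and 2) also score higher than node 1 because the outlier pollutes their reconstructions. No labels are used at any point, which is the limitation of purely unsupervised scoring that the text points out.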
B. Data Augmentation
Data augmentation aims at increasing the quantity of training data
by either slightly modifying original data
or generating synthetic instances from original data [31]. Fields
ranging from natural language
processing [32] to computer vision [33] have been shown to benefit
from data augmentation. Hence, data augmentation can serve as
an effective tool to alleviate the lack of anomalous samples.
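As one hedged illustration of "generating synthetic instances from original data", the sketch below creates new minority-class samples by linear interpolation between randomly chosen pairs of existing ones. This is a generic interpolation scheme chosen for brevity and should not be read as the augmentation used by DAGAD. The function name `interpolate_minority`, the seed, and the toy embeddings are assumptions.

```python
import random


def interpolate_minority(samples, n_new, seed=0):
    """Generate n_new synthetic vectors, each a convex combination of two
    distinct original samples."""
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)  # two different original samples
        lam = rng.random()             # mixing coefficient in [0, 1)
        synthetic.append([lam * x + (1.0 - lam) * y for x, y in zip(a, b)])
    return synthetic


# Toy embeddings of three labeled anomalies (illustrative values only).
anomalous_embeddings = [[2.0, 0.5], [2.4, 0.1], [1.8, 0.9]]
new_samples = interpolate_minority(anomalous_embeddings, n_new=5)
augmented = anomalous_embeddings + new_samples
```

Every synthetic vector lies on a line segment between two real anomalies, so the augmented set stays inside the region the original samples already cover. This makes the scheme conservative: it adds samples without extrapolating beyond observed anomalous behavior.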