However, they fail to utilize the knowledge carried by even a limited number of labeled anomalies, which to some extent sacrifices the capability to distinguish between normal and anomalous objects. Most of these unsupervised methods are built on autoencoders [13] and rest on the assumption that minimizing the error between the original data and its reconstruction can separate unusual items from normal ones in a new low-dimensional feature space. Others study graph anomaly detection assisted by labeled nodes [14], [15], but they do not address the class imbalance issue, incurring subpar anomaly detection performance.
In pursuit of better anomaly detection performance, we develop a novel Data Augmentation-based Graph Anomaly Detection framework called DAGAD, with three specially designed modules working in tandem to address the above two issues. DAGAD organizes these modules in a consolidated manner, summarized as follows: 1) an information fusion module encodes node attributes and graph topology into low-dimensional vectors, a.k.a. node embeddings/representations, to represent the fused features of nodes in a unified way; 2) a data augmentation module enriches the training set by generating additional training samples from original nodes based on their representations, which alleviates the scarcity of anomalous samples, as shown in Fig. 1; and 3) an imbalance-tailored learning module introduces a class-wise loss function to alleviate the class imbalance issue. Taking advantage of graph neural networks (GNNs) in attributed graph learning [16], [17], DAGAD integrates the above modules into a GNN-aided learning framework that acquires an effective graph anomaly detector by extracting discriminative representations for anomalous and normal nodes. Most importantly, DAGAD is designed to make the most of a very limited number of labeled samples to distinguish anomalies.
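To make the imbalance-tailored learning module more concrete, the following minimal sketch illustrates one plausible form of a class-wise loss, assuming a per-class averaged cross-entropy computed over the labeled nodes; the exact loss adopted by DAGAD may differ.

    import torch
    import torch.nn.functional as F

    def class_wise_loss(logits, labels):
        # Average the cross-entropy within each class before summing, so the
        # abundant normal class cannot dominate the gradient signal.
        per_node = F.cross_entropy(logits, labels, reduction="none")
        loss = 0.0
        for c in torch.unique(labels):
            loss = loss + per_node[labels == c].mean()
        return loss

    # Usage (hypothetical names): logits = gnn(x, edge_index)[train_mask]
    # loss = class_wise_loss(logits, y[train_mask])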
Contributions. This paper makes the following contributions to graph anomaly detection:
•The investigated graph data augmentation technique generates additional samples derived from the original training set in the embedding space (see the sketch after this list). Augmented samples, together with original samples, are leveraged by two classifiers in a complementary manner to learn discriminative representations for the anomalous and normal classes.
•The representation-based data augmentation module in
our framework provides a comprehensive solution to
the scarcity of anomalous training samples in anomaly
detection. This module is also extendable to other graph
learning tasks that rely on learning features from a very
limited number of labeled instances.
•A simple but effective imbalance-tailored learning module is employed to alleviate the class imbalance problem by utilizing a specially designed class-wise loss, which can be easily integrated into other semi-supervised graph anomaly detectors.
•Extensive experiments on three datasets, together with an ablation study, demonstrate DAGAD's superiority and the effectiveness of the proposed modules under diverse evaluation criteria.
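As a concrete illustration of generating samples in the embedding space (referenced in the first bullet above), the sketch below assumes a simple mixup-style interpolation between pairs of labeled anomaly embeddings; the names anomaly_label, num_new, and alpha are illustrative assumptions, and DAGAD's actual augmentation scheme may differ from this sketch.

    import torch

    def augment_in_embedding_space(z, y, anomaly_label=1, num_new=64, alpha=0.5):
        # Interpolate between random pairs of labeled anomaly embeddings
        # (mixup-style) and append the new samples to the training set.
        anom = torch.nonzero(y == anomaly_label, as_tuple=True)[0]
        i = anom[torch.randint(0, anom.numel(), (num_new,))]
        j = anom[torch.randint(0, anom.numel(), (num_new,))]
        lam = torch.distributions.Beta(alpha, alpha).sample((num_new, 1))
        z_new = lam * z[i] + (1.0 - lam) * z[j]   # stays close to the anomaly class
        y_new = torch.full((num_new,), anomaly_label, dtype=y.dtype)
        return torch.cat([z, z_new]), torch.cat([y, y_new])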
II. RELATED WORK
This paper focuses on the anomalous node detection prob-
lem, which aims to identify the nodes that significantly deviate
from others in the graph. For completeness, we investigate
recent studies on graph anomaly detection as well as data
augmentation and class-imbalanced learning.
A. Graph Anomaly Detection
To date, various graph anomaly detection studies have been
conducted to identify potential anomalies (e.g., fraudsters and
network intruders) in real-world networks [4], [8]. These studies explore the graph topology and non-structured node features from different perspectives to fuse node patterns, and then identify anomalies whose patterns deviate from the rest. Due to the advancement of deep graph data
representation, especially graph neural networks [16], and
their efficacy in graph analysis, uncovering graph anomalies
with deep learning techniques has been extensively studied in
contemporary works [18]–[21]. Unlike conventional machine
learning-based graph anomaly detection techniques that rely
heavily on expert knowledge and human-recognized statistical
features [22], [23], deep learning-based detectors deliver superior performance in a wide range of applications, from finance to network security.
Most deep learning-based graph techniques stem from the
motivation to encode the rich graph data into high-level node
representations [17]. Graph anomalies can then be identified
in an unsupervised manner by assigning anomaly scores according to the reconstruction loss incurred by each node [19], [21], [24], [25] or its distance to the majority of nodes [20], or through semi-supervised/supervised learning [26], [27] that trains deep classifiers. This line of research relies heavily on the
informativeness of node representations, and advanced graph
neural network models such as GCN [28], GAT [29], and
GraphSAGE [30], are therefore widely adopted for extracting
node representations. However, most existing works fail to fully capitalize on the very limited number of anomalies in the training set, and the majority of them follow the assumption that anomalies manifest themselves through large reconstruction errors in an unsupervised setting. Even though there are a few works under semi-supervised learning settings [14], [15], it is difficult for them to effectively confront the scarcity of labeled anomalies and the class imbalance associated with anomaly detection. Further efforts to bridge these gaps are in great demand for better anomaly detection solutions.
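For reference, the principle behind reconstruction-error-based scoring can be sketched as follows, assuming a plain attribute autoencoder for illustration; the cited detectors typically use GNN encoders and often reconstruct the graph structure as well.

    import torch
    import torch.nn as nn

    class AttributeAE(nn.Module):
        # Minimal attribute autoencoder (no graph convolution, illustration only).
        def __init__(self, in_dim, hid_dim=64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
            self.decoder = nn.Linear(hid_dim, in_dim)

        def forward(self, x):
            return self.decoder(self.encoder(x))

    def anomaly_scores(model, x):
        # Per-node score = attribute reconstruction error; nodes with the
        # largest errors are flagged as anomalies.
        with torch.no_grad():
            x_hat = model(x)
        return ((x - x_hat) ** 2).mean(dim=1)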
B. Data Augmentation
Data augmentation aims to increase the quantity of training data by either slightly modifying original samples or generating synthetic instances from them [31]. Fields ranging from natural language processing [32] to computer vision [33] have been shown to benefit from data augmentation. Hence, data augmentation can serve as an effective tool to alleviate the lack of anomalous samples.