However, they fail to utilize the knowledge carried by even a limited number of labeled anomalies, which to some extent sacrifices the capability to distinguish between normal and anomalous objects. Most of these unsupervised methods are built on autoencoders [13] and rest on the assumption that minimizing the error between the original data and its reconstruction can separate unusual items from normal ones in a new low-dimensional feature space. Others study graph anomaly detection assisted by labeled nodes [14], [15], but they do not address the class imbalance issue, incurring subpar anomaly detection performance.
In pursuit of better anomaly detection performance, we develop a novel Data Augmentation-based Graph Anomaly Detection framework called DAGAD, with three specially designed modules working in tandem to address the above two issues. DAGAD organizes these modules in a consolidated manner, summarized as follows: 1) an information fusion module encodes node attributes and graph topology into low-dimensional vectors, a.k.a. node embeddings/representations, to represent the fused features of nodes in a unified way; 2) a data augmentation module enriches the training set by generating additional training samples from original nodes based on their representations, which alleviates the scarcity of anomalous samples, as shown in Fig. 1; and 3) an imbalance-tailored learning module introduces a class-wise loss function to alleviate the class imbalance issue. Taking advantage of graph neural networks (GNNs) in attributed graph learning [16], [17], DAGAD integrates the above modules into a GNN-aided learning framework that acquires an effective graph anomaly detector by extracting discriminative representations for anomalous and normal nodes. Most importantly, DAGAD is designed to make the most of a very limited number of labeled samples to distinguish anomalies.
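To make the imbalance-tailored learning module more concrete, the following minimal sketch illustrates one plausible form of a class-wise loss, assuming a per-class averaged cross-entropy computed over the labeled nodes; the exact loss adopted by DAGAD may differ.

    import torch
    import torch.nn.functional as F

    def class_wise_loss(logits, labels):
        # Average the cross-entropy within each class before summing, so the
        # abundant normal class cannot dominate the gradient signal.
        per_node = F.cross_entropy(logits, labels, reduction="none")
        loss = 0.0
        for c in torch.unique(labels):
            loss = loss + per_node[labels == c].mean()
        return loss

    # Usage (hypothetical names): logits = gnn(x, edge_index)[train_mask]
    # loss = class_wise_loss(logits, y[train_mask])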
Contributions. This paper makes the following contributions to graph anomaly detection:
•The investigated graph data augmentation technique generates additional samples derived from the original training set in the embedding space (see the sketch after this list). Augmented samples, together with original samples, are leveraged by two classifiers in a complementary manner to learn discriminative representations for the anomalous and normal classes.
•The representation-based data augmentation module in
our framework provides a comprehensive solution to
the scarcity of anomalous training samples in anomaly
detection. This module is also extendable to other graph
learning tasks that rely on learning features from a very
limited number of labeled instances.
•A simple but effective imbalance-tailored learning module is employed to alleviate the class imbalance problem by utilizing a specially designed class-wise loss, which can be easily integrated into other semi-supervised graph anomaly detectors.
•Extensive experiments on three datasets, together with an ablation study, demonstrate DAGAD's superiority and the effectiveness of the proposed modules under diverse evaluation criteria.
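As a concrete illustration of generating samples in the embedding space (referenced in the first bullet above), the sketch below assumes a simple mixup-style interpolation between pairs of labeled anomaly embeddings; the names anomaly_label, num_new, and alpha are illustrative assumptions, and DAGAD's actual augmentation scheme may differ from this sketch.

    import torch

    def augment_in_embedding_space(z, y, anomaly_label=1, num_new=64, alpha=0.5):
        # Interpolate between random pairs of labeled anomaly embeddings
        # (mixup-style) and append the new samples to the training set.
        anom = torch.nonzero(y == anomaly_label, as_tuple=True)[0]
        i = anom[torch.randint(0, anom.numel(), (num_new,))]
        j = anom[torch.randint(0, anom.numel(), (num_new,))]
        lam = torch.distributions.Beta(alpha, alpha).sample((num_new, 1))
        z_new = lam * z[i] + (1.0 - lam) * z[j]   # stays close to the anomaly class
        y_new = torch.full((num_new,), anomaly_label, dtype=y.dtype)
        return torch.cat([z, z_new]), torch.cat([y, y_new])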
II. RELATED WORK
This paper focuses on the anomalous node detection prob-
lem, which aims to identify the nodes that significantly deviate
from others in the graph. For completeness, we investigate
recent studies on graph anomaly detection as well as data
augmentation and class-imbalanced learning.
A. Graph Anomaly Detection
To date, various graph anomaly detection studies have been
conducted to identify potential anomalies (e.g., fraudsters and
network intruders) in real-world networks [4], [8]. These studies explore the graph topology and non-structured node features from different perspectives to fuse node patterns, and then identify anomalies whose patterns deviate from the rest. Due to the advancement of deep graph data
representation, especially graph neural networks [16], and
their efficacy in graph analysis, uncovering graph anomalies
with deep learning techniques has been extensively studied in
contemporary works [18]–[21]. Unlike conventional machine
learning-based graph anomaly detection techniques that rely
heavily on expert knowledge and human-recognized statistical
features [22], [23], deep learning-based detectors deliver superior performance in a wide range of applications, from finance to network security.
Most deep learning-based graph techniques stem from the
motivation to encode the rich graph data into high-level node
representations [17]. Graph anomalies can then be identified
in an unsupervised manner by assigning anomaly scores according to the reconstruction loss incurred by each node [19], [21], [24], [25] or its distance to the majority of nodes [20], or through semi-supervised/supervised learning [26], [27] that trains deep classifiers. This line of research relies heavily on the
informativeness of node representations, and advanced graph
neural network models such as GCN [28], GAT [29], and
GraphSAGE [30], are therefore widely adopted for extracting
node representations. However, most existing works fail to fully capitalize on the very limited number of anomalies in the training set, and the majority of them follow the assumption that anomalies manifest themselves through large reconstruction errors in an unsupervised setting. Even though there are a few works under semi-supervised learning settings [14], [15], it is difficult for them to effectively confront the scarcity of labeled anomalies and the class imbalance associated with anomaly detection. Further efforts to bridge these gaps are in great demand for better anomaly detection solutions.
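For reference, the principle behind reconstruction-error-based scoring can be sketched as follows, assuming a plain attribute autoencoder for illustration; the cited detectors typically use GNN encoders and often reconstruct the graph structure as well.

    import torch
    import torch.nn as nn

    class AttributeAE(nn.Module):
        # Minimal attribute autoencoder (no graph convolution, illustration only).
        def __init__(self, in_dim, hid_dim=64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
            self.decoder = nn.Linear(hid_dim, in_dim)

        def forward(self, x):
            return self.decoder(self.encoder(x))

    def anomaly_scores(model, x):
        # Per-node score = attribute reconstruction error; nodes with the
        # largest errors are flagged as anomalies.
        with torch.no_grad():
            x_hat = model(x)
        return ((x - x_hat) ** 2).mean(dim=1)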
B. Data Augmentation
Data augmentation aims to increase the quantity of training data by either slightly modifying original samples or generating synthetic instances from them [31]. Fields ranging from natural language processing [32] to computer vision [33] have been shown to benefit from data augmentation. Hence, data augmentation can serve as an effective tool to alleviate the lack of anomalous samples.