
heavy computation burdens. Some of the calculations are even redundant, since not all neighbors
are informative for learning task-related embeddings. Unlike regular data such as images or texts, the
connectivity of irregular graph data causes random memory access, which further slows down data
readout. Secondly, class imbalance naturally exists in datasets from diverse practical domains, such
as bioinformatics and social networks. GNNs are sensitive to this imbalance and can be biased toward
the dominant classes. This bias may mislead GNNs' learning process, causing the model to underfit
the samples that truly matter for the downstream tasks and, as a result, to perform poorly on the
test data.
Figure 1: The principle of graph decantation. It decants data samples based on rankings of their gradient scores, and then uses them as the training set in the next epoch.
Accordingly, recent studies [3, 44, 30] have arisen to address the issues of massiveness or class
imbalance in graph data. To tackle the massiveness issue, [7, 2] explore efficient data sampling
policies to reduce the computational cost from the data perspective. From the model improvement
perspective, some approaches design quantization-aware training and low-precision inference
methods to reduce GNNs' operating costs. For example, GLT [3] applies the lottery ticket
technique [9] to simplify the graph data and the GNN model simultaneously. To deal with the
imbalance issue in node classification on graphs, GraphSMOTE [44] tries to generate new nodes for
the minority classes to balance the training data. Improving upon GraphSMOTE, GraphENS [30]
further proposes a new augmentation method that constructs an ego network to learn the
representations of the minority classes. Despite the progress made so far, existing methods fail to
tackle the two issues altogether. Furthermore, while one of the issues is being handled, extra
computation costs are introduced at the same time. For example, the rewind steps in GLT [3], which
search for lottery subnets and subsets, heavily increase the computation cost, even though the final
lottery tickets are lightweight. The newly synthesized nodes in GraphSMOTE [44, 1] and
GraphENS [30], although they help alleviate the data imbalance, bring extra computational burdens
to the subsequent training process.
Regarding the issues above, we make several observations. Firstly, we notice that a sparse pruned
GNN easily "forgets" the minority samples when trained with class-imbalanced graph data, as it
yields worse performance on the minority classes compared with the original GNN [17]. To
investigate the cause of this observation, we study how each graph sample affects the update of the
model parameters by taking a closer look at the gradients it contributes. Specifically, at early
training stages, we find that a small subset of the samples provides the most informative supervisory
signals, as reflected by their gradient norms. One hypothesis we make is that the training
effectiveness of the full training set can be approximated, to some extent, by that of this subset. We
further hypothesize that the quality of this approximation is governed by the distance between the
gradients of the subset and those of the full dataset.
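As an illustration (not the paper's implementation), the following sketch scores each training sample by the norm of the gradient it induces on the model parameters and ranks samples by that score; the `model`, `loss_fn`, and per-sample batching are assumptions made purely for exposition.

```python
import torch

def gradient_scores(model, loss_fn, samples):
    """Score each sample by the L2 norm of the gradient it induces.

    `samples` is an iterable of (input, target) pairs; `model` is any
    torch.nn.Module (e.g., a sparse GNN). Larger scores indicate more
    informative supervisory signals.
    """
    scores = []
    for x, y in samples:
        model.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        # Accumulate squared gradient norms over all parameters.
        sq_norm = sum(
            (p.grad.detach() ** 2).sum()
            for p in model.parameters()
            if p.grad is not None
        )
        scores.append(float(sq_norm) ** 0.5)
    # Rank samples from most to least informative.
    ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
    return scores, ranking
```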
Based on the above observations and hypotheses, we propose a novel method called Graph
Decantation (GraphDec) to explore dynamic sparsity training from both the model and data aspects.
The principle behind our method is illustrated in Figure 1. Given that informative samples produce
higher gradient values/scores when trained with a sparse GNN, our method draws on contrastive
self-supervised learning, whose backbone can be modified by dynamically pruning one branch to
improve its ability to identify minority samples in a class-imbalanced dataset. In particular, we
design a new contrastive backbone with a sparse GNN and enable the model to identify informative
samples in a self-supervised manner. To the best of our knowledge, other learning processes (e.g.,
graph auto-encoders, supervised learning) are either unable to identify informative samples or
incapable of learning in a self-supervised manner. Accordingly, the proposed framework scores the
samples in the current training set and keeps only the k most informative samples as the training set
for the next epoch. Considering that a sample that is currently unimportant will not necessarily
remain unimportant, we further design a data recycling process that randomly recycles previously
discarded data samples (samples considered unimportant in earlier training epochs) and adds them
back to the current informative sparse subset for reuse. The dynamically updated informative subset
helps the sparse GNN learn more balanced representations.
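To make the decant-and-recycle idea concrete, the sketch below keeps the k highest-scoring samples and randomly returns a fraction of previously discarded samples to the training pool. It reuses the `gradient_scores` helper from above; the helper names and the `recycle_ratio` parameter are illustrative assumptions, not the paper's exact algorithm.

```python
import random

def decant_and_recycle(model, loss_fn, train_set, discarded, k, recycle_ratio=0.1):
    """One epoch of data decantation (a sketch, not the paper's exact procedure).

    Keeps the k highest-scoring samples for the next epoch and randomly
    recycles a fraction of previously discarded samples back into the
    training set, since a sample that is unimportant now may become
    informative later.
    """
    _, ranking = gradient_scores(model, loss_fn, train_set)
    kept = [train_set[i] for i in ranking[:k]]
    pool = list(discarded) + [train_set[i] for i in ranking[k:]]

    # Randomly recycle a small portion of earlier-discarded samples.
    n_recycle = int(recycle_ratio * len(pool))
    recycle_idx = set(random.sample(range(len(pool)), n_recycle))
    recycled = [s for i, s in enumerate(pool) if i in recycle_idx]
    remaining = [s for i, s in enumerate(pool) if i not in recycle_idx]

    # kept + recycled form the training set for the next epoch.
    return kept + recycled, remaining
```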
To summarize, our major contributions in this work are: