task is universally the same prior across different graph tasks or datasets, which in fact is not always guaranteed. For example, GSM [1] requires manually defining a superclass of graphs and cannot be extended to node-level tasks. Such task- or data-specific designs limit the utility of GFL across different graph mining tasks.
The above challenges call for a new generic GFL framework that can learn a generalizable, transferable, and effective GNN for various graph mining tasks with few labels. Fortunately, contrastive learning has emerged to alleviate the dependence on labeled data and to learn label-irrelevant but transferable representations from unsupervised pretext tasks in vision, language, and graphs [3, 9, 54, 38]. Thus, a natural idea is to leverage contrastive learning to boost GFL.
In this work, we develop a general and effective Contrastive Graph Few-shot Learning framework (CGFL). Specifically, the proposed framework first pre-trains a GNN by minimizing the contrastive loss between the embeddings of two views generated from two augmented graphs. We then introduce a self-distillation step for further improvement: the pre-trained GNN is frozen as a teacher model and kept in the contrastive framework to distill a randomly initialized student model by minimizing the discrepancy between the two views' embeddings generated by the two models. Both the pre-training and distillation steps operate in the meta-training and meta-testing phases without requiring labeled data. Finally, the distilled student model serves as the initialization for GFL on few-shot graph mining tasks. Because CGFL pre-trains the GNN in a self-supervised manner, it mitigates the negative impact of distribution shift, and the learned graph representations are transferable and discriminative for new tasks in the test data. Besides, the simple and generic design of CGFL makes it applicable to different graph mining tasks. Furthermore, to quantitatively measure the capability of CGFL, we introduce an information-based method that assesses the quality of the learned node (or graph) embeddings at each layer of the model: we allocate each node a learnable variable as noise and train these variables to maximize their entropy while keeping the change in the model's output as small as possible.
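To make the two-step procedure concrete, the following is a minimal PyTorch-style sketch under our assumptions: the `augment` function, the encoder networks, and the InfoNCE-style objective are illustrative placeholders rather than CGFL's exact implementation.

```python
# Minimal sketch of CGFL's two training steps (a reconstruction, not the
# authors' exact code). `augment` (a stochastic graph augmentation) and the
# encoder/teacher/student GNNs are hypothetical placeholders.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """InfoNCE-style contrastive loss between two batches of view embeddings."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                           # pairwise cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)  # positive pairs on the diagonal
    return F.cross_entropy(logits, labels)

def pretrain_step(encoder, graph, augment, optimizer):
    """Step 1: contrastive pre-training on two augmented views of one graph."""
    loss = nt_xent(encoder(augment(graph)), encoder(augment(graph)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def distill_step(teacher, student, graph, augment, optimizer):
    """Step 2: the frozen, pre-trained teacher distills a randomly initialized
    student by minimizing the discrepancy between the two models' embeddings."""
    with torch.no_grad():
        z_t = F.normalize(teacher(augment(graph)), dim=1)
    z_s = F.normalize(student(augment(graph)), dim=1)
    loss = (2 - 2 * (z_s * z_t).sum(dim=1)).mean()       # MSE on the unit sphere
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```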
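The measurement step admits a similarly compact sketch. The probe below is our reconstruction of the description above, assuming a Gaussian parameterization of the per-node noise and a hypothetical trade-off weight `lam`; maximizing `log_sigma` maximizes the Gaussian's entropy while the first term penalizes changes in the layer's output.

```python
# Sketch of the information-based probe (our reading of the description; the
# Gaussian noise parameterization and the weight `lam` are assumptions).
import torch

def probe_layer(layer_fn, h, steps=200, lam=1.0, lr=0.01):
    """h: (num_nodes, dim) embeddings entering one GNN layer. Learns a per-node
    Gaussian noise scale whose entropy is maximized while the layer output is
    kept stable; large scales indicate nodes whose embeddings carry little
    information for this layer."""
    log_sigma = torch.zeros(h.size(0), 1, requires_grad=True)  # one noise scale per node
    out_clean = layer_fn(h).detach()
    opt = torch.optim.Adam([log_sigma], lr=lr)
    for _ in range(steps):
        noise = torch.randn_like(h) * log_sigma.exp()          # reparameterized Gaussian noise
        change = (layer_fn(h + noise) - out_clean).pow(2).mean()
        loss = lam * change - log_sigma.mean()                 # small output change, high entropy
        opt.zero_grad()
        loss.backward()
        opt.step()
    return log_sigma.detach().exp()
```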
To summarize, our contributions in this work are:
• We develop a general and effective framework, CGFL, which leverages a self-distilled contrastive learning procedure to boost GFL. CGFL mitigates the impact of distribution shift and is task- and data-independent, making it suitable for general graph mining purposes.
• We introduce an information-based method to quantitatively measure the capability of CGFL by assessing the quality of the learned node (or graph) embeddings. To the best of our knowledge, this is the first study to explore GFL model measurement.
• Comprehensive experiments on multiple graph datasets demonstrate that CGFL outperforms state-of-the-art methods on both node classification and graph classification tasks in the few-shot scenario. Additional measurement results further show that CGFL learns better node (or graph) embeddings than baseline methods.
2 Related Work
Few-Shot Learning on Graphs. Many GFL models [58] have been proposed to solve various graph mining problems in the face of the label sparsity issue, such as node classification [51, 4, 18, 30, 31, 44, 58], relation prediction [48, 25, 2, 56, 57], and graph classification [1, 27, 12, 45]. They are built on meta-learning (or few-shot learning) techniques that fall into two major groups: (1) metric-based approaches [43, 37] and (2) optimization-based algorithms [7]. The first group learns an effective similarity metric between the few-shot support data and the query data. For example, GPN [4] conducts node informativeness propagation to build weighted class prototypes for a distance-based node classifier. The second group learns well-initialized GNN parameters that can be quickly adapted to new graph tasks with little labeled data. For instance, G-Meta [18] builds local subgraphs to extract subgraph-specific information and optimizes the GNN via MAML [7]. Unlike prior efforts, which rely on labeled data and adopt task- or data-specific designs, we aim to build a novel framework that exploits unlabeled data and has a generic design for general graph mining purposes.
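As a concrete illustration of the metric-based group, the sketch below implements a generic prototypical-network-style classifier in the spirit of [43]; GPN's weighted prototypes would replace the plain mean, and all names here are illustrative.

```python
# Generic metric-based few-shot classification sketch (prototypical-network
# style); embeddings are assumed to come from a GNN encoder.
import torch

def prototype_classify(support_z, support_y, query_z, num_classes):
    """support_z: (N*K, d) support embeddings; support_y: (N*K,) class ids;
    query_z: (Q, d) query embeddings. Returns a predicted class per query."""
    protos = torch.stack([support_z[support_y == c].mean(dim=0)
                          for c in range(num_classes)])   # one prototype per class
    dists = torch.cdist(query_z, protos)                  # Euclidean query-prototype distances
    return dists.argmin(dim=1)                            # nearest prototype wins
```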
Self-Supervised Learning on Graphs. Recently, self-supervised graph learning (SGL) has attracted significant attention due to its effectiveness in pre-training GNNs and its competitive performance in various graph mining applications. Previous SGL models can be categorized into two major groups, generative or contrastive, according to their learning tasks [24, 38]. Generative models learn graph representations by recovering feature or structural information of the graph; the task can recover the adjacency matrix alone [53] or together with the node features [17]. Contrastive methods, in turn, first define the node context, which can consist of node-level or graph-level instances. Then,