Contrastive Graph Few-Shot Learning
Chunhui Zhang1, Hongfu Liu1, Jundong Li2, Yanfang Ye3, Chuxu Zhang1
1Brandeis University, USA
2University of Virginia, USA
3University of Notre Dame, USA
{chunhuizhang,hongfuliu,chuxuzhang}@brandeis.edu
jundong@virginia.edu, yye7@nd.edu
Abstract
Prevailing deep graph learning models often suffer from the label sparsity issue. Although many graph few-shot learning (GFL) methods have been developed to avoid performance degradation in the face of limited annotated data, they excessively rely on labeled data, and the distribution shift in the test phase may impair their generalization ability. Additionally, they lack general applicability because their designs are coupled with task- or data-specific characteristics. To this end, we propose a general and effective Contrastive Graph Few-shot Learning framework (CGFL). CGFL leverages a self-distilled contrastive learning procedure to boost GFL. Specifically, our model first pre-trains a graph encoder with contrastive learning using unlabeled data. The trained encoder is then frozen as a teacher model to distill a student model with a contrastive loss. The distilled model is finally fed to GFL. CGFL learns data representations in a self-supervised manner, thus mitigating the impact of distribution shift for better generalization and making the model task- and data-independent for general graph mining purposes. Furthermore, we introduce an information-based method to quantitatively measure the capability of CGFL. Comprehensive experiments demonstrate that CGFL outperforms state-of-the-art baselines on several graph mining tasks in the few-shot scenario. We also provide a quantitative measurement of CGFL's success.
1 Introduction
Deep graph learning, e.g., graph neural networks (GNNs), has recently attracted tremendous attention due to its remarkable performance in various application domains, such as social/information systems [22, 13], molecular chemistry/biology [20, 14], and recommendation [52, 5]. The success of GNNs often relies on massive annotated samples, which contradicts the fact that it is expensive to collect sufficient labels. This motivates the study of graph few-shot learning (GFL) to tackle performance degradation in the face of limited labeled data.
Previous GFL models are built on meta-learning (or few-shot learning) techniques, either metric-based approaches [43, 37] or optimization-based algorithms [7]. They aim to quickly learn an effective GNN adapted to new tasks with few labeled samples. GFL has been applied to a variety of graph mining tasks, including node classification [60, 18], relation prediction [48, 25, 56], and graph classification [1, 27]. Despite substantial progress, most previous GFL models still have the following limitations: (i) Impaired generalization. Existing GFL methods excessively rely on labeled data and attempt to inherit a strong inductive bias for new tasks in the test phase. However, a distribution shift exists between non-overlapping meta-training data and meta-testing data. Without supervision signals from ground-truth labels, GFL may not learn an effective GNN for the novel classes in the test data. This gap limits the meta-trained GNN's generalization and transferability. (ii) Constrained design. Most current GFL methods lack general applicability because they assume that the designated task shares the same prior across different graph tasks or datasets, which is not always guaranteed. For example, GSM [1] needs to manually define a superclass of graphs, which cannot be extended to node-level tasks. Such task- or data-specific designs limit GFL's utility for different graph mining tasks.
The above challenges call for a new generic GFL framework that can learn a generalizable, transferable, and effective GNN for various graph mining tasks with few labels. Fortunately, contrastive learning has emerged to alleviate the dependence on labeled data and to learn label-irrelevant but transferable representations from unsupervised pretext tasks for vision, language, and graphs [3, 9, 54, 38]. Thus, the natural idea is to leverage contrastive learning to boost GFL.
In this work, we are motivated to develop a general and effective Contrastive Graph Few-shot Learning framework (CGFL). To be specific, the proposed framework first pre-trains a GNN by minimizing the contrastive loss between the embeddings of two views generated from two augmented graphs. We then introduce a self-distillation step to bring additional improvement: the pre-trained GNN is frozen as a teacher model and kept in the contrastive framework to distill a randomly initialized student model by minimizing the contrastive loss between the two views' embeddings generated by the two models. Both the pre-training and distillation steps work in the meta-training and meta-testing phases without requiring labeled data. Finally, the distilled student model is taken as the initialized model fed to GFL for few-shot graph mining tasks. CGFL pre-trains the GNN in a self-supervised manner, thus mitigating the negative impact of distribution shift. The learned graph representations are transferable and discriminative for new tasks in the test data. Besides, the simple and generic CGFL framework is applicable to different graph mining tasks. Furthermore, to quantitatively measure the capability of CGFL, we introduce an information-based method to measure the quality of the learned node (or graph) embeddings at each layer of the model: we allocate each node a learnable variable as noise and train these variables to maximize the entropy while keeping the change of the output as small as possible.
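As a rough illustration of this measurement idea, the sketch below learns a per-node Gaussian noise scale that is pushed to be as large as possible (a proxy for maximizing entropy) while the layer output is kept close to its noise-free value. The Gaussian parameterization, the log-sigma entropy surrogate, and all names are assumptions of this sketch, not the paper's exact formulation.

import torch
import torch.nn as nn

def measure_layer_information(embeddings, layer_fn, steps=200, lam=1.0, lr=0.01):
    """Learn a per-node noise scale: the noise is pushed to be as large as possible
    (the entropy of a Gaussian grows with log sigma) while the layer output stays
    close to its noise-free value. A larger tolerable noise suggests the node
    carries less information at this layer."""
    embeddings = embeddings.detach()
    clean_out = layer_fn(embeddings).detach()
    log_sigma = nn.Parameter(torch.zeros(embeddings.size(0), 1))
    opt = torch.optim.Adam([log_sigma], lr=lr)
    for _ in range(steps):
        noise = torch.randn_like(embeddings) * log_sigma.exp()
        noisy_out = layer_fn(embeddings + noise)
        # maximize entropy (log sigma) while penalizing deviation of the output
        loss = -log_sigma.mean() + lam * (noisy_out - clean_out).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return log_sigma.exp().detach()   # per-node tolerable noise scale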
To summarize, our contributions in this work are:
• We develop a general and effective framework named CGFL that leverages a self-distilled contrastive learning procedure to boost GFL. CGFL mitigates the impact of distribution shift and is task- and data-independent, serving general graph mining purposes.
• We introduce an information-based method to quantitatively measure the capability of CGFL by measuring the quality of the learned node (or graph) embeddings. To the best of our knowledge, this is the first study to explore GFL model measurement.
• Comprehensive experiments on multiple graph datasets demonstrate that CGFL outperforms state-of-the-art methods for both node classification and graph classification tasks in the few-shot scenario. Additional measurement results further show that CGFL learns better node (or graph) embeddings than baseline methods.
2 Related Work
Few-Shot Learning on Graphs. Many GFL models [58] have been proposed to solve various graph mining problems in the face of the label sparsity issue, such as node classification [51, 4, 18, 30, 31, 44, 58], relation prediction [48, 25, 2, 56, 57], and graph classification [1, 27, 12, 45]. They are built on meta-learning (or few-shot learning) techniques that can be categorized into two major groups: (1) metric-based approaches [43, 37] and (2) optimization-based algorithms [7]. The first group learns an effective similarity metric between few-shot support data and query data. For example, GPN [4] conducts node informativeness propagation to build weighted class prototypes for a distance-based node classifier. The second group learns well-initialized GNN parameters that can be quickly adapted to new graph tasks with few labeled data. For instance, G-Meta [18] builds local subgraphs to extract subgraph-specific information and optimizes the GNN via MAML [7]. Unlike prior efforts that rely on labeled data and task- or data-specific designs, we aim to build a novel framework that exploits unlabeled data and has a generic design for general graph mining purposes.
Self-Supervised Learning on Graphs. Recently, self-supervised graph learning (SGL) has attracted significant attention due to its effectiveness in pre-training GNNs and its competitive performance in various graph mining applications. Previous SGL models can be categorized into two major groups, generative or contrastive, according to their learning tasks [24, 38]. Generative models learn graph representations by recovering feature or structural information of the graph; the task can recover the adjacency matrix alone [53] or together with the node features [17]. Contrastive methods first define the node context, which can consist of node-level or graph-level instances, and then perform contrastive learning by either maximizing the mutual information between node-context pairs [15, 42, 39] or discriminating context instances [32, 61, 59, 55]. In addition to the above strategies, random propagation has recently applied graph augmentation [33] to semi-supervised learning [6]. Motivated by the success of SGL, we propose to leverage it to boost GFL.
3 Preliminary
GNNs. A graph is represented as $G = (V, E, X)$, where $V$ is the set of nodes, $E \subseteq V \times V$ is the set of edges, and $X$ is the set of node attributes. GNNs [13, 49] learn compact representations (embeddings) by considering both the graph structure $E$ and the node attributes $X$. To be specific, let $f_\theta(\cdot)$ denote a GNN encoder with parameters $\theta$; the updated embedding of node $v$ at the $l$-th layer of the GNN can be formulated as:
$$h_v^{(l)} = M\big(h_v^{(l-1)}, \{h_u^{(l-1)} \mid \forall u \in \mathcal{N}_v\}; \theta\big), \qquad (1)$$
where $\mathcal{N}_v$ denotes the neighbor set of $v$; $M(\cdot)$ is the message passing function for neighbor information aggregation, such as a mean pooling layer followed by a fully-connected (FC) layer; $h_v^{(0)}$ is initialized with the node attribute $X_v$. The whole graph embedding can be computed over all nodes' embeddings as:
$$h_G^{(l)} = \mathrm{READOUT}\big(\{h_v^{(l)} \mid \forall v \in V\}\big), \qquad (2)$$
where the READOUT function can be a simple permutation-invariant function such as summation.
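To make Eqs. (1) and (2) concrete, the following is a minimal PyTorch sketch of one mean-aggregation message-passing layer and a summation READOUT. The class name, the concatenation of self and neighbor features, and the edge-index layout are illustrative assumptions of this sketch, not the specific encoder used in the paper.

import torch
import torch.nn as nn

class MeanAggLayer(nn.Module):
    """One message-passing layer M(.): mean of neighbor embeddings, concatenated
    with the node's own embedding, followed by a fully-connected layer (Eq. 1)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(2 * in_dim, out_dim)

    def forward(self, h, edge_index):
        # h: [num_nodes, in_dim]; edge_index: [2, num_edges] as (source, destination)
        src, dst = edge_index
        agg = torch.zeros_like(h)
        agg.index_add_(0, dst, h[src])                         # sum messages per node
        deg = torch.zeros(h.size(0), device=h.device, dtype=h.dtype)
        deg.index_add_(0, dst, torch.ones_like(dst, dtype=h.dtype))
        agg = agg / deg.clamp(min=1).unsqueeze(-1)             # mean over neighbors
        return torch.relu(self.fc(torch.cat([h, agg], dim=-1)))

def readout(h):
    """Permutation-invariant READOUT over all node embeddings (Eq. 2): summation."""
    return h.sum(dim=0)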
GFL Setting and Problem. Let $\mathcal{C}_{base}$ and $\mathcal{C}_{novel}$ denote the base classes and the novel (new) classes in the training data $\mathcal{T}_{train}$ and the testing data $\mathcal{T}_{test}$, respectively. Similar to the general meta-learning problem [7], the purpose of graph few-shot learning (GFL) is to train a GNN encoder $f_\theta(\cdot)$ over $\mathcal{C}_{base}$, such that the trained GNN encoder can be quickly adapted to $\mathcal{C}_{novel}$ with few labels per class. Note that there is no overlap between base classes and novel classes, i.e., $\mathcal{C}_{base} \cap \mathcal{C}_{novel} = \emptyset$. In the $K$-shot setting, during the meta-training phase, a batch of classes (tasks) is randomly sampled from $\mathcal{C}_{base}$, where $K$ labeled instances per class are sampled to form the support set $\mathcal{S}$ for model training and the remaining instances are taken as the query set $\mathcal{Q}$ for model evaluation. After sufficient training, the model is transferred to the meta-testing phase to conduct $N$-way classification over $\mathcal{C}_{novel}$ ($N$ is the number of novel classes), where each class has only $K$ labeled instances. GFL applies to different graph mining problems, depending on the meaning of a class: each class corresponds to a node label for the node classification problem or to a graph label for the graph classification problem. In this work, we study both node classification and graph classification under the few-shot setting, which are formally defined as follows:
Problem 1 (Few-Shot Node Classification). Given a graph $G = (V, E, X)$ and labeled nodes of $\mathcal{C}_{base}$, the problem is to learn a GNN $f_\theta(\cdot)$ to classify nodes of $\mathcal{C}_{novel}$, where each class in $\mathcal{C}_{novel}$ only has few labeled nodes.

Problem 2 (Few-Shot Graph Classification). Given a set of graphs $\mathcal{G}$ and labeled graphs of $\mathcal{C}_{base}$, the problem is to learn a GNN $f_\theta(\cdot)$ to classify graphs of $\mathcal{C}_{novel}$, where each class in $\mathcal{C}_{novel}$ only has few labeled graphs.
Unlike previous studies that rely on the labeled data of $\mathcal{T}_{train}$ and $\mathcal{T}_{test}$ for GFL model training and adaptation, we consider both unlabeled graph information and labeled data to learn the GFL model for solving the above problems.
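As an illustration of the N-way K-shot setting described above, the following is a minimal sketch of how a single episode (support/query split) could be sampled from the base classes. The function name, the dictionary-based label format, and the query size are assumptions of this sketch rather than the paper's sampling code.

import random
from collections import defaultdict

def sample_episode(labels, base_classes, n_way, k_shot, n_query):
    """Sample one N-way K-shot episode over the base classes.
    labels: {instance_id: class_id} for labeled nodes or graphs."""
    by_class = defaultdict(list)
    for idx, y in labels.items():
        if y in base_classes:
            by_class[y].append(idx)
    classes = random.sample(sorted(by_class), n_way)            # N sampled classes
    support, query = [], []
    for y in classes:
        picked = random.sample(by_class[y], k_shot + n_query)   # K support + queries
        support += [(idx, y) for idx in picked[:k_shot]]
        query += [(idx, y) for idx in picked[k_shot:]]
    return support, query

# Example: a 5-way 3-shot episode with 10 query instances per class
# support, query = sample_episode(labels, base_classes, n_way=5, k_shot=3, n_query=10)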
4 Methodology
Figure 1 illustrates the proposed CGFL framework, which includes two phases: self-distilled graph contrastive learning and graph few-shot learning (GFL). In the first phase (Figure 1(a)), the framework pre-trains a GNN encoder with contrastive learning, then introduces knowledge distillation to further improve the pre-trained GNN in a self-supervised manner. The distilled GNN is finally fed to the GFL phase (Figure 1(b)) for few-shot graph mining tasks. In addition to the proposed framework, we introduce an information-based method to quantitatively measure the capability of CGFL.
Figure 1: The overall framework of CGFL: (a) the self-distilled graph contrastive learning phase, which pre-trains a GNN encoder with contrastive learning and further improves the model with knowledge distillation in a self-supervised manner; (b) the graph few-shot learning phase, which takes the distilled student network as the initialized model and employs a meta-learning algorithm for model optimization.
4.1 Self-Distilled Graph Contrastive Learning
GNN Contrastive Pre-training. In the first phase, we first introduce contrastive learning to pre-train the GNN. Inspired by the representation bootstrapping technique [10], our method learns node (or graph) representations by discriminating context instances. Specifically, two GNN encoders, an online GNN $f_\theta(\cdot)$ and a target GNN $f_\xi(\cdot)$, are introduced to encode two randomly augmented views of a given graph. The online GNN is supervised by the target GNN's output, while the target GNN is updated by the exponential moving average of the online GNN. The contrastive pre-training step is shown in Figure 1(a).
Graph Augmentation: The given graph $G$ is processed with random data augmentations to generate a contrastive pair ($G_1$, $G_2$) as the input to the two GNN branches (online branch and target branch) in the subsequent GNN training. In this work, we apply a combination of stochastic node feature masking, edge removal, and node dropping with constant probabilities for graph augmentation.
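A minimal sketch of this augmentation step is given below, assuming dense node-feature and edge-index tensors. The masking/dropping probabilities and the choice to zero out dropped nodes' features (rather than re-indexing the graph) are illustrative assumptions, not the paper's exact implementation.

import torch

def augment(x, edge_index, p_feat=0.2, p_edge=0.2, p_node=0.1):
    """Return one stochastically augmented view of a graph.
    x: [num_nodes, feat_dim] node attributes; edge_index: [2, num_edges]."""
    # 1) attribute masking: zero out a random subset of feature dimensions
    feat_mask = (torch.rand(x.size(1)) > p_feat).to(x.dtype)
    x_aug = x * feat_mask
    # 2) edge removal: keep each edge independently with probability 1 - p_edge
    edge_keep = torch.rand(edge_index.size(1)) > p_edge
    e_aug = edge_index[:, edge_keep]
    # 3) node dropping: zero dropped nodes' features and discard their incident edges
    node_keep = torch.rand(x.size(0)) > p_node
    x_aug = x_aug * node_keep.unsqueeze(-1).to(x.dtype)
    e_keep = node_keep[e_aug[0]] & node_keep[e_aug[1]]
    return x_aug, e_aug[:, e_keep]

# Two independent calls give the contrastive pair (G1, G2):
# x1, e1 = augment(x, edge_index); x2, e2 = augment(x, edge_index)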
GNN Update: Given the generated graph pair ($G_1$, $G_2$), the online GNN $f_\theta(\cdot)$ and the target GNN $f_\xi(\cdot)$ are used to process $G_1$ and $G_2$, respectively, to generate node (or graph) embeddings. Both GNNs have the same architecture, while a two-layer FC (one-layer FC) is attached after the online GNN (target GNN) to refine the embedding. The two branches have different FC layers to prevent the prediction of the online model from being exactly the same as the output of the target model, thus avoiding representation collapse. Then, to enforce the online GNN's embedding $z_\theta$ to approximate the target GNN's embedding $h_\xi$, the mean squared error between their $\ell_2$-normalized versions is formulated as the objective function:
$$\mathcal{L}_{\theta,\xi} = \left\| \frac{z_\theta}{\|z_\theta\|_2} - \frac{h_\xi}{\|h_\xi\|_2} \right\|_2^2 = 2 - 2 \cdot \frac{\langle z_\theta, h_\xi \rangle}{\|z_\theta\|_2 \cdot \|h_\xi\|_2}. \qquad (3)$$
The parameters $\theta$ of the online GNN are updated with the Adam optimizer [21]:
$$\theta \leftarrow \mathrm{Adam}(\theta, \nabla_\theta \mathcal{L}_{\theta,\xi}, \eta), \qquad (4)$$
where $\eta$ is the learning rate. The target GNN provides the regression target to supervise the online GNN, and its parameters $\xi$ are updated as an exponential moving average (EMA) of the online GNN parameters $\theta$. More precisely, $\xi$ is updated as follows:
$$\xi \leftarrow \tau \xi + (1 - \tau)\theta, \qquad (5)$$
where $\tau \in [0, 1]$ is the decay rate. Note that the target GNN stops the backpropagation from $\mathcal{L}_{\theta,\xi}$ and is only updated by EMA.
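The following sketch ties Eqs. (3)-(5) together: a squared error between normalized embeddings (equivalently, 2 minus twice the cosine similarity) with a stop-gradient on the target branch, an Adam step on the online parameters, and the EMA update of the target parameters. Function names and the decay value are assumptions of this sketch.

import torch
import torch.nn.functional as F

def contrastive_loss(z_online, h_target):
    """Eq. (3): squared error between l2-normalized online and target embeddings,
    which equals 2 - 2 * cosine_similarity; the gradient is stopped on the target."""
    z = F.normalize(z_online, dim=-1)
    h = F.normalize(h_target.detach(), dim=-1)
    return ((z - h) ** 2).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(target_gnn, online_gnn, tau=0.99):
    """Eq. (5): xi <- tau * xi + (1 - tau) * theta."""
    for p_t, p_o in zip(target_gnn.parameters(), online_gnn.parameters()):
        p_t.data.mul_(tau).add_((1.0 - tau) * p_o.data)

# One training step (Eq. 4), assuming optimizer = torch.optim.Adam(online_params, lr=eta):
#   loss = contrastive_loss(z_online, h_target)
#   optimizer.zero_grad(); loss.backward(); optimizer.step()
#   ema_update(target_gnn, online_gnn, tau)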
Contrastive Distillation. With the pre-trained GNN $f_\theta(\cdot)$ obtained