task is universally the same prior across different graph tasks or datasets, which in fact is not always guaranteed. For example, GSM [1] requires manually defining a superclass of graphs and cannot be extended to node-level tasks. Such task- or data-specific designs limit the utility of GFL across different graph mining tasks.
The above challenges call for a new generic GFL framework that can learn a generalizable, transferable, and effective GNN for various graph mining tasks with few labels. Fortunately, contrastive learning has emerged to alleviate the dependence on labeled data and to learn label-irrelevant but transferable representations from unsupervised pretext tasks in vision, language, and graphs [3, 9, 54, 38]. Thus, a natural idea is to leverage contrastive learning to boost GFL.
In this work, we develop a general and effective Contrastive Graph Few-shot Learning framework (CGFL). Specifically, the proposed framework first pre-trains a GNN by minimizing the contrastive loss between the embeddings of two views generated from two augmented graphs. We then introduce a self-distillation step for further improvement: the pre-trained GNN is frozen as a teacher model and kept in the contrastive framework to distill a randomly initialized student model by minimizing the discrepancy between the two views' embeddings generated by the two models. Both the pre-training and distillation steps operate in the meta-training and meta-testing phases without requiring labeled data. Finally, the distilled student model serves as the initialization for GFL on few-shot graph mining tasks. Because CGFL pre-trains the GNN in a self-supervised manner, it mitigates the negative impact of distribution shift, and the learned graph representations are transferable and discriminative for new tasks in the test data. Besides, the simple and generic design of CGFL makes it applicable to different graph mining tasks. Furthermore, to quantitatively measure the capability of CGFL, we introduce an information-based method that assesses the quality of the learned node (or graph) embeddings at each layer of the model: we allocate each node a learnable variable as noise and train these variables to maximize their entropy while keeping the change in the model's output as small as possible.
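To make the two-step procedure concrete, the following is a minimal PyTorch-style sketch under our assumptions: the `augment` function, the encoder networks, and the InfoNCE-style objective are illustrative placeholders rather than CGFL's exact implementation.

```python
# Minimal sketch of CGFL's two training steps (a reconstruction, not the
# authors' exact code). `augment` (a stochastic graph augmentation) and the
# encoder/teacher/student GNNs are hypothetical placeholders.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """InfoNCE-style contrastive loss between two batches of view embeddings."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                           # pairwise cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)  # positive pairs on the diagonal
    return F.cross_entropy(logits, labels)

def pretrain_step(encoder, graph, augment, optimizer):
    """Step 1: contrastive pre-training on two augmented views of one graph."""
    loss = nt_xent(encoder(augment(graph)), encoder(augment(graph)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def distill_step(teacher, student, graph, augment, optimizer):
    """Step 2: the frozen, pre-trained teacher distills a randomly initialized
    student by minimizing the discrepancy between the two models' embeddings."""
    with torch.no_grad():
        z_t = F.normalize(teacher(augment(graph)), dim=1)
    z_s = F.normalize(student(augment(graph)), dim=1)
    loss = (2 - 2 * (z_s * z_t).sum(dim=1)).mean()       # MSE on the unit sphere
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```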
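The measurement step admits a similarly compact sketch. The probe below is our reconstruction of the description above, assuming a Gaussian parameterization of the per-node noise and a hypothetical trade-off weight `lam`; maximizing `log_sigma` maximizes the Gaussian's entropy while the first term penalizes changes in the layer's output.

```python
# Sketch of the information-based probe (our reading of the description; the
# Gaussian noise parameterization and the weight `lam` are assumptions).
import torch

def probe_layer(layer_fn, h, steps=200, lam=1.0, lr=0.01):
    """h: (num_nodes, dim) embeddings entering one GNN layer. Learns a per-node
    Gaussian noise scale whose entropy is maximized while the layer output is
    kept stable; large scales indicate nodes whose embeddings carry little
    information for this layer."""
    log_sigma = torch.zeros(h.size(0), 1, requires_grad=True)  # one noise scale per node
    out_clean = layer_fn(h).detach()
    opt = torch.optim.Adam([log_sigma], lr=lr)
    for _ in range(steps):
        noise = torch.randn_like(h) * log_sigma.exp()          # reparameterized Gaussian noise
        change = (layer_fn(h + noise) - out_clean).pow(2).mean()
        loss = lam * change - log_sigma.mean()                 # small output change, high entropy
        opt.zero_grad()
        loss.backward()
        opt.step()
    return log_sigma.detach().exp()
```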
To summarize, our contributions in this work are:
• We develop a general and effective framework, CGFL, which leverages a self-distilled contrastive learning procedure to boost GFL. CGFL mitigates the impact of distribution shift and is task- and data-independent, making it suitable for general graph mining purposes.
• We introduce an information-based method to quantitatively measure the capability of CGFL by assessing the quality of the learned node (or graph) embeddings. To the best of our knowledge, this is the first study to explore GFL model measurement.
• Comprehensive experiments on multiple graph datasets demonstrate that CGFL outperforms state-of-the-art methods on both node classification and graph classification tasks in the few-shot scenario. Additional measurement results further show that CGFL learns better node (or graph) embeddings than baseline methods.
2 Related Work
Few-Shot Learning on Graphs. Many GFL models [58] have been proposed to solve various graph mining problems in the face of the label sparsity issue, such as node classification [51, 4, 18, 30, 31, 44, 58], relation prediction [48, 25, 2, 56, 57], and graph classification [1, 27, 12, 45]. They are built on meta-learning (or few-shot learning) techniques that fall into two major groups: (1) metric-based approaches [43, 37] and (2) optimization-based algorithms [7]. The first group learns an effective similarity metric between the few-shot support data and the query data. For example, GPN [4] conducts node informativeness propagation to build weighted class prototypes for a distance-based node classifier. The second group learns well-initialized GNN parameters that can be quickly adapted to new graph tasks with little labeled data. For instance, G-Meta [18] builds local subgraphs to extract subgraph-specific information and optimizes the GNN via MAML [7]. Unlike prior efforts, which rely on labeled data and adopt task- or data-specific designs, we aim to build a novel framework that exploits unlabeled data and has a generic design for general graph mining purposes.
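As a concrete illustration of the metric-based group, the sketch below implements a generic prototypical-network-style classifier in the spirit of [43]; GPN's weighted prototypes would replace the plain mean, and all names here are illustrative.

```python
# Generic metric-based few-shot classification sketch (prototypical-network
# style); embeddings are assumed to come from a GNN encoder.
import torch

def prototype_classify(support_z, support_y, query_z, num_classes):
    """support_z: (N*K, d) support embeddings; support_y: (N*K,) class ids;
    query_z: (Q, d) query embeddings. Returns a predicted class per query."""
    protos = torch.stack([support_z[support_y == c].mean(dim=0)
                          for c in range(num_classes)])   # one prototype per class
    dists = torch.cdist(query_z, protos)                  # Euclidean query-prototype distances
    return dists.argmin(dim=1)                            # nearest prototype wins
```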
Self-Supervised Learning on Graphs. Recently, self-supervised graph learning (SGL) has attracted significant attention due to its effectiveness in pre-training GNNs and its competitive performance in various graph mining applications. Previous SGL models can be categorized into two major groups, generative or contrastive, according to their learning tasks [24, 38]. Generative models learn graph representations by recovering feature or structural information of the graph; the task can recover the adjacency matrix alone [53] or together with the node features [17]. Contrastive methods, in turn, first define the node context, which can consist of node-level or graph-level instances. Then,