Graph Few-shot Learning with Task-specific Structures

Song Wang, University of Virginia, sw3wv@virginia.edu
Chen Chen, University of Virginia, zrh6du@virginia.edu
Jundong Li, University of Virginia, jundong@virginia.edu
Abstract
Graph few-shot learning is of great importance among various graph learning
tasks. Under the few-shot scenario, models are often required to conduct classifi-
cation given limited labeled samples. Existing graph few-shot learning methods
typically leverage Graph Neural Networks (GNNs) and perform classification
across a series of meta-tasks. Nevertheless, these methods generally rely on the
original graph (i.e., the graph that the meta-task is sampled from) to learn node
representations. Consequently, the graph structure used in each meta-task is iden-
tical. Since the class sets are different across meta-tasks, node representations
should be learned in a task-specific manner to promote classification performance.
Therefore, to adaptively learn node representations across meta-tasks, we propose a
novel framework that learns a task-specific structure for each meta-task. To handle
the variety of nodes across meta-tasks, we extract relevant nodes and learn task-
specific structures based on node influence and mutual information. In this way,
we can learn node representations with the task-specific structure tailored for each
meta-task. We further conduct extensive experiments on five node classification
datasets under both single- and multiple-graph settings to validate the superior-
ity of our framework over the state-of-the-art baselines. Our code is provided at
https://github.com/SongW-SW/GLITTER.
1 Introduction
Nowadays, graph-structured data is widely used in various real-world applications, such as molecular property prediction [18], knowledge graph completion [47], and recommender systems [45]. More recently, Graph Neural Networks (GNNs) [15, 32, 43, 44] have been proposed to learn node representations via information aggregation based on the given graph structure. Generally, these methods adopt a semi-supervised learning strategy to train models on a graph with abundant labeled samples [19]. However, in practice, it is often difficult to obtain sufficient labeled samples due to the laborious labeling process [10]. Hence, there is a surge of research interest in performing graph learning with limited labeled samples as references, known as graph few-shot learning [9, 21, 48].
Among various types of graph few-shot learning tasks, few-shot node classification is essential in real-world scenarios, including protein classification [3] and document categorization [30]. To deal with the label deficiency issue in node classification, many recent works [6, 10, 18, 21] incorporate existing few-shot learning frameworks from other domains [27, 33] into GNNs. Specifically, few-shot classification during evaluation is conducted on a specific number of meta-test tasks.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.12130v1 [cs.LG] 21 Oct 2022

Each meta-test task contains a small number of labeled nodes as references (i.e., support nodes) and several unlabeled
nodes for classification (i.e., query nodes). To extract transferable knowledge from classes with
abundant labeled nodes, the model is trained on a series of meta-training tasks that are sampled from
these disjoint classes but share similar structures with meta-test tasks. We refer to meta-training
and meta-test tasks as meta-tasks. Note that few-shot node classification can be conducted on a
single graph (e.g., a citation network for author classification) or across multiple graphs (e.g., a set
of protein-protein interaction networks for protein property predictions). Here each meta-task is
sampled from one single graph in both single-graph and multiple-graph settings, since each meta-test
task is conducted on one graph. Despite the success of recent studies on few-shot node classification,
they mainly learn node representations from the original graph (i.e., the graph that the meta-task
is sampled from). However, the original graph can be redundant and uninformative for a specific
meta-task as each meta-task only contains a small number of nodes. As a result, the learned node
representations are not tailored for the meta-task (i.e., task-specific), which increases the difficulties
of few-shot learning. Thus, instead of leveraging the same original graph for all meta-tasks, it is
crucial to learn a task-specific structure for each meta-task.
Intuitively, the task-specific structure should contain nodes in the meta-task along with other relevant
nodes from the original graph. Moreover, the edge weights among these nodes should also be
learned in a task-specific manner. Nevertheless, it remains a daunting problem to learn a task-specific
structure for each meta-task due to two challenges: (1) It is non-trivial to select relevant nodes for the
task-specific structure. Particularly, this structure should contain nodes that are maximally relevant to
the support nodes in the meta-task. Nevertheless, since each meta-task consists of multiple support
nodes, it is difficult to select nodes that are relevant to the entire support node set. (2) It is challenging
to learn edge weights for the task-specific structure. The task-specific structure should maintain
strong correlations for nodes in the same class, so that the learned node representations will be similar.
Nonetheless, the support nodes in the same class could be distributed across the original graph, which
increases the difficulty of enhancing such correlations for the task-specific structure learning.
To address these challenges, we propose a novel Graph few-shot Learning framework wITh Task-spEcific stRuctures (GLITTER), which aims at effectively learning a task-specific structure for each
meta-task in graph few-shot learning. Specifically, to reduce the irrelevant information from the
original graph, we propose to select nodes via two strategies according to their overall node influence
on support nodes in each meta-task. Moreover, we learn edge weights in the task-specific structure
based on node influence within classes and mutual information between query nodes and labels. With
the learned task-specific structures, our framework can effectively learn node representations that are
tailored for each meta-task. In summary, the main contributions of our framework are as follows:
(1) We selectively extract relevant nodes from the original graph and learn a task-specific structure
for each meta-task based on node influence and mutual information. (2) The proposed framework
can handle graph few-shot learning under both single-graph and multiple-graph settings; in contrast, most existing works only focus on the single-graph setting. (3) We conduct extensive experiments on
five real-world datasets under single-graph and multiple-graph settings. The superior performance
over the state-of-the-art methods further validates the effectiveness of our framework.
2 Problem Formulation
Denote the set of input graphs as $\mathcal{G} = \{G_1, \dots, G_M\}$ (for the single-graph setting, $|\mathcal{G}| = 1$), where $M$ is the number of graphs. Here each graph can be represented as $G = (\mathcal{V}, \mathcal{E}, \mathbf{X})$, where $\mathcal{V}$ is the set of nodes, $\mathcal{E}$ is the set of edges, and $\mathbf{X} \in \mathbb{R}^{|\mathcal{V}| \times d}$ is a feature matrix whose $i$-th row vector ($d$-dimensional) represents the attribute of the $i$-th node. Under the prevalent meta-learning framework, the training process is conducted on a series of meta-training tasks $\{\mathcal{T}_1, \dots, \mathcal{T}_T\}$, where $T$ is the number of meta-training tasks. More specifically, $\mathcal{T}_i = \{\mathcal{S}_i, \mathcal{Q}_i\}$, where $\mathcal{S}_i$ is the support set of $\mathcal{T}_i$ and consists of $K$ labeled nodes for each of $N$ classes (i.e., $|\mathcal{S}_i| = NK$). The corresponding label set of $\mathcal{T}_i$ is $\mathcal{Y}_i$, where $|\mathcal{Y}_i| = N$. $\mathcal{Y}_i$ is sampled from the whole training label set $\mathcal{Y}_{train}$. With $\mathcal{S}_i$ as references, the model is required to classify nodes in the query set $\mathcal{Q}_i$, which contains $Q$ unlabeled samples. Note that the actual labels of query nodes are from $\mathcal{Y}_i$. After training, the model will be evaluated on a series of meta-test tasks, which follow a similar setting as meta-training tasks, except that the label set in each meta-test task is sampled from a distinct label set $\mathcal{Y}_{test}$ (i.e., $\mathcal{Y}_{test} \cap \mathcal{Y}_{train} = \emptyset$). It is noteworthy that under the multiple-graph setting, meta-training and meta-test tasks can be sampled from different graphs, while each meta-task is sampled from one single graph.
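The episodic N-way K-shot setup above can be sketched in a few lines. The sampler below is purely illustrative (the function name and its arguments are hypothetical, not from the paper's code); it groups labeled nodes by class, draws $N$ classes for $\mathcal{Y}_i$, and splits each class's draw into support and query nodes.

```python
import random

def sample_meta_task(labels, n_way, k_shot, n_query, rng=None):
    """Sample one N-way K-shot meta-task T_i = {S_i, Q_i} from node labels.

    `labels` maps node id -> class label; all names here are illustrative.
    """
    rng = rng or random.Random(0)
    # Group labeled nodes by class and keep classes with enough samples.
    by_class = {}
    for node, y in labels.items():
        by_class.setdefault(y, []).append(node)
    eligible = [y for y, vs in by_class.items() if len(vs) >= k_shot + n_query]
    # Y_i: N classes sampled from the available label set.
    task_classes = rng.sample(eligible, n_way)
    support, query = [], []
    for y in task_classes:
        picked = rng.sample(by_class[y], k_shot + n_query)
        support += [(v, y) for v in picked[:k_shot]]   # K labeled support nodes
        query += [(v, y) for v in picked[k_shot:]]     # Q query nodes to classify
    return support, query

# Toy graph with 30 labeled nodes spread over 3 classes.
labels = {v: v % 3 for v in range(30)}
S, Q = sample_meta_task(labels, n_way=2, k_shot=3, n_query=2)
print(len(S), len(Q))   # 6 4  (|S_i| = NK, |Q_i| = NQ)
```

At meta-test time the same sampler would simply be pointed at the disjoint label set $\mathcal{Y}_{test}$.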
Figure 1: The overall framework of GLITTER. We first extract relevant nodes based on two strategies:
local sampling and common sampling. Then we learn the task-specific structure with the extracted
nodes along with support and query nodes based on node influence and mutual information. The
learned structure will be used to generate node representations with a GNN. We further classify
support nodes with a classifier, and the classification loss is used to optimize the GNN and the
classifier. Finally, we meta-optimize the GNN and the classifier with the loss on query nodes.
3 Methodology
In this section, we introduce our framework that explores task-specific structures for different meta-
tasks in graph few-shot learning. The detailed framework is illustrated in Figure 1. We first elaborate
on the process of selecting relevant nodes based on node influence to construct the task-specific
structure in each meta-task. Then we provide the detailed process of learning task-specific edge
weights via maximizing node influence within classes and mutual information between query nodes
and labels. Finally, we describe the meta-learning strategy used to optimize model parameters.
3.1 Selecting Nodes for Task-specific Structures
Given a meta-task $\mathcal{T} = \{\mathcal{S}, \mathcal{Q}\}$, we first aim to extract relevant nodes that are helpful for $\mathcal{T}$ and construct a task-specific structure $G_\mathcal{T}$ based on these nodes. In this way, we can reduce the impact of redundant information on the original graph and focus on meta-task $\mathcal{T}$. Nevertheless, it remains difficult to determine which nodes are useful for classification in $\mathcal{T}$. The reason is that the support nodes in $\mathcal{T}$ can be distributed across the original graph, which increases the difficulty of selecting nodes that are relevant to all these support nodes. Thus, we propose to leverage the concept of node influence to select relevant nodes. Here we first define node influence based on [18, 34, 44] as follows:
Definition 1 (Node Influence). The node influence from node $v_i$ to node $v_j$ is defined as $I(v_i, v_j) = \|\partial h_i / \partial h_j\|$, where $h_i$ and $h_j$ are the output representations of $v_i$ and $v_j$ in a GNN, respectively. $\partial h_i / \partial h_j$ is a Jacobian matrix, and the norm can be any specific subordinate norm.
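As a toy illustration of Definition 1 (not the paper's model), consider a linear GNN whose layer is mean aggregation over neighbors with self-loops. For scalar features, the Jacobian $\partial h_i / \partial x_j$ after $L$ layers is exactly the $(i, j)$ entry of the $L$-th power of the aggregation matrix, so node influence can be read off directly:

```python
# Toy illustration of Definition 1: for a linear GNN h = A_hat^L x (mean
# aggregation with self-loops, scalar features), the Jacobian dh_i/dx_j is
# (A_hat^L)[i][j], so node influence I(v_i, v_j) = |(A_hat^L)[i][j]|.
def mean_agg_matrix(adj):
    """Row-normalized adjacency with self-loops, as nested lists."""
    n = len(adj)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = adj[i] + [i]
        for j in nbrs:
            a[i][j] = 1.0 / len(nbrs)
    return a

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Path graph 0-1-2-3 as an adjacency list.
adj = [[1], [0, 2], [1, 3], [2]]
a1 = mean_agg_matrix(adj)   # influence after 1 layer
a2 = matmul(a1, a1)         # influence after 2 layers
# After one layer, node 3 (SPD = 3 from node 0) has zero influence on node 0,
# while the 1-hop neighbor (node 1) has influence 0.5.
print(a1[0][3], a1[0][1])   # 0.0 0.5
print(a2[0][3] < a2[0][1])  # influence still decays with SPD: True
```

This decay of influence with shortest path distance is precisely the intuition formalized by the theorem below.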
According to Definition 1, large node influence denotes that the representation of a node can be easily impacted by another node, thus rendering stronger correlations. Intuitively, we need to incorporate more nodes with large influence on the support nodes into $G_\mathcal{T}$. In this way, $G_\mathcal{T}$ can maintain the most crucial information that is useful for classification based on support nodes. To effectively select nodes with larger influence on the support nodes, we consider important factors that affect node influence. The following theorem provides a universal pattern for node influence on support nodes:
Theorem 3.1. Consider the node influence from node $v_k$ to the $i$-th class (i.e., $C_i$) in a meta-task $\mathcal{T}$. Denote the geometric mean of the node influence values to all support nodes in $C_i$ as $I_{C_i}(v_k) = \sqrt[K]{\prod_{j=1}^{K} I(v_k, s_{i,j})}$, where $s_{i,j}$ is the $j$-th support node in $C_i$. Assume the node degrees are randomly distributed with the mean value as $\bar{d}$. Then, $\mathbb{E}\left(\log I_{C_i}(v_k)\right) \geq -\log \bar{d} \cdot \sum_{j=1}^{K} \mathrm{SPD}(v_k, s_{i,j}) / K$, where $\mathrm{SPD}(v_k, s_{i,j})$ denotes the shortest path distance between $v_k$ and $s_{i,j}$.
The proof is provided in Appendix A. Theorem 3.1 indicates that the lower bound of the node influence on a specific class is measured by its shortest path distances to all support nodes in this class. Therefore, to effectively select nodes with large influence on a specific class, we can choose nodes with small average shortest path distances to support nodes of this class. Based on this theorem, we propose two strategies, namely local sampling and common sampling, to select nodes for the task-specific structure $G_\mathcal{T}$. In particular, we combine the selected nodes with support and query nodes in the meta-task (i.e., $\mathcal{S}$ and $\mathcal{Q}$) to obtain the final node set $\mathcal{V}_\mathcal{T} = \mathcal{V}_l \cup \mathcal{V}_c \cup \mathcal{S} \cup \mathcal{Q}$. Here $\mathcal{V}_l$ and $\mathcal{V}_c$ are the node sets extracted based on local sampling and common sampling, respectively. $\mathcal{S}$ and $\mathcal{Q}$ are the support set and the query set of $\mathcal{T}$, respectively. We now introduce the two strategies in detail.
Local Sampling. In this strategy, we extract the local neighbor nodes of support nodes within a specific distance (i.e., neighborhood size). Intuitively, the neighbor nodes maintain a small shortest path distance to a specific support node. Therefore, by combining the neighbor nodes of all support nodes in a class, we can obtain nodes with considerable node influence on this class without calculating the shortest path distances. Specifically, the extracted node set is denoted as $\mathcal{V}_l = \bigcup_{v_i \in \mathcal{S}} N_l(v_i)$, where $N_l(v_i) = \{u \mid d(u, v_i) \leq h\}$, and $h$ is the pre-defined neighborhood size.
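Assuming an unweighted graph stored as an adjacency list, local sampling amounts to a depth-bounded BFS from each support node. The helper below is a sketch with hypothetical names, not code from the paper:

```python
from collections import deque

def local_sampling(adj, support_nodes, h):
    """V_l: union of N_l(s) = {u | d(u, s) <= h} over all support nodes s,
    computed by a BFS truncated at depth h."""
    v_l = set()
    for s in support_nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if dist[u] == h:          # do not expand beyond the neighborhood size
                continue
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        v_l |= set(dist)              # all nodes within distance h of s
    return v_l

# Path graph 0-1-2-3-4; support node {0} with neighborhood size h = 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(local_sampling(adj, [0], h=2)))   # [0, 1, 2]
```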
Common Sampling. In this strategy, we select nodes that maintain a small average distance to all nodes in the same class. In this way, the node influence on an entire class can be considered. Specifically, for each of the $N$ classes in $\mathcal{T}$, we extract nodes with the smallest average distances to support nodes in this class. The overall extracted node set $\mathcal{V}_c$ can be presented as follows:

$$\mathcal{V}_c = \bigcup_{i=1}^{N} \operatorname*{arg\,min}_{\mathcal{V}' \subset \mathcal{V}, |\mathcal{V}'| = C} \sum_{v \in \mathcal{V}'} \sum_{j=1}^{K} d(v, s_{i,j}), \qquad (1)$$

where $s_{i,j}$ is the $j$-th node of the $i$-th class in $\mathcal{T}$. Here we extract the $C$ nodes with the smallest sum of shortest path distances to support nodes in each class. Then we aggregate these nodes into the final node set $\mathcal{V}_c$. In this way, we can select nodes with large influence on an entire class. As a result, the selected nodes will bear more crucial information for classifying a specific class. Note that since there are only $N$ classes in $\mathcal{T}$, the maximum size of $\mathcal{V}_c$ is $NC$, i.e., $|\mathcal{V}_c| \leq NC$.
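A minimal sketch of Eq. (1), again assuming an unweighted adjacency-list graph and breaking ties among equally scored nodes arbitrarily (the paper does not specify a tie-breaking rule):

```python
from collections import deque

def bfs_spd(adj, src):
    """Shortest path distances from src on an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def common_sampling(adj, class_supports, c):
    """Eq. (1): per class, keep the C nodes with the smallest summed SPD to
    that class's support nodes, then union the per-class selections."""
    v_c = set()
    inf = float("inf")
    for supports in class_supports:      # one list of support nodes per class
        dists = [bfs_spd(adj, s) for s in supports]
        # Unreachable nodes get an infinite score and are never selected first.
        score = {v: sum(d.get(v, inf) for d in dists) for v in adj}
        v_c |= set(sorted(score, key=score.get)[:c])
    return v_c

# Path graph 0-1-2-3-4 with one class whose support nodes are {1, 3}:
# nodes 1, 2, 3 have the smallest summed distances.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(common_sampling(adj, [[1, 3]], c=3)))   # [1, 2, 3]
```

Note that the exact arg-min over all size-$C$ subsets in Eq. (1) decomposes into a per-node ranking, since the objective is a sum of independent per-node scores.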
Edge Weight Functions. With the extracted node set $\mathcal{V}_\mathcal{T}$, we intend to learn task-specific edge weights for $G_\mathcal{T}$. Intuitively, although the original structural information is crucial for classification, it can also be redundant for meta-task $\mathcal{T}$. Therefore, we propose to construct the edges based on both node representations and the shortest path distance between two nodes. In this way, the model will learn to maintain beneficial edges for $\mathcal{T}$. Particularly, the edge weight from node $v_i$ to node $v_j$ is denoted as $a_{i,j} = (a^r_{i,j} + a^s_{i,j})/2$, where $a^r_{i,j}$ and $a^s_{i,j}$ are learned by two functions that utilize node representations and structures as input, respectively.
Node representations as input.

$$a^r_{i,j} = \exp\left( -\left\| \frac{\phi(W_1 x_i)}{\|\phi(W_1 x_i)\|_2} - \frac{\phi(W_2 x_j)}{\|\phi(W_2 x_j)\|_2} \right\|_2^2 \right), \qquad (2)$$

where $\phi$ is a non-linear activation function, and $x_i$ denotes the input feature vector of node $v_i$. $W_1 \in \mathbb{R}^{d_a \times d}$ and $W_2 \in \mathbb{R}^{d_a \times d}$ are learnable parameters, where $d_a$ is the dimension size of $W_1 x_i$ and $W_2 x_j$. $a^r_{i,j}$ is the edge weight between nodes $v_i$ and $v_j$ learned from node representations. Such a design naturally satisfies $a^r_{i,j} \in (0, 1]$. Moreover, by introducing two weight matrices $W_1$ and $W_2$, the learned task-specific structure will be a directed graph that encodes more information.
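Eq. (2) can be illustrated in plain Python, taking $\phi$ to be ReLU for concreteness (the paper does not fix a specific activation) and using toy stand-ins for the learned matrices $W_1$ and $W_2$:

```python
import math

def repr_edge_weight(x_i, x_j, w1, w2):
    """Eq. (2): exp of the negative squared L2 distance between the
    normalized projections of the two node feature vectors.
    phi is assumed to be ReLU here; w1, w2 are toy stand-ins for the
    learned matrices W1, W2."""
    def project(w, x):
        # phi(Wx) with phi = ReLU, then L2-normalize.
        z = [max(0.0, sum(w_rc * x_c for w_rc, x_c in zip(row, x))) for row in w]
        norm = math.sqrt(sum(v * v for v in z)) or 1.0   # guard against zero vectors
        return [v / norm for v in z]
    u, v = project(w1, x_i), project(w2, x_j)
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)))

w1 = [[1.0, 0.0], [0.0, 1.0]]    # toy 2x2 "learned" matrices
w2 = [[1.0, 0.0], [0.0, 1.0]]
a_same = repr_edge_weight([1.0, 0.0], [1.0, 0.0], w1, w2)
a_diff = repr_edge_weight([1.0, 0.0], [0.0, 1.0], w1, w2)
print(a_same)                    # identical projections give the maximum weight 1.0
print(0.0 < a_diff < a_same)     # weights always fall in (0, 1]: True
```

Because the projected vectors are unit-normalized, the squared distance is bounded, which is what pins $a^r_{i,j}$ into $(0, 1]$.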
Structures as input.

$$a^s_{i,j} = \mathrm{Sigmoid}\left( \psi(\mathrm{SPD}(v_i, v_j)) \right), \qquad (3)$$

where $\psi$ is a learned function that outputs a scalar while utilizing the shortest path distance between $v_i$ and $v_j$ (i.e., $\mathrm{SPD}(v_i, v_j)$) on the original graph. In this way, we can preserve the structural information of the original graph by mapping the distance to a scalar. For example, if $\psi$ is learned as a decreasing function of the input $\mathrm{SPD}(v_i, v_j)$, the obtained task-specific structure will result in stronger correlations among nodes that are close to each other on the original graph.
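A sketch of Eq. (3) and the combined weight $a_{i,j}$, with $\psi$ modeled as a toy decreasing linear function; in the paper $\psi$ is learned end-to-end, so its form and parameters here are purely an assumption:

```python
import math

def struct_edge_weight(spd, psi_w=-1.0, psi_b=2.0):
    """Eq. (3): a^s_{i,j} = Sigmoid(psi(SPD(v_i, v_j))).

    psi is modeled as a toy linear function psi(d) = psi_w * d + psi_b;
    with psi_w < 0 it decreases in SPD, so nearby node pairs get larger
    weights, matching the example discussed in the text."""
    return 1.0 / (1.0 + math.exp(-(psi_w * spd + psi_b)))

def edge_weight(a_r, a_s):
    """Combined task-specific edge weight a_{i,j} = (a^r_{i,j} + a^s_{i,j}) / 2."""
    return (a_r + a_s) / 2.0

near, far = struct_edge_weight(1), struct_edge_weight(4)
print(near > far)                            # closer pairs weigh more: True
print(0.0 < edge_weight(0.8, near) <= 1.0)   # combined weight stays in (0, 1]: True
```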
3.2 Learning Task-specific Structures from Labeled Nodes
With the proposed functions for edge weights, we still need to optimize these weights to obtain the task-specific structure for $\mathcal{T}$. In particular, we can leverage the label information inside labeled nodes (i.e., support nodes in each meta-task). Intuitively, the task-specific structure should ensure that the learned representations of nodes in the same class are similar, so that the classification of this class will be easier. According to Definition 1, larger node influence represents stronger correlations between nodes, which will increase the similarity between the learned representations. Therefore,