Graph Few-shot Learning with Task-specific Structures

Song Wang, University of Virginia, sw3wv@virginia.edu
Chen Chen, University of Virginia, zrh6du@virginia.edu
Jundong Li, University of Virginia, jundong@virginia.edu
Abstract
Graph few-shot learning is of great importance among various graph learning
tasks. Under the few-shot scenario, models are often required to conduct classifi-
cation given limited labeled samples. Existing graph few-shot learning methods
typically leverage Graph Neural Networks (GNNs) and perform classification
across a series of meta-tasks. Nevertheless, these methods generally rely on the
original graph (i.e., the graph that the meta-task is sampled from) to learn node
representations. Consequently, the graph structure used in each meta-task is iden-
tical. Since the class sets are different across meta-tasks, node representations
should be learned in a task-specific manner to promote classification performance.
Therefore, to adaptively learn node representations across meta-tasks, we propose a
novel framework that learns a task-specific structure for each meta-task. To handle
the variety of nodes across meta-tasks, we extract relevant nodes and learn task-
specific structures based on node influence and mutual information. In this way,
we can learn node representations with the task-specific structure tailored for each
meta-task. We further conduct extensive experiments on five node classification
datasets under both single- and multiple-graph settings to validate the superior-
ity of our framework over the state-of-the-art baselines. Our code is provided at
https://github.com/SongW-SW/GLITTER.
1 Introduction
Nowadays, graph-structured data is widely used in various real-world applications, such as molecular property prediction [18], knowledge graph completion [47], and recommender systems [45]. More recently, Graph Neural Networks (GNNs) [15, 32, 43, 44] have been proposed to learn node representations via information aggregation based on the given graph structure. Generally, these methods adopt a semi-supervised learning strategy to train models on a graph with abundant labeled samples [19]. However, in practice, it is often difficult to obtain sufficient labeled samples due to the laborious labeling process [10]. Hence, there is a surge of research interest in performing graph learning with limited labeled samples as references, known as graph few-shot learning [9, 21, 48].
Among various types of graph few-shot learning tasks, few-shot node classification is essential in real-world scenarios, including protein classification [3] and document categorization [30]. To deal with the label deficiency issue in node classification, many recent works [6, 10, 18, 21] incorporate existing few-shot learning frameworks from other domains [27, 33] into GNNs. Specifically, few-shot classification during evaluation is conducted on a specific number of meta-test tasks.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.12130v1 [cs.LG] 21 Oct 2022

Each meta-test task contains a small number of labeled nodes as references (i.e., support nodes) and several unlabeled
nodes for classification (i.e., query nodes). To extract transferable knowledge from classes with
abundant labeled nodes, the model is trained on a series of meta-training tasks that are sampled from
these disjoint classes but share similar structures with meta-test tasks. We refer to meta-training
and meta-test tasks as meta-tasks. Note that few-shot node classification can be conducted on a
single graph (e.g., a citation network for author classification) or across multiple graphs (e.g., a set
of protein-protein interaction networks for protein property predictions). Here each meta-task is
sampled from one single graph in both single-graph and multiple-graph settings, since each meta-test
task is conducted on one graph. Despite the success of recent studies on few-shot node classification,
they mainly learn node representations from the original graph (i.e., the graph that the meta-task
is sampled from). However, the original graph can be redundant and uninformative for a specific
meta-task as each meta-task only contains a small number of nodes. As a result, the learned node
representations are not tailored for the meta-task (i.e., task-specific), which increases the difficulties
of few-shot learning. Thus, instead of leveraging the same original graph for all meta-tasks, it is
crucial to learn a task-specific structure for each meta-task.
Intuitively, the task-specific structure should contain nodes in the meta-task along with other relevant
nodes from the original graph. Moreover, the edge weights among these nodes should also be
learned in a task-specific manner. Nevertheless, it remains a daunting problem to learn a task-specific
structure for each meta-task due to two challenges: (1) It is non-trivial to select relevant nodes for the
task-specific structure. Particularly, this structure should contain nodes that are maximally relevant to
the support nodes in the meta-task. Nevertheless, since each meta-task consists of multiple support
nodes, it is difficult to select nodes that are relevant to the entire support node set. (2) It is challenging
to learn edge weights for the task-specific structure. The task-specific structure should maintain
strong correlations for nodes in the same class, so that the learned node representations will be similar.
Nonetheless, the support nodes in the same class could be distributed across the original graph, which
increases the difficulty of enhancing such correlations for the task-specific structure learning.
To address these challenges, we propose a novel Graph few-shot Learning framework wITh Task-spEcific stRuctures (GLITTER), which aims at effectively learning a task-specific structure for each
meta-task in graph few-shot learning. Specifically, to reduce the irrelevant information from the
original graph, we propose to select nodes via two strategies according to their overall node influence
on support nodes in each meta-task. Moreover, we learn edge weights in the task-specific structure
based on node influence within classes and mutual information between query nodes and labels. With
the learned task-specific structures, our framework can effectively learn node representations that are
tailored for each meta-task. In summary, the main contributions of our framework are as follows:
(1) We selectively extract relevant nodes from the original graph and learn a task-specific structure
for each meta-task based on node influence and mutual information. (2) The proposed framework
can handle graph few-shot learning under both single-graph and multiple-graph settings; in contrast, most existing works only focus on the single-graph setting. (3) We conduct extensive experiments on
five real-world datasets under single-graph and multiple-graph settings. The superior performance
over the state-of-the-art methods further validates the effectiveness of our framework.
2 Problem Formulation
Denote the set of input graphs as $\mathcal{G} = \{G_1, \dots, G_M\}$ (for the single-graph setting, $|\mathcal{G}| = 1$), where $M$ is the number of graphs. Here each graph can be represented as $G = (\mathcal{V}, \mathcal{E}, \mathbf{X})$, where $\mathcal{V}$ is the set of nodes, $\mathcal{E}$ is the set of edges, and $\mathbf{X} \in \mathbb{R}^{|\mathcal{V}| \times d}$ is a feature matrix whose $i$-th row vector ($d$-dimensional) represents the attribute of the $i$-th node. Under the prevalent meta-learning framework, the training process is conducted on a series of meta-training tasks $\{\mathcal{T}_1, \dots, \mathcal{T}_T\}$, where $T$ is the number of meta-training tasks. More specifically, $\mathcal{T}_i = \{\mathcal{S}_i, \mathcal{Q}_i\}$, where $\mathcal{S}_i$ is the support set of $\mathcal{T}_i$ and consists of $K$ labeled nodes for each of $N$ classes (i.e., $|\mathcal{S}_i| = NK$). The corresponding label set of $\mathcal{T}_i$ is $\mathcal{Y}_i$, where $|\mathcal{Y}_i| = N$. $\mathcal{Y}_i$ is sampled from the whole training label set $\mathcal{Y}_{train}$. With $\mathcal{S}_i$ as references, the model is required to classify nodes in the query set $\mathcal{Q}_i$, which contains $Q$ unlabeled samples. Note that the actual labels of query nodes are from $\mathcal{Y}_i$. After training, the model will be evaluated on a series of meta-test tasks, which follow a similar setting as meta-training tasks, except that the label set in each meta-test task is sampled from a distinct label set $\mathcal{Y}_{test}$ (i.e., $\mathcal{Y}_{test} \cap \mathcal{Y}_{train} = \emptyset$). It is noteworthy that under the multiple-graph setting, meta-training and meta-test tasks can be sampled from different graphs, while each meta-task is sampled from one single graph.
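The episodic N-way K-shot setup above can be sketched in a few lines. The sampler below is purely illustrative (the function name and its arguments are hypothetical, not from the paper's code); it groups labeled nodes by class, draws $N$ classes for $\mathcal{Y}_i$, and splits each class's draw into support and query nodes.

```python
import random

def sample_meta_task(labels, n_way, k_shot, n_query, rng=None):
    """Sample one N-way K-shot meta-task T_i = {S_i, Q_i} from node labels.

    `labels` maps node id -> class label; all names here are illustrative.
    """
    rng = rng or random.Random(0)
    # Group labeled nodes by class and keep classes with enough samples.
    by_class = {}
    for node, y in labels.items():
        by_class.setdefault(y, []).append(node)
    eligible = [y for y, vs in by_class.items() if len(vs) >= k_shot + n_query]
    # Y_i: N classes sampled from the available label set.
    task_classes = rng.sample(eligible, n_way)
    support, query = [], []
    for y in task_classes:
        picked = rng.sample(by_class[y], k_shot + n_query)
        support += [(v, y) for v in picked[:k_shot]]   # K labeled support nodes
        query += [(v, y) for v in picked[k_shot:]]     # Q query nodes to classify
    return support, query

# Toy graph with 30 labeled nodes spread over 3 classes.
labels = {v: v % 3 for v in range(30)}
S, Q = sample_meta_task(labels, n_way=2, k_shot=3, n_query=2)
print(len(S), len(Q))   # 6 4  (|S_i| = NK, |Q_i| = NQ)
```

At meta-test time the same sampler would simply be pointed at the disjoint label set $\mathcal{Y}_{test}$.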
Figure 1: The overall framework of GLITTER. We first extract relevant nodes based on two strategies:
local sampling and common sampling. Then we learn the task-specific structure with the extracted
nodes along with support and query nodes based on node influence and mutual information. The
learned structure will be used to generate node representations with a GNN. We further classify
support nodes with a classifier, and the classification loss is used to optimize the GNN and the
classifier. Finally, we meta-optimize the GNN and the classifier with the loss on query nodes.
3 Methodology
In this section, we introduce our framework that explores task-specific structures for different meta-
tasks in graph few-shot learning. The detailed framework is illustrated in Figure 1. We first elaborate
on the process of selecting relevant nodes based on node influence to construct the task-specific
structure in each meta-task. Then we provide the detailed process of learning task-specific edge
weights via maximizing node influence within classes and mutual information between query nodes
and labels. Finally, we describe the meta-learning strategy used to optimize model parameters.
3.1 Selecting Nodes for Task-specific Structures
Given a meta-task $\mathcal{T} = \{\mathcal{S}, \mathcal{Q}\}$, we first aim to extract relevant nodes that are helpful for $\mathcal{T}$ and construct a task-specific structure $G_\mathcal{T}$ based on these nodes. In this way, we can reduce the impact of redundant information on the original graph and focus on meta-task $\mathcal{T}$. Nevertheless, it remains difficult to determine which nodes are useful for classification in $\mathcal{T}$. The reason is that the support nodes in $\mathcal{T}$ can be distributed across the original graph, which increases the difficulty of selecting nodes that are relevant to all these support nodes. Thus, we propose to leverage the concept of node influence to select relevant nodes. Here we first define node influence based on [18, 34, 44] as follows:
Definition 1 (Node Influence). The node influence from node $v_i$ to node $v_j$ is defined as $I(v_i, v_j) = \|\partial h_i / \partial h_j\|$, where $h_i$ and $h_j$ are the output representations of $v_i$ and $v_j$ in a GNN, respectively. $\partial h_i / \partial h_j$ is a Jacobian matrix, and the norm can be any specific subordinate norm.
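As a toy illustration of Definition 1 (not the paper's model), consider a linear GNN whose layer is mean aggregation over neighbors with self-loops. For scalar features, the Jacobian $\partial h_i / \partial x_j$ after $L$ layers is exactly the $(i, j)$ entry of the $L$-th power of the aggregation matrix, so node influence can be read off directly:

```python
# Toy illustration of Definition 1: for a linear GNN h = A_hat^L x (mean
# aggregation with self-loops, scalar features), the Jacobian dh_i/dx_j is
# (A_hat^L)[i][j], so node influence I(v_i, v_j) = |(A_hat^L)[i][j]|.
def mean_agg_matrix(adj):
    """Row-normalized adjacency with self-loops, as nested lists."""
    n = len(adj)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = adj[i] + [i]
        for j in nbrs:
            a[i][j] = 1.0 / len(nbrs)
    return a

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Path graph 0-1-2-3 as an adjacency list.
adj = [[1], [0, 2], [1, 3], [2]]
a1 = mean_agg_matrix(adj)   # influence after 1 layer
a2 = matmul(a1, a1)         # influence after 2 layers
# After one layer, node 3 (SPD = 3 from node 0) has zero influence on node 0,
# while the 1-hop neighbor (node 1) has influence 0.5.
print(a1[0][3], a1[0][1])   # 0.0 0.5
print(a2[0][3] < a2[0][1])  # influence still decays with SPD: True
```

This decay of influence with shortest path distance is precisely the intuition formalized by the theorem below.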
According to Definition 1, large node influence denotes that the representation of a node can be easily impacted by another node, thus rendering stronger correlations. Intuitively, we need to incorporate more nodes with large influence on the support nodes into $G_\mathcal{T}$. In this way, $G_\mathcal{T}$ can maintain the most crucial information that is useful for classification based on support nodes. To effectively select nodes with larger influence on the support nodes, we consider important factors that affect node influence. The following theorem provides a universal pattern for node influence on support nodes:
Theorem 3.1. Consider the node influence from node $v_k$ to the $i$-th class (i.e., $C_i$) in a meta-task $\mathcal{T}$. Denote the geometric mean of the node influence values to all support nodes in $C_i$ as $I_{C_i}(v_k) = \sqrt[K]{\prod_{j=1}^{K} I(v_k, s_{i,j})}$, where $s_{i,j}$ is the $j$-th support node in $C_i$. Assume the node degrees are randomly distributed with the mean value as $\bar{d}$. Then, $\mathbb{E}\left(\log I_{C_i}(v_k)\right) \geq -\log \bar{d} \cdot \sum_{j=1}^{K} \mathrm{SPD}(v_k, s_{i,j}) / K$, where $\mathrm{SPD}(v_k, s_{i,j})$ denotes the shortest path distance between $v_k$ and $s_{i,j}$.
The proof is provided in Appendix A. Theorem 3.1 indicates that the lower bound of the node influence on a specific class is measured by its shortest path distances to all support nodes in this class. Therefore, to effectively select nodes with large influence on a specific class, we can choose nodes with small average shortest path distances to support nodes of this class. Based on this theorem, we propose two strategies, namely local sampling and common sampling, to select nodes for the task-specific structure $G_\mathcal{T}$. In particular, we combine the selected nodes with support and query nodes in the meta-task (i.e., $\mathcal{S}$ and $\mathcal{Q}$) to obtain the final node set $\mathcal{V}_\mathcal{T} = \mathcal{V}_l \cup \mathcal{V}_c \cup \mathcal{S} \cup \mathcal{Q}$. Here $\mathcal{V}_l$ and $\mathcal{V}_c$ are the node sets extracted based on local sampling and common sampling, respectively. $\mathcal{S}$ and $\mathcal{Q}$ are the support set and the query set of $\mathcal{T}$, respectively. We now introduce the two strategies in detail.
Local Sampling. In this strategy, we extract the local neighbor nodes of support nodes within a specific distance (i.e., neighborhood size). Intuitively, the neighbor nodes maintain a small shortest path distance to a specific support node. Therefore, by combining the neighbor nodes of all support nodes in a class, we can obtain nodes with considerable node influence on this class without calculating the shortest path distances. Specifically, the extracted node set is denoted as $\mathcal{V}_l = \bigcup_{v_i \in \mathcal{S}} N_l(v_i)$, where $N_l(v_i) = \{u \mid d(u, v_i) \leq h\}$, and $h$ is the pre-defined neighborhood size.
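Assuming an unweighted graph stored as an adjacency list, local sampling amounts to a depth-bounded BFS from each support node. The helper below is a sketch with hypothetical names, not code from the paper:

```python
from collections import deque

def local_sampling(adj, support_nodes, h):
    """V_l: union of N_l(s) = {u | d(u, s) <= h} over all support nodes s,
    computed by a BFS truncated at depth h."""
    v_l = set()
    for s in support_nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if dist[u] == h:          # do not expand beyond the neighborhood size
                continue
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        v_l |= set(dist)              # all nodes within distance h of s
    return v_l

# Path graph 0-1-2-3-4; support node {0} with neighborhood size h = 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(local_sampling(adj, [0], h=2)))   # [0, 1, 2]
```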
Common Sampling. In this strategy, we select nodes that maintain a small average distance to all nodes in the same class. In this way, the node influence on an entire class can be considered. Specifically, for each of the $N$ classes in $\mathcal{T}$, we extract nodes with the smallest average distances to support nodes in this class. The overall extracted node set $\mathcal{V}_c$ can be presented as follows:

$$\mathcal{V}_c = \bigcup_{i=1}^{N} \operatorname*{arg\,min}_{\mathcal{V}' \subset \mathcal{V}, |\mathcal{V}'| = C} \sum_{v \in \mathcal{V}'} \sum_{j=1}^{K} d(v, s_{i,j}), \qquad (1)$$

where $s_{i,j}$ is the $j$-th node of the $i$-th class in $\mathcal{T}$. Here we extract the $C$ nodes with the smallest sum of shortest path distances to support nodes in each class. Then we aggregate these nodes into the final node set $\mathcal{V}_c$. In this way, we can select nodes with large influence on an entire class. As a result, the selected nodes will bear more crucial information for classifying a specific class. Note that since there are only $N$ classes in $\mathcal{T}$, the maximum size of $\mathcal{V}_c$ is $NC$, i.e., $|\mathcal{V}_c| \leq NC$.
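A minimal sketch of Eq. (1), again assuming an unweighted adjacency-list graph and breaking ties among equally scored nodes arbitrarily (the paper does not specify a tie-breaking rule):

```python
from collections import deque

def bfs_spd(adj, src):
    """Shortest path distances from src on an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def common_sampling(adj, class_supports, c):
    """Eq. (1): per class, keep the C nodes with the smallest summed SPD to
    that class's support nodes, then union the per-class selections."""
    v_c = set()
    inf = float("inf")
    for supports in class_supports:      # one list of support nodes per class
        dists = [bfs_spd(adj, s) for s in supports]
        # Unreachable nodes get an infinite score and are never selected first.
        score = {v: sum(d.get(v, inf) for d in dists) for v in adj}
        v_c |= set(sorted(score, key=score.get)[:c])
    return v_c

# Path graph 0-1-2-3-4 with one class whose support nodes are {1, 3}:
# nodes 1, 2, 3 have the smallest summed distances.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(common_sampling(adj, [[1, 3]], c=3)))   # [1, 2, 3]
```

Note that the exact arg-min over all size-$C$ subsets in Eq. (1) decomposes into a per-node ranking, since the objective is a sum of independent per-node scores.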
Edge Weight Functions. With the extracted node set $\mathcal{V}_\mathcal{T}$, we intend to learn task-specific edge weights for $G_\mathcal{T}$. Intuitively, although the original structural information is crucial for classification, it can also be redundant for meta-task $\mathcal{T}$. Therefore, we propose to construct the edges based on both node representations and the shortest path distance between two nodes. In this way, the model will learn to maintain beneficial edges for $\mathcal{T}$. Particularly, the edge weight from node $v_i$ to node $v_j$ is denoted as $a_{i,j} = (a^r_{i,j} + a^s_{i,j})/2$, where $a^r_{i,j}$ and $a^s_{i,j}$ are learned by two functions that utilize node representations and structures as input, respectively.
Node representations as input.

$$a^r_{i,j} = \exp\left( -\left\| \frac{\phi(W_1 x_i)}{\|\phi(W_1 x_i)\|_2} - \frac{\phi(W_2 x_j)}{\|\phi(W_2 x_j)\|_2} \right\|_2^2 \right), \qquad (2)$$

where $\phi$ is a non-linear activation function, and $x_i$ denotes the input feature vector of node $v_i$. $W_1 \in \mathbb{R}^{d_a \times d}$ and $W_2 \in \mathbb{R}^{d_a \times d}$ are learnable parameters, where $d_a$ is the dimension size of $W_1 x_i$ and $W_2 x_j$. $a^r_{i,j}$ is the edge weight between nodes $v_i$ and $v_j$ learned from node representations. Such a design naturally satisfies $a^r_{i,j} \in (0, 1]$. Moreover, by introducing two weight matrices $W_1$ and $W_2$, the learned task-specific structure will be a directed graph that encodes more information.
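Eq. (2) can be illustrated in plain Python, taking $\phi$ to be ReLU for concreteness (the paper does not fix a specific activation) and using toy stand-ins for the learned matrices $W_1$ and $W_2$:

```python
import math

def repr_edge_weight(x_i, x_j, w1, w2):
    """Eq. (2): exp of the negative squared L2 distance between the
    normalized projections of the two node feature vectors.
    phi is assumed to be ReLU here; w1, w2 are toy stand-ins for the
    learned matrices W1, W2."""
    def project(w, x):
        # phi(Wx) with phi = ReLU, then L2-normalize.
        z = [max(0.0, sum(w_rc * x_c for w_rc, x_c in zip(row, x))) for row in w]
        norm = math.sqrt(sum(v * v for v in z)) or 1.0   # guard against zero vectors
        return [v / norm for v in z]
    u, v = project(w1, x_i), project(w2, x_j)
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)))

w1 = [[1.0, 0.0], [0.0, 1.0]]    # toy 2x2 "learned" matrices
w2 = [[1.0, 0.0], [0.0, 1.0]]
a_same = repr_edge_weight([1.0, 0.0], [1.0, 0.0], w1, w2)
a_diff = repr_edge_weight([1.0, 0.0], [0.0, 1.0], w1, w2)
print(a_same)                    # identical projections give the maximum weight 1.0
print(0.0 < a_diff < a_same)     # weights always fall in (0, 1]: True
```

Because the projected vectors are unit-normalized, the squared distance is bounded, which is what pins $a^r_{i,j}$ into $(0, 1]$.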
Structures as input.

$$a^s_{i,j} = \mathrm{Sigmoid}\left( \psi(\mathrm{SPD}(v_i, v_j)) \right), \qquad (3)$$

where $\psi$ is a learned function that outputs a scalar while utilizing the shortest path distance between $v_i$ and $v_j$ (i.e., $\mathrm{SPD}(v_i, v_j)$) on the original graph. In this way, we can preserve the structural information of the original graph by mapping the distance to a scalar. For example, if $\psi$ is learned as a decreasing function of the input $\mathrm{SPD}(v_i, v_j)$, the obtained task-specific structure will result in stronger correlations among nodes that are close to each other on the original graph.
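A sketch of Eq. (3) and the combined weight $a_{i,j}$, with $\psi$ modeled as a toy decreasing linear function; in the paper $\psi$ is learned end-to-end, so its form and parameters here are purely an assumption:

```python
import math

def struct_edge_weight(spd, psi_w=-1.0, psi_b=2.0):
    """Eq. (3): a^s_{i,j} = Sigmoid(psi(SPD(v_i, v_j))).

    psi is modeled as a toy linear function psi(d) = psi_w * d + psi_b;
    with psi_w < 0 it decreases in SPD, so nearby node pairs get larger
    weights, matching the example discussed in the text."""
    return 1.0 / (1.0 + math.exp(-(psi_w * spd + psi_b)))

def edge_weight(a_r, a_s):
    """Combined task-specific edge weight a_{i,j} = (a^r_{i,j} + a^s_{i,j}) / 2."""
    return (a_r + a_s) / 2.0

near, far = struct_edge_weight(1), struct_edge_weight(4)
print(near > far)                            # closer pairs weigh more: True
print(0.0 < edge_weight(0.8, near) <= 1.0)   # combined weight stays in (0, 1]: True
```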
3.2 Learning Task-specific Structures from Labeled Nodes
With the proposed functions for edge weights, we still need to optimize these weights to obtain the task-specific structure for $\mathcal{T}$. In particular, we can leverage the label information inside labeled nodes (i.e., support nodes in each meta-task). Intuitively, the task-specific structure should ensure that the learned representations of nodes in the same class are similar, so that the classification of this class will be easier. According to Definition 1, larger node influence represents stronger correlations between nodes, which will increase the similarity between the learned representations. Therefore,