TUNEUP: A Simple Improved Training Strategy for Graph Neural Networks
Weihua Hu1, Kaidi Cao1, Kexin Huang1, Edward W Huang2,
Karthik Subbian2, Kenji Kawaguchi3, Jure Leskovec1.
1Stanford University, 2Amazon, 3National University of Singapore
{weihuahu,kaidicao,kexinh,jure}@cs.stanford.edu, {ksubbian,ewhuang}@amazon.com, kenji@comp.nus.edu.sg
Abstract
Despite recent advances in Graph Neural Networks (GNNs), their training strategies remain largely under-explored. The conventional training strategy learns over all nodes in the original graph(s) equally, which can be sub-optimal as certain nodes are often more difficult to learn than others. Here we present TUNEUP, a simple curriculum-based training strategy for improving the predictive performance of GNNs. TUNEUP trains a GNN in two stages. In the first stage, TUNEUP applies conventional training to obtain a strong base GNN. The base GNN tends to perform well on head nodes (nodes with large degrees) but less so on tail nodes (nodes with small degrees). Therefore, the second stage of TUNEUP focuses on improving prediction on the difficult tail nodes by further training the base GNN on synthetically generated tail node data. We theoretically analyze TUNEUP and show it provably improves generalization performance on tail nodes. TUNEUP is simple to implement and applicable to a broad range of GNN architectures and prediction tasks. Extensive evaluation of TUNEUP on five diverse GNN architectures, three types of prediction tasks, and both transductive and inductive settings shows that TUNEUP significantly improves the performance of the base GNN on tail nodes, while often even improving the performance on head nodes. Altogether, TUNEUP produces up to 57.6% and 92.2% relative predictive performance improvement in the transductive and the challenging inductive settings, respectively.
Introduction
Graph Neural Networks (GNNs) are one of the most successful and widely used paradigms for representation learning on graphs, achieving state-of-the-art performance on a variety of prediction tasks, such as semi-supervised node classification (Kipf and Welling 2017; Velickovic et al. 2018), link prediction (Hamilton, Ying, and Leskovec 2017; Kipf and Welling 2016), and recommender systems (Ying et al. 2018; He et al. 2020). There has been a surge of work on improving GNN model architectures (Velickovic et al. 2018; Xu et al. 2019, 2018; Shi et al. 2020; Klicpera, Bojchevski, and Günnemann 2019; Wu et al. 2019; Zhao and Akoglu 2019; Li et al. 2019; Chen et al. 2020; Li et al. 2021) and task-specific losses (Kipf and Welling 2016; Rendle et al. 2012; Verma et al. 2021; Huang et al. 2021b). Despite all these advances, strategies for training a GNN on a given supervised loss remain largely under-explored. Existing work has focused on minimizing the given loss averaged over nodes in the original graph(s), which neglects the fact that some nodes are more difficult to learn than others.
Here we present TUNEUP, a simple training strategy for improving the predictive performance of GNNs. The key motivation behind TUNEUP is that GNNs tend to under-perform on tail nodes, i.e., nodes with small (e.g., 0–5) node degrees, because such nodes have few neighbors to aggregate features from (Liu, Nguyen, and Fang 2021). Improving GNN performance on tail nodes is important since they are prevalent in real-world scale-free graphs (Clauset, Shalizi, and Newman 2009) as well as among newly arriving cold-start nodes (Lika, Kolomvatsos, and Hadjiefthymiades 2014).

The key idea of TUNEUP is to adopt curriculum-based training (Bengio et al. 2009): it first trains a GNN to perform well on the relatively easy head nodes. It then further trains the GNN to perform well on the more difficult tail nodes by minimizing the loss over supervised tail node data that is synthetically generated via pseudo-labeling and dropping edges (Rong et al. 2020).
We theoretically analyze TUNEUP and show that it provably improves the generalization performance on tail nodes by utilizing information from head nodes. Our theory also justifies how TUNEUP generates supervised synthetic tail nodes in the second stage, and suggests that both pseudo-labeling and dropping edges are crucial for improved generalization.

TUNEUP is simple to implement on top of the conventional training pipeline of GNNs, as shown in Algorithm 1. Thanks to its simplicity, TUNEUP can be readily used with a wide range of GNN models and supervised losses; hence, it is applicable to many node- and edge-level prediction tasks. This is in contrast with recent methods for improving GNN performance on tail nodes (Liu, Nguyen, and Fang 2021; Zheng et al. 2022; Zhang et al. 2022; Kang et al. 2022), all of which require non-trivial modifications of both GNN architectures and losses, making them harder to implement and less applicable to diverse prediction tasks.

To demonstrate the effectiveness and broad applicability of TUNEUP, we perform extensive experiments on a wide range of settings. We consider five diverse GNN architectures, three types of key prediction tasks (semi-supervised node classification, link prediction, and recommender systems) with a total of six datasets, as well as both transductive
Figure 1: Degree-specific predictive performance of the base GNN (trained with conventional training) and the TUNEUP GNN in the transductive setting, shown for three tasks: semi-supervised node classification, link prediction, and recommender systems. Between "Base" and "TUNEUP", only the training strategy differs; the model architecture and the loss function stay exactly the same. The x-axis represents the node degrees in the training graph, and the y-axis is the predictive performance averaged over nodes with specific degrees. We see from the dotted blue curves that the base GNN tends to perform poorly on tail nodes, i.e., nodes with small degrees. Our TUNEUP (denoted by the solid orange curves) gives more accurate GNNs than conventional training ("Base"). TUNEUP improves the predictive performance across almost all node degrees, but most significantly on tail nodes.
(i.e., prediction on nodes seen during training) and inductive (i.e., prediction on new nodes not seen during training) settings. For the inductive setting, we additionally consider the challenging cold-start scenario (i.e., limited edge connectivity from new nodes) by randomly removing certain portions of edges from new nodes.

Across the settings, TUNEUP produces consistent improvements in the predictive performance of GNNs. In the transductive setting, TUNEUP significantly improves the performance of base GNNs on tail nodes, while oftentimes even improving the performance on head nodes (see Figure 1). In the inductive setting, TUNEUP especially shines in the cold-start prediction scenario, where new nodes are tail-like, producing up to 92.2% relative improvement in predictive performance. Moreover, TUNEUP significantly outperforms recent specialized methods for tail nodes (Liu, Nguyen, and Fang 2021; Zheng et al. 2022; Zhang et al. 2022; Kang et al. 2022), while not requiring any modification to GNN architectures or losses. Overall, our work shows that even a simple training strategy can yield a surprisingly large improvement in the predictive performance of GNNs, pointing to a promising direction: investigating effective training strategies for GNNs, beyond architectures and losses.
General Setup and Conventional Training
Here we introduce a general task setup for machine learning on graphs and review the conventional training strategy of GNNs.
General Setup
We are given a graph $G = (V, E)$, with a set of nodes $V$ and edges $E$, possibly associated with some features. A GNN $F_\theta$, parameterized by $\theta$, takes the graph $G$ as input and makes a prediction $\hat{Y}$ for the task of interest. The loss function $L$ measures the discrepancy between the GNN's prediction $\hat{Y}$ and the target supervision $Y$. When input node features are available, the GNN $F_\theta$ can make not only transductive predictions, i.e., predictions over the existing nodes $V$, but also inductive predictions (Hamilton, Ying, and Leskovec 2017), i.e., predictions over new nodes $V_{\mathrm{new}}$ that are not yet present in $V$. This is a general task setup that covers many representative prediction tasks over graphs as special cases:
Semi-supervised node classification (Kipf and Welling 2017)
Graph $G$: A graph with input node features.
Supervision $Y$: Class labels of the labeled nodes $V_{\mathrm{labeled}} \subseteq V$.
GNN $F_\theta$: A model that takes $G$ as input and predicts class probabilities over $V$.
Prediction $\hat{Y}$: The GNN's predictions over $V_{\mathrm{labeled}}$.
Loss $L$: Cross-entropy loss.
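As an illustration of the setup above, the following is a minimal sketch in PyTorch Geometric (not from the paper's code); the two-layer GCN, hidden width, and the `train_mask`/`y` field names follow common PyG conventions and are assumptions here.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    """F_theta: takes the graph (node features x, edge_index) and outputs class logits over V."""
    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, num_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)  # logits for every node in V

def node_classification_loss(model, data):
    """Loss L: cross-entropy restricted to the labeled nodes V_labeled."""
    logits = model(data.x, data.edge_index)   # prediction Y_hat over V
    mask = data.train_mask                    # indicator of V_labeled
    return F.cross_entropy(logits[mask], data.y[mask])
```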
Link prediction (Kipf and Welling 2016)
Graph $G$: A graph with input node features.
Supervision $Y$: Whether node $s \in V$ is linked to node $t \in V$ in $G$ (positive) or not (negative).
GNN $F_\theta$: A model that takes $G$ as input and predicts the score for a pair of nodes $(s, t) \in V \times V$. Specifically, the model generates an embedding $z_v$ for each node $v \in V$ and uses an MLP over the Hadamard product between $z_s$ and $z_t$ to predict the score for the pair $(s, t)$ (Grover and Leskovec 2016).
Prediction $\hat{Y}$: The GNN's predicted scores over $V \times V$.
Loss $L$: The Bayesian Personalized Ranking (BPR) loss (Rendle et al. 2012), which encourages the predicted score for the positive pair $(s, t_{\mathrm{pos}})$ to be higher than that for the negative pair $(s, t_{\mathrm{neg}})$ for each source node $s \in V$.
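To make the scoring head and BPR loss above concrete, here is a minimal sketch (not the paper's implementation); the MLP width and the single sampled negative per positive are illustrative assumptions, and `z` denotes node embeddings produced by any GNN encoder.

```python
import torch
import torch.nn.functional as F

class LinkScorer(torch.nn.Module):
    """Scores a pair (s, t) with an MLP over the Hadamard product z_s * z_t."""
    def __init__(self, emb_dim, hid_dim=128):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(emb_dim, hid_dim),
            torch.nn.ReLU(),
            torch.nn.Linear(hid_dim, 1),
        )

    def forward(self, z, src, dst):
        # z: [num_nodes, emb_dim] node embeddings; src, dst: index tensors of node pairs.
        return self.mlp(z[src] * z[dst]).squeeze(-1)

def bpr_loss(scorer, z, src, pos_dst, neg_dst):
    """BPR loss: the score of (s, t_pos) should exceed the score of (s, t_neg)."""
    pos_score = scorer(z, src, pos_dst)
    neg_score = scorer(z, src, neg_dst)
    return -F.logsigmoid(pos_score - neg_score).mean()
```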
Recommender systems (Wang et al. 2019) A recommender system performs link prediction between user nodes $V_{\mathrm{user}}$ and item nodes $V_{\mathrm{item}}$.
Graph $G$: A user-item bipartite graph without input node features.¹
Algorithm 1: TUNEUP. Compared to the conventional training of a GNN (L2–5), TUNEUP introduces a two-stage training process and only adds two components (L8 and L12) that are straightforward to implement. Each parameter update of TUNEUP is as efficient as conventional GNN training.

Given: GNN $F_\theta$, graph $G$, loss $L$, supervision $Y$, DropEdge ratio $\alpha$.
1: # First stage: conventional training to obtain a base GNN.
2: while $\theta$ not converged do
3:   Make prediction $\hat{Y} = F_\theta(G)$.
4:   Compute loss $L(\hat{Y}, Y)$, compute gradient $\nabla_\theta L$, and update parameter $\theta$.
5: end while
6: # Set up for the second stage.
7: if task is semi-supervised node classification then
8:   Use $F_\theta$ to predict pseudo-labels on non-isolated, unlabeled nodes. Add the pseudo-labels into $Y$.
9: end if
10: # Second stage: further train the base GNN with increased tail supervision.
11: while $\theta$ not converged do
12:   Synthesize tail nodes, i.e., randomly drop a fraction $\alpha$ of edges: $\tilde{G} \leftarrow \mathrm{DropEdge}(G)$.
13:   Make prediction $\hat{Y} = F_\theta(\tilde{G})$.
14:   Compute loss $L(\hat{Y}, Y)$, compute gradient $\nabla_\theta L$, and update parameter $\theta$.
15: end while
Supervision $Y$: Whether a user node $u$ has interacted with an item node $v$ in $G$ (positive) or not (negative).
GNN $F_\theta$: A model that takes $G$ as input and predicts the score for a pair of nodes $(u, v) \in V_{\mathrm{user}} \times V_{\mathrm{item}}$. Following Wang et al. (2019), the GNN parameter $\theta$ contains the input shallow embeddings in addition to the original message-passing GNN parameters. To produce the score for the pair of nodes $(u, v)$, we generate the user and item embeddings, $z_u$ and $z_v$, and take their inner product $z_u^{\top} z_v$ to compute the score (Wang et al. 2019).
Prediction $\hat{Y}$: The GNN's predicted scores over $V_{\mathrm{user}} \times V_{\mathrm{item}}$.
Loss $L$: The BPR loss (Rendle et al. 2012).

¹We consider the feature-less setting because input node features are not available in many public recommender system datasets, and most existing works rely solely on edge connectivity to predict links.
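For concreteness, a minimal sketch of this feature-less scoring scheme follows (not the paper's code); the embedding dimension is an assumption, and the message-passing layers that would refine these shallow embeddings over the bipartite graph are omitted for brevity.

```python
import torch
import torch.nn.functional as F

class ShallowEmbeddingRecModel(torch.nn.Module):
    """The shallow user/item embeddings are part of theta; a message-passing GNN
    over the user-item bipartite graph (omitted here) would further refine them."""
    def __init__(self, num_users, num_items, emb_dim=64):
        super().__init__()
        self.user_emb = torch.nn.Embedding(num_users, emb_dim)
        self.item_emb = torch.nn.Embedding(num_items, emb_dim)

    def score(self, users, items):
        z_u = self.user_emb(users)       # user embeddings z_u
        z_v = self.item_emb(items)       # item embeddings z_v
        return (z_u * z_v).sum(dim=-1)   # inner product z_u^T z_v

def rec_bpr_loss(model, users, pos_items, neg_items):
    """BPR loss over (user, positive item, sampled negative item) triples."""
    pos = model.score(users, pos_items)
    neg = model.score(users, neg_items)
    return -F.logsigmoid(pos - neg).mean()
```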
Conventional GNN Training
A conventional way to train a GNN (Kipf and Welling 2017) is to minimize the loss $L(\hat{Y}, Y)$ via gradient descent, as shown in L2–5 of Algorithm 1; a minimal code sketch is given below. Extension to mini-batch training (Hamilton, Ying, and Leskovec 2017; Zeng et al. 2020) is straightforward by sampling a subgraph of $G$ in each parameter update.
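A minimal full-graph sketch of this loop follows; the Adam optimizer, learning rate, and fixed epoch count stand in for the unspecified convergence criterion and are assumptions, not the paper's exact settings.

```python
import torch

def conventional_training(model, data, loss_fn, lr=0.01, epochs=200):
    """First stage / baseline: minimize L(Y_hat, Y) on the original graph G."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model, data)   # e.g., node_classification_loss sketched earlier
        loss.backward()               # compute gradient w.r.t. theta
        optimizer.step()              # update parameter theta
    return model
```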
Issue with Conventional Training. Conventional training implicitly assumes GNNs can learn over all nodes equally well. In practice, some nodes, such as low-degree tail nodes, are more difficult for GNNs to learn due to the scarce neighborhood information. As a result, GNNs trained with conventional training often give poor predictive performance on the difficult tail nodes (Liu, Nguyen, and Fang 2021).
TUNEUP: An Improved GNN Training
To resolve the issue, we present TUNEUP, a simple curriculum learning strategy, to improve GNN performance, especially on the difficult tail nodes. At a high level, TUNEUP first trains a GNN to perform well on the relatively easy head nodes. Then, it further trains the GNN to also perform well on the more difficult tail nodes.

Specifically, in the first stage (L2–5 in Algorithm 1), TUNEUP uses conventional GNN training to obtain a strong base GNN model. The base GNN model tends to perform well on head nodes, but poorly on tail nodes. To remedy this issue, in the second training stage, TUNEUP further trains the base GNN on synthetic tail nodes (L7–L15 in Algorithm 1). TUNEUP synthesizes supervised tail node data in two steps, detailed next: (1) synthesizing additional tail node inputs, and (2) adding target supervision on the synthetic tail nodes.
Synthesizing tail node inputs In many real-world graphs, head nodes start off as tail nodes, e.g., well-cited paper nodes are not cited at the beginning in a paper citation network, and warm users (users with many item interactions) start off as cold-start users in recommender systems. Hence, our key idea is to synthesize tail nodes by systematically removing edges from head nodes. There can be different ways to remove edges. In this work, we simply adopt DropEdge (Rong et al. 2020) to instantiate our idea. DropEdge randomly drops a certain portion (given by the hyperparameter $\alpha$) of edges from the original graph $G$ to obtain $\tilde{G}$ (L12 in Algorithm 1). The resulting $\tilde{G}$ contains more nodes with low degrees, i.e., tail nodes, than the original graph $G$. Hence, the GNN sees more (synthetic) tail nodes as input during training.
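A minimal sketch of this edge-dropping step on a `[2, num_edges]` edge index tensor follows; treating each directed edge independently (rather than dropping both directions of an undirected edge together) is a simplifying assumption.

```python
import torch

def drop_edges(edge_index, alpha):
    """DropEdge: randomly keep a (1 - alpha) fraction of edges to obtain G_tilde.

    edge_index: LongTensor of shape [2, num_edges] describing the graph G.
    alpha: DropEdge ratio, e.g., 0.5.
    """
    num_edges = edge_index.size(1)
    keep = torch.rand(num_edges) >= alpha   # Bernoulli mask over edges
    return edge_index[:, keep]              # sparser graph with more (synthetic) tail nodes
```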
Adding supervision on the synthetic tail nodes After synthesizing the tail node inputs, TUNEUP then adds target supervision (e.g., class labels for node classification, edges for link prediction) on them so that a supervised loss can be computed. Our key idea is to reuse the target labels on the original head nodes for the synthesized tail nodes. The rationale is that many prediction tasks involve target labels that are inherent node properties that do not change with node degree. For example, additional citations will not change a paper's subject area, and additional purchases will not change a product's category.

Concretely, for link prediction tasks, TUNEUP directly reuses the original edges $E$ in $G$ (before dropping) as the target supervision on the synthetic tail nodes. For semi-supervised node classification, TUNEUP can similarly reuse the target labels of the labeled nodes $V_{\mathrm{labeled}}$ in $G$ as the labels for the synthetic tail nodes in $\tilde{G}$. A critical challenge here is that the number of labeled nodes $V_{\mathrm{labeled}}$ is often small in the semi-supervised setting, e.g., 1%–5% of all nodes $V$, limiting the amount of target label supervision TUNEUP can reuse.
To resolve this issue, TUNEUP applies the base GNN (obtained in the first training stage) over $G$ to predict pseudo-labels (Lee et al. 2013) over the non-isolated nodes in $V_{\mathrm{unlabeled}} = V \setminus V_{\mathrm{labeled}}$.² TUNEUP then includes the pseudo-labels as supervision $Y$ in the second stage (L8 in Algorithm 1). This significantly increases the size of the supervision $Y$, e.g., by a factor of roughly 100 if only 1% of nodes are labeled. While the pseudo-labels can be noisy, they are "best guesses" made by the base GNN in the sense that they are predicted using the full graph information $G$ as input. In the second stage, TUNEUP trains the GNN to maintain its "best guesses" given the sparser graph $\tilde{G}$ as input, which encourages the GNN to perform well on nodes whose neighbors are actually scarce. Note that this strategy is fundamentally different from the classical pseudo-labeling method (Lee et al. 2013), which trains a model without sparsifying the input graph. In the following sections, we will see this both theoretically and empirically.

²Note that the pseudo-labels do not need to be the ones directly predicted by the base GNN. For example, one can apply C&S post-processing (Huang et al. 2021a) to improve the quality of the pseudo-labels, which we leave for future work.
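Putting these pieces together for semi-supervised node classification, the sketch below mirrors L7–L15 of Algorithm 1; it reuses the `drop_edges` helper sketched earlier, and the optimizer settings, epoch count, and the way pseudo-labels are stored are assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def add_pseudo_labels(model, data):
    """L8: pseudo-label unlabeled, non-isolated nodes with the base GNN on the full graph G."""
    logits = model(data.x, data.edge_index)
    pseudo = logits.argmax(dim=-1)
    deg = torch.bincount(data.edge_index[0], minlength=data.num_nodes)
    y = data.y.clone()
    y[~data.train_mask] = pseudo[~data.train_mask]            # keep true labels where available
    mask = data.train_mask | (~data.train_mask & (deg > 0))   # labeled + non-isolated unlabeled
    return y, mask

def tuneup_second_stage(model, data, alpha=0.5, lr=0.01, epochs=200):
    """L10-L15: further train the base GNN on synthetically generated tail nodes."""
    y, mask = add_pseudo_labels(model, data)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        edge_index = drop_edges(data.edge_index, alpha)   # L12: synthesize tail node inputs
        logits = model(data.x, edge_index)                # L13: predict on the sparser graph
        loss = F.cross_entropy(logits[mask], y[mask])     # L14: loss over the enlarged supervision Y
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```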
Theoretical Analysis
To theoretically understand TUNEUP with clean insights, we consider node classification with binary labels for a part of a graph with two extreme groups of nodes: ones with full degrees and ones with zero degrees. Considering this part of the graph as an example, we mathematically show how TUNEUP uses the nodes with high degrees to improve the generalization for the nodes with low degrees via the curriculum-based training strategy.

We analyze the generalization gap between the test errors of nodes with low degrees and the training errors of nodes with high degrees. This type of generalization is non-standard and does not necessarily happen unless we take advantage of some additional mechanisms such as dropping edges. Define $d$ to be the dimensionality of the feature space of the last hidden layer of the GNN. Denote by $m$ the size of the set of labeled nodes used for training. Let $Q$ be the average training loss at the end of the first stage of TUNEUP curriculum learning.
We prove a theorem (Theorem 1) that shows the following three statements:

(i) First, consider TUNEUP without pseudo-labeling (denoted by $M_1$). $M_1$ helps reduce the test errors of nodes with low degrees by utilizing the nodes with high degrees via dropping edges: i.e., the generalization bound in our theorem decreases towards zero at the rate of $\sqrt{d/m}$.

(ii) The full TUNEUP (denoted by $M_2$) further reduces the test errors of nodes with low degrees by utilizing pseudo-labels: i.e., the rate of $\sqrt{d/m}$ is replaced by the rate of $\sqrt{1/m} + Q$, where typically $Q \approx 0$ as $Q$ is explicitly minimized as a training objective. Thus, curriculum-based training with pseudo-labels can remove the factor $d$.

(iii) TUNEUP without DropEdge (denoted by $M_3$), i.e., the classical pseudo-labeling method, degrades the test errors of nodes with low degrees by incurring an additional error term $\tau > 0$ that measures the difference between the losses with and without edges. This is consistent with the above intuition that generalizing from high-degree training nodes to the low-degree test nodes requires some relationship between the ones with and without edges.
For each method $M \in \{M_i\}_{i=1}^{3}$, we define $\Delta(M)$ to be the generalization gap between the test errors of nodes with low degrees and the training errors of nodes with high degrees.

Theorem 1. For any $\delta > 0$, with probability at least $1 - \delta$, the following holds for all $M \in \{M_1, M_2, M_3\}$:
$$\Delta(M) \le \sqrt{\frac{\mathbb{1}\{M = M_1\}\, 8(d-1)\ln(16em/\delta) + 8\ln(16em/\delta)}{m}} + \mathbb{1}\{M \neq M_1\}\, Q + \mathbb{1}\{M = M_3\}\, \tau + G,$$
where $G \to 0$ as the graph size approaches infinity.
Proof. A more detailed version of Theorem 1, along with the complete proof, is presented in the Appendix.
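As a worked reading of the bound, evaluating the indicator functions for each method gives:

\begin{align*}
\Delta(M_1) &\le \sqrt{\frac{8(d-1)\ln(16em/\delta) + 8\ln(16em/\delta)}{m}} + G = \sqrt{\frac{8d\ln(16em/\delta)}{m}} + G,\\
\Delta(M_2) &\le \sqrt{\frac{8\ln(16em/\delta)}{m}} + Q + G,\\
\Delta(M_3) &\le \sqrt{\frac{8\ln(16em/\delta)}{m}} + Q + \tau + G.
\end{align*}

The $\sqrt{d/m}$-type rate for $M_1$, the dimension-free $\sqrt{1/m}$-type rate plus $Q$ for $M_2$, and the additional $\tau$ penalty for $M_3$ correspond to statements (i)–(iii) above.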
Related Work
Methods for Tail Nodes
Recently, a surge of methods has been developed for improving the predictive performance of GNNs on tail nodes (Liu, Nguyen, and Fang 2021; Zheng et al. 2022; Kang et al. 2022; Zhang et al. 2022). These methods augment GNNs with complicated tail-node-specific architectural components and losses, whereas TUNEUP focuses on the training strategy and does not require any architectural or loss modification.
Data augmentation for GNNs
The second stage of TUNEUP is a form of data augmentation over graphs, on which there has been abundant work (Zhao et al. 2021; Feng et al. 2020; Verma et al. 2021; Kong et al. 2020; Liu et al. 2022; Ding et al. 2022). The most relevant method is DropEdge (Rong et al. 2020), which was originally developed to overcome the over-smoothing issue of GNNs (Li, Han, and Wu 2018) specific to semi-supervised node classification. Our work has a different motivation and an expanded scope: we use DropEdge to synthesize tail node inputs and consider a wider range of prediction tasks. Our theoretical analysis also differs and focuses on generalization on tail nodes. Methodologically, TUNEUP additionally employs curriculum learning and pseudo-labels, both of which are crucial in improving GNN performance over vanilla DropEdge.
Experiments
We evaluate the broad applicability of TUNEUP by considering five GNN models and testing them on the three prediction tasks (semi-supervised node classification, link prediction, and recommender systems) with three predictive settings: transductive, inductive, and cold-start inductive predictions.
Experimental Settings
We evaluate TUNEUP on realistic tail node scenarios in both transductive (i.e., naturally occurring tail nodes in scale-free networks) and inductive (i.e., newly arrived cold-start nodes) settings. Conventional experimental setups (Hu et al. 2020;