TUNEUP: A Simple Improved Training Strategy for Graph Neural Networks
Weihua Hu1, Kaidi Cao1, Kexin Huang1, Edward W Huang2,
Karthik Subbian2, Kenji Kawaguchi3, Jure Leskovec1.
1Stanford University, 2Amazon, 3National University of Singapore
{weihuahu,kaidicao,kexinh,jure}@cs.stanford.edu, {ksubbian,ewhuang}@amazon.com, kenji@comp.nus.edu.sg
Abstract
Despite recent advances in Graph Neural Networks (GNNs), their training strategies remain largely under-explored. The conventional training strategy learns over all nodes in the original graph(s) equally, which can be sub-optimal as certain nodes are often more difficult to learn than others. Here we present TUNEUP, a simple curriculum-based training strategy for improving the predictive performance of GNNs. TUNEUP trains a GNN in two stages. In the first stage, TUNEUP applies conventional training to obtain a strong base GNN. The base GNN tends to perform well on head nodes (nodes with large degrees) but less so on tail nodes (nodes with small degrees). Therefore, the second stage of TUNEUP focuses on improving prediction on the difficult tail nodes by further training the base GNN on synthetically generated tail node data. We theoretically analyze TUNEUP and show it provably improves generalization performance on tail nodes. TUNEUP is simple to implement and applicable to a broad range of GNN architectures and prediction tasks. Extensive evaluation of TUNEUP on five diverse GNN architectures, three types of prediction tasks, and both transductive and inductive settings shows that TUNEUP significantly improves the performance of the base GNN on tail nodes, while often even improving the performance on head nodes. Altogether, TUNEUP produces up to 57.6% and 92.2% relative predictive performance improvement in the transductive and the challenging inductive settings, respectively.
Introduction
Graph Neural Networks (GNNs) are one of the most successful and widely used paradigms for representation learning on graphs, achieving state-of-the-art performance on a variety of prediction tasks, such as semi-supervised node classification (Kipf and Welling 2017; Velickovic et al. 2018), link prediction (Hamilton, Ying, and Leskovec 2017; Kipf and Welling 2016), and recommender systems (Ying et al. 2018; He et al. 2020). There has been a surge of work on improving GNN model architectures (Velickovic et al. 2018; Xu et al. 2019, 2018; Shi et al. 2020; Klicpera, Bojchevski, and Günnemann 2019; Wu et al. 2019; Zhao and Akoglu 2019; Li et al. 2019; Chen et al. 2020; Li et al. 2021) and task-specific losses (Kipf and Welling 2016; Rendle et al. 2012; Verma et al. 2021; Huang et al. 2021b). Despite all these advances, strategies for training a GNN on a given supervised loss remain largely under-explored. Existing work has focused on minimizing the given loss averaged over nodes in the original graph(s), which neglects the fact that some nodes are more difficult to learn than others.
Here we present TUNEUP, a simple training strategy for improving the predictive performance of GNNs. The key motivation behind TUNEUP is that GNNs tend to under-perform on tail nodes, i.e., nodes with small (e.g., 0–5) node degrees, because such nodes have few neighbors to aggregate features from (Liu, Nguyen, and Fang 2021). Improving GNN performance on tail nodes is important since they are prevalent in real-world scale-free graphs (Clauset, Shalizi, and Newman 2009) as well as among newly arriving cold-start nodes (Lika, Kolomvatsos, and Hadjiefthymiades 2014).

The key idea of TUNEUP is to adopt curriculum-based training (Bengio et al. 2009): it first trains a GNN to perform well on the relatively easy head nodes. It then further trains the GNN to perform well on the more difficult tail nodes by minimizing the loss over supervised tail node data that is synthetically generated via pseudo-labeling and dropping edges (Rong et al. 2020).
We theoretically analyze TUNEUP and show that it provably improves the generalization performance on tail nodes by utilizing information from head nodes. Our theory also justifies how TUNEUP generates supervised synthetic tail nodes in the second stage, and suggests that both pseudo-labeling and dropping edges are crucial for improved generalization.

TUNEUP is simple to implement on top of the conventional training pipeline of GNNs, as shown in Algorithm 1. Thanks to its simplicity, TUNEUP can be readily used with a wide range of GNN models and supervised losses; hence, it is applicable to many node- and edge-level prediction tasks. This is in contrast with recent methods for improving GNN performance on tail nodes (Liu, Nguyen, and Fang 2021; Zheng et al. 2022; Zhang et al. 2022; Kang et al. 2022), all of which require non-trivial modifications of both GNN architectures and losses, making them harder to implement and less applicable to diverse prediction tasks.

To demonstrate the effectiveness and broad applicability of TUNEUP, we perform extensive experiments on a wide range of settings. We consider five diverse GNN architectures, three types of key prediction tasks (semi-supervised node classification, link prediction, and recommender systems) with a total of six datasets, as well as both transductive
Figure 1: Degree-specific predictive performance of the base GNN (trained with conventional training) and the TUNEUP GNN in the transductive setting, shown for three tasks: semi-supervised node classification, link prediction, and recommender systems. Between "Base" and "TUNEUP", only the training strategy differs; the model architecture and the loss function stay exactly the same. The x-axis represents the node degrees in the training graph, and the y-axis is the predictive performance averaged over nodes with specific degrees. We see from the dotted blue curves that the base GNN tends to perform poorly on tail nodes, i.e., nodes with small degrees. Our TUNEUP (denoted by the solid orange curves) gives more accurate GNNs than conventional training ("Base"). TUNEUP improves the predictive performance across almost all node degrees, but most significantly on tail nodes.
(i.e., prediction on nodes seen during training) and inductive (i.e., prediction on new nodes not seen during training) settings. For the inductive setting, we additionally consider the challenging cold-start scenario (i.e., limited edge connectivity from new nodes) by randomly removing certain portions of edges from new nodes.

Across the settings, TUNEUP produces consistent improvements in the predictive performance of GNNs. In the transductive setting, TUNEUP significantly improves the performance of base GNNs on tail nodes, while oftentimes even improving the performance on head nodes (see Figure 1). In the inductive setting, TUNEUP especially shines in the cold-start prediction scenario, where new nodes are tail-like, producing up to 92.2% relative improvement in predictive performance. Moreover, TUNEUP significantly outperforms recent specialized methods for tail nodes (Liu, Nguyen, and Fang 2021; Zheng et al. 2022; Zhang et al. 2022; Kang et al. 2022), while not requiring any modification to GNN architectures or losses. Overall, our work shows that even a simple training strategy can yield a surprisingly large improvement in the predictive performance of GNNs, pointing to a promising direction: investigating effective training strategies for GNNs, beyond architectures and losses.
General Setup and Conventional Training
Here we introduce a general task setup for machine learning on graphs and review the conventional training strategy of GNNs.
General Setup
We are given a graph $G = (V, E)$, with a set of nodes $V$ and edges $E$, possibly associated with some features. A GNN $F_\theta$, parameterized by $\theta$, takes the graph $G$ as input and makes a prediction $\hat{Y}$ for the task of interest. The loss function $L$ measures the discrepancy between the GNN's prediction $\hat{Y}$ and the target supervision $Y$. When input node features are available, the GNN $F_\theta$ can make not only transductive predictions, i.e., predictions over the existing nodes $V$, but also inductive predictions (Hamilton, Ying, and Leskovec 2017), i.e., predictions over new nodes $V_{\mathrm{new}}$ that are not yet present in $V$. This is a general task setup that covers many representative prediction tasks over graphs as special cases:
Semi-supervised node classification (Kipf and Welling 2017)
Graph $G$: A graph with input node features.
Supervision $Y$: Class labels of the labeled nodes $V_{\mathrm{labeled}} \subseteq V$.
GNN $F_\theta$: A model that takes $G$ as input and predicts class probabilities over $V$.
Prediction $\hat{Y}$: The GNN's predictions over $V_{\mathrm{labeled}}$.
Loss $L$: Cross-entropy loss.
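As an illustration of the setup above, the following is a minimal sketch in PyTorch Geometric (not from the paper's code); the two-layer GCN, hidden width, and the `train_mask`/`y` field names follow common PyG conventions and are assumptions here.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    """F_theta: takes the graph (node features x, edge_index) and outputs class logits over V."""
    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, num_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)  # logits for every node in V

def node_classification_loss(model, data):
    """Loss L: cross-entropy restricted to the labeled nodes V_labeled."""
    logits = model(data.x, data.edge_index)   # prediction Y_hat over V
    mask = data.train_mask                    # indicator of V_labeled
    return F.cross_entropy(logits[mask], data.y[mask])
```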
Link prediction (Kipf and Welling 2016)
Graph $G$: A graph with input node features.
Supervision $Y$: Whether node $s \in V$ is linked to node $t \in V$ in $G$ (positive) or not (negative).
GNN $F_\theta$: A model that takes $G$ as input and predicts the score for a pair of nodes $(s, t) \in V \times V$. Specifically, the model generates an embedding $z_v$ for each node $v \in V$ and uses an MLP over the Hadamard product between $z_s$ and $z_t$ to predict the score for the pair $(s, t)$ (Grover and Leskovec 2016).
Prediction $\hat{Y}$: The GNN's predicted scores over $V \times V$.
Loss $L$: The Bayesian Personalized Ranking (BPR) loss (Rendle et al. 2012), which encourages the predicted score for the positive pair $(s, t_{\mathrm{pos}})$ to be higher than that for the negative pair $(s, t_{\mathrm{neg}})$ for each source node $s \in V$.
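To make the scoring head and BPR loss above concrete, here is a minimal sketch (not the paper's implementation); the MLP width and the single sampled negative per positive are illustrative assumptions, and `z` denotes node embeddings produced by any GNN encoder.

```python
import torch
import torch.nn.functional as F

class LinkScorer(torch.nn.Module):
    """Scores a pair (s, t) with an MLP over the Hadamard product z_s * z_t."""
    def __init__(self, emb_dim, hid_dim=128):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(emb_dim, hid_dim),
            torch.nn.ReLU(),
            torch.nn.Linear(hid_dim, 1),
        )

    def forward(self, z, src, dst):
        # z: [num_nodes, emb_dim] node embeddings; src, dst: index tensors of node pairs.
        return self.mlp(z[src] * z[dst]).squeeze(-1)

def bpr_loss(scorer, z, src, pos_dst, neg_dst):
    """BPR loss: the score of (s, t_pos) should exceed the score of (s, t_neg)."""
    pos_score = scorer(z, src, pos_dst)
    neg_score = scorer(z, src, neg_dst)
    return -F.logsigmoid(pos_score - neg_score).mean()
```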
Recommender systems (Wang et al. 2019) A recommender system performs link prediction between user nodes $V_{\mathrm{user}}$ and item nodes $V_{\mathrm{item}}$.
Graph $G$: A user-item bipartite graph without input node features.¹
Algorithm 1: TUNEUP. Compared to the conventional training of a GNN (L2–5), TUNEUP introduces a two-stage training process and only adds two components (L8 and L12) that are straightforward to implement. Each parameter update of TUNEUP is as efficient as conventional GNN training.

Given: GNN $F_\theta$, graph $G$, loss $L$, supervision $Y$, DropEdge ratio $\alpha$.
1: # First stage: conventional training to obtain a base GNN.
2: while $\theta$ not converged do
3:   Make prediction $\hat{Y} = F_\theta(G)$.
4:   Compute loss $L(\hat{Y}, Y)$, compute gradient $\nabla_\theta L$, and update parameter $\theta$.
5: end while
6: # Set up for the second stage.
7: if task is semi-supervised node classification then
8:   Use $F_\theta$ to predict pseudo-labels on non-isolated, unlabeled nodes. Add the pseudo-labels into $Y$.
9: end if
10: # Second stage: further train the base GNN with increased tail supervision.
11: while $\theta$ not converged do
12:   Synthesize tail nodes, i.e., randomly drop a fraction $\alpha$ of edges: $\tilde{G} \leftarrow \mathrm{DropEdge}(G)$.
13:   Make prediction $\hat{Y} = F_\theta(\tilde{G})$.
14:   Compute loss $L(\hat{Y}, Y)$, compute gradient $\nabla_\theta L$, and update parameter $\theta$.
15: end while
Supervision $Y$: Whether a user node $u$ has interacted with an item node $v$ in $G$ (positive) or not (negative).
GNN $F_\theta$: A model that takes $G$ as input and predicts the score for a pair of nodes $(u, v) \in V_{\mathrm{user}} \times V_{\mathrm{item}}$. Following Wang et al. (2019), the GNN parameter $\theta$ contains the input shallow embeddings in addition to the original message-passing GNN parameters. To produce the score for the pair of nodes $(u, v)$, we generate the user and item embeddings, $z_u$ and $z_v$, and take their inner product $z_u^{\top} z_v$ to compute the score (Wang et al. 2019).
Prediction $\hat{Y}$: The GNN's predicted scores over $V_{\mathrm{user}} \times V_{\mathrm{item}}$.
Loss $L$: The BPR loss (Rendle et al. 2012).

¹We consider the feature-less setting because input node features are not available in many public recommender system datasets, and most existing works rely solely on edge connectivity to predict links.
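For concreteness, a minimal sketch of this feature-less scoring scheme follows (not the paper's code); the embedding dimension is an assumption, and the message-passing layers that would refine these shallow embeddings over the bipartite graph are omitted for brevity.

```python
import torch
import torch.nn.functional as F

class ShallowEmbeddingRecModel(torch.nn.Module):
    """The shallow user/item embeddings are part of theta; a message-passing GNN
    over the user-item bipartite graph (omitted here) would further refine them."""
    def __init__(self, num_users, num_items, emb_dim=64):
        super().__init__()
        self.user_emb = torch.nn.Embedding(num_users, emb_dim)
        self.item_emb = torch.nn.Embedding(num_items, emb_dim)

    def score(self, users, items):
        z_u = self.user_emb(users)       # user embeddings z_u
        z_v = self.item_emb(items)       # item embeddings z_v
        return (z_u * z_v).sum(dim=-1)   # inner product z_u^T z_v

def rec_bpr_loss(model, users, pos_items, neg_items):
    """BPR loss over (user, positive item, sampled negative item) triples."""
    pos = model.score(users, pos_items)
    neg = model.score(users, neg_items)
    return -F.logsigmoid(pos - neg).mean()
```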
Conventional GNN Training
A conventional way to train a GNN (Kipf and Welling 2017) is to minimize the loss $L(\hat{Y}, Y)$ via gradient descent, as shown in L2–5 of Algorithm 1; a minimal code sketch is given below. Extension to mini-batch training (Hamilton, Ying, and Leskovec 2017; Zeng et al. 2020) is straightforward by sampling a subgraph of $G$ in each parameter update.
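A minimal full-graph sketch of this loop follows; the Adam optimizer, learning rate, and fixed epoch count stand in for the unspecified convergence criterion and are assumptions, not the paper's exact settings.

```python
import torch

def conventional_training(model, data, loss_fn, lr=0.01, epochs=200):
    """First stage / baseline: minimize L(Y_hat, Y) on the original graph G."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model, data)   # e.g., node_classification_loss sketched earlier
        loss.backward()               # compute gradient w.r.t. theta
        optimizer.step()              # update parameter theta
    return model
```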
Issue with Conventional Training. Conventional training implicitly assumes GNNs can learn over all nodes equally well. In practice, some nodes, such as low-degree tail nodes, are more difficult for GNNs to learn due to the scarce neighborhood information. As a result, GNNs trained with conventional training often give poor predictive performance on the difficult tail nodes (Liu, Nguyen, and Fang 2021).
TUNEUP: An Improved GNN Training
To resolve the issue, we present TUNEUP, a simple curriculum learning strategy, to improve GNN performance, especially on the difficult tail nodes. At a high level, TUNEUP first trains a GNN to perform well on the relatively easy head nodes. Then, it further trains the GNN to also perform well on the more difficult tail nodes.

Specifically, in the first stage (L2–5 in Algorithm 1), TUNEUP uses conventional GNN training to obtain a strong base GNN model. The base GNN model tends to perform well on head nodes, but poorly on tail nodes. To remedy this issue, in the second training stage, TUNEUP further trains the base GNN on synthetic tail nodes (L7–L15 in Algorithm 1). TUNEUP synthesizes supervised tail node data in two steps, detailed next: (1) synthesizing additional tail node inputs, and (2) adding target supervision on the synthetic tail nodes.
Synthesizing tail node inputs In many real-world graphs, head nodes start off as tail nodes, e.g., well-cited paper nodes are not cited at the beginning in a paper citation network, and warm users (users with many item interactions) start off as cold-start users in recommender systems. Hence, our key idea is to synthesize tail nodes by systematically removing edges from head nodes. There can be different ways to remove edges. In this work, we simply adopt DropEdge (Rong et al. 2020) to instantiate our idea. DropEdge randomly drops a certain portion (given by the hyperparameter $\alpha$) of edges from the original graph $G$ to obtain $\tilde{G}$ (L12 in Algorithm 1). The resulting $\tilde{G}$ contains more nodes with low degrees, i.e., tail nodes, than the original graph $G$. Hence, the GNN sees more (synthetic) tail nodes as input during training.
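A minimal sketch of this edge-dropping step on a `[2, num_edges]` edge index tensor follows; treating each directed edge independently (rather than dropping both directions of an undirected edge together) is a simplifying assumption.

```python
import torch

def drop_edges(edge_index, alpha):
    """DropEdge: randomly keep a (1 - alpha) fraction of edges to obtain G_tilde.

    edge_index: LongTensor of shape [2, num_edges] describing the graph G.
    alpha: DropEdge ratio, e.g., 0.5.
    """
    num_edges = edge_index.size(1)
    keep = torch.rand(num_edges) >= alpha   # Bernoulli mask over edges
    return edge_index[:, keep]              # sparser graph with more (synthetic) tail nodes
```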
Adding supervision on the synthetic tail nodes After synthesizing the tail node inputs, TUNEUP then adds target supervision (e.g., class labels for node classification, edges for link prediction) on them so that a supervised loss can be computed. Our key idea is to reuse the target labels on the original head nodes for the synthesized tail nodes. The rationale is that many prediction tasks involve target labels that are inherent node properties that do not change with node degree. For example, additional citations will not change a paper's subject area, and additional purchases will not change a product's category.

Concretely, for link prediction tasks, TUNEUP directly reuses the original edges $E$ in $G$ (before dropping) as the target supervision on the synthetic tail nodes. For semi-supervised node classification, TUNEUP can similarly reuse the target labels of the labeled nodes $V_{\mathrm{labeled}}$ in $G$ as the labels for the synthetic tail nodes in $\tilde{G}$. A critical challenge here is that the number of labeled nodes $V_{\mathrm{labeled}}$ is often small in the semi-supervised setting, e.g., 1%–5% of all nodes $V$, limiting the amount of target label supervision TUNEUP can reuse.
To resolve this issue, TUNEUP applies the base GNN (obtained in the first training stage) over $G$ to predict pseudo-labels (Lee et al. 2013) over the non-isolated nodes in $V_{\mathrm{unlabeled}} = V \setminus V_{\mathrm{labeled}}$.² TUNEUP then includes the pseudo-labels as supervision $Y$ in the second stage (L8 in Algorithm 1). This significantly increases the size of the supervision $Y$, e.g., by a factor of roughly 100 if only 1% of nodes are labeled. While the pseudo-labels can be noisy, they are "best guesses" made by the base GNN in the sense that they are predicted using the full graph information $G$ as input. In the second stage, TUNEUP trains the GNN to maintain its "best guesses" given the sparser graph $\tilde{G}$ as input, which encourages the GNN to perform well on nodes whose neighbors are actually scarce. Note that this strategy is fundamentally different from the classical pseudo-labeling method (Lee et al. 2013), which trains a model without sparsifying the input graph. In the following sections, we will see this both theoretically and empirically.

²Note that the pseudo-labels do not need to be the ones directly predicted by the base GNN. For example, one can apply C&S post-processing (Huang et al. 2021a) to improve the quality of the pseudo-labels, which we leave for future work.
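Putting these pieces together for semi-supervised node classification, the sketch below mirrors L7–L15 of Algorithm 1; it reuses the `drop_edges` helper sketched earlier, and the optimizer settings, epoch count, and the way pseudo-labels are stored are assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def add_pseudo_labels(model, data):
    """L8: pseudo-label unlabeled, non-isolated nodes with the base GNN on the full graph G."""
    logits = model(data.x, data.edge_index)
    pseudo = logits.argmax(dim=-1)
    deg = torch.bincount(data.edge_index[0], minlength=data.num_nodes)
    y = data.y.clone()
    y[~data.train_mask] = pseudo[~data.train_mask]            # keep true labels where available
    mask = data.train_mask | (~data.train_mask & (deg > 0))   # labeled + non-isolated unlabeled
    return y, mask

def tuneup_second_stage(model, data, alpha=0.5, lr=0.01, epochs=200):
    """L10-L15: further train the base GNN on synthetically generated tail nodes."""
    y, mask = add_pseudo_labels(model, data)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        edge_index = drop_edges(data.edge_index, alpha)   # L12: synthesize tail node inputs
        logits = model(data.x, edge_index)                # L13: predict on the sparser graph
        loss = F.cross_entropy(logits[mask], y[mask])     # L14: loss over the enlarged supervision Y
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```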
Theoretical Analysis
To theoretically understand TUNEUP with clean insights, we consider node classification with binary labels for a part of a graph with two extreme groups of nodes: ones with full degrees and ones with zero degrees. Considering this part of the graph as an example, we mathematically show how TUNEUP uses the nodes with high degrees to improve the generalization for the nodes with low degrees via the curriculum-based training strategy.

We analyze the generalization gap between the test errors of nodes with low degrees and the training errors of nodes with high degrees. This type of generalization is non-standard and does not necessarily happen unless we take advantage of some additional mechanisms such as dropping edges. Define $d$ to be the dimensionality of the feature space of the last hidden layer of the GNN. Denote by $m$ the size of the set of labeled nodes used for training. Let $Q$ be the average training loss at the end of the first stage of TUNEUP curriculum learning.
We prove a theorem (Theorem 1) that shows the following three statements:

(i) First, consider TUNEUP without pseudo-labeling (denoted by $M_1$). $M_1$ helps reduce the test errors of nodes with low degrees by utilizing the nodes with high degrees via dropping edges: i.e., the generalization bound in our theorem decreases towards zero at the rate of $\sqrt{d/m}$.

(ii) The full TUNEUP (denoted by $M_2$) further reduces the test errors of nodes with low degrees by utilizing pseudo-labels: i.e., the rate of $\sqrt{d/m}$ is replaced by the rate of $\sqrt{1/m} + Q$, where typically $Q \approx 0$ as $Q$ is explicitly minimized as a training objective. Thus, curriculum-based training with pseudo-labels can remove the factor $d$.

(iii) TUNEUP without DropEdge (denoted by $M_3$), i.e., the classical pseudo-labeling method, degrades the test errors of nodes with low degrees by incurring an additional error term $\tau > 0$ that measures the difference between the losses with and without edges. This is consistent with the above intuition that generalizing from high-degree training nodes to the low-degree test nodes requires some relationship between the ones with and without edges.
For each method $M \in \{M_i\}_{i=1}^{3}$, we define $\Delta(M)$ to be the generalization gap between the test errors of nodes with low degrees and the training errors of nodes with high degrees.

Theorem 1. For any $\delta > 0$, with probability at least $1 - \delta$, the following holds for all $M \in \{M_1, M_2, M_3\}$:
$$\Delta(M) \le \sqrt{\frac{\mathbb{1}\{M = M_1\}\, 8(d-1)\ln(16em/\delta) + 8\ln(16em/\delta)}{m}} + \mathbb{1}\{M \neq M_1\}\, Q + \mathbb{1}\{M = M_3\}\, \tau + G,$$
where $G \to 0$ as the graph size approaches infinity.
Proof. A more detailed version of Theorem 1, along with the complete proof, is presented in the Appendix.
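As a worked reading of the bound, evaluating the indicator functions for each method gives:

\begin{align*}
\Delta(M_1) &\le \sqrt{\frac{8(d-1)\ln(16em/\delta) + 8\ln(16em/\delta)}{m}} + G = \sqrt{\frac{8d\ln(16em/\delta)}{m}} + G,\\
\Delta(M_2) &\le \sqrt{\frac{8\ln(16em/\delta)}{m}} + Q + G,\\
\Delta(M_3) &\le \sqrt{\frac{8\ln(16em/\delta)}{m}} + Q + \tau + G.
\end{align*}

The $\sqrt{d/m}$-type rate for $M_1$, the dimension-free $\sqrt{1/m}$-type rate plus $Q$ for $M_2$, and the additional $\tau$ penalty for $M_3$ correspond to statements (i)–(iii) above.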
Related Work
Methods for Tail Nodes
Recently, a surge of methods has been developed for improving the predictive performance of GNNs on tail nodes (Liu, Nguyen, and Fang 2021; Zheng et al. 2022; Kang et al. 2022; Zhang et al. 2022). These methods augment GNNs with complicated tail-node-specific architectural components and losses, whereas TUNEUP focuses on the training strategy and does not require any architectural or loss modification.
Data augmentation for GNNs
The second stage of TUNEUP is a form of data augmentation over graphs, on which there has been abundant work (Zhao et al. 2021; Feng et al. 2020; Verma et al. 2021; Kong et al. 2020; Liu et al. 2022; Ding et al. 2022). The most relevant method is DropEdge (Rong et al. 2020), which was originally developed to overcome the over-smoothing issue of GNNs (Li, Han, and Wu 2018) specific to semi-supervised node classification. Our work has a different motivation and an expanded scope: we use DropEdge to synthesize tail node inputs and consider a wider range of prediction tasks. Our theoretical analysis also differs and focuses on generalization on tail nodes. Methodologically, TUNEUP additionally employs curriculum learning and pseudo-labels, both of which are crucial in improving GNN performance over vanilla DropEdge.
Experiments
We evaluate the broad applicability of TUNEUP by considering five GNN models and testing them on the three prediction tasks (semi-supervised node classification, link prediction, and recommender systems) with three predictive settings: transductive, inductive, and cold-start inductive predictions.
Experimental Settings
We evaluate TUNEUP on realistic tail node scenarios in both transductive (i.e., naturally occurring tail nodes in scale-free networks) and inductive (i.e., newly arrived cold-start nodes) settings. Conventional experimental setups (Hu et al. 2020;