
Algorithm 1: TUNEUP. Compared to the conventional training of a GNN (L2–5), TUNEUP introduces a two-stage training process and only adds two components (L8 and L12) that are straightforward to implement. Each parameter update of TUNEUP is as efficient as the conventional GNN training.

Given: GNN $F_\theta$, graph $G$, loss $L$, supervision $Y$, DropEdge ratio $\alpha$.
1: # First stage: Conventional training to obtain a base GNN.
2: while $\theta$ not converged do
3:   Make prediction $\hat{Y} = F_\theta(G)$.
4:   Compute loss $L(\hat{Y}, Y)$, compute gradient $\nabla_\theta L$, and update parameter $\theta$.
5: end while
6: # Set up for the second stage.
7: if task is semi-supervised node classification then
8:   Use $F_\theta$ to predict pseudo-labels on non-isolated, unlabeled nodes. Add the pseudo-labels into $Y$.
9: end if
10: # Second stage: Further training the base GNN with increased tail supervision.
11: while $\theta$ not converged do
12:   Synthesize tail nodes, i.e., randomly drop $\alpha$ of edges: $G \xrightarrow{\text{DropEdge}} \tilde{G}$.
13:   Make prediction $\hat{Y} = F_\theta(\tilde{G})$.
14:   Compute loss $L(\hat{Y}, Y)$, compute gradient $\nabla_\theta L$, and update parameter $\theta$.
15: end while
• Supervision $Y$: Whether a user node $u$ has interacted with an item node $v$ in $G$ (positive) or not (negative).
• GNN $F_\theta$: A model that takes $G$ as input and predicts the score for a pair of nodes $(u, v) \in V_{\text{user}} \times V_{\text{item}}$. Following Wang et al. (2019), the GNN parameter $\theta$ contains the input shallow embeddings in addition to the original message passing GNN parameters. To produce the score for the pair of nodes $(u, v)$, we generate the user and item embeddings, $z_u$ and $z_v$, and take the inner product $z_u^\top z_v$ to compute the score (Wang et al. 2019).
• Prediction $\hat{Y}$: The GNN's predicted scores over $V_{\text{user}} \times V_{\text{item}}$.
• Loss $L$: The BPR loss (Rendle et al. 2012).
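As a concrete illustration of this setup, the following is a minimal PyTorch-style sketch of the inner-product scoring and the BPR loss. The embedding tensors (`z_user`, `z_pos_item`, `z_neg_item`) are assumed to have already been produced by $F_\theta$, and all names are illustrative rather than taken from any particular implementation.

```python
import torch
import torch.nn.functional as F

def pair_score(z_u: torch.Tensor, z_v: torch.Tensor) -> torch.Tensor:
    # Score for a (user, item) pair: inner product z_u^T z_v.
    return (z_u * z_v).sum(dim=-1)

def bpr_loss(z_user: torch.Tensor,
             z_pos_item: torch.Tensor,
             z_neg_item: torch.Tensor) -> torch.Tensor:
    # BPR loss (Rendle et al. 2012): push the scores of observed (positive)
    # user-item pairs above the scores of sampled unobserved (negative) pairs.
    pos = pair_score(z_user, z_pos_item)
    neg = pair_score(z_user, z_neg_item)
    return -F.logsigmoid(pos - neg).mean()
```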
Conventional GNN Training
A conventional way to train a GNN (Kipf and Welling 2017) is to minimize the loss $L(\hat{Y}, Y)$ via gradient descent, as shown in L2–5 of Algorithm 1. Extension to mini-batch training (Hamilton, Ying, and Leskovec 2017; Zeng et al. 2020) is straightforward by sampling a subgraph of $G$ in each parameter update.
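For concreteness, below is a minimal sketch of this conventional (first-stage) training loop, assuming a generic `gnn` module, a `graph` input, a `supervision` target, and a `loss_fn` such as the BPR loss above. The Adam optimizer and the epoch-based stopping criterion stand in for "until convergence" and are illustrative choices, not part of the method.

```python
import torch

def train_conventional(gnn, graph, supervision, loss_fn, num_epochs=100, lr=1e-3):
    # First stage (Algorithm 1, L2-5): full-graph gradient descent on L(Y_hat, Y).
    optimizer = torch.optim.Adam(gnn.parameters(), lr=lr)
    for _ in range(num_epochs):              # stands in for "while theta not converged"
        optimizer.zero_grad()
        y_hat = gnn(graph)                   # L3: Y_hat = F_theta(G)
        loss = loss_fn(y_hat, supervision)   # L4: L(Y_hat, Y)
        loss.backward()                      # L4: gradient of L w.r.t. theta
        optimizer.step()                     # L4: parameter update
    return gnn
```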
Issue with Conventional Training. Conventional training
implicitly assumes GNNs can learn over all nodes equally
well. In practice, some nodes, such as low-degree tail nodes,
are more difficult for GNNs to learn due to the scarce neigh-
borhood information. As a result, GNNs trained with conven-
tional training often give poor predictive performance on the
difficult tail nodes (Liu, Nguyen, and Fang 2021).
¹ Node features are not available in many public recommender system datasets, and most existing works rely solely on edge connectivity to predict links.
TUNEUP: An Improved GNN Training
To resolve the issue, we present TUNEUP, a simple curriculum learning strategy, to improve GNN performance, especially on the difficult tail nodes. At a high level, TUNEUP first trains a GNN to perform well on the relatively easy head nodes. Then, it further trains the GNN to also perform well on the more difficult tail nodes.
Specifically, in the first stage (L2–5 in Algorithm 1), TUNEUP uses conventional GNN training to obtain a strong base GNN model. The base GNN model tends to perform well on head nodes, but poorly on tail nodes. To remedy this issue, in the second training stage, TUNEUP further trains the base GNN on synthetic tail nodes (L7–L15 in Algorithm 1). TUNEUP synthesizes supervised tail node data in two steps, detailed next: (1) synthesizing additional tail node inputs, and (2) adding target supervision on the synthetic tail nodes.
Synthesizing tail node inputs In many real-world graphs,
head nodes start off as tail nodes, e.g., well-cited paper nodes
are not cited at the beginning in a paper citation network,
and warm users (users with many item interactions) start off
as cold-start users in recommender systems. Hence, our key
idea is to synthesize tail nodes by systematically removing
edges from head nodes. There can be different ways to re-
move edges. In this work, we simply adopt DropEdge (Rong
et al. 2020) to instantiate our idea. DropEdge drops a certain portion (given by the hyperparameter $\alpha$) of edges randomly from the original graph $G$ to obtain $\tilde{G}$ (L12 in Algorithm 1). The resulting $\tilde{G}$ contains more nodes with low degrees, i.e., tail nodes, than the original graph $G$. Hence, the GNN sees more (synthetic) tail nodes as input during training.
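The edge-dropping step itself is only a few lines. Below is a minimal sketch assuming the graph's edges are stored as a `[2, num_edges]` `edge_index` tensor (the convention of common PyTorch GNN libraries); it removes exactly an $\alpha$ fraction of the listed edges, and handling of reverse edges in undirected graphs is left out for brevity.

```python
import torch

def drop_edge(edge_index: torch.Tensor, alpha: float) -> torch.Tensor:
    # Synthesize tail nodes by removing a random alpha-fraction of edges,
    # in the spirit of DropEdge (Rong et al. 2020): G -> G_tilde.
    num_edges = edge_index.size(1)
    num_keep = int(round((1.0 - alpha) * num_edges))
    keep = torch.randperm(num_edges)[:num_keep]   # random subset of edges to keep
    return edge_index[:, keep]
```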
Adding supervision on the synthetic tail nodes After
synthesizing the tail node inputs, TUNEUP then adds target
supervision (e.g., class labels for node classification, edges
for link prediction) on them so that a supervised loss can
be computed. Our key idea is to reuse the target labels on
the original head nodes for the synthesized tail nodes. The
rationale is that many prediction tasks involve target labels
that are inherent node properties that do not change with node
degree. For example, additional citations will not change a
paper’s subject area, and additional purchases will not change
a product’s category.
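Putting the two steps together, the second-stage loop (Algorithm 1, L11–15) differs from the conventional loop only in that predictions are made on the sparsified graph while the loss is still computed against the original supervision. The sketch below reuses the placeholder names from the earlier sketches and a hypothetical `sparsify` callable that applies DropEdge to the graph object.

```python
import torch

def train_tuneup_second_stage(gnn, graph, supervision, loss_fn, sparsify,
                              alpha=0.5, num_epochs=100, lr=1e-3):
    # Second stage (Algorithm 1, L11-15): further train the base GNN on
    # synthetic tail nodes while reusing the original supervision Y.
    optimizer = torch.optim.Adam(gnn.parameters(), lr=lr)
    for _ in range(num_epochs):                   # "while theta not converged"
        optimizer.zero_grad()
        graph_tilde = sparsify(graph, alpha)      # L12: G -> G_tilde via DropEdge
        y_hat = gnn(graph_tilde)                  # L13: Y_hat = F_theta(G_tilde)
        loss = loss_fn(y_hat, supervision)        # L14: L(Y_hat, Y), with Y unchanged
        loss.backward()
        optimizer.step()
    return gnn
```

Because each update still performs a single forward and backward pass over (a sparsified copy of) $G$, the per-update cost matches the conventional loop, consistent with the claim in the Algorithm 1 caption.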
Concretely, for link prediction tasks, TUNEUP directly reuses the original edges $E$ in $G$ (before dropping) for the target supervision on the synthetic tail nodes. For semi-supervised node classification, TUNEUP can similarly reuse the target labels of labeled nodes $V_{\text{labeled}}$ in $G$ as the labels for synthetic tail nodes in $\tilde{G}$. A critical challenge here is that the number of labeled nodes $V_{\text{labeled}}$ is often small in the semi-supervised setting, e.g., 1%–5% of all nodes $V$, limiting the amount of target label supervision TUNEUP can reuse.
To resolve this issue, TUNEUP applies the base GNN (obtained in the first training stage) over $G$ to predict pseudo-labels (Lee et al. 2013) over non-isolated nodes in $V_{\text{unlabeled}} \equiv V \setminus V_{\text{labeled}}$.² TUNEUP then includes the

² Note that the pseudo-labels do not need to be ones directly predicted by the base GNN. For example, one can apply C&S