
TRAINING GRAPH NEURAL NETWORKS ON GROWING STOCHASTIC GRAPHS
Juan Cerviño1, Luana Ruiz2, and Alejandro Ribeiro1
1Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, USA
2Simons-Berkeley Institute
ABSTRACT
Graph Neural Networks (GNNs) rely on graph convolutions to exploit meaningful patterns in networked data. Because they are based on matrix multiplications, these convolutions incur high computational costs, which limits scalability in practice. To overcome these limitations, existing methods propose training GNNs on a smaller number of nodes and then transferring the GNN to larger graphs. Although these methods can bound the difference between the outputs of the GNN on graphs with different numbers of nodes, they provide no guarantees with respect to the optimal GNN on the very large graph. In this paper, we propose to learn GNNs on very large graphs by leveraging the limit object of a sequence of growing graphs, the graphon. We propose to grow the size of the graph as we train, and we show that our proposed methodology, learning by transference, converges to a neighborhood of a first-order stationary point on the graphon data. A numerical experiment validates our proposed approach.
Index Terms—Graph Neural Networks, Graphons
1. INTRODUCTION
Graph Neural Networks (GNNs) are deep convolutional architectures for non-Euclidean data supported on graphs [1, 2]. GNNs are formed by a succession of layers, each of which is composed of a graph convolutional filter and a point-wise non-linearity. In practice, GNNs have shown state-of-the-art performance in several learning tasks [3, 4], and have found applications in biology [5–7], recommendation systems [8–10] and robotics [11, 12]. In theory, part of their success is credited to their stability to graph perturbations [2], the fact that they are invariant to relabelings [13, 14], their expressive power [15, 16], and their generalization capabilities [17].
The ability of GNNs to exploit patterns in network data is largely due to their graph convolutional layers. Graph convolutional filters leverage the graph diffusion sequence to extract features that are shared across the graph [2]. However, these filters rely on computing matrix-vector multiplications, which become computationally costly when the number of nodes is large. On the other hand, the number of parameters in a GNN is independent of the number of nodes due to its local parametrization, which motivates training GNNs on small graphs and then deploying them on larger graphs.
Several works have been proposed which exploit this transferability property of the GNN [18–20]. Generally speaking, these works upper bound the output difference between two GNNs with the same parameters supported on graphs with different numbers of nodes. They show that if the graphs belong to a family of graphs modeled by a graphon [21], the so-called transferability error decreases as the number of nodes increases. But while this result is useful, it does not guarantee that the optimal parameters achieved by training the GNN on the small graph are optimal, or sufficiently close to optimal, on the large graph. This subtle albeit crucial observation is what motivates our work.
Supported by NSF CCF 1717120, ARL DCIST CRA under Grant W911NF-17-2-0181, and Theorinet Simons.
More specifically, we leverage the transferability properties of GNNs to introduce a method that trains GNNs by successively increasing the size of the graph during the training process. We call this procedure learning by transference. We prove that the learning directions (gradients) on the graphon and on the graph are aligned as long as the number of nodes in the GNN is sufficiently large. Under a minimum graph size at each epoch, we show that our method converges to a neighborhood of a first-order stationary point on the graphon by taking gradient steps on the graph. We benchmark our procedure on a multi-agent system problem, where a GNN is trained to learn a decentralized control policy for agents communicating via a proximity graph.
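For concreteness, the following is a minimal, self-contained sketch of such a growing-graph training loop. It is illustrative only: a toy exponential graphon, a single graph convolutional filter in place of a full GNN, a synthetic target filter, and an ad hoc growth schedule are all assumptions made for the sake of the example, not the setup analyzed in this paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def graphon(u, v):
        # toy exponential graphon W(u, v); any symmetric kernel on [0, 1]^2 would do
        return 0.5 * np.exp(-np.abs(u[:, None] - v[None, :]))

    def sample_graph(n):
        # uniformly sample node labels and draw Bernoulli edges from the graphon
        u = rng.uniform(0.0, 1.0, n)
        A = (rng.uniform(size=(n, n)) < graphon(u, u)).astype(float)
        A = np.triu(A, 1)
        return (A + A.T) / n, u              # normalized GSO and node labels

    K = 3
    h = np.zeros(K)                          # filter coefficients being trained
    h_target = np.array([1.0, -0.5, 0.25])   # synthetic "optimal" filter (assumption)

    n, step = 50, 0.5
    for epoch in range(20):
        S, u = sample_graph(n)               # resample a larger graph each epoch
        x = np.cos(2 * np.pi * u)            # graph signal induced by a graphon signal
        powers = [np.linalg.matrix_power(S, k) @ x for k in range(K)]
        y = sum(hk * p for hk, p in zip(h, powers))
        y_target = sum(hk * p for hk, p in zip(h_target, powers))
        grad = np.array([p @ (y - y_target) for p in powers]) / n   # grad of (1/2n)||y - y_target||^2
        h -= step * grad                     # gradient step on the sampled graph
        n = int(np.ceil(1.2 * n))            # illustrative growth schedule

The analysis in this paper characterizes how large the sampled graph must be at each epoch for the gradient on the graph to remain aligned with the graphon gradient; the growth schedule above is only a placeholder.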
Related work. This work extends previous work in which the nodes are sampled from a regular partition of the graphon [22]. Here, we instead consider nodes that are sampled uniformly at random, which not only provides a more general model of GNNs, but also correlates more closely with graph signals seen in practice. This more general sampling strategy induces additional sampling errors (on top of the edge sampling error) [19], which worsen the approximation of graphons and graphon data by graphs and graph data, and thus require additional theoretical considerations.
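The distinction between the two node-sampling schemes can be made explicit in a couple of lines; the node count and the particular regular grid below are arbitrary illustrative choices.

    import numpy as np

    n = 8
    u_regular = (np.arange(n) + 0.5) / n                          # evenly spaced labels from a regular partition of [0, 1] (prior setting)
    u_uniform = np.random.default_rng(0).uniform(0.0, 1.0, n)     # uniform i.i.d. labels (the setting considered here)
    # In either case, edges are drawn as Bernoulli(W(u_i, u_j)) for a graphon W;
    # uniform labels add a node-sampling error on top of the edge-sampling error.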
2. GRAPH AND GRAPHON NEURAL NETWORKS
A graph is represented by the triplet $G_n = (\mathcal{V}, \mathcal{E}, W)$, where $\mathcal{V}$, with $|\mathcal{V}| = n$, is the set of nodes, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges, and $W : \mathcal{E} \to \mathbb{R}$ is a map assigning weights to the edges. Alternatively, we can represent the graph $G_n$ by its graph shift operator (GSO) $S_n \in \mathbb{R}^{n \times n}$. Examples of GSOs are the adjacency matrix $A$ and the graph Laplacian $L = \mathrm{diag}(A\mathbf{1}) - A$, to name a few.
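As a concrete illustration, both GSOs can be built for a toy three-node graph; the adjacency values below are arbitrary.

    import numpy as np

    # adjacency matrix of a toy 3-node graph (illustrative values)
    A = np.array([[0., 1., 1.],
                  [1., 0., 0.],
                  [1., 0., 0.]])
    L = np.diag(A @ np.ones(3)) - A   # graph Laplacian L = diag(A 1) - A
    S = A                             # either A or L can serve as the GSO S_n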
Data on a graph is defined as a graph signal $\mathbf{x} = [x_1, \ldots, x_n]$, where the $i$-th component of $\mathbf{x}$ corresponds to the value of the signal at node $i$. A graph signal $\mathbf{x}$ can be aggregated over the graph by applying the GSO $S_n$ as follows,
$$\mathbf{z} = S_n \mathbf{x}_n. \qquad (1)$$
Intuitively, the value of the signal $\mathbf{z}$ at coordinate $i$ is the weighted average of the information present in its one-hop neighborhood, i.e., $z_i = \sum_{j \in \mathcal{N}(i)} [S_n]_{ij}\, x_j$. We can construct $k$-hop diffusions by applying the $k$-th power of $S_n$.
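A minimal sketch of the aggregation in (1) and of a two-hop diffusion, using an arbitrary normalized GSO and signal:

    import numpy as np

    S = np.array([[0., 0.5, 0.5],
                  [0.5, 0., 0.],
                  [0.5, 0., 0.]])            # illustrative (normalized) GSO
    x = np.array([1.0, 2.0, 3.0])            # graph signal, one value per node
    z = S @ x                                # one-hop aggregation, Eq. (1)
    z2 = np.linalg.matrix_power(S, 2) @ x    # two-hop diffusion via S^2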
The graph convolution is then defined as a weighted average over the powers of $S_n$. Explicitly, letting the filter coefficients be $\mathbf{h} = [h_0, \ldots, h_{K-1}]$, the graph convolution is
$$\mathbf{y}_n = \mathbf{h} *_{S_n} \mathbf{x}_n = \sum_{k=0}^{K-1} h_k S_n^k \, \mathbf{x}_n. \qquad (2)$$
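The sum in (2) can be computed by repeatedly diffusing the signal, as in the sketch below; the GSO, signal, and filter taps are arbitrary illustrative values.

    import numpy as np

    def graph_convolution(S, x, h):
        # y = sum_k h_k S^k x, computed by repeatedly diffusing the signal
        y = np.zeros_like(x)
        diffused = x.copy()
        for hk in h:
            y += hk * diffused
            diffused = S @ diffused
        return y

    S = np.array([[0., 0.5, 0.5],
                  [0.5, 0., 0.],
                  [0.5, 0., 0.]])
    x = np.array([1.0, 2.0, 3.0])
    h = np.array([1.0, 0.5, 0.25])    # K = 3 arbitrary filter coefficients
    y = graph_convolution(S, x, h)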