
TRAINING GRAPH NEURAL NETWORKS ON GROWING STOCHASTIC GRAPHS
Juan Cerviño1, Luana Ruiz2, and Alejandro Ribeiro1
1Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, USA
2Simons-Berkeley Institute
ABSTRACT
Graph Neural Networks (GNNs) rely on graph convolutions to exploit meaningful patterns in networked data. Because they are based on matrix multiplications, these convolutions incur high computational costs, which limits scalability in practice. To overcome these limitations, existing methods propose training GNNs on a smaller number of nodes and then transferring the GNN to larger graphs. Although these methods can bound the difference between the outputs of the GNN on graphs with different numbers of nodes, they provide no guarantees with respect to the optimal GNN on the very large graph. In this paper, we propose to learn GNNs on very large graphs by leveraging the limit object of a sequence of growing graphs, the graphon. We propose to grow the size of the graph as we train, and we show that our proposed methodology, learning by transference, converges to a neighborhood of a first-order stationary point on the graphon data. A numerical experiment validates our proposed approach.
Index Terms—Graph Neural Networks, Graphons
1. INTRODUCTION
Graph Neural Networks (GNNs) are deep convolutional architectures for non-Euclidean data supported on graphs [1, 2]. GNNs are formed by a succession of layers, each of which is composed of a graph convolutional filter and a point-wise non-linearity. In practice, GNNs have shown state-of-the-art performance in several learning tasks [3, 4], and have found applications in biology [5–7], recommendation systems [8–10] and robotics [11, 12]. In theory, part of their success is credited to their stability to graph perturbations [2], the fact that they are invariant to relabelings [13, 14], their expressive power [15, 16], and their generalization capabilities [17].
The ability of GNNs to exploit patterns in network data is largely due to their graph convolutional layers. Graph convolutional filters leverage the graph diffusion sequence to extract features that are shared across the graph [2]. However, these filters rely on computing matrix-vector multiplications, which become computationally costly when the number of nodes is large. On the other hand, the number of parameters in a GNN is independent of the number of nodes due to its local parametrization, which motivates training GNNs on small graphs and then deploying them on larger graphs.
Several works have been proposed which exploit this transferability property of the GNN [18–20]. Generally speaking, these works upper bound the output difference between two GNNs with the same parameters supported on graphs with different numbers of nodes. They show that if the graphs belong to a family of graphs modeled by a graphon [21], the so-called transferability error decreases as the number of nodes increases. But while this result is useful, it does not guarantee that the optimal parameters achieved by training the GNN on the small graph are optimal, or sufficiently close to optimal, on the large graph. This subtle albeit crucial observation is what motivates our work.
Supported by NSF CCF 1717120, ARL DCIST CRA under Grant W911NF-17-2-0181, and Theorinet Simons.
More specifically, we leverage the transferability properties of GNNs to introduce a method that trains GNNs by successively increasing the size of the graph during the training process. We call this procedure learning by transference. We prove that the learning directions (gradients) on the graphon and on the graph are aligned as long as the number of nodes in the GNN is sufficiently large. Under a minimum graph size at each epoch, we show that our method converges to a neighborhood of a first-order stationary point on the graphon by taking gradient steps on the graph. We benchmark our procedure on a multi-agent system problem, where a GNN is trained to learn a decentralized control policy for agents communicating via a proximity graph.
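For concreteness, the following is a minimal, self-contained sketch of such a growing-graph training loop. It is illustrative only: a toy exponential graphon, a single graph convolutional filter in place of a full GNN, a synthetic target filter, and an ad hoc growth schedule are all assumptions made for the sake of the example, not the setup analyzed in this paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def graphon(u, v):
        # toy exponential graphon W(u, v); any symmetric kernel on [0, 1]^2 would do
        return 0.5 * np.exp(-np.abs(u[:, None] - v[None, :]))

    def sample_graph(n):
        # uniformly sample node labels and draw Bernoulli edges from the graphon
        u = rng.uniform(0.0, 1.0, n)
        A = (rng.uniform(size=(n, n)) < graphon(u, u)).astype(float)
        A = np.triu(A, 1)
        return (A + A.T) / n, u              # normalized GSO and node labels

    K = 3
    h = np.zeros(K)                          # filter coefficients being trained
    h_target = np.array([1.0, -0.5, 0.25])   # synthetic "optimal" filter (assumption)

    n, step = 50, 0.5
    for epoch in range(20):
        S, u = sample_graph(n)               # resample a larger graph each epoch
        x = np.cos(2 * np.pi * u)            # graph signal induced by a graphon signal
        powers = [np.linalg.matrix_power(S, k) @ x for k in range(K)]
        y = sum(hk * p for hk, p in zip(h, powers))
        y_target = sum(hk * p for hk, p in zip(h_target, powers))
        grad = np.array([p @ (y - y_target) for p in powers]) / n   # grad of (1/2n)||y - y_target||^2
        h -= step * grad                     # gradient step on the sampled graph
        n = int(np.ceil(1.2 * n))            # illustrative growth schedule

The analysis in this paper characterizes how large the sampled graph must be at each epoch for the gradient on the graph to remain aligned with the graphon gradient; the growth schedule above is only a placeholder.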
Related work. This work extends previous work in which the nodes are sampled from a regular partition of the graphon [22]. Here, we instead consider nodes that are sampled uniformly at random, which not only provides a more general model of GNNs, but also correlates more closely with graph signals seen in practice. This more general sampling strategy induces additional sampling errors (on top of the edge sampling error) [19], which worsen the approximation of graphons and graphon data by graphs and graph data, and thus require additional theoretical considerations.
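The distinction between the two node-sampling schemes can be made explicit in a couple of lines; the node count and the particular regular grid below are arbitrary illustrative choices.

    import numpy as np

    n = 8
    u_regular = (np.arange(n) + 0.5) / n                          # evenly spaced labels from a regular partition of [0, 1] (prior setting)
    u_uniform = np.random.default_rng(0).uniform(0.0, 1.0, n)     # uniform i.i.d. labels (the setting considered here)
    # In either case, edges are drawn as Bernoulli(W(u_i, u_j)) for a graphon W;
    # uniform labels add a node-sampling error on top of the edge-sampling error.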
2. GRAPH AND GRAPHON NEURAL NETWORKS
A graph is represented by the triplet $G_n = (\mathcal{V}, \mathcal{E}, W)$, where $\mathcal{V}$, with $|\mathcal{V}| = n$, is the set of nodes, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges, and $W : \mathcal{E} \to \mathbb{R}$ is a map assigning weights to the edges. Alternatively, we can represent the graph $G_n$ by its graph shift operator (GSO) $S_n \in \mathbb{R}^{n \times n}$. Examples of GSOs are the adjacency matrix $A$ and the graph Laplacian $L = \mathrm{diag}(A\mathbf{1}) - A$, to name a few.
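As a concrete illustration, both GSOs can be built for a toy three-node graph; the adjacency values below are arbitrary.

    import numpy as np

    # adjacency matrix of a toy 3-node graph (illustrative values)
    A = np.array([[0., 1., 1.],
                  [1., 0., 0.],
                  [1., 0., 0.]])
    L = np.diag(A @ np.ones(3)) - A   # graph Laplacian L = diag(A 1) - A
    S = A                             # either A or L can serve as the GSO S_n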
Data on a graph is defined as a graph signal $\mathbf{x} = [x_1, \ldots, x_n]$, where the $i$-th component of $\mathbf{x}$ corresponds to the value of the signal at node $i$. A graph signal $\mathbf{x}$ can be aggregated over the graph by applying the GSO $S_n$ as follows,
$$\mathbf{z} = S_n \mathbf{x}_n. \qquad (1)$$
Intuitively, the value of the signal $\mathbf{z}$ at coordinate $i$ is the weighted average of the information present in its one-hop neighborhood, i.e., $z_i = \sum_{j \in \mathcal{N}(i)} [S_n]_{ij}\, x_j$. We can construct $k$-hop diffusions by applying the $k$-th power of $S_n$.
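A minimal sketch of the aggregation in (1) and of a two-hop diffusion, using an arbitrary normalized GSO and signal:

    import numpy as np

    S = np.array([[0., 0.5, 0.5],
                  [0.5, 0., 0.],
                  [0.5, 0., 0.]])            # illustrative (normalized) GSO
    x = np.array([1.0, 2.0, 3.0])            # graph signal, one value per node
    z = S @ x                                # one-hop aggregation, Eq. (1)
    z2 = np.linalg.matrix_power(S, 2) @ x    # two-hop diffusion via S^2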
The graph convolution is then defined as a weighted average over the powers of $S_n$. Explicitly, letting the filter coefficients be $\mathbf{h} = [h_0, \ldots, h_{K-1}]$, the graph convolution is
$$\mathbf{y}_n = \mathbf{h} *_{S_n} \mathbf{x}_n = \sum_{k=0}^{K-1} h_k S_n^k \, \mathbf{x}_n. \qquad (2)$$
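The sum in (2) can be computed by repeatedly diffusing the signal, as in the sketch below; the GSO, signal, and filter taps are arbitrary illustrative values.

    import numpy as np

    def graph_convolution(S, x, h):
        # y = sum_k h_k S^k x, computed by repeatedly diffusing the signal
        y = np.zeros_like(x)
        diffused = x.copy()
        for hk in h:
            y += hk * diffused
            diffused = S @ diffused
        return y

    S = np.array([[0., 0.5, 0.5],
                  [0.5, 0., 0.],
                  [0.5, 0., 0.]])
    x = np.array([1.0, 2.0, 3.0])
    h = np.array([1.0, 0.5, 0.25])    # K = 3 arbitrary filter coefficients
    y = graph_convolution(S, x, h)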