TRAINING GRAPH NEURAL NETWORKS ON GROWING STOCHASTIC GRAPHS
Juan Cerviño1, Luana Ruiz2, and Alejandro Ribeiro1
1Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, USA
2Simons-Berkeley Institute
ABSTRACT
Graph Neural Networks (GNNs) rely on graph convolutions to exploit meaningful patterns in networked data. Because they are based on matrix multiplications, these convolutions incur high computational costs, which limits their scalability in practice. To overcome these limitations, existing methods train GNNs on graphs with a smaller number of nodes and then transfer the GNN to larger graphs. Even though these methods are able to bound the difference between the outputs of the GNN on graphs with different numbers of nodes, they do not provide guarantees with respect to the optimal GNN on the very large graph. In this paper, we propose to learn GNNs on very large graphs by leveraging the limit object of a sequence of growing graphs, the graphon. We propose to grow the size of the graph as we train, and we show that our proposed methodology – learning by transference – converges to a neighborhood of a first order stationary point on the graphon data. A numerical experiment validates our proposed approach.
Index Terms— Graph Neural Networks, Graphons
1. INTRODUCTION
Graph Neural Networks (GNNs) are deep convolutional architectures for non-Euclidean data supported on graphs [1, 2]. GNNs are formed by a succession of layers, each of which is composed of a graph convolutional filter and a point-wise non-linearity. In practice, GNNs have shown state-of-the-art performance in several learning tasks [3, 4], and have found applications in biology [5–7], recommendation systems [8–10], and robotics [11, 12]. In theory, part of their success is credited to their stability to graph perturbations [2], the fact that they are invariant to relabelings [13, 14], their expressive power [15, 16], and their generalization capabilities [17].

The ability of GNNs to exploit patterns in network data is largely due to their graph convolutional layers. Graph convolutional filters leverage the graph diffusion sequence to extract features that are shared across the graph [2]. However, these filters rely on computing matrix-vector multiplications, which become computationally costly when the number of nodes is large. On the other hand, the number of parameters in a GNN is independent of the number of nodes due to its local parametrization, which motivates training GNNs on small graphs and then deploying them on larger graphs.
Several works have been proposed which exploit this transferability property of the GNN [18–20]. Generally speaking, these works upper bound the output difference between two GNNs with the same parameters supported on graphs with different numbers of nodes. They show that if the graphs belong to a family of graphs modeled by a graphon [21], the so-called transferability error decreases as the number of nodes increases. But while this result is useful, it does not guarantee that the optimal parameters achieved by training the GNN on the small graph are optimal, or sufficiently close to optimal, on the large graph. This subtle albeit crucial observation is what motivates our work.

Supported by NSF CCF 1717120, ARL DCIST CRA under Grant W911NF-17-2-0181, and Theorinet Simons.
More specifically, we leverage the transferability properties of GNNs to introduce a method that trains GNNs by successively increasing the size of the graph during the training process. We call this procedure learning by transference. We prove that the learning directions (gradients) on the graphon and on the graph are aligned as long as the number of nodes in the GNN is sufficiently large. Under a minimum graph size at each epoch, we show that our method converges to a neighborhood of a first order stationary point on the graphon by taking gradient steps on the graph. We benchmark our procedure on a multi-agent system problem, where a GNN is trained to learn a decentralized control policy for agents communicating via a proximity graph.
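As a rough illustration of the overall procedure (a schematic sketch under our own simplifying assumptions, not the exact algorithm analyzed in Section 3), the training loop grows the sampled graph between epochs while reusing the same GNN parameters. Here `sample_graph` and `loss_grad` are hypothetical helpers standing in for the graphon sampling of Section 2.2 and the gradient of the empirical loss, and the growth factor and step size are placeholders rather than the schedule prescribed by our analysis.

```python
import numpy as np

def learn_by_transference(sample_graph, loss_grad, H0, n0=50, growth=1.2,
                          epochs=20, step=1e-2):
    """Schematic sketch of learning by transference (simplified illustration).

    sample_graph(n) -> (S, x): draws an n-node graph and signal from the graphon model.
    loss_grad(S, x, H)       : gradient of the empirical loss of the GNN w.r.t. H.
    H0                       : initial list of filter coefficient tensors.
    """
    H, n = list(H0), n0
    for epoch in range(epochs):
        S, x = sample_graph(n)                           # sample a graph of the current size
        grad = loss_grad(S, x, H)                        # gradient computed on the sampled graph
        H = [Hl - step * Gl for Hl, Gl in zip(H, grad)]  # gradient step on the GNN parameters
        n = int(np.ceil(growth * n))                     # grow the graph before the next epoch
    return H
```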
Related work. This work extends upon previous work that considers the nodes to be sampled from a regular partition of the graphon [22]. Here, we consider the nodes to be sampled uniformly at random, which not only provides a more general model of GNNs, but also correlates more closely with graph signals seen in practice. This more general sampling strategy induces further sampling errors (in addition to the edge sampling error) [19], which worsen the approximation of graphons and graphon data by graphs and graph data, thus requiring additional theoretical considerations.
2. GRAPH AND GRAPHON NEURAL NETWORKS
A graph is represented by the triplet $G_n = (\mathcal{V}, \mathcal{E}, W)$, where $\mathcal{V}$, $|\mathcal{V}| = n$, is the set of nodes, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges, and $W : \mathcal{E} \to \mathbb{R}$ is a map assigning weights to the edges. Alternatively, we can represent the graph $G_n$ by its graph shift operator (GSO) $S_n \in \mathbb{R}^{n \times n}$. Examples of GSOs include the adjacency matrix $A$ and the graph Laplacian $L = \mathrm{diag}(A\mathbf{1}) - A$, to name a few.
Data on graphs is defined as a graph signal $x = [x_1, \dots, x_n]$, where the $i$th component of $x$ corresponds to the value of the signal on node $i$. A graph signal $x$ can be aggregated through the graph by applying the GSO $S_n$ as follows,

$$z = S_n x_n. \qquad (1)$$

Intuitively, the value of the signal $z$ at coordinate $i$ is the weighted average of the information present in its one-hop neighborhood, i.e., $z_i = \sum_{j \in \mathcal{N}(i)} [S_n]_{ij} x_j$. We can construct $k$-hop diffusions by applying the $k$th power of $S_n$. We define the graph convolution as a weighted average over the powers of $S_n$. Explicitly, letting the coefficients be $h = [h_0, \dots, h_{K-1}]$, we define the graph convolution as

$$y_n = h *_{S_n} x_n = \sum_{k=0}^{K-1} h_k S_n^k x_n, \qquad (2)$$

where $x_n, y_n$ are graph signals and $*_{S_n}$ denotes the convolution operator with GSO $S_n$.
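As an illustration of (2) (a minimal sketch, not an implementation used in our experiments), the graph convolution can be computed with $K$ matrix-vector multiplications by recursively diffusing the signal:

```python
import numpy as np

def graph_convolution(S, x, h):
    """Apply the graph convolution y = sum_k h[k] * S^k x, cf. (2).

    S : (n, n) graph shift operator, e.g. the adjacency matrix.
    x : (n,)   graph signal.
    h : (K,)   filter coefficients [h_0, ..., h_{K-1}].
    """
    y = np.zeros_like(x, dtype=float)
    z = x.astype(float)              # z holds S^k x, starting with k = 0
    for hk in h:
        y += hk * z                  # accumulate h_k S^k x
        z = S @ z                    # next diffusion step: S^{k+1} x
    return y

# Example: a 4-node cycle graph with a random signal and K = 3 taps
S = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
x = np.random.randn(4)
y = graph_convolution(S, x, np.array([1.0, 0.5, 0.25]))
```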
In the case of undirected graphs, $S_n/n$ is symmetric, admitting a spectral decomposition $S_n/n = V \Lambda V^T$, where the columns of $V$ are the eigenvectors of $S_n$ and $\Lambda$ is a diagonal matrix with eigenvalues $-1 \le \lambda_n \le \dots \le \lambda_1 \le 1$. Since the eigenvectors of $S_n$ form a basis of $\mathbb{R}^n$, we can project the filter $y_n$ onto this basis to obtain

$$h(\lambda) = \sum_{k=0}^{K-1} h_k \lambda^k. \qquad (3)$$

Note that $h(\lambda)$ only depends on the coefficients $h_k$ and on the eigenvalues of the GSO. By the Cayley–Hamilton theorem, convolutional filters may be used to represent any graph filter with spectral representation $h(\lambda) = f(\lambda)$, where $f$ is analytic.
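The equivalence between the node-domain filter (2) and its spectral representation (3) can be checked numerically; the following sketch (an illustration with an arbitrary symmetric GSO, not part of our method) filters a signal both ways and verifies that the results coincide:

```python
import numpy as np

def spectral_response(lambdas, h):
    """Evaluate h(lambda) = sum_k h[k] * lambda**k for each eigenvalue, cf. (3)."""
    return sum(hk * lambdas**k for k, hk in enumerate(h))

n = 6
A = np.random.rand(n, n)
S = (A + A.T) / 2                               # symmetric GSO
S /= np.abs(np.linalg.eigvalsh(S)).max()        # normalize eigenvalues to [-1, 1]

lambdas, V = np.linalg.eigh(S)                  # S = V diag(lambdas) V^T
h = np.array([0.2, -0.4, 0.3])
x = np.random.randn(n)

# Node-domain filtering vs. filtering eigenvalue by eigenvalue in the spectral domain
y_node = sum(hk * np.linalg.matrix_power(S, k) @ x for k, hk in enumerate(h))
y_spec = V @ (spectral_response(lambdas, h) * (V.T @ x))
assert np.allclose(y_node, y_spec)
```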
Graph Neural Networks are layered architectures composed of graph convolutions followed by point-wise non-linearities. Formally, we introduce a point-wise non-linearity $\rho$ and stack all the graph signals at layer $l$ in a matrix $X_l = [x_{nl}^1, \dots, x_{nl}^{F_l}] \in \mathbb{R}^{n \times F_l}$, where $F_l$ indicates the number of features at layer $l$. Notice that for $l = 0$ the input matrix is a concatenation of the input signals, i.e., $X_0 = [x_n, \dots, x_n]$. We can define the $l$th layer of a GNN as

$$X_l = \rho\left( \sum_{k=0}^{K-1} S_n^k X_{l-1} H_{lk} \right), \qquad (4)$$

where the matrix $H_{lk} \in \mathbb{R}^{F_{l-1} \times F_l}$ contains the $k$th coefficients of the graph convolutions at layer $l$. By grouping all the learnable parameters we obtain a more succinct representation of the GNN as $\phi(x; \mathcal{H}, S)$, with $\mathcal{H} = \{H_{lk}\}_{l,k}$, $1 \le l \le L$, and $0 \le k \le K-1$.
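For concreteness, a single GNN layer (4) and the forward pass of the full architecture can be sketched as follows (a minimal illustration with ReLU as the point-wise non-linearity; the tensor shapes are our own convention, not a prescribed implementation):

```python
import numpy as np

def gnn_layer(S, X, H):
    """One GNN layer X_l = rho( sum_k S^k X_{l-1} H_{lk} ), cf. (4).

    S : (n, n)            graph shift operator.
    X : (n, F_in)         features at the previous layer.
    H : (K, F_in, F_out)  filter coefficients, one F_in x F_out matrix per tap k.
    """
    n, _ = X.shape
    K, _, F_out = H.shape
    Z = np.zeros((n, F_out))
    diffused = X.copy()                  # holds S^k X, starting at k = 0
    for k in range(K):
        Z += diffused @ H[k]             # accumulate S^k X H_{lk}
        diffused = S @ diffused
    return np.maximum(Z, 0.0)            # rho = ReLU (point-wise non-linearity)

def gnn_forward(S, X0, layers):
    """Stack the layers: `layers` is a list of coefficient tensors H, one per layer."""
    X = X0
    for H in layers:
        X = gnn_layer(S, X, H)
    return X
```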
2.1. Graphon Neural Networks
Graphons are the limit object of a converging sequence of dense undirected graphs. Formally, a graphon is a symmetric, bounded, and measurable function $\mathbf{W} : [0,1]^2 \to [0,1]$. Sequences of dense graphs converge to a graphon in the sense that the densities of adjacency-preserving graph motifs converge to the densities of these same motifs on the graphon [18, 21].
Analogously to graph signals, we can define a graphon signal as a function $X \in L^2([0,1])$. A graphon signal can be diffused over the graphon by applying the linear integral operator

$$(T_\mathbf{W} X)(v) = \int_0^1 \mathbf{W}(u, v) X(u)\, du, \qquad (5)$$

which we denote the graphon shift operator (WSO). Since the graphon $\mathbf{W}$ is bounded and symmetric, $T_\mathbf{W}$ is a Hilbert–Schmidt, self-adjoint operator [23]. Therefore, we can express $\mathbf{W}$ through its eigendecomposition, $\mathbf{W}(u,v) = \sum_{i \in \mathbb{Z} \setminus \{0\}} \lambda_i \varphi_i(u) \varphi_i(v)$, where $\lambda_i$ is the eigenvalue associated with the eigenfunction $\varphi_i$. The absolute value of the eigenvalues is bounded by 1, and the eigenfunctions form an orthonormal basis of $L^2([0,1])$. Utilizing the graphon shift, we define the graphon convolution with parameters $h = [h_0, \dots, h_{K-1}]$ as

$$Y = T_h X = h *_\mathbf{W} X = \sum_{k=0}^{K-1} h_k (T_\mathbf{W}^{(k)} X)(v), \quad \text{with} \qquad (6)$$

$$(T_\mathbf{W}^{(k)} X)(v) = \int_0^1 \mathbf{W}(u, v) (T_\mathbf{W}^{(k-1)} X)(u)\, du, \qquad (7)$$

where $X, Y$ are graphon signals, $*_\mathbf{W}$ denotes the convolution operator, and $T_\mathbf{W}^{(0)} = I$ is the identity [24]. Projecting the filter (6) onto the eigenbasis $\{\varphi_i\}_{i \in \mathbb{Z} \setminus \{0\}}$, the graphon convolution admits a spectral representation given by

$$h(\lambda) = \sum_{k=0}^{K-1} h_k \lambda^k. \qquad (8)$$

Like its graph counterpart (cf. (3)), the graphon convolution admits a spectral representation that only depends on the coefficients of the filter $h$.
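Although (6)–(7) are defined through integrals, the graphon convolution can be approximated numerically by evaluating $\mathbf{W}$ on a uniform grid and replacing each integral with a Riemann sum; the sketch below illustrates this (the grid size and the example graphon are our own choices, not part of the method):

```python
import numpy as np

def graphon_convolution(W, X, h, m=1000):
    """Approximate Y = sum_k h[k] T_W^{(k)} X on a uniform grid of m points, cf. (6)-(7).

    W : callable W(u, v) in [0, 1], the graphon (symmetric).
    X : callable X(u), the graphon signal.
    h : (K,) filter coefficients.
    Returns the grid and the filtered signal evaluated on it.
    """
    u = (np.arange(m) + 0.5) / m               # midpoint grid on [0, 1]
    Wm = W(u[:, None], u[None, :])             # m x m evaluation of the graphon
    Xm = X(u)                                  # signal evaluated on the grid
    Y = np.zeros(m)
    Z = Xm.copy()                              # holds T_W^{(k)} X, starting at k = 0
    for hk in h:
        Y += hk * Z
        Z = Wm @ Z / m                         # Riemann sum approximating the integral
    return u, Y

# Example with a simple rank-one graphon and a smooth signal
W = lambda u, v: u * v
X = lambda u: np.cos(2 * np.pi * u)
u, Y = graphon_convolution(W, X, np.array([1.0, 0.5]))
```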
Graphon Neural Networks (WNNs) are the extension of GNNs to graphon data. Each layer of a WNN is composed of a graphon convolution (6) and a non-linearity $\rho$. Denoting by $F_l$ the number of features at layer $l$, we can group the parameters of the $F_{l-1} \times F_l$ convolutions into $K$ matrices $\{H_{lk}\} \in \mathbb{R}^{F_{l-1} \times F_l}$ and write the $f$th feature of the $l$th layer as

$$X_l^f = \rho\left( \sum_{g=1}^{F_{l-1}} \sum_{k=0}^{K-1} (T_\mathbf{W}^{(k)} X_{l-1}^g) [H_{lk}]_{gf} \right) \qquad (9)$$

for $1 \le f \le F_l$. For an $L$-layer WNN, $X_0^g$ is given by the input data $X^g$ for $1 \le g \le F_0$, (9) is repeated for $1 \le l \le L$, and the output of the WNN is given by $Y = X_L$. Analogously to GNNs, a more succinct representation of the WNN can be obtained by grouping all the coefficients, i.e., $\phi(X; \mathcal{H}, \mathbf{W})$, with $\mathcal{H} = \{H_{lk}\}_{l,k}$, $1 \le l \le L$, and $0 \le k \le K-1$.
2.2. From Graphons to Graphs, and Back
In this paper we focus on graphons and graphon signals as generative models of graphs and graph signals. Let $\{u_i\}_{i=1}^n$ be $n$ points sampled independently and uniformly at random from $[0,1]$, $u_i \sim \mathrm{unif}(0,1)$. The $n$-node stochastic GSO of the graph $G_n$ is obtained from the graphon $\mathbf{W}$ as follows,

$$[S_n]_{ij} = [S_n]_{ji} \sim \mathrm{Bernoulli}(\mathbf{W}(u_i, u_j)). \qquad (10)$$

A graph signal $x$ can be obtained by evaluating the graphon signal,

$$[x]_i = X(u_i) \quad \text{for all } i \in [n]. \qquad (11)$$

Therefore, we can obtain stochastic graphs $S_n$ and graph signals $x$ from graphon data $\mathbf{W}, X$.
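The generative model (10)–(11) is straightforward to simulate; the following sketch (an illustration with an arbitrary example graphon, not code from our experiments) draws a stochastic GSO and graph signal from graphon data:

```python
import numpy as np

def sample_graph_from_graphon(W, X, n, rng=None):
    """Sample an n-node stochastic GSO and graph signal from graphon data (W, X), cf. (10)-(11)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(0.0, 1.0, size=n)              # node labels u_i ~ unif(0, 1)
    P = W(u[:, None], u[None, :])                  # edge probabilities W(u_i, u_j)
    upper = rng.random((n, n)) < P                 # one Bernoulli draw per pair
    S = np.triu(upper, k=1)                        # keep i < j, drop self-loops
    S = (S | S.T).astype(float)                    # enforce [S]_ij = [S]_ji
    x = X(u)                                       # [x]_i = X(u_i)
    return S, x

# Example: an exponential graphon and a sinusoidal graphon signal
W = lambda u, v: np.exp(-2.0 * np.abs(u - v))
X = lambda u: np.sin(2 * np.pi * u)
S, x = sample_graph_from_graphon(W, X, n=200)
```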
A graphon can be induced by a graph even if the node labels $\{u_i\}_{i=1}^n$ are unknown. Let $S_n \in \mathbb{R}^{n \times n}$ be the GSO of a graph, and define $u_i = (i-1)/n$ for $1 \le i \le n$. We construct the intervals $I_i = [u_i, u_{i+1})$ for $1 \le i \le n$, with $u_{n+1} = 1$. Letting $\mathbb{1}$ be the indicator function, the induced graphon $\mathbf{W}_S$ and graphon signal $X_n$ can be obtained as

$$\mathbf{W}_S(u, v) = \sum_{i=1}^n \sum_{j=1}^n [S_n]_{ij}\, \mathbb{1}(u \in I_i)\, \mathbb{1}(v \in I_j), \quad \text{and} \qquad (12)$$

$$X_n(u) = \sum_{i=1}^n [x_n]_i\, \mathbb{1}(u \in I_i). \qquad (13)$$
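Conversely, the induced graphon and graphon signal of (12)–(13) are step functions that can be evaluated directly; a minimal sketch (reusing the sampled S and x from the previous illustration) is given below:

```python
import numpy as np

def induced_graphon(S, x):
    """Induced graphon W_S and graphon signal X_n of a graph (S, x), cf. (12)-(13)."""
    n = S.shape[0]

    def interval_index(u):
        # Index i of the interval I_i = [(i-1)/n, i/n) containing u (0-based).
        return np.minimum((np.asarray(u) * n).astype(int), n - 1)

    def W_S(u, v):
        return S[interval_index(u), interval_index(v)]

    def X_n(u):
        return x[interval_index(u)]

    return W_S, X_n

# Usage: evaluate the step-function signal induced by the sampled graph above
W_S, X_n = induced_graphon(S, x)
u = np.linspace(0.0, 1.0, 500)
values = X_n(u)                    # piecewise-constant version of x on [0, 1]
```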
3. LEARNING BY TRANSFERENCE
We are interested in solving a statistical learning problem where the data is supported on very large graphs. In the limit, this corresponds to a graphon, so we consider the problem of learning a WNN.

Let $\ell : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be a non-negative loss function such that $\ell(x, y) = 0$ if and only if $x = y$, and let $p(X, Y)$ be an unknown