FOSR: FIRST-ORDER SPECTRAL REWIRING FOR
ADDRESSING OVERSQUASHING IN GNNS
Kedar Karhadkar
UCLA
kedar@math.ucla.edu
Pradeep Kr. Banerjee
MPI MiS
pradeep@mis.mpg.de
Guido Montúfar
UCLA & MPI MiS
montufar@math.ucla.edu
ABSTRACT
Graph neural networks (GNNs) are able to leverage the structure of graph data by
passing messages along the edges of the graph. While this allows GNNs to learn
features depending on the graph structure, for certain graph topologies it leads to
inefficient information propagation and a problem known as oversquashing. This
has recently been linked with the curvature and spectral gap of the graph. On the
other hand, adding edges to the message-passing graph can lead to increasingly
similar node representations and a problem known as oversmoothing. We propose
a computationally efficient algorithm that prevents oversquashing by systematically
adding edges to the graph based on spectral expansion. We combine this with a
relational architecture, which lets the GNN preserve the original graph structure
and provably prevents oversmoothing. We find experimentally that our algorithm
outperforms existing graph rewiring methods in several graph classification tasks.
1 INTRODUCTION
Graph neural networks (GNNs) (Gori et al.,2005;Scarselli et al.,2008) are a broad class of models
which process graph-structured data by passing messages between nodes of the graph. Due to the
versatility of graphs, GNNs have been applied to a variety of domains, such as chemistry, social
networks, knowledge graphs, and recommendation systems (Zhou et al.,2020;Wu et al.,2020).
GNNs broadly follow a message-passing framework, meaning that each layer of the GNN aggregates
the representations of a node and its neighbors, and transforms these features into a new representation
for that node. The aggregation function used by the GNN layer is taken to be locally permutation-
invariant, since the ordering of the neighbors of a node is arbitrary, and its specific form is a key
component of the GNN architecture; varying it gives rise to several common GNN variants (Kipf and
Welling,2017;Veliˇ
ckovi´
c et al.,2018;Li et al.,2015;Hamilton et al.,2017;Xu et al.,2019). The
output of a GNN can be used for tasks such as graph classification or node classification.
Although GNNs are successful in computing dependencies between nodes of a graph, they have
been found to suffer from a limited capacity to capture long-range interactions. For a fixed graph,
this is caused by a variety of problems depending on the number of layers in the GNN. Since graph
convolutions are local operations, a GNN with a small number of layers can only provide a node with
information from nodes close to itself. For a GNN with $\ell$ layers, the receptive field of a node (the set of nodes it receives messages from) is exactly the ball of radius $\ell$ about the node. For small values of $\ell$, this results in "underreaching", and directly limits which functions the GNN can represent. On a related note, the functions representable by GNNs with $\ell$ layers are limited to those computable by $\ell$ steps of the Weisfeiler-Lehman (WL) graph isomorphism test (Morris et al., 2019; Xu et al., 2019; Barceló et al., 2020). On the other hand, increasing the number of layers leads to its own set of
problems. In contrast to other architectures that benefit from the expressivity of deeper networks,
GNNs experience a decrease in accuracy as the number of layers increases (Li et al.,2018;Chen
et al.,2020). This phenomenon has partly been attributed to “oversmoothing”, where repeated graph
convolutions eventually render node features indistinguishable (Li et al.,2018;Oono and Suzuki,
2020;Cai and Wang,2020;Zhao and Akoglu,2020;Rong et al.,2020;Di Giovanni et al.,2022).
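To make the receptive-field statement above concrete, here is a small sketch (using networkx; the path graph and the choice of two layers are illustrative assumptions, not an example from the paper) that computes the $\ell$-hop ball around a node, i.e. the set of nodes an $\ell$-layer message-passing GNN can draw information from.

```python
import networkx as nx

def receptive_field(G: nx.Graph, node, num_layers: int) -> set:
    """Nodes within graph distance `num_layers` of `node`: the receptive field
    of a `num_layers`-layer message-passing GNN at that node."""
    dist = nx.single_source_shortest_path_length(G, node, cutoff=num_layers)
    return set(dist)

# Toy example: on a path graph, 2 layers only reach 2 hops from node 0.
G = nx.path_graph(10)
print(receptive_field(G, node=0, num_layers=2))  # {0, 1, 2}
```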
Separate from oversmoothing is the problem of “oversquashing” first pointed out by Alon and Yahav
(2021). As the number of layers of a GNN increases, information from (potentially) exponentially-growing receptive fields needs to be concurrently propagated at each message-passing step. This leads
to a bottleneck that causes oversquashing, when an exponential amount of information is squashed
into fixed-size node vectors (Alon and Yahav,2021). Consequently, for prediction tasks relying on
long-range interactions, the GNN can fail. Oversquashing usually occurs when there are enough
layers in the GNN to reach any node (the receptive fields are large enough), but few enough that
the GNN cannot process all of the necessary relations between nodes. Hence, for a fixed graph,
the problems of underreaching, oversquashing, and oversmoothing occur in three different regimes,
depending on the number of layers of the GNN.
Figure 1: Top: Schematic showing different rewiring methods, FoSR (ours), SDRF (Topping et al., 2022), and G-RLEF (Banerjee et al., 2022), for alleviating structural bottlenecks in the input graph. Our method adds new edges that are labeled differently from the existing ones so that the GNN can distinguish them in training. Bottom: Normalized spectral gap and training accuracy as functions of the number of rewiring iterations for a learning task modeled on the NEIGHBORSMATCH problem for a path-of-cliques input (for details, see Appendix B.1.1).
A common approach to addressing oversquashing
is to rewire the input graph, making changes to its
edges so that it has fewer structural bottlenecks. A
simple approach to rewiring is to make the last layer
of the GNN fully adjacent, allowing all nodes to
interact with one another (Alon and Yahav,2021).
Alternatively, one can make changes to edges of
the input graph, feeding the modified graph into all
layers of the GNN (Topping et al.,2022;Banerjee
et al.,2022). The latter approaches can be viewed
as optimizing the spectral gap of the input graph for
alleviating structural bottlenecks and improving the
overall quality of signal propagation across nodes
(see Figure 1).
While these rewiring methods improve the connectivity of the graph, there are drawbacks to making too many modifications to the input. The most obvious problem is that we are losing out on topological information about the original graph. If the structure of the original graph is indeed relevant, adding and removing edges diminishes that benefit to the task. Another issue arises from the smoothing effects of adding edges: If we add too many edges to the input graph, an ordinary GCN will suffer from oversmoothing (Li et al., 2018). In other words, if we use this natural approach to rewiring, we experience a trade-off between oversquashing and oversmoothing. This observation, which does not seem to have been pointed out in earlier works, is the main motivation for the approach that we develop in this work.
1.1 MAIN CONTRIBUTIONS
This paper presents a new framework for rewiring a graph to reduce oversquashing in GNNs while
preventing oversmoothing. Here are our main contributions:
We introduce a framework for graph rewiring which can be used with any rewiring method that
sequentially adds edges. In contrast to previous approaches that only modify the input graph (e.g.,
Topping et al.,2022;Banerjee et al.,2022;Bober et al.,2022), our solution gives special labels to
the added edges. We then use a relational GNN on this new graph, with the relations corresponding
to whether the edge was originally in the input graph or added during the rewiring. This allows
us to preserve the input graph topology while using the new edges to improve its connectivity. In
Theorem 3 we show that this approach also prevents oversmoothing.
We introduce a new rewiring method, FoSR (First-order Spectral Rewiring), aimed at optimizing the spectral gap of the graph input to the GNN (Algorithm 1). This algorithm computes the first-order change in the spectral gap from adding each candidate edge, and then adds the edge which maximizes this change (Theorem 4 and Proposition 5); a simplified sketch of the underlying objective is given after this list.
We empirically demonstrate that the proposed method results in faster spectral expansion (a marker
of reduced oversquashing) and improved test accuracy against several baselines on several graph
classification tasks (see Table 1). Experiments demonstrate that the relational structure preserving
the original input graph significantly boosts test accuracy.
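To make the rewiring objective concrete, the following is a minimal sketch of greedy spectral-gap rewiring. It recomputes the spectral gap exactly for every candidate edge, whereas FoSR's Algorithm 1 avoids this recomputation with a first-order estimate (Theorem 4); the toy graph, edge budget, and function names here are illustrative assumptions, not the paper's implementation.

```python
import itertools
import networkx as nx
import numpy as np

def spectral_gap(G: nx.Graph) -> float:
    """Second-smallest eigenvalue of the normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    L = nx.normalized_laplacian_matrix(G).toarray()
    return float(np.sort(np.linalg.eigvalsh(L))[1])

def greedy_gap_rewire(G: nx.Graph, num_edges: int):
    """Greedily add `num_edges` new edges, each chosen to maximize the spectral gap.
    Added edges are tagged so a relational GNN can treat them as a second relation."""
    G = G.copy()
    added = []
    for _ in range(num_edges):
        candidates = [(u, v) for u, v in itertools.combinations(G.nodes, 2)
                      if not G.has_edge(u, v)]
        best = max(candidates,
                   key=lambda e: spectral_gap(nx.Graph(list(G.edges) + [e])))
        G.add_edge(*best, relation=2)  # relation 2 marks rewired edges
        added.append(best)
    return G, added

# Toy example: two cliques joined by a single bridge (a structural bottleneck).
G = nx.barbell_graph(5, 0)
G_rewired, new_edges = greedy_gap_rewire(G, num_edges=3)
print(new_edges, spectral_gap(G), spectral_gap(G_rewired))
```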
1.2 RELATED WORKS
Past approaches to reducing oversquashing have hinged upon choosing a measure of oversquashing,
and modifying the edges of the graph to minimize it. Topping et al. (2022) argue that negatively curved edges are responsible for oversquashing, drawing on curvature notions from Forman (2003) and Ollivier (2009). They introduce a rewiring method known as Stochastic Discrete Ricci Flow (SDRF),
which aims to increase the balanced Forman curvature of negatively curved edges by adding new
edges. Bober et al. (2022) extend this line of investigation by considering the same type of rewiring
but using different notions of discrete curvature. Banerjee et al. (2022) approach oversquashing from an information-theoretic viewpoint, measuring it in terms of the spectral gap of the graph, and demonstrate empirically that increasing the spectral gap can improve accuracy on certain graph classification tasks. They propose a rewiring algorithm, greedy random local edge flip (G-RLEF), motivated by an expander graph construction and employing an edge sampling strategy based on effective resistance (Lyons and Peres, 2017). The work of Alon and Yahav (2021), which first identified oversquashing, also introduced an approach to rewiring, where they made the last GNN layer an expander, namely the complete graph, which allows every pair of nodes to connect to each other. They also experimented with making the last layer partially adjacent (randomly including any potential edge). This can be thought of as a form of spectral expansion in the final layer, since random graphs have a high spectral gap (Friedman, 1991). In contrast to these works, our method gives a practical way of achieving the largest possible increase in the spectral gap with the smallest possible modification of the input graph, and in fact preserves the input graph topology via a relational structure.
Although not as closely related, we find it worthwhile to also point to the following works in this general context. Prior to the diagnosis of the oversquashing problem, Klicpera et al. (2019) used graph
diffusion to rewire the input graph, improving long-range connectivity for the GNN. Rewiring can
also be performed while training a GNN. Arnaiz-Rodríguez et al. (2022) use first-order spectral
methods to define a loss function depending on the adjacency matrix, allowing a GNN to learn
a rewiring that alleviates oversquashing. We should mention that aside from rewiring the input
graph, some works pursue different approaches to solve oversquashing, such as creating positional
embeddings for the nodes or edges inspired by the transformer architecture (Vaswani et al.,2017).
The most direct generalization of this approach to graphs is using Laplacian embeddings (Kreuzer
et al.,2021;Dwivedi and Bresson,2020). Brüel-Gabrielsson et al. (2022) combine this with adding
neighbors to encode the edges which are the result of multiple hops.
2 PRELIMINARIES
2.1 BACKGROUND ON SPECTRAL GRAPH THEORY
Let $G = (V, E, R)$ be an undirected graph with node set $V$, $|V| = n$, edge set $E$, $|E| = m$, and relation set $R$. The set $R$ is a finite set of relation types, and elements $(u, v, r) \in E$ consist of a pair of nodes $u, v \in V$ together with an associated relation type $r \in R$. When the relation type of an edge is not relevant, we will simply write $(u, v)$ for an edge. For each $v \in V$ we define $N(v)$ to consist of all neighbors of $v$, that is, all $u \in V$ such that there exists an edge $(u, v) \in E$. For each $r \in R$ and $v \in V$, we define $N_r(v)$ to consist of all neighbors of $v$ of relation type $r$. The degree $d_v$ of a node $v \in V$ is the number of neighbors of $v$. We define the adjacency matrix $A = A(G)$ by $A_{ij} = 1$ if $(i, j) \in E$, and $A_{ij} = 0$ otherwise. Let $D = D(G)$ denote the diagonal matrix of degrees given by $D_{ii} = d_i$. The normalized Laplacian $L = L(G)$ is defined as $L = I - D^{-1/2} A D^{-1/2}$. We will often add self-loops (edges $(i, i)$ for $i \in V$) to the graphs we consider, so we define augmented versions of the above matrices corresponding to graphs with self-loops added. If $G$ is a graph without self-loops, we define its augmented adjacency matrix $\tilde{A} := I + A$, its augmented degree matrix $\tilde{D} = I + D$, and its augmented Laplacian $\tilde{L} = I - \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$.
We denote the eigenvalues of the normalized Laplacian $L$ by $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n \leq 2$. Let $\mathbf{1}$ denote the constant function which assumes the value 1 on each node. Then $D^{1/2}\mathbf{1}$ is an eigenfunction of $L$ with eigenvalue $0$. The spectral gap of $G$ is $\lambda_2 - \lambda_1 = \lambda_2$. We say that $G$ has good spectral expansion if it has a large spectral gap. In Appendix A, we review the relation between the spectral gap and a related measure of graph expansion, the Cheeger constant.
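As a quick numerical companion to these definitions, the following snippet (a minimal sketch using networkx and numpy; the example graph is an arbitrary choice) builds the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$ and reads off the spectral gap $\lambda_2$.

```python
import networkx as nx
import numpy as np

def normalized_laplacian(G: nx.Graph) -> np.ndarray:
    """L = I - D^{-1/2} A D^{-1/2} for a graph without isolated nodes."""
    A = nx.to_numpy_array(G)
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt

def spectral_gap(G: nx.Graph) -> float:
    """Second-smallest Laplacian eigenvalue lambda_2 (the gap, since lambda_1 = 0)."""
    eigvals = np.sort(np.linalg.eigvalsh(normalized_laplacian(G)))
    return float(eigvals[1])

# A long cycle has a small gap; adding a few "long-range" chords increases it.
C = nx.cycle_graph(20)
print(spectral_gap(C))                 # close to 0 for a long cycle
C.add_edges_from([(0, 10), (5, 15)])
print(spectral_gap(C))                 # noticeably larger after rewiring
```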
2.2 BACKGROUND ON RELATIONAL GNNS
The framework we propose fundamentally relies on relational GNNs (R-GNNs) (Battaglia et al.,
2018), so we review their formulation here. We define a general R-GNN layer by
$$h^{(k+1)}_v = \phi_k\Big(h^{(k)}_v,\ \textstyle\sum_{r \in R} \sum_{u \in N_r(v)} \psi_{k,r}\big(h^{(k)}_u, h^{(k)}_v\big)\Big),$$
where the $\psi_{k,r} : \mathbb{R}^{d_k} \times \mathbb{R}^{d_k} \to \mathbb{R}^{d_{k+1}}$ are message-passing functions, and the $\phi_k : \mathbb{R}^{d_{k+1}} \to \mathbb{R}^{d_{k+1}}$ are update functions. The $\phi_k$ and $\psi_{k,r}$ can either be fixed mappings or learned during training. All GNN layer types we consider will be special cases of this general R-GNN layer. We recover a classical (non-relational) GNN when there is only one relation, $|R| = 1$.
As a special case of R-GNNs, R-GCN layers are defined by the update
$$h^{(k+1)}_v = \sigma\Big(W^{(k)} h^{(k)}_v + \textstyle\sum_{r \in R} \sum_{u \in N_r(v)} \tfrac{1}{c_{u,v}} W^{(k)}_r h^{(k)}_u\Big),$$
where $\sigma$ is a nonlinear activation, $c_{u,v}$ is a normalization factor, and $W^{(k)}$, $W^{(k)}_r$ are relation-specific learned linear maps. One can interpret the $W^{(k)} h^{(k)}_v$ term as adding self-loops to the original graph, and encoding these self-loops as their own relation. We often take $\sigma = \mathrm{ReLU}$, and $c_{u,v} = \sqrt{(1 + d_u)(1 + d_v)}$.
Other specializations of R-GNNs include graph convolutional networks (GCNs) (Kipf and Welling, 2017), defined by
$$h^{(k+1)}_v = \sigma\Big(\textstyle\sum_{u \in N(v) \cup \{v\}} \tfrac{1}{c_{u,v}} W^{(k)} h^{(k)}_u\Big),$$
and graph isomorphism networks (GINs) (Xu et al., 2019), defined by
$$h^{(k+1)}_v = \mathrm{MLP}^{(k)}\Big(\textstyle\sum_{u \in N(v) \cup \{v\}} h^{(k)}_u\Big).$$
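For concreteness, here is a minimal dense sketch of one R-GCN layer as written above, using numpy (the per-relation weights are random placeholders, and computing degrees on the union of all relations is an assumption; a trained model would learn the weights). A non-relational GNN corresponds to the case of a single adjacency matrix.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rgcn_layer(H, adjacencies, W_self, W_rel):
    """One R-GCN layer: h_v <- relu(W h_v + sum_r sum_{u in N_r(v)} (1/c_uv) W_r h_u),
    with c_uv = sqrt((1 + d_u)(1 + d_v)), degrees taken over the union of relations.

    H:            (n, p) node features
    adjacencies:  list of (n, n) 0/1 adjacency matrices, one per relation
    W_self:       (p, q) weight for the self term
    W_rel:        list of (p, q) weights, one per relation
    """
    A_total = sum(adjacencies)
    deg = A_total.sum(axis=1)                       # degree over all relations
    c_inv = 1.0 / np.sqrt(np.outer(1.0 + deg, 1.0 + deg))
    out = H @ W_self
    for A_r, W_r in zip(adjacencies, W_rel):
        out = out + (A_r * c_inv) @ (H @ W_r)       # normalized messages per relation
    return relu(out)

# Toy usage: 4 nodes, 3 input features, 2 output features, 2 relations.
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
A1 = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)  # original edges
A2 = np.array([[0,0,0,1],[0,0,0,0],[0,0,0,0],[1,0,0,0]], dtype=float)  # added (rewired) edges
H_next = rgcn_layer(H, [A1, A2], rng.normal(size=(3, 2)),
                    [rng.normal(size=(3, 2)), rng.normal(size=(3, 2))])
print(H_next.shape)  # (4, 2)
```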
3 RELATIONAL REWIRING OF GNNS
We introduce a graph rewiring framework to improve connectivity while retaining the original graph
via a relational structure, which demonstrably allows us to also control the rate of smoothing. The
rate of smoothing measures the similarity of neighboring node features in the GNN output graph.
Adding separate weights for the rewired edges allows the network more flexibility in learning an
appropriate rate of smoothing. Our main result in this section, Theorem 3, makes this precise.
3.1 RELATIONAL REWIRING
We incorporate relations into our architecture in the following way. Suppose that we rewire $G = (V, E_1)$ by adding edges, yielding a rewired graph $G' = (V, E_1 \cup E_2)$. We equip $G'$ with a relational structure by assigning each edge in $E_1$ the edge type 1, and each edge in $E_2$ the edge type 2. For example, in an R-GCN, this relational structure would result in the following layer form:
$$h^{(k+1)}_v = \sigma\Big(W^{(k)} h^{(k)}_v + \textstyle\sum_{(u,v) \in E_1} \tfrac{1}{c_{u,v}} W^{(k)}_1 h^{(k)}_u + \sum_{(u,v) \in E_2} \tfrac{1}{c_{u,v}} W^{(k)}_2 h^{(k)}_u\Big).$$
The original layer (before relational rewiring) would include only the first two terms. A rewired graph with no relational structure would add the third term but use the same weights, $W^{(k)}_2 = W^{(k)}_1$.

A relational structure serves to provide the GNN with more flexibility, since it remembers both the original structure of the graph and the rewired structure. In the worst case scenario, if we add too many edges, the GNN may counterbalance this by setting most of the weights $W^{(k)}_2$ to be 0, focusing its attention on the original graph. In a more realistic scenario, the GNN can use the original edge weights $W^{(k)}_1$ to compute important structural features of the graph, and the weights $W^{(k)}_2$ simply to transmit information along the graph. Since the weights assigned to the original and rewired edges are separate, their purposes can be served in parallel without sacrificing the topology of the graph.
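The rewiring-plus-relabeling step itself is simple. Below is a hedged sketch of how one might tag original versus added edges as two relation types and feed them to an R-GCN layer such as the one sketched in Section 2.2 (the helper names, the toy graph, and the manually chosen added edges are illustrative assumptions, not the paper's code).

```python
import networkx as nx
import numpy as np

def relational_adjacencies(G_original: nx.Graph, added_edges):
    """Return one adjacency matrix per relation: relation 1 = original edges,
    relation 2 = edges added by rewiring. Node order follows G_original.nodes."""
    nodes = list(G_original.nodes)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    A1 = np.zeros((n, n))
    A2 = np.zeros((n, n))
    for u, v in G_original.edges:
        A1[index[u], index[v]] = A1[index[v], index[u]] = 1.0
    for u, v in added_edges:
        A2[index[u], index[v]] = A2[index[v], index[u]] = 1.0
    return [A1, A2]

# Toy usage with a bottlenecked graph and a couple of manually "rewired" edges.
G = nx.barbell_graph(4, 0)
added = [(0, 7), (1, 6)]              # illustrative long-range edges
A1, A2 = relational_adjacencies(G, added)
# These two matrices can serve as the `adjacencies` argument of the rgcn_layer
# sketch above, so the layer learns separate weights W_1 (original) and W_2 (added).
print(int(A1.sum() / 2), int(A2.sum() / 2))   # number of original vs added edges
```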
Finally, this R-GCN model allows us to better regulate the rate of smoothing as we demonstrate next.
3.2 RATE OF SMOOTHING FOR R-GCN WITH REWIRING
In this section, we analyze the smoothing effects of repeatedly applying R-GCN layers with rewiring.
Here, we consider R-GCN layers $\phi : \mathbb{R}^{n \times p} \to \mathbb{R}^{n \times p}$ of the form
$$\phi(X) = X\Theta + \textstyle\sum_{r \in R} D^{-1/2} A_r D^{-1/2} X \Theta_r, \qquad (1)$$
where $\Theta, \Theta_r \in \mathbb{R}^{p \times p}$ are weight matrices, $A_r$ is the adjacency matrix of the $r$-th relation, and $D$ is the matrix of degrees of $G$. For a vanilla GCN without relations or residual connections, one problem
. For a vanilla GCN without relations or residual connections, one problem
with rewiring the graph is that adding too many edges will result in oversmoothing (Di Giovanni
et al.,2022). Indeed, adding edges to a graph will by definition increase its Cheeger constant. For
input graphs with a high spectral gap (and hence high Cheeger constant), a GCN will produce similar
representations for each node (Xu et al.,2018;Oono and Suzuki,2020). As an extreme case, if we
replace the graph with a complete graph, the GCN layer will assign each node the same representation.
We will show that R-GCNs are robust to this effect, allowing us to add edges without the adverse side
effect of oversmoothing.
Definition 1. Let $G$ be a connected graph with adjacency matrix $A$ and normalized Laplacian $L$. For $i \in V$, let $d_i$ denote the degree of node $i$. Given a scalar field $f \in \mathbb{R}^n$, its Dirichlet energy with respect to $G$ is defined as
$$\mathcal{E}(f) := \frac{1}{2} \sum_{i,j} A_{i,j} \left( \frac{f_i}{\sqrt{d_i}} - \frac{f_j}{\sqrt{d_j}} \right)^2 = f^T L f.$$
For a vector field $X \in \mathbb{R}^{n \times p}$, we define
$$\mathcal{E}(X) := \frac{1}{2} \sum_{i,j,k} A_{i,j} \left( \frac{X_{i,k}}{\sqrt{d_i}} - \frac{X_{j,k}}{\sqrt{d_j}} \right)^2 = \mathrm{Tr}(X^T L X).$$
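As a sanity check on Definition 1, the short sketch below (numpy/networkx; the graph and random features are arbitrary choices) evaluates the Dirichlet energy both from the edge-wise sum and as $\mathrm{Tr}(X^T L X)$, and the two values agree.

```python
import networkx as nx
import numpy as np

def dirichlet_energy_sum(G: nx.Graph, X: np.ndarray) -> float:
    """Edge-wise form: 1/2 * sum_{i,j,k} A_ij (X_ik/sqrt(d_i) - X_jk/sqrt(d_j))^2."""
    A = nx.to_numpy_array(G)
    d = A.sum(axis=1)
    total = 0.0
    for i in range(len(d)):
        for j in range(len(d)):
            if A[i, j]:
                diff = X[i] / np.sqrt(d[i]) - X[j] / np.sqrt(d[j])
                total += A[i, j] * float(diff @ diff)
    return 0.5 * total

def dirichlet_energy_trace(G: nx.Graph, X: np.ndarray) -> float:
    """Quadratic-form version: Tr(X^T L X) with the normalized Laplacian L."""
    L = nx.normalized_laplacian_matrix(G).toarray()
    return float(np.trace(X.T @ L @ X))

G = nx.petersen_graph()
X = np.random.default_rng(0).normal(size=(G.number_of_nodes(), 3))
print(dirichlet_energy_sum(G, X), dirichlet_energy_trace(G, X))  # should match
```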
The Dirichlet energy of a unit-norm function on the nodes of a graph is a measure of how "non-smooth" it is (Chung and Graham, 1997). The Dirichlet energy of a function $f$ is small when $f_i/\sqrt{d_i}$ is close in value to $f_j/\sqrt{d_j}$ for all edges $(i, j)$. In the case where $G$ is $d$-regular (all nodes have degree $d$), this reduces to the familiar notion of adjacent nodes having representations close in value. Hence, we make the following definition:
Definition 2. Let $G$ be a graph and $\phi : \mathbb{R}^{n \times p} \to \mathbb{R}^{n \times p}$ be a mapping. We define the rate of smoothing of $\phi$ with respect to $G$ as
$$\mathrm{RS}_G(\phi) := 1 - \left( \frac{\sup_{X : \mathcal{E}(X) \neq 0} \mathcal{E}(\phi(X)) / \mathcal{E}(X)}{\sup_{X : X \neq 0} \|\phi(X)\|_F^2 / \|X\|_F^2} \right)^{1/2}.$$
The numerator of the fraction above indicates the rate of decay (or expansion) of the Dirichlet energy upon applying $\phi$. We take a supremum over $X$ to find the largest possible relative change in energy upon applying $\phi$. We would also like our notion of the rate of smoothing to be scale-invariant; multiplying $\phi$ by a scalar should not change its rate of smoothing. To impose scale-invariance, we divide by a factor in the denominator which captures how much $\phi$ scales up the entries of $X$. By defining the rate of smoothing as a ratio of two norms, we can estimate them separately, which gives us a good theoretical handle on smoothing. Note that if $\phi$ is linear,
$$\mathrm{RS}_G(\phi) = 1 - \left( \frac{\sup_{\mathcal{E}(X)=1} \mathcal{E}(\phi(X))}{\sup_{\|X\|_F=1} \|\phi(X)\|_F^2} \right)^{1/2},$$
since $\|\cdot\|_F^2$ and $\mathcal{E}$ are quadratic forms.
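Since both suprema in the linear case are suprema of quadratic-form ratios, one can probe $\mathrm{RS}_G(\phi)$ numerically. The sketch below is a crude Monte-Carlo estimate (random sampling only lower-bounds each supremum, so this is a rough probe rather than an exact evaluation); the graph, the weights, and the function names are illustrative assumptions.

```python
import networkx as nx
import numpy as np

def dirichlet(L, X):
    return float(np.trace(X.T @ L @ X))

def rate_of_smoothing_probe(phi, L, n, p, num_samples=2000, seed=0):
    """Monte-Carlo probe of RS_G(phi): approximates the two suprema in
    Definition 2 by maximizing over random inputs X (not an exact computation)."""
    rng = np.random.default_rng(seed)
    energy_ratio = 0.0
    norm_ratio = 0.0
    for _ in range(num_samples):
        X = rng.normal(size=(n, p))
        Y = phi(X)
        energy_ratio = max(energy_ratio, dirichlet(L, Y) / dirichlet(L, X))
        norm_ratio = max(norm_ratio, np.sum(Y**2) / np.sum(X**2))
    return 1.0 - np.sqrt(energy_ratio / norm_ratio)

# Toy setup: single relation, phi(X) = X Theta + D^{-1/2} A D^{-1/2} X Theta_1.
G = nx.petersen_graph()
n, p = G.number_of_nodes(), 4
A = nx.to_numpy_array(G)
d = A.sum(axis=1)
M = np.diag(d**-0.5) @ A @ np.diag(d**-0.5)
L = np.eye(n) - M
rng = np.random.default_rng(1)
Theta, Theta1 = 0.3 * rng.normal(size=(p, p)), 0.3 * rng.normal(size=(p, p))
phi = lambda X: X @ Theta + M @ X @ Theta1
print(rate_of_smoothing_probe(phi, L, n, p))
```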
The following theorem shows that R-GCN layers are flexible in their ability to choose an appropriate rate of smoothing for a graph:
Theorem 3. Let $G_1 = (V, E_1)$ be a graph and $G_2 = (V, E_1 \cup E_2)$ be a rewiring of $G_1$. Consider an R-GCN layer $\phi$ defined as in (1), with relations $r_1 = E_1$, $r_2 = E_2$. Then for any $\lambda \in [0, \lambda_2(L(G_2))]$, there exist values of $\Theta, \Theta_1, \Theta_2$ for which $\phi$ smooths with rate $\mathrm{RS}_{G_2}(\phi) = \lambda$ with respect to $G_2$.
Proof. The map $\phi$ is given by
$$\phi(X; \Theta, \Theta_1, \Theta_2) = X\Theta + D^{-1/2} A_1 D^{-1/2} X \Theta_1 + D^{-1/2} A_2 D^{-1/2} X \Theta_2,$$