FOSR: FIRST-ORDER SPECTRAL REWIRING FOR
ADDRESSING OVERSQUASHING IN GNNS
Kedar Karhadkar
UCLA
kedar@math.ucla.edu
Pradeep Kr. Banerjee
MPI MiS
pradeep@mis.mpg.de
Guido Montúfar
UCLA & MPI MiS
montufar@math.ucla.edu
ABSTRACT
Graph neural networks (GNNs) are able to leverage the structure of graph data by
passing messages along the edges of the graph. While this allows GNNs to learn
features depending on the graph structure, for certain graph topologies it leads to
inefficient information propagation and a problem known as oversquashing. This
has recently been linked with the curvature and spectral gap of the graph. On the
other hand, adding edges to the message-passing graph can lead to increasingly
similar node representations and a problem known as oversmoothing. We propose
a computationally efficient algorithm that prevents oversquashing by systematically
adding edges to the graph based on spectral expansion. We combine this with a
relational architecture, which lets the GNN preserve the original graph structure
and provably prevents oversmoothing. We find experimentally that our algorithm
outperforms existing graph rewiring methods in several graph classification tasks.
1 INTRODUCTION
Graph neural networks (GNNs) (Gori et al.,2005;Scarselli et al.,2008) are a broad class of models
which process graph-structured data by passing messages between nodes of the graph. Due to the
versatility of graphs, GNNs have been applied to a variety of domains, such as chemistry, social
networks, knowledge graphs, and recommendation systems (Zhou et al.,2020;Wu et al.,2020).
GNNs broadly follow a message-passing framework, meaning that each layer of the GNN aggregates
the representations of a node and its neighbors, and transforms these features into a new representation
for that node. The aggregation function used by the GNN layer is taken to be locally permutation-
invariant, since the ordering of the neighbors of a node is arbitrary, and its specific form is a key
component of the GNN architecture; varying it gives rise to several common GNN variants (Kipf and
Welling,2017;Veliˇ
ckovi´
c et al.,2018;Li et al.,2015;Hamilton et al.,2017;Xu et al.,2019). The
output of a GNN can be used for tasks such as graph classification or node classification.
Although GNNs are successful in computing dependencies between nodes of a graph, they have
been found to suffer from a limited capacity to capture long-range interactions. For a fixed graph,
this is caused by a variety of problems depending on the number of layers in the GNN. Since graph
convolutions are local operations, a GNN with a small number of layers can only provide a node with
information from nodes close to itself. For a GNN with $\ell$ layers, the receptive field of a node (the set of nodes it receives messages from) is exactly the ball of radius $\ell$ about the node. For small values of $\ell$, this results in "underreaching", and directly limits which functions the GNN can represent. On a related note, the functions representable by GNNs with $\ell$ layers are limited to those computable by $\ell$ steps of the Weisfeiler-Lehman (WL) graph isomorphism test (Morris et al., 2019; Xu et al., 2019; Barceló et al., 2020). On the other hand, increasing the number of layers leads to its own set of
problems. In contrast to other architectures that benefit from the expressivity of deeper networks,
GNNs experience a decrease in accuracy as the number of layers increases (Li et al.,2018;Chen
et al.,2020). This phenomenon has partly been attributed to “oversmoothing”, where repeated graph
convolutions eventually render node features indistinguishable (Li et al.,2018;Oono and Suzuki,
2020;Cai and Wang,2020;Zhao and Akoglu,2020;Rong et al.,2020;Di Giovanni et al.,2022).
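To make the receptive-field statement above concrete, here is a small sketch (using networkx; the path graph and the choice of two layers are illustrative assumptions, not an example from the paper) that computes the $\ell$-hop ball around a node, i.e. the set of nodes an $\ell$-layer message-passing GNN can draw information from.

```python
import networkx as nx

def receptive_field(G: nx.Graph, node, num_layers: int) -> set:
    """Nodes within graph distance `num_layers` of `node`: the receptive field
    of a `num_layers`-layer message-passing GNN at that node."""
    dist = nx.single_source_shortest_path_length(G, node, cutoff=num_layers)
    return set(dist)

# Toy example: on a path graph, 2 layers only reach 2 hops from node 0.
G = nx.path_graph(10)
print(receptive_field(G, node=0, num_layers=2))  # {0, 1, 2}
```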
Separate from oversmoothing is the problem of “oversquashing” first pointed out by Alon and Yahav
(2021). As the number of layers of a GNN increases, information from (potentially) exponentially-growing receptive fields needs to be concurrently propagated at each message-passing step. This leads
to a bottleneck that causes oversquashing, when an exponential amount of information is squashed
into fixed-size node vectors (Alon and Yahav,2021). Consequently, for prediction tasks relying on
long-range interactions, the GNN can fail. Oversquashing usually occurs when there are enough
layers in the GNN to reach any node (the receptive fields are large enough), but few enough that
the GNN cannot process all of the necessary relations between nodes. Hence, for a fixed graph,
the problems of underreaching, oversquashing, and oversmoothing occur in three different regimes,
depending on the number of layers of the GNN.
Figure 1: Top: Schematic showing different rewiring methods, FoSR (ours), SDRF (Topping et al., 2022), and G-RLEF (Banerjee et al., 2022), for alleviating structural bottlenecks in the input graph. Our method adds new edges that are labeled differently from the existing ones so that the GNN can distinguish them in training. Bottom: Normalized spectral gap and training accuracy as functions of the number of rewiring iterations for a learning task modeled on the NEIGHBORSMATCH problem for a path-of-cliques input (for details, see Appendix B.1.1).
A common approach to addressing oversquashing
is to rewire the input graph, making changes to its
edges so that it has fewer structural bottlenecks. A
simple approach to rewiring is to make the last layer
of the GNN fully adjacent, allowing all nodes to
interact with one another (Alon and Yahav,2021).
Alternatively, one can make changes to edges of
the input graph, feeding the modified graph into all
layers of the GNN (Topping et al.,2022;Banerjee
et al.,2022). The latter approaches can be viewed
as optimizing the spectral gap of the input graph for
alleviating structural bottlenecks and improving the
overall quality of signal propagation across nodes
(see Figure 1).
While these rewiring methods improve the connectivity of the graph, there are drawbacks to making too many modifications to the input. The most obvious problem is that we are losing out on topological information about the original graph. If the structure of the original graph is indeed relevant, adding and removing edges diminishes that benefit to the task. Another issue arises from the smoothing effects of adding edges: If we add too many edges to the input graph, an ordinary GCN will suffer from oversmoothing (Li et al., 2018). In other words, if we use this natural approach to rewiring, we experience a trade-off between oversquashing and oversmoothing. This observation, which does not seem to have been pointed out in earlier works, is the main motivation for the approach that we develop in this work.
1.1 MAIN CONTRIBUTIONS
This paper presents a new framework for rewiring a graph to reduce oversquashing in GNNs while
preventing oversmoothing. Here are our main contributions:
We introduce a framework for graph rewiring which can be used with any rewiring method that
sequentially adds edges. In contrast to previous approaches that only modify the input graph (e.g.,
Topping et al.,2022;Banerjee et al.,2022;Bober et al.,2022), our solution gives special labels to
the added edges. We then use a relational GNN on this new graph, with the relations corresponding
to whether the edge was originally in the input graph or added during the rewiring. This allows
us to preserve the input graph topology while using the new edges to improve its connectivity. In
Theorem 3 we show that this approach also prevents oversmoothing.
We introduce a new rewiring method, FoSR (First-order Spectral Rewiring), aimed at optimizing the spectral gap of the graph input to the GNN (Algorithm 1). This algorithm computes the first-order change in the spectral gap from adding each candidate edge, and then adds the edge which maximizes this change (Theorem 4 and Proposition 5); a simplified sketch of the underlying objective is given after this list.
We empirically demonstrate that the proposed method results in faster spectral expansion (a marker
of reduced oversquashing) and improved test accuracy against several baselines on several graph
classification tasks (see Table 1). Experiments demonstrate that the relational structure preserving
the original input graph significantly boosts test accuracy.
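To make the rewiring objective concrete, the following is a minimal sketch of greedy spectral-gap rewiring. It recomputes the spectral gap exactly for every candidate edge, whereas FoSR's Algorithm 1 avoids this recomputation with a first-order estimate (Theorem 4); the toy graph, edge budget, and function names here are illustrative assumptions, not the paper's implementation.

```python
import itertools
import networkx as nx
import numpy as np

def spectral_gap(G: nx.Graph) -> float:
    """Second-smallest eigenvalue of the normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    L = nx.normalized_laplacian_matrix(G).toarray()
    return float(np.sort(np.linalg.eigvalsh(L))[1])

def greedy_gap_rewire(G: nx.Graph, num_edges: int):
    """Greedily add `num_edges` new edges, each chosen to maximize the spectral gap.
    Added edges are tagged so a relational GNN can treat them as a second relation."""
    G = G.copy()
    added = []
    for _ in range(num_edges):
        candidates = [(u, v) for u, v in itertools.combinations(G.nodes, 2)
                      if not G.has_edge(u, v)]
        best = max(candidates,
                   key=lambda e: spectral_gap(nx.Graph(list(G.edges) + [e])))
        G.add_edge(*best, relation=2)  # relation 2 marks rewired edges
        added.append(best)
    return G, added

# Toy example: two cliques joined by a single bridge (a structural bottleneck).
G = nx.barbell_graph(5, 0)
G_rewired, new_edges = greedy_gap_rewire(G, num_edges=3)
print(new_edges, spectral_gap(G), spectral_gap(G_rewired))
```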
1.2 RELATED WORKS
Past approaches to reducing oversquashing have hinged upon choosing a measure of oversquashing,
and modifying the edges of the graph to minimize it. Topping et al. (2022) argue that negatively curved edges are responsible for oversquashing, drawing on curvature notions from Forman (2003) and Ollivier (2009). They introduce a rewiring method known as Stochastic Discrete Ricci Flow (SDRF),
which aims to increase the balanced Forman curvature of negatively curved edges by adding new
edges. Bober et al. (2022) extend this line of investigation by considering the same type of rewiring
but using different notions of discrete curvature. Banerjee et al. (2022) approach oversquashing from an information-theoretic viewpoint, measuring it in terms of the spectral gap of the graph, and demonstrate empirically that increasing the spectral gap can improve accuracy on certain graph classification tasks. They propose a rewiring algorithm, greedy random local edge flip (G-RLEF), motivated by an expander graph construction and employing an edge sampling strategy based on effective resistance (Lyons and Peres, 2017). The work of Alon and Yahav (2021), which first identified oversquashing, also introduced an approach to rewiring, where they made the last GNN layer an expander, namely the complete graph, which allows every pair of nodes to connect to each other. They also experimented with making the last layer partially adjacent (randomly including any potential edge). This can be thought of as a form of spectral expansion in the final layer, since random graphs have a high spectral gap (Friedman, 1991). In contrast to these works, our method gives a practical way of achieving the largest possible increase in the spectral gap with the smallest possible modification of the input graph, and in fact preserves the input graph topology via a relational structure.
Although not as closely related, we find it worthwhile to also point to the following works in this general context. Prior to the diagnosis of the oversquashing problem, Klicpera et al. (2019) used graph
diffusion to rewire the input graph, improving long-range connectivity for the GNN. Rewiring can
also be performed while training a GNN. Arnaiz-Rodríguez et al. (2022) use first-order spectral
methods to define a loss function depending on the adjacency matrix, allowing a GNN to learn
a rewiring that alleviates oversquashing. We should mention that aside from rewiring the input
graph, some works pursue different approaches to solve oversquashing, such as creating positional
embeddings for the nodes or edges inspired by the transformer architecture (Vaswani et al.,2017).
The most direct generalization of this approach to graphs is using Laplacian embeddings (Kreuzer
et al.,2021;Dwivedi and Bresson,2020). Brüel-Gabrielsson et al. (2022) combine this with adding
neighbors to encode the edges which are the result of multiple hops.
2 PRELIMINARIES
2.1 BACKGROUND ON SPECTRAL GRAPH THEORY
Let $G = (V, E, R)$ be an undirected graph with node set $V$, $|V| = n$, edge set $E$, $|E| = m$, and relation set $R$. The set $R$ is a finite set of relation types, and elements $(u, v, r) \in E$ consist of a pair of nodes $u, v \in V$ together with an associated relation type $r \in R$. When the relation type of an edge is not relevant, we will simply write $(u, v)$ for an edge. For each $v \in V$ we define $N(v)$ to consist of all neighbors of $v$, that is, all $u \in V$ such that there exists an edge $(u, v) \in E$. For each $r \in R$ and $v \in V$, we define $N_r(v)$ to consist of all neighbors of $v$ of relation type $r$. The degree $d_v$ of a node $v \in V$ is the number of neighbors of $v$. We define the adjacency matrix $A = A(G)$ by $A_{ij} = 1$ if $(i, j) \in E$, and $A_{ij} = 0$ otherwise. Let $D = D(G)$ denote the diagonal matrix of degrees given by $D_{ii} = d_i$. The normalized Laplacian $L = L(G)$ is defined as $L = I - D^{-1/2} A D^{-1/2}$. We will often add self-loops (edges $(i, i)$ for $i \in V$) to the graphs we consider, so we define augmented versions of the above matrices corresponding to graphs with self-loops added. If $G$ is a graph without self-loops, we define its augmented adjacency matrix $\tilde{A} := I + A$, its augmented degree matrix $\tilde{D} = I + D$, and its augmented Laplacian $\tilde{L} = I - \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$.
We denote the eigenvalues of the normalized Laplacian $L$ by $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n \leq 2$. Let $\mathbf{1}$ denote the constant function which assumes the value 1 on each node. Then $D^{1/2}\mathbf{1}$ is an eigenfunction of $L$ with eigenvalue $0$. The spectral gap of $G$ is $\lambda_2 - \lambda_1 = \lambda_2$. We say that $G$ has good spectral expansion if it has a large spectral gap. In Appendix A, we review the relation between the spectral gap and a related measure of graph expansion, the Cheeger constant.
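As a quick numerical companion to these definitions, the following snippet (a minimal sketch using networkx and numpy; the example graph is an arbitrary choice) builds the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$ and reads off the spectral gap $\lambda_2$.

```python
import networkx as nx
import numpy as np

def normalized_laplacian(G: nx.Graph) -> np.ndarray:
    """L = I - D^{-1/2} A D^{-1/2} for a graph without isolated nodes."""
    A = nx.to_numpy_array(G)
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt

def spectral_gap(G: nx.Graph) -> float:
    """Second-smallest Laplacian eigenvalue lambda_2 (the gap, since lambda_1 = 0)."""
    eigvals = np.sort(np.linalg.eigvalsh(normalized_laplacian(G)))
    return float(eigvals[1])

# A long cycle has a small gap; adding a few "long-range" chords increases it.
C = nx.cycle_graph(20)
print(spectral_gap(C))                 # close to 0 for a long cycle
C.add_edges_from([(0, 10), (5, 15)])
print(spectral_gap(C))                 # noticeably larger after rewiring
```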
2.2 BACKGROUND ON RELATIONAL GNNS
The framework we propose fundamentally relies on relational GNNs (R-GNNs) (Battaglia et al.,
2018), so we review their formulation here. We define a general R-GNN layer by
$$h^{(k+1)}_v = \phi_k\Big(h^{(k)}_v,\ \textstyle\sum_{r \in R} \sum_{u \in N_r(v)} \psi_{k,r}\big(h^{(k)}_u, h^{(k)}_v\big)\Big),$$
where the $\psi_{k,r} : \mathbb{R}^{d_k} \times \mathbb{R}^{d_k} \to \mathbb{R}^{d_{k+1}}$ are message-passing functions, and the $\phi_k : \mathbb{R}^{d_{k+1}} \to \mathbb{R}^{d_{k+1}}$ are update functions. The $\phi_k$ and $\psi_{k,r}$ can either be fixed mappings or learned during training. All GNN layer types we consider will be special cases of this general R-GNN layer. We recover a classical (non-relational) GNN when there is only one relation, $|R| = 1$.
As a special case of R-GNNs, R-GCN layers are defined by the update
$$h^{(k+1)}_v = \sigma\Big(W^{(k)} h^{(k)}_v + \textstyle\sum_{r \in R} \sum_{u \in N_r(v)} \tfrac{1}{c_{u,v}} W^{(k)}_r h^{(k)}_u\Big),$$
where $\sigma$ is a nonlinear activation, $c_{u,v}$ is a normalization factor, and $W^{(k)}$, $W^{(k)}_r$ are relation-specific learned linear maps. One can interpret the $W^{(k)} h^{(k)}_v$ term as adding self-loops to the original graph, and encoding these self-loops as their own relation. We often take $\sigma = \mathrm{ReLU}$, and $c_{u,v} = \sqrt{(1 + d_u)(1 + d_v)}$.
Other specializations of R-GNNs include graph convolutional networks (GCNs) (Kipf and Welling, 2017), defined by
$$h^{(k+1)}_v = \sigma\Big(\textstyle\sum_{u \in N(v) \cup \{v\}} \tfrac{1}{c_{u,v}} W^{(k)} h^{(k)}_u\Big),$$
and graph isomorphism networks (GINs) (Xu et al., 2019), defined by
$$h^{(k+1)}_v = \mathrm{MLP}^{(k)}\Big(\textstyle\sum_{u \in N(v) \cup \{v\}} h^{(k)}_u\Big).$$
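For concreteness, here is a minimal dense sketch of one R-GCN layer as written above, using numpy (the per-relation weights are random placeholders, and computing degrees on the union of all relations is an assumption; a trained model would learn the weights). A non-relational GNN corresponds to the case of a single adjacency matrix.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rgcn_layer(H, adjacencies, W_self, W_rel):
    """One R-GCN layer: h_v <- relu(W h_v + sum_r sum_{u in N_r(v)} (1/c_uv) W_r h_u),
    with c_uv = sqrt((1 + d_u)(1 + d_v)), degrees taken over the union of relations.

    H:            (n, p) node features
    adjacencies:  list of (n, n) 0/1 adjacency matrices, one per relation
    W_self:       (p, q) weight for the self term
    W_rel:        list of (p, q) weights, one per relation
    """
    A_total = sum(adjacencies)
    deg = A_total.sum(axis=1)                       # degree over all relations
    c_inv = 1.0 / np.sqrt(np.outer(1.0 + deg, 1.0 + deg))
    out = H @ W_self
    for A_r, W_r in zip(adjacencies, W_rel):
        out = out + (A_r * c_inv) @ (H @ W_r)       # normalized messages per relation
    return relu(out)

# Toy usage: 4 nodes, 3 input features, 2 output features, 2 relations.
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
A1 = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)  # original edges
A2 = np.array([[0,0,0,1],[0,0,0,0],[0,0,0,0],[1,0,0,0]], dtype=float)  # added (rewired) edges
H_next = rgcn_layer(H, [A1, A2], rng.normal(size=(3, 2)),
                    [rng.normal(size=(3, 2)), rng.normal(size=(3, 2))])
print(H_next.shape)  # (4, 2)
```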
3 RELATIONAL REWIRING OF GNNS
We introduce a graph rewiring framework to improve connectivity while retaining the original graph
via a relational structure, which demonstrably allows us to also control the rate of smoothing. The
rate of smoothing measures the similarity of neighboring node features in the GNN output graph.
Adding separate weights for the rewired edges allows the network more flexibility in learning an
appropriate rate of smoothing. Our main result in this section, Theorem 3, makes this precise.
3.1 RELATIONAL REWIRING
We incorporate relations into our architecture in the following way. Suppose that we rewire $G = (V, E_1)$ by adding edges, yielding a rewired graph $G' = (V, E_1 \cup E_2)$. We equip $G'$ with a relational structure by assigning each edge in $E_1$ the edge type 1, and each edge in $E_2$ the edge type 2. For example, in an R-GCN, this relational structure would result in the following layer form:
$$h^{(k+1)}_v = \sigma\Big(W^{(k)} h^{(k)}_v + \textstyle\sum_{(u,v) \in E_1} \tfrac{1}{c_{u,v}} W^{(k)}_1 h^{(k)}_u + \sum_{(u,v) \in E_2} \tfrac{1}{c_{u,v}} W^{(k)}_2 h^{(k)}_u\Big).$$
The original layer (before relational rewiring) would include only the first two terms. A rewired graph with no relational structure would add the third term but use the same weights, $W^{(k)}_2 = W^{(k)}_1$.

A relational structure serves to provide the GNN with more flexibility, since it remembers both the original structure of the graph and the rewired structure. In the worst case scenario, if we add too many edges, the GNN may counterbalance this by setting most of the weights $W^{(k)}_2$ to be 0, focusing its attention on the original graph. In a more realistic scenario, the GNN can use the original edge weights $W^{(k)}_1$ to compute important structural features of the graph, and the weights $W^{(k)}_2$ simply to transmit information along the graph. Since the weights assigned to the original and rewired edges are separate, their purposes can be served in parallel without sacrificing the topology of the graph.
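The rewiring-plus-relabeling step itself is simple. Below is a hedged sketch of how one might tag original versus added edges as two relation types and feed them to an R-GCN layer such as the one sketched in Section 2.2 (the helper names, the toy graph, and the manually chosen added edges are illustrative assumptions, not the paper's code).

```python
import networkx as nx
import numpy as np

def relational_adjacencies(G_original: nx.Graph, added_edges):
    """Return one adjacency matrix per relation: relation 1 = original edges,
    relation 2 = edges added by rewiring. Node order follows G_original.nodes."""
    nodes = list(G_original.nodes)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    A1 = np.zeros((n, n))
    A2 = np.zeros((n, n))
    for u, v in G_original.edges:
        A1[index[u], index[v]] = A1[index[v], index[u]] = 1.0
    for u, v in added_edges:
        A2[index[u], index[v]] = A2[index[v], index[u]] = 1.0
    return [A1, A2]

# Toy usage with a bottlenecked graph and a couple of manually "rewired" edges.
G = nx.barbell_graph(4, 0)
added = [(0, 7), (1, 6)]              # illustrative long-range edges
A1, A2 = relational_adjacencies(G, added)
# These two matrices can serve as the `adjacencies` argument of the rgcn_layer
# sketch above, so the layer learns separate weights W_1 (original) and W_2 (added).
print(int(A1.sum() / 2), int(A2.sum() / 2))   # number of original vs added edges
```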
Finally, this R-GCN model allows us to better regulate the rate of smoothing as we demonstrate next.
3.2 RATE OF SMOOTHING FOR R-GCN WITH REWIRING
In this section, we analyze the smoothing effects of repeatedly applying R-GCN layers with rewiring.
Here, we consider R-GCN layers $\phi : \mathbb{R}^{n \times p} \to \mathbb{R}^{n \times p}$ of the form
$$\phi(X) = X\Theta + \textstyle\sum_{r \in R} D^{-1/2} A_r D^{-1/2} X \Theta_r, \qquad (1)$$
where $\Theta, \Theta_r \in \mathbb{R}^{p \times p}$ are weight matrices, $A_r$ is the adjacency matrix of the $r$-th relation, and $D$ is the matrix of degrees of $G$. For a vanilla GCN without relations or residual connections, one problem
. For a vanilla GCN without relations or residual connections, one problem
with rewiring the graph is that adding too many edges will result in oversmoothing (Di Giovanni
et al.,2022). Indeed, adding edges to a graph will by definition increase its Cheeger constant. For
input graphs with a high spectral gap (and hence high Cheeger constant), a GCN will produce similar
representations for each node (Xu et al.,2018;Oono and Suzuki,2020). As an extreme case, if we
replace the graph with a complete graph, the GCN layer will assign each node the same representation.
We will show that R-GCNs are robust to this effect, allowing us to add edges without the adverse side
effect of oversmoothing.
Definition 1. Let $G$ be a connected graph with adjacency matrix $A$ and normalized Laplacian $L$. For $i \in V$, let $d_i$ denote the degree of node $i$. Given a scalar field $f \in \mathbb{R}^n$, its Dirichlet energy with respect to $G$ is defined as
$$\mathcal{E}(f) := \frac{1}{2} \sum_{i,j} A_{i,j} \left( \frac{f_i}{\sqrt{d_i}} - \frac{f_j}{\sqrt{d_j}} \right)^2 = f^T L f.$$
For a vector field $X \in \mathbb{R}^{n \times p}$, we define
$$\mathcal{E}(X) := \frac{1}{2} \sum_{i,j,k} A_{i,j} \left( \frac{X_{i,k}}{\sqrt{d_i}} - \frac{X_{j,k}}{\sqrt{d_j}} \right)^2 = \mathrm{Tr}(X^T L X).$$
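As a sanity check on Definition 1, the short sketch below (numpy/networkx; the graph and random features are arbitrary choices) evaluates the Dirichlet energy both from the edge-wise sum and as $\mathrm{Tr}(X^T L X)$, and the two values agree.

```python
import networkx as nx
import numpy as np

def dirichlet_energy_sum(G: nx.Graph, X: np.ndarray) -> float:
    """Edge-wise form: 1/2 * sum_{i,j,k} A_ij (X_ik/sqrt(d_i) - X_jk/sqrt(d_j))^2."""
    A = nx.to_numpy_array(G)
    d = A.sum(axis=1)
    total = 0.0
    for i in range(len(d)):
        for j in range(len(d)):
            if A[i, j]:
                diff = X[i] / np.sqrt(d[i]) - X[j] / np.sqrt(d[j])
                total += A[i, j] * float(diff @ diff)
    return 0.5 * total

def dirichlet_energy_trace(G: nx.Graph, X: np.ndarray) -> float:
    """Quadratic-form version: Tr(X^T L X) with the normalized Laplacian L."""
    L = nx.normalized_laplacian_matrix(G).toarray()
    return float(np.trace(X.T @ L @ X))

G = nx.petersen_graph()
X = np.random.default_rng(0).normal(size=(G.number_of_nodes(), 3))
print(dirichlet_energy_sum(G, X), dirichlet_energy_trace(G, X))  # should match
```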
The Dirichlet energy of a unit-norm function on the nodes of a graph is a measure of how "non-smooth" it is (Chung and Graham, 1997). The Dirichlet energy of a function $f$ is small when $f_i/\sqrt{d_i}$ is close in value to $f_j/\sqrt{d_j}$ for all edges $(i, j)$. In the case where $G$ is $d$-regular (all nodes have degree $d$), this reduces to the familiar notion of adjacent nodes having representations close in value. Hence, we make the following definition:
Definition 2. Let $G$ be a graph and $\phi : \mathbb{R}^{n \times p} \to \mathbb{R}^{n \times p}$ be a mapping. We define the rate of smoothing of $\phi$ with respect to $G$ as
$$\mathrm{RS}_G(\phi) := 1 - \left( \frac{\sup_{X : \mathcal{E}(X) \neq 0} \mathcal{E}(\phi(X)) / \mathcal{E}(X)}{\sup_{X : X \neq 0} \|\phi(X)\|_F^2 / \|X\|_F^2} \right)^{1/2}.$$
The numerator of the fraction above indicates the rate of decay (or expansion) of the Dirichlet energy upon applying $\phi$. We take a supremum over $X$ to find the largest possible relative change in energy upon applying $\phi$. We would also like our notion of the rate of smoothing to be scale-invariant; multiplying $\phi$ by a scalar should not change its rate of smoothing. To impose scale-invariance, we divide by a factor in the denominator which captures how much $\phi$ scales up the entries of $X$. By defining the rate of smoothing as a ratio of two norms, we can estimate them separately, which gives us a good theoretical handle on smoothing. Note that if $\phi$ is linear,
$$\mathrm{RS}_G(\phi) = 1 - \left( \frac{\sup_{\mathcal{E}(X)=1} \mathcal{E}(\phi(X))}{\sup_{\|X\|_F=1} \|\phi(X)\|_F^2} \right)^{1/2},$$
since $\|\cdot\|_F^2$ and $\mathcal{E}$ are quadratic forms.
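Since both suprema in the linear case are suprema of quadratic-form ratios, one can probe $\mathrm{RS}_G(\phi)$ numerically. The sketch below is a crude Monte-Carlo estimate (random sampling only lower-bounds each supremum, so this is a rough probe rather than an exact evaluation); the graph, the weights, and the function names are illustrative assumptions.

```python
import networkx as nx
import numpy as np

def dirichlet(L, X):
    return float(np.trace(X.T @ L @ X))

def rate_of_smoothing_probe(phi, L, n, p, num_samples=2000, seed=0):
    """Monte-Carlo probe of RS_G(phi): approximates the two suprema in
    Definition 2 by maximizing over random inputs X (not an exact computation)."""
    rng = np.random.default_rng(seed)
    energy_ratio = 0.0
    norm_ratio = 0.0
    for _ in range(num_samples):
        X = rng.normal(size=(n, p))
        Y = phi(X)
        energy_ratio = max(energy_ratio, dirichlet(L, Y) / dirichlet(L, X))
        norm_ratio = max(norm_ratio, np.sum(Y**2) / np.sum(X**2))
    return 1.0 - np.sqrt(energy_ratio / norm_ratio)

# Toy setup: single relation, phi(X) = X Theta + D^{-1/2} A D^{-1/2} X Theta_1.
G = nx.petersen_graph()
n, p = G.number_of_nodes(), 4
A = nx.to_numpy_array(G)
d = A.sum(axis=1)
M = np.diag(d**-0.5) @ A @ np.diag(d**-0.5)
L = np.eye(n) - M
rng = np.random.default_rng(1)
Theta, Theta1 = 0.3 * rng.normal(size=(p, p)), 0.3 * rng.normal(size=(p, p))
phi = lambda X: X @ Theta + M @ X @ Theta1
print(rate_of_smoothing_probe(phi, L, n, p))
```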
The following theorem shows that R-GCN layers are flexible in their ability to choose an appropriate rate of smoothing for a graph:
Theorem 3. Let $G_1 = (V, E_1)$ be a graph and $G_2 = (V, E_1 \cup E_2)$ be a rewiring of $G_1$. Consider an R-GCN layer $\phi$ defined as in (1), with relations $r_1 = E_1$, $r_2 = E_2$. Then for any $\lambda \in [0, \lambda_2(L(G_2))]$, there exist values of $\Theta, \Theta_1, \Theta_2$ for which $\phi$ smooths with rate $\mathrm{RS}_{G_2}(\phi) = \lambda$ with respect to $G_2$.
Proof. The map $\phi$ is given by
$$\phi(X; \Theta, \Theta_1, \Theta_2) = X\Theta + D^{-1/2} A_1 D^{-1/2} X \Theta_1 + D^{-1/2} A_2 D^{-1/2} X \Theta_2,$$