A Practical, Progressively-Expressive GNN
Lingxiao Zhao
Carnegie Mellon University
lingxiao@cmu.edu
Louis Härtel
RWTH Aachen University
haertel@informatik.rwth-aachen.de
Neil Shah
Snap Inc.
nshah@snap.com
Leman Akoglu
Carnegie Mellon University
lakoglu@andrew.cmu.edu
Abstract
Message passing neural networks (MPNNs) have become a dominant flavor of
graph neural networks (GNNs) in recent years. Yet, MPNNs come with notable
limitations; namely, they are at most as powerful as the 1-dimensional
Weisfeiler-Leman (1-WL) test in distinguishing graphs in a graph isomorphism
testing framework. To this end, researchers have drawn inspiration from the
k-WL hierarchy to develop more expressive GNNs. However, current k-WL-equivalent
GNNs are not practical for even small values of k, as k-WL becomes
combinatorially more complex as k grows. At the same time, several works have
found great empirical success in graph learning tasks without highly expressive
models, implying that chasing expressiveness with a “coarse-grained ruler” of
expressivity like k-WL is often unneeded in practical tasks. To truly understand
the expressiveness-complexity tradeoff, one desires a more “fine-grained
ruler,” which can more gradually increase expressiveness. Our work puts forth
such a proposal: namely, we first propose the (k, c)(≤)-SETWL hierarchy with
greatly reduced complexity from k-WL, achieved by moving from k-tuples of
nodes to sets with ≤k nodes defined over ≤c connected components in the
induced original graph. We show favorable theoretical results for this model in
relation to k-WL, and concretize it via (k, c)(≤)-SETGNN, which is as expressive
as (k, c)(≤)-SETWL. Our model is practical and progressively-expressive,
increasing in power with k and c. We demonstrate effectiveness on several benchmark
datasets, achieving several state-of-the-art results with runtime and memory
usage applicable to practical graphs. We open source our implementation at
https://github.com/LingxiaoShawn/KCSetGNN.
1 Introduction
In recent years, graph neural networks (GNNs) have gained considerable attention [58,60] for their
ability to tackle various node-level [35,56], link-level [51,66] and graph-level [9,40] learning tasks,
given their ability to learn rich representations for complex graph-structured data. The common template
for designing GNNs follows the message passing paradigm; these so-called message-passing
neural networks (MPNNs) are built by stacking layers which encompass feature transformation and
aggregation operations over the input graph [25,39]. Despite their advantages, MPNNs have several
limitations including oversmoothing [12,47,67], oversquashing [2,55], inability to distinguish
node identities [63] and positions [62], and limited expressive power [61].
Since Xu et al.'s [61] seminal work showed that MPNNs are at most as powerful as the first-order
Weisfeiler-Leman (1-WL) test in the graph isomorphism (GI) testing framework, there have been
several follow-up works on improving the understanding of GNN expressiveness [3,15].
Figure 1: Main steps of (k, c)(≤)-SETGNN. Given a graph and (k, c), we build the (k, c)-bipartite
super-graph (middle) containing sets with up to k nodes and c connected components in the
induced subgraph, on which a base GNN assigns initial “colors”. Bidirectional bipartite GNN layers
with forward-backward message passing learn set embeddings, which are pooled into a graph embedding.
The size of the super-graph, and accordingly its expressiveness, grows progressively with increasing
k and c. The 2-bipartite message passing generalizes normal GNN, edge GNN, and line graph (see Appendix A.12).
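To make the bidirectional bipartite message passing of Figure 1 concrete, below is a minimal sketch of one forward/backward step between consecutive set sizes. It is our own simplification (sum aggregation, hypothetical function names), not the authors' exact layer; the released implementation linked in the abstract is the authoritative reference.

```python
import torch
import torch.nn as nn

def bipartite_mp(x_src, x_dst, edges, lin):
    """One direction of bipartite message passing between consecutive set
    sizes: aggregate transformed source-set embeddings into destination
    sets along super-edges given as (src_idx, dst_idx) index tensors."""
    src_idx, dst_idx = edges
    msgs = lin(x_src)[src_idx]           # transform sources, gather per edge
    agg = torch.zeros_like(x_dst)
    agg.index_add_(0, dst_idx, msgs)     # sum messages arriving at each set
    return torch.relu(x_dst + agg)

# Toy shapes: 5 one-sets and 4 two-sets connected by 8 super-edges.
x1, x2 = torch.randn(5, 16), torch.randn(4, 16)
edges = (torch.randint(0, 5, (8,)), torch.randint(0, 4, (8,)))
lin_f, lin_b = nn.Linear(16, 16), nn.Linear(16, 16)
x2 = bipartite_mp(x1, x2, edges, lin_f)                 # forward: 1-sets -> 2-sets
x1 = bipartite_mp(x2, x1, (edges[1], edges[0]), lin_b)  # backward: 2-sets -> 1-sets
```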
In response, the community proposed many GNN models to overcome such limitations [4,52,68].
Several of these aim to design powerful higher-order GNNs which are increasingly expressive
[9,33,41,43], drawing inspiration from the k-WL hierarchy [54].
A key limitation of such higher-order models that reach beyond 1-WL is their poor scalability;
in fact, these models can only be applied to small graphs with small k in practice [40,41,43]
due to combinatorial growth in complexity. On the other hand, there lies an open question of whether
practical graph learning tasks indeed need such complex, and extremely expressive, GNN models.
Historically, Babai et al. [5] showed that almost all graphs on n nodes can be distinguished by
1-WL. In other contexts like node classification, researchers have encountered superior generalization
performance with graph-augmented multi-layer perceptron (GA-MLP) models [37,50] compared
to MPNNs, despite the former being strictly less expressive than the latter [13]. Considering that
increased model complexity has negative implications in terms of overfitting and generalization [29],
it is worth re-evaluating continued efforts to pursue maximally expressive GNNs in practice. Ideally,
one could study these models' generalization performance on various real-world tasks by increasing
k (expressiveness) in the k-WL hierarchy. Yet, given the impracticality of this too-coarse “ruler”
of expressiveness, which becomes infeasible beyond k = 3, one desires a more fine-grained “ruler”,
that is, a new hierarchy whose expressiveness grows more gradually. Such a hierarchy could enable
us to gradually build more expressive models which admit improved scaling, and avoid unnecessary
leaps in model complexity for tasks which do not require them, guarding against overfitting.
Present Work. In this paper, we propose such a hierarchy, and an associated practical progressively-expressive
GNN model, called (k, c)(≤)-SETGNN, whose expressiveness can be modulated through
k and c. In a nutshell, we take inspiration from k-WL, yet achieve practicality through three design
ideas which simplify key bottlenecks of scaling k-WL: First, we move away from the k-tuples of the
original k-WL to k-multisets (unordered), and then to k(≤)-sets. We demonstrate that these steps
drastically reduce the number of nodes in the k-WL super-graph while retaining significant expressiveness.
(Secs. 3.2–3.3) Second, by considering the underlying sparsity of an input graph, we reduce scope
to (k, c)(≤)-sets that consist of ≤k nodes whose induced subgraph is comprised of c ≤ k connected
components. This also yields a massive reduction in the number of nodes while improving practicality;
i.e., small values of c on sparse graphs allow one to increase k well beyond 3 in practice.
(Sec. 3.4) Third, we design a super-graph architecture that consists of a sequence of k−1 bipartite
graphs over which we learn embeddings for our (k, c)(≤)-sets, using bidirectional message
passing. These embeddings can be pooled to yield a final graph embedding. (Secs. 4.1–4.2) We
also speed up initializing “colors” for the (k, c)(≤)-sets, for c > 1, substantially reducing computation
and memory. (Sec. 4.3) Fig. 1 gives an overview of our proposed framework; a brute-force sketch of
the second design idea appears below. Experimentally,
our (k, c)(≤)-SETGNN outperforms existing state-of-the-art GNNs on simulated expressiveness as
well as real-world graph-level tasks, achieving new bests on substructure counting and ZINC-12K,
respectively. We show that generalization performance reflects increasing expressiveness by k and
c. Our proposed scalable designs allow us to train models with e.g. k=10 or c=3 with practical
running time and memory requirements.
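As an illustration of the (k, c)(≤)-set restriction, the following brute-force sketch enumerates all node sets of size ≤k whose induced subgraph has ≤c connected components. It is for exposition only (combinatorial enumeration does not scale, and the function name is our own); the actual construction we use is described in Sec. 4.

```python
import itertools
import networkx as nx

def kc_sets(G, k, c):
    """Yield every (k, c)(<=)-set of G as a frozenset of nodes: sets of
    size <= k whose induced subgraph has <= c connected components."""
    for m in range(1, k + 1):
        for nodes in itertools.combinations(G.nodes, m):
            if nx.number_connected_components(G.subgraph(nodes)) <= c:
                yield frozenset(nodes)

G = nx.cycle_graph(5)
print(sum(1 for _ in kc_sets(G, 3, 1)))  # 15: only connected <=3-sets
print(sum(1 for _ in kc_sets(G, 3, 3)))  # 25: all <=3-sets (5 + 10 + 10)
```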
2 Related Work
MPNN limitations. Given the understanding that MPNNs have expressiveness upper-bounded by
1-WL [61], several researchers investigated what else MPNNs cannot learn. To this end, Chen et al.
[15] showed that MPNNs cannot count induced connected substructures of 3 or more nodes, while
Arvind et al. [3], along with [15], showed that MPNNs can only count star-shaped patterns. Loukas
[38] further proved several results regarding decision problems on graphs (e.g., subgraph verification,
cycle detection), finding that MPNNs cannot solve these problems unless strict conditions of depth
and width are met. Moreover, Garg et al. [22] showed that many standard MPNNs cannot compute
properties such as longest or shortest cycles, diameters, or existence of cliques.
Improving expressivity. Several works aim to address the expressiveness limitations of MPNNs. One
approach is to inject features into the MPNN aggregation, motivated by Loukas [38], who showed
that MPNNs can be universal approximators when nodes are sufficiently distinguishable. Sato et al.
[53] show that injecting random features can better enable MPNNs to solve problems like minimum
dominating set and maximum matching. You et al. [63] inject cycle counts as node features, motivated
by the limitation of MPNNs not being able to count cycles. Others proposed utilizing subgraph
isomorphism counting to empower MPNNs with substructure information they cannot learn [9,10].
Earlier work by You et al. [62] adds positional features to distinguish nodes which naïve MPNN
embeddings would not. Recently, several works also propose utilizing subgraph aggregation; Zhang
et al. [65], Zhao et al. [68] and Bevilacqua et al. [8] propose subgraph variants of the WL test which
are no less powerful than 3-WL. A recent follow-up work [21] shows that rooted-subgraph-based
extensions with 1-WL as the kernel are bounded by 3-WL. The community is still exploring several
distinct avenues for overcoming these limitations: Murphy et al. [46] propose a relational pooling mechanism
which sums over all permutations of a permutation-sensitive function to achieve above-1-WL
expressiveness. Balcilar et al. [6] generalize spatial and spectral MPNNs and show that instances
of spectral MPNNs are more powerful than 1-WL. Azizian et al. [4] and Geerts et al. [24] unify
expressivity and approximation-ability results for existing GNNs.
k-WL-inspired GNNs. The k-WL test captures higher-order interactions in graph data by considering
all k-tuples, i.e., size-k ordered multisets defined over the set of nodes. While highly expressive,
it does not scale to practical graphs beyond a very small k (k = 3 pushes the practical limit even
for small graphs). However, several works propose designs which can achieve k-WL expressivity in theory and
strive to make them practical. Maron et al. [41] propose a general class of invariant graph networks,
k-IGNs, which have exact k-WL expressivity [23] but are not scalable. Morris et al. have several
works on k-WL and its variants, such as k-LWL [42], k-GNN [43], and δ-k-WL-GNN [44]. Both k-LWL
and k-GNN use a variant of k-WL that only considers k-sets; they are strictly weaker than k-WL but
much more practical. In our paper we argue that k(≤)-sets should be used with additional designs
to keep the best expressivity while remaining practical. The δ-k-WL-GNN works with k-tuples but
sparsifies the connections among tuples by considering locality. Qian et al. [48] extend k-tuples
to subgraph networks but inherit the same scalability bottleneck as k-WL. A recent concurrent work,
SpeqNet [45], proposes to reduce the number of tuples by restricting the number of connected components
of the tuples' induced subgraphs; this idea is independently explored in our paper in Sec. 3.4. All four
variants proposed by Morris et al. have no realizations beyond k = 3 in their experiments, while
we manage to reach k = 10. Interestingly, a recent work [34] links graph transformers with k-IGNs,
showing better (or equal) expressivity, though remaining unscalable.
3 A practical progressively-expressive isomorphism test: (k, c)(≤)-SETWL
We first motivate our method (k, c)(≤)-SETGNN from the GI testing perspective by introducing
(k, c)(≤)-SETWL. We begin with notation and background on k-WL (Sec. 3.1). Next, we show
how to increase the practicality of k-WL without reducing much of its expressivity, by removing
node ordering (Sec. 3.2), removing node repetitions (Sec. 3.3), and leveraging graph sparsity (Sec. 3.4). We
then give the complexity analysis (Sec. 3.5). We close the section by extending the idea to k-FWL,
which is as expressive as (k+1)-WL (Sec. 3.6). All proofs are included in Appendix A.4.
Notation: Let $G = (V(G), E(G), l_G)$ be an undirected, colored graph with nodes $V(G)$, edges
$E(G)$, and a color-labeling function $l_G : V(G) \to C$, where $C = \{c_1, \ldots, c_d\}$ denotes a set of
$d$ distinct colors. Let $[n] = \{1, 2, 3, \ldots, n\}$. Let $(\cdot)$ denote a tuple (ordered multiset), $\{\{\cdot\}\}$ denote
a multiset (a set which allows repetition), and $\{\cdot\}$ denote a set. We define $\vec{v} = (v_1, \ldots, v_k)$
as a k-tuple, $\tilde{v} = \{\{v_1, \ldots, v_k\}\}$ as a k-multiset, and $\hat{v} = \{v_1, \ldots, v_k\}$ as a k-set. Let
$\vec{v}[x/i] = (v_1, \ldots, v_{i-1}, x, v_{i+1}, \ldots, v_k)$ denote replacing the $i$-th element in $\vec{v}$ with $x$, and analogously for
$\tilde{v}[x/i]$ and $\hat{v}[x/i]$ (assume multisets and sets are represented with the canonical ordering). When
$v_i \in V(G)$, let $G[\vec{v}]$, $G[\tilde{v}]$, $G[\hat{v}]$ denote the induced subgraph of $G$ on the nodes inside $\vec{v}$, $\tilde{v}$, $\hat{v}$ respectively
(keeping repetitions). An isomorphism between two graphs $G$ and $G'$ is a bijective mapping
$p : V(G) \to V(G')$ which satisfies $\forall u, v \in V(G), (u, v) \in E(G) \Leftrightarrow (p(u), p(v)) \in E(G')$
and $\forall u \in V(G),\ l_G(u) = l_{G'}(p(u))$. Two graphs $G, G'$ are isomorphic if there exists an isomorphism
between them, which we denote as $G \cong G'$.
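As a concrete aside (our own illustration, not part of the paper), the three node-group representations and the replacement operator can be mirrored in a few lines of Python:

```python
from collections import Counter

v_tuple = (3, 1, 1, 2)                # k-tuple: ordered, repeats allowed
v_multiset = Counter(v_tuple)         # k-multiset: unordered, repeats kept
v_set = frozenset(v_tuple)            # k-set: unordered, repeats dropped

def replace(v, i, x):
    """v[x/i]: replace the i-th element (1-indexed) of tuple v with x."""
    return v[:i - 1] + (x,) + v[i:]

print(replace(v_tuple, 2, 5))         # (3, 5, 1, 2)
print(sorted(v_multiset.elements()))  # canonical ordering: [1, 1, 2, 3]
```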
3.1 Preliminaries: the k-Weisfeiler-Leman (k-WL) Graph Isomorphism Test
The 1-dimensional Weisfeiler-Leman (1-WL) test, also known as the color refinement algorithm [49], is
a widely celebrated approach to (approximately) test GI. Although extremely fast and effective for
most graphs (1-WL provides canonical forms for all but an $n^{-1/7}$ fraction of $n$-vertex graphs [5]),
it fails to distinguish members of many important graph families, such as regular graphs. The more
powerful k-WL algorithm first appeared in [57], and extends the coloring of vertices (or 1-tuples) to
that of k-tuples. k-WL is progressively expressive with increasing $k$, and can distinguish any finite
set of graphs given a sufficiently large $k$. k-WL has many interesting connections to logic, games,
and linear equations [11,28]. Another variant of k-WL is called k-FWL (Folklore WL), such that
(k+1)-WL is equally expressive as k-FWL [11,26]. Our work focuses on k-WL.
k-WL iteratively recolors all $n^k$ k-tuples defined on a graph $G$ with $n$ nodes. At iteration 0, each
k-tuple $\vec{v} = (v_1, \ldots, v_k) \in V(G)^k$ is initialized with a color as its atomic type $at_k(G, \vec{v})$. Assume
$G$ has $d$ colors; then $at_k(G, \vec{v}) \in \{0,1\}^{2\binom{k}{2}+kd}$ is an ordered vector encoding. The first $\binom{k}{2}$ entries
indicate whether $v_i = v_j$, $\forall i, j,\ 1 \le i < j \le k$, which is the node repetition information. The second
$\binom{k}{2}$ entries indicate whether $(v_i, v_j) \in E(G)$. The last $kd$ entries one-hot encode the initial color of
$v_i$, $\forall i,\ 1 \le i \le k$. Importantly, $at_k(G, \vec{v}) = at_k(G', \vec{v}')$ if and only if $v_i \mapsto v'_i$ is an isomorphism
from $G[\vec{v}]$ to $G'[\vec{v}']$. Let $wl_k^{(t)}(G, \vec{v})$ denote the color of k-tuple $\vec{v}$ on graph $G$ at the $t$-th iteration
of k-WL, where colors are initialized with $wl_k^{(0)}(G, \vec{v}) = at_k(G, \vec{v})$.
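To make the encoding concrete, here is a sketch (our own code with hypothetical helper names; it assumes integer node colors stored as a node attribute) that computes $at_k(G, \vec{v})$ for a small colored graph:

```python
import itertools
import networkx as nx

def atomic_type(G, v, num_colors, color_attr="color"):
    """Binary vector of length 2*C(k,2) + k*d encoding node repetitions,
    edges, and one-hot node colors of the k-tuple v."""
    pairs = list(itertools.combinations(range(len(v)), 2))
    rep = [int(v[i] == v[j]) for i, j in pairs]              # v_i == v_j ?
    edge = [int(G.has_edge(v[i], v[j])) for i, j in pairs]   # (v_i, v_j) in E ?
    col = []
    for u in v:                                              # one-hot colors
        onehot = [0] * num_colors
        onehot[G.nodes[u][color_attr]] = 1
        col += onehot
    return tuple(rep + edge + col)

G = nx.path_graph(3)                   # nodes 0 - 1 - 2
nx.set_node_attributes(G, 0, "color")  # a single color, d = 1
print(atomic_type(G, (0, 1, 0), num_colors=1))
# -> (0, 1, 0, 1, 0, 1, 1, 1, 1): v1 == v3; edges (v1,v2), (v2,v3); colors
```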
At the $t$-th iteration, k-WL updates the color of each $\vec{v} \in V(G)^k$ according to
$$wl_k^{(t+1)}(G, \vec{v}) = \mathrm{HASH}\Big(wl_k^{(t)}(G, \vec{v}),\ \big\{\big\{wl_k^{(t)}(G, \vec{v}[x/1]) \mid x \in V(G)\big\}\big\},\ \ldots,\ \big\{\big\{wl_k^{(t)}(G, \vec{v}[x/k]) \mid x \in V(G)\big\}\big\}\Big) \quad (1)$$
Let $gwl_k^{(t)}(G)$ denote the encoding of $G$ at the $t$-th iteration of k-WL. Then,
$$gwl_k^{(t)}(G) = \mathrm{HASH}\Big(\big\{\big\{wl_k^{(t)}(G, \vec{v}) \mid \vec{v} \in V(G)^k\big\}\big\}\Big) \quad (2)$$
Two k-tuples $\vec{v}$ and $\vec{u}$ are connected by an edge if $|\{i \in [k] \mid v_i = u_i\}| = k-1$, or informally
if they share $k-1$ entries. Then, k-WL defines a super-graph $S_{k\text{-wl}}(G)$ with its nodes being all
k-tuples in $G$, and edges defined as above. Eq. (1) defines the rule of color refinement on $S_{k\text{-wl}}(G)$.
Intuitively, k-WL is akin to (but more powerful than, as it orders subgroups of neighbors) running the 1-WL
algorithm on the super-graph $S_{k\text{-wl}}(G)$. As $t \to \infty$, the color $wl_k^{(t+1)}(G, \vec{v})$ converges to a stable
value, denoted $wl_k^{(\infty)}(G, \vec{v})$, with the corresponding stable graph color denoted $gwl_k^{(\infty)}(G)$.
For two non-isomorphic graphs $G, G'$, k-WL can successfully distinguish them if $gwl_k^{(\infty)}(G) \ne gwl_k^{(\infty)}(G')$.
The expressivity of $wl_k^{(t)}$ can be exactly characterized by first-order logic with counting
quantifiers. Let $C_k^t$ denote all first-order formulas with at most $k$ variables and $t$-depth counting
quantifiers; then $wl_k^{(t)}(G, \vec{v}) = wl_k^{(t)}(G', \vec{v}') \iff \forall \varphi \in C_k^t,\ \varphi(G, \vec{v}) = \varphi(G', \vec{v}')$. Additionally,
there is a $t$-step bijective pebble game that is equivalent to $t$-iteration k-WL in expressivity. See
Appendix A.2 for the pebble game characterization of k-WL.
Despite its power, k-WL uses all $n^k$ tuples and thus has $O(k \cdot n^k)$ complexity at each iteration.
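Since k-WL is akin to running 1-WL on the super-graph $S_{k\text{-wl}}(G)$, the basic refinement loop is worth seeing concretely. Below is a minimal sketch of plain color refinement (our own code; it omits the position-ordered neighbor groups of Eq. (1), which is exactly what makes k-WL stronger than 1-WL on the super-graph):

```python
def color_refine(adj, init_colors, max_iters=100):
    """1-WL color refinement. adj: {node: iterable of neighbors};
    init_colors: {node: hashable color}. Returns the stable coloring."""
    colors = dict(init_colors)
    for _ in range(max_iters):
        new = {
            u: hash((colors[u], tuple(sorted(colors[w] for w in adj[u]))))
            for u in adj
        }
        if len(set(new.values())) == len(set(colors.values())):
            return new  # the partition stopped refining: stable coloring
        colors = new
    return colors

# Example: a 4-cycle; all nodes keep one color (1-WL cannot split a cycle).
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(color_refine(adj, {u: 0 for u in adj}))
```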
3.2 From k-WL to k-MULTISETWL: Removing Ordering
Our first proposed adaptation to k-WL is to remove the ordering in each k-tuple, i.e., changing k-tuples
to k-multisets. This greatly reduces the number of supernodes to consider, by a factor of up to $O(k!)$; see the counting sketch below.
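The reduction is easy to quantify (our own illustration; k(≤)-sets are formally introduced in Sec. 3.3):

```python
from math import comb

def n_tuples(n, k):     return n ** k                                   # k-WL
def n_multisets(n, k):  return comb(n + k - 1, k)                       # k-MultisetWL
def n_sets_le_k(n, k):  return sum(comb(n, m) for m in range(1, k + 1)) # k(<=)-SetWL

n, k = 100, 4
print(n_tuples(n, k), n_multisets(n, k), n_sets_le_k(n, k))
# 100000000 4421275 4087975 -- roughly a k! = 24x drop from tuples to multisets
```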
Let $\tilde{v} = \{\{v_1, \ldots, v_k\}\}$ be the corresponding multiset of the tuple $\vec{v} = (v_1, \ldots, v_k)$. We introduce a
canonical order function on $G$, $o_G : V(G) \to [n]$, and a corresponding indexing function $o_G^{-1}(\tilde{v}, i)$,
which returns the $i$-th element of $v_1, \ldots, v_k$ sorted according to $o_G$. Let $mwl_k^{(t)}(G, \tilde{v})$ denote the
color of the k-multiset $\tilde{v}$ at the $t$-th iteration of k-MULTISETWL, formally defined next.
At $t = 0$, we initialize the color of $\tilde{v}$ as $mwl_k^{(0)}(G, \tilde{v}) = \mathrm{HASH}\big(\{\{at_k(G, p(\tilde{v})) \mid p \in \mathrm{perm}[k]\}\}\big)$,
where $\mathrm{perm}[k]$¹ denotes the set of all permutation mappings of $k$ elements. It can be shown that
$mwl_k^{(0)}(G, \tilde{v}) = mwl_k^{(0)}(G', \tilde{v}')$ if and only if $G[\tilde{v}]$ and $G'[\tilde{v}']$ are isomorphic.
At the $t$-th iteration, k-MULTISETWL updates the color of every k-multiset by
$$mwl_k^{(t+1)}(G, \tilde{v}) = \mathrm{HASH}\Big(mwl_k^{(t)}(G, \tilde{v}),\ \big\{\big\{mwl_k^{(t)}(G, \tilde{v}[x/o_G^{-1}(\tilde{v}, 1)]) \mid x \in V(G)\big\}\big\},\ \ldots,\ \big\{\big\{mwl_k^{(t)}(G, \tilde{v}[x/o_G^{-1}(\tilde{v}, k)]) \mid x \in V(G)\big\}\big\}\Big) \quad (3)$$
where $\tilde{v}[x/o_G^{-1}(\tilde{v}, i)]$ denotes replacing the $i$-th (ordered by $o_G$) element of the multiset with
$x \in V(G)$. Let $S_{k\text{-mwl}}(G)$ denote the super-graph defined by k-MULTISETWL. Similar to Eq. (2),
the graph-level encoding is $gmwl_k^{(t)}(G) = \mathrm{HASH}\big(\{\{mwl_k^{(t)}(G, \tilde{v}) \mid \tilde{v} \in V(S_{k\text{-mwl}}(G))\}\}\big)$.
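Eq. (3) generates, for each canonical position $i$, the multiset of colors obtained by substituting every node $x \in V(G)$. A literal (and deliberately inefficient) rendering of this neighbor generation, in our own code with hypothetical names:

```python
from collections import Counter

def multiset_neighbors(v_multiset, nodes):
    """Yield (i, x, new_multiset) for each canonical position i (1-indexed)
    and each node x, mirroring the substitution in Eq. (3)."""
    elems = sorted(v_multiset.elements())   # canonical ordering o_G
    for i in range(len(elems)):
        for x in nodes:
            yield i + 1, x, Counter(elems[:i] + [x] + elems[i + 1:])

v = Counter({1: 2, 4: 1})                   # the 3-multiset {{1, 1, 4}}
print(sum(1 for _ in multiset_neighbors(v, range(5))))  # 3 positions x 5 nodes = 15
```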
Interestingly, although k-MULTISETWL has a significantly smaller number of node groups than k-WL,
we show it is no less powerful than (k−1)-WL in terms of distinguishing graphs, while being upper
bounded by k-WL in distinguishing both node groups and graphs.
Theorem 1. Let $k \ge 1$ and define $wl_k^{(t)}(G, \tilde{v}) := \{\{wl_k^{(t)}(G, p(\tilde{v})) \mid p \in \mathrm{perm}[k]\}\}$. For all $t \in \mathbb{N}$ and
all graphs $G, G'$: k-MULTISETWL is upper bounded by k-WL in distinguishing multisets $(G, \tilde{v})$ and
$(G', \tilde{v}')$ at the $t$-th iteration, i.e., $wl_k^{(t)}(G, \tilde{v}) = wl_k^{(t)}(G', \tilde{v}') \Rightarrow mwl_k^{(t)}(G, \tilde{v}) = mwl_k^{(t)}(G', \tilde{v}')$.
Theorem 2. k-MULTISETWL is no less powerful than (k−1)-WL in distinguishing graphs: for any
$k \ge 3$ there exist graphs that can be distinguished by k-MULTISETWL but not by (k−1)-WL.
Theorem 2 is proved using a variant of a series of CFI [11] graphs which cannot be distinguished
by k-WL. This theorem shows that k-MULTISETWL is indeed very powerful, and that finding
counterexamples, i.e., k-WL-distinguishable graphs that cannot be distinguished by k-MULTISETWL, is very
hard. Hence we conjecture that k-MULTISETWL may have the same expressivity as k-WL in distinguishing
undirected graphs, with an attempted proof in Appendix A.4.10. Additionally, the
theorem also implies that k-MULTISETWL is strictly more powerful than (k−1)-MULTISETWL.
We next give a pebble game characterization of k-MULTISETWL, named the doubly bijective
k-pebble game, presented in Appendix A.3. The game is used in the proof of Theorem 2.
Theorem 3. k-MULTISETWL has the same expressivity as the doubly bijective k-pebble game.
3.3 From k-MULTISETWL to k(≤)-SETWL: Removing Repetition
Next, we propose to further remove repetition inside any k-multiset, i.e., transforming k-multisets
to k(≤)-sets. We assume elements of a k-multiset $\tilde{v}$ and a k(≤)-set $\hat{v}$ in $G$ are sorted based on the
ordering function $o_G$, and omit $o_G$ for clarity. Let $s(\cdot)$ transform a multiset to a set by removing
repeats, and let $r(\cdot)$ return a tuple with the number of repeats for each distinct element in a multiset.
Specifically, let $\hat{v} = s(\tilde{v})$ and $\hat{n} = r(\tilde{v})$; then $m := |\hat{v}| = |\hat{n}| \in [k]$ denotes the number of
distinct elements in the k-multiset $\tilde{v}$, and $\forall i \in [m]$, $\hat{v}_i$ is the $i$-th distinct element with $\hat{n}_i$ repetitions.
Clearly there is an injective mapping between $\tilde{v}$ and $(\hat{v}, \hat{n})$; let $f$ be the inverse mapping such that
$\tilde{v} = f(s(\tilde{v}), r(\tilde{v}))$. Equivalently, each $m$-set $\hat{v}$ can be mapped to a multiset of k-multisets: $\hat{v} \leftrightarrow
\{\{\tilde{v} = f(\hat{v}, \hat{n}) \mid \sum_{i=1}^{m} \hat{n}_i = k,\ \forall i\ \hat{n}_i \ge 1\}\}$. Based on this relationship, we extend the connection
among k-multisets to k(≤)-sets: given $m_1, m_2 \in [k]$, an $m_1$-set $\hat{v}$ is connected to an $m_2$-set $\hat{u}$ if and
only if $\exists \hat{n}_v, \hat{n}_u$ such that $f(\hat{v}, \hat{n}_v)$ is connected with $f(\hat{u}, \hat{n}_u)$ in k-MULTISETWL. Let $S_{k\text{-swl}}(G)$ denote
the super-graph defined on $G$ by k(≤)-SETWL. It can be shown that this criterion is equivalent to either
(1) $(|m_1 - m_2| = 1) \wedge (|\hat{v} \cap \hat{u}| = \min(m_1, m_2))$ or (2) $(m_1 = m_2) \wedge (|\hat{v} \cap \hat{u}| = m_1 - 1)$
being true; a small sketch checking this criterion follows below. Notice that $S_{k\text{-swl}}(G)$ contains a sequence of $k-1$ bipartite graphs, each reflecting
the connections among the $(m-1)$-sets and the $m$-sets. It also contains $k-1$ subgraphs, i.e., the
connections among the $m$-sets for $m = 2, \ldots, k$. Later on we will show that these $k-1$ subgraphs
can be ignored without affecting k(≤)-SETWL.
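For concreteness, here is a direct implementation of the equivalent criterion (our own sketch; the function name is hypothetical). Condition (1) yields the bipartite super-edges, and condition (2) the same-size connections that, as noted above, can be ignored.

```python
def super_edge(v_set, u_set):
    """Super-edge test in S_k-swl(G) between an m1-set and an m2-set,
    both given as frozensets of nodes."""
    m1, m2 = len(v_set), len(u_set)
    overlap = len(v_set & u_set)
    across = abs(m1 - m2) == 1 and overlap == min(m1, m2)  # (m-1)-set <-> m-set
    within = m1 == m2 and overlap == m1 - 1                # m-set <-> m-set
    return across or within

print(super_edge(frozenset({1, 2}), frozenset({1, 2, 3})))     # True: bipartite edge
print(super_edge(frozenset({1, 2, 3}), frozenset({1, 2, 4})))  # True: same-size edge
print(super_edge(frozenset({1, 2}), frozenset({3, 4, 5})))     # False
```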
¹This function should also consider repeated elements; we omit this for clarity of presentation.