A Practical, Progressively-Expressive GNN
Lingxiao Zhao
Carnegie Mellon University
lingxiao@cmu.edu
Louis Härtel
RWTH Aachen University
haertel@informatik.rwth-aachen.de
Neil Shah
Snap Inc.
nshah@snap.com
Leman Akoglu
Carnegie Mellon University
lakoglu@andrew.cmu.edu
Abstract
Message passing neural networks (MPNNs) have become a dominant flavor of
graph neural networks (GNNs) in recent years. Yet, MPNNs come with notable
limitations; namely, they are at most as powerful as the 1-dimensional
Weisfeiler-Leman (1-WL) test in distinguishing graphs in a graph isomorphism
testing framework. To this end, researchers have drawn inspiration from the
k-WL hierarchy to develop more expressive GNNs. However, current k-WL-equivalent
GNNs are not practical for even small values of k, as k-WL becomes
combinatorially more complex as k grows. At the same time, several works have
found great empirical success in graph learning tasks without highly expressive
models, implying that chasing expressiveness with a “coarse-grained ruler” of
expressivity like k-WL is often unneeded in practical tasks. To truly understand
the expressiveness-complexity tradeoff, one desires a more “fine-grained
ruler,” which can more gradually increase expressiveness. Our work puts forth
such a proposal: namely, we first propose the (k, c)(≤)-SETWL hierarchy with
greatly reduced complexity from k-WL, achieved by moving from k-tuples of
nodes to sets with ≤k nodes defined over ≤c connected components in the
induced original graph. We show favorable theoretical results for this model in
relation to k-WL, and concretize it via (k, c)(≤)-SETGNN, which is as expressive
as (k, c)(≤)-SETWL. Our model is practical and progressively-expressive,
increasing in power with k and c. We demonstrate effectiveness on several benchmark
datasets, achieving several state-of-the-art results with runtime and memory
usage applicable to practical graphs. We open source our implementation at
https://github.com/LingxiaoShawn/KCSetGNN.
1 Introduction
In recent years, graph neural networks (GNNs) have gained considerable attention [58,60] for their
ability to tackle various node-level [35,56], link-level [51,66] and graph-level [9,40] learning tasks,
given their ability to learn rich representations for complex graph-structured data. The common template
for designing GNNs follows the message passing paradigm; these so-called message-passing
neural networks (MPNNs) are built by stacking layers which encompass feature transformation and
aggregation operations over the input graph [25,39]. Despite their advantages, MPNNs have several
limitations including oversmoothing [12,47,67], oversquashing [2,55], inability to distinguish
node identities [63] and positions [62], and limited expressive power [61].
Since Xu et al.'s [61] seminal work showed that MPNNs are at most as powerful as the first-order
Weisfeiler-Leman (1-WL) test in the graph isomorphism (GI) testing framework, there have been
several follow-up works on improving the understanding of GNN expressiveness [3,15].
Figure 1: Main steps of (k, c)(≤)-SETGNN. Given a graph and (k, c), we build the (k, c)-bipartite
super-graph (middle) containing sets with up to k nodes and c connected components in the
induced subgraph, on which a base GNN assigns initial “colors”. Bidirectional bipartite GNN layers
with forward-backward message passing learn set embeddings, which are pooled into a graph embedding.
The size of the super-graph, and accordingly its expressiveness, grows progressively with increasing
k and c. The 2-bipartite message passing generalizes normal GNN, edge GNN, and line graph (see Appendix A.12).
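To make the bidirectional bipartite message passing of Figure 1 concrete, below is a minimal sketch of one forward/backward step between consecutive set sizes. It is our own simplification (sum aggregation, hypothetical function names), not the authors' exact layer; the released implementation linked in the abstract is the authoritative reference.

```python
import torch
import torch.nn as nn

def bipartite_mp(x_src, x_dst, edges, lin):
    """One direction of bipartite message passing between consecutive set
    sizes: aggregate transformed source-set embeddings into destination
    sets along super-edges given as (src_idx, dst_idx) index tensors."""
    src_idx, dst_idx = edges
    msgs = lin(x_src)[src_idx]           # transform sources, gather per edge
    agg = torch.zeros_like(x_dst)
    agg.index_add_(0, dst_idx, msgs)     # sum messages arriving at each set
    return torch.relu(x_dst + agg)

# Toy shapes: 5 one-sets and 4 two-sets connected by 8 super-edges.
x1, x2 = torch.randn(5, 16), torch.randn(4, 16)
edges = (torch.randint(0, 5, (8,)), torch.randint(0, 4, (8,)))
lin_f, lin_b = nn.Linear(16, 16), nn.Linear(16, 16)
x2 = bipartite_mp(x1, x2, edges, lin_f)                 # forward: 1-sets -> 2-sets
x1 = bipartite_mp(x2, x1, (edges[1], edges[0]), lin_b)  # backward: 2-sets -> 1-sets
```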
In response, the community proposed many GNN models to overcome such limitations [4,52,68].
Several of these aim to design powerful higher-order GNNs which are increasingly expressive
[9,33,41,43], drawing inspiration from the k-WL hierarchy [54].
A key limitation of such higher-order models that reach beyond 1-WL is their poor scalability;
in fact, these models can only be applied to small graphs with small k in practice [40,41,43]
due to combinatorial growth in complexity. On the other hand, there lies an open question of whether
practical graph learning tasks indeed need such complex, and extremely expressive, GNN models.
Historically, Babai et al. [5] showed that almost all graphs on n nodes can be distinguished by
1-WL. In other contexts like node classification, researchers have encountered superior generalization
performance with graph-augmented multi-layer perceptron (GA-MLP) models [37,50] compared
to MPNNs, despite the former being strictly less expressive than the latter [13]. Considering that
increased model complexity has negative implications in terms of overfitting and generalization [29],
it is worth re-evaluating continued efforts to pursue maximally expressive GNNs in practice. Ideally,
one could study these models' generalization performance on various real-world tasks by increasing
k (expressiveness) in the k-WL hierarchy. Yet, given the impracticality of this too-coarse “ruler”
of expressiveness, which becomes infeasible beyond k = 3, one desires a more fine-grained “ruler”,
that is, a new hierarchy whose expressiveness grows more gradually. Such a hierarchy could enable
us to gradually build more expressive models which admit improved scaling, and avoid unnecessary
leaps in model complexity for tasks which do not require them, guarding against overfitting.
Present Work. In this paper, we propose such a hierarchy, and an associated practical progressively-expressive
GNN model, called (k, c)(≤)-SETGNN, whose expressiveness can be modulated through
k and c. In a nutshell, we take inspiration from k-WL, yet achieve practicality through three design
ideas which simplify key bottlenecks of scaling k-WL: First, we move away from the k-tuples of the
original k-WL to k-multisets (unordered), and then to k(≤)-sets. We demonstrate that these steps
drastically reduce the number of nodes in the k-WL super-graph while retaining significant expressiveness.
(Secs. 3.2–3.3) Second, by considering the underlying sparsity of an input graph, we reduce scope
to (k, c)(≤)-sets that consist of ≤k nodes whose induced subgraph is comprised of c ≤ k connected
components. This also yields a massive reduction in the number of nodes while improving practicality;
i.e., small values of c on sparse graphs allow one to increase k well beyond 3 in practice.
(Sec. 3.4) Third, we design a super-graph architecture that consists of a sequence of k−1 bipartite
graphs over which we learn embeddings for our (k, c)(≤)-sets, using bidirectional message
passing. These embeddings can be pooled to yield a final graph embedding. (Secs. 4.1–4.2) We
also speed up initializing “colors” for the (k, c)(≤)-sets, for c > 1, substantially reducing computation
and memory. (Sec. 4.3) Fig. 1 gives an overview of our proposed framework; a brute-force sketch of
the second design idea appears below. Experimentally,
our (k, c)(≤)-SETGNN outperforms existing state-of-the-art GNNs on simulated expressiveness as
well as real-world graph-level tasks, achieving new bests on substructure counting and ZINC-12K,
respectively. We show that generalization performance reflects increasing expressiveness by k and
c. Our proposed scalable designs allow us to train models with e.g. k=10 or c=3 with practical
running time and memory requirements.
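As an illustration of the (k, c)(≤)-set restriction, the following brute-force sketch enumerates all node sets of size ≤k whose induced subgraph has ≤c connected components. It is for exposition only (combinatorial enumeration does not scale, and the function name is our own); the actual construction we use is described in Sec. 4.

```python
import itertools
import networkx as nx

def kc_sets(G, k, c):
    """Yield every (k, c)(<=)-set of G as a frozenset of nodes: sets of
    size <= k whose induced subgraph has <= c connected components."""
    for m in range(1, k + 1):
        for nodes in itertools.combinations(G.nodes, m):
            if nx.number_connected_components(G.subgraph(nodes)) <= c:
                yield frozenset(nodes)

G = nx.cycle_graph(5)
print(sum(1 for _ in kc_sets(G, 3, 1)))  # 15: only connected <=3-sets
print(sum(1 for _ in kc_sets(G, 3, 3)))  # 25: all <=3-sets (5 + 10 + 10)
```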
2 Related Work
MPNN limitations. Given the understanding that MPNNs have expressiveness upper-bounded by
1-WL [61], several researchers investigated what else MPNNs cannot learn. To this end, Chen et al.
[15] showed that MPNNs cannot count induced connected substructures of 3 or more nodes, while
Arvind et al. [3], along with [15], showed that MPNNs can only count star-shaped patterns. Loukas
[38] further proved several results regarding decision problems on graphs (e.g., subgraph verification,
cycle detection), finding that MPNNs cannot solve these problems unless strict conditions of depth
and width are met. Moreover, Garg et al. [22] showed that many standard MPNNs cannot compute
properties such as longest or shortest cycles, diameters, or existence of cliques.
Improving expressivity. Several works aim to address the expressiveness limitations of MPNNs. One
approach is to inject features into the MPNN aggregation, motivated by Loukas [38], who showed
that MPNNs can be universal approximators when nodes are sufficiently distinguishable. Sato et al.
[53] show that injecting random features can better enable MPNNs to solve problems like minimum
dominating set and maximum matching. You et al. [63] inject cycle counts as node features, motivated
by the limitation of MPNNs not being able to count cycles. Others proposed utilizing subgraph
isomorphism counting to empower MPNNs with substructure information they cannot learn [9,10].
Earlier work by You et al. [62] adds positional features to distinguish nodes which naïve MPNN
embeddings would not. Recently, several works also propose utilizing subgraph aggregation; Zhang
et al. [65], Zhao et al. [68] and Bevilacqua et al. [8] propose subgraph variants of the WL test which
are no less powerful than 3-WL. A recent follow-up work [21] shows that rooted-subgraph-based
extensions with 1-WL as the kernel are bounded by 3-WL. The community is still exploring several
distinct avenues for overcoming these limitations: Murphy et al. [46] propose a relational pooling mechanism
which sums over all permutations of a permutation-sensitive function to achieve above-1-WL
expressiveness. Balcilar et al. [6] generalize spatial and spectral MPNNs and show that instances
of spectral MPNNs are more powerful than 1-WL. Azizian et al. [4] and Geerts et al. [24] unify
expressivity and approximation-ability results for existing GNNs.
k-WL-inspired GNNs. The k-WL test captures higher-order interactions in graph data by considering
all k-tuples, i.e., size-k ordered multisets defined over the set of nodes. While highly expressive,
it does not scale to practical graphs beyond a very small k (k = 3 pushes the practical limit even
for small graphs). However, several works propose designs which can achieve k-WL expressivity in theory and
strive to make them practical. Maron et al. [41] propose a general class of invariant graph networks,
k-IGNs, which have exact k-WL expressivity [23] but are not scalable. Morris et al. have several
works on k-WL and its variants, such as k-LWL [42], k-GNN [43], and δ-k-WL-GNN [44]. Both k-LWL
and k-GNN use a variant of k-WL that only considers k-sets; they are strictly weaker than k-WL but
much more practical. In our paper we argue that k(≤)-sets should be used with additional designs
to keep the best expressivity while remaining practical. The δ-k-WL-GNN works with k-tuples but
sparsifies the connections among tuples by considering locality. Qian et al. [48] extend k-tuples
to subgraph networks but inherit the same scalability bottleneck as k-WL. A recent concurrent work,
SpeqNet [45], proposes to reduce the number of tuples by restricting the number of connected components
of the tuples' induced subgraphs; this idea is independently explored in our paper in Sec. 3.4. All four
variants proposed by Morris et al. have no realizations beyond k = 3 in their experiments, while
we manage to reach k = 10. Interestingly, a recent work [34] links graph transformers with k-IGNs,
showing better (or equal) expressivity, though remaining unscalable.
3 A practical progressively-expressive isomorphism test: (k, c)(≤)-SETWL
We first motivate our method (k, c)(≤)-SETGNN from the GI testing perspective by introducing
(k, c)(≤)-SETWL. We begin with notation and background on k-WL (Sec. 3.1). Next, we show
how to increase the practicality of k-WL without reducing much of its expressivity, by removing
node ordering (Sec. 3.2), removing node repetitions (Sec. 3.3), and leveraging graph sparsity (Sec. 3.4). We
then give the complexity analysis (Sec. 3.5). We close the section by extending the idea to k-FWL,
which is as expressive as (k+1)-WL (Sec. 3.6). All proofs are included in Appendix A.4.
Notation: Let $G = (V(G), E(G), l_G)$ be an undirected, colored graph with nodes $V(G)$, edges
$E(G)$, and a color-labeling function $l_G : V(G) \to C$, where $C = \{c_1, \ldots, c_d\}$ denotes a set of
$d$ distinct colors. Let $[n] = \{1, 2, 3, \ldots, n\}$. Let $(\cdot)$ denote a tuple (ordered multiset), $\{\{\cdot\}\}$ denote
a multiset (a set which allows repetition), and $\{\cdot\}$ denote a set. We define $\vec{v} = (v_1, \ldots, v_k)$
as a k-tuple, $\tilde{v} = \{\{v_1, \ldots, v_k\}\}$ as a k-multiset, and $\hat{v} = \{v_1, \ldots, v_k\}$ as a k-set. Let
$\vec{v}[x/i] = (v_1, \ldots, v_{i-1}, x, v_{i+1}, \ldots, v_k)$ denote replacing the $i$-th element in $\vec{v}$ with $x$, and analogously for
$\tilde{v}[x/i]$ and $\hat{v}[x/i]$ (assume multisets and sets are represented with the canonical ordering). When
$v_i \in V(G)$, let $G[\vec{v}]$, $G[\tilde{v}]$, $G[\hat{v}]$ denote the induced subgraph of $G$ on the nodes inside $\vec{v}$, $\tilde{v}$, $\hat{v}$ respectively
(keeping repetitions). An isomorphism between two graphs $G$ and $G'$ is a bijective mapping
$p : V(G) \to V(G')$ which satisfies $\forall u, v \in V(G), (u, v) \in E(G) \Leftrightarrow (p(u), p(v)) \in E(G')$
and $\forall u \in V(G),\ l_G(u) = l_{G'}(p(u))$. Two graphs $G, G'$ are isomorphic if there exists an isomorphism
between them, which we denote as $G \cong G'$.
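As a concrete aside (our own illustration, not part of the paper), the three node-group representations and the replacement operator can be mirrored in a few lines of Python:

```python
from collections import Counter

v_tuple = (3, 1, 1, 2)                # k-tuple: ordered, repeats allowed
v_multiset = Counter(v_tuple)         # k-multiset: unordered, repeats kept
v_set = frozenset(v_tuple)            # k-set: unordered, repeats dropped

def replace(v, i, x):
    """v[x/i]: replace the i-th element (1-indexed) of tuple v with x."""
    return v[:i - 1] + (x,) + v[i:]

print(replace(v_tuple, 2, 5))         # (3, 5, 1, 2)
print(sorted(v_multiset.elements()))  # canonical ordering: [1, 1, 2, 3]
```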
3.1 Preliminaries: the k-Weisfeiler-Leman (k-WL) Graph Isomorphism Test
The 1-dimensional Weisfeiler-Leman (1-WL) test, also known as the color refinement algorithm [49], is
a widely celebrated approach to (approximately) test GI. Although extremely fast and effective for
most graphs (1-WL provides canonical forms for all but an $n^{-1/7}$ fraction of $n$-vertex graphs [5]),
it fails to distinguish members of many important graph families, such as regular graphs. The more
powerful k-WL algorithm first appeared in [57], and extends the coloring of vertices (or 1-tuples) to
that of k-tuples. k-WL is progressively expressive with increasing $k$, and can distinguish any finite
set of graphs given a sufficiently large $k$. k-WL has many interesting connections to logic, games,
and linear equations [11,28]. Another variant of k-WL is called k-FWL (Folklore WL), such that
(k+1)-WL is equally expressive as k-FWL [11,26]. Our work focuses on k-WL.
k-WL iteratively recolors all $n^k$ k-tuples defined on a graph $G$ with $n$ nodes. At iteration 0, each
k-tuple $\vec{v} = (v_1, \ldots, v_k) \in V(G)^k$ is initialized with a color as its atomic type $at_k(G, \vec{v})$. Assume
$G$ has $d$ colors; then $at_k(G, \vec{v}) \in \{0,1\}^{2\binom{k}{2}+kd}$ is an ordered vector encoding. The first $\binom{k}{2}$ entries
indicate whether $v_i = v_j$, $\forall i, j,\ 1 \le i < j \le k$, which is the node repetition information. The second
$\binom{k}{2}$ entries indicate whether $(v_i, v_j) \in E(G)$. The last $kd$ entries one-hot encode the initial color of
$v_i$, $\forall i,\ 1 \le i \le k$. Importantly, $at_k(G, \vec{v}) = at_k(G', \vec{v}')$ if and only if $v_i \mapsto v'_i$ is an isomorphism
from $G[\vec{v}]$ to $G'[\vec{v}']$. Let $wl_k^{(t)}(G, \vec{v})$ denote the color of k-tuple $\vec{v}$ on graph $G$ at the $t$-th iteration
of k-WL, where colors are initialized with $wl_k^{(0)}(G, \vec{v}) = at_k(G, \vec{v})$.
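To make the encoding concrete, here is a sketch (our own code with hypothetical helper names; it assumes integer node colors stored as a node attribute) that computes $at_k(G, \vec{v})$ for a small colored graph:

```python
import itertools
import networkx as nx

def atomic_type(G, v, num_colors, color_attr="color"):
    """Binary vector of length 2*C(k,2) + k*d encoding node repetitions,
    edges, and one-hot node colors of the k-tuple v."""
    pairs = list(itertools.combinations(range(len(v)), 2))
    rep = [int(v[i] == v[j]) for i, j in pairs]              # v_i == v_j ?
    edge = [int(G.has_edge(v[i], v[j])) for i, j in pairs]   # (v_i, v_j) in E ?
    col = []
    for u in v:                                              # one-hot colors
        onehot = [0] * num_colors
        onehot[G.nodes[u][color_attr]] = 1
        col += onehot
    return tuple(rep + edge + col)

G = nx.path_graph(3)                   # nodes 0 - 1 - 2
nx.set_node_attributes(G, 0, "color")  # a single color, d = 1
print(atomic_type(G, (0, 1, 0), num_colors=1))
# -> (0, 1, 0, 1, 0, 1, 1, 1, 1): v1 == v3; edges (v1,v2), (v2,v3); colors
```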
At the $t$-th iteration, k-WL updates the color of each $\vec{v} \in V(G)^k$ according to
$$wl_k^{(t+1)}(G, \vec{v}) = \mathrm{HASH}\Big(wl_k^{(t)}(G, \vec{v}),\ \big\{\big\{wl_k^{(t)}(G, \vec{v}[x/1]) \mid x \in V(G)\big\}\big\},\ \ldots,\ \big\{\big\{wl_k^{(t)}(G, \vec{v}[x/k]) \mid x \in V(G)\big\}\big\}\Big) \quad (1)$$
Let $gwl_k^{(t)}(G)$ denote the encoding of $G$ at the $t$-th iteration of k-WL. Then,
$$gwl_k^{(t)}(G) = \mathrm{HASH}\Big(\big\{\big\{wl_k^{(t)}(G, \vec{v}) \mid \vec{v} \in V(G)^k\big\}\big\}\Big) \quad (2)$$
Two k-tuples $\vec{v}$ and $\vec{u}$ are connected by an edge if $|\{i \in [k] \mid v_i = u_i\}| = k-1$, or informally
if they share $k-1$ entries. Then, k-WL defines a super-graph $S_{k\text{-wl}}(G)$ with its nodes being all
k-tuples in $G$, and edges defined as above. Eq. (1) defines the rule of color refinement on $S_{k\text{-wl}}(G)$.
Intuitively, k-WL is akin to (but more powerful than, as it orders subgroups of neighbors) running the 1-WL
algorithm on the super-graph $S_{k\text{-wl}}(G)$. As $t \to \infty$, the color $wl_k^{(t+1)}(G, \vec{v})$ converges to a stable
value, denoted $wl_k^{(\infty)}(G, \vec{v})$, with the corresponding stable graph color denoted $gwl_k^{(\infty)}(G)$.
For two non-isomorphic graphs $G, G'$, k-WL can successfully distinguish them if $gwl_k^{(\infty)}(G) \ne gwl_k^{(\infty)}(G')$.
The expressivity of $wl_k^{(t)}$ can be exactly characterized by first-order logic with counting
quantifiers. Let $C_k^t$ denote all first-order formulas with at most $k$ variables and $t$-depth counting
quantifiers; then $wl_k^{(t)}(G, \vec{v}) = wl_k^{(t)}(G', \vec{v}') \iff \forall \varphi \in C_k^t,\ \varphi(G, \vec{v}) = \varphi(G', \vec{v}')$. Additionally,
there is a $t$-step bijective pebble game that is equivalent to $t$-iteration k-WL in expressivity. See
Appendix A.2 for the pebble game characterization of k-WL.
Despite its power, k-WL uses all $n^k$ tuples and thus has $O(k \cdot n^k)$ complexity at each iteration.
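Since k-WL is akin to running 1-WL on the super-graph $S_{k\text{-wl}}(G)$, the basic refinement loop is worth seeing concretely. Below is a minimal sketch of plain color refinement (our own code; it omits the position-ordered neighbor groups of Eq. (1), which is exactly what makes k-WL stronger than 1-WL on the super-graph):

```python
def color_refine(adj, init_colors, max_iters=100):
    """1-WL color refinement. adj: {node: iterable of neighbors};
    init_colors: {node: hashable color}. Returns the stable coloring."""
    colors = dict(init_colors)
    for _ in range(max_iters):
        new = {
            u: hash((colors[u], tuple(sorted(colors[w] for w in adj[u]))))
            for u in adj
        }
        if len(set(new.values())) == len(set(colors.values())):
            return new  # the partition stopped refining: stable coloring
        colors = new
    return colors

# Example: a 4-cycle; all nodes keep one color (1-WL cannot split a cycle).
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(color_refine(adj, {u: 0 for u in adj}))
```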
3.2 From k-WL to k-MULTISETWL: Removing Ordering
Our first proposed adaptation to k-WL is to remove the ordering in each k-tuple, i.e., changing k-tuples
to k-multisets. This greatly reduces the number of supernodes to consider, by a factor of up to $O(k!)$; see the counting sketch below.
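The reduction is easy to quantify (our own illustration; k(≤)-sets are formally introduced in Sec. 3.3):

```python
from math import comb

def n_tuples(n, k):     return n ** k                                   # k-WL
def n_multisets(n, k):  return comb(n + k - 1, k)                       # k-MultisetWL
def n_sets_le_k(n, k):  return sum(comb(n, m) for m in range(1, k + 1)) # k(<=)-SetWL

n, k = 100, 4
print(n_tuples(n, k), n_multisets(n, k), n_sets_le_k(n, k))
# 100000000 4421275 4087975 -- roughly a k! = 24x drop from tuples to multisets
```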
Let $\tilde{v} = \{\{v_1, \ldots, v_k\}\}$ be the corresponding multiset of the tuple $\vec{v} = (v_1, \ldots, v_k)$. We introduce a
canonical order function on $G$, $o_G : V(G) \to [n]$, and a corresponding indexing function $o_G^{-1}(\tilde{v}, i)$,
which returns the $i$-th element of $v_1, \ldots, v_k$ sorted according to $o_G$. Let $mwl_k^{(t)}(G, \tilde{v})$ denote the
color of the k-multiset $\tilde{v}$ at the $t$-th iteration of k-MULTISETWL, formally defined next.
At $t = 0$, we initialize the color of $\tilde{v}$ as $mwl_k^{(0)}(G, \tilde{v}) = \mathrm{HASH}\big(\{\{at_k(G, p(\tilde{v})) \mid p \in \mathrm{perm}[k]\}\}\big)$,
where $\mathrm{perm}[k]$¹ denotes the set of all permutation mappings of $k$ elements. It can be shown that
$mwl_k^{(0)}(G, \tilde{v}) = mwl_k^{(0)}(G', \tilde{v}')$ if and only if $G[\tilde{v}]$ and $G'[\tilde{v}']$ are isomorphic.
At the $t$-th iteration, k-MULTISETWL updates the color of every k-multiset by
$$mwl_k^{(t+1)}(G, \tilde{v}) = \mathrm{HASH}\Big(mwl_k^{(t)}(G, \tilde{v}),\ \big\{\big\{mwl_k^{(t)}(G, \tilde{v}[x/o_G^{-1}(\tilde{v}, 1)]) \mid x \in V(G)\big\}\big\},\ \ldots,\ \big\{\big\{mwl_k^{(t)}(G, \tilde{v}[x/o_G^{-1}(\tilde{v}, k)]) \mid x \in V(G)\big\}\big\}\Big) \quad (3)$$
where $\tilde{v}[x/o_G^{-1}(\tilde{v}, i)]$ denotes replacing the $i$-th (ordered by $o_G$) element of the multiset with
$x \in V(G)$. Let $S_{k\text{-mwl}}(G)$ denote the super-graph defined by k-MULTISETWL. Similar to Eq. (2),
the graph-level encoding is $gmwl_k^{(t)}(G) = \mathrm{HASH}\big(\{\{mwl_k^{(t)}(G, \tilde{v}) \mid \tilde{v} \in V(S_{k\text{-mwl}}(G))\}\}\big)$.
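Eq. (3) generates, for each canonical position $i$, the multiset of colors obtained by substituting every node $x \in V(G)$. A literal (and deliberately inefficient) rendering of this neighbor generation, in our own code with hypothetical names:

```python
from collections import Counter

def multiset_neighbors(v_multiset, nodes):
    """Yield (i, x, new_multiset) for each canonical position i (1-indexed)
    and each node x, mirroring the substitution in Eq. (3)."""
    elems = sorted(v_multiset.elements())   # canonical ordering o_G
    for i in range(len(elems)):
        for x in nodes:
            yield i + 1, x, Counter(elems[:i] + [x] + elems[i + 1:])

v = Counter({1: 2, 4: 1})                   # the 3-multiset {{1, 1, 4}}
print(sum(1 for _ in multiset_neighbors(v, range(5))))  # 3 positions x 5 nodes = 15
```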
Interestingly, although k-MULTISETWL has a significantly smaller number of node groups than k-WL,
we show it is no less powerful than (k−1)-WL in terms of distinguishing graphs, while being upper
bounded by k-WL in distinguishing both node groups and graphs.
Theorem 1. Let $k \ge 1$ and define $wl_k^{(t)}(G, \tilde{v}) := \{\{wl_k^{(t)}(G, p(\tilde{v})) \mid p \in \mathrm{perm}[k]\}\}$. For all $t \in \mathbb{N}$ and
all graphs $G, G'$: k-MULTISETWL is upper bounded by k-WL in distinguishing multisets $(G, \tilde{v})$ and
$(G', \tilde{v}')$ at the $t$-th iteration, i.e., $wl_k^{(t)}(G, \tilde{v}) = wl_k^{(t)}(G', \tilde{v}') \Rightarrow mwl_k^{(t)}(G, \tilde{v}) = mwl_k^{(t)}(G', \tilde{v}')$.
Theorem 2. k-MULTISETWL is no less powerful than (k−1)-WL in distinguishing graphs: for any
$k \ge 3$ there exist graphs that can be distinguished by k-MULTISETWL but not by (k−1)-WL.
Theorem 2 is proved using a variant of a series of CFI [11] graphs which cannot be distinguished
by k-WL. This theorem shows that k-MULTISETWL is indeed very powerful, and that finding
counterexamples, i.e., k-WL-distinguishable graphs that cannot be distinguished by k-MULTISETWL, is very
hard. Hence we conjecture that k-MULTISETWL may have the same expressivity as k-WL in distinguishing
undirected graphs, with an attempted proof in Appendix A.4.10. Additionally, the
theorem also implies that k-MULTISETWL is strictly more powerful than (k−1)-MULTISETWL.
We next give a pebble game characterization of k-MULTISETWL, named the doubly bijective
k-pebble game, presented in Appendix A.3. The game is used in the proof of Theorem 2.
Theorem 3. k-MULTISETWL has the same expressivity as the doubly bijective k-pebble game.
3.3 From k-MULTISETWL to k(≤)-SETWL: Removing Repetition
Next, we propose to further remove repetition inside any k-multiset, i.e., transforming k-multisets
to k(≤)-sets. We assume elements of a k-multiset $\tilde{v}$ and a k(≤)-set $\hat{v}$ in $G$ are sorted based on the
ordering function $o_G$, and omit $o_G$ for clarity. Let $s(\cdot)$ transform a multiset to a set by removing
repeats, and let $r(\cdot)$ return a tuple with the number of repeats for each distinct element in a multiset.
Specifically, let $\hat{v} = s(\tilde{v})$ and $\hat{n} = r(\tilde{v})$; then $m := |\hat{v}| = |\hat{n}| \in [k]$ denotes the number of
distinct elements in the k-multiset $\tilde{v}$, and $\forall i \in [m]$, $\hat{v}_i$ is the $i$-th distinct element with $\hat{n}_i$ repetitions.
Clearly there is an injective mapping between $\tilde{v}$ and $(\hat{v}, \hat{n})$; let $f$ be the inverse mapping such that
$\tilde{v} = f(s(\tilde{v}), r(\tilde{v}))$. Equivalently, each $m$-set $\hat{v}$ can be mapped to a multiset of k-multisets: $\hat{v} \leftrightarrow
\{\{\tilde{v} = f(\hat{v}, \hat{n}) \mid \sum_{i=1}^{m} \hat{n}_i = k,\ \forall i\ \hat{n}_i \ge 1\}\}$. Based on this relationship, we extend the connection
among k-multisets to k(≤)-sets: given $m_1, m_2 \in [k]$, an $m_1$-set $\hat{v}$ is connected to an $m_2$-set $\hat{u}$ if and
only if $\exists \hat{n}_v, \hat{n}_u$ such that $f(\hat{v}, \hat{n}_v)$ is connected with $f(\hat{u}, \hat{n}_u)$ in k-MULTISETWL. Let $S_{k\text{-swl}}(G)$ denote
the super-graph defined on $G$ by k(≤)-SETWL. It can be shown that this criterion is equivalent to either
(1) $(|m_1 - m_2| = 1) \wedge (|\hat{v} \cap \hat{u}| = \min(m_1, m_2))$ or (2) $(m_1 = m_2) \wedge (|\hat{v} \cap \hat{u}| = m_1 - 1)$
being true; a small sketch checking this criterion follows below. Notice that $S_{k\text{-swl}}(G)$ contains a sequence of $k-1$ bipartite graphs, each reflecting
the connections among the $(m-1)$-sets and the $m$-sets. It also contains $k-1$ subgraphs, i.e., the
connections among the $m$-sets for $m = 2, \ldots, k$. Later on we will show that these $k-1$ subgraphs
can be ignored without affecting k(≤)-SETWL.
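For concreteness, here is a direct implementation of the equivalent criterion (our own sketch; the function name is hypothetical). Condition (1) yields the bipartite super-edges, and condition (2) the same-size connections that, as noted above, can be ignored.

```python
def super_edge(v_set, u_set):
    """Super-edge test in S_k-swl(G) between an m1-set and an m2-set,
    both given as frozensets of nodes."""
    m1, m2 = len(v_set), len(u_set)
    overlap = len(v_set & u_set)
    across = abs(m1 - m2) == 1 and overlap == min(m1, m2)  # (m-1)-set <-> m-set
    within = m1 == m2 and overlap == m1 - 1                # m-set <-> m-set
    return across or within

print(super_edge(frozenset({1, 2}), frozenset({1, 2, 3})))     # True: bipartite edge
print(super_edge(frozenset({1, 2, 3}), frozenset({1, 2, 4})))  # True: same-size edge
print(super_edge(frozenset({1, 2}), frozenset({3, 4, 5})))     # False
```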
¹This function should also consider repeated elements; we omit this for clarity of presentation.