
Graph neural networks. There exist many recent GNN variants; recent surveys [34, 38] group them into spectral models [5, 14], sampling-based models [9, 36, 40], attention-based models [2, 13, 31], and deep models with residual connections [3, 19]. Decoupled models [4, 15, 16] separate the two major functionalities of GNNs: the node-wise feature transformation and the propagation. GNNs are often fused with graphical inference [11, 37] to further improve the predicted results. These GNNs have shown great performance in many graph mining tasks, but suffer from limited robustness when applied to graphs with diverse characteristics and possibly noisy observations, especially in semi-supervised learning.
Linear graph neural networks. Wu et al. [33] proposed SGC by removing the nonlinear activation functions of GCN [14], reducing the propagator function to a simple matrix multiplication. Wang et al. [32] and Zhu and Koniusz [39] improved SGC by manually adjusting the strength of self-loops with hyperparameters and increasing the number of propagation steps. Li et al. [20] proposed G2CN, which improves the accuracy of DGC [32] on heterophily graphs by combining multiple propagation settings (i.e., bandwidths). The main limitation of these models is the high complexity of their propagator functions with many hyperparameters, which impairs both the robustness and interpretability of decisions even with linearity.
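To make the "propagation as a simple matrix multiplication" point concrete, here is a minimal sketch of an SGC-style propagator $\tilde{\mathbf{A}}_{\mathrm{sym}}^{K}\mathbf{X}$ using scipy; the helper and variable names (sgc_propagate, adj, feats, n_steps) are ours for illustration and not taken from any released implementation.

```python
import numpy as np
import scipy.sparse as sp

def sgc_propagate(adj: sp.spmatrix, feats: np.ndarray, n_steps: int = 2) -> np.ndarray:
    """Compute A~_sym^K X, an SGC-style propagator output with no learnable parameters."""
    n = adj.shape[0]
    adj_tilde = adj + sp.eye(n)                      # add self-loops: A~ = A + I
    deg = np.asarray(adj_tilde.sum(axis=1)).ravel()  # degrees of A~ (>= 1 due to self-loops)
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))        # D~^{-1/2}
    adj_sym = d_inv_sqrt @ adj_tilde @ d_inv_sqrt    # A~_sym = D~^{-1/2} A~ D~^{-1/2}
    out = feats
    for _ in range(n_steps):                         # K sparse-dense products
        out = adj_sym @ out
    return out
```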
Graph kernel methods. Traditional works on graph kernel methods [12, 28] are closely related to linear GNNs, which can be understood as applying a linear graph kernel to transform the raw features. A notable limitation of such kernel methods is that they cannot address the various scenarios of real-world graphs, such as heterophily graphs, since their motivation is to aggregate all information in the local neighborhood of each node rather than to ignore noisy or useless signals. We implement three popular kernel methods as additional baselines and show that our SlimG outperforms them on both synthetic and real graphs.
3 PROPOSED FRAMEWORK: GNNEXP
Why do GNNs work well when they do? In what cases will a GNN
fail? We answer these questions with GnnExp, our proposed frame-
work for revealing the essence of each GNN. The idea is to derive
the essential feature propagator function on which each variant is
based, ignoring nonlinearity, so that all models are comparable on
the same ground. The observations from GnnExp motivate us to
propose our method SlimG, which we describe in Section 4.
Definition 2 (Linearization). Given a graph $G = (\mathbf{A}, \mathbf{X})$, let $f(\cdot\,; \theta)$ be a node classifier function to predict the labels of all nodes in $G$ as $\hat{\mathbf{y}} = f(\mathbf{A}, \mathbf{X}; \theta)$, where $\theta$ is the set of parameters. Then, $f$ is linearized if $\theta = \{\mathbf{W}\}$ and the optimal weight matrix $\mathbf{W}^{*} \in \mathbb{R}^{h \times c}$ is given as
$$\mathbf{W}^{*} = \mathrm{LR}(\mathcal{P}(\mathbf{A}, \mathbf{X}), \mathbf{y}), \qquad (2)$$
where $\mathcal{P}$ is a feature propagator function that is linear with respect to $\mathbf{X}$ and contains no learnable parameters, and $\mathcal{P}(\mathbf{A}, \mathbf{X}) \in \mathbb{R}^{n \times h}$. We ignore the bias term without loss of generality.
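As a minimal illustration of Eq. (2), the following sketch fits the single weight matrix $\mathbf{W}^{*}$ by logistic regression on a precomputed, parameter-free propagator output $\mathcal{P}(\mathbf{A}, \mathbf{X})$; it reuses the illustrative sgc_propagate helper sketched above, and all names here are our assumptions rather than part of any official code.

```python
from sklearn.linear_model import LogisticRegression

def linearized_gnn(adj, feats, y_train, train_idx, n_steps=2):
    """Linearized node classifier in the sense of Definition 2: W* = LR(P(A, X), y)."""
    p_out = sgc_propagate(adj, feats, n_steps)  # P(A, X): fixed, contains no learnable parameters
    clf = LogisticRegression(max_iter=1000)     # learns only the weight matrix
    clf.fit(p_out[train_idx], y_train)          # fit on the labeled (training) nodes only
    return clf.predict(p_out)                   # predicted labels y_hat for all nodes
```

Note that sklearn also fits an intercept by default, which corresponds to the bias term that Definition 2 ignores without loss of generality.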
Definition 3 (GnnExp). Given a GNN $f$, GnnExp is to represent $f$ as a linearized GNN by replacing all (nonlinear) activation functions in $f$ with the identity function and deriving a variant $f'$ that is at least as expressive as $f$ but contains no parameters in $\mathcal{P}$.
Table 1: GnnExp encompasses popular GNNs. The * and ** superscripts mark fully and partially linearized models, respectively. We derive Pain Points (Sec. 3.1) and Distinguishing Factors (Sec. 3.2) of the variants through GnnExp.
Model | Type | Propagator function $\mathcal{P}(\mathbf{A}, \mathbf{X})$
LR | Linear | $\mathbf{X}$
SGC | Linear | $\tilde{\mathbf{A}}_{\mathrm{sym}}^{K} \mathbf{X}$
DGC | Linear | $[(1 - T/K)\mathbf{I} + (T/K)\tilde{\mathbf{A}}_{\mathrm{sym}}]^{K} \mathbf{X}$
S2GC | Linear | $\sum_{k=1}^{K} (\alpha \mathbf{I} + (1-\alpha)\tilde{\mathbf{A}}_{\mathrm{sym}}^{k}) \mathbf{X}$
G2CN | Linear | $\big\Vert_{i=1}^{N} [\mathbf{I} - (T_{i}/K)((b_{i}-1)\mathbf{I} + \mathbf{A}_{\mathrm{sym}})^{2}]^{K} \mathbf{X}$
PPNP* | Decoupled | $(\mathbf{I} - (1-\alpha)\tilde{\mathbf{A}}_{\mathrm{sym}})^{-1} \mathbf{X}$
APPNP* | Decoupled | $[\sum_{k=0}^{K-1} \alpha(1-\alpha)^{k} \tilde{\mathbf{A}}_{\mathrm{sym}}^{k} + (1-\alpha)^{K} \tilde{\mathbf{A}}_{\mathrm{sym}}^{K}] \mathbf{X}$
GDC* | Decoupled | $\tilde{\mathbf{S}}_{\mathrm{sym}} \mathbf{X}$ for $\mathbf{S} = \mathrm{sparse}_{\epsilon}(\sum_{k=0}^{\infty} (1-\alpha)^{k} \tilde{\mathbf{A}}_{\mathrm{sym}}^{k})$
GPR-GNN* | Decoupled | $\big\Vert_{k=0}^{K} \tilde{\mathbf{A}}_{\mathrm{sym}}^{k} \mathbf{X}$
ChebNet* | Coupled | $\big\Vert_{k=0}^{K-1} \mathbf{A}_{\mathrm{sym}}^{k} \mathbf{X}$
GCN* | Coupled | $\tilde{\mathbf{A}}_{\mathrm{sym}}^{K} \mathbf{X}$
SAGE* | Coupled | $\big\Vert_{k=0}^{K} \mathbf{A}_{\mathrm{row}}^{k} \mathbf{X}$
GCNII* | Coupled | $\big\Vert_{k=0}^{K-2} \tilde{\mathbf{A}}_{\mathrm{sym}}^{k} \mathbf{X} \,\Vert\, ((1-\alpha)\tilde{\mathbf{A}}_{\mathrm{sym}}^{K} + \alpha\tilde{\mathbf{A}}_{\mathrm{sym}}^{K-1}) \mathbf{X}$
H2GCN* | Coupled | $\big\Vert_{k=0}^{2K} \mathbf{A}_{\mathrm{sym}}^{k} \mathbf{X}$
GAT** | Attention | $\prod_{k=1}^{K} [\mathrm{diag}(\mathbf{X}\mathbf{w}_{k,1})\tilde{\mathbf{A}} + \tilde{\mathbf{A}}\,\mathrm{diag}(\mathbf{X}\mathbf{w}_{k,2})]\, \mathbf{X}$
DA-GNN** | Attention | $\sum_{k=0}^{K} \mathrm{diag}(\tilde{\mathbf{A}}_{\mathrm{sym}}^{k}\mathbf{X}\mathbf{w})\, \tilde{\mathbf{A}}_{\mathrm{sym}}^{k} \mathbf{X}$
GnnExp represents the characteristic of a GNN as a linear feature propagator function $\mathcal{P}$, which transforms the raw features $\mathbf{X}$ by utilizing the graph structure $\mathbf{A}$. Lemma 1 shows that GnnExp generalizes existing linear GNNs. Logistic regression is also represented by GnnExp with the identity propagator $\mathcal{P}(\mathbf{A}, \mathbf{X}) = \mathbf{X}$.
Lemma 1. GnnExp includes existing linear graph neural networks
as its special cases: SGC, DGC, S2GC, and G2CN.
Proof. The proof is given in Appendix A.1. ■
In Table 1, we conduct a comprehensive linearization of existing GNNs using GnnExp to understand the fundamental similarities and differences among GNN variants. The models are categorized into linear, decoupled, coupled, and attention models. We ignore bias terms for simplicity, without loss of generality. Refer to Appendix B for details of the linearization process for each model.
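To illustrate how the table puts different architectures on the same ground, here is a small sketch under our own naming assumptions: the GCN* row reduces to powering the self-loop-normalized adjacency, while the SAGE* row concatenates the hop-wise products of the row-normalized adjacency without self-loops.

```python
import numpy as np

def gcn_propagator(adj_sym_tilde, feats, K=2):
    """GCN* row of Table 1: A~_sym^K X (self-loops, symmetric normalization)."""
    out = feats
    for _ in range(K):
        out = adj_sym_tilde @ out
    return out

def sage_propagator(adj_row, feats, K=2):
    """SAGE* row of Table 1: [X || A_row X || ... || A_row^K X] (no self-loops, row normalization)."""
    blocks, out = [feats], feats
    for _ in range(K):
        out = adj_row @ out
        blocks.append(out)
    return np.hstack(blocks)   # concatenation keeps each hop in separate columns
```

The concatenation ($\Vert$) in the SAGE*, GPR-GNN*, and H2GCN* rows lets the final linear classifier weight each hop separately, whereas the pure power form of SGC and GCN* mixes all hops into a single representation.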
3.1 Pain Points of Existing GNNs
Based on the comprehensive linearization in Table 1, we derive four
pain points of existing GNNs which we address in Section 4.
Pain Point 1 (Lack of robustness). All models in Table 1 fail to handle multiple graph scenarios at the same time, i.e., graphs with homophily, heterophily, no network effects, or useless features.
Most models in Table 1 make an implicit assumption about the given graphs, such as homophily or heterophily, rather than being able to perform well in multiple scenarios at the same time. For example, all models except ChebNet, SAGE, and H2GCN have self-loops in the new adjacency matrix, emphasizing the local neighborhood of each node even in graphs with heterophily or no network effects. This is the pain point that we also observe empirically in the sanity checks (in Table 2), where none of the existing models achieves reasonable accuracy in all cases of synthetic graphs.
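As a concrete view of the self-loop issue, here is a minimal sketch (with our own helper name) of the two symmetric normalizations appearing in Table 1: $\tilde{\mathbf{A}}_{\mathrm{sym}}$ built from $\mathbf{A} + \mathbf{I}$, as in GCN-style models, versus $\mathbf{A}_{\mathrm{sym}}$ built from $\mathbf{A}$ alone, as in ChebNet, SAGE, and H2GCN.

```python
import numpy as np
import scipy.sparse as sp

def sym_normalize(adj: sp.spmatrix, self_loops: bool) -> sp.spmatrix:
    """Return D^{-1/2} A D^{-1/2}; with self_loops=True, A is first replaced by A + I."""
    if self_loops:
        adj = adj + sp.eye(adj.shape[0])        # every node also aggregates its own features
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1)))  # guard against isolated nodes
    return d_inv_sqrt @ adj @ d_inv_sqrt

# With self_loops=True, every propagation step mixes a node's own features back in,
# which implicitly favors homophily; with self_loops=False, each step uses neighbors only.
```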
Pain Point 2 (Vulnerability to noisy features). All models in Table 1 cannot fully exploit the graph structure if the features are noisy, since they depend on the node feature matrix $\mathbf{X}$.