Less is More: SlimG for Accurate, Robust, and Interpretable Graph Mining

Jaemin Yoo* (Carnegie Mellon University, Pittsburgh, USA), jaeminyoo@cmu.edu
Meng-Chieh Lee* (Carnegie Mellon University, Pittsburgh, USA), mengchil@cs.cmu.edu
Shubhranshu Shekhar (Carnegie Mellon University, Pittsburgh, USA), shubhras@andrew.cmu.edu
Christos Faloutsos (Carnegie Mellon University, Pittsburgh, USA), christos@cs.cmu.edu

*Both authors contributed equally to this research.
ABSTRACT

How can we solve semi-supervised node classification in various graphs, possibly with noisy features and structures? Graph neural networks (GNNs) have succeeded in many graph mining tasks, but their generalizability to various graph scenarios is limited due to the difficulty of training, hyperparameter tuning, and the selection of a model itself. Einstein said that we should "make everything as simple as possible, but not simpler." We rephrase it into the careful simplicity principle: a carefully-designed simple model can surpass sophisticated ones in real-world graphs. Based on this principle, we propose SlimG for semi-supervised node classification, which exhibits four desirable properties: It is (a) accurate, winning or tying on 10 out of 13 real-world datasets; (b) robust, being the only one that handles all scenarios of graph data (homophily, heterophily, random structure, noisy features, etc.); (c) fast and scalable, showing up to 18× faster training on million-scale graphs; and (d) interpretable, thanks to its linearity and sparsity. We explain the success of SlimG through a systematic study of the designs of existing GNNs, sanity checks, and comprehensive ablation studies.
CCS CONCEPTS
• Computing methodologies → Machine learning algorithms.

KEYWORDS
graph neural networks, semi-supervised node classification
ACM Reference Format:
Jaemin Yoo, Meng-Chieh Lee, Shubhranshu Shekhar, and Christos Faloutsos. 2023. Less is More: SlimG for Accurate, Robust, and Interpretable Graph Mining. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '23), August 6–10, 2023, Long Beach, CA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3580305.3599413
1 INTRODUCTION
[Figure 1: SlimG wins on all sanity checks. Each row is a specific scenario of graph data that we propose for comprehensive evaluation in Section 5, pairing a graph type (homophily, heterophily, or noisy structure) with a feature type (semantic, noisy, or structural); each column is a model (SGC, S2GC, SAGE, GCNII, GPR, SlimG). The table is generated from the actual accuracy in Table 2: a check mark means accuracy ≥ 80%.]
How can we solve semi-supervised node classification in various types of graphs, possibly with noisy features and structures? Graph neural networks (GNNs) [7, 9, 14, 31] have succeeded in various graph mining tasks such as node classification, clustering, and link prediction. However, due to the difficulty of training, hyperparameter tuning, and even the selection of a model itself, many GNNs fail to show their best performance when applied to a large testbed that contains real-world graphs with various characteristics. Especially when a graph contains noisy observations in its features and/or its structure, which is common in real-world data, existing models easily overfit to such noise.
In response to this question, we propose SlimG, our novel classifier for graphs, based on the careful simplicity principle: a simple, carefully-designed model can be more accurate than complex ones thanks to better generalizability, robustness, and easier training. The four design decisions of SlimG (D1-4 in Sec. 4) are carefully made to follow this principle by observing and addressing the pain points of existing GNNs: we generate and combine various types of graph-based features (D1), design structure-only features (D2), remove redundancy in feature transformation (D3), and make the propagator function contain no hyperparameters (D4).
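To make the principle concrete, here is a minimal sketch of the recipe (our illustration under assumed names, not SlimG's exact pipeline; the precise feature groups and regularization scheme are specified in Section 4): build a few hyperparameter-free, linearly propagated feature groups and let a sparse linear classifier choose among them.

```python
# A minimal sketch of the careful-simplicity recipe; `candidate_features`
# and `fit_sparse_linear` are illustrative names, not SlimG's API.
import numpy as np
from sklearn.linear_model import LogisticRegression

def candidate_features(A_sym, X):
    """Raw, 1-hop, and 2-hop propagated features as separate groups."""
    H1 = A_sym @ X          # one linear propagation step, no parameters
    H2 = A_sym @ H1         # two steps
    return np.hstack([X, H1, H2])

def fit_sparse_linear(A_sym, X, y, train_idx):
    F = candidate_features(A_sym, X)
    # L1 regularization zeroes out weights on uninformative feature groups,
    # which is also what makes the final linear model interpretable.
    clf = LogisticRegression(penalty="l1", solver="saga", max_iter=2000)
    clf.fit(F[train_idx], y[train_idx])
    return clf, F
```

Sparsity is one simple way to let the classifier ignore useless feature groups; which groups survive is then directly readable from the learned weights.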
The resulting model, SlimG, is our main contribution (C1), which exhibits the following desirable properties:
C1.1 - Accurate on both real-world and synthetic datasets, almost always winning or tying for first place (see Figure 2, Table 2, and Table 3).
C1.2 - Robust, being able to handle numerous real settings such as homophily, heterophily, no network effects, and useless features (see Figure 1 and Table 2).
C1.3 - Fast and scalable, using carefully chosen features; it takes only 32 seconds on a million-scale real-world graph (ogbn-Products) on a stock server (see Figure 2).
C1.4 - Interpretable, learning the largest weights on informative features and ignoring noisy ones, based on the linear decision function (see Figure 5).
[Figure 2: SlimG wins both on accuracy and training time on (left) ogbn-arXiv (10.4× faster training, 3.5% higher accuracy), (middle) ogbn-Products (2.5× faster, 10.3% higher), and (right) Pokec (18.0× faster, 4.1% higher), which are large real-world graphs (1.2M, 61.9M, and 30.6M edges, resp.). Several baselines run out of memory (crossed out).]
[Figure 3: Why 'less is more': the simple model $f_0(x) = c$ (in blue) matches the reality (in solid purple), while richer models with more polynomial powers end up capturing noise in the given data: the tiny downward trend by $f_1(x)$ (in red) and the spurious curvature by $f_2(x)$ (in green).]
Not only do we propose a carefully designed, effective method (in Sec. 4), but we also explain the reasons for its success, thanks to our three additional contributions (C2-4):
C2 - Explanation (Sec. 3): We propose GnnExp, a framework for the systematic linearization of a GNN. As shown in Table 1, GnnExp highlights the similarities, differences, and weaknesses of successful GNNs.
C3 - Sanity checks (Sec. 5): We propose seven possible scenarios of graphs (homophily, heterophily, no network effects, etc.), which reveal the strong and weak points of each GNN; see Figure 1, with more details in Table 2.
C4 - Experiments (Sec. 6): We conduct extensive experiments to better understand the success of SlimG despite its simplicity. Our results in Tables 5 to 9 show that SlimG effectively selects the most informative component in each dataset, fully exploiting its robustness and generality.
Less is more. Our justification for the counter-intuitive success of simplicity is illustrated in Figure 3: a set of points is uniformly distributed in $x \in (-1, 1)$ and $y \in (0, 1)$, and the fitting polynomials $f_i(x)$ with degree $i = 0$, 1, and 2 are given. Notice that the simplest model $f_0$ (blue line) matches the true generator $f(x) = 0.5$. Richer models use the 1st- and 2nd-degree powers (‘many cooks spoil the broth’) and end up modeling tiny artifacts, like the small downward slope of $f_1$ (red line) and the curvature of $f_2$ (green line). This ‘many cooks’ issue is more subtle and counter-intuitive than overfitting, as $f_1$ and $f_2$ have only 2 to 3 unknown parameters to fit to the hundred data points: even a small statistical model can fail if it is not matched with the underlying data-generating mechanism.
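The effect is easy to reproduce; below is a small sketch of the Figure 3 setup (assuming 100 points and the constant generator described above).

```python
import numpy as np

# Assumed setup from Figure 3: x ~ Uniform(-1, 1), y ~ Uniform(0, 1),
# so the true generator is the constant f(x) = 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = rng.uniform(0.0, 1.0, size=100)    # independent of x

for degree in (0, 1, 2):
    coeffs = np.polyfit(x, y, degree)  # least-squares polynomial fit
    print(f"f_{degree}: {np.round(coeffs, 3)}")
# f_0 stays near the true constant 0.5, while f_1 and f_2 pick up a
# spurious slope and curvature from the noise.
```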
Reproducibility. Our code, along with our datasets for sanity
checks, is available at https://github.com/mengchillee/SlimG.
2 BACKGROUND AND RELATED WORKS

2.1 Background

We define semi-supervised node classification as follows:

Given: an undirected graph $G = (\mathbf{A}, \mathbf{X})$, where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is an adjacency matrix, $\mathbf{X} \in \mathbb{R}^{n \times d}$ is a node feature matrix, $n$ is the number of nodes, and $d$ is the number of features.
Given: the labels $\mathbf{y} \in \{1, \dots, c\}^{m}$ of $m$ nodes in $G$, where $m \ll n$ and $c$ is the number of classes.
Predict: the unknown classes of the $n - m$ test nodes.
We use the following symbols to represent adjacency matrices with various normalizations and/or self-loops. $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ is the adjacency matrix with self-loops. $\tilde{\mathbf{D}} = \mathrm{diag}(\tilde{\mathbf{A}} \mathbf{1}_{n \times 1})$ is the diagonal degree matrix of $\tilde{\mathbf{A}}$, where $\mathbf{1}_{n \times 1}$ is the matrix of size $n \times 1$ filled with ones. $\tilde{\mathbf{A}}_{\mathrm{sym}} = \tilde{\mathbf{D}}^{-1/2} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-1/2}$ is the symmetrically normalized $\tilde{\mathbf{A}}$. Similarly, $\mathbf{A}_{\mathrm{sym}} = \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}$ is the symmetrically normalized $\mathbf{A}$, but without self-loops. We also use a different type of normalization, $\mathbf{A}_{\mathrm{row}} = \mathbf{D}^{-1} \mathbf{A}$ (and accordingly $\tilde{\mathbf{A}}_{\mathrm{row}}$), which we call row normalization, based on the position of the matrix $\mathbf{D}$.
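As a concrete reference, here is one way these normalizations could be computed with scipy.sparse (a minimal sketch, not the authors' implementation; the helper name `normalize` is ours).

```python
# A sketch of the normalizations above; `self_loops=True` gives the
# tilde versions. Expects a scipy.sparse adjacency matrix A.
import numpy as np
import scipy.sparse as sp

def normalize(A, self_loops=True, mode="sym"):
    """Symmetric (D^{-1/2} A D^{-1/2}) or row (D^{-1} A) normalization."""
    M = (A + sp.eye(A.shape[0])).tocsr() if self_loops else sp.csr_matrix(A)
    deg = np.asarray(M.sum(axis=1)).ravel()
    with np.errstate(divide="ignore"):
        inv = deg ** (-0.5 if mode == "sym" else -1.0)
    inv[np.isinf(inv)] = 0.0             # guard isolated (degree-0) nodes
    D_inv = sp.diags(inv)
    return D_inv @ M @ D_inv if mode == "sym" else D_inv @ M
```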
As background, we define logistic regression (LR) as a function that finds the weight matrix $\mathbf{W}$ that best maps given features to labels with a linear function.
Denition 1 (LR).Given a feature X
R𝑛×𝑑
and a label y
R𝑚
,
where
𝑚
is the number of observations such that
𝑚𝑛
, let Y
R𝑚×𝑐
be the one-hot representation of y, and
𝑦𝑖 𝑗
be the
(𝑖, 𝑗)
-th element in
Y. Then, logistic regression (LR) is a function that nds an optimal
weight matrix WR𝑑×𝑐from Xand yas follows:
LR(X,y)=arg max
W
𝑚
𝑖=1
𝑐
𝑗=1
𝑦𝑖 𝑗 log somax𝑗(Wx𝑖),(1)
where
somax𝑗(·)
represents selecting the
𝑗
-th element of the result
of the softmax function. We omit the bias term for brevity.
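In practice, Eq. (1) can be optimized with any off-the-shelf solver. Here is a hedged sketch using scikit-learn's cross-entropy solver on synthetic data, with shapes matching the definition above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, c, m = 1000, 16, 3, 100          # m << n observed labels
X = rng.normal(size=(n, d))            # feature matrix
y = rng.integers(0, c, size=n)         # class labels in {0, ..., c-1}
observed = rng.choice(n, size=m, replace=False)

clf = LogisticRegression(max_iter=1000)   # maximizes Eq. (1) (plus a bias)
clf.fit(X[observed], y[observed])         # learns W in R^{d x c}
y_hat = clf.predict(X)                    # predictions for all n nodes
```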
2.2 Related Works
We introduce related works categorized into graph neural networks (GNNs), linear GNNs, and graph kernel methods.
Graph neural networks. There exist many recent GNN variants; recent surveys [34, 38] group them into spectral models [5, 14], sampling-based models [9, 36, 40], attention-based models [2, 13, 31], and deep models with residual connections [3, 19]. Decoupled models [4, 15, 16] separate the two major functionalities of GNNs: the node-wise feature transformation and the propagation. GNNs are often fused with graphical inference [11, 37] to further improve the predicted results. These GNNs have shown great performance in many graph mining tasks, but suffer from limited robustness when applied to graphs with various characteristics possibly having noisy observations, especially in semi-supervised learning.
Linear graph neural networks. Wu et al. [33] proposed SGC by removing the nonlinear activation functions of GCN [14], reducing the propagator function to a simple matrix multiplication. Wang et al. [32] and Zhu and Koniusz [39] improved SGC by manually adjusting the strength of self-loops with hyperparameters, increasing the number of propagation steps. Li et al. [20] proposed G2CN, which improves the accuracy of DGC [32] on heterophily graphs by combining multiple propagation settings (i.e., bandwidths). The main limitation of these models is the high complexity of propagator functions with many hyperparameters, which impairs both the robustness and interpretability of decisions even with linearity.
Graph kernel methods. Traditional works on graph kernel methods [12, 28] are closely related to linear GNNs, which can be understood as applying a linear graph kernel to transform the raw features. A notable limitation of such kernel methods is that they cannot address various scenarios of real-world graphs, such as heterophily graphs, since their motivation is to aggregate all information in the local neighborhood of each node rather than to ignore noisy and useless information. We implement three popular kernel methods as additional baselines and show that SlimG outperforms them on both synthetic and real graphs.
3 PROPOSED FRAMEWORK: GNNEXP

Why do GNNs work well when they do? In what cases will a GNN fail? We answer these questions with GnnExp, our proposed framework for revealing the essence of each GNN. The idea is to derive the essential feature propagator function on which each variant is based, ignoring nonlinearity, so that all models are comparable on the same ground. The observations from GnnExp motivate us to propose our method SlimG, which we describe in Section 4.
Denition 2 (Linearization).Given a graph
𝐺=(
A
,
X
)
, let
𝑓(·
;
𝜃)
be a node classier function to predict the labels of all nodes in
𝐺
as
ˆ
y=𝑓(
A
,
X;
𝜃)
, where
𝜃
is the set of parameters. Then,
𝑓
is linearized
if 𝜃={W}and the optimal weight matrix WR×𝑐is given as
W=LR(P(A,X),y),(2)
where
P
is a feature propagator function that is linear with Xand
contains no learnable parameters, and
P(
A
,
X
) R𝑛×
. We ignore
the bias term without loss of generality.
Denition 3 (GnnExp).Given a GNN
𝑓
,GnnExp is to represent
𝑓
as a linearized GNN by replacing all (nonlinear) activation functions
in
𝑓
with the identity function and deriving a variant
𝑓
that is at
least as expressive as 𝑓but contains no parameters in P.
GnnExp represents the characteristic of a GNN as a linear feature propagator function $\mathcal{P}$, which transforms the raw features $\mathbf{X}$ by utilizing the graph structure $\mathbf{A}$.
Table 1: GnnExp encompasses popular GNNs. The * and ** superscripts mark fully and partially linearized models, respectively. We derive Pain Points (Sec. 3.1) and Distinguishing Factors (Sec. 3.2) of the variants through GnnExp.

| Model | Type | Propagator function $\mathcal{P}(\mathbf{A}, \mathbf{X})$ |
|---|---|---|
| LR | Linear | $\mathbf{X}$ |
| SGC | Linear | $\tilde{\mathbf{A}}_{\mathrm{sym}}^{K} \mathbf{X}$ |
| DGC | Linear | $[(1 - T/K)\mathbf{I} + (T/K)\tilde{\mathbf{A}}_{\mathrm{sym}}]^{K} \mathbf{X}$ |
| S2GC | Linear | $\sum_{k=1}^{K} (\alpha \mathbf{I} + (1-\alpha)\tilde{\mathbf{A}}_{\mathrm{sym}}^{k}) \mathbf{X}$ |
| G2CN | Linear | $\Vert_{i=1}^{N} \, [\mathbf{I} - (T_i/K)((b_i - 1)\mathbf{I} + \mathbf{A}_{\mathrm{sym}})^{2}]^{K} \mathbf{X}$ |
| PPNP* | Decoupled | $(\mathbf{I} - (1-\alpha)\tilde{\mathbf{A}}_{\mathrm{sym}})^{-1} \mathbf{X}$ |
| APPNP* | Decoupled | $[\sum_{k=0}^{K-1} \alpha(1-\alpha)^{k} \tilde{\mathbf{A}}_{\mathrm{sym}}^{k} + (1-\alpha)^{K} \tilde{\mathbf{A}}_{\mathrm{sym}}^{K}] \mathbf{X}$ |
| GDC* | Decoupled | $\tilde{\mathbf{S}}_{\mathrm{sym}} \mathbf{X}$ for $\mathbf{S} = \mathrm{sparse}_{\epsilon}(\sum_{k=0}^{\infty} \alpha(1-\alpha)^{k} \tilde{\mathbf{A}}_{\mathrm{sym}}^{k})$ |
| GPR-GNN* | Decoupled | $\Vert_{k=0}^{K} \, \tilde{\mathbf{A}}_{\mathrm{sym}}^{k} \mathbf{X}$ |
| ChebNet* | Coupled | $\Vert_{k=0}^{K-1} \, \mathbf{A}_{\mathrm{sym}}^{k} \mathbf{X}$ |
| GCN* | Coupled | $\tilde{\mathbf{A}}_{\mathrm{sym}}^{K} \mathbf{X}$ |
| SAGE* | Coupled | $\Vert_{k=0}^{K} \, \mathbf{A}_{\mathrm{row}}^{k} \mathbf{X}$ |
| GCNII* | Coupled | $\Vert_{k=0}^{K-2} \, \tilde{\mathbf{A}}_{\mathrm{sym}}^{k}\mathbf{X} \,\Vert\, ((1-\alpha)\tilde{\mathbf{A}}_{\mathrm{sym}}^{K} + \alpha\tilde{\mathbf{A}}_{\mathrm{sym}}^{K-1})\mathbf{X}$ |
| H2GCN* | Coupled | $\Vert_{k=0}^{2K} \, \mathbf{A}_{\mathrm{sym}}^{k}\mathbf{X}$ |
| GAT** | Attention | $\prod_{k=1}^{K} [\mathrm{diag}(\mathbf{X}\mathbf{w}_{k,1})\tilde{\mathbf{A}} + \tilde{\mathbf{A}}\,\mathrm{diag}(\mathbf{X}\mathbf{w}_{k,2})]\, \mathbf{X}$ |
| DA-GNN** | Attention | $\sum_{k=0}^{K} \mathrm{diag}(\tilde{\mathbf{A}}_{\mathrm{sym}}^{k}\mathbf{X}\mathbf{w})\, \tilde{\mathbf{A}}_{\mathrm{sym}}^{k} \mathbf{X}$ |
Lemma 1 shows that GnnExp generalizes existing linear GNNs. Logistic regression is also represented by GnnExp with the identity propagator $\mathcal{P}(\mathbf{A}, \mathbf{X}) = \mathbf{X}$.
Lemma 1. GnnExp includes existing linear graph neural networks
as its special cases: SGC, DGC, S2GC, and G2CN.
Proof. The proof is given in Appendix A.1.
In Table 1, we conduct a comprehensive linearization of existing GNNs using GnnExp to understand the fundamental similarities and differences among GNN variants. The models are categorized into linear, decoupled, coupled, and attention models. We ignore bias terms for simplicity, without loss of generality. Refer to Appendix B for details of the linearization process for each model.
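For instance, the SGC row of Table 1 corresponds to the following small sketch (our code, reusing the `normalize` helper sketched in Section 2.1): the propagator $\mathcal{P}(\mathbf{A}, \mathbf{X}) = \tilde{\mathbf{A}}_{\mathrm{sym}}^{K} \mathbf{X}$ contains no learnable parameters, and all learning happens in the logistic regression of Eq. (2).

```python
from sklearn.linear_model import LogisticRegression

def linearized_sgc(A, X, y, train_idx, K=2):
    """GnnExp view of SGC: P(A, X) = (A_sym with self-loops)^K X, then LR."""
    A_sym = normalize(A, self_loops=True, mode="sym")   # \tilde{A}_sym
    H = X
    for _ in range(K):
        H = A_sym @ H                  # K parameter-free propagation steps
    clf = LogisticRegression(max_iter=1000).fit(H[train_idx], y[train_idx])
    return clf.predict(H)              # \hat{y} for all nodes
```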
3.1 Pain Points of Existing GNNs
Based on the comprehensive linearization in Table 1, we derive four pain points of existing GNNs, which we address in Section 4.
Pain Point 1 (Lack of robustness). All models in Table 1 fail to handle multiple graph scenarios at the same time, i.e., graphs with homophily, heterophily, no network effects, or useless features.
Most models in Table 1 make an implicit assumption on given graphs, such as homophily or heterophily, rather than being able to perform well in multiple scenarios at the same time. For example, all models except ChebNet, SAGE, and H2GCN have self-loops in the new adjacency matrix, emphasizing the local neighborhood of each node even in graphs with heterophily or no network effects. This is the pain point that we also observe empirically from the sanity checks (in Table 2), where none of the existing models achieves reasonable accuracy in all cases of synthetic graphs.
Pain Point 2 (Vulnerability to noisy features). None of the models in Table 1 can fully exploit the graph structure if the features are noisy, since they all depend on the node feature matrix $\mathbf{X}$.