
Graph neural networks. There exist many recent GNN variants; recent surveys [34, 38] group them into spectral models [5, 14], sampling-based models [9, 36, 40], attention-based models [2, 13, 31], and deep models with residual connections [3, 19]. Decoupled models [4, 15, 16] separate the two major functionalities of GNNs: the node-wise feature transformation and the propagation. GNNs are often fused with graphical inference [11, 37] to further improve the predicted results. These GNNs have shown great performance in many graph mining tasks, but suffer from limited robustness when applied to graphs with diverse characteristics and possibly noisy observations, especially in semi-supervised learning.
Linear graph neural networks. Wu et al. [33] proposed SGC by removing the nonlinear activation functions of GCN [14], reducing the propagator function to a simple matrix multiplication. Wang et al. [32] and Zhu and Koniusz [39] improved SGC by manually adjusting the strength of self-loops with hyperparameters and increasing the number of propagation steps. Li et al. [20] proposed G2CN, which improves the accuracy of DGC [32] on heterophily graphs by combining multiple propagation settings (i.e., bandwidths). The main limitation of these models is the high complexity of their propagator functions with many hyperparameters, which impairs both the robustness and interpretability of decisions even with linearity.
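To make the "propagation as a simple matrix multiplication" point concrete, here is a minimal sketch of an SGC-style propagator $\tilde{\mathbf{A}}_{\mathrm{sym}}^{K}\mathbf{X}$ using scipy; the helper and variable names (sgc_propagate, adj, feats, n_steps) are ours for illustration and not taken from any released implementation.

```python
import numpy as np
import scipy.sparse as sp

def sgc_propagate(adj: sp.spmatrix, feats: np.ndarray, n_steps: int = 2) -> np.ndarray:
    """Compute A~_sym^K X, an SGC-style propagator output with no learnable parameters."""
    n = adj.shape[0]
    adj_tilde = adj + sp.eye(n)                      # add self-loops: A~ = A + I
    deg = np.asarray(adj_tilde.sum(axis=1)).ravel()  # degrees of A~ (>= 1 due to self-loops)
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))        # D~^{-1/2}
    adj_sym = d_inv_sqrt @ adj_tilde @ d_inv_sqrt    # A~_sym = D~^{-1/2} A~ D~^{-1/2}
    out = feats
    for _ in range(n_steps):                         # K sparse-dense products
        out = adj_sym @ out
    return out
```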
Graph kernel methods. Traditional works on graph kernel methods [12, 28] are closely related to linear GNNs, which can be understood as applying a linear graph kernel to transform the raw features. A notable limitation of such kernel methods is that they cannot address the various scenarios of real-world graphs, such as heterophily graphs, since their motivation is to aggregate all information in the local neighborhood of each node rather than to ignore noisy or useless signals. We implement three popular kernel methods as additional baselines and show that our SlimG outperforms them on both synthetic and real graphs.
3 PROPOSED FRAMEWORK: GNNEXP
Why do GNNs work well when they do? In what cases will a GNN
fail? We answer these questions with GnnExp, our proposed frame-
work for revealing the essence of each GNN. The idea is to derive
the essential feature propagator function on which each variant is
based, ignoring nonlinearity, so that all models are comparable on
the same ground. The observations from GnnExp motivate us to
propose our method SlimG, which we describe in Section 4.
Definition 2 (Linearization). Given a graph $G = (\mathbf{A}, \mathbf{X})$, let $f(\cdot\,; \theta)$ be a node classifier function to predict the labels of all nodes in $G$ as $\hat{\mathbf{y}} = f(\mathbf{A}, \mathbf{X}; \theta)$, where $\theta$ is the set of parameters. Then, $f$ is linearized if $\theta = \{\mathbf{W}\}$ and the optimal weight matrix $\mathbf{W}^{*} \in \mathbb{R}^{h \times c}$ is given as
$$\mathbf{W}^{*} = \mathrm{LR}(\mathcal{P}(\mathbf{A}, \mathbf{X}), \mathbf{y}), \qquad (2)$$
where $\mathcal{P}$ is a feature propagator function that is linear with respect to $\mathbf{X}$ and contains no learnable parameters, and $\mathcal{P}(\mathbf{A}, \mathbf{X}) \in \mathbb{R}^{n \times h}$. We ignore the bias term without loss of generality.
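As a minimal illustration of Eq. (2), the following sketch fits the single weight matrix $\mathbf{W}^{*}$ by logistic regression on a precomputed, parameter-free propagator output $\mathcal{P}(\mathbf{A}, \mathbf{X})$; it reuses the illustrative sgc_propagate helper sketched above, and all names here are our assumptions rather than part of any official code.

```python
from sklearn.linear_model import LogisticRegression

def linearized_gnn(adj, feats, y_train, train_idx, n_steps=2):
    """Linearized node classifier in the sense of Definition 2: W* = LR(P(A, X), y)."""
    p_out = sgc_propagate(adj, feats, n_steps)  # P(A, X): fixed, contains no learnable parameters
    clf = LogisticRegression(max_iter=1000)     # learns only the weight matrix
    clf.fit(p_out[train_idx], y_train)          # fit on the labeled (training) nodes only
    return clf.predict(p_out)                   # predicted labels y_hat for all nodes
```

Note that sklearn also fits an intercept by default, which corresponds to the bias term that Definition 2 ignores without loss of generality.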
Definition 3 (GnnExp). Given a GNN $f$, GnnExp is to represent $f$ as a linearized GNN by replacing all (nonlinear) activation functions in $f$ with the identity function and deriving a variant $f'$ that is at least as expressive as $f$ but contains no parameters in $\mathcal{P}$.
Table 1: GnnExp encompasses popular GNNs. The * and ** superscripts mark fully and partially linearized models, respectively. We derive Pain Points (Sec. 3.1) and Distinguishing Factors (Sec. 3.2) of the variants through GnnExp.
Model | Type | Propagator function $\mathcal{P}(\mathbf{A}, \mathbf{X})$
LR | Linear | $\mathbf{X}$
SGC | Linear | $\tilde{\mathbf{A}}_{\mathrm{sym}}^{K} \mathbf{X}$
DGC | Linear | $[(1 - T/K)\mathbf{I} + (T/K)\tilde{\mathbf{A}}_{\mathrm{sym}}]^{K} \mathbf{X}$
S2GC | Linear | $\sum_{k=1}^{K} (\alpha \mathbf{I} + (1-\alpha)\tilde{\mathbf{A}}_{\mathrm{sym}}^{k}) \mathbf{X}$
G2CN | Linear | $\big\Vert_{i=1}^{N} [\mathbf{I} - (T_{i}/K)((b_{i}-1)\mathbf{I} + \mathbf{A}_{\mathrm{sym}})^{2}]^{K} \mathbf{X}$
PPNP* | Decoupled | $(\mathbf{I} - (1-\alpha)\tilde{\mathbf{A}}_{\mathrm{sym}})^{-1} \mathbf{X}$
APPNP* | Decoupled | $[\sum_{k=0}^{K-1} \alpha(1-\alpha)^{k} \tilde{\mathbf{A}}_{\mathrm{sym}}^{k} + (1-\alpha)^{K} \tilde{\mathbf{A}}_{\mathrm{sym}}^{K}] \mathbf{X}$
GDC* | Decoupled | $\tilde{\mathbf{S}}_{\mathrm{sym}} \mathbf{X}$ for $\mathbf{S} = \mathrm{sparse}_{\epsilon}(\sum_{k=0}^{\infty} (1-\alpha)^{k} \tilde{\mathbf{A}}_{\mathrm{sym}}^{k})$
GPR-GNN* | Decoupled | $\big\Vert_{k=0}^{K} \tilde{\mathbf{A}}_{\mathrm{sym}}^{k} \mathbf{X}$
ChebNet* | Coupled | $\big\Vert_{k=0}^{K-1} \mathbf{A}_{\mathrm{sym}}^{k} \mathbf{X}$
GCN* | Coupled | $\tilde{\mathbf{A}}_{\mathrm{sym}}^{K} \mathbf{X}$
SAGE* | Coupled | $\big\Vert_{k=0}^{K} \mathbf{A}_{\mathrm{row}}^{k} \mathbf{X}$
GCNII* | Coupled | $\big\Vert_{k=0}^{K-2} \tilde{\mathbf{A}}_{\mathrm{sym}}^{k} \mathbf{X} \,\Vert\, ((1-\alpha)\tilde{\mathbf{A}}_{\mathrm{sym}}^{K} + \alpha\tilde{\mathbf{A}}_{\mathrm{sym}}^{K-1}) \mathbf{X}$
H2GCN* | Coupled | $\big\Vert_{k=0}^{2K} \mathbf{A}_{\mathrm{sym}}^{k} \mathbf{X}$
GAT** | Attention | $\prod_{k=1}^{K} [\mathrm{diag}(\mathbf{X}\mathbf{w}_{k,1})\tilde{\mathbf{A}} + \tilde{\mathbf{A}}\,\mathrm{diag}(\mathbf{X}\mathbf{w}_{k,2})]\, \mathbf{X}$
DA-GNN** | Attention | $\sum_{k=0}^{K} \mathrm{diag}(\tilde{\mathbf{A}}_{\mathrm{sym}}^{k}\mathbf{X}\mathbf{w})\, \tilde{\mathbf{A}}_{\mathrm{sym}}^{k} \mathbf{X}$
GnnExp represents the characteristic of a GNN as a linear feature propagator function $\mathcal{P}$, which transforms the raw features $\mathbf{X}$ by utilizing the graph structure $\mathbf{A}$. Lemma 1 shows that GnnExp generalizes existing linear GNNs. Logistic regression is also represented by GnnExp with the identity propagator $\mathcal{P}(\mathbf{A}, \mathbf{X}) = \mathbf{X}$.
Lemma 1. GnnExp includes existing linear graph neural networks
as its special cases: SGC, DGC, S2GC, and G2CN.
Proof. The proof is given in Appendix A.1. ■
In Table 1, we conduct a comprehensive linearization of existing GNNs using GnnExp to understand the fundamental similarities and differences among GNN variants. The models are categorized into linear, decoupled, coupled, and attention models. We ignore bias terms for simplicity, without loss of generality. Refer to Appendix B for details of the linearization process for each model.
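To illustrate how the table puts different architectures on the same ground, here is a small sketch under our own naming assumptions: the GCN* row reduces to powering the self-loop-normalized adjacency, while the SAGE* row concatenates the hop-wise products of the row-normalized adjacency without self-loops.

```python
import numpy as np

def gcn_propagator(adj_sym_tilde, feats, K=2):
    """GCN* row of Table 1: A~_sym^K X (self-loops, symmetric normalization)."""
    out = feats
    for _ in range(K):
        out = adj_sym_tilde @ out
    return out

def sage_propagator(adj_row, feats, K=2):
    """SAGE* row of Table 1: [X || A_row X || ... || A_row^K X] (no self-loops, row normalization)."""
    blocks, out = [feats], feats
    for _ in range(K):
        out = adj_row @ out
        blocks.append(out)
    return np.hstack(blocks)   # concatenation keeps each hop in separate columns
```

The concatenation ($\Vert$) in the SAGE*, GPR-GNN*, and H2GCN* rows lets the final linear classifier weight each hop separately, whereas the pure power form of SGC and GCN* mixes all hops into a single representation.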
3.1 Pain Points of Existing GNNs
Based on the comprehensive linearization in Table 1, we derive four
pain points of existing GNNs which we address in Section 4.
Pain Point 1 (Lack of robustness). All models in Table 1 fail to handle multiple graph scenarios at the same time, i.e., graphs with homophily, heterophily, no network effects, or useless features.
Most models in Table 1 make an implicit assumption about the given graphs, such as homophily or heterophily, rather than being able to perform well in multiple scenarios at the same time. For example, all models except ChebNet, SAGE, and H2GCN have self-loops in the new adjacency matrix, emphasizing the local neighborhood of each node even in graphs with heterophily or no network effects. This is the pain point that we also observe empirically in the sanity checks (in Table 2), where none of the existing models achieves reasonable accuracy in all cases of synthetic graphs.
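As a concrete view of the self-loop issue, here is a minimal sketch (with our own helper name) of the two symmetric normalizations appearing in Table 1: $\tilde{\mathbf{A}}_{\mathrm{sym}}$ built from $\mathbf{A} + \mathbf{I}$, as in GCN-style models, versus $\mathbf{A}_{\mathrm{sym}}$ built from $\mathbf{A}$ alone, as in ChebNet, SAGE, and H2GCN.

```python
import numpy as np
import scipy.sparse as sp

def sym_normalize(adj: sp.spmatrix, self_loops: bool) -> sp.spmatrix:
    """Return D^{-1/2} A D^{-1/2}; with self_loops=True, A is first replaced by A + I."""
    if self_loops:
        adj = adj + sp.eye(adj.shape[0])        # every node also aggregates its own features
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1)))  # guard against isolated nodes
    return d_inv_sqrt @ adj @ d_inv_sqrt

# With self_loops=True, every propagation step mixes a node's own features back in,
# which implicitly favors homophily; with self_loops=False, each step uses neighbors only.
```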
Pain Point 2 (Vulnerability to noisy features). All models in Table 1 cannot fully exploit the graph structure if the features are noisy, since they depend on the node feature matrix $\mathbf{X}$.