Uplifting Message Passing Neural Network with Graph Original Information

Xiao Liu 1   Lijun Zhang 1   Hui Guan 1
Abstract
Message passing neural networks (MPNNs) learn the representation of graph-structured data based on graph original information, including node features and graph structures, and have shown astonishing improvement in node classification tasks. However, the expressive power of MPNNs is upper bounded by the first-order Weisfeiler-Leman test, and their accuracy still has room for improvement. This work studies, both theoretically and empirically, how to improve MPNNs' expressiveness and generalizability by fully exploiting graph original information. It further proposes a new GNN model called INGNN (INformation-enhanced Graph Neural Network) that leverages the insights to improve node classification performance. Extensive experiments on both synthetic and real datasets demonstrate the superiority (average rank 1.78) of our INGNN compared with state-of-the-art methods.
1. Introduction
Graphs have been widely used to model structured data such as proteins, social networks, and traffic networks. Such graph-structured data usually consists of two types of raw information: the graph structure, which describes connectivity between nodes, and node features, which describe the attributes of the nodes. We refer to the graph structure and the node features collectively as graph original information. One of the most popular machine learning tasks on graph-structured data is node classification, whose goal is to predict the label, which could be a type or category, associated with all the nodes when we are only given the true labels on a subset of nodes (Hamilton, 2020).
Graph Neural Networks (GNNs) have been proven to be a powerful approach to learning graph representations for node classification tasks (Hamilton et al., 2017; Klicpera et al., 2018; Wu et al., 2019). The most common type of GNN adopts a message passing paradigm that recursively propagates and aggregates node features through the graph structure to produce node representations (Gilmer et al., 2017; Battaglia et al., 2018). The message passing paradigm, fully relying on the graph original information and showing the promise of automatic learning, produces much better task performance than traditional graph machine learning, which requires deliberate feature engineering.

1 Department of Computer Science, University of Massachusetts Amherst. Correspondence to: Xiao Liu <xiaoliu1990@cs.umass.edu>.
However, unlike multi-layer feedforward networks (MLPs), which can be universal approximators for any continuous function (Hornik et al., 1989), message passing neural networks (MPNNs) cannot approximate all graph functions (Maron et al., 2018). MPNNs' expressive power is upper bounded by the first-order Weisfeiler-Leman (1-WL) isomorphism test (Xu et al., 2018), indicating that they cannot distinguish two non-isomorphic graph structures if the 1-WL test fails. This limitation of MPNNs implies open opportunities to further improve their performance.
Researchers have explored several directions for increasing the expressive power of MPNNs. As k-WL is strictly more expressive than 1-WL, many works try to mimic the high-order WL tests, such as 1-2-3 GNN (Morris et al., 2019), PPGN (Maron et al., 2019), and Ring-GNN (Chen et al., 2019). However, they require O(k)-order tensors to achieve k-WL expressiveness and thus cannot be generalized to large-scale graphs. In order to maintain linear scalability w.r.t. the number of nodes, recent works focus on developing more powerful MPNNs by adding handcrafted features, such as subgraph counts (Bouritsas et al., 2022), distance encoding (Li et al., 2020), and random features (Abboud et al., 2020; Sato et al., 2021), to make nodes more distinguishable. Although these works achieve good results in many cases, their handcrafted features could introduce inductive bias that hurts generalizability. More importantly, they lose the promise of automatic learning (Zhao et al., 2021), an indispensable property that makes MPNNs more appealing than traditional graph machine learning. This trend raises a fundamental question: is it possible to improve MPNNs' expressiveness without sacrificing the linear scalability of MPNNs w.r.t. the number of nodes and their automatic learning property, by getting rid of handcrafted features?
In this work, we confirm the answer to the question and
demonstrate that simply exploiting the graph original information, rather than deliberately handcrafting features, can improve the performance of MPNNs while preserving their linear scalability. The two types of graph original information, graph structures and node features, can be used both together and separately, leading to three features: ego-node features, graph structure features, and aggregated neighborhood features. MPNNs, however, leverage only the last feature for downstream tasks. We find that all three features are necessary to improve the expressive power or the generalizability of MPNNs, and we prove it both theoretically and empirically. Based on this insight, we further propose an INformation-enhanced MPNN (INGNN), which adaptively fuses the three features to derive node representations and achieves outstanding accuracy compared to state-of-the-art baselines. We summarize our main contributions as follows:
Theoretical Findings. We show theoretically that the expressiveness of an MPNN that integrates graph structure features is strictly better than 1&2-WL and no less powerful than 3-WL. We also prove that the node misclassification rate for graphs in some cases can be reduced by utilizing ego-node features separately from MPNNs. We empirically verify the necessity of the three features and show that the node classification accuracy of state-of-the-art models, GCN (Kipf & Welling, 2016), H2GCN (Zhu et al., 2020), and LINKX (Lim et al., 2021a), can be improved by adding some of the three identified features that they do not have originally.
INGNN. We propose INGNN, a powerful GNN which uplifts MPNNs' expressiveness and accuracy by integrating graph structure features and ego-node features. The features are fused automatically with adaptive feature fusion under a bi-level optimization framework. Our ablation study demonstrates that all three features, the fusing mechanism, and the optimization procedure play indispensable roles on different types of graphs.
Evaluation. We conduct extensive experiments to compare INGNN with state-of-the-art GNNs using synthetic and real graph benchmarks that cover the full homophily spectrum. INGNN outperforms 12 baseline models and achieves an average rank of 1.78 on 9 real datasets. INGNN achieves higher node classification accuracy: 4.17% higher than GCN (Kipf & Welling, 2016), 4.01% higher than MixHop (Abu-El-Haija et al., 2019), and 3.43% higher than GPR-GNN (Chien et al., 2020), on average over the real datasets.
2. Related Works
Regular MPNNs. Many GNNs (Niepert et al., 2016; Hamilton et al., 2017; Monti et al., 2017; Veličković et al., 2017; Gao et al., 2018; Xu et al., 2018; Wang et al., 2019) fall into the message passing framework (Gilmer et al., 2017; Battaglia et al., 2018), which iteratively transforms and propagates messages from the spatial neighborhood through the graph topology to update the embedding of a target node. To name a few, GCN (Kipf & Welling, 2016) designs a layer-wise propagation rule based on a first-order approximation of spectral convolutions on graphs. GraphSAGE (Hamilton et al., 2017) extends GCN by introducing a recursive node-wise sampling scheme to improve scalability. Graph attention networks (GAT) (Veličković et al., 2017) enhance GCN with the attention mechanism (Vaswani et al., 2017).
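As a minimal illustration of such a propagation rule, the following sketch writes one GCN-style layer in plain numpy. It is our own simplified example, not the code of any cited work; the weight matrix W stands in for learned parameters.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN-style message passing layer: add self-loops, symmetrically
    normalize the adjacency matrix, aggregate neighbor features, then apply
    a linear transform followed by a ReLU nonlinearity."""
    A_hat = A + np.eye(A.shape[0])               # adjacency with self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 (A+I) D^-1/2
    return np.maximum(A_norm @ X @ W, 0.0)       # propagate, transform, activate
```

Stacking L such layers gives each node a receptive field of its L-hop neighborhood; this is the aggregated neighborhood feature f_aggr(X, A) discussed in Section 3.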
Improve Expressiveness. Several works attempt to improve the expressiveness of GNNs. The first line of research mimics the high-order WL tests, such as 1-2-3 GNN (Morris et al., 2019), PPGN (Maron et al., 2019), and Ring-GNN (Chen et al., 2019). These models usually require exponentially increasing space and time complexity w.r.t. the number of nodes in a graph and thus cannot be generalized to large-scale graphs. The second line of research maintains the linear scalability of more powerful GNNs by adding features, such as subgraph counts (Bouritsas et al., 2022), distance encoding (Li et al., 2020), and random features (Abboud et al., 2020; Sato et al., 2021). The handcrafted features may introduce inductive bias that hurts generalization and also lose the promise of automatic learning (Zhao et al., 2021). The third line of work tries to design more expressive GNNs by involving high-order neighbors. For example, H2GCN (Zhu et al., 2020) identifies ego- and neighbor-embedding separation, second-order neighbors, and the combination of intermediate representations as mechanisms that boost representation learning. MixHop (Abu-El-Haija et al., 2019) proposes a graph convolutional layer that utilizes multiple powers of the adjacency matrix to learn general mixed neighborhood information. Similarly, Generalized PageRank (GPR-) GNN (Chien et al., 2020) adaptively controls the contribution of different propagation steps. In this paper, we propose to improve MPNNs' expressiveness by integrating the original graph structure features so that we can retain the linear complexity and the automatic learning property of MPNNs.
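To make the multi-power mixing mentioned above concrete, the following sketch (our illustration under simple assumptions, not the original MixHop code) concatenates transformed features aggregated over successive powers of a normalized adjacency matrix.

```python
import numpy as np

def mixed_hop_layer(A_norm, X, weights):
    """Concatenate per-hop transforms of features propagated by successive
    powers of an (already normalized) adjacency matrix A_norm.
    `weights` holds one weight matrix per power 0, 1, ..., K-1."""
    outputs, propagated = [], X
    for W in weights:
        outputs.append(propagated @ W)    # transform the current k-hop aggregation
        propagated = A_norm @ propagated  # advance to the next power of A_norm
    return np.concatenate(outputs, axis=1)
```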
Improve Generalizability. Recent works have started to pay attention to improving GNNs' generalizability, including overcoming over-smoothing and handling graphs with various homophily. DAGNN (Liu et al., 2020) proposes to decouple the transformation and the propagation operation to increase receptive fields. APPNP (Klicpera et al., 2018) leverages personalized PageRank to improve the propagation scheme. Later, BernNet (He et al., 2021) learns an arbitrary spectral filter, such as the ones in GCN, DAGNN, and APPNP, via Bernstein polynomial approximation. Similarly, AdaGNN (Dong et al., 2021) introduces trainable filters to capture the varying importance of different frequency components. Considering graph heterophily, ACM-GCN (Luan et al., 2022) proposes a multi-channel mixing mechanism, enabling adaptive filtering at nodes with different homophily, and GloGNN (Li et al., 2022) handles low-homophily graphs using global homophily. To improve MPNNs' generalizability, we propose to enhance MPNNs with ego-node features, and show that doing so can reduce the node misclassification rate in some graph cases.

Figure 1. Two non-isomorphic strongly regular graphs that cannot be distinguished by 3-WL: (a) the (4,4)-rook graph and (b) the Shrikhande graph. The right side of each panel shows the neighborhood subgraph of an arbitrary node (marked in red).
3. Improve MPNNs and Theory
This section first introduces the notations and then describes how we can exploit the graph original information to improve the expressiveness and the generalizability of MPNNs.
Notations. Let G = (V, E, X) denote a graph with N nodes and M edges, where V and E are the sets of nodes and edges respectively, with |V| = N and |E| = M. We use A ∈ {0,1}^(N×N) as the adjacency matrix, where A[i, j] = 1 if (v_i, v_j) ∈ E and A[i, j] = 0 otherwise. Each node v_i ∈ V has a raw feature vector x_i of size D. The raw feature vectors of all nodes in the graph form a feature matrix X ∈ R^(N×D). We refer to the adjacency matrix A and the feature matrix X together as graph original information.
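For concreteness, the notation maps directly onto simple arrays. The following toy graph (our illustrative example, not one of the paper's datasets) has N = 3 nodes, M = 2 undirected edges, and D = 2 raw features per node.

```python
import numpy as np

# Nodes V = {v_0, v_1, v_2}, edges E = {(v_0, v_1), (v_1, v_2)}
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])        # A in {0,1}^(N x N); A[i, j] = 1 iff (v_i, v_j) in E

X = np.array([[0.1, 0.5],
              [0.3, 0.2],
              [0.9, 0.7]])       # X in R^(N x D); row i is the raw feature vector x_i

N, D = X.shape                   # N = 3 nodes, D = 2 features per node
M = int(A.sum()) // 2            # M = 2 undirected edges
```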
We attempt to improve MPNNs by fully exploiting graph original information. Specifically, given a graph G, we investigate the three features f_ego(X), f_strc(A), and f_aggr(X, A), which result from using the node features X and the adjacency matrix A together and separately. f_ego(·) and f_strc(·) can be MLPs for simulating any continuous function, and f_aggr(·) is an MPNN. For description purposes, we refer to the output of f_ego(X) as ego-node features, since it only takes a node's own features as input; the output of f_strc(A) as graph structure features, since it only looks at the graph structure; and the output of f_aggr(X, A) as aggregated neighborhood features.
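A minimal sketch of the three features follows. It is our own illustration: the one-layer instantiations of f_ego and f_strc, the single-round MPNN for f_aggr, and plain concatenation as a stand-in for INGNN's learned fusion are all assumptions, and the injectivity required by Theorem 1 below is ignored here.

```python
import numpy as np

def relu(Z):
    return np.maximum(Z, 0.0)

def f_ego(X, W_ego):
    """Ego-node features: a transform of the raw node features only."""
    return relu(X @ W_ego)

def f_strc(A, W_strc):
    """Graph structure features: a transform of each node's adjacency row,
    i.e., each node is described purely by whom it connects to."""
    return relu(A @ W_strc)

def f_aggr(X, A, W_aggr):
    """Aggregated neighborhood features: one round of message passing."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return relu(A_norm @ X @ W_aggr)

def fused_features(X, A, W_ego, W_strc, W_aggr):
    """Combine the three views; INGNN fuses them adaptively, so the plain
    concatenation here only keeps the sketch simple."""
    return np.concatenate(
        [f_ego(X, W_ego), f_strc(A, W_strc), f_aggr(X, A, W_aggr)], axis=1)
```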
3.1. Improve Expressiveness.
We find that integrating the graph structure features into MPNNs, i.e., combining f_strc(A) and f_aggr(X, A), can improve the expressiveness of MPNNs.
Theorem 1. The expressive power of the combination of the aggregated neighborhood features and the graph structure features, i.e., F_comb(X, A) = [f_aggr(X, A), f_strc(A)], where f_strc is an injective function and f_aggr is an MPNN with a sufficient number of layers that are also injective functions, is strictly more powerful than 1&2-WL and no less powerful than 3-WL.
Proof of Theorem 1. We first prove that F_comb is at least as powerful as 1-WL. In other words, if two graphs are identified by F_comb as isomorphic, then they are also identified as isomorphic by 1-WL. Given two graphs G1 and G2, if they are identified as isomorphic by F_comb, it means that F_comb maps them to the same representation. Then we know that every part of the two representations is the same as well, e.g., f_aggr(X_G1, A_G1) = f_aggr(X_G2, A_G2), meaning that f_aggr cannot distinguish the two graphs. Since f_aggr is as powerful as 1-WL (Maron et al., 2019; Azizian & Lelarge, 2020), G1 and G2 cannot be distinguished by 1-WL either.
We then prove that F_comb is no less powerful than 3-WL, which means that there exist some graphs that can be distinguished by F_comb but not by 3-WL (Chen et al., 2020). We prove it using the two graphs in Figure 1, which are strongly regular graphs with the same graph parameters (16, 6, 2, 2)¹. These two graphs cannot be distinguished by 3-WL². The figure also shows an example neighborhood subgraph around an arbitrary node, marked in red.
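As a concrete companion to footnote 1, the strongly-regular parameters (v, k, λ, µ) can be read off an adjacency matrix directly; this helper is a sketch we add for illustration, not part of the paper. For both graphs in Figure 1 the result is (16, 6, 2, 2).

```python
import numpy as np

def strongly_regular_parameters(A):
    """Return (v, k, lam, mu) if the graph with adjacency matrix A is strongly
    regular, otherwise None. lam / mu are the numbers of common neighbors of
    adjacent / non-adjacent vertex pairs."""
    v = A.shape[0]
    degrees = A.sum(axis=1)
    if len(set(degrees.tolist())) != 1:
        return None                              # not a regular graph
    k = int(degrees[0])
    common = A @ A                               # common[i, j] = #common neighbors of i, j
    lams = {int(common[i, j]) for i in range(v) for j in range(v) if i != j and A[i, j]}
    mus = {int(common[i, j]) for i in range(v) for j in range(v) if i != j and not A[i, j]}
    if len(lams) > 1 or len(mus) > 1:
        return None                              # common-neighbor counts are not constant
    return v, k, (lams.pop() if lams else 0), (mus.pop() if mus else 0)
```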
Now we prove that F_comb can successfully distinguish the graphs in Figure 1(a) and 1(b), due to the use of graph structure features. Notice that all 1-hop subgraphs for each node in Figure 1(a) (and likewise in Figure 1(b)) are the same. This implies that F_comb can distinguish the two graphs as long as it can distinguish one subgraph pair, as shown on the right side of Figure 1(a) and 1(b). We can easily prove that F_comb can distinguish the two subgraphs, namely that they are non-isomorphic, since their adjacency matrices are non-identical under any neighborhood permutation. A simple algorithm for proving it is to enumerate all the neighborhood permutations and check whether there is a permutation under which the pair of adjacency matrices is the same. In this case, the enumeration algorithm returns False. Given that the adjacency matrices of the two subgraphs are different under any possible node permutation, F_comb will identify the two subgraphs as non-isomorphic, because f_strc is an injective function and will map the adjacency matrices of the two subgraphs to different representations. Therefore, the graphs in Figure 1(a) and 1(b) can be distinguished as well, since all subgraph pairs are the same and can be distinguished. Hence, we have proven that F_comb is no less powerful than 3-WL.
¹ Strongly regular graphs are regular graphs with v nodes and degree k, in which every two adjacent vertices have λ common neighbors and every two non-adjacent vertices have µ common neighbors. The tuple (v, k, λ, µ) gives the parameters of a strongly regular graph.
² Any two strongly regular graphs with the same parameters cannot be distinguished by 3-WL (Arvind et al., 2020).
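The enumeration check used in the proof can be written down directly; this is a brute-force sketch we add for illustration, and its factorial cost is only feasible for small subgraphs such as the 1-hop neighborhoods in Figure 1.

```python
from itertools import permutations
import numpy as np

def same_under_some_permutation(A1, A2):
    """Return True iff some node permutation maps adjacency matrix A1 onto A2,
    i.e., the two (sub)graphs are isomorphic; otherwise return False."""
    n = A1.shape[0]
    if A2.shape[0] != n:
        return False
    for perm in permutations(range(n)):
        p = list(perm)
        if np.array_equal(A1[np.ix_(p, p)], A2):  # A1 with rows/cols reordered by perm
            return True
    return False
```

According to the proof above, applied to the adjacency matrices of the two 1-hop subgraphs on the right of Figure 1 this check returns False, which is exactly the non-isomorphism that an injective f_strc exploits.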