Efficient Graph Neural Network Inference at Large Scale

Xinyi Gao, The University of Queensland, xinyi.gao@uq.edu.au
Wentao Zhang, Mila - Québec AI Institute & HEC Montréal, wentao.zhang@mila.quebec
Yingxia Shao, Beijing University of Posts and Telecommunications, shaoyx@bupt.edu.cn
Quoc Viet Hung Nguyen, Griffith University, henry.nguyen@griffith.edu.au
Bin Cui, Peking University, bin.cui@pku.edu.cn
Hongzhi Yin, The University of Queensland, h.yin1@uq.edu.au
ABSTRACT
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications. However, the enormous size of large-scale graphs hinders their application in real-time inference scenarios. Although existing scalable GNNs leverage linear propagation to preprocess the features and accelerate training and inference, these methods still suffer from scalability issues when making inferences on unseen nodes, as the feature preprocessing requires the graph to be known and fixed. To speed up inference in the inductive setting, we propose a novel adaptive propagation order approach that generates a personalized propagation order for each node based on its topological information. This successfully avoids redundant computation in feature propagation. Moreover, the trade-off between accuracy and inference latency can be flexibly controlled by simple hyper-parameters to match the latency constraints of different application scenarios. To compensate for the potential loss of inference accuracy, we further propose Inception Distillation, which exploits multi-scale receptive field information to improve inference performance. Extensive experiments are conducted on four public datasets with different scales and characteristics, and the results show that our proposed inference acceleration framework outperforms the SOTA graph inference acceleration baselines in terms of both accuracy and efficiency. In particular, the advantage of our method is more significant on larger-scale datasets, and our framework achieves a 75× inference speedup on the largest Ogbn-products dataset.
1 INTRODUCTION
Developing graph neural networks (GNNs) for very large graphs has drawn increasing attention due to the powerful expressiveness of GNNs and their enormous success in many industrial applications [11, 29, 33]. Although GNNs provide a universal framework for tackling various downstream tasks, running these models on large-scale industrial graphs suffers from heavy computation and high latency, which severely limits their application in latency-sensitive scenarios. For example, recommender systems designed for streaming sessions must perform real-time inference on user-item interaction graphs [1, 24, 31, 34]. Fraud and spam detection tasks require millisecond-level inference on million-scale graphs to identify malicious users and prevent financial loss to victims [18, 21, 30]. In some computer vision applications, GNNs are designed for 3D point cloud data and deployed on edge devices such as self-driving cars to perform object detection or semantic segmentation [17, 23, 25]. In such scenarios, real-time inference response is essential.
The root cause of the heavy computation and high latency of GNNs is the neighbor explosion problem. Most GNNs adopt the message-passing pipeline and build the model from feature propagation and transformation procedures. By executing feature propagation $k$ times, the $k$-th order propagated features capture node information from $k$-hop neighborhoods. Especially in large-scale and sparsely labeled graphs, multiple layers of propagation are needed to aggregate enough label information from distant neighbors under the message-passing pipeline [12, 28, 40, 43, 47]. However, as the number of propagation layers increases, the number of supporting nodes grows exponentially, which directly incurs a high computation cost for feature propagation.
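To make the neighbor explosion concrete, here is a minimal sketch (ours, not from the paper; the adjacency-list format and function name are illustrative) that counts the supporting nodes required to compute a single node's $k$-th order feature:

```python
from collections import deque

def supporting_nodes(adj, target, k):
    """Count the nodes in the k-hop neighborhood of `target`, i.e. the
    supporting set needed to compute its k-th order propagated feature.
    `adj` maps each node id to an iterable of neighbor ids."""
    seen = {target}
    frontier = deque([(target, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return len(seen)

# With average degree d, this count grows roughly as d**k, which is
# exactly the exponential inference cost described above.
```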
To mitigate the expensive computation resulting from feature propagation, several linear propagation-based GNNs [4, 6, 32, 45, 46, 49, 52], e.g., SGC, were proposed to remove the non-linearity between feature propagation steps and aggregate node features during a preprocessing procedure. Instead of performing feature propagation in every training epoch, this time-consuming process is executed only once in linear propagation-based GNNs. As a result, the time complexity of model training is significantly reduced, and the training of these models scales well with graph size. However, linear propagation-based GNNs still struggle with efficient inference at scale, because the preprocessing of feature propagation rests on the premise that the graph is known and fixed. This strong premise severely limits real-world applications, and more practical scenarios require inference on unseen nodes, where feature propagation has to be executed online. In addition, these existing methods adopt a fixed propagation order for all nodes. Due to complex topological structures, a fixed propagation order restricts the flexibility of exploiting multi-scale receptive fields and also tends to over-smooth high-degree nodes, leading to wasted computation and performance degradation.
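As a minimal sketch of the precomputation idea behind SGC-style models (our illustration; `adj_norm` is assumed to be the normalized adjacency $\hat{\mathbf{A}}$ as a SciPy sparse matrix), the $k$-step propagation runs once offline, after which training and transductive inference only touch a linear model:

```python
import numpy as np
import scipy.sparse as sp

def precompute_features(adj_norm: sp.spmatrix, X: np.ndarray, k: int) -> np.ndarray:
    """SGC-style preprocessing: X^(k) = A_hat^k X, computed once.
    Valid only when the graph is known and fixed in advance."""
    Xk = X
    for _ in range(k):
        Xk = adj_norm @ Xk   # one sparse-dense multiply per hop
    return Xk

# Training then fits a linear classifier on Xk with no per-epoch
# propagation. For an unseen (inductive) node, however, this product
# must be re-evaluated online, which is the cost NAI targets.
```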
To this end, we propose to reduce the redundant computation of feature propagation to further accelerate the inference of scalable GNNs. Specifically, we design a plug-and-play technique, Node-Adaptive Inference (NAI), which introduces a node-wise adaptive propagation order (or propagation depth) to customize the propagation order for each node. By measuring the distance between the current feature and the stationary state, the smoothing status of the propagated feature is evaluated. We then introduce simple global hyper-parameters to adaptively determine the propagation order for each node and efficiently trade off between inference latency
and accuracy. This provides a variety of inference options for users with different latency constraints. Moreover, we design a novel Inception Distillation module in NAI to exploit multi-scale receptive field information and mitigate performance degradation. With a more powerful supervision signal, NAI accelerates inference with a negligible performance drop.
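The following sketch illustrates the general idea of a node-adaptive propagation depth. It is our simplified reading, not NAI's exact criterion: the per-hop feature change is used as a stand-in for the paper's distance-to-stationary-state measure, and `eps` plays the role of the global hyper-parameter trading latency for accuracy:

```python
import numpy as np
import scipy.sparse as sp

def adaptive_propagate(adj_norm: sp.spmatrix, X: np.ndarray,
                       k_max: int, eps: float) -> np.ndarray:
    """Illustrative node-adaptive propagation: each node's feature is
    frozen at the first hop where it changes by less than `eps`,
    i.e. it is judged close to its smoothed (stationary) state."""
    out = X.copy()
    done = np.zeros(X.shape[0], dtype=bool)   # nodes with a fixed depth
    cur = X
    for _ in range(k_max):
        nxt = adj_norm @ cur
        delta = np.linalg.norm(nxt - cur, axis=1)
        stop = (~done) & (delta < eps)        # smoothing status reached
        out[stop] = nxt[stop]                 # personalized propagation order
        done |= stop
        cur = nxt
    out[~done] = cur[~done]                   # fall back to k_max-th order
    return out

# Larger eps stops nodes earlier (lower latency, possible accuracy loss);
# smaller eps approaches plain k_max-step propagation. In the inductive
# setting, only the unseen node's neighborhood would be propagated.
```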
The main contributions of this paper are summarized as follows:

New Scenario. We focus on inference speedup in a more realistic and challenging setting, graph-based inductive inference, where even scalable GNNs struggle with the heavy online computation of feature propagation.

New Methodology. Instead of using a fixed order of feature propagation as in existing GNNs and other acceleration methods, we propose a novel adaptive propagation order approach that generates a personalized propagation order for each node based on its topological information. This successfully avoids redundant computation in feature propagation and mitigates the over-smoothing problem. Moreover, the trade-off between accuracy and inference latency can be flexibly controlled by simple hyper-parameters. To compensate for the potential loss of inference accuracy, we further propose Inception Distillation, which exploits multi-scale receptive field information to improve inference performance.

SOTA Performance. Extensive experiments are conducted on four public datasets with different scales and characteristics, and the results show that our efficient inference framework NAI outperforms the SOTA graph inference acceleration baselines in terms of both accuracy and efficiency. In particular, the advantage of NAI is more significant on larger-scale datasets, and NAI achieves a 75× inference speedup on the largest Ogbn-products dataset.
2 PRELIMINARY
2.1 Problem Formulation
Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $|\mathcal{V}| = n$ nodes and $|\mathcal{E}| = m$ edges, its node adjacency matrix and degree matrix are denoted as $\mathbf{A} \in \mathbb{R}^{n \times n}$ and $\mathbf{D} = \mathrm{diag}(d_1, d_2, \ldots, d_n)$, where $d_i = \sum_{v_j \in \mathcal{V}} \mathbf{A}_{i,j}$ is the degree of node $v_i$. The adjacency matrix and degree matrix with self-loops are denoted as $\tilde{\mathbf{A}}$ and $\tilde{\mathbf{D}}$. The node feature matrix is $\mathbf{X} = \{\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_n\}$, in which $\boldsymbol{x}_i \in \mathbb{R}^f$ represents the node attribute vector of $v_i$, and $\mathbf{Y} = \{\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_l\}$ is the one-hot label matrix for the classification task. In the inductive setting, the entire node set $\mathcal{V}$ is partitioned into a training set $\mathcal{V}_{train}$ (including the labeled set $\mathcal{V}_l$ and the unlabeled set $\mathcal{V}_u$) and a test set $\mathcal{V}_{test}$. GNNs are trained on $\mathcal{G}_{train}$, which includes only $\mathcal{V}_{train}$ and all edges connected to $v \in \mathcal{V}_{train}$. The evaluation tests the performance of the trained GNNs on $\mathcal{V}_{test}$.
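As a small illustration of this inductive split (ours; the matrix and mask names are hypothetical), $\mathcal{G}_{train}$ can be materialized by zeroing out every edge that touches a non-training node:

```python
import numpy as np
import scipy.sparse as sp

def build_train_graph(A: sp.csr_matrix, train_mask: np.ndarray) -> sp.csr_matrix:
    """Keep only edges between training nodes, so G_train contains
    V_train and the edges among its members; test nodes stay unseen."""
    keep = sp.diags(train_mask.astype(A.dtype))  # 1 on V_train, 0 elsewhere
    return (keep @ A @ keep).tocsr()             # mask rows and columns
```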
2.2 Scalable Graph Neural Networks
GNNs aim to learn node representations by using topological information and node attributes. Existing GNNs adopt the message-passing pipeline and construct models from two processes: feature propagation and transformation. By stacking multiple layers, the $(k+1)$-th layer feature matrix $\mathbf{X}^{(k+1)}$ can be formulated as:

$$\mathbf{X}^{(k+1)} = \delta\big(\hat{\mathbf{A}} \mathbf{X}^{(k)} \mathbf{W}^{(k)}\big), \qquad \hat{\mathbf{A}} = \tilde{\mathbf{D}}^{r-1} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-r}, \tag{1}$$

where $\mathbf{W}^{(k)}$ is the model weight matrix, $\delta(\cdot)$ is the activation function, and $\tilde{\mathbf{D}}$ is the diagonal node degree matrix used to normalize $\tilde{\mathbf{A}}$. In each layer, $\hat{\mathbf{A}}$ propagates information among neighbors, and $\mathbf{W}^{(k)}$ transforms the propagated features. Note that $r \in [0, 1]$ is the convolution coefficient, which generalizes Eq. (1) to various existing models. By setting $r = 1$, $0.5$, and $0$, the convolution matrix $\hat{\mathbf{A}}$ becomes the transition probability matrix $\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-1}$ [5, 8, 41], the symmetric normalized adjacency matrix $\tilde{\mathbf{D}}^{-\frac{1}{2}}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-\frac{1}{2}}$ [7, 16], and the reverse transition probability matrix $\tilde{\mathbf{D}}^{-1}\tilde{\mathbf{A}}$ [36], respectively.
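A small SciPy-based sketch (ours) of Eq. (1)'s family of convolution matrices, where the coefficient $r$ selects the normalization:

```python
import numpy as np
import scipy.sparse as sp

def conv_matrix(A: sp.csr_matrix, r: float) -> sp.csr_matrix:
    """A_hat = D_tilde^(r-1) @ A_tilde @ D_tilde^(-r) from Eq. (1).
    r = 1   -> A_tilde D_tilde^-1     (transition probability)
    r = 0.5 -> D^-1/2 A_tilde D^-1/2  (symmetric normalization)
    r = 0   -> D_tilde^-1 A_tilde     (reverse transition probability)
    """
    n = A.shape[0]
    A_tilde = (A + sp.eye(n)).tocsr()              # add self-loops
    deg = np.asarray(A_tilde.sum(axis=1)).ravel()  # degrees with self-loops
    D_left = sp.diags(deg ** (r - 1.0))
    D_right = sp.diags(deg ** (-r))
    return (D_left @ A_tilde @ D_right).tocsr()
```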
Linear Propagation-based Scalable GNNs. Although GNNs achieve excellent performance by executing multiple feature propagation and transformation processes, it was found that the aggregation of neighbor features (i.e., feature propagation) makes the major contribution to the performance of GNNs [32]. Based on this finding, and to improve the scalability of GNNs, SGC [32] was proposed to decouple the two processes and remove the feature transformations in the intermediate layers. It propagates the node features $k$ times and then feeds the $k$-th order propagated feature $\mathbf{X}^{(k)} = \hat{\mathbf{A}}^k \mathbf{X}$ to a linear model for classification. Benefiting from the linear propagation, SGC facilitates precomputation of the feature matrix and successfully reduces the training time.

Following SGC, more powerful scalable GNNs have been designed by adopting linear propagation. For example, SIGN [6] transforms the propagated features of different orders with linear transformations and then concatenates them to enhance the feature representation. The transformation can be represented as $\mathbf{X}^{(0)}\mathbf{W}_0 \,\|\, \mathbf{X}^{(1)}\mathbf{W}_1 \,\|\, \ldots \,\|\, \mathbf{X}^{(k)}\mathbf{W}_k$, where $\|$ denotes the concatenation operation and the $\mathbf{W}_k$ are transformation matrices. S$^2$GC [52] averages the propagated features of different orders to construct a simple spectral graph convolution: $\frac{1}{k}\sum_{l=0}^{k} \mathbf{X}^{(l)}$. GAMLP [49] combines the propagated features of different orders by measuring the feature information gain and constructing a node-wise attention mechanism: $\sum_{l=0}^{k} \mathbf{T}^{(l)} \mathbf{X}^{(l)}$, where the $\mathbf{T}^{(l)}$ are diagonal node-wise attention matrices. The non-parametric feature propagation used in these methods speeds up the training procedure and transductive graph inference by preprocessing the propagated features. However, these methods cannot accelerate inductive inference on unseen nodes, as the feature preprocessing requires the graph to be known and fixed.
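To contrast how these methods consume the precomputed multi-order features, here is a compact sketch (ours; the weights `Ws` and attention values `T` are random placeholders, and `adj_norm` would be a real normalized adjacency in practice):

```python
import numpy as np

rng = np.random.default_rng(0)
n, f, k = 100, 16, 3
X = rng.standard_normal((n, f))
adj_norm = np.eye(n)                     # placeholder for A_hat; see conv_matrix above

feats = [X]                              # X^(0), X^(1), ..., X^(k)
for _ in range(k):
    feats.append(adj_norm @ feats[-1])   # X^(l) = A_hat^l X, precomputed once

sgc = feats[-1]                          # SGC: k-th order feature only
Ws = [rng.standard_normal((f, f)) for _ in feats]
sign = np.concatenate([F @ W for F, W in zip(feats, Ws)], axis=1)  # SIGN: concat
s2gc = sum(feats) / len(feats)           # S^2GC: average over orders
T = rng.random((len(feats), n))          # GAMLP: diagonal node-wise attention
gamlp = sum(T[l][:, None] * feats[l] for l in range(len(feats)))
```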
3 METHOD
3.1 Architecture Overview
Figure 1 shows the overview of NAI for linear propagation-based GNNs. Without loss of generality, we deploy NAI on SGC as an example. For the training procedure, NAI employs Inception Distillation to compensate for the potential loss of inference accuracy, which includes two steps: offline distillation and online distillation. Specifically, given the raw feature matrix $\mathbf{X}$, we first compute the propagated features of different orders $\mathbf{X}^{(l)}$, where $1 \le l \le k$. Then,