Efficient Graph Neural Network Inference at Large Scale
Xinyi Gao
The University of Queensland
xinyi.gao@uq.edu.au
Wentao Zhang
Mila - Québec AI Institute
HEC Montréal
wentao.zhang@mila.quebec
Yingxia Shao
Beijing University of Posts and
Telecommunications
shaoyx@bupt.edu.cn
Quoc Viet Hung Nguyen
Griffith University
henry.nguyen@griffith.edu.au
Bin Cui
Peking University
bin.cui@pku.edu.cn
Hongzhi Yin
The University of Queensland
h.yin1@uq.edu.au
ABSTRACT
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications. However, the enormous size of large-scale graphs hinders their application in real-time inference scenarios. Although existing scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure, these methods still suffer from scalability issues when making inferences on unseen nodes, as the feature preprocessing requires the graph to be known and fixed. To speed up inference in the inductive setting, we propose a novel adaptive propagation order approach that generates a personalized propagation order for each node based on its topological information. This successfully avoids redundant computation in feature propagation. Moreover, the trade-off between accuracy and inference latency can be flexibly controlled by simple hyper-parameters to match the different latency constraints of application scenarios. To compensate for the potential loss in inference accuracy, we further propose Inception Distillation, which exploits multi-scale receptive field information to improve inference performance. Extensive experiments are conducted on four public datasets with different scales and characteristics, and the results show that our proposed inference acceleration framework outperforms the SOTA graph inference acceleration baselines in terms of both accuracy and efficiency. In particular, the advantage of our proposed method is more significant on larger-scale datasets, and our framework achieves a 75× inference speedup on the largest dataset, Ogbn-products.
1 INTRODUCTION
Developing a graph neural network (GNN) for very large graphs has drawn increasing attention due to the powerful expressiveness of GNNs and their enormous success in many industrial applications [11, 29, 33]. Although GNNs provide a universal framework to tackle various downstream tasks, running the model on large-scale industrial graphs suffers from heavy computation and high latency. This severely limits their application in latency-sensitive scenarios. For example, recommender systems designed for streaming sessions must perform real-time inference on user-item interaction graphs [1, 24, 31, 34]. Fraud and spam detection tasks require millisecond-level inference on million-scale graphs to identify malicious users and prevent losses for the victim users [18, 21, 30]. In some computer vision applications, GNNs are designed for 3D point cloud data and deployed on edge devices such as self-driving cars to perform object detection or semantic segmentation tasks [17, 23, 25]. In such scenarios, real-time inference response is essential.
The root cause of the heavy computation and high latency of GNNs is the neighbor explosion problem. Generally, most GNNs adopt the message-passing pipeline and construct the model from feature propagation and transformation procedures. By executing 𝑘 steps of feature propagation, the 𝑘-th order propagated features capture node information from 𝑘-hop neighborhoods. Especially in large-scale and sparsely labeled graphs, multiple propagation layers are needed to aggregate enough label information from distant neighbors under the message-passing pipeline [12, 28, 40, 43, 47]. However, as the number of propagation layers increases, the number of supporting nodes grows exponentially, which directly incurs the high computation cost of feature propagation.
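To make this growth concrete, the toy sketch below (our own illustration, not code from the paper; the adjacency-list representation and function name are hypothetical) counts the supporting nodes a single target node requires for 𝑘 hops of propagation:

```python
# Toy illustration of the neighbor explosion problem: count how many
# supporting nodes one target node needs for k hops of propagation.
from collections import deque

def num_supporting_nodes(adj: dict, target, k: int) -> int:
    """Size of the k-hop neighborhood of `target` (including itself)."""
    seen, queue = {target}, deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, depth + 1))
    return len(seen)

# With average degree d, this count grows roughly as d**k, so each extra
# propagation layer multiplies the cost of inferring a single node.
```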
To mitigate the expensive computation resulting from feature propagation, several linear propagation-based GNNs [4, 6, 32, 45, 46, 49, 52], e.g., SGC, were proposed to remove the non-linearity between feature propagation steps and aggregate node features during the preprocessing procedure. Instead of performing feature propagation during each training epoch, this time-consuming process is executed only once in linear propagation-based GNNs. As a result, the time complexity of model training is significantly reduced, and the training of these models scales well with graph size. However, linear propagation-based GNNs still struggle with efficient inference at scale because the preprocessing of feature propagation is based on the premise that the graph is known and fixed. This strong premise severely limits real-world applications, and more practical scenarios require inference on unseen nodes, where feature propagation has to be executed online. In addition, these existing methods adopt a fixed propagation order for all nodes. Due to the complex topological structures, the fixed propagation order restricts the flexibility of exploiting multi-scale receptive fields and also tends to over-smooth the high-degree nodes, leading to wasted computation and performance degradation.
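As a concrete illustration, SGC-style linear propagation can be sketched as follows (a minimal sketch in SciPy, with a hypothetical helper name; it is not the reference implementation of any cited method):

```python
import numpy as np
import scipy.sparse as sp

def precompute_propagated_features(adj: sp.csr_matrix, X: np.ndarray,
                                   k: int) -> np.ndarray:
    """Run k steps of symmetric-normalized feature propagation offline."""
    n = adj.shape[0]
    adj_tilde = adj + sp.eye(n, format="csr")        # add self-loops
    deg = np.asarray(adj_tilde.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
    A_hat = d_inv_sqrt @ adj_tilde @ d_inv_sqrt      # D^(-1/2) (A+I) D^(-1/2)
    for _ in range(k):
        X = A_hat @ X                                # X^(t) = A_hat X^(t-1)
    return X  # training and inference then reduce to a plain MLP on X
```

For a node unseen at preprocessing time, this 𝑘-step product must instead be evaluated online over the node's 𝑘-hop neighborhood, which is exactly where the inference cost reappears.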
To this end, we propose to reduce the redundant computation of feature propagation to further accelerate the inference of scalable GNNs. Specifically, we design a plug-and-play technique: Node-Adaptive Inference (NAI), which introduces a node-wise adaptive propagation order (or propagation depth) to customize the propagation order for each node. By measuring the distance between the current feature and the stationary state, the smoothing status of the propagated feature is evaluated. We then introduce simple global hyper-parameters to adaptively determine the propagation order for each node and efficiently trade off between inference latency and accuracy.
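The per-node stopping rule can be pictured with the following sketch. It assumes a degree-based stationary state (the dominant eigenvector of the symmetric-normalized adjacency, a standard result in the over-smoothing literature), and the threshold tau and cap k_max stand in for the global hyper-parameters; this is a sketch of the idea, not the exact NAI criterion:

```python
import numpy as np
import scipy.sparse as sp

def node_adaptive_propagate(A_hat: sp.csr_matrix, X: np.ndarray,
                            deg: np.ndarray, tau: float,
                            k_max: int) -> np.ndarray:
    """Stop propagating each node once it is close to its stationary state."""
    w = np.sqrt(deg + 1.0)                 # dominant eigenvector of A_hat
    # Rank-1 stationary state that repeated propagation converges toward.
    X_inf = np.outer(w / (w ** 2).sum(), w @ X)
    H, out = X.copy(), X.copy()
    done = np.zeros(X.shape[0], dtype=bool)
    for _ in range(k_max):
        H = A_hat @ H                      # one more propagation step
        out[~done] = H[~done]              # undecided nodes take the deeper feature
        dist = np.linalg.norm(H - X_inf, axis=1)
        done |= dist < tau                 # smooth enough: freeze this node's depth
        if done.all():
            break
    return out
```

In this sketch the feature matrix is still propagated densely for simplicity; the point is that each node reads its output at a personalized depth, so high-degree nodes, which converge toward the stationary state quickly, skip the deepest and most expensive hops.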