ASGNN: GRAPH NEURAL NETWORKS WITH
ADAPTIVE STRUCTURE
Zepeng Zhang1, Songtao Lu2, Zengfeng Huang3, Ziping Zhao1
1ShanghaiTech University, 2IBM Research, 3Fudan University
zhangzp1@shanghaitech.edu.cn,songtao@ibm.com,huangzf@fudan.edu.cn,
zhaoziping@shanghaitech.edu.cn
ABSTRACT
Graph neural network (GNN) models have achieved impressive results in numerous machine learning tasks. However, many existing GNN models have been shown to be vulnerable to adversarial attacks, which creates a pressing need for robust GNN architectures. In this work, we propose a novel interpretable message passing scheme with adaptive structure (ASMP) to defend against adversarial attacks on the graph structure. Layers in ASMP are derived from optimization steps that minimize an objective function learning the node features and the graph structure simultaneously. ASMP is adaptive in the sense that the message passing in different layers can be carried out over dynamically adjusted graphs. This property allows more fine-grained handling of a noisy (or perturbed) graph structure and hence improves robustness. Convergence properties of the ASMP scheme are theoretically established. Integrating ASMP with neural networks leads to a new family of GNN models with adaptive structure (ASGNN). Extensive experiments on semi-supervised node classification tasks demonstrate that the proposed ASGNN outperforms state-of-the-art GNN architectures in terms of classification performance under various adversarial attacks.
1 INTRODUCTION
Graphs, or networks, are ubiquitous data structures in many fields of science and engineering (Newman, 2018), such as molecular biology, computer vision, social science, and financial technology. In the past few years, owing to their appealing capability of learning representations through message passing over the graph structure, graph neural network (GNN) models have become popular choices for processing graph-structured data and have achieved astonishing success in various applications (Kipf and Welling, 2017; Bronstein et al., 2017; Wu et al., 2020; Zhou et al., 2020; Wu et al., 2022). However, existing GNN backbones such as the graph convolutional network (GCN) (Kipf and Welling, 2017) and the graph attention network (Veličković et al., 2018) are shown to be extremely vulnerable to carefully designed adversarial attacks on the graph structure (Sun et al., 2018; Jin et al., 2021; Günnemann, 2022). With unnoticeable malicious manipulations of the graph, the performance of GNNs drops significantly and may even fall below that of a simple baseline ignoring all the relational information among the data features (Dai et al., 2018; Zügner et al., 2018; Zügner and Günnemann, 2019; Zhang and Zitnik, 2020). With the increasing deployment of GNN models in real-world applications, it is of vital importance to ensure their reliability and robustness, especially in scenarios, such as medical diagnosis and credit scoring, where a compromised model can lead to dramatic consequences (Günnemann, 2022).
To improve the robustness of GNNs under a potentially noisy graph structure input, a natural idea is to "purify" the given graph structure. Existing work in this line can be roughly classified into two categories. The first category robustifies GNNs with a two-stage approach. A purified graph is first obtained by "pre-processing" the input graph structure, leveraging information from the node features. Next, a GNN model is trained on this purified graph. For example, in the GNN-Jaccard method (Wu et al., 2019b), a new graph is obtained by removing the edges with small "Jaccard similarity". In Entezari et al. (2020), observing that adversarial attacks tend to increase the rank of the graph adjacency matrix, the authors propose to use a low-rank approximation of the given graph adjacency matrix as a substitute. In the second category, the graph adjacency matrix in
a GNN model is treated as an unknown, and a purified graph structure in a parameterized form is "learned" by optimizing the supervised GNN training loss (Zhu et al., 2022). For example, in Franceschi et al. (2019), the graph adjacency matrix is learned jointly with a GNN in a bilevel optimization manner, where a full parametrization of the graph adjacency matrix is adopted. Under this full parametrization setting, structural regularizers are further adopted in Jin et al. (2020); Luo et al. (2021) as augmentations of the training loss to promote certain properties of the purified graph. Besides the full parametrization approach, a multi-head weighted cosine similarity metric function (Chen et al., 2020) and a GNN model (Yu et al., 2020) have also been used to parameterize the graph adjacency matrix for structure learning.
Going beyond purifying the graph structure to robustify GNN models, there are also efforts on designing robust GNN architectures via directly designing the feature aggregation schemes. Based on the observation that aggregation functions such as the sum, weighted mean, or max operations can be arbitrarily distorted by a single outlier node, Geisler et al. (2020); Wang et al. (2020); Zhang and Lu (2020) design robust GNN models via robust aggregation functions. Moreover, some works apply the attention mechanism (Veličković et al., 2018) to mitigate the influence of adversarial perturbations. For example, Zhu et al. (2019) assume the node features follow a Gaussian distribution and use the variance information to determine the attention scores. Tang et al. (2020) use clean graphs and their adversarial counterparts to train an attention mechanism that learns to assign small attention scores to perturbed edges. In Zhang and Zitnik (2020), the authors define an attention mechanism based on the similarity of neighboring nodes.
Different from existing approaches to robustifying GNNs, in this work, we propose a novel robust and interpretable message passing scheme with adaptive structure (ASMP). Based on ASMP, a family of GNN models with adaptive structure (ASGNN) can be designed. Prior works have revealed that the message passing processes in a class of GNNs are (unrolled) gradient steps for solving a graph signal denoising (GSD) problem (Zhu et al., 2021; Ma et al., 2021; Zhang and Zhao, 2022). ASMP is generated by an alternating (proximal) gradient descent algorithm for simultaneously denoising the graph signal and the graph structure. Designed in such a principled way, ASMP is not only friendly to back-propagation training but also achieves the desired structure adaptivity with a theoretical convergence guarantee. Once trained, ASMP can be naturally interpreted as a parameter-optimized iterative algorithm. This work falls into the category of GNN architecture design. Conceptually different from existing robustified GNNs with a fixed graph structure, ASGNN interweaves the graph purification process and the message passing process, which makes it possible to conduct message passing over different graph structures at different layers, i.e., in an adaptive graph structure fashion. Thus, an edge might be excluded in some layers but included in others, depending on the dynamic structure learning process. This property allows more fine-grained handling of perturbations than existing graph purification methods that use a single graph in the entire GNN. To be more specific, the major contributions of this work are highlighted as follows.
• We propose a novel message passing scheme over graphs called ASMP, with a convergence guarantee and several specifications. To the best of our knowledge, ASMP is the first message passing scheme with adaptive structure that is designed based on an optimization problem.
• Based on ASMP, a family of GNN models with adaptive structure, named ASGNN, is further introduced. The adaptive structure in ASGNN allows more fine-grained handling of noisy graph structures and strengthens the model robustness against adversarial attacks.
• Extensive experiments under various adversarial attack scenarios showcase the superiority of the proposed ASGNN. The numerical results corroborate that the adaptive structure inherent in ASGNN helps mitigate the impact of a perturbed graph structure.
2 PRELIMINARIES AND BACKGROUND
An unweighted graph with self-loops is denoted as $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ and $\mathcal{E}$ denote the node set and the edge set, respectively. The graph adjacency matrix is given by $\mathbf{A} \in \mathbb{R}^{N \times N}$. We denote by $\mathbf{1}$ and $\mathbf{I}$ the all-one column vector and the identity matrix, respectively. Given $\mathbf{D} = \mathrm{Diag}(\mathbf{A}\mathbf{1}) \in \mathbb{R}^{N \times N}$ as the diagonal degree matrix, the Laplacian matrix is defined as $\mathbf{L} = \mathbf{D} - \mathbf{A}$. We denote by $\mathbf{A}_{\mathrm{rw}} = \mathbf{D}^{-1}\mathbf{A}$ the random walk (or row-wise) normalized adjacency matrix and by $\mathbf{A}_{\mathrm{sym}} = \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$ the symmetric normalized adjacency matrix. Subsequently, the random walk normalized and symmetric normalized Laplacian matrices are defined as $\mathbf{L}_{\mathrm{rw}} = \mathbf{I} - \mathbf{D}^{-1}\mathbf{A}$ and $\mathbf{L}_{\mathrm{sym}} = \mathbf{I} - \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$, respectively. $\mathbf{X} \in \mathbb{R}^{N \times M}$ ($M$ is the dimension of the node features) is a node feature matrix, or a graph signal, and its $i$-th row $\mathbf{X}_{i,:}$ represents the feature vector of the $i$-th node with $i = 1, \ldots, N$. $X_{ij}$ (or $[\mathbf{X}]_{ij}$) denotes the $(i,j)$-th element of $\mathbf{X}$ with $i, j = 1, \ldots, N$. For a vector $\mathbf{X}_{i,:}$, $\mathbf{X}_{i,:}^{-1}$ represents its element-wise inverse.
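To ground the notation, the following minimal NumPy sketch (illustrative only, not from the paper) constructs the degree matrix and the normalized adjacency and Laplacian matrices defined above for a toy graph with self-loops; all variable names are placeholders.

```python
import numpy as np

# Toy unweighted graph with self-loops (N = 3 nodes).
A = np.array([[1., 1., 0.],
              [1., 1., 1.],
              [0., 1., 1.]])
N = A.shape[0]

# Diagonal degree matrix D = Diag(A 1) and unnormalized Laplacian L = D - A.
deg = A @ np.ones(N)
D = np.diag(deg)
L = D - A

# Random walk normalization: A_rw = D^{-1} A, L_rw = I - D^{-1} A.
A_rw = np.diag(1.0 / deg) @ A
L_rw = np.eye(N) - A_rw

# Symmetric normalization: A_sym = D^{-1/2} A D^{-1/2}, L_sym = I - A_sym.
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_sym = D_inv_sqrt @ A @ D_inv_sqrt
L_sym = np.eye(N) - A_sym
```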
2.1 GNNS AS GRAPH SIGNAL DENOISING
In the literature (Yang et al., 2021; Pan et al., 2021; Zhu et al., 2021), it has been realized that the message passing layers for feature learning in many GNN models can be uniformly interpreted as gradient steps for minimizing certain energy functions, which carries the meaning of GSD (Ma et al., 2021). Recently, Zhang and Zhao (2022) further showed that some popular GNNs are neural networks induced from unrolling (proximal) gradient descent algorithms for solving specific GSD problems. Taking the approximate personalized propagation of neural predictions (APPNP) model (Klicpera et al., 2019) as an example, the initial node feature matrix $\mathbf{Z}$ is first pre-processed by a multilayer perceptron $g_{\theta}(\cdot)$ with model parameter $\theta$, producing an output $\mathbf{X} = g_{\theta}(\mathbf{Z})$, and then $\mathbf{X}$ is fed into a $K$-layer message passing scheme given as follows:
$$\mathbf{H}^{(0)} = \mathbf{X}, \qquad \mathbf{H}^{(k+1)} = (1-\alpha)\,\mathbf{A}_{\mathrm{sym}}\mathbf{H}^{(k)} + \alpha\mathbf{X}, \quad \text{for } k = 0, \ldots, K-1, \tag{1}$$
where $\mathbf{H}^{(0)}$ denotes the input feature of the message passing process, $\mathbf{H}^{(k)}$ represents the learned feature after the $k$-th layer, and $\alpha$ is the teleport probability. Therefore, the message passing of an APPNP model is fully specified by two parameters, namely, a graph structure matrix $\mathbf{A}_{\mathrm{sym}}$ and a scalar $\alpha$, in which $\mathbf{A}_{\mathrm{sym}}$ is assumed to be known beforehand and $\alpha$ is treated as a hyperparameter.
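As a concrete illustration of Eq. (1), the propagation can be sketched in a few lines of NumPy; this is a minimal sketch assuming the pre-processed features $\mathbf{X} = g_{\theta}(\mathbf{Z})$ are already available, not the authors' implementation.

```python
import numpy as np

def appnp_propagate(A_sym: np.ndarray, X: np.ndarray, alpha: float = 0.1, K: int = 10) -> np.ndarray:
    """Run the K-layer APPNP message passing of Eq. (1).

    A_sym : symmetrically normalized adjacency matrix (N x N)
    X     : pre-processed node features X = g_theta(Z)  (N x M)
    alpha : teleport probability
    """
    H = X.copy()                                       # H^{(0)} = X
    for _ in range(K):
        H = (1.0 - alpha) * A_sym @ H + alpha * X      # H^{(k+1)}
    return H
```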
From an optimization perspective, the message passing process in Eq. (1) can be seen as executing $K$ steps of gradient descent to solve a GSD problem with initialization $\mathbf{H}^{(0)} = \mathbf{X}$ and step size $0.5$ (Zhu et al., 2021; Ma et al., 2021; Zhang and Zhao, 2022), which is given by
$$\underset{\mathbf{H} \in \mathbb{R}^{N \times M}}{\text{minimize}} \quad \alpha\|\mathbf{H} - \mathbf{X}\|_F^2 + (1-\alpha)\,\mathrm{Tr}\big(\mathbf{H}^{\top}\mathbf{L}_{\mathrm{sym}}\mathbf{H}\big), \tag{2}$$
where $\mathbf{X}$ and $\alpha$ are given and share the same meaning as in Eq. (1). In Problem (2), the first term is a fidelity term forcing the recovered graph signal $\mathbf{H}$ to be as close as possible to a noisy graph signal $\mathbf{X}$, and the second term is the symmetric normalized Laplacian smoothing term measuring the variation of the graph signal $\mathbf{H}$, which can be explicitly expressed as
$$\mathrm{Tr}\big(\mathbf{H}^{\top}\mathbf{L}_{\mathrm{sym}}\mathbf{H}\big) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} A_{ij}\left\|\frac{\mathbf{H}_{i,:}}{\sqrt{D_{ii}}} - \frac{\mathbf{H}_{j,:}}{\sqrt{D_{jj}}}\right\|_2^2. \tag{3}$$
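To make the claimed correspondence between Eq. (1) and Problem (2) explicit, one can verify the following standard computation (reconstructed here from the stated quantities rather than quoted from the paper): a single gradient step on Problem (2) with step size $1/2$ reads
$$
\begin{aligned}
\mathbf{H}^{(k+1)} &= \mathbf{H}^{(k)} - \frac{1}{2}\,\nabla_{\mathbf{H}}\Big(\alpha\|\mathbf{H}-\mathbf{X}\|_F^2 + (1-\alpha)\,\mathrm{Tr}\big(\mathbf{H}^{\top}\mathbf{L}_{\mathrm{sym}}\mathbf{H}\big)\Big)\Big|_{\mathbf{H}=\mathbf{H}^{(k)}} \\
&= \mathbf{H}^{(k)} - \alpha\big(\mathbf{H}^{(k)}-\mathbf{X}\big) - (1-\alpha)\,\mathbf{L}_{\mathrm{sym}}\mathbf{H}^{(k)} \\
&= (1-\alpha)\,\mathbf{A}_{\mathrm{sym}}\mathbf{H}^{(k)} + \alpha\mathbf{X},
\end{aligned}
$$
which is exactly the update in Eq. (1), since $\mathbf{L}_{\mathrm{sym}} = \mathbf{I} - \mathbf{A}_{\mathrm{sym}}$.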
For more technical discussions on the relationships between GNNs and iterative optimization algorithms for solving GSD problems, please refer to Ma et al. (2021); Zhang and Zhao (2022). Apart from using the lens of optimization to interpret existing GNN models, there is also a line of work (Liu et al., 2021b; Chen et al., 2021; Fu et al., 2022) on building new GNN architectures by designing novel optimization problems and the corresponding iterative algorithms (more discussions are provided in Appendix A).
2.2 GRAPH LEARNING WITH STRUCTURAL REGULARIZERS
Structural regularizers are commonly adopted to promote certain desirable properties when learning a graph (Kalofolias, 2016; Pu et al., 2021). In the following, we discuss several widely used graph structural regularizers which will be incorporated into the design of ASMP. We denote the learnable graph adjacency matrix by $\mathbf{S}$ satisfying $\mathbf{S} \in \mathcal{S}$, where
$$\mathcal{S} = \big\{\mathbf{S} \in \mathbb{R}^{N \times N} \,\big|\, 0 \leq S_{ij} \leq 1, \ \text{for } i, j = 1, \ldots, N\big\}$$
defines the class of adjacency matrices. Under the assumption that node features change smoothly between adjacent nodes (Ortega et al., 2018), the Laplacian smoothing regularization term is commonly considered in graph structure learning. Eq. (3) is the symmetric normalized Laplacian smoothing term, and a random walk normalized alternative can be similarly defined by replacing $\mathbf{L}_{\mathrm{sym}}$ in Eq. (3) with $\mathbf{L}_{\mathrm{rw}}$.
Real-world graphs are normally sparsely connected, which can be represented by sparse adjacency matrices. Moreover, it is also observed that the singular values of these adjacency matrices are commonly small (Zhou et al., 2013; Kumar et al., 2020). However, a noisy adjacency matrix (e.g., one perturbed by adversarial attacks) tends to be dense and to have singular values of larger magnitude (Jin et al., 2020). In view of this, graph structural regularizers for promoting sparsity and/or suppressing the singular values are widely adopted in the graph learning literature (Kalofolias, 2016; Egilmez et al., 2017; Dong et al., 2019). Specifically, the $\ell_1$-norm of the adjacency matrix, defined as $\|\mathbf{S}\|_1 = \sum_{i,j=1}^{N} |S_{ij}|$, is often used to promote sparsity. For penalizing the singular values, the $\ell_1$-norm and the $\ell_2$-norm on the singular value vector of the adjacency matrix $\mathbf{S}$ can help. Equivalently, they translate to the nuclear norm and the Frobenius norm of $\mathbf{S}$, given by $\|\mathbf{S}\|_* = \sum_{i=1}^{N} \sigma_i(\mathbf{S})$ and $\|\mathbf{S}\|_F = \sqrt{\sum_{i=1}^{N} \sigma_i^2(\mathbf{S})}$, respectively, where $\sigma_1(\mathbf{S}) \geq \cdots \geq \sigma_N(\mathbf{S})$ denote the ordered singular values of $\mathbf{S}$. These two regularizers both restrict the scale of the singular values, while the nuclear norm also promotes low-rankness. A recent study (Deng et al., 2022) points out that graph learning methods with low-rank promoting regularizers may lose a wide range of the spectrum of the clean graph, which corresponds to important structure in the spatial domain. Thus, the nuclear norm regularizer may impair the quality of the reconstructed graph and therefore limit the performance of GNNs. Besides, the nuclear norm is not amenable to back-propagation and incurs high computational complexity (Luo et al., 2021). Arguably, the Frobenius norm of $\mathbf{S}$ is a more suitable regularizer for graph structure learning in comparison with the nuclear norm.
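For completeness, the regularizers above can be evaluated directly with standard NumPy routines; the helper below is a short illustrative sketch (not part of the paper), and which of these terms ASMP actually uses is specified in Section 3.

```python
import numpy as np

def structural_regularizers(S: np.ndarray) -> dict:
    """Evaluate the graph structural regularizers discussed above for a candidate adjacency matrix S."""
    sparsity_l1 = np.abs(S).sum()                      # ||S||_1, promotes sparsity
    nuclear = np.linalg.norm(S, ord='nuc')             # ||S||_*, sum of singular values, promotes low rank
    frobenius_sq = np.linalg.norm(S, ord='fro') ** 2   # ||S||_F^2, shrinks singular values
    return {"l1": sparsity_l1, "nuclear": nuclear, "frobenius_sq": frobenius_sq}
```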
3 THE PROPOSED GRAPH NEURAL NETWORKS
In this section, we first motivate a design principle based on jointly learning the node features and the graph structure. Then, we develop an efficient optimization algorithm for solving the resulting optimization problem, which eventually leads to a novel message passing scheme with adaptive structure (ASMP). After that, we provide interpretations, convergence guarantees, and specifications of ASMP. Finally, integrating ASMP with deep neural networks yields a new family of GNNs with adaptive structure, named ASGNN.
3.1 A NOVEL DESIGN PRINCIPLE WITH ADAPTIVE GRAPH STRUCTURE
As discussed in Section 2.1, the message passing procedure in many popular GNNs can be viewed as performing graph signal denoising (or node feature learning) (Zhu et al., 2021; Ma et al., 2021; Pan et al., 2021; Zhang and Zhao, 2022) over a prefixed graph. Unfortunately, if some edges in the graph are task-irrelevant or even maliciously manipulated, the learned node features may not be appropriate for the downstream tasks. Motivated by this, we propose a new design principle for message passing, that is, to learn the node features and the graph structure simultaneously. It enables learning an adaptive graph structure from the features for the message passing procedure. Hence, such a message passing scheme can potentially improve robustness against a noisy input graph structure.
Specifically, we construct an optimization objective by augmenting the GSD objective in Eq. (2) (where we use a random walk normalized graph Laplacian smoothing term) with a structural fidelity term $\|\mathbf{S} - \mathbf{A}\|_F^2$, where $\mathbf{A}$ is the given initial graph adjacency matrix, and the structural regularizers $\|\mathbf{S}\|_1$ and $\|\mathbf{S}\|_F^2$. Then we obtain the following optimization problem:
$$\underset{\mathbf{H} \in \mathbb{R}^{N \times M},\ \mathbf{S} \in \mathcal{S}}{\text{minimize}} \quad p(\mathbf{H}, \mathbf{S}) = \underbrace{\|\mathbf{H} - \mathbf{X}\|_F^2 + \lambda\,\mathrm{Tr}\big(\mathbf{H}^{\top}\mathbf{L}_{\mathrm{rw}}\mathbf{H}\big)}_{\text{feature learning}} + \underbrace{\gamma\|\mathbf{S} - \mathbf{A}\|_F^2 + \mu_1\|\mathbf{S}\|_1 + \mu_2\|\mathbf{S}\|_F^2}_{\text{structure learning}}, \tag{4}$$
where $\mathbf{H}$ is the feature variable, $\mathbf{S}$ is the structure variable, and $\gamma$, $\lambda$, $\mu_1$, and $\mu_2$ are parameters balancing the different terms. To enable the interplay between feature learning and structure learning, the Laplacian smoothing term is computed with $\mathbf{S}$ rather than $\mathbf{A}$, i.e., $\mathbf{L}_{\mathrm{rw}} = \mathbf{I} - \mathbf{D}^{-1}\mathbf{S}$ with $\mathbf{D} = \mathrm{Diag}(\mathbf{S}\mathbf{1})$. When adversarial attacks exist, a perturbed adjacency matrix $\mathbf{A}$ will be generated. Since attacks are generally designed to be unnoticeable (Jin et al., 2021), the perturbed graph is expected to stay close to the clean graph, which motivates the structural fidelity term $\gamma\|\mathbf{S} - \mathbf{A}\|_F^2$.
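To fix ideas before the algorithm is derived, the sketch below evaluates the objective $p(\mathbf{H}, \mathbf{S})$ of Problem (4). It is a minimal sketch under assumed placeholder parameter values, with an element-wise clip standing in for the constraint $\mathbf{S} \in \mathcal{S}$; it is not the authors' implementation.

```python
import numpy as np

def objective_p(H, S, X, A, lam=1.0, gamma=1.0, mu1=0.1, mu2=0.1):
    """Evaluate p(H, S) from Eq. (4) for a candidate feature matrix H and structure matrix S.

    The parameter values are placeholders, not the authors' settings.
    Assumes every node keeps at least a self-loop so that S @ 1 > 0.
    """
    S = np.clip(S, 0.0, 1.0)                       # crude stand-in for the constraint S in the feasible set
    N = S.shape[0]
    deg = S @ np.ones(N)                           # D = Diag(S 1)
    L_rw = np.eye(N) - np.diag(1.0 / deg) @ S      # L_rw = I - D^{-1} S
    feature = np.linalg.norm(H - X, 'fro')**2 + lam * np.trace(H.T @ L_rw @ H)
    structure = (gamma * np.linalg.norm(S - A, 'fro')**2
                 + mu1 * np.abs(S).sum()
                 + mu2 * np.linalg.norm(S, 'fro')**2)
    return feature + structure
```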