ASGNN: GRAPH NEURAL NETWORKS WITH
ADAPTIVE STRUCTURE
Zepeng Zhang1, Songtao Lu2, Zengfeng Huang3, Ziping Zhao1
1ShanghaiTech University, 2IBM Research, 3Fudan University
zhangzp1@shanghaitech.edu.cn,songtao@ibm.com,huangzf@fudan.edu.cn,
zhaoziping@shanghaitech.edu.cn
ABSTRACT
Graph neural network (GNN) models have achieved impressive results in numerous machine learning tasks. However, many existing GNN models have been shown to be vulnerable to adversarial attacks, which creates a pressing need for robust GNN architectures. In this work, we propose a novel interpretable message passing scheme with adaptive structure (ASMP) to defend against adversarial attacks on the graph structure. Layers in ASMP are derived from optimization steps that minimize an objective function learning the node features and the graph structure simultaneously. ASMP is adaptive in the sense that the message passing in different layers can be carried out over dynamically adjusted graphs. This property allows more fine-grained handling of a noisy (or perturbed) graph structure and hence improves robustness. Convergence properties of the ASMP scheme are theoretically established. Integrating ASMP with neural networks leads to a new family of GNN models with adaptive structure (ASGNN). Extensive experiments on semi-supervised node classification tasks demonstrate that the proposed ASGNN outperforms state-of-the-art GNN architectures in terms of classification performance under various adversarial attacks.
1 INTRODUCTION
Graphs, or networks, are ubiquitous data structures in many fields of science and engineering (Newman, 2018), such as molecular biology, computer vision, social science, and financial technology. In the past few years, owing to their appealing capability of learning representations through message passing over the graph structure, graph neural network (GNN) models have become popular choices for processing graph-structured data and have achieved astonishing success in various applications (Kipf and Welling, 2017; Bronstein et al., 2017; Wu et al., 2020; Zhou et al., 2020; Wu et al., 2022). However, existing GNN backbones such as the graph convolutional network (GCN) (Kipf and Welling, 2017) and the graph attention network (Veličković et al., 2018) are shown to be extremely vulnerable to carefully designed adversarial attacks on the graph structure (Sun et al., 2018; Jin et al., 2021; Günnemann, 2022). With unnoticeable malicious manipulations of the graph, the performance of GNNs drops significantly and may even fall below that of a simple baseline ignoring all the relational information among the data features (Dai et al., 2018; Zügner et al., 2018; Zügner and Günnemann, 2019; Zhang and Zitnik, 2020). With the increasing deployment of GNN models in real-world applications, it is of vital importance to ensure their reliability and robustness, especially in scenarios, such as medical diagnosis and credit scoring, where a compromised model can lead to dramatic consequences (Günnemann, 2022).
To improve the robustness of GNNs under a potentially noisy graph structure input, a natural idea is to "purify" the given graph structure. Existing work in this line can be roughly classified into two categories. The first category robustifies GNNs with a two-stage approach. A purified graph is first obtained by "pre-processing" the input graph structure, leveraging information from the node features. Next, a GNN model is trained on this purified graph. For example, in the GNN-Jaccard method (Wu et al., 2019b), a new graph is obtained by removing the edges with small "Jaccard similarity". In Entezari et al. (2020), observing that adversarial attacks tend to increase the rank of the graph adjacency matrix, the authors propose to use a low-rank approximation of the given graph adjacency matrix as a substitute. In the second category, the graph adjacency matrix in
a GNN model is treated as an unknown, and a purified graph structure in a parameterized form is "learned" by optimizing the supervised GNN training loss (Zhu et al., 2022). For example, in Franceschi et al. (2019), the graph adjacency matrix is learned jointly with a GNN in a bilevel optimization manner, where a full parametrization of the graph adjacency matrix is adopted. Under this full parametrization setting, structural regularizers are further adopted in Jin et al. (2020); Luo et al. (2021) as augmentations of the training loss to promote certain properties of the purified graph. Besides the full parametrization approach, a multi-head weighted cosine similarity metric function (Chen et al., 2020) and a GNN model (Yu et al., 2020) have also been used to parameterize the graph adjacency matrix for structure learning.
Going beyond purifying the graph structure to robustify GNN models, there are also efforts on designing robust GNN architectures via directly designing the feature aggregation schemes. Based on the observation that aggregation functions such as the sum, weighted mean, or max operations can be arbitrarily distorted by a single outlier node, Geisler et al. (2020); Wang et al. (2020); Zhang and Lu (2020) design robust GNN models via robust aggregation functions. Moreover, some works apply the attention mechanism (Veličković et al., 2018) to mitigate the influence of adversarial perturbations. For example, Zhu et al. (2019) assume the node features follow a Gaussian distribution and use the variance information to determine the attention scores. Tang et al. (2020) use clean graphs and their adversarial counterparts to train an attention mechanism that learns to assign small attention scores to perturbed edges. In Zhang and Zitnik (2020), the authors define an attention mechanism based on the similarity of neighboring nodes.
Different from existing approaches to robustifying GNNs, in this work, we propose a novel robust and interpretable message passing scheme with adaptive structure (ASMP). Based on ASMP, a family of GNN models with adaptive structure (ASGNN) can be designed. Prior works have revealed that the message passing processes in a class of GNNs are (unrolled) gradient steps for solving a graph signal denoising (GSD) problem (Zhu et al., 2021; Ma et al., 2021; Zhang and Zhao, 2022). ASMP is generated by an alternating (proximal) gradient descent algorithm for simultaneously denoising the graph signal and the graph structure. Designed in such a principled way, ASMP is not only friendly to back-propagation training but also achieves the desired structure adaptivity with a theoretical convergence guarantee. Once trained, ASMP can be naturally interpreted as a parameter-optimized iterative algorithm. This work falls into the category of GNN architecture design. Conceptually different from existing robustified GNNs with a fixed graph structure, ASGNN interweaves the graph purification process and the message passing process, which makes it possible to conduct message passing over different graph structures at different layers, i.e., in an adaptive graph structure fashion. Thus, an edge might be excluded in some layers but included in others, depending on the dynamic structure learning process. This property allows more fine-grained handling of perturbations than existing graph purification methods that use a single graph in the entire GNN. To be more specific, the major contributions of this work are highlighted as follows.
• We propose a novel message passing scheme over graphs called ASMP, with a convergence guarantee and several specifications. To the best of our knowledge, ASMP is the first message passing scheme with adaptive structure that is designed based on an optimization problem.
• Based on ASMP, a family of GNN models with adaptive structure, named ASGNN, is further introduced. The adaptive structure in ASGNN allows more fine-grained handling of noisy graph structures and strengthens the model robustness against adversarial attacks.
• Extensive experiments under various adversarial attack scenarios showcase the superiority of the proposed ASGNN. The numerical results corroborate that the adaptive structure inherent in ASGNN helps mitigate the impact of a perturbed graph structure.
2 PRELIMINARIES AND BACKGROUND
An unweighted graph with self-loops is denoted as $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ and $\mathcal{E}$ denote the node set and the edge set, respectively. The graph adjacency matrix is given by $\mathbf{A} \in \mathbb{R}^{N \times N}$. We denote by $\mathbf{1}$ and $\mathbf{I}$ the all-one column vector and the identity matrix, respectively. Given $\mathbf{D} = \mathrm{Diag}(\mathbf{A}\mathbf{1}) \in \mathbb{R}^{N \times N}$ as the diagonal degree matrix, the Laplacian matrix is defined as $\mathbf{L} = \mathbf{D} - \mathbf{A}$. We denote by $\mathbf{A}_{\mathrm{rw}} = \mathbf{D}^{-1}\mathbf{A}$ the random walk (or row-wise) normalized adjacency matrix and by $\mathbf{A}_{\mathrm{sym}} = \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$ the symmetric normalized adjacency matrix. Subsequently, the random walk normalized and symmetric normalized Laplacian matrices are defined as $\mathbf{L}_{\mathrm{rw}} = \mathbf{I} - \mathbf{D}^{-1}\mathbf{A}$ and $\mathbf{L}_{\mathrm{sym}} = \mathbf{I} - \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$, respectively. $\mathbf{X} \in \mathbb{R}^{N \times M}$ ($M$ is the dimension of the node features) is a node feature matrix, or a graph signal, and its $i$-th row $\mathbf{X}_{i,:}$ represents the feature vector of the $i$-th node with $i = 1, \ldots, N$. $X_{ij}$ (or $[\mathbf{X}]_{ij}$) denotes the $(i,j)$-th element of $\mathbf{X}$ with $i, j = 1, \ldots, N$. For a vector $\mathbf{X}_{i,:}$, $\mathbf{X}_{i,:}^{-1}$ represents its element-wise inverse.
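To ground the notation, the following minimal NumPy sketch (illustrative only, not from the paper) constructs the degree matrix and the normalized adjacency and Laplacian matrices defined above for a toy graph with self-loops; all variable names are placeholders.

```python
import numpy as np

# Toy unweighted graph with self-loops (N = 3 nodes).
A = np.array([[1., 1., 0.],
              [1., 1., 1.],
              [0., 1., 1.]])
N = A.shape[0]

# Diagonal degree matrix D = Diag(A 1) and unnormalized Laplacian L = D - A.
deg = A @ np.ones(N)
D = np.diag(deg)
L = D - A

# Random walk normalization: A_rw = D^{-1} A, L_rw = I - D^{-1} A.
A_rw = np.diag(1.0 / deg) @ A
L_rw = np.eye(N) - A_rw

# Symmetric normalization: A_sym = D^{-1/2} A D^{-1/2}, L_sym = I - A_sym.
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_sym = D_inv_sqrt @ A @ D_inv_sqrt
L_sym = np.eye(N) - A_sym
```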
2.1 GNNS AS GRAPH SIGNAL DENOISING
In the literature (Yang et al., 2021; Pan et al., 2021; Zhu et al., 2021), it has been realized that the message passing layers for feature learning in many GNN models can be uniformly interpreted as gradient steps for minimizing certain energy functions, which carries the meaning of GSD (Ma et al., 2021). Recently, Zhang and Zhao (2022) further showed that some popular GNNs are neural networks induced from unrolling (proximal) gradient descent algorithms for solving specific GSD problems. Taking the approximate personalized propagation of neural predictions (APPNP) model (Klicpera et al., 2019) as an example, the initial node feature matrix $\mathbf{Z}$ is first pre-processed by a multilayer perceptron $g_{\theta}(\cdot)$ with model parameter $\theta$, producing an output $\mathbf{X} = g_{\theta}(\mathbf{Z})$, and then $\mathbf{X}$ is fed into a $K$-layer message passing scheme given as follows:
$$\mathbf{H}^{(0)} = \mathbf{X}, \qquad \mathbf{H}^{(k+1)} = (1-\alpha)\,\mathbf{A}_{\mathrm{sym}}\mathbf{H}^{(k)} + \alpha\mathbf{X}, \quad \text{for } k = 0, \ldots, K-1, \tag{1}$$
where $\mathbf{H}^{(0)}$ denotes the input feature of the message passing process, $\mathbf{H}^{(k)}$ represents the learned feature after the $k$-th layer, and $\alpha$ is the teleport probability. Therefore, the message passing of an APPNP model is fully specified by two parameters, namely, a graph structure matrix $\mathbf{A}_{\mathrm{sym}}$ and a scalar $\alpha$, in which $\mathbf{A}_{\mathrm{sym}}$ is assumed to be known beforehand and $\alpha$ is treated as a hyperparameter.
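As a concrete illustration of Eq. (1), the propagation can be sketched in a few lines of NumPy; this is a minimal sketch assuming the pre-processed features $\mathbf{X} = g_{\theta}(\mathbf{Z})$ are already available, not the authors' implementation.

```python
import numpy as np

def appnp_propagate(A_sym: np.ndarray, X: np.ndarray, alpha: float = 0.1, K: int = 10) -> np.ndarray:
    """Run the K-layer APPNP message passing of Eq. (1).

    A_sym : symmetrically normalized adjacency matrix (N x N)
    X     : pre-processed node features X = g_theta(Z)  (N x M)
    alpha : teleport probability
    """
    H = X.copy()                                       # H^{(0)} = X
    for _ in range(K):
        H = (1.0 - alpha) * A_sym @ H + alpha * X      # H^{(k+1)}
    return H
```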
From an optimization perspective, the message passing process in Eq. (1) can be seen as executing $K$ steps of gradient descent to solve a GSD problem with initialization $\mathbf{H}^{(0)} = \mathbf{X}$ and step size $0.5$ (Zhu et al., 2021; Ma et al., 2021; Zhang and Zhao, 2022), which is given by
$$\underset{\mathbf{H} \in \mathbb{R}^{N \times M}}{\text{minimize}} \quad \alpha\|\mathbf{H} - \mathbf{X}\|_F^2 + (1-\alpha)\,\mathrm{Tr}\big(\mathbf{H}^{\top}\mathbf{L}_{\mathrm{sym}}\mathbf{H}\big), \tag{2}$$
where $\mathbf{X}$ and $\alpha$ are given and share the same meaning as in Eq. (1). In Problem (2), the first term is a fidelity term forcing the recovered graph signal $\mathbf{H}$ to be as close as possible to a noisy graph signal $\mathbf{X}$, and the second term is the symmetric normalized Laplacian smoothing term measuring the variation of the graph signal $\mathbf{H}$, which can be explicitly expressed as
$$\mathrm{Tr}\big(\mathbf{H}^{\top}\mathbf{L}_{\mathrm{sym}}\mathbf{H}\big) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} A_{ij}\left\|\frac{\mathbf{H}_{i,:}}{\sqrt{D_{ii}}} - \frac{\mathbf{H}_{j,:}}{\sqrt{D_{jj}}}\right\|_2^2. \tag{3}$$
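To make the claimed correspondence between Eq. (1) and Problem (2) explicit, one can verify the following standard computation (reconstructed here from the stated quantities rather than quoted from the paper): a single gradient step on Problem (2) with step size $1/2$ reads
$$
\begin{aligned}
\mathbf{H}^{(k+1)} &= \mathbf{H}^{(k)} - \frac{1}{2}\,\nabla_{\mathbf{H}}\Big(\alpha\|\mathbf{H}-\mathbf{X}\|_F^2 + (1-\alpha)\,\mathrm{Tr}\big(\mathbf{H}^{\top}\mathbf{L}_{\mathrm{sym}}\mathbf{H}\big)\Big)\Big|_{\mathbf{H}=\mathbf{H}^{(k)}} \\
&= \mathbf{H}^{(k)} - \alpha\big(\mathbf{H}^{(k)}-\mathbf{X}\big) - (1-\alpha)\,\mathbf{L}_{\mathrm{sym}}\mathbf{H}^{(k)} \\
&= (1-\alpha)\,\mathbf{A}_{\mathrm{sym}}\mathbf{H}^{(k)} + \alpha\mathbf{X},
\end{aligned}
$$
which is exactly the update in Eq. (1), since $\mathbf{L}_{\mathrm{sym}} = \mathbf{I} - \mathbf{A}_{\mathrm{sym}}$.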
For more technical discussions on the relationships between GNNs and iterative optimization algorithms for solving GSD problems, please refer to Ma et al. (2021); Zhang and Zhao (2022). Apart from using the lens of optimization to interpret existing GNN models, there is also a line of work (Liu et al., 2021b; Chen et al., 2021; Fu et al., 2022) on building new GNN architectures by designing novel optimization problems and the corresponding iterative algorithms (more discussions are provided in Appendix A).
2.2 GRAPH LEARNING WITH STRUCTURAL REGULARIZERS
Structural regularizers are commonly adopted to promote certain desirable properties when learning a graph (Kalofolias, 2016; Pu et al., 2021). In the following, we discuss several widely used graph structural regularizers which will be incorporated into the design of ASMP. We denote the learnable graph adjacency matrix by $\mathbf{S}$ satisfying $\mathbf{S} \in \mathcal{S}$, where
$$\mathcal{S} = \big\{\mathbf{S} \in \mathbb{R}^{N \times N} \,\big|\, 0 \leq S_{ij} \leq 1, \ \text{for } i, j = 1, \ldots, N\big\}$$
defines the class of adjacency matrices. Under the assumption that node features change smoothly between adjacent nodes (Ortega et al., 2018), the Laplacian smoothing regularization term is commonly considered in graph structure learning. Eq. (3) is the symmetric normalized Laplacian smoothing term, and a random walk normalized alternative can be similarly defined by replacing $\mathbf{L}_{\mathrm{sym}}$ in Eq. (3) with $\mathbf{L}_{\mathrm{rw}}$.
Real-world graphs are normally sparsely connected, which can be represented by sparse adjacency matrices. Moreover, it is also observed that the singular values of these adjacency matrices are commonly small (Zhou et al., 2013; Kumar et al., 2020). However, a noisy adjacency matrix (e.g., one perturbed by adversarial attacks) tends to be dense and to have singular values of larger magnitude (Jin et al., 2020). In view of this, graph structural regularizers for promoting sparsity and/or suppressing the singular values are widely adopted in the graph learning literature (Kalofolias, 2016; Egilmez et al., 2017; Dong et al., 2019). Specifically, the $\ell_1$-norm of the adjacency matrix, defined as $\|\mathbf{S}\|_1 = \sum_{i,j=1}^{N} |S_{ij}|$, is often used to promote sparsity. For penalizing the singular values, the $\ell_1$-norm and the $\ell_2$-norm on the singular value vector of the adjacency matrix $\mathbf{S}$ can help. Equivalently, they translate to the nuclear norm and the Frobenius norm of $\mathbf{S}$, given by $\|\mathbf{S}\|_* = \sum_{i=1}^{N} \sigma_i(\mathbf{S})$ and $\|\mathbf{S}\|_F = \sqrt{\sum_{i=1}^{N} \sigma_i^2(\mathbf{S})}$, respectively, where $\sigma_1(\mathbf{S}) \geq \cdots \geq \sigma_N(\mathbf{S})$ denote the ordered singular values of $\mathbf{S}$. These two regularizers both restrict the scale of the singular values, while the nuclear norm also promotes low-rankness. A recent study (Deng et al., 2022) points out that graph learning methods with low-rank promoting regularizers may lose a wide range of the spectrum of the clean graph, which corresponds to important structure in the spatial domain. Thus, the nuclear norm regularizer may impair the quality of the reconstructed graph and therefore limit the performance of GNNs. Besides, the nuclear norm is not amenable to back-propagation and incurs high computational complexity (Luo et al., 2021). Arguably, the Frobenius norm of $\mathbf{S}$ is a more suitable regularizer for graph structure learning in comparison with the nuclear norm.
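For completeness, the regularizers above can be evaluated directly with standard NumPy routines; the helper below is a short illustrative sketch (not part of the paper), and which of these terms ASMP actually uses is specified in Section 3.

```python
import numpy as np

def structural_regularizers(S: np.ndarray) -> dict:
    """Evaluate the graph structural regularizers discussed above for a candidate adjacency matrix S."""
    sparsity_l1 = np.abs(S).sum()                      # ||S||_1, promotes sparsity
    nuclear = np.linalg.norm(S, ord='nuc')             # ||S||_*, sum of singular values, promotes low rank
    frobenius_sq = np.linalg.norm(S, ord='fro') ** 2   # ||S||_F^2, shrinks singular values
    return {"l1": sparsity_l1, "nuclear": nuclear, "frobenius_sq": frobenius_sq}
```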
3 THE PROPOSED GRAPH NEURAL NETWORKS
In this section, we first motivate a design principle based on jointly learning the node features and the graph structure. Then, we develop an efficient optimization algorithm for solving the resulting optimization problem, which eventually leads to a novel message passing scheme with adaptive structure (ASMP). After that, we provide interpretations, convergence guarantees, and specifications of ASMP. Finally, integrating ASMP with deep neural networks yields a new family of GNNs with adaptive structure, named ASGNN.
3.1 A NOVEL DESIGN PRINCIPLE WITH ADAPTIVE GRAPH STRUCTURE
As discussed in Section 2.1, the message passing procedure in many popular GNNs can be viewed as performing graph signal denoising (or node feature learning) (Zhu et al., 2021; Ma et al., 2021; Pan et al., 2021; Zhang and Zhao, 2022) over a prefixed graph. Unfortunately, if some edges in the graph are task-irrelevant or even maliciously manipulated, the learned node features may not be appropriate for the downstream tasks. Motivated by this, we propose a new design principle for message passing, that is, to learn the node features and the graph structure simultaneously. It enables learning an adaptive graph structure from the features for the message passing procedure. Hence, such a message passing scheme can potentially improve robustness against a noisy input graph structure.
Specifically, we construct an optimization objective by augmenting the GSD objective in Eq. (2) (where we use a random walk normalized graph Laplacian smoothing term) with a structural fidelity term $\|\mathbf{S} - \mathbf{A}\|_F^2$, where $\mathbf{A}$ is the given initial graph adjacency matrix, and the structural regularizers $\|\mathbf{S}\|_1$ and $\|\mathbf{S}\|_F^2$. Then we obtain the following optimization problem:
$$\underset{\mathbf{H} \in \mathbb{R}^{N \times M},\ \mathbf{S} \in \mathcal{S}}{\text{minimize}} \quad p(\mathbf{H}, \mathbf{S}) = \underbrace{\|\mathbf{H} - \mathbf{X}\|_F^2 + \lambda\,\mathrm{Tr}\big(\mathbf{H}^{\top}\mathbf{L}_{\mathrm{rw}}\mathbf{H}\big)}_{\text{feature learning}} + \underbrace{\gamma\|\mathbf{S} - \mathbf{A}\|_F^2 + \mu_1\|\mathbf{S}\|_1 + \mu_2\|\mathbf{S}\|_F^2}_{\text{structure learning}}, \tag{4}$$
where $\mathbf{H}$ is the feature variable, $\mathbf{S}$ is the structure variable, and $\gamma$, $\lambda$, $\mu_1$, and $\mu_2$ are parameters balancing the different terms. To enable the interplay between feature learning and structure learning, the Laplacian smoothing term is computed with $\mathbf{S}$ rather than $\mathbf{A}$, i.e., $\mathbf{L}_{\mathrm{rw}} = \mathbf{I} - \mathbf{D}^{-1}\mathbf{S}$ with $\mathbf{D} = \mathrm{Diag}(\mathbf{S}\mathbf{1})$. When adversarial attacks exist, a perturbed adjacency matrix $\mathbf{A}$ will be generated. Since attacks are generally designed to be unnoticeable (Jin et al., 2021), the perturbed graph is expected to stay close to the clean graph, which motivates the structural fidelity term $\gamma\|\mathbf{S} - \mathbf{A}\|_F^2$.
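To fix ideas before the algorithm is derived, the sketch below evaluates the objective $p(\mathbf{H}, \mathbf{S})$ of Problem (4). It is a minimal sketch under assumed placeholder parameter values, with an element-wise clip standing in for the constraint $\mathbf{S} \in \mathcal{S}$; it is not the authors' implementation.

```python
import numpy as np

def objective_p(H, S, X, A, lam=1.0, gamma=1.0, mu1=0.1, mu2=0.1):
    """Evaluate p(H, S) from Eq. (4) for a candidate feature matrix H and structure matrix S.

    The parameter values are placeholders, not the authors' settings.
    Assumes every node keeps at least a self-loop so that S @ 1 > 0.
    """
    S = np.clip(S, 0.0, 1.0)                       # crude stand-in for the constraint S in the feasible set
    N = S.shape[0]
    deg = S @ np.ones(N)                           # D = Diag(S 1)
    L_rw = np.eye(N) - np.diag(1.0 / deg) @ S      # L_rw = I - D^{-1} S
    feature = np.linalg.norm(H - X, 'fro')**2 + lam * np.trace(H.T @ L_rw @ H)
    structure = (gamma * np.linalg.norm(S - A, 'fro')**2
                 + mu1 * np.abs(S).sum()
                 + mu2 * np.linalg.norm(S, 'fro')**2)
    return feature + structure
```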