Privacy-Preserved Neural Graph Similarity Learning
Yupeng Hou1,3, Wayne Xin Zhao1,3,4 B, Yaliang Li2, and Ji-Rong Wen1,3
1Gaoling School of Artificial Intelligence, Renmin University of China
2Alibaba Group
3Beijing Key Laboratory of Big Data Management and Analysis Methods
4Engineering Research Center of Next-Generation Intelligent Search and Recommendation, Ministry of Education
{houyupeng,jrwen}@ruc.edu.cn, batmanfly@gmail.com, yaliang.li@alibaba-inc.com
Abstract—To develop effective and efficient graph similarity
learning (GSL) models, a series of data-driven neural algorithms
have been proposed in recent years. Although GSL models
are frequently deployed in privacy-sensitive scenarios, the user
privacy protection of neural GSL models has not drawn much
attention. To comprehensively understand the privacy protection
issues, we first introduce the concept of attackable representation
to systematically characterize the privacy attacks that each
model can face. Inspired by the qualitative results, we propose
a novel Privacy-Preserving neural Graph Matching network
model, named PPGM, for graph similarity learning. To prevent
reconstruction attacks, the proposed model does not communicate
node-level representations between devices. Instead, we
learn multi-perspective graph representations based on learnable
context vectors. To alleviate attacks on graph properties,
obfuscated features that contain information from both graphs
are communicated. In this way, the private properties of each
graph are difficult to infer. By employing node-graph matching
techniques when computing the obfuscated features, PPGM
also remains effective at similarity measurement. To quantitatively
evaluate the privacy-preserving ability of neural GSL models, we
further propose an evaluation protocol via training supervised
black-box attack models. Extensive experiments on widely-used
benchmarks show the effectiveness and strong privacy-protection
ability of the proposed model PPGM. The code is available at:
https://github.com/RUCAIBox/PPGM.
Index Terms—graph similarity learning, privacy-preserving,
graph neural networks
I. INTRODUCTION
Graph similarity learning (GSL) is one of the most fun-
damental tasks in the literature of graph machine learning,
aiming to quantify the similarity of two given graphs [1].
Various methods have been proposed to improve the performance
of graph similarity learning, from early
algorithms based on graph edit distance (GED) [2] or maximum
common subgraph (MCS) [3] metrics to later graph
kernels [4]–[6]. However, these methods typically have
exponential time complexity, which largely limits their application
to real-world tasks. To further improve the performance and
efficiency of GSL models, a series of data-driven approximate
approaches based on graph neural networks (GNNs) have been
proposed recently [7]–[9]. These methods greatly broaden the
application scope of GSL to graphs of more realistic size.
B Corresponding author.

Fig. 1. Performance comparison of different neural graph similarity learning
models on privacy protection and graph-graph classification tasks on the FFmpeg
[50, 200] dataset (x-axis: graph property inference attack AUC (%); y-axis:
graph-graph classification AUC (%); compared models: SimGNN, GMN, GraphSim,
SGNN, MGMN, H2MN, and PPGM (ours)).

A basic setting assumed in most graph similarity learning methods is that
the entire graph data, i.e., nodes, edges, and the corresponding features,
is available from user devices. However, this assumption is usually
unrealistic in real scenarios. We observe that GSL models are frequently used in
privacy-sensitive scenarios, such as binary function similarity
search [8], healthcare data management [4], and user portrait
matching in recommender systems [10]. For example, when
using code-checking systems based on GSL algorithms, users
may upload their compiled programs (as binary function
graphs [8]) from their devices. These uploaded graphs contain
private information and should be carefully protected, as they are
usually at high risk of privacy attacks. External attackers may
intercept the uploaded graphs during communication (e.g.,
while uploading) or disguise themselves as a fake data center to
steal the privacy-sensitive graph data. As a result, it is
necessary to take privacy into account when developing
graph similarity learning methods.
However, it is challenging to build graph similarity learning
models that are both privacy-preserving and effective. Although
several privacy protection techniques have been proposed
for graph neural networks [11], [12], as well as general
multi-party computation approaches [13], these techniques
usually cannot be directly applied to graph similarity
learning tasks. For example, graph publishing [14], [15] and
local differential privacy [11], [16] are widely studied; they
directly modify the raw graph data or learned representations
and introduce randomness. However, such modifications make
it even harder to measure precise graph similarity scores.
The introduced noise is almost certain to hurt similarity
measurement performance. Besides, secure multiparty
computation [17], [18] and homomorphic encryption [19]
may potentially benefit GSL tasks, as the input graphs
can naturally be seen as multiple parties involved in
the computation. However, existing solutions that realize these
concepts [13] typically require extreme amounts of computation
and support only a few simple operators. Due to these restrictions,
it is difficult to handle complex non-linear neural
networks with the above-mentioned algorithms. As a result,
we need to design dedicated privacy-preserving graph similarity
learning models, comprehensively weighing the trade-offs among
effectiveness, efficiency, and privacy preservation.
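To make this concern concrete, the following minimal sketch (our illustration, not a method from this paper; the Laplace mechanism, the sensitivity, and the epsilon value are assumed) perturbs node embeddings in the style of local differential privacy and compares a naive mean-pooled cosine similarity before and after the perturbation:

```python
# Illustrative only: generic local-DP-style Laplace perturbation of node embeddings
# and its effect on a naive similarity score; all parameters are assumptions.
import numpy as np

def perturb(h: np.ndarray, epsilon: float, sensitivity: float = 1.0,
            seed: int = 0) -> np.ndarray:
    """Add coordinate-wise Laplace noise with scale sensitivity / epsilon."""
    rng = np.random.default_rng(seed)
    return h + rng.laplace(scale=sensitivity / epsilon, size=h.shape)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
h1, h2 = rng.normal(size=(30, 64)), rng.normal(size=(40, 64))  # two graphs' node embeddings
clean = cosine(h1.mean(axis=0), h2.mean(axis=0))
noisy = cosine(perturb(h1, epsilon=1.0).mean(axis=0),
               perturb(h2, epsilon=1.0, seed=2).mean(axis=0))
print(f"similarity clean={clean:.3f}  noisy={noisy:.3f}")  # the noise shifts the score
```

The stronger the privacy guarantee (i.e., the smaller epsilon), the larger the noise scale and the further the perturbed score drifts from the clean one, which is exactly the tension discussed above.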
To tackle the above challenges, the basic idea of our
approach is to deploy neural networks on user devices in
a distributed computing environment [20], [21]. We would
like to keep most graph computations on the user side. In
this way, as long as each device can be viewed as a trustworthy
environment, only the representations communicated
between devices are at risk of privacy attacks. Although several
neural GSL models can be deployed in a distributed computing
environment, we argue that their privacy-preserving abilities
still vary greatly. The major reason is that the representations
they communicate carry significantly different
amounts of private user information: attackers can leverage the
representations sent off the devices to reconstruct the graph structure
or to infer privacy-sensitive properties. As a result, it is necessary
to analyze the privacy leakage level of each model, so that we
can accordingly design highly privacy-preserved and effective
GSL models whose communicated representations are hard
to attack.
To this end, we first introduce the concept of attackable
representations to qualitatively analyze the types of potential
privacy attacks when a neural GSL model is deployed in dis-
tributed computing environments. We then propose a Privacy-
Preserved neural Graph Matching network for graph similarity
learning, named PPGM. The key point is to learn obfuscated
graph representations that are used for communication. First,
as no node representations are involved in communication, the
proposed method can naturally prevent reconstruction attacks.
Second, each obfuscated feature fuses representations
from both input graphs. In this way, the properties of
any single graph are difficult to infer from the obfuscated
features, alleviating property inference attacks. In detail,
our approach takes a pair of graphs as input and learns preliminary
node representations inside each device. Based on shared
context code vectors, multi-perspective graph representations
are learned and communicated as messages. Then graph-node
matching is performed on each device to generate obfuscated
features. Attackers thus have an equal chance of intercepting
a graph representation (i.e., a message) or an obfuscated feature
among the representations communicated by PPGM, which
achieves the goal of privacy protection. Meanwhile,
the learning of obfuscated features introduces comprehensive
node-graph representation interactions, which also make the
proposed model effective on graph similarity learning tasks.
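To make this communication pattern concrete, the following is a rough sketch written for illustration, not the released PPGM implementation: a one-round mean-aggregation encoder stands in for the paper's GNN, attention pooling against K learnable context vectors produces the multi-perspective graph messages, and a dot-product node-graph matching step produces the obfuscated features; every module name, shape, and the cosine readout are our assumptions.

```python
# Minimal sketch of a PPGM-style forward pass (illustrative, not the authors' code).
import torch
import torch.nn as nn

class DeviceEncoder(nn.Module):
    def __init__(self, in_dim: int, hid_dim: int, num_contexts: int = 4):
        super().__init__()
        self.node_mlp = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                      nn.Linear(hid_dim, hid_dim))
        # K learnable context vectors, assumed to be shared by both devices.
        self.contexts = nn.Parameter(torch.randn(num_contexts, hid_dim))

    def node_reprs(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Preliminary node representations; these never leave the device.
        h = self.node_mlp(x)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return h + adj @ h / deg                              # [N, hid_dim]

    def messages(self, h: torch.Tensor) -> torch.Tensor:
        # Multi-perspective graph representations: attention pooling of the
        # local node representations against each shared context vector.
        att = torch.softmax(h @ self.contexts.t(), dim=0)     # [N, K]
        return att.t() @ h                                    # [K, hid_dim], sent off-device

    def obfuscated(self, h: torch.Tensor, partner_msg: torch.Tensor) -> torch.Tensor:
        # Node-graph matching: local nodes attend over the partner's K graph-level
        # messages; the pooled result mixes information from both graphs.
        att = torch.softmax(h @ partner_msg.t(), dim=1)       # [N, K]
        matched = att @ partner_msg                           # [N, hid_dim]
        return (h * matched).mean(dim=0)                      # [hid_dim], sent off-device

def similarity(enc: DeviceEncoder, x1, adj1, x2, adj2) -> torch.Tensor:
    # Only graph-level messages and obfuscated features are ever communicated.
    h1, h2 = enc.node_reprs(x1, adj1), enc.node_reprs(x2, adj2)
    m1, m2 = enc.messages(h1), enc.messages(h2)                   # communicated
    o1, o2 = enc.obfuscated(h1, m2), enc.obfuscated(h2, m1)       # communicated
    return torch.cosine_similarity(o1, o2, dim=0)                 # hypothetical readout

if __name__ == "__main__":
    enc = DeviceEncoder(in_dim=8, hid_dim=16)
    x1, adj1 = torch.randn(5, 8), torch.eye(5)
    x2, adj2 = torch.randn(7, 8), torch.eye(7)
    print(similarity(enc, x1, adj1, x2, adj2).item())
```

Under this pattern the node-level tensors h1 and h2 never leave their devices, which matches the qualitative argument above; the actual PPGM components are specified in Section III and differ in detail.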
Fig. 2. Illustration of the privacy attacks on attackable representations while
deploying neural GSL models in a distributed computing environment (the diagram
shows a graph with user privacy and the neural GSL model inside a trustworthy
user device, with a data center, other devices, and an attacker in the
untrustworthy environment intercepting attackable node/graph representations).
In this work, we hypothesize that only the user devices are trustworthy environments,
while any representations sent off the devices (i.e., into untrustworthy
environments) may be intercepted by attackers.
To evaluate the proposed model PPGM, we conduct exten-
sive experiments on widely-used benchmark datasets. In ad-
dition, we propose a quantitative way to evaluate the privacy-
preserving ability of neural GSL models: shadow datasets
extracted from the original benchmarks, together with well-trained GSL
models, are leveraged to train supervised black-box attack
models. Experimental results demonstrate that the proposed
approach is privacy-preserving as well as effective. In sum-
mary, the main contributions are highlighted as follows:
• To the best of our knowledge, we are the first to emphasize
the privacy-preserving concerns for neural graph
similarity learning models. We introduce the concept
of attackable representations, which is a useful tool to
systematically analyze the privacy attacks that neural
GSL models may suffer from. (Section II-C)
• We propose a privacy-preserved neural graph similarity
learning model, PPGM. With obfuscated features communicated
between user devices, our model can be effective
in similarity measuring while preventing reconstruction
attacks and alleviating graph property inference attacks.
(Section III)
• We propose a protocol to quantitatively evaluate the
privacy-preserving ability of neural GSL models via
training supervised black-box attack models on shadow
datasets; a minimal illustrative sketch follows this list.
(Section IV-A)
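As a concrete illustration of the protocol mentioned in the last item, the sketch below trains a hypothetical attack model: a logistic-regression attacker (our stand-in; the paper's actual attack architectures may differ) is fit on representations assumed to be intercepted from a shadow dataset, labeled with a private binary graph property, and its AUC quantifies the leakage.

```python
# Illustrative supervised black-box property inference attack; the intercepted
# representations and private labels are random placeholders here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
intercepted_reprs = rng.normal(size=(1000, 64))   # stand-in for real intercepted features
private_labels = rng.integers(0, 2, size=1000)    # stand-in private graph property

X_tr, X_te, y_tr, y_te = train_test_split(
    intercepted_reprs, private_labels, test_size=0.3, random_state=0)

attacker = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, attacker.predict_proba(X_te)[:, 1])
print(f"property inference attack AUC: {auc:.3f}")  # AUC near 0.5 indicates little leakage
```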
II. TASK
In this section, we first briefly introduce the problem formu-
lation and notation for graph similarity learning tasks. Then,
we introduce the concept of attackable representations, which
is useful to qualitatively analyze the potential privacy attacks
for neural GSL models. Finally, we summarize three kinds of
privacy attacks that neural GSL models may suffer from when
deployed in a distributed computing environment.
A. Graph Similarity Learning Tasks
Given a pair of graphs ⟨G1, G2⟩ and the corresponding
label y, the task of graph similarity learning is to predict y.
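Stated generically (our paraphrase under common assumptions, not necessarily the exact formulation used later in the paper), y may be a continuous similarity score such as a normalized graph edit distance or a binary graph-graph classification label, and a parameterized model is trained to approximate it:

\hat{y} = f_\theta(G_1, G_2), \qquad \min_\theta \; \mathbb{E}_{\langle G_1, G_2, y\rangle \in \mathcal{D}} \big[\, \ell\big(f_\theta(G_1, G_2),\, y\big) \big],

where \mathcal{D} is the training set of labeled graph pairs and \ell is, e.g., a mean squared error for similarity-score regression or a cross-entropy for graph-graph classification.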