FEDERATED GRAPH-BASED NETWORKS WITH SHARED
EMBEDDING
Tianyi Yu
School of Electrical and Information Engineering
Beijing University of Civil Engineering and Architecture
Beijing
201906020144@stu.bucea.edu.cn
Pei Lai
School of Information Science and Technology
Southwest Jiaotong University
Chengdu
peilai@my.swjtu.edu.cn
Fei Teng
School of Information Science and Technology
Southwest Jiaotong University
Chengdu
fteng@swjtu.edu.cn
ABSTRACT
Nowadays, user privacy is becoming an issue that system developers cannot bypass, especially
for web applications, where data can be easily transferred over the Internet. Thankfully,
federated learning offers an innovative method to train models across distributed devices while data
are kept in local storage. However, unlike general neural networks, graph-based networks, despite their
great success in classification tasks and advanced recommendation systems, owe their high
performance to the rich context provided by the graph structure, which is vulnerable when data
attributes are incomplete. The latter therefore becomes a realistic problem when implementing
federated learning for graph-based networks. Knowing that a data embedding is a representation in a
different space, we propose our Federated Graph-based Networks with Shared Embedding (Feras),
which uses shared embedding data to train the network and avoids the direct sharing of original data.
A solid theoretical proof of the convergence of Feras is given in this work. Experiments on different
datasets (PPI, Flickr, Reddit) are conducted to show the efficiency of Feras against centralized learning.
Finally, Feras enables the training of current graph-based models in the federated learning framework
to address privacy concerns.
Keywords: Graph Neural Networks · Federated Learning · Privacy-Preserving Computation · Shared Embedding
1 Introduction
Recently, the user privacy issue in web applications has attracted considerable attention. With the development of network
technology and the spread of IoT, a large volume of personal data is collected and analysed in daily life, and some of
it may be uploaded online illegally. In 2018, the European Union (EU) and the European Economic Area
(EEA) enacted the General Data Protection Regulation (GDPR) [1], whose main purpose is to protect data privacy and
to limit personal data transfer between processors.
The launch of GDPR is an epitome of public concern about privacy in the digital age. In fact, the discussion of privacy
protection techniques can be traced back decades. [2] explains that the main problem is not the
lack of available security mechanisms but privacy preservation on the computational side. Several technologies
have been brought forward: trusted execution environments [3], secure multiparty computation [4, 5, 6], and federated
learning [7, 8]. Among these three mechanisms, federated learning, which employs multiple devices to build a model
collaboratively while keeping all data in local storage, stands out because of its low computational
costs. What's more, federated learning also borrows tools such as differential privacy [9], secure multiparty computation
[4], and homomorphic encryption [10] to offer enhanced security.

arXiv:2210.01803v1 [cs.LG] 3 Oct 2022 (Preprint)
Graph Neural Networks (GNNs) and their variants show dominant performance on many ML tasks [11]. For
example, the Graph Convolutional Network (GCN) has been widely applied in many different domains such as semi-supervised
classification [12], multi-view networks [13] and link prediction [14]. However, the efficiency of
graph-based networks relies on the rich information among the elements contained in a graph. Once faced with a
perturbed topological structure or incomplete node attributes, the network either can no longer provide satisfactory learning
performance [15] or needs a complex node-attribute completion system [16]. This nature of graph-based networks
makes federated learning hard to implement, especially when the transmission of original user data, such as
node features and labels, is prohibited between processors.
Based on the observation that the embedding representation differs from the initial data space, we propose our
Federated Graph-based Networks with Shared Embedding (Feras) model in this paper. Instead of exchanging raw data
at the beginning, we facilitate embedding communication in the middle of the network. On the one hand, computations
are performed on edge hosts synchronously to speed up training without leaking raw data; on the other hand, node
information is propagated through the entire graph via embeddings, which is critical to retaining network performance.
The rest of the paper proceeds as follows: we begin by discussing the latest related research on distributed learning
and training methods, then introduce the four main roles in our application scenario and propose the algorithm structure.
The next section is concerned with the theoretical analysis, and experimental results are reported thereafter.
Some concluding remarks are given at the end.
2 Background
2.1 Federated learning
Federated learning was first proposed by Google [17] in 2016 for Android mobile phones. Its main idea is to train a
centralized model with decentralized data. Indeed, devices use local data to train models and upload parameters to a
server for aggregation. One of the most important features of federated learning is security and privacy [8]. Even
though sensitive data are not directly shared between devices, it is still crucial to protect the communication between
server and device. Encryption methods, represented by homomorphic encryption [18, 19, 20] and secure multiparty computation [21, 22],
provide a safe and reliable solution, though their computation cost is relatively high. Perturbation
methods, such as differential privacy [23, 24], use a designed noise mechanism to protect sensitive information;
while no extra computation cost is incurred, prediction accuracy may be at risk. Moreover, information can still be
leaked indirectly through intermediate results of the updating process; [25, 26] exploit such attacks and provide defense
mechanisms.
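The server-side aggregation step sketched above can be illustrated with a FedAvg-style weighted average; the function name and interface here are illustrative, not taken from any specific system:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate locally trained parameter vectors on the server.

    Each device's contribution is weighted by its number of local
    samples, so devices holding more data influence the model more.
    Raw data never leaves the devices; only parameters are uploaded.
    """
    total = sum(client_sizes)
    agg = np.zeros_like(np.asarray(client_weights[0], dtype=float))
    for w, n in zip(client_weights, client_sizes):
        agg += (n / total) * np.asarray(w, dtype=float)
    return agg
```

For example, `fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])` weights the second client three times as heavily, yielding `[2.5, 3.5]`.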
Most recently, a large body of work has been done to implement federated learning in graph-based networks. [27]
addresses the non-IID issue in graph data and confronts tasks with new label domains. [28] develops a multi-task federated
learning method without a server. [29] suggests FedGNN, a federated recommendation system based on GNNs. [30]
proposes FedSAGE, a sub-graph-level federated learning technique utilizing the graph mining model GraphSAGE
[31]. [32] introduces a federated spatio-temporal model to improve the forecasting capacity of GNNs. Nevertheless, to
the best of our knowledge, there is still no study concentrating on node embedding aggregation in a sampling-based
graph neural network framework.
2.2 Related work
GraphSAINT [33] adopts an innovative way to build mini-batches. Instead of sampling across the layers, GraphSAINT
samples sub-graphs first and then constructs a full GCN on each of them, which prevents the neighbor explosion problem. It also
establishes an algorithm to reduce sampling bias and variance. In the comparison experiments conducted on the datasets in
[33], GraphSAINT shows clear advantages in accuracy and convergence rate over other popular samplers
(GraphSAGE [31], FastGCN [34], ClusterGCN [35], etc.).
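A minimal node-sampler in the spirit of this sample-first, then-build-a-full-GCN approach can be sketched as follows; the real GraphSAINT sampler also computes normalization coefficients to de-bias the loss estimator, which is omitted here, and all names are illustrative:

```python
import random

def sample_induced_subgraph(adj, num_nodes, seed=None):
    """Sample nodes uniformly and return the induced sub-graph.

    adj: dict mapping each node id to a set of neighbour ids.
    The result keeps only edges whose two endpoints were both sampled,
    so a full GCN can then be constructed on this sub-graph alone,
    avoiding the neighbor explosion of layer-wise sampling.
    """
    rng = random.Random(seed)
    keep = set(rng.sample(sorted(adj), num_nodes))
    return {v: adj[v] & keep for v in keep}
```

Because every layer of the GCN then operates on the same fixed node set, the receptive field no longer grows with depth.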
A stochastic shared embedding (SSE) for graph-based networks is investigated in [36] to mitigate over-parameterization
problems. Embeddings are trainable vectors that are exchanged according to a transition probability at each backpropagation
step. Our proposed Feras is very different from SSE, since [36] does not consider privacy issues and runs all
computations on the same server.
To prove the convergence of parallelized stochastic gradient descent, [37] offers a novel idea by borrowing the good
convergence property of contraction mappings, which inspired the mathematical demonstration in our paper, even though
the proof in [37] is for traditional neural networks and involves no embedding-sharing mechanism.
3 Proposed Approach
3.1 An alternative of federated learning
In this paper, we attempt to fill the gap between privacy protection and traditional graph-based networks. Let us
begin with a simple user privacy scenario. A social network is built by integrating the client networks of different
companies; the global topology is thus shared among the companies. However, because of the sensitivity of
user information, personal data may not be shared directly between companies. For a client who buys the services of
several companies at the same time, its information is accessible to all seller companies. In such cases, a user can be
represented by a node whose features are visible only to its host company.
While traditional GCN training on separate host machines cannot prevent the network from losing accuracy
due to the lack of information, our Feras model still allows companies to cooperate on graph training via an Aggregation
Server (AS) without sharing raw data. Turning now to a general privacy problem, four main roles can be identified
as below, and their relation is depicted in Figure 1:
Client: A client is a basic element of the network; it can be interpreted as a held node. Besides the information (attributes)
attached to each client, which is the input of the network, it also records which host it belongs to. Two types of nodes
are distinguished: private nodes (visible to only one host) and public nodes (visible to two or more hosts).
Host: Each host n only has access to a certain number of nodes H(n) and the graph topology. For a distributed sub-graph
Gn(Vn, En), host n is capable of launching the training program with its available information. Moreover, all vectors
associated with unseen nodes in Vn are initialized to zero. Hosts need to communicate with the AS to gain more knowledge
of the network.
Sampler: The sampler is a work distributor. It divides the entire network into sub-graphs and distributes
them to the hosts. The sampler should eliminate sampling bias and improve training efficiency as much as possible.
Aggregation Server (AS): The AS works as a bond in our model. It collects the latest information from each host and
gives it back after processing. The latest information can be anything except raw data, such as embeddings, gradients,
or weight matrices. To reduce network latency, weight matrices can be shared only after every q iterations (q ∈ N).
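The push/pull cycle between hosts and the AS can be sketched as follows; the class, its "latest non-blank write wins" policy, and the blank-vector convention for unseen nodes are illustrative assumptions for this sketch, not the authors' implementation:

```python
import numpy as np

class AggregationServer:
    """Collects the latest node embeddings from hosts and serves them back."""

    def __init__(self, dim):
        self.dim = dim
        self.table = {}  # node id -> latest shared embedding

    def push(self, host_embeddings):
        """Hosts push one vector per node; blank vectors (all zeros,
        pushed for unseen nodes) are ignored so they never overwrite
        a real embedding contributed by another host."""
        for node, emb in host_embeddings.items():
            if np.any(emb):
                self.table[node] = np.asarray(emb, dtype=float)

    def pull(self, node_ids):
        """Return the latest overall embedding for each requested node,
        or a blank vector if no host has contributed one yet."""
        blank = np.zeros(self.dim)
        return {v: self.table.get(v, blank) for v in node_ids}
```

Note that only embeddings cross the network boundary here: raw node features and labels stay on their host.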
[Figure 1: diagram of the Client Groups, the Sampler, Hosts 1-3, and the Aggregation Server (AS).]
Figure 1: Relation between Client, Sampler, Host and AS. Hosts can only see nodes with the same color as their own.
The training parameter is denoted by w. For the embedding of an unseen node, hosts push a blank vector to the AS and pull the
latest overall embedding from it. The propagation of embeddings is shown with a dotted line in the AS.
3.2 Learning algorithm
In this paper, we focus on node classification using convolutional layers. The latest GraphSAINT [33] is adopted as the
sampler, and the AS is mainly responsible for sharing embeddings and weight matrices.
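For concreteness, one convolutional layer of the kind trained on each sampled sub-graph can be sketched as mean-neighbour aggregation followed by a learned projection; this is a simplified stand-in, not the exact layer used in the paper:

```python
import numpy as np

def gcn_layer(adj, h, w):
    """One simplified graph-convolution step on a sampled sub-graph.

    adj: (n, n) adjacency matrix of the sub-graph.
    h:   (n, d_in) node embeddings entering the layer (embeddings of
         unseen nodes are the zero vectors pulled from the AS).
    w:   (d_in, d_out) trainable weight matrix, periodically shared.
    """
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    agg = (adj @ h) / deg            # mean over neighbours
    return np.maximum(agg @ w, 0.0)  # ReLU activation
```

Stacking a few such layers and ending with a softmax classifier over node labels gives the node-classification pipeline each host runs locally.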