
costs. What’s more, federated learning also borrows tools such as differential privacy [9], secure multiparty computation [4], and homomorphic encryption [10] to offer enhanced security.
Graph Neural Networks (GNN) and their variants show dominant performance on many ML tasks [11]. For example, the Graph Convolutional Network (GCN) has been widely applied in many different domains such as semi-supervised classification [12], multi-view networks [13] and link prediction [14]. However, the efficiency of graph-based networks relies on the rich information among the elements contained in a graph. Once faced with a perturbed topological structure or incomplete node attributes, the network either can no longer deliver satisfactory learning performance [15] or needs a complex node attribute completion system [16]. This nature of graph-based networks makes federated learning hard to implement, especially when the transmission of original user data, such as features and labels of nodes, between processors is prohibited.
Based on the observation that the embedding representation differs from the initial data space, we propose our Federated Graph-based Networks with Shared Embedding (Feras) model in this paper. Instead of exchanging raw data at the beginning, we facilitate embedding communication in the middle of the network. On the one hand, computations are performed on edge hosts synchronously to speed up training without leaking raw data; on the other hand, node information is propagated through the entire graph via embeddings, which is critical to retaining network performance.
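To make the data flow concrete, the following minimal sketch (assuming a simple averaging aggregator and hypothetical names such as `EdgeHost` and `exchange_embeddings`, which are not the paper's actual API) shows how intermediate embeddings, rather than raw features or labels, could be the only quantities leaving an edge host.

```python
import numpy as np

# Hypothetical sketch of one Feras-style round. Each edge host runs the lower
# layers on its private data and only ships the resulting node embeddings;
# raw features and labels never leave the host.

class EdgeHost:
    def __init__(self, features, weight):
        self.features = features      # private node features, kept local
        self.weight = weight          # local copy of lower-layer parameters

    def local_forward(self):
        # Lower layer computed locally; a linear map + ReLU stands in for the
        # message-passing step on the host's sub-graph.
        return np.maximum(self.features @ self.weight, 0.0)

def exchange_embeddings(per_host_embeddings):
    # Aggregation of the shared nodes' embeddings (here a simple average);
    # only these embeddings cross host boundaries.
    return np.mean(np.stack(per_host_embeddings, axis=0), axis=0)

rng = np.random.default_rng(0)
hosts = [EdgeHost(rng.normal(size=(5, 8)), rng.normal(size=(8, 4))) for _ in range(2)]
shared = exchange_embeddings([h.local_forward() for h in hosts])
print(shared.shape)   # (5, 4): aggregated embeddings fed into the upper layers
```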
The rest of the paper proceeds as follows: we begin by discussing the latest related research on distributed learning and training methods, then introduce four main rules for the application scenarios and propose the algorithm structure. The next section is concerned with the theoretical analysis, and experimental results are reported afterwards. Some concluding remarks are given at the end.
2 Background
2.1 Federated learning
Federated learning was first proposed by Google [17] in 2016 for Android mobile phones. Its main idea is to train a centralized model with decentralized data. Indeed, devices use local data to train models and upload parameters to a server for aggregation. One of the most important features of federated learning is security and privacy [8]. Even though sensitive data are not directly shared between devices, it is still crucial to protect the communication between server and devices. Encryption methods, represented by homomorphic encryption [18, 19, 20] and secure multiparty computation [21, 22], provide a very safe and reliable solution, but their computation cost is relatively high. Perturbation methods, such as differential privacy [23, 24], use a designed noise mechanism to protect sensitive information; while no extra computation cost is demanded, they may put prediction accuracy at risk. However, information can still be leaked indirectly through intermediate results of the updating process; [25, 26] exploit such attacks and provide defense mechanisms.
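As a concrete illustration of the local-training/aggregation loop and of the perturbation idea discussed above, here is a minimal federated-averaging sketch; the single gradient step, the noise scale `sigma` and all function names are simplifying assumptions rather than any specific system from the cited works (a real differential-privacy mechanism would also clip updates and calibrate the noise).

```python
import numpy as np

# Minimal sketch of federated averaging with an optional perturbation step.
# All names (train_locally, perturb, sigma, ...) are illustrative assumptions.

def train_locally(weights, data, lr=0.1):
    # Stand-in for local SGD on one device: a single least-squares gradient step.
    X, y = data
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def perturb(update, sigma=0.01, rng=np.random.default_rng(0)):
    # Perturbation-style protection: add noise to the update before uploading.
    return update + rng.normal(scale=sigma, size=update.shape)

def federated_round(global_weights, client_datasets):
    # Each device trains on its private data; the server only sees (noisy) weights.
    uploads = [perturb(train_locally(global_weights.copy(), d)) for d in client_datasets]
    return np.mean(uploads, axis=0)   # server-side aggregation

rng = np.random.default_rng(1)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
w = np.zeros(3)
for _ in range(5):
    w = federated_round(w, clients)
print(w)
```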
Most recently, a large body of work has been devoted to implementing federated learning in graph-based networks. [27] addresses the non-iid issue in graph data and confronts tasks with new label domains. [28] develops a multi-task federated learning method without a server. [29] suggests FedGNN, a federated recommendation system based on GNN. [30] proposes FedSAGE, a sub-graph level federated learning technique utilizing the graph mining model GraphSAGE [31]. [32] introduces a federated spatio-temporal model to improve the forecasting capacity of GNN. Nevertheless, to the best of our knowledge, there is still no study concentrating on node embedding aggregation in a sampling-based graph neural network framework.
2.2 Related work
GraphSAINT [33] adopts an innovative way to build mini-batches. Instead of sampling across the layers, GraphSAINT first samples sub-graphs and then constructs a full GCN on each of them, which prevents the neighbor explosion problem. It also establishes an algorithm to reduce sampling bias and variance. In the comparison experiments conducted on the datasets in [33], GraphSAINT shows clear advantages in accuracy and convergence rate over other popular samplers (GraphSAGE [31], FastGCN [34], ClusterGCN [35], etc.).
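The sketch below conveys the subgraph-first mini-batch idea (as opposed to layer-wise sampling); the uniform node sampler and the `adjacency` dictionary layout are simplifying assumptions, whereas GraphSAINT itself offers several samplers plus a normalization scheme to debias the resulting estimates.

```python
import numpy as np

# Illustrative subgraph-first mini-batching (in the spirit of GraphSAINT):
# sample a node-induced sub-graph once, then run all GCN layers on it,
# so neighborhoods never expand layer by layer.

def sample_subgraph(adjacency, num_nodes, rng):
    # adjacency: dict {node: set(neighbors)} for the full graph (assumed layout)
    nodes = rng.choice(list(adjacency), size=num_nodes, replace=False)
    keep = set(nodes.tolist())
    # Induced edges: keep only edges whose two endpoints are both sampled.
    edges = [(u, v) for u in keep for v in adjacency[u] if v in keep]
    return keep, edges

rng = np.random.default_rng(0)
full_graph = {i: {(i + 1) % 10, (i - 1) % 10} for i in range(10)}  # toy ring graph
nodes, edges = sample_subgraph(full_graph, num_nodes=4, rng=rng)
print(nodes, edges)  # a full (small) GCN is then trained on this induced sub-graph
```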
A stochastic shared embedding (SSE) for graph-based networks is investigated in [36] to mitigate over-parameterization problems. Embeddings are trainable vectors that are exchanged according to a transition probability at each backpropagation step. Our proposed Feras is very different from SSE since [36] does not consider privacy issues and runs all computations on the same server.
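For intuition, a toy sketch of such a stochastic swap is given below; the transition matrix `T` and the `stochastic_swap` function are illustrative assumptions and do not reproduce the exact SSE procedure of [36].

```python
import numpy as np

# Toy sketch of a stochastic embedding swap: before a gradient step, each
# node's embedding may be replaced by another node's according to a
# transition distribution (illustrative only).

def stochastic_swap(embeddings, transition, rng):
    # transition[i] is a probability distribution over replacement nodes for node i.
    n = embeddings.shape[0]
    picks = [rng.choice(n, p=transition[i]) for i in range(n)]
    return embeddings[picks]

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 3))                 # 4 nodes, 3-dim embeddings
T = np.full((4, 4), 0.1) + 0.6 * np.eye(4)    # mostly keep one's own embedding
print(stochastic_swap(emb, T, rng).shape)     # (4, 3)
```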
To prove the convergence of parallelized stochastic gradient descent, [37] offers a novel idea by exploiting the good convergence properties of contraction mappings, which inspired the mathematical demonstration in our paper, even though the proof in [37] is given for traditional neural networks and considers no embedding sharing mechanism.