Aergia: Leveraging Heterogeneity in Federated
Learning Systems
Bart Cox
b.a.cox@tudelft.nl
Delft University of Technology
Delft, Netherlands
Lydia Y. Chen
lydiaychen@ieee.org
Delft University of Technology
Delft, Netherlands
Jérémie Decouchant
j.decouchant@tudelft.nl
Delft University of Technology
Delft, Netherlands
Abstract
Federated Learning (FL) is a popular deep learning approach that avoids centralizing large amounts of data and instead relies on clients that update a global model using their local datasets. Classical FL algorithms use a central federator that, for each training round, waits for all clients to send their model updates before aggregating them. In practical deployments, clients might have different computing power and network capabilities, which might lead slow clients to become performance bottlenecks. Previous works have suggested using a deadline for each learning round so that the federator ignores the late updates of slow clients, or so that clients send partially trained models before the deadline. To speed up the training process, we instead propose Aergia, a novel approach where slow clients (i) freeze the part of their model that is the most computationally intensive to train; (ii) train the unfrozen part of their model; and (iii) offload the training of the frozen part of their model to a faster client that trains it using its own dataset. The offloading decisions are orchestrated by the federator based on the training speed that clients report and on the similarities between their datasets, which are privately evaluated thanks to a trusted execution environment. We show through extensive experiments that Aergia maintains high accuracy and significantly reduces the training time under heterogeneous settings, by up to 27% and 53% compared to FedAvg and TiFL, respectively.
CCS Concepts: • Computing methodologies → Distributed artificial intelligence; • Computer systems organization → Cloud computing.
Keywords: Federated learning, Task Offloading, Stragglers
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Middleware '22, November 7–11, 2022, Quebec, QC, Canada
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9340-9/22/11...$15.00
https://doi.org/10.1145/3528535.3565238
ACM Reference Format:
Bart Cox, Lydia Y. Chen, and Jérémie Decouchant. 2022. Aergia: Leveraging Heterogeneity in Federated Learning Systems. In 23rd ACM/IFIP International Middleware Conference (Middleware '22), November 7–11, 2022, Quebec, QC, Canada. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3528535.3565238
1 Introduction
Federated Learning (FL) is a decentralized and inherently privacy-preserving learning paradigm where clients collectively train a machine learning model [3, 22]. During a learning round, a federator selects a subset of the clients, which return an update of the global model computed using their local datasets. Upon receiving the client updates, the federator aggregates them into a global model update, which is then shared with all clients. Most existing aggregation algorithms, including FedAvg [22] and FedProx [21], are synchronous and require the federator to collect all updates from the selected clients before moving to the next training round.
In a practical FL system, clients might have heterogeneous computational resources and possess data that differ both in quantity and in class distribution. It has been shown that both resource and data heterogeneity negatively impact the performance of an FL system [12, 14, 21, 40]. First, relying on a mix of weak and strong clients instead of homogeneous clients to train a model can significantly prolong the training time [6]. Second, a classification model trained with federated learning is less accurate when the client datasets are not independently and identically distributed (non-IID) [7].
To mitigate the impact of weak clients, also called stragglers, state-of-the-art methods attempt to equalize the learning speed among the clients by (i) partitioning them based on offline profiling [6], or by (ii) dropping the updates of stragglers during the training rounds [20, 24]. The former approach may fall short in capturing transient heterogeneity caused by applications possibly collocated on the clients, whereas the latter might incur a severe accuracy degradation. Moreover, the impact of stragglers is further aggravated in the presence of non-IID data among clients. Indeed, stragglers might possess a unique dataset that is critical to the overall model accuracy. In addition, due to the privacy-preserving nature of FL, it is not feasible for the federator to infer the data distribution based only on the clients' model updates [21, 33]. To limit the risk of model
divergence, prior studies aggregate the non-IID client data by adding a regularization term, like in FedProx [21], or by estimating the clients' contributions, like in FedNova [33]. However, these works implicitly assume that the client nodes are homogeneous.
In this paper, we aim to accelerate the FL training of convolutional neural networks (CNNs) in the presence of stragglers and non-IID data. A CNN is composed of convolutional layers and fully connected layers [18], which respectively learn the representation of the local data and map the extracted representation into classes. The local training of a CNN entails forward and backward passes on both types of layers.
To retain the representation of the unique datasets of stragglers, we advocate freezing their convolutional layers and offloading the computation and updating of these layers to strong clients. We propose Aergia¹, a federated learning algorithm that monitors the local training of selected clients and offloads part of the computing task of stragglers to strong clients that have spare, idle capacity. Aergia relies on a client matching algorithm that associates a straggler with a strong node based on an estimated performance gain and on the similarity between their datasets, since blindly offloading local models to nodes that have drastically different data distributions leads to weight divergence [7]. To ensure privacy, data similarities are securely evaluated using the clients' local data distributions (i.e., the number of labels per class) in an Intel SGX enclave [8], which is hosted by the federator.
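To make the matching step concrete, the sketch below pairs each straggler with the available strong client whose reported class distribution (label counts per class) is most similar. It is a minimal illustration under assumptions of our choosing: it uses cosine similarity and a greedy assignment, omits the performance-gain estimate, and the function names are hypothetical rather than Aergia's actual interface (described in §4).

```python
import numpy as np

def cosine_similarity(p, q):
    # Similarity between two label-count vectors (one entry per class).
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-12))

def match_stragglers(stragglers, strong_clients, label_counts):
    """Greedy matching sketch: each straggler is paired with the available
    strong client whose class distribution is most similar to its own."""
    matches, available = {}, set(strong_clients)
    for s in stragglers:
        if not available:
            break
        best = max(available,
                   key=lambda c: cosine_similarity(label_counts[s], label_counts[c]))
        matches[s] = best
        available.remove(best)
    return matches

# Hypothetical example with 3 classes.
label_counts = {"straggler_a": [50, 10, 0],
                "strong_1": [40, 20, 5],
                "strong_2": [0, 5, 60]}
print(match_stragglers(["straggler_a"], ["strong_1", "strong_2"], label_counts))
# {'straggler_a': 'strong_1'}
```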
We implement Aergia in PyTorch as a middleware running on top of Kubernetes. We evaluate Aergia on three datasets, namely MNIST, FMNIST, and Cifar-10, with different network architectures, against four previous heterogeneity-aware or non-IID-aware aggregation solutions [6, 21, 22, 33]. Our FL system consists of a mix of 24 weak, medium, and strong nodes that use different numbers of CPU cores. Our evaluation results show that Aergia achieves the highest accuracy within the lowest training time.
In a nutshell, this paper makes the following contributions:
• We explain how a straggler can offload the training of its model to a strong client.
• We present an algorithm that matches clients based on their performance profiles and the similarity of their datasets.
• We design Aergia², a federated learning middleware for highly heterogeneous clients and non-IID data that leverages model training offloading and online client matching. Aergia relies on a trusted execution environment (an Intel SGX enclave) so that the federator can evaluate the similarity of client datasets without getting access to their private class distributions.
• We evaluate Aergia on an FL cluster built on top of Kubernetes. Our evaluation results on three datasets and several networks show that Aergia effectively leverages the spare computational capacity of strong clients to achieve high accuracy in low training time.

¹In Greek mythology, Aergia is the personification of sloth, idleness, indolence, and laziness.
²https://github.com/bacox/fltk
The remainder of this paper is organized as follows. §2 provides background on federated learning, on data and resource heterogeneity, and on their impact on training time and accuracy. §3 provides an overview of Aergia, while §4 describes its algorithms and implementation details. §5 presents our performance evaluation. §6 reviews the related work. Finally, §7 concludes this paper.
2 Background and Motivation
In this section, we rst recall necessary background on deep
learning models, which are core components of the federated
learning paradigm, the practical heterogeneity challenges
that federated learning faces and their impact on training
time and accuracy.
2.1 Primer on Convolutional Neural Networks
State-of-the-art image classifiers follow the structure of convolutional neural networks (CNNs) [18], which consist of convolutional and fully connected layers. The former map the image features into a compact representation, hence they are also referred to as feature layers. The latter are dense fully connected layers that classify the representation into one of the classes. The other difference between these two types of layers is their resource demands [9]: convolutional layers are computationally intensive, while fully connected layers are memory intensive. The training time of a client's local model can be divided into two parts: the forward pass, which computes the classification outcome of images, and the backward pass, which back-propagates the gradients through the model weights. Consequently, the training time of a typical CNN classifier can be further categorized into four parts: (i) ff: the forward pass on feature layers, (ii) fc: the forward pass on fully connected layers, (iii) bc: the backward pass on fully connected layers, and (iv) bf: the backward pass on feature layers.
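To illustrate this split in PyTorch (the framework we use for our implementation), the sketch below freezes the feature layers of a small CNN so that only the fully connected part is trained, which removes the bf phase from local training. The architecture and layer names are illustrative assumptions, not the exact networks used in our evaluation.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Feature (convolutional) layers: computationally intensive.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Fully connected (classifier) layers: memory intensive.
        # Assumes 3x32x32 inputs (e.g., Cifar-10), hence 32 * 8 * 8 features.
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
# Freeze the feature layers: no gradients are computed for them,
# which removes the bf phase and most of the local training cost.
for param in model.features.parameters():
    param.requires_grad = False

# Only the unfrozen (classifier) parameters are handed to the optimizer.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01)
```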
2.2 Federated Learning
Federated learning (FL) [12, 14, 21, 40] is an emerging decentralized learning paradigm where $K$ clients and a federator jointly train a machine learning model in $T$ consecutive rounds while the local data stays on premise. In this paper, we specifically consider an image classification model that maps images $x$ to one of $C$ labels, denoted by $y$, through a function $f(\boldsymbol{w})$ parameterized by weights $\boldsymbol{w}$. Prior to the training, the federator initializes the model architecture, the objective function, the training algorithm, the training hyperparameters, and the aggregation protocol for the clients' local updates³.

³We note that a variant of FL [26] does not rely on a federator and aggregates the model in a peer-to-peer manner. We do not consider this case here.
[Figure 1. (a) Impact of CPU heterogeneity among clients (2 to 7 clients) on training time, as a multiplicative factor on round duration compared to the homogeneous case, plotted against the variance of CPUs. (b) Total training duration in seconds without and with round deadlines of 70, 50, 30, and 10 seconds. (c) Test accuracy in a non-IID scenario without and with the same deadlines. Heterogeneous computational powers among the clients increase the duration of the FL training process (Figure 1(a)). One could use deadlines so that the federator discards late updates in a round before starting the next one, which effectively reduces the training time (Figure 1(b)). However, using deadlines badly degrades the model accuracy, in particular in non-IID settings (Figure 1(c)).]
We consider convolutional neural networks (CNNs) [18] as the classifier model. The clients train the classifier model based on their own real data, which never leaves their premises, whereas the central server iteratively aggregates and distributes the models submitted by the clients until the global model converges.
Local Training. In each global round $t$, the clients receive the latest aggregated global model $f(\boldsymbol{w}(t-1))$ from the federator and use their local data to perform local updates for $E$ epochs, e.g., stochastic gradient descent (SGD) updates [23]. The cross-entropy loss function [29, 30, 39] is widely adopted for classification problems. Specifically, a client $k$ aims to find the weights $\boldsymbol{w}_k(t)$ that minimize the loss function:

$$\min_{\boldsymbol{w}_k(t)} f_k(\boldsymbol{w}_k(t); x_k, y_k),$$

using its $n_k$ local data points $(x_k, y_k)$, where $x_k$ is an input, e.g., an image, and $y_k$ is its class label. Upon finishing the local training, clients send their local model parameters, i.e., $\boldsymbol{w}_k(t)$, to the federator.
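A minimal sketch of such a local update in PyTorch, assuming a client-side helper that receives the global model and a DataLoader over the client's $n_k$ samples; the function name and hyperparameter values are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

def local_update(global_model, local_loader, epochs=5, lr=0.01):
    """Sketch of a client's local training in round t.

    Starts from the latest global weights, runs `epochs` epochs of SGD
    with the cross-entropy loss, and returns the updated parameters.
    """
    model = copy.deepcopy(global_model)  # keep the received global model intact
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in local_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    # The client sends w_k(t), i.e., the state dict, back to the federator.
    return model.state_dict()
```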
Model aggregation. After receiving all model updates from the clients, the federator aggregates the clients' model parameters into the latest global model, which is returned to the clients at the beginning of the next round. Specifically, at each round $t$, a subset of $K$ clients is selected to perform local training and send back their latest model weights $\boldsymbol{w}_k(t)$. Aggregation algorithms differ in the frequency and in the weights with which they aggregate the local models. FedSGD [4, 23] treats all local models equally and trains on the entire local dataset in a single epoch; the gradients are sent to the federator for aggregation after every epoch. To minimize communication and avoid the divergence of local models, FedAvg [22] lets local models train for multiple epochs before aggregating them. Specifically, FedAvg calculates the global model of round $t$ as the weighted average of all $K$ local model weights:

$$\boldsymbol{w}(t) = \sum_{k=1}^{K} \frac{n_k}{\sum_{k'=1}^{K} n_{k'}} \, \boldsymbol{w}_k(t).$$
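A minimal sketch of this weighted average over PyTorch state dicts, assuming the federator knows each client's dataset size $n_k$; the function name is an illustrative assumption.

```python
import torch

def fedavg_aggregate(client_states, client_sizes):
    """FedAvg: weighted average of the clients' model parameters.

    client_states: list of state dicts w_k(t), one per selected client.
    client_sizes:  list of local dataset sizes n_k, aligned with client_states.
    """
    total = float(sum(client_sizes))
    global_state = {}
    for name, reference in client_states[0].items():
        # Accumulate in float to avoid rounding issues, then restore the dtype
        # (integer buffers such as BatchNorm counters are simply truncated).
        acc = torch.zeros_like(reference, dtype=torch.float64)
        for state, n_k in zip(client_states, client_sizes):
            acc += (n_k / total) * state[name].to(torch.float64)
        global_state[name] = acc.to(reference.dtype)
    return global_state
```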
2.3 Sources of heterogeneity
Data heterogeneity. Clients possess different and unique privacy-sensitive datasets. A common assumption in the prior art is that client data are identically and independently distributed, the so-called independent and identically distributed (IID) case. Taking the image benchmark Cifar-10 [16] as an example, which contains 60,000 images from 10 classes, in the IID case each client would own an equal number of images equally distributed across classes. Recent studies point out that in practice distributed datasets are highly non-IID, and differ both in size and in distribution [1, 33]. For instance, it is easier to identify clients that own horse images than deer images (both are classes in Cifar-10). Consequently, unique images like deer are owned by a small subset of clients, whereas common images like horse have a higher probability of being equally distributed across all clients. Such non-IID data distributions, i.e., clients owning data in different quantities and with different distributions, have been shown to be challenging for FL and detrimental to the accuracy of the global models [12, 14, 40]. The heterogeneity of a non-IID dataset can be captured by its Earth Mover's Distance (EMD) [28]: the higher the EMD of a dataset, the higher the heterogeneity of the client data distribution. To mitigate the accuracy degradation in FL due to non-IID data, related studies [21, 33] add regularization terms to the objective function, alter the aggregation algorithm, or augment the dataset.
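As a concrete illustration, the sketch below computes an EMD-style distance between a client's label distribution and the global label distribution, using the sum of absolute probability differences per class, a formulation commonly used in non-IID FL studies; treat the exact metric as an assumption, since several variants exist in the literature.

```python
import numpy as np

def emd(client_label_counts, global_label_counts):
    """Distance between a client's class distribution and the global class
    distribution, computed as the sum of absolute probability differences
    over the classes (one common EMD formulation for non-IID FL data)."""
    p = np.asarray(client_label_counts, dtype=float)
    q = np.asarray(global_label_counts, dtype=float)
    p /= p.sum()
    q /= q.sum()
    return float(np.abs(p - q).sum())

# Hypothetical 4-class example: a client that only holds two of the classes.
print(emd([100, 100, 0, 0], [1000, 1000, 1000, 1000]))  # 1.0 -> highly non-IID
```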
Client resource heterogeneity. Edge devices are highly heterogeneous in their computing and network resources [34]. Their hardware and software stacks evolve with each generation, i.e., every 5 to 6 years. It is challenging to find an optimal