
divergence, prior studies aggregate the non-IID client data by adding a regularization term, as in FedProx [21], or by estimating the clients' contributions, as in FedNova [33]. However, these works implicitly assume that the client nodes are homogeneous.
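For concreteness, FedProx's regularization adds a proximal term to each client's local objective that penalizes deviations from the current global model. The following PyTorch sketch illustrates this idea; the function name and the coefficient mu are illustrative choices, not taken from the FedProx codebase.

import torch

def fedprox_local_loss(task_loss, model, global_weights, mu=0.01):
    # FedProx-style local objective: the task loss plus a proximal term
    # (mu / 2) * ||w - w_global||^2 that discourages the local model
    # from drifting away from the global model on non-IID data.
    prox = 0.0
    for name, param in model.named_parameters():
        prox = prox + torch.sum((param - global_weights[name].detach()) ** 2)
    return task_loss + (mu / 2.0) * prox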
In this paper, we aim to accelerate the FL training of convolutional neural networks (CNNs) in the presence of stragglers and non-IID data. A CNN is composed of convolutional layers and fully connected layers [18], which respectively learn a representation of the local data and map the extracted representation into classes. The local training of a CNN entails forward and backward passes on both types of layers.
To retain the representation of the stragglers' unique datasets, we advocate freezing their convolutional layers and offloading the computation and updating of these layers to strong clients. We propose Aergia (in Greek mythology, Aergia is the personification of sloth, idleness, indolence, and laziness), a federated learning algorithm that monitors the local training of selected clients and offloads part of the computing task of stragglers to strong clients that have spare, idle capacity.
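As a minimal sketch of the freezing step, assuming a torchvision-style CNN that exposes its convolutional layers as a features submodule and its fully connected layers as a classifier submodule (this split is an assumption for illustration, not Aergia's actual code), a straggler can disable gradient computation for its feature layers as follows:

import torch.nn as nn

def freeze_feature_layers(model: nn.Module) -> None:
    # Freeze the convolutional (feature) layers so that local training
    # only updates the fully connected (classifier) layers.
    for param in model.features.parameters():
        param.requires_grad = False

# The local optimizer should then only receive the trainable parameters:
# optimizer = torch.optim.SGD(
#     (p for p in model.parameters() if p.requires_grad), lr=0.01)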
Aergia relies on a client matching algorithm that associates a straggler with a strong node based on an estimated performance gain and on the similarity between their datasets, since blindly offloading local models to nodes that have drastically different data distributions leads to weight divergence [7]. To ensure privacy, data similarities are securely evaluated using the clients' local data distributions (i.e., the number of samples per class) in an Intel SGX enclave [8], which is hosted by the federator.
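As one plausible sketch of the kind of computation the enclave performs over these per-class counts (the function name and the choice of cosine similarity are assumptions for illustration, not the paper's metric):

import numpy as np

def label_distribution_similarity(counts_a, counts_b):
    # Cosine similarity between two clients' per-class sample counts.
    # counts_a and counts_b are length-C vectors; a value close to 1
    # indicates similar class distributions.
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))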
We implement Aergia in PyTorch as a middleware running on top of Kubernetes. We evaluate Aergia on three datasets, namely MNIST, FMNIST, and CIFAR-10, with different network architectures, against four previous heterogeneity-aware or non-IID-aware aggregation solutions [6, 21, 22, 33]. Our FL system consists of a mix of 24 weak, medium, and strong nodes that use different numbers of CPU cores. Our evaluation results show that Aergia achieves the highest accuracy within the lowest training time.
In a nutshell, this paper makes the following contributions:
• We explain how a straggler can offload the training of its model to a strong client.
• We present an algorithm that matches clients based on their performance profiles and data similarity.
• We design Aergia (available at https://github.com/bacox/fltk), a federated learning middleware for highly heterogeneous clients and non-IID data that leverages model training offloading and online client matching. Aergia relies on a trusted execution environment (an Intel SGX enclave) so that the federator can evaluate the similarity of client datasets without getting access to their private class distributions.
• We evaluate Aergia on an FL cluster built on top of Kubernetes. Our evaluation results on three datasets and several networks show that Aergia effectively leverages the spare computational capacity of strong clients to achieve high accuracy in a low training time.
The remainder of this paper is organized as follows. §2 provides some background on federated learning, on data and resource heterogeneity, and on their impact on training time and accuracy. §3 provides an overview of Aergia, while §4 describes its algorithms and implementation details. §5 presents our performance evaluation. §6 reviews the related work. Finally, §7 concludes this paper.
2 Background and Motivation
In this section, we first recall the necessary background on deep learning models, which are core components of the federated learning paradigm, and then describe the practical heterogeneity challenges that federated learning faces and their impact on training time and accuracy.
2.1 Primer on Convolutional Neural Networks
State-of-the-art image classifiers follow the structure of convolutional neural networks (CNNs) [18], which consist of convolutional and fully connected layers. The former map the image features into a compact representation, hence they are also referred to as feature layers. The latter are dense fully connected layers that classify this representation into one of the classes. The other difference between these two types of layers is their resource demands [9]: convolutional layers are computationally intensive, while fully connected layers are memory intensive. The training time of a client's local model can be divided into two parts: the forward pass, which computes the classification outcome for images, and the backward pass, which back-propagates the errors to update the model weights. Consequently, the training time of a typical CNN classifier can be further categorized into four parts: (i) ff: the forward pass on the feature layers, (ii) fc: the forward pass on the fully connected layers, (iii) bc: the backward pass on the fully connected layers, and (iv) bf: the backward pass on the feature layers.
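The sketch below makes this four-way decomposition concrete by timing each phase separately on a torchvision-style CNN; the model choice and the two-stage backward pass (stopping at the feature-layer output) are illustrative assumptions, not the paper's measurement code.

import time
import torch
import torch.nn as nn
import torchvision.models as models

model = models.vgg11()  # torchvision split: model.features / model.classifier
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, 1000, (8,))
criterion = nn.CrossEntropyLoss()

t0 = time.perf_counter()
feats = model.features(x)            # (i) ff: forward pass, feature layers
t1 = time.perf_counter()
# (ii) fc: forward pass, fully connected layers (avgpool/flatten folded in)
out = model.classifier(torch.flatten(model.avgpool(feats), 1))
loss = criterion(out, y)
t2 = time.perf_counter()
# (iii) bc: backpropagate through the classifier only, stopping at feats.
grads = torch.autograd.grad(
    loss, [feats] + list(model.classifier.parameters()), retain_graph=True)
t3 = time.perf_counter()
# (iv) bf: continue the backward pass through the feature layers.
feats.backward(grads[0])
t4 = time.perf_counter()
print(f"ff={t1 - t0:.3f}s fc={t2 - t1:.3f}s bc={t3 - t2:.3f}s bf={t4 - t3:.3f}s")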
2.2 Federated Learning
Federated learning (FL) [12, 14, 21, 40] is an emerging decentralized learning paradigm where 𝐾 clients and a federator jointly train a machine learning model in 𝑇 consecutive rounds while local data stays on premise. In this paper, we specifically consider an image classification model that maps an image 𝑥 to one of 𝐶 labels, denoted by 𝑦, through a function 𝑓(𝒘) parameterized by weights 𝒘. Prior to the training, the federator initializes the model architecture, the objective function, the training algorithm, the training hyperparameters, and the aggregation protocol for the clients'