
divergence, prior studies aggregate the non-IID client data by adding a regularization term, as in FedProx [21], or by estimating the clients' contributions, as in FedNova [33]. However, these works implicitly assume that the client nodes are homogeneous.
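For concreteness, FedProx's regularization adds a proximal term to each client's local objective that penalizes deviations from the current global model. The following PyTorch sketch illustrates this idea; the function name and the coefficient mu are illustrative choices, not taken from the FedProx codebase.

import torch

def fedprox_local_loss(task_loss, model, global_weights, mu=0.01):
    # FedProx-style local objective: the task loss plus a proximal term
    # (mu / 2) * ||w - w_global||^2 that discourages the local model
    # from drifting away from the global model on non-IID data.
    prox = 0.0
    for name, param in model.named_parameters():
        prox = prox + torch.sum((param - global_weights[name].detach()) ** 2)
    return task_loss + (mu / 2.0) * prox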
In this paper, we aim to accelerate the FL training of convolutional neural networks (CNNs) in the presence of stragglers and non-IID data. A CNN is composed of convolutional layers and fully connected layers [18], which respectively learn a representation of the local data and map the extracted representation into classes. The local training of a CNN entails forward and backward passes on both types of layers.
To retain the representation of the stragglers' unique datasets, we advocate freezing their convolutional layers and offloading the computation and updating of these layers to strong clients. We propose Aergia (in Greek mythology, Aergia is the personification of sloth, idleness, indolence, and laziness), a federated learning algorithm that monitors the local training of selected clients and offloads part of the computing task of stragglers to strong clients that have spare, idle capacity.
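As a minimal sketch of the freezing step, assuming a torchvision-style CNN that exposes its convolutional layers as a features submodule and its fully connected layers as a classifier submodule (this split is an assumption for illustration, not Aergia's actual code), a straggler can disable gradient computation for its feature layers as follows:

import torch.nn as nn

def freeze_feature_layers(model: nn.Module) -> None:
    # Freeze the convolutional (feature) layers so that local training
    # only updates the fully connected (classifier) layers.
    for param in model.features.parameters():
        param.requires_grad = False

# The local optimizer should then only receive the trainable parameters:
# optimizer = torch.optim.SGD(
#     (p for p in model.parameters() if p.requires_grad), lr=0.01)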
Aergia relies on a client matching algorithm that associates a straggler with a strong node based on an estimated performance gain and on the similarity between their datasets, since blindly offloading local models to nodes that have drastically different data distributions leads to weight divergence [7]. To ensure privacy, data similarities are securely evaluated using the clients' local data distributions (i.e., the number of samples per class) in an Intel SGX enclave [8], which is hosted by the federator.
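As one plausible sketch of the kind of computation the enclave performs over these per-class counts (the function name and the choice of cosine similarity are assumptions for illustration, not the paper's metric):

import numpy as np

def label_distribution_similarity(counts_a, counts_b):
    # Cosine similarity between two clients' per-class sample counts.
    # counts_a and counts_b are length-C vectors; a value close to 1
    # indicates similar class distributions.
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))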
We implement Aergia in PyTorch as a middleware running on top of Kubernetes. We evaluate Aergia on three datasets, namely MNIST, FMNIST, and CIFAR-10, with different network architectures, against four previous heterogeneity-aware or non-IID-aware aggregation solutions [6, 21, 22, 33]. Our FL system consists of a mix of 24 weak, medium, and strong nodes that use different numbers of CPU cores. Our evaluation results show that Aergia achieves the highest accuracy within the lowest training time.
In a nutshell, this paper makes the following contributions:
• We explain how a straggler can offload the training of its model to a strong client.
• We present an algorithm that matches clients based on their performance profiles and data similarity.
• We design Aergia (available at https://github.com/bacox/fltk), a federated learning middleware for highly heterogeneous clients and non-IID data that leverages model training offloading and online client matching. Aergia relies on a trusted execution environment (an Intel SGX enclave) so that the federator can evaluate the similarity of client datasets without getting access to their private class distributions.
• We evaluate Aergia on an FL cluster built on top of Kubernetes. Our evaluation results on three datasets and several networks show that Aergia effectively leverages the spare computational capacity of strong clients to achieve high accuracy in a low training time.
The remainder of this paper is organized as follows. §2 provides some background on federated learning, on data and resource heterogeneity, and on their impact on training time and accuracy. §3 provides an overview of Aergia, while §4 describes its algorithms and implementation details. §5 presents our performance evaluation. §6 reviews the related work. Finally, §7 concludes this paper.
2 Background and Motivation
In this section, we first recall the necessary background on deep learning models, which are core components of the federated learning paradigm, and then describe the practical heterogeneity challenges that federated learning faces and their impact on training time and accuracy.
2.1 Primer on Convolutional Neural Networks
State-of-the-art image classifiers follow the structure of convolutional neural networks (CNNs) [18], which consist of convolutional and fully connected layers. The former map the image features into a compact representation, hence they are also referred to as feature layers. The latter are dense fully connected layers that classify this representation into one of the classes. The other difference between these two types of layers is their resource demands [9]: convolutional layers are computationally intensive, while fully connected layers are memory intensive. The training time of a client's local model can be divided into two parts: the forward pass, which computes the classification outcome for images, and the backward pass, which back-propagates the errors to update the model weights. Consequently, the training time of a typical CNN classifier can be further categorized into four parts: (i) ff: the forward pass on the feature layers, (ii) fc: the forward pass on the fully connected layers, (iii) bc: the backward pass on the fully connected layers, and (iv) bf: the backward pass on the feature layers.
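The sketch below makes this four-way decomposition concrete by timing each phase separately on a torchvision-style CNN; the model choice and the two-stage backward pass (stopping at the feature-layer output) are illustrative assumptions, not the paper's measurement code.

import time
import torch
import torch.nn as nn
import torchvision.models as models

model = models.vgg11()  # torchvision split: model.features / model.classifier
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, 1000, (8,))
criterion = nn.CrossEntropyLoss()

t0 = time.perf_counter()
feats = model.features(x)            # (i) ff: forward pass, feature layers
t1 = time.perf_counter()
# (ii) fc: forward pass, fully connected layers (avgpool/flatten folded in)
out = model.classifier(torch.flatten(model.avgpool(feats), 1))
loss = criterion(out, y)
t2 = time.perf_counter()
# (iii) bc: backpropagate through the classifier only, stopping at feats.
grads = torch.autograd.grad(
    loss, [feats] + list(model.classifier.parameters()), retain_graph=True)
t3 = time.perf_counter()
# (iv) bf: continue the backward pass through the feature layers.
feats.backward(grads[0])
t4 = time.perf_counter()
print(f"ff={t1 - t0:.3f}s fc={t2 - t1:.3f}s bc={t3 - t2:.3f}s bf={t4 - t3:.3f}s")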
2.2 Federated Learning
Federated learning (FL) [12, 14, 21, 40] is an emerging decentralized learning paradigm where 𝐾 clients and a federator jointly train a machine learning model in 𝑇 consecutive rounds while local data stays on premise. In this paper, we specifically consider an image classification model that maps an image 𝑥 to one of 𝐶 labels, denoted by 𝑦, through a function 𝑓(𝒘) parameterized by weights 𝒘. Prior to the training, the federator initializes the model architecture, the objective function, the training algorithm, the training hyperparameters, and the aggregation protocol for the clients'