Unbounded Gradients in Federated Learning with Buffered
Asynchronous Aggregation
Mohammad Taha Toghani and César A. Uribe
Abstract— Synchronous updates may compromise the effi-
ciency of cross-device federated learning once the number
of active clients increases. The FedBuff algorithm (Nguyen
et al. [1]) alleviates this problem by allowing asynchronous
updates (staleness), which enhances the scalability of training
while preserving privacy via secure aggregation. We revisit
the FedBuff algorithm for asynchronous federated learning
and extend the existing analysis by removing the boundedness
assumption on the gradient norm. This paper presents a theoretical
analysis of the convergence rate of this algorithm when
heterogeneity in data, batch size, and delay are considered.
I. INTRODUCTION
Federated learning (FL) is an approach in machine learn-
ing theory and practice that allows training models on
distributed data sources [2], [3]. The distributed structure
of FL has numerous benefits over traditional centralized
methods, including parallel computing, efficient storage, and
improvements in data privacy. However, this framework also
presents communication efficiency, data heterogeneity, and
scalability challenges. Several works have been proposed
to improve the performance of FL [4]–[6]. Existing works
usually address a subset of these challenges while imposing
additional constraints or limitations in other aspects. For
example, the work in [7] shows a trade-off between privacy,
communication efficiency, and accuracy gains for the dis-
tributed discrete Gaussian mechanism for FL with secure
aggregation.
One of the most important advantages of FL is scalability.
Training models on centralized data stored on a single server
can be problematic when dealing with large amounts of
data. Servers may be unable to handle the load, or clients
might refuse to share their data with a third party. In FL,
the data is distributed across many devices, potentially im-
proving data privacy and computation scalability. However,
this also presents some challenges. First, keeping the update
mechanism synchronized across all devices may be very
difficult when the number of clients is large [8]. Second,
even if feasible, imposing synchronization results in huge
(unnecessary) delays in the learning procedure [6]. Finally,
each client may have a different data distribution, which can
impact the convergence of algorithms [9], [10].
The authors are with the Department of Electrical and Computer
Engineering, Rice University, 6100 Main St, Houston, TX 77005, USA,
{mttoghani,cauribe}@rice.edu. This work was partially funded by ARPA-H
Strategic Initiative Seed Fund #916012. Part of this material is based upon
work supported by the National Science Foundation under Grants #2211815
and #2213568.

In synchronous FL, e.g., FedAvg [2], [3], the server first
sends a copy of the current model to each client. The clients
then train the model locally on their private data and send the
model updates back to the server. The server then aggregates
the client updates to produce a new shared model. The
process is repeated for many rounds until the shared model
converges to the desired accuracy. However, the existence
of delays, message losses, and stragglers hinders the per-
formance of distributed learning. Several works have been
proposed to improve the scalability of federated/distributed
learning via enabling asynchronous communications [6],
[8], [11]–[15]. In the majority of these results, each client
immediately communicates the parameters to the server after
applying a series of local updates. The server updates the
global parameter once it receives any client update. This
reduces training time and improves scalability in both practice
and theory [6], [12], [15], [16], since the server can start
aggregating client updates as soon as they become
available.
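The contrast between a synchronous round and a vanilla asynchronous update can be sketched as follows. This is a minimal numpy illustration, not the paper's algorithm: the quadratic local objective, the function names, and the step sizes are hypothetical stand-ins for each client's private loss and tuning choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(model, steps=5, lr=0.1):
    """Toy client: a few gradient steps on a private quadratic objective.

    The target vector plays the role of the client's private data; the
    objective 0.5 * ||x - target||^2 is purely illustrative.
    """
    target = rng.normal(size=model.shape)
    x = model.copy()
    for _ in range(steps):
        x -= lr * (x - target)   # gradient of 0.5 * ||x - target||^2
    return x - model             # the client reports a model delta

def synchronous_round(model, num_clients=10):
    """FedAvg-style round: the server waits for ALL clients, then
    averages their deltas before taking a single global step."""
    deltas = [local_update(model) for _ in range(num_clients)]
    return model + np.mean(deltas, axis=0)

def asynchronous_update(model, delta, server_lr=0.1):
    """Vanilla asynchronous update: the server applies EACH client
    delta as soon as it arrives, so no client waits on stragglers."""
    return model + server_lr * delta

model = np.zeros(3)
model = synchronous_round(model)
model = asynchronous_update(model, local_update(model))
```

In the synchronous variant the slowest client gates every round, while the asynchronous variant lets the server move forward per upload, which is the scalability advantage discussed above.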
This setup, known as “vanilla” asynchronous FL, has
several challenges that must be addressed. First, due to the
nature of asynchronous updates, the algorithm must handle
staleness, where client updates are computed with respect to
an outdated copy of the server model [1]. Moreover, the
asynchronous setup may pose privacy risks due to the lack
of secure aggregation, i.e., each client communicates its
individual update directly to the server [17], [18].
In [1], the authors proposed an algorithm called federated
learning with buffered asynchronous aggregation (FedBuff ),
which modifies pure asynchronous FL by enabling secure
aggregation while clients perform asynchronous updates.
This method is a variant of asynchronous FL that serves as
an intermediate approach between fully synchronous and
fully asynchronous FL.
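The buffering idea can be sketched in a few lines. This is a hedged illustration in the spirit of FedBuff, not the exact update rule from [1]: the class name, the staleness bookkeeping, and the simple averaged server step are assumptions made for the sketch.

```python
import numpy as np

class BufferedAsyncServer:
    """Minimal sketch of buffered asynchronous aggregation: client
    deltas accumulate in a buffer, and the global model is updated
    only once K deltas have arrived, so individual updates are never
    applied (or revealed) one by one."""

    def __init__(self, model, buffer_size=3, server_lr=0.5):
        self.model = model.astype(float)
        self.K = buffer_size
        self.server_lr = server_lr
        self.buffer = np.zeros_like(self.model)
        self.count = 0     # deltas currently held in the buffer
        self.version = 0   # number of server updates performed so far

    def receive(self, delta, client_version):
        """Handle one client upload; client_version is the server
        version the client downloaded, so the gap is its staleness."""
        staleness = self.version - client_version
        assert staleness >= 0
        self.buffer += delta
        self.count += 1
        if self.count == self.K:  # buffer full: take one server step
            self.model += self.server_lr * self.buffer / self.K
            self.buffer = np.zeros_like(self.model)
            self.count = 0
            self.version += 1
        return staleness

server = BufferedAsyncServer(np.zeros(2), buffer_size=2)
server.receive(np.ones(2), client_version=0)  # buffered, no update yet
server.receive(np.ones(2), client_version=0)  # buffer full -> server step
```

Because the server only ever applies the aggregated buffer, this structure is compatible with secure aggregation while clients still upload at their own pace.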
FedBuff [1] is shown to converge for the class of smooth
and non-convex objective functions under the boundedness of
the gradient norm. By removing this assumption, we provide
a new analysis for FedBuff and improve the existing theory
by extending it to a broader class of functions. We derive our
bounds based on stochastic and heterogeneous variance and
the maximum delay between downloads and uploads across
all the clients. Table I summarizes the properties and
convergence rate of our analysis for the FedBuff algorithm
and provides a comparison with existing analyses for
FedAsync [8] and FedAvg [2], [3]. The rates are stated in
terms of the number of updates performed by the central
server. Asynchronous algorithms are faster since the
constraint of synchronized updates is removed.
To our knowledge, this is the first analysis for (a variant
of) asynchronous federated learning with no boundedness
arXiv:2210.01161v1 [cs.LG] 3 Oct 2022