Unbounded Gradients in Federated Learning with Buffered Asynchronous Aggregation

Mohammad Taha Toghani and César A. Uribe

The authors are with the Department of Electrical and Computer Engineering, Rice University, 6100 Main St, Houston, TX 77005, USA, {mttoghani,cauribe}@rice.edu. This work was partially funded by ARPA-H Strategic Initiative Seed Fund #916012. Part of this material is based upon work supported by the National Science Foundation under Grants #2211815 and #2213568.
Abstract— Synchronous updates may compromise the efficiency of cross-device federated learning once the number of active clients increases. The FedBuff algorithm (Nguyen et al. [1]) alleviates this problem by allowing asynchronous updates (staleness), which enhances the scalability of training while preserving privacy via secure aggregation. We revisit the FedBuff algorithm for asynchronous federated learning and extend the existing analysis by removing the boundedness assumption on the gradient norm. This paper presents a theoretical analysis of the convergence rate of this algorithm when heterogeneity in data, batch size, and delay are considered.
I. INTRODUCTION
Federated learning (FL) is an approach in machine learning theory and practice that allows training models on distributed data sources [2], [3]. The distributed structure of FL has numerous benefits over traditional centralized methods, including parallel computing, efficient storage, and improvements in data privacy. However, this framework also presents communication efficiency, data heterogeneity, and scalability challenges. Several works have been proposed to improve the performance of FL [4]–[6]. Existing works usually address a subset of these challenges while imposing additional constraints or limitations in other aspects. For example, the work in [7] shows a trade-off between privacy, communication efficiency, and accuracy gains for the distributed discrete Gaussian mechanism for FL with secure aggregation.
One of the most important advantages of FL is scalability. Training models on centralized data stored on a single server can be problematic when dealing with large amounts of data. Servers may be unable to handle the load, or clients might refuse to share their data with a third party. In FL, the data is distributed across many devices, potentially improving data privacy and computation scalability. However, this also presents some challenges. First, keeping the update mechanism synchronized across all devices may be very difficult when the number of clients is large [8]. Second, even if feasible, imposing synchronization results in huge (unnecessary) delays in the learning procedure [6]. Finally, clients often have different data distributions, which can impact the convergence of algorithms [9], [10].
In synchronous FL, e.g., FedAvg [2], [3], the server first sends a copy of the current model to each client. The clients then train the model locally on their private data and send the model updates back to the server. The server then aggregates the client updates to produce a new shared model. The process is repeated for many rounds until the shared model converges to the desired accuracy. However, the existence of delays, message losses, and stragglers hinders the performance of distributed learning. Several works have been proposed to improve the scalability of federated/distributed learning via enabling asynchronous communications [6], [8], [11]–[15]. In the majority of these results, each client immediately communicates its parameters to the server after applying a series of local updates, and the server updates the global parameter as soon as it receives any client update. This reduces the training time and improves scalability in both practice and theory [6], [12], [15], [16], since the server can start aggregating the client updates as soon as they are available.
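To make the synchronization barrier concrete, the following is a minimal sketch of one FedAvg-style round in Python. The function names, the local_sgd argument, and the plain NumPy arrays are our own illustrative choices and not code from [2], [3].

```python
import numpy as np

def fedavg_round(w, clients, local_sgd):
    """One synchronous round of FedAvg-style training (illustrative sketch).

    The server must wait for every client before it can aggregate, so the
    round is as slow as the slowest client (the synchronization barrier)."""
    updates = []
    for client in clients:                      # all selected clients participate
        w_local = local_sgd(client, w.copy())   # local training on private data
        updates.append(w_local - w)             # only the update leaves the device
    # aggregation happens once *all* updates have arrived
    return w + np.mean(updates, axis=0)
```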
The setup, known as “vanilla” asynchronous FL, has several challenges that must be addressed. First, due to the nature of asynchronous updates, the algorithm must deal with staleness, i.e., client updates that are no longer up-to-date with the current model on the server [1]. Moreover, the asynchronous setup may imply potential risks for privacy due to the lack of secure aggregation, i.e., every single client communicates immediately and individually with the server [17], [18]. In [1], the authors proposed an algorithm called federated learning with buffered asynchronous aggregation (FedBuff), which modifies pure asynchronous FL by enabling secure aggregation while clients perform asynchronous updates. This method is a variant of asynchronous FL and serves as an intermediate approach between synchronous and asynchronous FL.
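For contrast, the following is a minimal sketch of the buffered-aggregation idea on the server side, based on the description of FedBuff in [1]. The buffer size K, the server stepsize eta_g, and the iterator interface for incoming client differences are our own illustrative assumptions rather than the reference implementation.

```python
import numpy as np

def fedbuff_server(w, incoming_deltas, K, eta_g, num_server_updates):
    """Illustrative server loop: client differences arrive asynchronously and
    are accumulated in a buffer; the model is updated (and can be securely
    aggregated) only after K differences have been collected."""
    buffer = []
    for _ in range(num_server_updates):
        while len(buffer) < K:
            buffer.append(next(incoming_deltas))  # possibly stale client difference
        # each difference is assumed to be (downloaded model) - (final local model),
        # matching the client description later in this paper, hence the minus sign
        w = w - eta_g * np.mean(buffer, axis=0)
        buffer.clear()
    return w
```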
FedBuff [1] is shown to converge for the class of smooth and non-convex objective functions under boundedness of the gradient norm. By removing this assumption, we provide a new analysis for FedBuff and improve the existing theory by extending it to a broader class of functions. We derive our bounds based on the stochastic and heterogeneous variances and the maximum delay between downloads and uploads across all the clients. Table I summarizes the properties and rate of our analysis for the FedBuff algorithm and provides a comparison with existing analyses for FedAsync [8] and FedAvg [2], [3]. The rates reflect the complexity in terms of the number of updates performed by the central server. Asynchronous algorithms are faster since the constraint of synchronized updates is removed in the asynchronous variations. To our knowledge, this is the first analysis for (a variant of) asynchronous federated learning with no boundedness assumption on the gradient norm.
TABLE I: Comparison of the characteristics considered in our analysis with relevant works for federated learning for smooth & non-convex objective functions. Parameter τ denotes the maximum delay.

Algorithm   Reference            Asynchronous  Buffered      Unbounded  Convergence
                                 Update        Aggregation   Gradient   Rate
FedAvg      McMahan et al. [2]   ✗             ✓             -          -
FedAvg      Yu et al. [19]       ✗             ✓             ✗          O(1/√T)
FedAvg      Wang et al. [10]     ✗             ✓             ✓          O(1/√T)
FedAsync    Xie et al. [8]       ✓             ✗             ✗          O(1/√T) + O(τ²/T)
FedBuff     Nguyen et al. [1]    ✓             ✓             ✗          O(1/√T) + O(τ²/T)
FedBuff     This Work            ✓             ✓             ✓          O(1/√T) + O(τ²/T)
The following is an outline of the remainder of this paper. The problem setup and the FedBuff algorithm are presented in Section II, where we also provide our convergence result and its corresponding assumptions. We give a detailed proof of our result in Section III. Finally, we conclude with remarks and prospects for future research in Section IV.
II. PROBLEM SETUP, ALGORITHM, & MAIN RESULT
In this section, we first state the problem setup, and after
explaining the FedBuff algorithm [1], we present our main
result along with the underlying assumptions.
Problem Setup: We consider a set of n clients and one server, where each client i ∈ [n] owns a private function f_i : R^d → R, and the goal is to jointly minimize the average of the local cost functions by finding a d-dimensional parameter w ∈ R^d that solves

    \min_{w \in \mathbb{R}^d} f(w) := \frac{1}{n} \sum_{i=1}^{n} f_i(w),
    \quad \text{with} \quad f_i(w) := \mathbb{E}_{\xi_i \sim p_i}[\ell_i(w, \xi_i)],        (1)

where ℓ_i : R^d × S_i → R is a cost function that determines the prediction error of w over a single data point ξ_i ∈ S_i on user i, and p_i represents user i's data distribution over S_i, for i ∈ [n]. In the above definition, f_i(·) is the local cost function of client i, and f(·) denotes the global (average) cost function which the clients try to collaboratively minimize.
Now, let D_i be a data batch sampled from p_i. Similar to (1), we denote the stochastic cost function \tilde{f}_i(w, D_i) as follows:

    \tilde{f}_i(w, D_i) := \frac{1}{|D_i|} \sum_{\xi_i \in D_i} \ell_i(w, \xi_i).        (2)
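As a concrete reading of (1)-(2), the snippet below evaluates the stochastic local cost on a sampled batch and the global cost as an average of local costs; the quadratic per-sample loss and the NumPy data layout are only illustrative assumptions.

```python
import numpy as np

def local_stochastic_cost(w, batch, loss):
    """Eq. (2): average of the per-sample loss ell_i(w, xi) over the batch D_i."""
    return np.mean([loss(w, xi) for xi in batch])

def global_cost(w, client_datasets, loss):
    """Eq. (1) with each expectation over p_i replaced by the empirical average
    over client i's local samples, then averaged over the n clients."""
    return np.mean([local_stochastic_cost(w, data, loss) for data in client_datasets])

# illustrative per-sample loss: ell(w, xi) = 0.5 * ||w - xi||^2
quadratic_loss = lambda w, xi: 0.5 * np.sum((w - xi) ** 2)
```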
The minimization of (1), given access to an oracle of samples, and its variants have been extensively studied in many different frameworks [4]. We are now ready to explain FedBuff.
FedBuff Algorithm: Let w_0 be the initialization parameter at the server. The ultimate goal is to minimize the cost function in (1) using an algorithm with access to stochastic gradients. All clients can communicate with the server, and each client i ∈ [n] communicates when its connection to the server is stable. First, let us explain the FedBuff algorithm from the client and server perspectives.
1) Client Algorithm: Each client i requests to read the server's parameter w ∈ R^d once the connection is stable and the server is ready to send the parameter.¹ There is often some delay in this step, which we call the download delay. It may originate from factors such as an unstable connection, a bandwidth limit, or a communication failure; for example, the server may seek to reduce the number of simultaneously active users by putting client i on hold. The download delay can model all these factors. Once the parameter is received (downloaded) from the server, client i performs Q steps of local stochastic gradient descent starting from the downloaded model w for its cost function f_i(·). In words, agent i runs a Q-step algorithm (a loop of size Q), where at each local round q ∈ {0, 1, ..., Q−1}, client i samples a data batch D_{i,q} with respect to distribution p_i and performs one step of gradient descent with local stepsize η > 0. Finally, agent i returns the update (the difference between the initial and final parameters) to the server. We refer to the time required to broadcast parameters to the server as the upload delay, which may have similar causes as the download delay. The agent repeats this procedure until the server sends a termination message. Algorithm 1 summarizes the pseudo-code of the operations at client i ∈ [n], where Steps 4-8 show the local updates performed at the agent, and ∆_i in Step 9 denotes the difference communicated to the server. A minimal sketch of these client-side steps is given at the end of this section.
2) Server Algorithm: The server considers an initialization
¹ We drop the timestep from the parameters in the client algorithm for clarity of exposition. We use the time notation in our analysis in Section III.
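As referenced above, here is a minimal sketch of the client-side steps: download w, run Q local SGD steps with stepsize eta on minibatches drawn from p_i, and upload the difference between the initial and final parameters. The sample_batch and stochastic_grad helpers are hypothetical stand-ins for client i's data pipeline and gradient oracle, not names from Algorithm 1.

```python
import numpy as np

def fedbuff_client(w_downloaded, Q, eta, sample_batch, stochastic_grad):
    """Illustrative client routine: Q local SGD steps from the downloaded model.

    sample_batch() draws a minibatch D_{i,q} from the client's distribution p_i;
    stochastic_grad(y, batch) returns the gradient of the stochastic cost in (2)."""
    y = np.array(w_downloaded, copy=True)
    for q in range(Q):                        # local rounds q = 0, ..., Q-1
        batch = sample_batch()
        y = y - eta * stochastic_grad(y, batch)
    # the difference between the initial and final parameters is uploaded
    return w_downloaded - y
```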