Unbounded Gradients in Federated Learning with Buffered
Asynchronous Aggregation
Mohammad Taha Toghani and César A. Uribe
Abstract— Synchronous updates may compromise the effi-
ciency of cross-device federated learning once the number
of active clients increases. The FedBuff algorithm (Nguyen
et al. [1]) alleviates this problem by allowing asynchronous
updates (staleness), which enhances the scalability of training
while preserving privacy via secure aggregation. We revisit
the FedBuff algorithm for asynchronous federated learning
and extend the existing analysis by removing the boundedness
assumption on the gradient norm. This paper presents a theoretical
analysis of the convergence rate of this algorithm when
heterogeneity in data, batch size, and delay are considered.
I. INTRODUCTION
Federated learning (FL) is an approach in machine learn-
ing theory and practice that allows training models on
distributed data sources [2], [3]. The distributed structure
of FL has numerous benefits over traditional centralized
methods, including parallel computing, efficient storage, and
improvements in data privacy. However, this framework also
presents communication efficiency, data heterogeneity, and
scalability challenges. Several works have been proposed
to improve the performance of FL [4]–[6]. Existing works
usually address a subset of these challenges while imposing
additional constraints or limitations in other aspects. For
example, the work in [7] shows a trade-off between privacy,
communication efficiency, and accuracy gains for the dis-
tributed discrete Gaussian mechanism for FL with secure
aggregation.
One of the most important advantages of FL is scalability.
Training models on centralized data stored on a single server
can be problematic when dealing with large amounts of
data. Servers may be unable to handle the load, or clients
might refuse to share their data with a third party. In FL,
the data is distributed across many devices, potentially im-
proving data privacy and computation scalability. However,
this also presents some challenges. First, keeping the update
mechanism synchronized across all devices may be very
difficult when the number of clients is large [8]. Second,
even if feasible, imposing synchronization results in huge
(unnecessary) delays in the learning procedure [6]. Finally,
each client may have a different data distribution, which can
impact the convergence of algorithms [9], [10].
The authors are with the Department of Electrical and Computer
Engineering, Rice University, 6100 Main St, Houston, TX 77005, USA,
{mttoghani,cauribe}@rice.edu. This work was partially funded by ARPA-H
Strategic Initiative Seed Fund #916012. Part of this material is based upon
work supported by the National Science Foundation under Grants #2211815
and #2213568.

In synchronous FL, e.g., FedAvg [2], [3], the server first
sends a copy of the current model to each client. The clients
then train the model locally on their private data and send the
model updates back to the server. The server then aggregates
the client updates to produce a new shared model. The
process is repeated for many rounds until the shared model
converges to the desired accuracy. However, the existence
of delays, message losses, and stragglers hinders the per-
formance of distributed learning. Several works have been
proposed to improve the scalability of federated/distributed
learning via enabling asynchronous communications [6],
[8], [11]–[15]. In the majority of these results, each client
immediately communicates the parameters to the server after
applying a series of local updates. The server updates the
global parameter once it receives any client update. This
reduces training time and improves scalability in both practice
and theory [6], [12], [15], [16], since the server can start
aggregating client updates as soon as they become
available.
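The contrast between a synchronous round and a vanilla asynchronous update can be sketched as follows. This is a minimal numpy illustration, not the paper's algorithm: the quadratic local objective, the function names, and the step sizes are hypothetical stand-ins for each client's private loss and tuning choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(model, steps=5, lr=0.1):
    """Toy client: a few gradient steps on a private quadratic objective.

    The target vector plays the role of the client's private data; the
    objective 0.5 * ||x - target||^2 is purely illustrative.
    """
    target = rng.normal(size=model.shape)
    x = model.copy()
    for _ in range(steps):
        x -= lr * (x - target)   # gradient of 0.5 * ||x - target||^2
    return x - model             # the client reports a model delta

def synchronous_round(model, num_clients=10):
    """FedAvg-style round: the server waits for ALL clients, then
    averages their deltas before taking a single global step."""
    deltas = [local_update(model) for _ in range(num_clients)]
    return model + np.mean(deltas, axis=0)

def asynchronous_update(model, delta, server_lr=0.1):
    """Vanilla asynchronous update: the server applies EACH client
    delta as soon as it arrives, so no client waits on stragglers."""
    return model + server_lr * delta

model = np.zeros(3)
model = synchronous_round(model)
model = asynchronous_update(model, local_update(model))
```

In the synchronous variant the slowest client gates every round, while the asynchronous variant lets the server move forward per upload, which is the scalability advantage discussed above.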
This setup, known as “vanilla” asynchronous FL, has
several challenges that must be addressed. First, due to the
nature of asynchronous updates, the algorithm must handle
staleness, where client updates are computed with respect to
an outdated copy of the server model [1]. Moreover, the
asynchronous setup may pose privacy risks due to the lack
of secure aggregation, i.e., each client communicates its
individual update directly to the server [17], [18].
In [1], the authors proposed an algorithm called federated
learning with buffered asynchronous aggregation (FedBuff ),
which modifies pure asynchronous FL by enabling secure
aggregation while clients perform asynchronous updates.
This method is a variant of asynchronous FL that serves as
an intermediate approach between fully synchronous and
fully asynchronous FL.
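The buffering idea can be sketched in a few lines. This is a hedged illustration in the spirit of FedBuff, not the exact update rule from [1]: the class name, the staleness bookkeeping, and the simple averaged server step are assumptions made for the sketch.

```python
import numpy as np

class BufferedAsyncServer:
    """Minimal sketch of buffered asynchronous aggregation: client
    deltas accumulate in a buffer, and the global model is updated
    only once K deltas have arrived, so individual updates are never
    applied (or revealed) one by one."""

    def __init__(self, model, buffer_size=3, server_lr=0.5):
        self.model = model.astype(float)
        self.K = buffer_size
        self.server_lr = server_lr
        self.buffer = np.zeros_like(self.model)
        self.count = 0     # deltas currently held in the buffer
        self.version = 0   # number of server updates performed so far

    def receive(self, delta, client_version):
        """Handle one client upload; client_version is the server
        version the client downloaded, so the gap is its staleness."""
        staleness = self.version - client_version
        assert staleness >= 0
        self.buffer += delta
        self.count += 1
        if self.count == self.K:  # buffer full: take one server step
            self.model += self.server_lr * self.buffer / self.K
            self.buffer = np.zeros_like(self.model)
            self.count = 0
            self.version += 1
        return staleness

server = BufferedAsyncServer(np.zeros(2), buffer_size=2)
server.receive(np.ones(2), client_version=0)  # buffered, no update yet
server.receive(np.ones(2), client_version=0)  # buffer full -> server step
```

Because the server only ever applies the aggregated buffer, this structure is compatible with secure aggregation while clients still upload at their own pace.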
FedBuff [1] is shown to converge for the class of smooth
and non-convex objective functions under the boundedness of
the gradient norm. By removing this assumption, we provide
a new analysis for FedBuff and improve the existing theory
by extending it to a broader class of functions. We derive our
bounds based on stochastic and heterogeneous variance and
the maximum delay between downloads and uploads across
all the clients. Table I summarizes the properties and
convergence rate of our analysis for the FedBuff algorithm
and provides a comparison with existing analyses for
FedAsync [8] and FedAvg [2], [3]. The rates are stated in
terms of the number of updates performed by the central
server. Asynchronous algorithms are faster since the
constraint of synchronized updates is removed.
To our knowledge, this is the first analysis for (a variant
of) asynchronous federated learning with no boundedness
arXiv:2210.01161v1 [cs.LG] 3 Oct 2022