
The common underlying assumption that determines the superiority of FL over individual local training is that the data points of all clients come from the same distribution, i.e., the data are homogeneous across clients. Under this assumption, FL improves the quality of empirical loss minimization when the data available on each device are limited; otherwise, each client could obtain a proper model without collaboration or communication with others. Therefore, FL¹ yields a common global model with better generalization across clients [46] than individual training. In heterogeneous data setups, where clients hold samples from non-identical data distributions, a common (global) model may perform poorly on the local data points of each client. For instance, consider the next-word prediction task on a smart keyboard [28], where each client has a unique writing style or emphasis on the vocabulary domain. In this example, the corresponding mobile application is supposed to suggest a set of words that are likely to be selected as the next word in the sentence. This scenario clearly illustrates a heterogeneous data setup with limited samples on each client's device. Thus, if each client trains a model independently, without collaborating with the other clients, the model will likely perform poorly on new data due to the limited sample size. Hence, the question arises: what happens if the clients hold data samples from similar (but not identical) distributions?
In FL with heterogeneous data, an ideal scenario is to learn a globally common model that is easily adaptable to the local data of each client, i.e., model fusion. This approach is known as Personalized Federated Learning (PFL), which strives to exploit both the shared and unshared information in the data of all clients. One solution to model fusion in PFL is to apply transfer learning [12, 73] (e.g., fine-tuning) to a model jointly trained under FL. Interestingly, the centralized version of this problem has been extensively studied in Meta-Learning [66] and Multi-Task Learning [50], where the goal is to obtain a meta (global) model that performs well on multiple tasks after (potentially) minimal adaptation. In particular, Model-Agnostic Meta-Learning (MAML) [21, 56] proposes an optimization-based formulation that aims to find an initial meta-model that performs well after one or a few steps of (stochastic) gradient descent. The key property of MAML is that it accounts for this fine-tuning step during the learning process. Multiple studies have investigated the convergence and generalization of MAML [8, 16, 18, 19, 22, 31] for various problems and setups. Fallah et al. [17] suggest the MAML formulation as a potential solution for PFL and propose the Per-FedAvg algorithm for collaborative learning with a MAML-based personalized cost function.
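To give a rough idea before the formal treatment in Section 2, the MAML-type objective underlying Per-FedAvg can be sketched as follows, with illustrative notation ($N$ clients, local losses $f_i$, step size $\alpha$) that may differ slightly from the one adopted later:
\[
\min_{w \in \mathbb{R}^d} \; F(w) := \frac{1}{N}\sum_{i=1}^{N} f_i\bigl(w - \alpha \nabla f_i(w)\bigr),
\]
where $\alpha > 0$ is the fine-tuning (adaptation) step size; each client evaluates its loss after one local gradient step from the shared model $w$.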
Dinh et al. [13] present the pFedMe algorithm for PFL by adopting a different formulation for personalization, namely Moreau Envelopes (ME). The resulting method solves a bi-level optimization problem with personalized parameters that are regularized to stay close to the global model, as sketched below. We will elaborate on these two formulations (MAML & ME) in Section 2. Additionally, several recent works have approached PFL mainly through optimization-based [5, 10, 20, 23, 26, 27, 29, 45, 46, 64, 72] or structure-based [9, 59, 65] techniques.
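As a brief, informal sketch of the ME-based formulation (again with illustrative notation that may differ from Section 2), pFedMe minimizes the average of the clients' Moreau envelopes:
\[
\min_{w \in \mathbb{R}^d} \; \frac{1}{N}\sum_{i=1}^{N} F_i(w), \qquad F_i(w) := \min_{\theta_i \in \mathbb{R}^d} \Bigl\{ f_i(\theta_i) + \frac{\lambda}{2}\,\lVert \theta_i - w \rVert^2 \Bigr\},
\]
where $\theta_i$ is the personalized model of client $i$ and the regularization weight $\lambda > 0$ controls how strongly the personalized models are pulled toward the global model $w$.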
Scalability to large-scale setups with potentially many clients is another major challenge for FL. Most algorithms proposed in this setting require synchronous communication between the server and the clients [10, 13, 17, 23, 38, 43, 47]. Such constraints impose considerable delays on the learning progress, since increasing the concurrency of synchronous updates decreases the training speed and quality. For example, limited communication bandwidth, limited computation power, and communication failures incur large delays in the training process. In cross-device FL, devices are naturally prone to
¹ We refer to Federated Learning with no personalization as FL.