
[Fig. 1 graphic: two-panel diagram. (a) Conventional federated learning: the Server and Clients 1..M exchange the entire model (~328MB for ViT-Base) via upload/download around local training and aggregation. (b) FedPEFT: only the trainable weights (~0.68MB) are exchanged, while the frozen weights remain local.]
Fig. 1: Process in a federated learning communication round
with M participating clients. We use ViT-Base as an instance
to analyze the communication costs. (a) Conventional federated
learning framework, where the entire model is transmitted during
communication. (b) FedPEFT, our proposed parameter-efficient
framework for federated learning.
is often not necessary. Various parameter-efficient fine-tuning
methods (e.g., fine-tuning only a subset of the parameters or
only the bias terms) have been proposed for centralized training
and show that successful, efficient adaptation is possible even
under domain shift [18, 17, 19]. We reason that this insight
carries over to FL, where each client can be viewed as a shifted
domain on which we fine-tune. By leveraging pre-trained weights,
it may be possible to update only a small portion of the weights
on each client, significantly reducing the communication burden
on the system, since the updates exchanged with the server consist
of just a fraction of the total model parameters.
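To make this concrete, the minimal sketch below (our own illustrative code, not the paper's implementation) freezes a ViT-Base backbone from torchvision and leaves only bias terms and the classification head trainable, so a client would only ever need to upload that small trainable subset; the helper `make_bias_only_trainable` and its `head_keyword` argument are hypothetical names introduced here for illustration.

```python
# Minimal sketch of bias-only fine-tuning: freeze a ViT-Base backbone and leave
# only bias terms and the task head trainable, then report the fraction of
# parameters a client would actually need to upload.
import torch.nn as nn
from torchvision.models import vit_b_16  # any pre-trained backbone works


def make_bias_only_trainable(model: nn.Module, head_keyword: str = "heads") -> nn.Module:
    """Freeze all weights except bias terms and the classification head."""
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith(".bias") or head_keyword in name
    return model


model = make_bias_only_trainable(vit_b_16())  # pass weights=... for a pre-trained init
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Communicated fraction: {trainable / total:.2%} of {total / 1e6:.0f}M parameters")
```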
Can we reap these potential communication benefits while
still achieving strong performance in FL? Unfortunately, operating
conditions in FL are difficult, requiring successful convergence
under varying levels of data heterogeneity, random client
availability, and differential privacy procedures. We therefore
cannot properly assess this potential benefit from the existing
literature, as diverse parameter-efficient fine-tuning methods
have not been systematically explored under such conditions in
FL. To fill this gap, we explore the viability of a Federated
Parameter-Efficient Fine-Tuning (FedPEFT) framework with a
systematic empirical study on a comprehensive set of FL scenarios,
including a communication analysis of the cost each method incurs
to enable pre-trained models, a capability analysis of each method
under an unlimited communication budget, and a robustness analysis
of each method when additional constraints (i.e., differential
privacy or data scarcity) are applied. The framework is illustrated
in Fig. 1. We deploy parameter-efficient fine-tuning methods
to adapt pre-trained models and enable massive reductions in
communication overhead.
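The sketch below illustrates one such communication round under FedPEFT, assuming each client already holds the frozen pre-trained backbone locally; the function names (`trainable_state`, `fedavg`, `communication_round`) and the plain sample-weighted averaging over the trainable subset are simplifications for exposition, not the exact protocol used in our experiments.

```python
# Illustrative sketch of one FedPEFT-style round: only the trainable subset of
# parameters is exchanged, while the frozen pre-trained backbone never leaves
# the client (mirroring Fig. 1(b), where this subset is ~0.68MB for ViT-Base).
import copy


def trainable_state(model):
    """Collect only the parameters that need to be communicated."""
    return {n: p.detach().clone() for n, p in model.named_parameters() if p.requires_grad}


def fedavg(states, num_samples):
    """Sample-weighted average of the clients' trainable parameters."""
    total = float(sum(num_samples))
    return {
        name: sum(w * s[name] for w, s in zip(num_samples, states)) / total
        for name in states[0]
    }


def communication_round(global_model, clients, local_train_fn):
    broadcast = trainable_state(global_model)              # download: trainable subset only
    uploads, sizes = [], []
    for client in clients:                                 # sampled available clients
        local_model = copy.deepcopy(global_model)          # frozen backbone stays on device
        local_model.load_state_dict(broadcast, strict=False)
        local_train_fn(local_model, client.data)           # standard local optimization
        uploads.append(trainable_state(local_model))       # upload: trainable subset only
        sizes.append(len(client.data))
    global_model.load_state_dict(fedavg(uploads, sizes), strict=False)
    return global_model
```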
The contributions of this paper are summarized as follows:
• We explore several PEFT methods in FL within the FedPEFT
framework to simultaneously address data heterogeneity
and communication challenges. FedPEFT allows for the uti-
lization of powerful pre-trained models in federated learning
while keeping communication costs extremely low.
• We present a systematic study of the FedPEFT framework
with various fine-tuning methods under heterogeneous data
distributions, client availability ratios, and increasing de-
grees of domain gap relative to the pre-trained represen-
tations on both image and video domains, showing the
capability of FedPEFT. (Sections IV-B and IV-C)
• To ensure FedPEFT is practical for the complex environ-
ments of FL, we further analyze the robustness of FedPEFT
in low-data regimes and under differential privacy operations.
(Section IV-D)
II. RELATED WORK
Federated Learning. FL is a decentralized training paradigm
composed of two procedures: local training and global aggre-
gation. Therefore, most existing work focuses on either local
training [4, 21, 2] or global aggregation [22, 23] to learn a bet-
ter global model. Another line of work approaches this problem
by applying different initializations to aid both procedures.
[8] shows that initializing the model with pre-trained weights
can make the global aggregation of FedAvg more stable, even
when pre-trained with synthetic data. Furthermore, [7] presents
the effectiveness of pre-training with different local and global
operations. However, these works focus purely on the effect
of initialization in a standard FedAvg framework and do not
consider the communication constraints of the system. Our
work pushes the envelope further by leveraging strong pre-
trained models (even large, capable transformers) in FL while
effectively handling the communication issue via parameter-
efficient fine-tuning.
Communication in Federated Learning. Communication
constraints are a primary bottleneck in federated learning.
To reduce the communication cost, several previous works
leverage model compression techniques [24, 25]. Such works
do not change the training paradigm but rather post-process
the local model to reduce communication costs. For instance,
[24] proposes approaches that parameterize the model with
fewer variables and compress the model in an encoding-
decoding fashion. However, the payload required to preserve
all of the model's information remains large for today's
large models. Meanwhile, another line of work changes the
training paradigm by learning federated ensembles based on
several pre-trained base models [26]. In this way, only the
mixing weights of the base models are communicated in each
round, avoiding the burden of downloading and uploading the
entire model. However, the base models themselves are not
trained, so the final performance depends heavily on their
quality. Moreover, model ensembles demand additional compute
time and memory, both of which are often limited on the client
side. Our framework follows this line of work in not
transmitting the entire model,
but we use only one pre-trained model instead of several base