Exploring Parameter-Efficient Fine-Tuning to
Enable Foundation Models in Federated Learning
1st Guangyu Sun
Center for Research in Computer Vision
University of Central Florida
Orlando, FL, USA
guangyu.sun@ucf.edu
2nd Umar Khalid
Center for Research in Computer Vision
University of Central Florida
Orlando, FL, USA
umar.khalid@ucf.edu
3rd Matias Mendieta
Center for Research in Computer Vision
University of Central Florida
Orlando, FL, USA
matias.mendieta@ucf.edu
4th Pu Wang
Department of Computer Science
University of North Carolina at Charlotte
Charlotte, NC, USA
pu.wang@uncc.edu
5th Chen Chen
Center for Research in Computer Vision
University of Central Florida
Orlando, FL, USA
chen.chen@crcv.ucf.edu
Abstract—Federated learning (FL) has emerged as a promising
paradigm for enabling the collaborative training of models
without centralized access to the raw data on local devices. In the
typical FL paradigm (e.g., FedAvg), model weights are exchanged between
the server and participating clients each round. Recently, the
use of small pre-trained models has been shown to be effective
in federated learning optimization and improving convergence.
However, recent state-of-the-art pre-trained models are getting
more capable but also have more parameters, known as the
“Foundation Models.” In conventional FL, sharing the enormous
model weights can quickly put a massive communication burden
on the system, especially if more capable models are employed.
Can we find a solution to enable those strong and readily available
pre-trained models in FL to achieve excellent performance while
simultaneously reducing the communication burden? To this
end, we investigate the use of parameter-efficient fine-tuning
in federated learning and thus introduce a new framework:
FedPEFT. Specifically, we systematically evaluate the performance
of FedPEFT across a variety of client stability, data distribution,
and differential privacy settings. By only locally tuning and
globally sharing a small portion of the model weights, significant
reductions in the total communication overhead can be achieved
while maintaining competitive or even better performance in a
wide range of federated learning scenarios, providing insight into
a new paradigm for practical and effective federated systems.
Index Terms—federated learning, parameter-efficient fine-
tuning, vision transformers, image classification, action recog-
nition
I. INTRODUCTION
Federated learning (FL) [1] has become increasingly preva-
lent in the research community, having the goal of enabling
collaborative training with a network of clients without need-
ing to share any private data. One key challenge for this
training paradigm is overcoming data heterogeneity. The par-
ticipating devices in a federated system are often deployed
across a variety of users and environments, resulting in a
non-IID data distribution. As the level of heterogeneity in-
tensifies, optimization becomes increasingly difficult. Various
techniques have been proposed for alleviating this issue. These
primarily consist of modifications to the local or global ob-
jectives through proximal terms, regularization, and improved
aggregation operations [2, 3, 4, 5, 6]. More recently, some
works have investigated the role of model initialization in
mitigating such effects [7, 8]. Inspired by the common usage
of pre-trained models for facilitating strong transfer learning
in centralized training, researchers employed widely available
pre-trained weights for initialization in FL and were able
to close much of the gap between federated and centralized
performance.
Still, while pre-trained initializations are effective for alle-
viating heterogeneity effects in FL, another key challenge is
left unaddressed: communication constraints. These are
often the primary bottleneck for real-world federated systems
[9]. In the standard FL framework [10], updates for all
model parameters are sent back and forth between the server
and participating clients each round. This can quickly put a
massive communication burden on the system, especially if
more capable models beyond very small MLPs are used.
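Concretely, using notation introduced here for illustration (S_t for the set of clients sampled in round t, n_k for the number of local samples on client k), the standard FedAvg update aggregates the locally trained weights as

\[
w^{t+1} = \sum_{k \in S_t} \frac{n_k}{\sum_{j \in S_t} n_j}\, w_k^{t+1},
\]

so both the download of w^t and the upload of w_k^{t+1} scale with the full parameter count of the model.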
When employing strong pre-trained models, the number
of parameters can be large, such as for current state-of-the-
art transformers. For example, ViT-Base (ViT-B) [11] has 86
million parameters, to say nothing of the recent progress in
large foundation models (e.g., GPT-4 [12] is reported to have
more than 1 trillion parameters). Such large models would simply
exacerbate the communication overhead to insurmountable
levels. As a compromise, most existing FL work focuses on
the performance of smaller Convolutional Neural Networks
(e.g., ResNet [13]) on smaller datasets (e.g., CIFAR-10 [14],
EMNIST [15]). Considering the thriving progress in large
pre-trained Foundation Models [16], an efficient framework
enabling these large pre-trained models will be significant for
the FL community.
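For a rough sense of scale, the sketch below gives a back-of-the-envelope estimate of the per-direction, per-client payload in one round. It is our own illustration, assuming 32-bit floats and an illustrative PEFT subset of roughly 0.17M trainable parameters (not a measured value from the paper):

```python
# Back-of-the-envelope communication cost per client per round.
# Assumptions: fp32 weights, ViT-Base with ~86M parameters, and an
# illustrative PEFT subset of ~0.17M trainable parameters.
BYTES_PER_PARAM = 4  # 32-bit floats

def payload_mib(num_params: int) -> float:
    """Size in MiB of transmitting one copy of `num_params` parameters."""
    return num_params * BYTES_PER_PARAM / 2**20

full_model = 86_000_000   # full fine-tuning: every weight is communicated
peft_subset = 170_000     # e.g., bias terms or small adapters (illustrative)

print(f"Full model:  ~{payload_mib(full_model):.0f} MiB per direction")   # ~328 MiB
print(f"PEFT subset: ~{payload_mib(peft_subset):.2f} MiB per direction")  # ~0.65 MiB
```

The roughly three orders of magnitude between the two payloads is what motivates the framework explored in this paper.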
[Fig. 1: Process in a federated learning communication round with M participating clients. We use ViT-Base as an instance to analyze the communication costs. (a) Conventional federated learning framework, where the entire model (~328 MB) is sent during communication. (b) FedPEFT, our proposed parameter-efficient framework for federated learning, where only a small set of trainable weights (~0.68 MB) is communicated while the pre-trained backbone remains frozen.]

Based on previous studies of centralized training [17,
18, 19, 20], we note that pre-trained models have strong
representations, and updating all the weights during fine-tuning
is often not necessary. Various parameter-efficient fine-tuning
methods (e.g., fine-tuning only a subset of the parameters or
the bias terms) for centralized training have been proposed in
the literature and show that successful and efficient adaptation
is possible, even under domain shift [18, 17, 19]. We reason
that such insight is applicable to FL, where each client can be
thought of as a shifted domain on which we are fine-tuning. By
leveraging pre-trained weights, it may be possible to simply
update a small portion of the weights for each client. This will
significantly reduce the communication burden on the system,
as the updates communicated with the server will consist of
just a fraction of the total model parameters.
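As a minimal sketch of one such method (bias-only tuning in PyTorch, in the spirit of BitFit; this is our illustration, and the head_name default follows common timm-style ViT naming rather than anything specified in the paper), selecting the small trainable subset could look like:

```python
import torch.nn as nn

def mark_peft_parameters(model: nn.Module, head_name: str = "head") -> list:
    """Freeze the pre-trained backbone; keep only bias terms and the
    classification head trainable. Other PEFT methods would instead keep
    adapter or prompt parameters trainable."""
    trainable = []
    for name, param in model.named_parameters():
        if name.endswith(".bias") or name.startswith(head_name):
            param.requires_grad = True
            trainable.append(name)
        else:
            param.requires_grad = False
    return trainable

# Only the selected subset is optimized locally and later communicated, e.g.:
# optimizer = torch.optim.SGD(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```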
Can we reap these potential communication benefits while
still achieving strong performance in FL? Unfortunately, op-
erating conditions in FL are difficult, requiring successful
convergence under varying data heterogeneity levels, random
client availability, and differential privacy procedures. There-
fore, we are unable to properly assess this possibility of
benefit based on existing literature, as diverse parameter-
efficient fine-tuning methods have not been systematically
explored in such situations in FL. To fill this gap, we explore
the viability of a Federated Parameter-Efficient Fine-Tuning
(FedPEFT) framework with a systematic empirical study on a
comprehensive set of FL scenarios, including a communication
analysis of the cost each method incurs to enable pre-trained
models, a capability analysis of each method under an unlimited
communication budget, and a robustness analysis of each method
when additional constraints (i.e., differential privacy or data
scarcity) are applied. The framework is illustrated
in Fig. 1. We deploy parameter-efficient fine-tuning methods
to adapt pre-trained models and enable massive reductions in
communication overheads.
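To make the resulting communication pattern concrete, the following sketch (our illustration under the assumptions above, not the authors' released implementation; local_train_fn is a hypothetical helper that fine-tunes one client and returns its sample count) runs one FedPEFT round in which only the trainable subset is downloaded, locally updated, uploaded, and averaged, while the frozen backbone never leaves the clients:

```python
import copy

def trainable_state(model):
    """Extract only the trainable (PEFT) parameters for communication."""
    return {name: p.detach().cpu().clone()
            for name, p in model.named_parameters() if p.requires_grad}

def load_trainable_state(model, state):
    """Load communicated PEFT parameters; frozen weights stay untouched."""
    model.load_state_dict(state, strict=False)

def fedpeft_round(global_model, clients, local_train_fn):
    """One FedPEFT communication round over the participating clients."""
    global_peft = trainable_state(global_model)             # download payload
    updates, sizes = [], []
    for client in clients:
        local_model = copy.deepcopy(global_model)
        load_trainable_state(local_model, global_peft)
        sizes.append(local_train_fn(local_model, client))   # local fine-tuning
        updates.append(trainable_state(local_model))        # upload payload
    total = float(sum(sizes))
    averaged = {name: sum(u[name] * (s / total)             # FedAvg, but only
                          for u, s in zip(updates, sizes))  # on the PEFT subset
                for name in global_peft}
    load_trainable_state(global_model, averaged)
    return global_model
```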
The contributions of this paper are summarized as follows:
• We explore several PEFT methods in FL as the FedPEFT
framework to simultaneously address data heterogeneity
and communication challenges. FedPEFT allows for the uti-
lization of powerful pre-trained models in federated learning
while keeping communication costs extremely low.
• We present a systematic study of the FedPEFT framework
with various fine-tuning methods under heterogeneous data
distributions, client availability ratios, and increasing de-
grees of domain gap relative to the pre-trained represen-
tations on both image and video domains, showing the
capability of FedPEFT. (Sections IV-B and IV-C)
• To ensure FedPEFT is practical for the complex environ-
ments of FL, we further analyze the robustness of FedPEFT
in low-data regimes and under differential privacy operations.
(Section IV-D)
II. RELATED WORK
Federated Learning. FL is a decentralized training paradigm
composed of two procedures: local training and global aggre-
gation. Therefore, most existing work focuses on either local
training [4, 21, 2] or global aggregation [22, 23] to learn a bet-
ter global model. Another line of work approaches this problem
by applying different initializations to help both procedures.
[8] shows that initializing the model with pre-trained weights
can make the global aggregation of FedAvg more stable, even
when pre-trained with synthetic data. Furthermore, [7] demonstrates
the effectiveness of pre-training with different local and global
operations. However, these works focus purely on the effect
of initialization in a standard FedAvg framework and do not
consider the communication constraints of the system. Our
work pushes the envelope further by leveraging strong pre-
trained models (even large, capable transformers) in FL while
effectively handling the communication issue via parameter-
efficient fine-tuning.
Communication in Federated Learning. Communication
constraints are a primary bottleneck in federated learning.
To reduce the communication cost, several previous works
leverage model compression techniques [24, 25]. Such works
do not change the training paradigm but rather post-process
the local model to reduce communication costs. For instance,
[24] proposes approaches that parameterize the model with
fewer variables and compress the model in an encoding-
decoding fashion. However, the minimum payload required to pre-
serve all of the model's information remains high for today's
large models. Meanwhile, another line of work changes the
training paradigm by learning federated ensembles based on
several pre-trained base models [26]. In this way, only the
mixing weights of the base models are communicated in each
round, which avoids the burden of downloading and uploading
the entire model.
However, the base models are not directly trained, and the final
performance depends heavily on the quality of the base models.
Moreover, model ensembles require additional computation time
and storage, both of which are often limited on the client side.
Our framework follows this line of work in not transmitting the
entire model, but we use only one pre-trained model instead of
several base models.