
[Fig. 1 graphic: two-panel diagram. (a) Conventional federated learning: the Server and Clients 1..M exchange the entire model (~328MB for ViT-Base) via upload/download around local training and aggregation. (b) FedPEFT: only the trainable weights (~0.68MB) are exchanged, while the frozen weights remain local.]
Fig. 1: Process in a federated learning communication round
with M participating clients. We use ViT-Base as an instance
to analyze the communication costs. (a) Conventional federated
learning framework, where the entire model is transmitted during
communication. (b) FedPEFT, our proposed parameter-efficient
framework for federated learning.
is often not necessary. Various parameter-efficient fine-tuning
methods (e.g., fine-tuning only a subset of the parameters or
only the bias terms) have been proposed for centralized training
and show that successful, efficient adaptation is possible even
under domain shift [18, 17, 19]. We reason that this insight
carries over to FL, where each client can be viewed as a shifted
domain on which we fine-tune. By leveraging pre-trained weights,
it may be possible to update only a small portion of the weights
on each client, significantly reducing the communication burden
on the system, since the updates exchanged with the server consist
of just a fraction of the total model parameters.
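To make this concrete, the minimal sketch below (our own illustrative code, not the paper's implementation) freezes a ViT-Base backbone from torchvision and leaves only bias terms and the classification head trainable, so a client would only ever need to upload that small trainable subset; the helper `make_bias_only_trainable` and its `head_keyword` argument are hypothetical names introduced here for illustration.

```python
# Minimal sketch of bias-only fine-tuning: freeze a ViT-Base backbone and leave
# only bias terms and the task head trainable, then report the fraction of
# parameters a client would actually need to upload.
import torch.nn as nn
from torchvision.models import vit_b_16  # any pre-trained backbone works


def make_bias_only_trainable(model: nn.Module, head_keyword: str = "heads") -> nn.Module:
    """Freeze all weights except bias terms and the classification head."""
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith(".bias") or head_keyword in name
    return model


model = make_bias_only_trainable(vit_b_16())  # pass weights=... for a pre-trained init
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Communicated fraction: {trainable / total:.2%} of {total / 1e6:.0f}M parameters")
```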
Can we reap these potential communication benefits while
still achieving strong performance in FL? Unfortunately, operating
conditions in FL are difficult, requiring successful convergence
under varying levels of data heterogeneity, random client
availability, and differential privacy procedures. We therefore
cannot properly assess this potential benefit from the existing
literature, as diverse parameter-efficient fine-tuning methods
have not been systematically explored under such conditions in
FL. To fill this gap, we explore the viability of a Federated
Parameter-Efficient Fine-Tuning (FedPEFT) framework with a
systematic empirical study on a comprehensive set of FL scenarios,
including a communication analysis of the cost each method incurs
to enable pre-trained models, a capability analysis of each method
under an unlimited communication budget, and a robustness analysis
of each method when additional constraints (i.e., differential
privacy or data scarcity) are applied. The framework is illustrated
in Fig. 1. We deploy parameter-efficient fine-tuning methods
to adapt pre-trained models and enable massive reductions in
communication overhead.
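The sketch below illustrates one such communication round under FedPEFT, assuming each client already holds the frozen pre-trained backbone locally; the function names (`trainable_state`, `fedavg`, `communication_round`) and the plain sample-weighted averaging over the trainable subset are simplifications for exposition, not the exact protocol used in our experiments.

```python
# Illustrative sketch of one FedPEFT-style round: only the trainable subset of
# parameters is exchanged, while the frozen pre-trained backbone never leaves
# the client (mirroring Fig. 1(b), where this subset is ~0.68MB for ViT-Base).
import copy


def trainable_state(model):
    """Collect only the parameters that need to be communicated."""
    return {n: p.detach().clone() for n, p in model.named_parameters() if p.requires_grad}


def fedavg(states, num_samples):
    """Sample-weighted average of the clients' trainable parameters."""
    total = float(sum(num_samples))
    return {
        name: sum(w * s[name] for w, s in zip(num_samples, states)) / total
        for name in states[0]
    }


def communication_round(global_model, clients, local_train_fn):
    broadcast = trainable_state(global_model)              # download: trainable subset only
    uploads, sizes = [], []
    for client in clients:                                 # sampled available clients
        local_model = copy.deepcopy(global_model)          # frozen backbone stays on device
        local_model.load_state_dict(broadcast, strict=False)
        local_train_fn(local_model, client.data)           # standard local optimization
        uploads.append(trainable_state(local_model))       # upload: trainable subset only
        sizes.append(len(client.data))
    global_model.load_state_dict(fedavg(uploads, sizes), strict=False)
    return global_model
```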
The contributions of this paper are summarized as follows:
• We explore several PEFT methods in FL within the FedPEFT
framework to simultaneously address data heterogeneity
and communication challenges. FedPEFT allows for the uti-
lization of powerful pre-trained models in federated learning
while keeping communication costs extremely low.
• We present a systematic study of the FedPEFT framework
with various fine-tuning methods under heterogeneous data
distributions, client availability ratios, and increasing de-
grees of domain gap relative to the pre-trained represen-
tations on both image and video domains, showing the
capability of FedPEFT. (Sections IV-B and IV-C)
• To ensure FedPEFT is practical for the complex environ-
ments of FL, we further analyze the robustness of FedPEFT
in low-data regimes and under differential privacy operations.
(Section IV-D)
II. RELATED WORK
Federated Learning. FL is a decentralized training paradigm
composed of two procedures: local training and global aggre-
gation. Therefore, most existing work focuses on either local
training [4, 21, 2] or global aggregation [22, 23] to learn a bet-
ter global model. Another line of work approaches this problem
by applying different initializations to aid both procedures.
[8] shows that initializing the model with pre-trained weights
can make the global aggregation of FedAvg more stable, even
when pre-trained with synthetic data. Furthermore, [7] presents
the effectiveness of pre-training with different local and global
operations. However, these works focus purely on the effect
of initialization in a standard FedAvg framework and do not
consider the communication constraints of the system. Our
work pushes the envelope further by leveraging strong pre-
trained models (even large, capable transformers) in FL while
effectively handling the communication issue via parameter-
efficient fine-tuning.
Communication in Federated Learning. Communication
constraints are a primary bottleneck in federated learning.
To reduce the communication cost, several previous works
leverage model compression techniques [24, 25]. Such works
do not change the training paradigm but rather post-process
the local model to reduce communication costs. For instance,
[24] proposes approaches that parameterize the model with
fewer variables and compress the model in an encoding-
decoding fashion. However, the payload required to preserve
all of the model's information remains large for today's
large models. Meanwhile, another line of work changes the
training paradigm by learning federated ensembles based on
several pre-trained base models [26]. In this way, only the
mixing weights of the base models are communicated in each
round, avoiding the burden of downloading and uploading the
entire model. However, the base models themselves are not
trained, so the final performance depends heavily on their
quality. Moreover, model ensembles demand additional compute
time and memory, both of which are often limited on the client
side. Our framework follows this line of work in not
transmitting the entire model,
but we use only one pre-trained model instead of several base