
DyLoRA: Parameter-Efficient Tuning of Pretrained Models using
Dynamic Search-Free Low Rank Adaptation
Mojtaba Valipour1,2, Mehdi Rezagholizadeh2, Ivan Kobyzev2, Ali Ghodsi1
{mojtaba.valipour, ali.ghodsi}@uwaterloo.ca, {mehdi.rezagholizadeh, ivan.kobyzev}@huawei.com
1: University of Waterloo, 2: Huawei Noah’s Ark Lab
Abstract
With the ever-growing size of pretrained models (PMs), fine-tuning them has become more expensive and resource-hungry. As a remedy, low-rank adapters (LoRA) keep the main pretrained weights of the model frozen and only introduce learnable truncated-SVD modules (so-called LoRA blocks) into the model. While LoRA blocks are parameter-efficient, they suffer from two major problems: first, the size of these blocks is fixed and cannot be modified after training (for example, if we need to change the rank of the LoRA blocks, we have to retrain them from scratch); second, optimizing their rank requires an exhaustive search. In this work, we introduce a dynamic low-rank adaptation (DyLoRA) technique to address these two problems together. Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representations learned by the adapter module at different ranks during training. We evaluate our solution on natural language understanding (the GLUE benchmark) and language generation tasks (E2E, DART, and WebNLG) using pretrained models of different sizes, such as RoBERTa and GPT. Our results show that we can train dynamic, search-free models with DyLoRA at least 4 to 7 times faster than LoRA (depending on the task) without significantly compromising performance. Moreover, our models perform consistently well over a much wider range of ranks than LoRA.1
1 github.com/huawei-noah/KD-NLP/tree/main/DyLoRA
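To make the mechanism described in the abstract concrete, below is a minimal, hypothetical sketch (in PyTorch) of how a dynamic-rank LoRA layer of this kind could be organized: the adapter factors are allocated at a maximal rank r_max, each training forward pass uses only a randomly sampled prefix of rank b ≤ r_max, and at inference the adapter can be truncated to any rank in that range without retraining. The class name DyLoRALinear, the uniform rank sampling, and the alpha/r_max scaling are illustrative assumptions rather than the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn as nn


class DyLoRALinear(nn.Module):
    """A frozen linear layer plus a nested low-rank update W0 + B[:, :b] A[:b, :]."""

    def __init__(self, in_features, out_features, r_max=8, alpha=16):
        super().__init__()
        # Stand-in for a pretrained weight matrix; kept frozen during adaptation.
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)
        self.r_max = r_max
        self.scaling = alpha / r_max
        # LoRA factors at the maximal rank: A projects down, B projects up.
        self.lora_A = nn.Parameter(torch.randn(r_max, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r_max))

    def forward(self, x, rank=None):
        # Training: sample a rank b <= r_max and use only the first b rows of A
        # and the first b columns of B, so that truncating the adapter to any
        # rank b at deployment time still yields a trained module.
        if rank is None:
            rank = int(torch.randint(1, self.r_max + 1, (1,)))
        A = self.lora_A[:rank, :]
        B = self.lora_B[:, :rank]
        return x @ self.weight.T + self.scaling * (x @ A.T) @ B.T


# Usage: a training-style forward pass with a sampled rank, then inference
# with the same module truncated to a smaller, fixed rank.
layer = DyLoRALinear(768, 768, r_max=8)
x = torch.randn(4, 768)
y_train = layer(x)           # rank sampled uniformly from {1, ..., 8}
y_rank2 = layer(x, rank=2)   # the same adapter evaluated at rank 2
```

Because every rank-b prefix of the factors is trained to work on its own, a single trained module covers a whole range of ranks, which is what removes the need for a per-rank search.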
1 Introduction
Pre-training/fine-tuning has become a popular paradigm for solving many tasks in natural language processing (NLP) (Devlin et al., 2018; Liu et al., 2019; Brown et al., 2020) and computer vision (Simonyan and Zisserman, 2014; He et al., 2016; Howard et al., 2019; Bochkovskiy et al., 2020; Chen et al., 2020; Dosovitskiy et al., 2020). Pretrained models (PMs), such as pretrained language models (PLMs) (Devlin et al., 2018; Brown et al., 2020) and pretrained visual-language models (Lu et al., 2019; Li et al., 2019; Su et al., 2019; Xia et al., 2021), have advanced considerably in recent years. With the ever-growing size of these pretrained models, fine-tuning them on downstream tasks becomes more expensive. Moreover, as the ratio of the number of model parameters to the amount of labeled data increases, the fine-tuning process becomes more prone to overfitting (Karimi Mahabadi et al., 2021). There are two categories of solutions: first, model compression (Jafari et al., 2021; Chen et al., 2021); second, parameter-efficient tuning (PET) (Houlsby et al., 2019a; Karimi Mahabadi et al., 2021; Mao et al., 2021).
There are many different model compression techniques in the literature for Transformer-based models, such as matrix factorization (Noach and Goldberg, 2020; Tahaei et al., 2021), pruning (Wang et al., 2019), quantization (Tao et al., 2022; Prato et al., 2020), and knowledge distillation (Hinton et al., 2015; Li et al., 2021; Jafari et al., 2021; Passban et al., 2021; Rashid et al., 2021). There are also different types of PET techniques in the literature, such as low-rank adapters (Wang et al., 2020; Karimi Mahabadi et al., 2021; Houlsby et al., 2019b; Hu et al., 2021b) and prompt-based techniques (Lester et al., 2021).
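For reference, a low-rank adapter in the sense of LoRA (Hu et al., 2021b) reparameterizes the update to a frozen pretrained weight matrix $W_0$ as a product of two low-rank factors; the display below restates that standard formulation (with a generic input $x$) purely for exposition:

\[
h = W_0 x + \Delta W x = W_0 x + B A x, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k),
\]

so that only $A$ and $B$ are trained while $W_0$ stays frozen; the rank $r$ is the hyperparameter whose choice DyLoRA makes dynamic.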
Although model compression solutions have become well established in the literature in recent years, applying them to large language models can be very costly, because compression techniques usually require training (or fine-tuning) the original large model. A case in point is knowledge distillation, which relies on fine-tuning a large teacher model or even pre-training the student model, as suggested in (Jiao et al., 2019). Moreover, using compression techniques usually degrades the model's performance. PETs can be alternatives to the compres-