SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters

Shwai He1,4, Liang Ding1†, Daize Dong4, Miao Zhang2, Dacheng Tao1,3
1JD Explore Academy   2Aalborg University   3The University of Sydney
4University of Electronic Science and Technology of China
shwai.he@gmail.com, dingliang1@jd.com, dzdong2019@gmail.com, miaoz@cs.aau.dk, dacheng.tao@gmail.com

Work was done when Shwai was interning at JD Explore Academy.
†Corresponding author.
Abstract

Adapter Tuning, which freezes the pretrained language models (PLMs) and fine-tunes only a few extra modules, has become an appealing, efficient alternative to full model fine-tuning. Although computationally efficient, recent adapters often increase parameters (e.g., the bottleneck dimension) to match the performance of full model fine-tuning, which we argue goes against their original intention. In this work, we re-examine the parameter-efficiency of adapters through the lens of network pruning (we name this plug-in concept SparseAdapter) and find that SparseAdapter can achieve comparable or better performance than standard adapters when the sparse ratio reaches up to 80%. Based on our findings, we introduce an easy but effective setting, "Large-Sparse", to improve the model capacity of adapters under the same parameter budget. Experiments on five competitive adapters upon three advanced PLMs show that, with a proper sparse method (e.g., SNIP) and ratio (e.g., 40%), SparseAdapter can consistently outperform its corresponding counterparts. Encouragingly, with the Large-Sparse setting, we obtain further appealing gains, even outperforming full fine-tuning by a large margin. Our code will be released at: https://github.com/Shwai-He/SparseAdapter.
1 Introduction

The "pretrain-finetune" paradigm has become the de facto standard in the natural language processing (NLP) community (Devlin et al., 2019; Liu et al., 2019). Given a pretrained language model (PLM), the conventional fine-tuning manner is to tune all of its parameters, i.e., full fine-tuning, for each downstream task (Devlin et al., 2019). Considering the ever-increasing size of PLMs (Brown et al., 2020), full fine-tuning has become prohibitively expensive, limiting the applicability of PLMs to a broader range of tasks. Hence, various parameter-efficient fine-tuning approaches have been explored (Houlsby et al., 2019; Hu et al., 2021; Zhong et al., 2022), among which Adapter Tuning, which tunes only extra lightweight modules and keeps the original PLM frozen, has attracted great attention.

Figure 1: Performance of different parameter-efficient tuning methods on tasks from the GLUE benchmark with a RoBERTa-base encoder. We report the performance of Houlsby Adapters, Pfeiffer Adapters, and LoRA, as well as their counterparts under our plug-in method SparseAdapter, where the normal sparse setting (Tables 1 and 4) is denoted with the prefix "S-" and the Large-Sparse setting (Table 3) with "LS-".
Despite this progress, existing adapters match the performance of full fine-tuning by increasing the bottleneck dimension (Houlsby et al., 2019; Wang et al., 2022). This increases the overall parameters and FLOPs, violating the original intention of adapters. In this work, we instead investigate the parameter-efficiency property (the nature of adapters) to answer the following questions: ① Can the current adapters be made even more parameter-efficient? ② How can we increase the representation capacity of adapters within the original parameter budget?
To this end, we examine the parameter-efficiency
of adapters through the lens of network pruning (Mozer and Smolensky, 1989; Janowsky, 1989), which reduces the size of neural networks by pruning redundant parameters and training the remaining ones, thereby improving network efficiency. We call such pruned adapters SparseAdapter. Specifically, we systematically investigate five representative pruning methods in §2.2 to check at what sparse ratio the adapters can maintain their effectiveness. Note that, to preserve the efficient nature of adapters, we prune all adapters at initialization so that no extra computational cost is incurred. We find that ① SparseAdapter can achieve comparable (or even better) performance than standard adapters when the sparse ratio reaches up to 80%. Such encouraging performance holds even with the random pruning method (see Figure 2) on the GLUE benchmark (Wang et al., 2018). Based on these insights, we introduce a frustratingly easy setting, namely Large-Sparse, for SparseAdapter. We find that ② scaling up the bottleneck dimension of SparseAdapter with a correspondingly larger sparse ratio (to keep the same parameter budget, e.g., 2× dimension scaling with a 50% sparse ratio) can effectively yield significant improvements by augmenting the model capacity.
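To make the Large-Sparse budget argument concrete, the following is a minimal sketch (our own illustration, not code from the paper) that counts the nonzero weights of a single bottleneck adapter, assuming roughly 2·r·d projection weights per adapter and ignoring biases; the width d = 768 and bottleneck r = 64 are illustrative values only.

```python
def nonzero_adapter_params(d_model: int, bottleneck: int, sparse_ratio: float) -> int:
    """Approximate nonzero weights of one bottleneck adapter.

    Counts only the down-projection (d_model x bottleneck) and the
    up-projection (bottleneck x d_model); biases are ignored.
    """
    dense = 2 * d_model * bottleneck
    return int(dense * (1.0 - sparse_ratio))

d, r = 768, 64  # illustrative model width and bottleneck dimension

standard = nonzero_adapter_params(d, r, sparse_ratio=0.0)          # standard adapter
large_sparse = nonzero_adapter_params(d, 2 * r, sparse_ratio=0.5)  # 2x dimension, 50% pruned

assert standard == large_sparse  # identical parameter budget
print(standard, large_sparse)    # 98304 98304
```

More generally, under this approximation, scaling the bottleneck by a factor k while pruning a fraction 1 − 1/k of the weights leaves the nonzero budget unchanged.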
We validate the concept of our proposed SparseAdapter upon five advanced adapters, i.e., Houlsby (Houlsby et al., 2019), Pfeiffer (Pfeiffer et al., 2020b), LoRA (Hu et al., 2021), MAM Adapter (He et al., 2022) and AdapterFusion (Pfeiffer et al., 2021), spanning both natural language understanding (GLUE and SQuAD) and generation (XSum) benchmarks. We show that, with a proper sparsity, e.g., 40%, SparseAdapter consistently outperforms the corresponding counterpart baselines. And with our Large-Sparse setting, SparseAdapter can even beat full fine-tuning significantly, e.g., 79.6 vs. 79.0 in Figure 1.
2 Methodology

Motivation. Adapters are bottleneck modules plugged into PLMs, with bottleneck dimension r and model dimension d. In standard Adapter Tuning, only the adapter layers are trainable while the parameters of the original PLM are frozen, so the number of trainable parameters determines the capacity of the adapters. The common recipe to augment this capacity is to increase the bottleneck dimension, which requires more computation, violating the original intention of adapters.
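For concreteness, below is a minimal PyTorch sketch of such a bottleneck adapter (a generic Houlsby-style block under our own assumptions, not necessarily the exact variant used in the paper): a down-projection from d to r, a nonlinearity, an up-projection back to d, and a residual connection, giving roughly 2·r·d trainable weights per adapter while the PLM itself stays frozen.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: project d -> r -> d and add a residual connection."""

    def __init__(self, d_model: int, bottleneck: int):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)  # d x r down-projection
        self.act = nn.GELU()                        # nonlinearity (an assumption; ReLU is also common)
        self.up = nn.Linear(bottleneck, d_model)    # r x d up-projection

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(self.act(self.down(hidden)))

adapter = BottleneckAdapter(d_model=768, bottleneck=64)  # illustrative sizes
trainable = sum(p.numel() for p in adapter.parameters() if p.requires_grad)
print(trainable)  # ~2 * r * d, plus biases
```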
To check whether augmenting adapters by increasing their parameters is the optimal choice, we revisit the nature of adapters, i.e., parameter efficiency, by pruning the redundant parameters. As shown in Figure 2, randomly pruned adapters can achieve comparable or even better performance than standard adapters, which indicates the existence of redundant parameters. This comparable performance holds even under 80% sparsity. This preliminary study urges us to investigate research questions ① and ②, which we approach by systematically investigating the effects of different pruning methods.
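As a reference for the random-pruning baseline behind Figure 2, here is a small sketch (again our own illustration) that draws a fixed binary mask at initialization with a target sparse ratio and applies it to an adapter weight matrix; keeping the mask fixed during fine-tuning ensures the pruned entries stay at zero and no extra cost is added later.

```python
import torch

def apply_random_pruning(weight: torch.Tensor, sparse_ratio: float) -> torch.Tensor:
    """Zero out roughly `sparse_ratio` of the entries of a weight tensor at initialization."""
    mask = (torch.rand_like(weight) >= sparse_ratio).float()  # 1 = keep, 0 = prune
    with torch.no_grad():
        weight.mul_(mask)  # prune once, before any fine-tuning
    return mask            # reuse the mask to keep pruned weights at zero during training

# Example: prune an illustrative 64 x 768 down-projection at an 80% sparse ratio.
down = torch.nn.Linear(768, 64)
mask = apply_random_pruning(down.weight, sparse_ratio=0.8)
print(round(1.0 - mask.mean().item(), 2))  # ~0.8 of the weights removed
```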
Figure 2: The comparison between randomly pruned
adapters and standard adapters on datasets from GLUE.
Figure 3: Schematic comparison of (a) standard Adapter Tuning, where the adapter is fine-tuned directly, and (b) SparseAdapter Tuning, where the adapter is pruned at initialization and the resulting SparseAdapter is then fine-tuned.
2.1 Pruning Adapters at Initialization

As shown in Figure 3, we intend to prune out redundant parameters and then fine-tune the SparseAdapter, instead of directly tuning all parameters as in standard Adapter Tuning. By pruning adapters at initialization, we can discard the redundant parameters at an early stage and avoid the time-consuming iterative pruning process (Frankle and Carbin, 2018). Specifically, considering an adapter with weights w_l inserted in the layer
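The exact scoring criterion is not spelled out in the excerpt above, so the following is only a generic sketch of one pruning-at-initialization method named earlier (SNIP): each adapter weight is scored by |w · ∂L/∂w| on a single mini-batch at initialization, the globally lowest-scoring fraction is pruned, and the resulting binary masks are kept fixed during fine-tuning. The function name and the global top-k selection are our own assumptions, not the paper's implementation.

```python
import torch

def snip_prune_at_init(loss: torch.Tensor, adapter_weights, sparse_ratio: float):
    """SNIP-style pruning of adapter weights at initialization.

    `loss` is the task loss of the frozen PLM with freshly inserted adapters on one
    mini-batch; `adapter_weights` is a list of the adapter weight tensors to prune.
    """
    grads = torch.autograd.grad(loss, adapter_weights)                # one backward pass
    scores = [(w * g).abs() for w, g in zip(adapter_weights, grads)]  # sensitivity |w * dL/dw|

    flat = torch.cat([s.flatten() for s in scores])
    keep = max(1, int(flat.numel() * (1.0 - sparse_ratio)))           # number of weights to keep
    threshold = torch.topk(flat, keep, largest=True).values.min()

    masks = [(s >= threshold).float() for s in scores]
    with torch.no_grad():
        for w, m in zip(adapter_weights, masks):
            w.mul_(m)                                                 # apply the mask once, at init
    return masks  # keep these fixed so fine-tuning adds no extra cost
```

In practice, the returned masks would also be re-applied after each optimizer step (or applied to the gradients) so that pruned positions remain zero throughout fine-tuning.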