
of adapters through the lens of network pruning (Mozer and Smolensky, 1989; Janowsky, 1989), which reduces the size of neural networks by pruning redundant parameters and training the remaining ones, thereby improving network efficiency. We call such pruned adapters SparseAdapter.
Specifically, we systematically investigate five representative pruning methods in §2.2 to examine at what sparse ratio adapters can maintain their effectiveness. Note that, to preserve the efficient nature of adapters, we prune all adapters at initialization so that no extra computational cost is incurred. We find that ① SparseAdapter can achieve comparable (or even better) performance than standard adapters when the sparse ratio reaches up to 80%. Such encouraging performance holds even under random pruning (see Figure 2) on the GLUE benchmark (Wang et al., 2018).
Based on these insights, we introduce a frustratingly easy setting, namely Large-Sparse, for SparseAdapter. We find that ② scaling up the bottleneck dimension of SparseAdapter with a correspondingly larger sparse ratio (to keep the same parameter budget, e.g., 2× dimension scaling with a 50% sparse ratio) effectively yields significant improvements by augmenting the model capacity.
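For intuition on the parameter budget, a standard bottleneck adapter contributes roughly 2 × d × r weights (one down- and one up-projection; biases omitted). Under this rough count, and with illustrative values d = 768 and r = 64 chosen here for the example rather than taken from our experiments, the following Python sketch shows that doubling r while pruning 50% of the weights leaves the number of nonzero parameters unchanged:

# Rough nonzero-parameter budget of a bottleneck adapter (biases omitted):
# down-projection (d x r) + up-projection (r x d) = 2 * d * r weights.
def adapter_nonzero_params(d: int, r: int, sparsity: float) -> int:
    return int(2 * d * r * (1.0 - sparsity))

d = 768  # model dimension of, e.g., BERT-base (illustrative choice)
standard = adapter_nonzero_params(d, r=64, sparsity=0.0)       # standard adapter
large_sparse = adapter_nonzero_params(d, r=128, sparsity=0.5)  # 2x dimension, 50% sparse
assert standard == large_sparse  # identical budget, larger effective capacity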
We validate our proposed SparseAdapter on five advanced adapters, i.e., Houlsby (Houlsby et al., 2019), Pfeiffer (Pfeiffer et al., 2020b), LoRA (Hu et al., 2021), MAM Adapter (He et al., 2022), and AdapterFusion (Pfeiffer et al., 2021), spanning both natural language understanding (GLUE and SQuAD) and generation (XSum) benchmarks. We show that with proper sparsity, e.g., 40%, SparseAdapter consistently outperforms the corresponding baselines. With our Large-Sparse setting, SparseAdapter can even outperform full fine-tuning by a clear margin, e.g., 79.6 vs. 79.0 in Figure 1.
2 Methodology
Motivation. Adapters are bottleneck modules plugged into PLMs, with bottleneck dimension r and model dimension d. In standard Adapter Tuning, only the adapter layers are trainable while the parameters of the original model are frozen, so the number of trainable parameters determines the capacity of the adapters. The common recipe to augment this capacity is to increase the bottleneck dimension, which requires more computation and thus runs against the original intention of adapters.
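For concreteness, a minimal PyTorch sketch of such a bottleneck module (a down-projection from d to r, a nonlinearity, an up-projection back to d, and a residual connection) is given below. This is only an illustrative sketch; the exact placement, activation, and parameterization differ across the adapter variants studied in this work.

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    # Minimal bottleneck adapter: d -> r -> d with a residual connection.
    # Only these weights are trained; the surrounding PLM stays frozen.
    def __init__(self, d: int, r: int):
        super().__init__()
        self.down = nn.Linear(d, r)  # down-projection to bottleneck dimension r
        self.act = nn.ReLU()
        self.up = nn.Linear(r, d)    # up-projection back to model dimension d

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the frozen PLM representation intact.
        return h + self.up(self.act(self.down(h)))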
To check whether augmenting adapters by increasing their parameters is the optimal choice, we revisit the defining property of adapters, i.e., parameter efficiency, by pruning the redundant parameters. As shown in Figure 2, randomly pruned adapters can achieve comparable or even better performance than standard adapters, which indicates the existence of redundant parameters. The comparable performance holds even under 80% sparsity. This preliminary study urges us to investigate research questions ① and ②. We approach them by systematically investigating the effects of different pruning methods.
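As a rough illustration of the random-pruning baseline in Figure 2 (a simplified sketch, not necessarily the exact implementation), a fixed binary mask can be sampled for each adapter weight matrix at initialization and enforced throughout fine-tuning:

import torch

def random_prune_(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Zero out a `sparsity` fraction of entries in place and return the mask.
    mask = (torch.rand_like(weight) >= sparsity).float()
    weight.data.mul_(mask)
    return mask

# The mask must also be enforced during training, e.g. by masking the gradient
# so pruned entries never become nonzero again:
# weight.register_hook(lambda grad: grad * mask)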
Figure 2: Comparison between randomly pruned adapters and standard adapters on datasets from GLUE.
Figure 3: Schematic comparison of (a) standard Adapter Tuning and (b) our proposed SparseAdapter Tuning.
2.1 Pruning Adapters at Initialization
As shown in Figure 3, we intend to prune out redundant parameters and then fine-tune the SparseAdapter, instead of directly tuning all parameters (standard Adapter Tuning). By pruning adapters at initialization, we can discard the redundant parameters at an early stage and avoid the time-consuming iterative pruning process (Frankle and Carbin, 2018). Specifically, considering an adapter with weights w_l inserted in the layer