menting with large language models.
In this work, we opt for parameter-efficient tuning approaches (Houlsby et al., 2019; Li and Liang, 2021; Guo et al., 2021; Hu et al., 2022; Zaken et al., 2022) for the efficient and accurate prediction of inter-task transferability. Our key insight is that the task-specific parameters updated in parameter-efficient tuning methods are likely to encode high-density task-specific information, since
they are used as a query for retrieving task-related
knowledge in a frozen pretrained language model.
Therefore, we propose to directly use task-specific
parameters learned via parameter-efficient tuning
on source/target datasets as task embeddings, as
shown in Figure 1. Compared to task embeddings obtained by calculating the Fisher information matrix of a fine-tuned model (Achille et al., 2019; Vu et al., 2020), efficiently tuned parameters are of
much lower dimensionality and do not suffer from
noise from uninformative weights in the model
parameters, thus leading to more accurate trans-
ferability prediction. Also, our method only requires running parameter-efficient tuning on the tasks and storing the task-specific parameters, making both computing and storing task embeddings more efficient.
Moreover, with the development of open-source parameter-efficient tuning platforms like AdapterHub (Pfeiffer et al., 2020), we can easily obtain off-the-shelf parameters for the source and target datasets from the model zoo and directly compute the similarity between the downloaded parameters.
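To make this concrete, the sketch below (PyTorch; the function names are ours and hypothetical) flattens the tuned parameters of each task into a vector and ranks candidate source tasks by their similarity to the target task embedding, assuming cosine similarity as the similarity measure and that all tasks share the same parameter-efficient tuning configuration:

```python
import torch
import torch.nn.functional as F

def task_embedding(tuned_params):
    """Flatten the task-specific parameters learned by parameter-efficient
    tuning (e.g., soft prompts, bias terms, or low-rank matrices) into one
    task-embedding vector."""
    return torch.cat(
        [p.detach().flatten() for _, p in sorted(tuned_params.items())]
    )

def rank_source_tasks(target_params, source_params_by_task):
    """Rank candidate source tasks by the cosine similarity between their
    task embeddings and the target task embedding (highest first)."""
    target_emb = task_embedding(target_params)
    scores = {
        task: F.cosine_similarity(task_embedding(params), target_emb, dim=0).item()
        for task, params in source_params_by_task.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Since the source-task embeddings only need to be computed and stored once, selecting an intermediate task for a new target then requires only one round of parameter-efficient tuning on the target dataset.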
We empirically verify the effectiveness of our
approach by experimenting with 11 text classifi-
cation tasks and 11 question answering tasks, fol-
lowing Vu et al. (2020). Our results show that our
approach consistently outperforms existing inter-
task transferability prediction methods while being
simpler and more efficient. In addition, we find that
the ability of efficiently tuned parameters to predict transferability is not strongly correlated with their in-task performance. Therefore, task-specific parameters tuned with a relatively small number of steps are already highly predictive of inter-task
transferability, allowing us to further improve the
efficiency of intermediate task selection.
2 Related Work
Prior work (Phang et al., 2018) shows that posi-
tive transfer can be elicited by training a model
on intermediate source tasks before fine-tuning on
the target task. However, the choice of an appro-
priate source task is crucial for effective transfer.
Phang et al. (2018) show that the size of the source dataset is a good prior for source task selection. Pruksachatkun et al. (2020) propose to use tasks requiring complex reasoning and inference as source tasks. Beyond these heuristics, a line of work also focuses on systematically predicting intermediate task transferability. Vu et al. (2020) propose to use TASK2VEC to construct task embeddings
based on the input text or Fisher information ma-
trix of a fine-tuned model. Poth et al. (2021) fur-
ther extend similar ideas for adapter-based trans-
fer learning. More recently, Vu et al. (2021) explore prompt-based transfer and propose using prompt similarity to predict prompt transferability and select suitable soft prompts for initialization. This can be viewed as a special case of
our proposed method where the parameter-efficient
tuning method is restricted to vanilla prompt tun-
ing (Lester et al., 2021) and the transfer method
is restricted to prompt transfer instead of general
intermediate-task transfer.
3 Methodology
3.1 Parameter-Efficient Tuning
Parameter-efficient tuning only updates a small
portion of parameters in a large pretrained model.
In this paper, we experiment with three types of
parameter-efficient tuning: Prompt Tuning (Liu et al., 2021), Bias Tuning (Zaken et al., 2022), and Low-Rank Tuning (Hu et al., 2022).
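As a minimal sketch of how these methods confine the trainable parameters, the snippet below illustrates bias tuning in PyTorch by freezing every weight except the bias terms; the checkpoint name and the head-parameter prefix are assumptions made for illustration:

```python
from transformers import AutoModelForSequenceClassification

# Hypothetical checkpoint; any pretrained Transformer encoder works similarly.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Bias tuning: freeze everything except bias terms (and the task head, whose
# parameter names start with "classifier" in this particular architecture).
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {n_trainable} / {n_total}")
```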
Prompt Tuning
We experiment with P-Tuning v2 (Liu et al., 2021). Specifically, P-Tuning v2 implements a prompt tuning method by introducing additional attention prefix matrices $K_t = \{k_1 \ldots k_n\}$ and $V_t = \{v_1 \ldots v_n\}$ for each Transformer layer, where $n$ is a hyperparameter controlling the added prefix length; $k_*$ and $v_*$ are vectors with dimension $d_h$, where $d_h$ is the hidden size of the Transformer model.
For each Transformer layer, the added vectors are concatenated with the original key and value matrices to form $K' = K_t \oplus K$ and $V' = V_t \oplus V$, where $K$ and $V$ are the original key and value matrices in each layer's attention block. Then, the new scaled dot-product attention is calculated by replacing the original $K$ and $V$ with the new $K'$ and $V'$, respectively.
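To make the prefix mechanism concrete, the following simplified sketch (single attention head, no batching; tensor and function names are ours) prepends the learned prefix keys and values before the scaled dot-product attention:

```python
import math
import torch

def prefix_attention(Q, K, V, K_t, V_t):
    """Scaled dot-product attention with learned prefix keys/values.

    Q, K, V : (seq_len, d_h) query/key/value of one attention head.
    K_t, V_t: (n, d_h) task-specific prefix vectors learned by P-Tuning v2.
    """
    d_h = Q.size(-1)
    K_prime = torch.cat([K_t, K], dim=0)   # K' = K_t ⊕ K, shape (n + seq_len, d_h)
    V_prime = torch.cat([V_t, V], dim=0)   # V' = V_t ⊕ V
    scores = Q @ K_prime.T / math.sqrt(d_h)
    return torch.softmax(scores, dim=-1) @ V_prime  # (seq_len, d_h)

# Toy example: sequence length 8, prefix length n = 4, hidden size 16.
seq_len, n, d_h = 8, 4, 16
Q, K, V = (torch.randn(seq_len, d_h) for _ in range(3))
K_t = torch.nn.Parameter(torch.randn(n, d_h))  # trainable prefix keys
V_t = torch.nn.Parameter(torch.randn(n, d_h))  # trainable prefix values
out = prefix_attention(Q, K, V, K_t, V_t)
assert out.shape == (seq_len, d_h)
```

Only the prefix matrices $K_t$ and $V_t$ (one pair per layer) are updated during tuning, and it is these task-specific parameters that we reuse as the task embedding.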