
The (In)Effectiveness of Intermediate Task Training
For Domain Adaptation and Cross-Lingual Transfer Learning
Sovesh Mohapatra1,†, Somesh Mohapatra2,†
1University of Massachusetts Amherst
2Massachusetts Institute of Technology
soveshmohapa@umass.edu, someshm@mit.edu
†Both authors contributed equally to the work.
Abstract
Transfer learning from large language models (LLMs) has emerged as a powerful technique for knowledge-based fine-tuning on a number of tasks, as well as for adapting models to different domains and even languages. However, it remains an open question if and when transfer learning will work, i.e., whether it leads to positive or negative transfer. In this paper, we analyze knowledge transfer across three natural language processing (NLP) tasks - text classification, sentiment analysis, and sentence similarity - using three LLMs - BERT, RoBERTa, and XLNet - and evaluate their performance when fine-tuned on target datasets for domain and cross-lingual adaptation tasks, with and without intermediate task training on a larger dataset. Our experiments show that fine-tuning without intermediate task training can lead to better performance on most tasks, while more generalized tasks may require a preceding intermediate task training step. We hope that this work will serve as a guide on transfer learning for NLP practitioners.
Introduction
Knowledge-based transfer learning leverages zero- or few-shot learning from a pre-trained model to make predictions for a range of similar tasks (You et al. 2020; Raffel et al. 2020; Houlsby et al. 2019). The ability to use a pre-trained model, as-is or with very limited training, presents a lucrative opportunity compared to training from scratch for every single task (Pan 2020; Day and Khoshgoftaar 2017). The applications of transfer learning range from NLP to image and even video tasks (Kim et al. 2020; Salza et al. 2022; Bengio 2012).
In recent work, researchers have applied transfer learning to a range of NLP tasks, observing mixed results with both positive and negative transfer (Zhang et al. 2022). Pruksachatkun et al. (2020) showed how transfer learning with intermediate task training affects a number of target and probing English-language NLP tasks. In most cases, positive transfer from LLMs, such as BERT, has been noted for similar-language NLP tasks, such as hate speech classification (Mozafari, Farahbakhsh, and Crespi 2019), propaganda detection (Vlad et al. 2019), and biomedical NLP tasks (Peng, Yan, and Lu 2019).
Negative transfer has been shown in attempts to transfer an English part-of-speech (POS) tagger to a Hindi corpus (Dell'Orletta 2009; Rayson et al. 2007), and in other NLP tasks (Wang et al. 2019).
Transfer learning for domain adaptation has been widely
studied and applied across language and medical fields (Xu,
He, and Shu 2020; Ghafoorian et al. 2017; Kouw and Loog
2018). Savini and Caragea (2022) showed how intermediate task training on sarcasm helped in transfer learning, similar to Felbo et al. (2017) and Băroiu and Trăuşan-Matu (2022). However, in another domain adaptation task, Meftah et al. (2021) showed that knowledge transfer between seemingly similar domains, such as news and tweets, resulted in negative transfer, probing the results using both quantitative and qualitative methods.
Cross-lingual tasks are another area where transfer learning strategies have shown a lot of potential (Ahmad et al. 2020; Luo et al. 2021). Chen et al. (2018) showed the benefit of coupling language-invariant and language-specific features at the instance level.
In this work, we analyze the effect of intermediate task training on a larger dataset for three different NLP tasks - text classification, sentiment analysis, and sentence similarity - and evaluate three language models - BERT, RoBERTa, and XLNet. For each NLP task, we consider one domain adaptation task and one cross-lingual task. In total, we run eighteen experiments across this range of NLP tasks.
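To make the experimental grid concrete, the short sketch below enumerates the eighteen combinations of task, adaptation setting, and model; the string labels are only illustrative, and each configuration is further run with and without intermediate task training.

```python
# Enumerate the 18 experiment configurations:
# 3 NLP tasks x 2 adaptation settings x 3 LLMs.
from itertools import product

tasks = ["text classification", "sentiment analysis", "sentence similarity"]
settings = ["domain adaptation", "cross-lingual"]
models = ["BERT", "RoBERTa", "XLNet"]

experiments = list(product(tasks, settings, models))
assert len(experiments) == 18
```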
Methodology
Here, we present an overview of our methodology, covering transfer learning for intermediate task training, domain adaptation and cross-lingual fine-tuning and evaluation for the NLP tasks, and the datasets.
In each of the following tasks, both intermediate task training and fine-tuning were performed by training on 70% of the dataset and evaluating on the remaining 30%. For the intermediate task training, each pre-trained LLM was trained for 100 epochs on the larger dataset. For fine-tuning, with or without a preceding intermediate task training, transfer learning to the target dataset was performed by training for 10 epochs. In both cases of transfer learning, all model weights were updated, i.e., none of the layers were frozen.
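As a minimal sketch of this two-stage procedure, assuming a Hugging Face Transformers sequence-classification setup, the code below performs the 70/30 split, the 100-epoch intermediate task training, and the 10-epoch fine-tuning with all weights trainable; the dataset identifiers, label count, and batch size are illustrative placeholders rather than the exact configuration used in the paper.

```python
# Sketch of the two-stage transfer learning setup (assumptions noted above).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"  # likewise "roberta-base" or "xlnet-base-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def prepare(dataset_name, text_column="text"):
    """Tokenize a dataset and split it 70/30 into train/eval portions."""
    splits = load_dataset(dataset_name)["train"].train_test_split(test_size=0.3)
    return splits.map(
        lambda batch: tokenizer(batch[text_column], truncation=True,
                                padding="max_length"),
        batched=True)

def train(model, data, epochs, output_dir):
    """Train with all weights updated (no frozen layers) for a fixed epoch budget."""
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=epochs,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=data["train"],
            eval_dataset=data["test"]).train()
    return model

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Stage 1: intermediate task training on the larger dataset for 100 epochs.
model = train(model, prepare("intermediate_task_dataset"), epochs=100,
              output_dir="out/intermediate")

# Stage 2: fine-tune on the target dataset for 10 epochs. The baseline without
# intermediate task training starts from the pre-trained checkpoint instead.
model = train(model, prepare("target_task_dataset"), epochs=10,
              output_dir="out/target")
```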
Model instances for the LLMs - BERT, RoBERTa, and XLNet - were obtained from the respective GitHub repositories