The (In)Effectiveness of Intermediate Task Training
For Domain Adaptation and Cross-Lingual Transfer Learning
Sovesh Mohapatra1†, Somesh Mohapatra2†
1University of Massachusetts Amherst
2Massachusetts Institute of Technology
soveshmohapa@umass.edu, someshm@mit.edu
†Both authors contributed equally to the work.
Abstract
Transfer learning from large language models (LLMs) has emerged as a powerful technique to enable knowledge-based fine-tuning for a number of tasks, adaptation of models for different domains, and even languages. However, it remains an open question if and when transfer learning will work, i.e., lead to positive or negative transfer. In this paper, we analyze knowledge transfer across three natural language processing (NLP) tasks - text classification, sentiment analysis, and sentence similarity - using three LLMs - BERT, RoBERTa, and XLNet - and evaluate their performance by fine-tuning on target datasets for domain adaptation and cross-lingual tasks, with and without intermediate task training on a larger dataset. Our experiments showed that fine-tuning without intermediate task training can lead to better performance for most tasks, while more generalized tasks might necessitate a preceding intermediate task training step. We hope that this work will act as a guide on transfer learning for NLP practitioners.
Introduction
Knowledge-based transfer learning leverages zero- or few-shot learning from a pre-trained model to predict for a range of similar tasks (You et al. 2020; Raffel et al. 2020; Houlsby et al. 2019). The ability to use a pre-trained model, as-is or with very limited training, presents a very lucrative opportunity compared to training from scratch for every single task (Pan 2020; Day and Khoshgoftaar 2017). The applications of transfer learning have ranged from NLP to image, and even video tasks (Kim et al. 2020; Salza et al. 2022; Bengio 2012).
In recent works, researchers have applied transfer learning to a range of NLP tasks, observing mixed results, i.e., both positive and negative transfer (Zhang et al. 2022). Pruksachatkun et al. (2020) showed how transfer learning with intermediate task training could affect a number of target and probing English-language NLP tasks. In most cases, positive transfer from LLMs, such as BERT, has been noted for similar language NLP tasks, like hate speech classification (Mozafari, Farahbakhsh, and Crespi 2019), propaganda detection (Vlad et al. 2019), and biomedical NLP tasks (Peng, Yan, and Lu 2019). Negative transfer has been shown in attempts to transfer an English Part-of-Speech (POS) tagger to a Hindi corpus (Dell'Orletta 2009; Rayson et al. 2007), and in other NLP tasks (Wang et al. 2019).
Transfer learning for domain adaptation has been widely studied and applied across language and medical fields (Xu, He, and Shu 2020; Ghafoorian et al. 2017; Kouw and Loog 2018). Savini and Caragea (2022) showed how intermediate task training on sarcasm helped in transfer learning, similar to Felbo et al. (2017) and Băroiu and Trăușan-Matu (2022). However, in another domain adaptation task, Meftah et al. (2021) showed that knowledge transfer between related, seemingly similar domains, like news and tweets, resulted in negative transfer, probing the results using both quantitative and qualitative methods.
Cross-lingual tasks are another area where transfer learning strategies have shown a lot of potential (Ahmad et al. 2020; Luo et al. 2021). Chen et al. (2018) have shown improved cross-lingual transfer when language-invariant and language-specific features are coupled at the instance level.
In this work, we analyze the effect of intermediate task training on a larger dataset for three different NLP tasks - text classification, sentiment analysis, and sentence similarity - and evaluate three language models - BERT, RoBERTa, and XLNet. For each NLP task, we have one domain adaptation and one cross-lingual task. In total, we have eighteen experiments on a range of NLP tasks.
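For reference, the full experimental grid can be enumerated as follows. This is an illustrative sketch, not code from the paper; it only makes the 3 tasks x 2 adaptation settings x 3 models = 18 count explicit.

```python
# Illustrative enumeration of the experimental grid described above:
# 3 NLP tasks x 2 adaptation settings x 3 LLMs = 18 experiments.
from itertools import product

tasks = ["text classification", "sentiment analysis", "sentence similarity"]
settings = ["domain adaptation", "cross-lingual"]
models = ["BERT", "RoBERTa", "XLNet"]

experiments = list(product(tasks, settings, models))
assert len(experiments) == 18
for task, setting, model in experiments:
    print(f"{model} on {task} ({setting})")
```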
Methodology
Here, we present an overview of our methodology, including information on transfer learning for intermediate task training, domain adaptation and cross-lingual fine-tuning, evaluation for the NLP tasks, and the datasets.
In each of the following tasks, both intermediate task training and fine-tuning were performed by training on 70% of the dataset and evaluating on the remaining 30%. For the intermediate task training, each pre-trained LLM was trained for 100 epochs using the large dataset. For fine-tuning, both after and without intermediate task training, transfer learning to the target dataset was performed by training for 10 epochs. In both cases of transfer learning, all the model weights were updated, i.e., none of the layers were frozen.
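A minimal sketch of the two training regimes described above is given below, assuming the Hugging Face transformers and datasets libraries; the paper does not specify its training code, and the dataset names and checkpoint are hypothetical placeholders for the larger (intermediate) and target datasets.

```python
# A minimal sketch of the two regimes: (1) intermediate task training on a
# larger dataset followed by fine-tuning on the target dataset, and
# (2) fine-tuning directly on the target dataset. Datasets and checkpoint
# names are assumptions, not the paper's exact choices.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"  # assumption; RoBERTa and XLNet are analogous
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)


def run(model, dataset, num_epochs, output_dir):
    """Train on 70% of the dataset, evaluate on the remaining 30%,
    with all model weights updated (no layers frozen)."""
    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length")

    splits = dataset.train_test_split(test_size=0.3, seed=42).map(
        tokenize, batched=True)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir,
                               num_train_epochs=num_epochs),
        train_dataset=splits["train"],
        eval_dataset=splits["test"],
    )
    trainer.train()
    return trainer.evaluate()


# Hypothetical stand-ins for the larger intermediate dataset and the target dataset.
large_dataset = load_dataset("imdb", split="train")
target_dataset = load_dataset("rotten_tomatoes", split="train")

# Regime 1: intermediate task training (100 epochs), then fine-tuning (10 epochs).
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
run(model, large_dataset, num_epochs=100, output_dir="intermediate")
with_itt = run(model, target_dataset, num_epochs=10, output_dir="with_itt")

# Regime 2: fine-tuning directly on the target dataset (10 epochs).
baseline = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
without_itt = run(baseline, target_dataset, num_epochs=10, output_dir="without_itt")
```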
Model instances for the LLMs - BERT, RoBERTa, and XLNet - were obtained from the respective GitHub repositories
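As a minimal sketch, equivalent pre-trained checkpoints can be loaded through the Hugging Face hub as stand-ins for the original GitHub implementations; the checkpoint names below are assumptions, not the paper's exact sources.

```python
# Loading the three pre-trained LLMs via Hugging Face checkpoints (assumed
# stand-ins for the GitHub implementations referenced above).
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINTS = {
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "XLNet": "xlnet-base-cased",
}

models = {
    name: (AutoTokenizer.from_pretrained(ckpt),
           AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=2))
    for name, ckpt in CHECKPOINTS.items()
}
```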