
The (In)Effectiveness of Intermediate Task Training
For Domain Adaptation and Cross-Lingual Transfer Learning
Sovesh Mohapatra1,†, Somesh Mohapatra2,†
1University of Massachusetts Amherst
2Massachusetts Institute of Technology
soveshmohapa@umass.edu, someshm@mit.edu
†Both authors contributed equally to the work.
Abstract
Transfer learning from large language models (LLMs) has emerged as a powerful technique for knowledge-based fine-tuning on a number of tasks, as well as for adapting models to different domains and even languages. However, it remains an open question if and when transfer learning will work, i.e., whether it leads to positive or negative transfer. In this paper, we analyze knowledge transfer across three natural language processing (NLP) tasks - text classification, sentiment analysis, and sentence similarity - using three LLMs - BERT, RoBERTa, and XLNet - and evaluate their performance when fine-tuned on target datasets for domain and cross-lingual adaptation tasks, with and without intermediate task training on a larger dataset. Our experiments show that fine-tuning without intermediate task training can lead to better performance on most tasks, while more generalized tasks may require a preceding intermediate task training step. We hope that this work will serve as a guide on transfer learning for NLP practitioners.
Introduction
Knowledge-based transfer learning leverages zero- or few-shot learning from a pre-trained model to make predictions for a range of similar tasks (You et al. 2020; Raffel et al. 2020; Houlsby et al. 2019). The ability to use a pre-trained model, as-is or with very limited training, presents a lucrative opportunity compared to training from scratch for every single task (Pan 2020; Day and Khoshgoftaar 2017). The applications of transfer learning range from NLP to image and even video tasks (Kim et al. 2020; Salza et al. 2022; Bengio 2012).
In recent work, researchers have applied transfer learning to a range of NLP tasks, observing mixed results with both positive and negative transfer (Zhang et al. 2022). Pruksachatkun et al. (2020) showed how transfer learning with intermediate task training affects a number of target and probing English-language NLP tasks. In most cases, positive transfer from LLMs, such as BERT, has been noted for similar-language NLP tasks, such as hate speech classification (Mozafari, Farahbakhsh, and Crespi 2019), propaganda detection (Vlad et al. 2019), and biomedical NLP tasks (Peng, Yan, and Lu 2019).
Negative transfer has been shown in attempts to transfer an English part-of-speech (POS) tagger to a Hindi corpus (Dell'Orletta 2009; Rayson et al. 2007), and in other NLP tasks (Wang et al. 2019).
Transfer learning for domain adaptation has been widely
studied and applied across language and medical fields (Xu,
He, and Shu 2020; Ghafoorian et al. 2017; Kouw and Loog
2018). Savini and Caragea (2022) showed how intermediate task training on sarcasm helped in transfer learning, similar to Felbo et al. (2017) and Băroiu and Trăuşan-Matu (2022). However, in another domain adaptation task, Meftah et al. (2021) showed that knowledge transfer between seemingly similar domains, such as news and tweets, resulted in negative transfer, probing the results using both quantitative and qualitative methods.
Cross-lingual tasks are another area where transfer learning strategies have shown a lot of potential (Ahmad et al. 2020; Luo et al. 2021). Chen et al. (2018) showed the benefit of coupling language-invariant and language-specific features at the instance level.
In this work, we analyze the effect of intermediate task training on a larger dataset for three different NLP tasks - text classification, sentiment analysis, and sentence similarity - and evaluate three language models - BERT, RoBERTa, and XLNet. For each NLP task, we consider one domain adaptation task and one cross-lingual task. In total, we run eighteen experiments across this range of NLP tasks.
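To make the experimental grid concrete, the short sketch below enumerates the eighteen combinations of task, adaptation setting, and model; the string labels are only illustrative, and each configuration is further run with and without intermediate task training.

```python
# Enumerate the 18 experiment configurations:
# 3 NLP tasks x 2 adaptation settings x 3 LLMs.
from itertools import product

tasks = ["text classification", "sentiment analysis", "sentence similarity"]
settings = ["domain adaptation", "cross-lingual"]
models = ["BERT", "RoBERTa", "XLNet"]

experiments = list(product(tasks, settings, models))
assert len(experiments) == 18
```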
Methodology
Here, we present an overview of our methodology, covering transfer learning for intermediate task training, domain adaptation and cross-lingual fine-tuning and evaluation for the NLP tasks, and the datasets.
In each of the following tasks, both intermediate task training and fine-tuning were performed by training on 70% of the dataset and evaluating on the remaining 30%. For the intermediate task training, each pre-trained LLM was trained for 100 epochs on the larger dataset. For fine-tuning, with or without a preceding intermediate task training, transfer learning to the target dataset was performed by training for 10 epochs. In both cases of transfer learning, all model weights were updated, i.e., none of the layers were frozen.
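As a minimal sketch of this two-stage procedure, assuming a Hugging Face Transformers sequence-classification setup, the code below performs the 70/30 split, the 100-epoch intermediate task training, and the 10-epoch fine-tuning with all weights trainable; the dataset identifiers, label count, and batch size are illustrative placeholders rather than the exact configuration used in the paper.

```python
# Sketch of the two-stage transfer learning setup (assumptions noted above).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"  # likewise "roberta-base" or "xlnet-base-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def prepare(dataset_name, text_column="text"):
    """Tokenize a dataset and split it 70/30 into train/eval portions."""
    splits = load_dataset(dataset_name)["train"].train_test_split(test_size=0.3)
    return splits.map(
        lambda batch: tokenizer(batch[text_column], truncation=True,
                                padding="max_length"),
        batched=True)

def train(model, data, epochs, output_dir):
    """Train with all weights updated (no frozen layers) for a fixed epoch budget."""
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=epochs,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=data["train"],
            eval_dataset=data["test"]).train()
    return model

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Stage 1: intermediate task training on the larger dataset for 100 epochs.
model = train(model, prepare("intermediate_task_dataset"), epochs=100,
              output_dir="out/intermediate")

# Stage 2: fine-tune on the target dataset for 10 epochs. The baseline without
# intermediate task training starts from the pre-trained checkpoint instead.
model = train(model, prepare("target_task_dataset"), epochs=10,
              output_dir="out/target")
```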
Model instances for the LLMs - BERT, RoBERTa, and XLNet - were obtained from the respective GitHub repositories