
Data-based cross-lingual transfer methods aim to automatically generate labelled data for a target language. Previous work on data-based transfer has proposed translation and annotation projection as an effective technique for zero-resource cross-lingual sequence labelling (Jain et al., 2019; Fei et al., 2020). In this setting, as illustrated in Figure 1, the idea is to translate gold-labelled text into the target language and then, using automatic word alignments, project the labels from the source into the target language. The result is an automatically generated dataset in the target language that can be used to train a sequence labelling model.
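The projection step itself is simple to sketch. The following is a minimal, illustrative Python implementation, not the exact procedure used in this work: given source-side BIO labels and a set of automatically produced word-alignment pairs, each source label is copied to the aligned target token. The function name and the (source-index, target-index) alignment format are assumptions made for the example; a full implementation would additionally repair BIO consistency for multi-token spans and handle one-to-many alignments.

```python
from typing import List, Tuple

def project_labels(
    src_labels: List[str],
    tgt_len: int,
    alignments: List[Tuple[int, int]],
) -> List[str]:
    """Project BIO labels from source to target tokens via word alignments.

    `alignments` holds (source_index, target_index) pairs, e.g. as produced
    by an automatic word aligner. Unaligned target tokens receive 'O'.
    """
    tgt_labels = ["O"] * tgt_len
    for src_i, tgt_i in alignments:
        if src_labels[src_i] != "O":
            tgt_labels[tgt_i] = src_labels[src_i]
    # Note: a real projection routine would also fix broken B-/I- sequences here.
    return tgt_labels

# Toy example: "Obama visited Paris" -> Spanish "Obama visitó París"
src = ["B-PER", "O", "B-LOC"]
align = [(0, 0), (1, 1), (2, 2)]
print(project_labels(src, tgt_len=3, alignments=align))
# ['B-PER', 'O', 'B-LOC']
```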
The emergence of multilingual language models (Devlin et al., 2019; Conneau et al., 2020) allows for model-based cross-lingual transfer. As Figure 1 illustrates, using labelled data in one source language (usually English), it is possible to fine-tune a pre-trained multilingual model that is directly used to make predictions in any of the languages included in the model. This is also known as zero-shot cross-lingual sequence labelling.
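As an illustration only, the sketch below shows what zero-shot transfer looks like at inference time with the Hugging Face transformers library: a multilingual encoder fine-tuned for token classification on English data is applied unchanged to target-language text. The model identifier is a hypothetical placeholder; any multilingual sequence-labelling checkpoint fine-tuned on English gold data would play the same role.

```python
from transformers import pipeline

# Hypothetical identifier: assume a multilingual model (e.g. an XLM-R checkpoint)
# that has already been fine-tuned for NER on English gold-labelled data.
MODEL_NAME = "my-org/xlmr-ner-finetuned-on-english"

ner = pipeline("token-classification", model=MODEL_NAME, aggregation_strategy="simple")

# Zero-shot cross-lingual transfer: the same model is applied, without any
# target-language training data, to sentences in another language.
print(ner("Lionel Messi nació en Rosario, Argentina."))
```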
In this work we present an in-depth study of both approaches using the latest advances in machine translation, word aligners and multilingual language models. We focus on two sequence labelling tasks, namely Named Entity Recognition (NER) and Opinion Target Extraction (OTE). In order to do so, we present a data-based cross-lingual transfer approach consisting of translating gold-labelled data between English and seven other languages using state-of-the-art machine translation systems. Sequence labelling annotations are then automatically projected for every language pair. Additionally, we produce manual alignments for the four languages for which we had expert annotators. After translation and projection, for the data-transfer approach we fine-tune multilingual language models on the automatically generated datasets. We then compare the performance obtained for each of the target languages against that of the zero-shot cross-lingual method, which consists of fine-tuning the multilingual language models on the English gold data and generating predictions in the required target languages.
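For context, the automatic word alignments needed for projection can be obtained with off-the-shelf neural aligners. The snippet below uses the SimAlign library as one possible choice, not necessarily the aligner used in this work; the call follows SimAlign's documented interface as best recalled, so the exact arguments should be checked against its documentation.

```python
from simalign import SentenceAligner

# One possible off-the-shelf word aligner (SimAlign). In practice the target
# sentence would come from a machine translation system; here we use a toy
# English-German pair for illustration.
aligner = SentenceAligner(model="bert", token_type="bpe", matching_methods="mai")

src_tokens = ["This", "is", "a", "test", "."]
tgt_tokens = ["Das", "ist", "ein", "Test", "."]

# Returns a dict of (source_index, target_index) pairs per matching method,
# which can be fed into a projection routine like the one sketched earlier.
alignments = aligner.get_word_aligns(src_tokens, tgt_tokens)
print(alignments)
```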
The main contributions of our work are the following: First, we empirically establish the conditions required for each of these two approaches, data-transfer and zero-shot model-based, to outperform the other. In this sense, our experiments show that, contrary to what previous research suggested (Fei et al., 2020; Li et al., 2021), the zero-shot model-based approach obtains the best results when high-capacity multilingual models covering the target language and domain are available. Second, when the performance of the multilingual language model is not optimal for the specific target language or domain (for example, when working on a text genre and domain for which available language models have not been trained), or when the hardware required to work with high-capacity language models is not easily accessible, then data-transfer based on translate-and-project constitutes a competitive option. Third, we observe that machine translation often generates training and test data which, due to important differences in language use, is markedly different from the signal obtained when using gold standard data in the target language. These discrepancies seem to explain the larger error rate of the translate-and-project method with respect to the zero-shot technique. Finally, we create manually projected datasets for four languages and automatically projected datasets for seven languages. We use them to train and evaluate cross-lingual sequence labelling models. Additionally, they are also used to extrinsically evaluate machine translation and word alignment systems. These new datasets, together with the code to generate them, are publicly available to facilitate the reproducibility of results and their use in future research.1
2 Related work
2.1 Data-based cross-lingual transfer
Data-based cross-lingual transfer methods aim to automatically generate labelled data for a target language. Some of these methods exploit parallel data. Ehrmann et al. (2011) automatically annotate the English version of a multi-parallel corpus and project the annotations into all the other languages using statistical alignments of phrases. Wang and Manning (2014) project model expectations rather than labels, which facilitates the transfer of model uncertainty across languages. Ni et al. (2017) use a heuristic scheme that effectively selects good-quality projection-labeled data from noisy data. They also project word embeddings from a target language into a source language, so that the
1 https://github.com/ikergarcia1996/Easy-Label-Projection
https://github.com/ikergarcia1996/Easy-Translate