
training instances in the target languages.
Our approach is based on multiparallel corpora, meaning that the translation of each sentence is available in more than two languages. We exploit the Parallel Bible Corpus (PBC) of Mayer and Cysouw (2014),1 a multiparallel corpus covering more than 1000 languages, many of which are extremely low-resource, by which we mean that only a tiny amount of unlabeled data is available or that no language technologies exist for them at all (Joshi et al., 2020).
We evaluate our method on a diverse set of low-resource languages from multiple language families, including four languages not covered by pretrained language models (PLMs). We train POS tagging models for these languages and evaluate them against references from the Universal Dependencies corpus (Zeman et al., 2019). We compare the results of our method against multiple state-of-the-art (SOTA) cross-lingual unsupervised and semisupervised POS taggers employing different approaches such as annotation projection and zero-shot transfer. Our experiments highlight the benefits of our new transfer and self-learning methods; crucially, they show that reasonably accurate POS taggers can be bootstrapped without any annotated data for a diverse set of low-resource languages, establishing a new SOTA for high-resource-to-low-resource cross-lingual POS transfer. We also assess the quality of the projected annotations with respect to “silver” references and perform an ablation study.
To summarize, our contributions are:2
• We formalize annotation projection as graph-based label propagation and introduce two new POS annotation projection models, GLP-B (GLP-Base) and GLP-SL (GLP-SelfLearning); see the sketch after this list.
• We evaluate GLP-B and GLP-SL on 17 low-resource languages, including 4 languages not covered by large PLMs.
• By comparing our method with various supervised, semisupervised, and PLM-based approaches for POS tagging of low-resource languages, we establish a new SOTA for unsupervised POS tagging.
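As a concrete illustration of what "annotation projection as graph-based label propagation" means, the following is a minimal sketch, not the exact GLP-B/GLP-SL formulation: tokens are graph nodes, word-alignment links are edges, source-language nodes carry fixed one-hot tag distributions, and target-language nodes iteratively average their neighbors' distributions. The update rule and uniform edge weights are simplifying assumptions.

```python
import numpy as np

def propagate(adj, labels, fixed, iters=10):
    """Label propagation over an alignment graph.

    adj:    (n, n) symmetric adjacency matrix (alignment edges).
    labels: (n, k) per-node distributions over k POS tags.
    fixed:  (n,) boolean mask; True for source nodes whose tags are known.
    """
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    for _ in range(iters):
        new = adj @ labels / deg                        # average over neighbors
        labels = np.where(fixed[:, None], labels, new)  # keep source labels fixed
    return labels

# Toy example: node 0 is a source-language token tagged NOUN; node 1 is an
# aligned target-language token with an initially uniform distribution.
adj = np.array([[0., 1.], [1., 0.]])
labels = np.array([[1., 0.], [0.5, 0.5]])  # columns: P(NOUN), P(VERB)
fixed = np.array([True, False])
print(propagate(adj, labels, fixed))       # node 1 converges to NOUN
```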
1 We do not use PBC-specific features. Thus, our work is in principle applicable to any multiparallel corpus.
2 Our code, data, and trained models are available at https://github.com/ayyoobimani/GLP-POS
2 Related work
POS tagging
Part-of-speech tagging aims to assign each word its proper syntactic tag in context (Manning and Schütze, 1999). For high-resource languages, for which large labeled training sets are available, high-accuracy POS tagging is achieved through supervised learning (Kondratyuk and Straka, 2019; Tsai et al., 2019).
Zero-shot transfer
In low-resource settings, one approach is cross-lingual transfer via pretrained multilingual representations, which enables zero-shot POS tagging. Kondratyuk and Straka (2019) analyze the few-shot and zero-shot performance of mBERT (Devlin et al., 2019) fine-tuning on POS tagging. We include this approach in our set of baselines below. Ebrahimi and Kann (2021) and Wang et al. (2022) analyze the zero-shot POS tagging performance of XLM-RoBERTa (Conneau et al., 2020) and propose complementary methods such as continued pretraining, vocabulary expansion, and adapter modules for better performance. We show that combining GLP with Wang et al. (2022)’s embeddings further improves our base performance.
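For reference, a minimal sketch of this zero-shot recipe (not the cited papers' code; the checkpoint choice and the 17-tag label set are our assumptions): fine-tune a multilingual encoder for token classification on a high-resource treebank, then apply it unchanged to target-language text.

```python
# Minimal zero-shot transfer sketch: fine-tune on English UPOS data only,
# then run the tagger unchanged on another language.
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base", num_labels=17)  # 17 Universal Dependencies UPOS tags

# ... fine-tune `model` on an English UD treebank with a standard
# token-classification loop; at test time, tokenize target-language
# sentences and take the argmax over per-token logits.
```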
Annotation projection
Annotation projection is another approach to annotating low-resource languages. Yarowsky and Ngai (2001) first proposed projecting annotation labels across languages, exploiting parallel corpora and word alignment. To reduce systematic transfer errors, Fossum and Abney (2005) extended this by projecting from multiple source languages. Agić et al. (2015a) and Agić et al. (2016) exploit multilingual transfer setups to bootstrap POS taggers for low-resource languages, starting from a parallel corpus and taggers and parsers for high-resource languages. Other works project labels by leveraging token- and type-level constraints (Täckström et al., 2013; Buys and Botha, 2016a; Eskander et al., 2020). The latter study notably proposes an unsupervised method for selecting training instances via cross-lingual projection and trains POS taggers exploiting contextualized word embeddings, affix embeddings, and hierarchical Brown clusters (Brown et al., 1992). This approach is also used as a baseline below.
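To make the basic mechanism concrete, here is an illustrative sketch of alignment-based projection in the spirit of Yarowsky and Ngai (2001); the alignment format and the UNK fallback are our assumptions, and real systems add filtering and voting on top.

```python
# Sketch of classic annotation projection: copy each source token's POS tag
# to the target token(s) it is word-aligned with. Alignments would come from
# a tool such as fast_align or eflomal; the pair format is an assumption.
def project_tags(src_tags, alignment, tgt_len, fallback="UNK"):
    tgt_tags = [fallback] * tgt_len
    for src_i, tgt_i in alignment:
        tgt_tags[tgt_i] = src_tags[src_i]
    return tgt_tags

# "the dog sleeps" -> 3-token target sentence with a monotone alignment
print(project_tags(["DET", "NOUN", "VERB"], [(0, 0), (1, 1), (2, 2)], 3))
# -> ['DET', 'NOUN', 'VERB']
```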
Semisupervised approaches have been proposed to mitigate the noise of projecting between languages. This can be achieved with auxiliary lexical resources (Täckström et al., 2013; Ganchev and Das, 2013; Wisniewski et al., 2014; Li et al.,