Robust Unsupervised Cross-Lingual Word Embedding
using Domain Flow Interpolation
Liping Tang¹  Zhen Li²  Zhiquan Luo²  Helen Meng¹,³
1Centre for Perceptual and Interactive Intelligence
2The Chinese University of Hong Kong, Shenzhen
3The Chinese University of Hong Kong
lptang@cpii.hk, {zhenli, zqluo}@cuhk.edu.cn, hmmeng@cuhk.edu.hk
Abstract
This paper investigates an unsupervised approach to deriving a universal, cross-lingual word embedding space, in which words with similar semantics from different languages lie close to one another. Previous adversarial approaches have shown promising results in inducing cross-lingual word embeddings without parallel data. However, their training is unstable for distant language pairs. Instead of mapping the source language space directly to the target language space, we propose to use a sequence of intermediate spaces for smooth bridging. Each intermediate space may be conceived as a pseudo-language space and is introduced via simple linear interpolation. This approach is modeled after domain flow in computer vision, but with a modified objective function. Experiments on intrinsic Bilingual Dictionary Induction tasks show that the proposed approach improves the robustness of adversarial models with comparable or even better precision. Further experiments on the downstream task of Cross-Lingual Natural Language Inference show that the proposed model achieves significant improvements over state-of-the-art adversarial and non-adversarial models for distant language pairs.
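The bridging idea can be illustrated with a minimal sketch. Assuming the source-to-target mapping is a linear map W, one simple realization (an illustrative assumption, not necessarily the paper's exact formulation) blends the identity map with W to obtain each pseudo-language space:

```python
import numpy as np

def intermediate_map(W, t):
    """Linearly interpolate between the identity (t = 0) and a full
    source-to-target mapping W (t = 1); intermediate values of t
    act as mappings into pseudo-language spaces."""
    d = W.shape[0]
    return (1.0 - t) * np.eye(d) + t * W

# Toy 2-D "mapping": a 90-degree rotation from source to target.
W = np.array([[0.0, -1.0],
              [1.0,  0.0]])

x = np.array([1.0, 0.0])  # a source word vector
for t in (0.0, 0.5, 1.0):
    print(t, intermediate_map(W, t) @ x)
```

Sliding t from 0 to 1 moves embeddings smoothly from the source space toward the target space, giving the sequence of intermediate spaces referred to above.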
1 Introduction
Learning cross-lingual word embedding (CLWE) is a fundamental step towards deriving a universal embedding space in which words with similar semantics from different languages are close to one another. CLWE has also shown effectiveness in knowledge transfer between languages for many natural language processing tasks, including Named Entity Recognition (Guo et al., 2015), Machine Translation (Gu et al., 2018), and Information Retrieval (Vulic and Moens, 2015).
Inspired by Mikolov et al. (2013), recent CLWE models have been dominated by mapping-based methods (Ruder et al., 2019; Glavas et al., 2019; Vulic et al., 2019). They map monolingual word embeddings into a shared space via linear mappings, assuming that different word embedding spaces are nearly isomorphic. Leveraging a seed dictionary of 5,000 word pairs, Mikolov et al. (2013) induce CLWEs by solving a least-squares problem. Subsequent works (Xing et al., 2015; Artetxe et al., 2016; Smith et al., 2017; Joulin et al., 2018) improve the model by normalizing the embedding vectors, imposing an orthogonality constraint on the linear mapping, and modifying the objective function. Later work has shown that reliable projections can be learned from weak supervision by exploiting shared numerals (Artetxe et al., 2017), cognates (Smith et al., 2017), or identical strings (Søgaard et al., 2018).
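Under the near-isomorphism assumption, the supervised baselines above reduce to a few lines of linear algebra. The sketch below (with random matrices standing in for real embeddings of seed-dictionary pairs) shows the unconstrained least-squares mapping of Mikolov et al. (2013) and the closed-form orthogonal (Procrustes) solution used by the orthogonality-constrained variants:

```python
import numpy as np

rng = np.random.default_rng(0)

# X, Y: embeddings of n seed-dictionary word pairs in d dimensions.
n, d = 5000, 50
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, d))

# Mikolov et al. (2013): unconstrained least squares, min_W ||XW - Y||_F.
W_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Orthogonality-constrained variant: the Procrustes solution W = UV^T,
# where U S V^T is the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W_orth = U @ Vt
```

The orthogonal solution preserves vector norms and dot products of the monolingual embeddings, which is one reason later work prefers it for retrieval-style dictionary induction.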
Moreover, several fully unsupervised approaches have recently been proposed to induce CLWEs via adversarial training (Zhang et al., 2017a; Zhang et al., 2017b; Lample et al., 2018). State-of-the-art unsupervised adversarial approaches (Lample et al., 2018) have achieved very promising results and even outperform supervised approaches in some cases. However, the main drawback of adversarial approaches lies in their instability on distant language pairs (Søgaard et al., 2018), which has motivated non-adversarial approaches (Hoshen and Wolf, 2018; Artetxe et al., 2018b). In particular, Artetxe et al. (2018b) (VecMap) have shown strong robustness on several language pairs. However, it still fails on 87 out of 210 distant language pairs (Vulic et al., 2019).
Subsequently, Li et al. (2020) proposed Iterative Dimension Reduction to improve the robustness of VecMap. Meanwhile, Mohiuddin and Joty (2019) revisited adversarial models and added two regularization terms that yield improved results. However, the problem of instability still remains. For instance, our experiments show that the improved version (Mohiuddin and Joty, 2019)
arXiv:2210.03319v1 [cs.CL] 7 Oct 2022