
IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces
Kelly Marchisio, Neha Verma, Kevin Duh, and Philipp Koehn
Johns Hopkins University
{kmarc, nverma7}@jhu.edu, kevinduh@cs.jhu.edu, phi@jhu.edu
Abstract
The ability to extract high-quality translation dictionaries from monolingual word embedding spaces depends critically on the geometric similarity of the spaces—their degree of “isomorphism.” We address the root cause of faulty cross-lingual mapping: that word embedding training resulted in the underlying spaces being non-isomorphic. We incorporate global measures of isomorphism directly into the Skip-gram loss function, successfully increasing the relative isomorphism of trained word embedding spaces and improving their ability to be mapped to a shared cross-lingual space. The result is improved bilingual lexicon induction in general data conditions, under domain mismatch, and with training algorithm dissimilarities. We release IsoVec at https://github.com/kellymarchisio/isovec.
1 Introduction
Extracting a translation dictionary from word embedding spaces, called “bilingual lexicon induction” (BLI), is a common task in the natural language processing literature. Bilingual dictionaries are useful in their own right as linguistic resources, and automatically generated dictionaries may be particularly helpful for low-resource languages for which human-curated dictionaries are unavailable. BLI is also used as an extrinsic evaluation task to assess the quality of cross-lingual spaces. If a high-quality translation dictionary can be automatically extracted from a shared embedding space, intuition says that the space is high-quality and useful for downstream tasks.
“Mapping-based” methods are one way to create cross-lingual embedding spaces. Separately-trained monolingual embeddings are mapped to a shared space by applying a linear transformation to one or both spaces, after which a bilingual lexicon can be extracted via nearest-neighbor search (e.g., Mikolov et al., 2013b; Lample et al., 2018; Artetxe et al., 2018b; Joulin et al., 2018; Patra et al., 2019).
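To make the mapping pipeline concrete, the following is a minimal sketch (not the authors' implementation) of the standard supervised recipe: learn an orthogonal transformation from a seed dictionary via the closed-form Procrustes solution, then extract a lexicon by nearest-neighbor search under cosine similarity. The variable names and toy data are illustrative assumptions.

import numpy as np

def procrustes_map(X_src, Y_tgt):
    # Orthogonal W minimizing ||X_src @ W - Y_tgt||_F (closed form via SVD).
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

def induce_lexicon(src_emb, tgt_emb, W, k=1):
    # Map source embeddings, then return indices of the k nearest target
    # neighbors by cosine similarity.
    mapped = src_emb @ W
    mapped /= np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = mapped @ tgt.T
    return np.argsort(-sims, axis=1)[:, :k]

# Toy usage: 1000-word vocabularies, 300-dim embeddings, 200 seed pairs.
rng = np.random.default_rng(0)
src_emb = rng.standard_normal((1000, 300))
tgt_emb = rng.standard_normal((1000, 300))
seed_src, seed_tgt = np.arange(200), np.arange(200)  # aligned seed indices
W = procrustes_map(src_emb[seed_src], tgt_emb[seed_tgt])
print(induce_lexicon(src_emb[:5], tgt_emb, W))

In practice the seed dictionary comes from a bilingual lexicon or identical strings, and the mapping may be refined iteratively; unsupervised variants bootstrap the seed pairs instead.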
Mapping methods are effective for closely-related languages with embedding spaces trained on high-quality, domain-matched data even without supervision, but critically rely on the “approximate isomorphism assumption”—that monolingual embedding spaces are geometrically similar.¹ Problematically, researchers have observed that the isomorphism assumption weakens substantially as languages and domains become dissimilar, leading to failure precisely where unsupervised methods might be helpful (e.g., Søgaard et al., 2018; Ormazabal et al., 2019; Glavaš et al., 2019; Vulić et al., 2019; Patra et al., 2019; Marchisio et al., 2020).
Existing work attributes non-isomorphism to linguistic, algorithmic, data size, or domain differences in training data for source and target languages. From Søgaard et al. (2018), “the performance of unsupervised BDI [BLI] depends heavily on... language pair, the comparability of the monolingual corpora, and the parameters of the word embedding algorithms.” Several authors found that unsupervised machine translation methods suffer under similar data shifts (Marchisio et al., 2020; Kim et al., 2020; Marie and Fujita, 2020).
While such factors do result in low isomorphism of spaces trained with traditional methods, we needn’t resign ourselves to the mercy of the geometry a training methodology naturally produces. Whereas multiple works post-process embeddings or map non-linearly, we control similarity explicitly during embedding training by incorporating five global metrics of isomorphism into the Skip-gram loss function. Our three supervised and two unsupervised losses gain some control of the relative isomorphism of word embedding spaces, compensating for dissimilarities in training data and algorithm.
¹ In formal mathematics, “isomorphic” requires two objects to have an invertible correspondence between them. Researchers in NLP loosen the definition to “geometrically similar”, and consider degrees of similarity. We might say that space X is more isomorphic to space Y than is space Z.
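As a rough illustration of the training-time idea above (a sketch under assumptions, not the released IsoVec implementation), one can add a weighted isomorphism penalty to the Skip-gram negative-sampling objective. Here the penalty is a simple supervised L2 term over seed translation pairs; the weight alpha, the class name, and the tensor shapes are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGramWithIsoPenalty(nn.Module):
    # Hypothetical sketch: Skip-gram negative sampling plus a supervised
    # L2 "isomorphism" penalty pulling source embeddings of seed words
    # toward frozen target-language embeddings of their translations.
    def __init__(self, vocab_size, dim, tgt_emb, alpha=0.1):
        super().__init__()
        self.in_emb = nn.Embedding(vocab_size, dim)   # source word vectors
        self.out_emb = nn.Embedding(vocab_size, dim)  # context vectors
        self.register_buffer("tgt_emb", tgt_emb)      # frozen target vectors
        self.alpha = alpha                            # penalty weight (assumed)

    def sgns_loss(self, center, context, negatives):
        v = self.in_emb(center)                               # (B, d)
        pos = (v * self.out_emb(context)).sum(-1)             # (B,)
        neg = torch.bmm(self.out_emb(negatives),              # (B, K, d)
                        v.unsqueeze(-1)).squeeze(-1)          # (B, K)
        return -(F.logsigmoid(pos) + F.logsigmoid(-neg).sum(-1)).mean()

    def iso_penalty(self, seed_src, seed_tgt):
        return F.mse_loss(self.in_emb(seed_src), self.tgt_emb[seed_tgt])

    def forward(self, center, context, negatives, seed_src, seed_tgt):
        return (self.sgns_loss(center, context, negatives)
                + self.alpha * self.iso_penalty(seed_src, seed_tgt))

The paper’s three supervised and two unsupervised isomorphism losses would slot in where the simple L2 term sits here; the sketch only shows the overall structure, total loss = Skip-gram loss + alpha * isomorphism loss.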