
Checks and Strategies for Enabling Code-Switched Machine Translation
Thamme Gowda and Mozhdeh Gheini and Jonathan May
Information Sciences Institute and Computer Science Department
University of Southern California
{tg,gheini,jonmay}@isi.edu
Abstract
Code-switching is a common phenomenon among multilingual speakers, where alternation between two or more languages occurs within the context of a single conversation. While multilingual humans can seamlessly switch back and forth between languages, multilingual neural machine translation (NMT) models are not robust to such sudden changes in input. This work explores multilingual NMT models’ ability to handle code-switched text. First, we propose checks to measure switching capability. Second, we investigate simple and effective data augmentation methods that can enhance an NMT model’s ability to support code-switching. Finally, using a glass-box analysis of attention modules, we demonstrate the effectiveness of these methods in improving robustness.
1 Introduction
Neural machine translation (NMT) (Sutskever et al., 2014; Bahdanau et al., 2015; Vaswani et al., 2017) has made significant progress, from supporting only a pair of languages per model to simultaneously supporting hundreds of languages (Johnson et al., 2017; Zhang et al., 2020; Tiedemann, 2020; Gowda et al., 2021b). Multilingual NMT models have been deployed in production systems and are actively used to translate across languages in day-to-day settings (Wu et al., 2016; Caswell, 2020; Mohan and Skotdal, 2021). Many metrics for the evaluation of machine translation have been proposed (Doddington, 2002; Banerjee and Lavie, 2005; Snover et al., 2006; Popović, 2015; Gowda et al., 2021a); a more comprehensive list would exceed space limitations. However, with the exception of context-aware MT, nearly all approaches consider translation in the context of a single sentence. Even approaches that generalize to support translation of multiple languages (Zhang et al., 2020; Tiedemann, 2020; Gowda et al., 2021b) continue to use the single-sentence, single-language paradigm. In reality, however, multilingual environments often involve language alternation or code-switching (CS), where seamless alternation between two or more languages occurs (Myers-Scotton and Ury, 1977).
CS can be broadly classified into two types (Myers-Scotton, 1989): (i) intra-sentential CS, where switching occurs within a sentence or clause boundary, and (ii) inter-sentential CS, where switching occurs at sentence or clause boundaries. An example of each type is given in Table 1. CS has been studied extensively in the linguistics community (Nilep, 2006); however, efforts in the MT community are scant (Gupta et al., 2021).
Intra:  Ce moment when you start penser en deux langues at the same temps.
        (The moment when you start to think in two languages at the same time.)
Inter:  Comme on fait son lit, you must lie on it.
        (As you make your bed, you must lie on it.)

Table 1: Intra- and inter-sentential code-switching examples between French and English.
In this work, we show that, as commonly built, multilingual NMT models are not robust to multi-sentence translation, especially when CS is involved. The contributions of this work are as follows: First, we describe a few simple but effective checks that improve test coverage in multilingual NMT evaluation (Section 2). Second, we explore training data augmentation techniques such as concatenation and noise addition in the context of multilingual NMT (Section 3). Third, using a many-to-one multilingual translation task setup (Section 4), we investigate the relationship between training data augmentation methods and their impact on multilingual test cases. Fourth,