
Multilingual BERT has an accent:
Evaluating English influences on fluency in multilingual models
Isabel Papadimitriou* and Kezia Lopez* and Dan Jurafsky
Computer Science Department
Stanford University
{isabelvp,keziakl,jurafsky}@stanford.edu
Abstract
While multilingual language models can improve NLP performance on low-resource languages by leveraging higher-resource languages, they also reduce average performance on all languages (the ‘curse of multilinguality’). Here we show another problem with multilingual models: grammatical structures in higher-resource languages bleed into lower-resource languages, a phenomenon we call grammatical structure bias. We show this bias via a novel method for comparing the fluency of multilingual models to the fluency of monolingual Spanish and Greek models: testing their preference for two carefully-chosen variable grammatical structures (optional pronoun-drop in Spanish and optional Subject-Verb ordering in Greek). We find that multilingual BERT is biased toward the English-like setting (explicit pronouns and Subject-Verb-Object ordering) as compared to our monolingual control language model. With our case studies, we hope to bring to light the fine-grained ways in which multilingual models can be biased, and encourage more linguistically-aware fluency evaluation.
1 Introduction
Multilingual language models share a single set of parameters between many languages, opening new pathways for multilingual and low-resource NLP. However, not all training languages have an equal amount, or a comparable quality, of training data in these models. In this paper, we investigate whether the hegemonic status of English influences other languages in multilingual language models. We propose a novel evaluation method, whereby we ask if model predictions for lower-resource languages exhibit structural features of English. This is akin to asking whether the model has learned some languages with an “English accent”, or an English grammatical structure bias.
Figure 1: Our method for evaluating English structural bias in multilingual models. We compare monolingual and multilingual model predictions on two sets of natural sentences in the target language: one which is structurally parallel to English (e.g., Spanish with an explicit pronoun), and one which is not (e.g., Spanish with pro-drop).

We demonstrate this bias effect in Spanish and Greek, comparing the monolingual models BETO (Cañete et al., 2020) and GreekBERT (Koutsikakis et al., 2020) to multilingual BERT (mBERT), where English is the most frequent language in the training data. We show that mBERT prefers English-like sentence structure in Spanish and Greek compared to the monolingual models. Our case studies focus on Spanish pronoun drop (pro-drop) and Greek subject-verb order, two structural grammatical features. We show that multilingual BERT is structurally biased towards explicit pronouns rather than pro-drop in Spanish, and subject-before-verb order in Greek: the structural forms parallel to English.
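To make the comparison concrete, here is a minimal sketch of how such a structural preference could be measured with off-the-shelf masked language models. This is not the paper's released evaluation code: it scores a toy minimal pair (an explicit-pronoun Spanish sentence and its pro-drop variant, standing in for the two corpora of Figure 1) using pseudo-log-likelihood scoring (Salazar et al., 2020), and compares the preference of BETO against that of mBERT. The sentence pair and the use of pseudo-log-likelihood as the scoring function are illustrative assumptions; the model identifiers are the public Hugging Face checkpoints.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

def pseudo_log_likelihood(model, tokenizer, sentence):
    """Sum of log P(token | rest), masking each token in turn."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip [CLS] (position 0) and [SEP] (last position).
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total

def preference(model_name, overt, prodrop):
    """PLL difference: positive means the model prefers the overt-pronoun version."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name).eval()
    return (pseudo_log_likelihood(model, tok, overt)
            - pseudo_log_likelihood(model, tok, prodrop))

# Hypothetical minimal pair; the paper uses natural corpus sentences instead.
overt = "Ella habla griego."   # explicit pronoun (parallel to English)
prodrop = "Habla griego."      # pro-drop (not parallel to English)

for name in ["dccuchile/bert-base-spanish-wwm-cased",  # BETO (monolingual)
             "bert-base-multilingual-cased"]:          # mBERT
    print(name, preference(name, overt, prodrop))

A larger preference score for mBERT than for BETO on such pairs would be the kind of English-ward structural bias the paper describes, though the actual study aggregates over many naturally-occurring sentences rather than a single constructed pair.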
Though the effect we showcase here is likely not captured by the downstream classification tasks often used to evaluate multilingual models (Hu et al., 2020), it demonstrates the type of fluency that can be lost with multilingual training, something that current evaluation methods miss. In fact, though we choose two clear-cut syntactic features to investigate, there are many less-measurable features that make language production fluent: subtleties in lexical choice, grammatical choice, and discourse expression, among many others. With this paper, beyond showing a trend for two specific grammatical features, we wish to highlight fluency discrepancies in multilingual models, and also call for more evaluations focused on fluency.