pronouns. Using this example, we discuss how
the underlying data influences the production and
operationalization of socio-political constructs,
e.g., gender, in ML systems.
We hope that our work inspires more NLP researchers
and practitioners to think about steps in
ML as acts of historical fiction, leading to more
plurality and, thus, to fairer and more inclusive NLP.
2 Background
Data-making has been conceptualized as fiction
(e.g., Gitelman, 2013), and ML researchers have also
begun to conceptualize ML, and data, as subjective
(Talat et al., 2021) and value-laden (Birhane et al.,
2022). Here, we lay the foundation for considering
ML through the lens of historical fiction.
2.1 Historical Fiction
In his foundational text, “The Archaeology of
Knowledge”, Foucault (2013) argues that history as
a field, in its efforts to describe the past, has been
preoccupied with constructing linear timelines
rather than narratives. Describing this distinction,
White (2005) notes that “historical discourse wages
everything on the true, while fictional discourse is
interested in the real.” That is, engaging with fiction
affords us knowledge and understanding of the
realities of life in the period under investigation.
Moreover, by purposefully engaging with historical
fiction, we can surface histories that have otherwise
been marginalized (White, 2005). Imagining histories
in opposition to hegemony can provide space for
viewing our contemporary conditions through the
lens of neglected values from our past.
The resulting timelines are what Azoulay (2019)
terms potential histories.
2.2 Machine Learning and NLP
ML has been critiqued from multiple fields for its
discriminatory and hegemonic outcomes (Benjamin,
2019; Blodgett et al., 2021; Bolukbasi et al., 2016),
which has led to a number of methods that address
discrimination by proposing to “debias” ML models
(e.g., De-Arteaga et al., 2019; Dixon et al., 2018;
Lauscher et al., 2020). Early efforts have, however,
been complicated by under-specified notions of
‘bias’ (for further detail, see Blodgett et al., 2020).
Zhao et al. (2018) perform data augmentation with
the goal of obtaining a less gender-biased coreference
resolution system. Going a step further, Qian et al.
(2022) collect human-written perturbations of text
along demographic lines, use them to train an
automated perturber, and train a language model on
the perturbed data. Although such artifacts can
support debiasing efforts, they can also be used to
situate models within desired contexts. Other works
provide critiques
from theoretical perspectives. For instance, Talat
et al. (2021) critique the disembodied view taken by
ML practice and practitioners, arguing that “social
bias is inherent” to data-making and modeling
practices. Rogers (2021) argues that by carefully
curating data according to desired values, ML can
constitute a progressive practice. Finally, Solaiman
and Dennison (2021) propose fine-tuning language
models on curated data to shift them away from
producing toxic, i.e., abusive, content. Such work
stands in contrast to a large body of literature that
uncritically collects and uses data, thereby producing
ML models that recreate discriminatory contemporary
conditions (e.g., Green and Viljoen, 2020; Gitelman,
2013).
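As a rough illustration of the word-swapping data augmentation mentioned above (Zhao et al., 2018), the following Python sketch replaces gendered words in each sentence with their counterparts and appends the swapped copies to the corpus. It is a simplified sketch under stated assumptions, not their implementation: the pair list `SWAP_PAIRS` and the functions `build_swap_map`, `gender_swap`, and `augment` are hypothetical, the list is far smaller than a realistic one, and any further preprocessing their method may involve is omitted.

```python
import re

# Hypothetical, deliberately tiny list of gendered word pairs (an assumption;
# realistic augmentation lists are much larger).
# "his" is left out because its counterpart depends on grammar ("her"/"hers"),
# and "her" is always mapped to "him", which is wrong when it means "his".
SWAP_PAIRS = [
    ("he", "she"),
    ("him", "her"),
    ("himself", "herself"),
    ("man", "woman"),
    ("men", "women"),
    ("father", "mother"),
    ("son", "daughter"),
]


def build_swap_map(pairs):
    """Return a bidirectional lookup table from each word to its counterpart."""
    mapping = {}
    for a, b in pairs:
        mapping[a] = b
        mapping[b] = a
    return mapping


SWAP_MAP = build_swap_map(SWAP_PAIRS)
WORD_RE = re.compile(r"[A-Za-z]+")  # whole alphabetic tokens only


def _swap_word(match):
    word = match.group(0)
    swapped = SWAP_MAP.get(word.lower())
    if swapped is None:
        return word  # not in the pair list; keep unchanged
    # Keep capitalised words (e.g., sentence-initial ones) capitalised.
    return swapped.capitalize() if word[0].isupper() else swapped


def gender_swap(sentence):
    """Swap every listed gendered word in `sentence` for its counterpart."""
    return WORD_RE.sub(_swap_word, sentence)


def augment(corpus):
    """Return the original sentences followed by their gender-swapped copies."""
    return list(corpus) + [gender_swap(s) for s in corpus]


print(augment(["She called him.", "He met the woman."]))
# ['She called him.', 'He met the woman.', 'He called her.', 'She met the man.']
```

Word-level swapping like this ignores grammatical ambiguity and treats gender as binary. Replacing the pair list is one conceivable way to write a different "fiction" into a data set.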
Viewing ML through the lens of historical fiction,
we argue that ML engages in creating fictions
without being aware of it, for instance, in the creation
of data (Gitelman, 2013) and in the amplification
of dominant discourses (Zhao et al., 2017). The
predominant function of these fictions has been to
imagine a single past that reflects hegemonic trends
of our present. Here, we provide a case study that
illustrates the possibility of imagining pasts that
reflect our current conditions by constructing a
fiction (i.e., a data set and a model that we train on
this data) that is oppositional to hegemony. By
deploying these fictions of the past (i.e., data sets and
corresponding ML models) in productive settings,
we, as a society, can shape futures that are more
aligned with our fictions of formerly oppressed
realities.
3 Experiments: Neopronoun-Fiction
We present a showcase that demonstrates the
idea behind historical fiction in NLP: we study the
case of the neopronoun “xe”. Neopronouns are
pronouns that are not yet established in a language
(McGaughey, 2020). They are an important example
of language change and are mostly used by individuals
belonging to already marginalized groups, e.g.,
non-binary individuals (see, e.g., the overview by
Lauscher et al., 2022). NLP has long ignored
neopronouns, leading to the exclusion of these individuals in lan-