Back to the Future: On Potential Histories in NLP
Zeerak Talat
Digital Democracies Institute
Simon Fraser University
Burnaby, Canada
zeerak_talat@sfu.ca
Anne Lauscher
Data Science Group
University of Hamburg
Hamburg, Germany
anne.lauscher@uni-hamburg.de
Abstract
Machine learning and NLP require the con-
struction of datasets to train and fine-tune mod-
els. In this context, previous work has demon-
strated the sensitivity of these data sets. For
instance, potential societal biases in this data
are likely to be encoded and to be amplified
in the models we deploy. In this work, we
draw from developments in the field of his-
tory and take a novel perspective on these prob-
lems: considering datasets and models through
the lens of historical fiction surfaces their po-
litical nature, and affords re-configuring how
we view the past, such that marginalized dis-
courses are surfaced. Building on such in-
sights, we argue that contemporary methods
for machine learning are prejudiced towards
dominant and hegemonic histories. Employ-
ing the example of neopronouns, we show that
by surfacing marginalized histories within con-
temporary conditions, we can create models
that better represent the lived realities of tra-
ditionally marginalized and excluded commu-
nities.
1 Introduction
The state of the art in NLP requires, among other
steps, selecting, sampling, and annotating data
sets which we can then use to train large machine
learning (ML) models (e.g., Devlin et al., 2019; Liu
et al., 2019). Previous work has shown that this is a
sensitive process: for instance, potential societal bi-
ases present in the data are prone to be encoded and
even amplified in our models and might jeopardize
fairness (e.g., Blodgett et al., 2020). Researchers
have thus argued that ML for NLP should be
handled with care, and have proposed measures
designed to counter potential ethical issues, e.g.,
via augmenting datasets (Zhao et al., 2018). In
this work, we argue that all these steps along the
ML pipeline are in fact acts of historical fiction.
Historical fiction is a field of study in which history
is constructed as a plurality rather than a singular
entity or timeline (White, 2005). What the field of
historical fiction affords is drawing out marginal-
ized and minoritized histories that have otherwise
been forgotten or suppressed (White, 2005). In
contrast, traditional history creates histories from
linear timelines and emphasizes the dominant
norms (Foucault, 2013). Here we argue: if the
act of creating a history is the creation of a fiction
through which we can understand the past, then
the creation of datasets for ML and the training of
ML models similarly engage in acts of historical fiction.
However, rather than highlighting marginalized nar-
ratives or histories, mainstream ML draws out the
majoritarian histories. This occurs at the expense of
marginalized narratives, giving rise to the marginal-
ization that ML performs. In this way, current ML
is a conservative practice, which polices and limits
the expression of marginalized discourses, and
thereby the existence of marginalized people.
In this paper, we acknowledge the potential
of historical fiction for fairer NLP. Strongly
believing that our community can profit from this
novel perspective, we a) introduce its theoretical
background; b) review different possibilities of
how NLP is currently performing acts of historical
fiction; and c) demonstrate through a case study how
to construct histories for ML that are progressive
by explicitly including the lived realities of groups
that have otherwise been marginalized. We show
that such constructions strongly impact the ways
in which models come to embody information
(Talat et al., 2021). Here, we resort to the case of
neopronouns (novel and not yet established pro-
nouns) to showcase how a simple heuristic fiction
process impacts how models embody them. Con-
cretely, we replace gendered pronouns with a
gender-neutral neopronoun and adopt existing
model specialization methods (e.g., Lauscher et al.,
2021) for injecting a potential history. Training a model on
our fiction data shifts a marginalized pronoun from
the edges of the vector space towards majoritarian
pronouns. Using this example, we discuss how
the underlying data influences the production and
operationalization of socio-political constructs,
e.g., gender, in ML systems.
arXiv:2210.06245v1 [cs.CL] 12 Oct 2022
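A minimal sketch of such a heuristic fiction process (our own illustration under simplifying assumptions, not the authors' released code) rewrites binary gendered pronouns to forms of the neopronoun "xe" before training:

```python
import re

# Illustrative mapping from binary gendered pronouns to forms of the
# neopronoun "xe" (xe/xem/xyr/xyrs/xemself). Caveat: "her" is ambiguous
# between object ("saw her") and possessive ("her book"); this simple
# heuristic maps it to the possessive form, whereas a real pipeline
# would need POS tagging to disambiguate.
PRONOUN_MAP = {
    "he": "xe", "she": "xe",
    "him": "xem",
    "his": "xyr", "her": "xyr", "hers": "xyrs",
    "himself": "xemself", "herself": "xemself",
}

def neutralize_pronouns(text: str) -> str:
    """Rewrite gendered pronouns to neopronoun forms, preserving capitalization."""
    def repl(match: re.Match) -> str:
        token = match.group(0)
        target = PRONOUN_MAP[token.lower()]
        return target.capitalize() if token[0].isupper() else target
    # \b word boundaries prevent matches inside words such as "shepherd".
    pattern = r"\b(" + "|".join(PRONOUN_MAP) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

print(neutralize_pronouns("She said he gave himself her book."))
# → "Xe said xe gave xemself xyr book."
```

Applying such a function to an entire training corpus yields the "fiction data" on which a model can then be trained or further specialized.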
We hope that our work inspires more NLP re-
searchers and practitioners to think about steps in
ML as acts of historical fiction, leading to more
plurality and thus, fairer and more inclusive NLP.
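The reported movement in the vector space can be checked with a simple cosine-similarity probe of the pronoun embeddings before and after training; the sketch below uses made-up toy vectors purely for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy, made-up 3-d embeddings purely for illustration: before training on
# the fiction data, "xe" sits far from the majoritarian pronouns; after,
# it has moved towards them.
he_vec    = [0.9, 0.1, 0.0]
xe_before = [0.0, 0.2, 0.9]
xe_after  = [0.7, 0.2, 0.3]

print(cosine(he_vec, xe_before) < cosine(he_vec, xe_after))  # → True
```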
2 Background
Data-making has been conceptualized as fiction
(e.g., Gitelman, 2013), and ML researchers have also
begun to conceptualize ML, and data, as subjective
(Talat et al., 2021) and value-laden (Birhane et al.,
2022). Here, we lay the foundation for considering
ML through the lens of historical fiction.
2.1 Historical Fiction
In his foundational text, “The Archaeology of
Knowledge”, Foucault (2013) argues that history as
a field has been preoccupied with the construction
of linear timelines rather than constructing narra-
tives in efforts to describe the past. Describing this
distinction, White (2005) notes that “historical dis-
course wages everything on the true, while fictional
discourse is interested in the real.” That is, through
engaging with fiction, we are afforded knowledge
and understanding of the realities of life in the pe-
riod under investigation. Moreover, through
purposefully engaging with historical fiction, his-
tories that have otherwise been marginalized can
be surfaced (White, 2005). Imagining histories
in opposition to hegemony can provide space for
viewing our contemporary conditions through the
lens of values in our past that have been neglected.
The resulting timelines are what Azoulay (2019)
terms potential histories.
2.2 Machine Learning and NLP
ML has been critiqued from multiple fields for its
discriminatory and hegemonic outcomes (Benjamin,
2019; Blodgett et al., 2021; Bolukbasi et al., 2016),
which has led to a number of methods
that address the issue of discrimination by propos-
ing to “debias” ML models (e.g., De-Arteaga et al.,
2019; Dixon et al., 2018; Lauscher et al., 2020).
Early efforts have, however, been complicated by
notions of ‘bias’ being under-specified (for further
detail, see Blodgett et al., 2020). Zhao et al. (2018)
perform data augmentation with the goal of a less
gender-biased coreference resolution system. Mov-
ing a step further, Qian et al. (2022) collect data
perturbed along demographic lines by humans, then
train an automated perturber and a language model
on the perturbed data. Although such ar-
tifacts can be used towards efforts to debias, the
artifacts can also be used to situate models within
desired contexts. Other works provide critiques
from theoretical perspectives. For instance, Talat
et al. (2021) critique the disembodied view that
ML practice and practitioners take, arguing that
“social bias is inherent” to data making and model-
ing practices. Rogers (2021) argues that through
carefully curating data along desired values, ML
can constitute a progressive practice. Finally, So-
laiman and Dennison (2021) propose fine-tuning
language models on curated data, seeking to
shift language models away from producing toxic,
i.e., abusive, content. Such work stands in contrast
to a large body of literature which uncritically col-
lects and uses data, producing ML
models that recreate discriminatory contemporaries
(e.g., Green and Viljoen, 2020; Gitelman, 2013).
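The augmentation strategy discussed above can be sketched as follows; this is our own illustration in the spirit of counterfactual data augmentation, not the released code of any of the cited works, and the swap list and helper names are illustrative assumptions:

```python
import re

# Illustrative bidirectional swap pairs; a real list is far larger, and
# pronoun case ambiguity ("her" -> "him"/"his") needs extra handling
# that this sketch deliberately skips.
PAIRS = [("he", "she"), ("man", "woman"), ("king", "queen"), ("actor", "actress")]
SWAPS = {a: b for a, b in PAIRS} | {b: a for a, b in PAIRS}

def swap_gender(text: str) -> str:
    """Swap gendered words bidirectionally, preserving capitalization."""
    def repl(m: re.Match) -> str:
        token = m.group(0)
        target = SWAPS[token.lower()]
        return target.capitalize() if token[0].isupper() else target
    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

def augment(corpus: list[str]) -> list[str]:
    """Return the corpus together with its gender-swapped counterfactuals."""
    return corpus + [swap_gender(s) for s in corpus]

print(augment(["The king said he was tired."]))
# → ["The king said he was tired.", "The queen said she was tired."]
```

Training on the union of original and counterfactual sentences, rather than replacing the originals, is what distinguishes this augmentation from the replacement heuristic used in our case study.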
Viewing ML through the lens of historical fic-
tion, we argue that ML engages in creating fictions
without awareness, for instance, in the creation
of data (Gitelman, 2013) and in the amplification
of dominant discourses (Zhao et al., 2017). The
predominant function of these fictions has been to
imagine a single past that reflects hegemonic trends
in our contemporary. Here, we provide a case
study that illustrates the possibility of imagining
pasts that reflect our current conditions, through
constructing a fiction (i.e., a data set and a model
which we train on this data) that is oppositional
to hegemony. Through deploying these fictions
of the past (i.e., data sets and corresponding ML
models) in productive settings, we are, as a society,
able to shape futures that are more aligned with our
fictions of realities that were formerly oppressed.
3 Experiments: Neopronoun-Fiction
We describe a showcase which demonstrates the
idea behind historical fiction in NLP: we study the
case of the neopronoun “xe”. Neopronouns are
not yet established pronouns (McGaughey, 2020).
They are an important example of language change
and are mostly used by individuals belonging to
already marginalized groups, e.g., non-binary indi-
viduals (e.g., see the overview by Lauscher et al.,
2022). NLP has long ignored neopronouns,
leading to the exclusion of these individuals in lan-