Back to the Future: On Potential Histories in NLP
Zeerak Talat
Digital Democracies Institute
Simon Fraser University
Burnaby, Canada
zeerak_talat@sfu.ca
Anne Lauscher
Data Science Group
University of Hamburg
Hamburg, Germany
anne.lauscher@uni-hamburg.de
Abstract
Machine learning and NLP require the con-
struction of datasets to train and fine-tune mod-
els. In this context, previous work has demon-
strated the sensitivity of these data sets. For
instance, potential societal biases in this data
are likely to be encoded and to be amplified
in the models we deploy. In this work, we
draw from developments in the field of his-
tory and take a novel perspective on these prob-
lems: considering datasets and models through
the lens of historical fiction surfaces their po-
litical nature, and affords re-configuring how
we view the past, such that marginalized dis-
courses are surfaced. Building on such in-
sights, we argue that contemporary methods
for machine learning are prejudiced towards
dominant and hegemonic histories. Employ-
ing the example of neopronouns, we show that
by surfacing marginalized histories within con-
temporary conditions, we can create models
that better represent the lived realities of tra-
ditionally marginalized and excluded commu-
nities.
1 Introduction
The state of the art in NLP requires, among other
steps, selecting, sampling, and annotating data
sets which we can then use to train large machine
learning (ML) models (e.g., Devlin et al., 2019; Liu
et al., 2019). Previous work has shown that this is a
sensitive process: for instance, potential societal bi-
ases present in the data are prone to be encoded and
even amplified in our models and might jeopardize
fairness (e.g., Blodgett et al., 2020). Researchers
have thus argued that ML for NLP should be
handled with care, and have proposed measures
designed to counter potential ethical issues, e.g.,
via augmenting datasets (Zhao et al., 2018). In
this work, we argue that all these steps along the
ML pipeline are in fact acts of historical fiction.
Historical fiction is a field of study in which history
is constructed as a plurality rather than a singular
entity or timeline (White, 2005). What the field of
historical fiction affords is drawing out marginal-
ized and minoritized histories that have otherwise
been forgotten or suppressed (White, 2005). In
contrast, traditional history creates histories from
linear timelines and emphasizes the dominant
norms (Foucault, 2013). Here we argue: if the
act of creating a history is the creation of a fiction
through which we can understand the past, then
the creation of datasets for ML and the training of
ML models similarly engage in acts of historical fiction.
However, rather than highlighting marginalized nar-
ratives or histories, mainstream ML draws out the
majoritarian histories. This occurs at the expense of
marginalized narratives, giving rise to the marginal-
ization that ML performs. In this way, current ML
is a conservative practice, which polices and limits
the expression of marginalized discourses, and
thereby the existence of marginalized people.
In this paper, we acknowledge the potential
of historical fiction for fairer NLP. Strongly
believing that our community can profit from this
novel perspective, we a) introduce its theoretical
background; b) review different possibilities of
how NLP is currently performing acts of historical
fiction; and c) demonstrate through a case study how
to construct histories for ML that are progressive
by explicitly including the lived realities of groups
that have otherwise been marginalized. We show
that such constructions strongly impact the ways
in which models come to embody information
(Talat et al., 2021). Here, we resort to the case of
neopronouns (novel and not yet established pro-
nouns) to showcase how a simple heuristic fiction
process impacts how models embody them. Con-
cretely, we replace gendered pronouns with a
gender-neutral neopronoun and adopt existing
model specialization methods (e.g., Lauscher et al.,
2021) for injecting a potential history. Training a model on
our fiction data shifts a marginalized pronoun from
the edges of the vector space towards majoritarian
pronouns. Using this example, we discuss how
the underlying data influences the production and
operationalization of socio-political constructs,
e.g., gender, in ML systems.
arXiv:2210.06245v1 [cs.CL] 12 Oct 2022
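A minimal sketch of such a heuristic fiction process (our own illustration under simplifying assumptions, not the authors' released code) rewrites binary gendered pronouns to forms of the neopronoun "xe" before training:

```python
import re

# Illustrative mapping from binary gendered pronouns to forms of the
# neopronoun "xe" (xe/xem/xyr/xyrs/xemself). Caveat: "her" is ambiguous
# between object ("saw her") and possessive ("her book"); this simple
# heuristic maps it to the possessive form, whereas a real pipeline
# would need POS tagging to disambiguate.
PRONOUN_MAP = {
    "he": "xe", "she": "xe",
    "him": "xem",
    "his": "xyr", "her": "xyr", "hers": "xyrs",
    "himself": "xemself", "herself": "xemself",
}

def neutralize_pronouns(text: str) -> str:
    """Rewrite gendered pronouns to neopronoun forms, preserving capitalization."""
    def repl(match: re.Match) -> str:
        token = match.group(0)
        target = PRONOUN_MAP[token.lower()]
        return target.capitalize() if token[0].isupper() else target
    # \b word boundaries prevent matches inside words such as "shepherd".
    pattern = r"\b(" + "|".join(PRONOUN_MAP) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

print(neutralize_pronouns("She said he gave himself her book."))
# → "Xe said xe gave xemself xyr book."
```

Applying such a function to an entire training corpus yields the "fiction data" on which a model can then be trained or further specialized.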
We hope that our work inspires more NLP re-
searchers and practitioners to think about steps in
ML as acts of historical fiction, leading to more
plurality and thus, fairer and more inclusive NLP.
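The reported movement in the vector space can be checked with a simple cosine-similarity probe of the pronoun embeddings before and after training; the sketch below uses made-up toy vectors purely for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy, made-up 3-d embeddings purely for illustration: before training on
# the fiction data, "xe" sits far from the majoritarian pronouns; after,
# it has moved towards them.
he_vec    = [0.9, 0.1, 0.0]
xe_before = [0.0, 0.2, 0.9]
xe_after  = [0.7, 0.2, 0.3]

print(cosine(he_vec, xe_before) < cosine(he_vec, xe_after))  # → True
```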
2 Background
Data-making has been conceptualized as fiction
(e.g., Gitelman, 2013), and ML researchers have also
begun to conceptualize ML, and data, as subjective
(Talat et al., 2021) and value-laden (Birhane et al.,
2022). Here, we lay the foundation for considering
ML through the lens of historical fiction.
2.1 Historical Fiction
In his foundational text, “The Archaeology of
Knowledge”, Foucault (2013) argues that history as
a field has been preoccupied with the construction
of linear timelines rather than constructing narra-
tives in efforts to describe the past. Describing this
distinction, White (2005) notes that “historical dis-
course wages everything on the true, while fictional
discourse is interested in the real.” That is, through
engaging with fiction, we are afforded knowledge
and understanding of the realities of life in the pe-
riod under investigation. Moreover, through
purposefully engaging with historical fiction, his-
tories that have otherwise been marginalized can
be surfaced (White, 2005). Imagining histories
in opposition to hegemony can provide space for
viewing our contemporary conditions through the
lens of values in our past that have been neglected.
The resulting timelines are what Azoulay (2019)
terms potential histories.
2.2 Machine Learning and NLP
ML has been critiqued from multiple fields for its
discriminatory and hegemonic outcomes (Benjamin,
2019; Blodgett et al., 2021; Bolukbasi et al., 2016),
which has led to a number of methods
that address the issue of discrimination by propos-
ing to “debias” ML models (e.g., De-Arteaga et al.,
2019; Dixon et al., 2018; Lauscher et al., 2020).
Early efforts have, however, been complicated by
notions of ‘bias’ being under-specified (for further
detail, see Blodgett et al., 2020). Zhao et al. (2018)
perform data augmentation with the goal of a less
gender-biased coreference resolution system. Mov-
ing a step further, Qian et al. (2022) collect data
perturbed along demographic lines by humans, then
train an automated perturber and a language model
on the perturbed data. Although such ar-
tifacts can be used towards efforts to debias, the
artifacts can also be used to situate models within
desired contexts. Other works provide critiques
from theoretical perspectives. For instance, Talat
et al. (2021) critique the disembodied view that
ML practice and practitioners take, arguing that
“social bias is inherent” to data making and model-
ing practices. Rogers (2021) argues that through
carefully curating data along desired values, ML
can constitute a progressive practice. Finally, So-
laiman and Dennison (2021) propose fine-tuning
language models on curated data, seeking to
shift language models away from producing toxic,
i.e., abusive, content. Such work stands in contrast
to a large body of literature which uncritically col-
lects and uses data, producing ML
models that recreate discriminatory contemporaries
(e.g., Green and Viljoen, 2020; Gitelman, 2013).
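The augmentation strategy discussed above can be sketched as follows; this is our own illustration in the spirit of counterfactual data augmentation, not the released code of any of the cited works, and the swap list and helper names are illustrative assumptions:

```python
import re

# Illustrative bidirectional swap pairs; a real list is far larger, and
# pronoun case ambiguity ("her" -> "him"/"his") needs extra handling
# that this sketch deliberately skips.
PAIRS = [("he", "she"), ("man", "woman"), ("king", "queen"), ("actor", "actress")]
SWAPS = {a: b for a, b in PAIRS} | {b: a for a, b in PAIRS}

def swap_gender(text: str) -> str:
    """Swap gendered words bidirectionally, preserving capitalization."""
    def repl(m: re.Match) -> str:
        token = m.group(0)
        target = SWAPS[token.lower()]
        return target.capitalize() if token[0].isupper() else target
    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

def augment(corpus: list[str]) -> list[str]:
    """Return the corpus together with its gender-swapped counterfactuals."""
    return corpus + [swap_gender(s) for s in corpus]

print(augment(["The king said he was tired."]))
# → ["The king said he was tired.", "The queen said she was tired."]
```

Training on the union of original and counterfactual sentences, rather than replacing the originals, is what distinguishes this augmentation from the replacement heuristic used in our case study.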
Viewing ML through the lens of historical fic-
tion, we argue that ML engages in creating fictions
without awareness, for instance, in the creation
of data (Gitelman, 2013) and in the amplification
of dominant discourses (Zhao et al., 2017). The
predominant function of these fictions has been to
imagine a single past that reflects hegemonic trends
in our contemporary. Here, we provide a case
study that illustrates the possibility of imagining
pasts that reflect our current conditions, through
constructing a fiction (i.e., a data set and a model
which we train on this data) that is oppositional
to hegemony. Through deploying these fictions
of the past (i.e., data sets and corresponding ML
models) in productive settings, we are, as a society,
able to shape futures that are more aligned with our
fictions of realities that were formerly oppressed.
3 Experiments: Neopronoun-Fiction
We describe a showcase which demonstrates the
idea behind historical fiction in NLP: we study the
case of the neopronoun “xe”. Neopronouns are
not yet established pronouns (McGaughey, 2020).
They are an important example of language change
and are mostly used by individuals belonging to
already marginalized groups, e.g., non-binary indi-
viduals (e.g., see the overview by Lauscher et al.,
2022). NLP has long ignored neopronouns,
leading to the exclusion of these individuals in lan-