The Shared Task on Gender Rewriting Bashar Alhafni1Nizar Habash1Houda Bouamor2Ossama Obeid1 Sultan Alrowili 3Daliyah Alzeer 4Khawlah M. Alshanqiti 5Ahmed ElBakry 6


The Shared Task on Gender Rewriting

Bashar Alhafni,1 Nizar Habash,1 Houda Bouamor,2 Ossama Obeid,1

Sultan Alrowili,3 Daliyah Alzeer,4 Khawlah M. Alshanqiti,5 Ahmed ElBakry,6

Muhammad ElNokrashy,6 Mohamed Gabr,6 Abderrahmane Issam,7

Abdelrahim Qaddoumi,8 K. Vijay-Shanker,3 Mahmoud Zyate9∗

1 New York University Abu Dhabi, 2 Carnegie Mellon University in Qatar,

3 University of Delaware, 4 Taif University, 5 Umm Alqura University,

6 Microsoft ATL Cairo, 7 Archipel Cognitive, 8 New York University, 9 Leyton

alhafni@nyu.edu

Abstract

In this paper, we present the results and findings of the Shared Task on Gender Rewriting, which was organized as part of the Seventh Arabic Natural Language Processing Workshop. The task of gender rewriting refers to generating alternatives of a given sentence to match different target user gender contexts (e.g., a female speaker with a male listener, a male speaker with a male listener, etc.). This requires changing the grammatical gender (masculine or feminine) of certain words referring to the users. In this task, we focus on Arabic, a gender-marking morphologically rich language. A total of five teams from four countries participated in the shared task.

1 Introduction

The problem of gender bias in Natural Language Processing (NLP) systems has received considerable attention across a variety of tasks, such as machine translation, co-reference resolution, and dialogue systems. Research has shown that NLP systems not only embed societal biases, but also amplify and propagate them in ways that create representational harms and degrade users’ experiences (Sun et al., 2019; Blodgett et al., 2020). The main cause of this problem is usually attributed to the inherently biased data used to build these systems, which mirrors the inequalities of the world we live in. Therefore, many approaches have been proposed to mitigate this problem, either by using counterfactual data augmentation techniques (Lu et al., 2018; Hall Maudslay et al., 2019; Zmigrod et al., 2019) or by debiasing pretrained representations that are learned from biased data (Bolukbasi et al., 2016; Zhao et al., 2018; Manzini et al., 2019; Zhao et al., 2020). However, even the most balanced of models can still exhibit and amplify bias if they are designed to produce a single text output without taking their users’ gender preferences into consideration (Habash et al., 2019; Alhafni et al., 2020, 2022b). Therefore, to provide the correct user-aware output, NLP systems should be designed to produce outputs that are as gender-specific as the user preferences they have access to. Recently, Alhafni et al. (2022b) introduced the task of gender rewriting, which refers to generating alternatives of a given sentence to match different target user gender contexts. To encourage more researchers to work on this problem, we organized the Shared Task on Gender Rewriting. We focus on Modern Standard Arabic (MSA), a gender-marking morphologically rich language, in contexts involving two users.1

∗ The first four authors are the shared task organizers, listed in order of contribution. The remaining authors are the shared task participants in alphabetical order.

This shared task was organized as part of the Seventh Arabic Natural Language Processing Workshop (WANLP), co-located with EMNLP 2022. This is the first shared task at WANLP in seven years to target a language generation problem in Arabic. A total of five teams from four countries participated in the shared task. One team contributed a system description paper, which is included in the WANLP proceedings and cited in this paper. We provide a description of all submitted systems and the approaches they use. All of the datasets created for this shared task will be made publicly available to support further research on gender rewriting.

This paper is organized as follows. We first provide a description of the shared task (§2). We then describe the data used in the shared task, including a newly created set which we used for evaluation, in §3. Next, we provide a description of all submitted systems in §4 and discuss the results in §5. Finally, we discuss the lessons we learned from running this shared task and provide recommendations to the (Arabic) NLP community in §6.

1 http://gender-rewriting-shared-task.camel-lab.com/

arXiv:2210.12410v1 [cs.CL] 22 Oct 2022

Input Sentence | Target Speaker | Target Listener | Output Sentence

تا    ة (Really glad to know you ladies) | Masculine | Masculine | ةد     (Really glad to know you gentlemen)

 | Feminine | Masculine | ةد    ة (Really glad to know you gentlemen)

 | Masculine | Feminine | تا     (Really glad to know you ladies)

 | Feminine | Feminine | تا    ة (Really glad to know you ladies)

Table 1: Example of the gender rewriting task. The input sentence has four rewritten alternatives that match the different target user gender contexts. First person gendered words are in purple and second person gendered words are in red.

2 Task Description

The task of gender rewriting was introduced by Alhafni et al. (2022b) and refers to generating alternatives of a given Arabic sentence to match different target user gender contexts. We focus on contexts involving two users (I and/or You), that is, the first and second grammatical persons with independent grammatical gender preferences. This requires changing the grammatical gender (masculine or feminine) of certain words referring to the users (speaker/first person and listener/second person) in the input sentence. Therefore, given an Arabic sentence as input, the goal is to generate four gender-rewritten alternatives to match the different target user gender contexts (i.e., a female speaker with a male listener, a male speaker with a male listener, a male speaker with a female listener, and a female speaker with a female listener). Table 1 shows an example of the gender rewriting problem where the input sentence is rewritten to its four gender alternatives that match the four target user gender contexts.
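As a minimal sketch of the task's input/output shape, the four target contexts can be enumerated as the product of the speaker and listener genders. The names `TargetContext`, `all_target_contexts`, and `rewrite_all` are hypothetical and introduced here only for illustration; the rewriting function itself is passed in as a placeholder and does not correspond to any system submitted to the shared task.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List

GENDERS = ("M", "F")  # masculine, feminine


@dataclass(frozen=True)
class TargetContext:
    """Hypothetical container for one target user gender context."""
    speaker: str   # gender preference of the first person (speaker)
    listener: str  # gender preference of the second person (listener)


def all_target_contexts() -> List[TargetContext]:
    """Enumerate the four speaker/listener gender combinations."""
    return [TargetContext(s, l) for s, l in product(GENDERS, GENDERS)]


def rewrite_all(
    sentence: str,
    rewrite: Callable[[str, TargetContext], str],
) -> Dict[TargetContext, str]:
    """Apply a user-supplied rewriting function once per target context."""
    return {ctx: rewrite(sentence, ctx) for ctx in all_target_contexts()}
```

Passing the rewriter as an argument keeps the sketch independent of any particular model: a system under this framing only has to map a sentence and a target context to one output sentence.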

Notation

We use the notation defined by Alhafni et al. (2022b). Namely, we use four elementary symbols to facilitate the discussion of this task: 1M, 1F, 2M, and 2F. The digit part of the symbol refers to the grammatical person (1st or 2nd) and the letter part refers to the grammatical gender (Masculine or Feminine). Additionally, we use B to refer to invariant/ambiguous gender.
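The notation can be read mechanically: one digit for the grammatical person and one letter for the grammatical gender, with B standing alone. The following is an illustrative sketch only; `GenderLabel` and `parse_label` are hypothetical names, and representing B with no person is an assumption made here for convenience.

```python
from typing import NamedTuple, Optional


class GenderLabel(NamedTuple):
    """Hypothetical decoded form of a symbol such as 1M, 2F, or B."""
    person: Optional[int]  # 1 or 2; None for invariant/ambiguous (B)
    gender: str            # "M", "F", or "B"


def parse_label(symbol: str) -> GenderLabel:
    """Split a symbol into its grammatical person and grammatical gender."""
    if symbol == "B":
        return GenderLabel(None, "B")
    if len(symbol) == 2 and symbol[0] in "12" and symbol[1] in "MF":
        return GenderLabel(int(symbol[0]), symbol[1])
    raise ValueError(f"unknown gender label: {symbol!r}")
```

For example, `parse_label("1F")` yields a first-person feminine label, while `parse_label("B")` yields a label with no person attached.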

2.1 Shared Task Restrictions

We provided the participants with a set of restrictions for building their systems to ensure a common experimental setup and fair comparison. Participants were asked not to use any external manually labeled datasets. However, the use of publicly available unlabeled data was allowed. Participants were also not allowed to use the publicly available development and test sets of the shared task corpus for training their systems. Moreover, we provided the participants with a new blind test set that was manually annotated for this shared task. The participants were provided with the input sentences and they did not have access to the gold references. We discuss the properties and statistics of this new test set in more detail in §3.2.

2.2 Evaluation Metrics

We follow Alhafni et al. (2022b) in treating the gender rewriting problem as a user-aware grammatical error correction task and use the MaxMatch ($M^2$) scorer (Dahlmeier and Ng, 2012) as our evaluation metric. The $M^2$ scorer computes Precision (P), Recall (R), and $F_{0.5}$ by maximally matching phrase-level edits made by a system to gold-standard edits. The gold edits are computed by the $M^2$ scorer based on the provided gold references. We also report BLEU (Papineni et al., 2002) scores, which are obtained using SacreBLEU (Post, 2018). We report the gender rewriting results in a normalized space for Alif, Ya, and Ta-Marbuta (Habash, 2010).
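To make the metric concrete, the sketch below computes P, R, and the F-measure with beta = 0.5 from counts of matched, spurious, and missed edits, and applies a common form of Alif, Ya, and Ta-Marbuta normalization. It is not the MaxMatch scorer: it assumes the edit counts are already available, the handling of empty denominators is an assumption, and the character mapping follows common Arabic normalization practice rather than a mapping stated in this paper. The names `f_beta` and `normalize` are hypothetical.

```python
from typing import Tuple

# Assumed mapping (common practice, not specified in this paper).
_NORMALIZATION = str.maketrans({
    "\u0623": "\u0627",  # Alif with Hamza above -> bare Alif
    "\u0625": "\u0627",  # Alif with Hamza below -> bare Alif
    "\u0622": "\u0627",  # Alif with Madda -> bare Alif
    "\u0649": "\u064A",  # Alif Maqsura -> Ya
    "\u0629": "\u0647",  # Ta-Marbuta -> Ha
})


def normalize(text: str) -> str:
    """Map Alif, Ya, and Ta-Marbuta variants to a single form each."""
    return text.translate(_NORMALIZATION)


def f_beta(tp: int, fp: int, fn: int,
           beta: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, F_beta) from edit counts.

    tp: system edits that match gold edits
    fp: system edits with no matching gold edit
    fn: gold edits the system did not produce
    """
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumed convention
    recall = tp / (tp + fn) if tp + fn else 0.0     # assumed convention
    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, score
```

With beta = 0.5, precision is weighted more heavily than recall, so a system that proposes fewer but more reliable edits is favored over one that over-edits.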

3 Shared Task Data

In this section, we describe the data we use in the shared task.

3.1 The Arabic Parallel Gender Corpus

We use the publicly available Arabic Parallel Gender Corpus (APGC), a parallel corpus of Arabic sentences with gender annotations and gender-rewritten alternatives of sentences selected from
