The Shared Task on Gender Rewriting
Bashar Alhafni,1 Nizar Habash,1 Houda Bouamor,2 Ossama Obeid,1
Sultan Alrowili,3 Daliyah Alzeer,4 Khawlah M. Alshanqiti,5 Ahmed ElBakry,6
Muhammad ElNokrashy,6 Mohamed Gabr,6 Abderrahmane Issam,7
Abdelrahim Qaddoumi,8 K. Vijay-Shanker,3 Mahmoud Zyate 9
1 New York University Abu Dhabi, 2 Carnegie Mellon University in Qatar,
3 University of Delaware, 4 Taif University, 5 Umm Alqura University,
6 Microsoft ATL Cairo, 7 Archipel Cognitive, 8 New York University, 9 Leyton
alhafni@nyu.edu
Abstract
In this paper, we present the results and findings of the Shared Task on Gender Rewriting, which was organized as part of the Seventh Arabic Natural Language Processing Workshop. The task of gender rewriting refers to generating alternatives of a given sentence to match different target user gender contexts (e.g., a female speaker with a male listener, a male speaker with a male listener, etc.). This requires changing the grammatical gender (masculine or feminine) of certain words referring to the users. In this task, we focus on Arabic, a gender-marking morphologically rich language. A total of five teams from four countries participated in the shared task.
1 Introduction
The problem of gender bias in Natural Language Processing (NLP) systems has been receiving a lot of attention across a variety of tasks such as machine translation, co-reference resolution, and dialogue systems. Research has shown that NLP systems not only embed societal biases, but also amplify and propagate them in ways that create representational harms and degrade users' experiences (Sun et al., 2019; Blodgett et al., 2020). The main cause of this problem is usually attributed to the inherently biased data used to build these systems, which mirrors the inequalities of the world we live in. Therefore, many approaches have been proposed to mitigate this problem, either by using counterfactual data augmentation techniques (Lu et al., 2018; Hall Maudslay et al., 2019; Zmigrod et al., 2019) or by debiasing pre-trained representations that are trained on biased data (Bolukbasi et al., 2016; Zhao et al., 2018; Manzini et al., 2019; Zhao et al., 2020). However, even the most balanced of models can still exhibit and amplify bias if they are designed to produce a single text output without taking their users' gender preferences into consideration (Habash et al., 2019; Alhafni et al., 2020, 2022b). Therefore, to provide the correct user-aware output, NLP systems should be designed to produce outputs that are as gender-specific as the user preferences they have access to. Recently, Alhafni et al. (2022b) introduced the task of gender rewriting, which refers to generating alternatives of a given sentence to match different target user gender contexts. To encourage more researchers to work on this problem, we organized the Shared Task on Gender Rewriting. We focus on Modern Standard Arabic (MSA), a gender-marking morphologically rich language, in contexts involving two users.1
The first four authors are the shared task organizers, listed in order of contribution. The remaining authors are the shared task participants in alphabetical order.
This shared task was organized as part of the Seventh Arabic Natural Language Processing Workshop (WANLP), co-located with EMNLP 2022. This is the first shared task at WANLP in seven years to target a language generation problem in Arabic. A total of five teams from four countries participated in the shared task. One team contributed a system description paper, which is included in the WANLP proceedings and cited in this paper. We provide a description of all submitted systems and the approaches they use. All of the datasets created for this shared task will be made publicly available to support further research on gender rewriting.
This paper is organized as follows. We first provide a description of the shared task (§2). We then describe the data used in the shared task, including a newly created set that we used for evaluation (§3). Next, we provide a description of all submitted systems in §4 and discuss the results in §5. Finally, we discuss the lessons we learned from running this shared task and provide recommendations to the (Arabic) NLP community in §6.
1 http://gender-rewriting-shared-task.camel-lab.com/
arXiv:2210.12410v1 [cs.CL] 22 Oct 2022
Input Sentence: تا    ة (Really glad to know you ladies)
Target Speaker | Target Listener | Output Sentence
Masculine | Masculine | ةد     (Really glad to know you gentlemen)
Feminine | Masculine | ةد    ة (Really glad to know you gentlemen)
Masculine | Feminine | تا     (Really glad to know you ladies)
Feminine | Feminine | تا    ة (Really glad to know you ladies)
Table 1: Example of the gender rewriting task. The input sentence has four rewritten alternatives that match the different target user gender contexts. First-person gendered words are in purple and second-person gendered words are in red.
2 Task Description
The task of gender rewriting was introduced by Alhafni et al. (2022b) and refers to generating alternatives of a given Arabic sentence to match different target user gender contexts. We focus on contexts involving two users (I and/or You), that is, first and second grammatical persons with independent grammatical gender preferences. This requires changing the grammatical gender (masculine or feminine) of certain words referring to the users (speaker/first person and listener/second person) in the input sentence. Therefore, given an Arabic sentence as an input, the goal is to generate four different gender-rewritten alternatives to match the different target user gender contexts (i.e., a female speaker with a male listener, a male speaker with a male listener, a male speaker with a female listener, and a female speaker with a female listener). Table 1 shows an example of the gender rewriting problem where the input sentence is rewritten to its four gender alternatives that match the four target user gender contexts.
Notation
We use the notation that is defined by Alhafni et al. (2022b). Namely, we use four elementary symbols to facilitate the discussion of this task: 1M, 1F, 2M, and 2F. The digit part of the symbol refers to the grammatical person (1st or 2nd) and the letter part refers to the grammatical gender (Masculine or Feminine). Additionally, we use B to refer to invariant/ambiguous gender.
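The notation and the four target contexts can be enumerated mechanically. The following is a minimal Python sketch for illustration only; the names `SYMBOLS` and `target_contexts` are hypothetical and do not come from the shared task tooling.

```python
from itertools import product

PERSONS = ("1", "2")   # 1 = speaker (first person), 2 = listener (second person)
GENDERS = ("M", "F")   # M = masculine, F = feminine

# Elementary symbols 1M, 1F, 2M, 2F, plus B for invariant/ambiguous gender.
SYMBOLS = [person + gender for person, gender in product(PERSONS, GENDERS)] + ["B"]


def target_contexts():
    """Return the four (speaker, listener) target user gender contexts."""
    return [("1" + speaker, "2" + listener)
            for speaker, listener in product(GENDERS, GENDERS)]


if __name__ == "__main__":
    print(SYMBOLS)
    for speaker, listener in target_contexts():
        print(f"speaker={speaker} listener={listener}")
```

Each input sentence is paired with every tuple returned by `target_contexts()`, which gives the four rewritten alternatives per input that the task asks for.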
2.1 Shared Task Restrictions
We provided the participants with a set of restrictions for building their systems to ensure a common experimental setup and fair comparison. Participants were asked not to use any external manually labeled datasets. However, the use of publicly available unlabeled data was allowed. Participants were also not allowed to use the publicly available development and test sets of the shared task corpus for training their systems. Moreover, we provided the participants with a new blind test set that was manually annotated for this shared task. The participants were provided with the input sentences and they did not have access to the gold references. We discuss the properties and statistics of this new test set in more detail in §3.2.
2.2 Evaluation Metrics
We follow Alhafni et al. (2022b) by treating the gender rewriting problem as a user-aware grammatical error correction task and use the MaxMatch ($M^2$) scorer (Dahlmeier and Ng, 2012) as our evaluation metric. The $M^2$ scorer computes the Precision (P), Recall (R), and $F_{0.5}$ by maximally matching phrase-level edits made by a system to gold-standard edits. The gold edits are computed by the $M^2$ scorer based on provided gold references. We also report BLEU (Papineni et al., 2002) scores, which are obtained using SacreBLEU (Post, 2018). We report the gender rewriting results in a normalized space for Alif, Ya, and Ta-Marbuta (Habash, 2010).
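To make the metric concrete, the sketch below shows how P, R, and $F_{0.5}$ follow from a set of system edits and a set of gold edits, together with one common form of Alif, Ya, and Ta-Marbuta normalization. It is an illustration under stated assumptions, not the $M^2$ scorer itself: the real scorer derives and maximally matches phrase-level edits, whereas here the edits are assumed to be given as sets of hypothetical `(start, end, replacement)` tuples. The character mapping in `normalize` and the convention of scoring an empty edit set as 1.0 are likewise assumptions.

```python
# Assumed normalization mapping (one common convention):
# Alif variants -> bare Alif, Alif-Maqsura -> Ya, Ta-Marbuta -> Ha.
_NORMALIZATION_TABLE = str.maketrans({
    "\u0623": "\u0627",  # Alif with Hamza above -> Alif
    "\u0625": "\u0627",  # Alif with Hamza below -> Alif
    "\u0622": "\u0627",  # Alif with Madda -> Alif
    "\u0649": "\u064A",  # Alif-Maqsura -> Ya
    "\u0629": "\u0647",  # Ta-Marbuta -> Ha
})


def normalize(text: str) -> str:
    """Map Alif, Ya, and Ta-Marbuta variants to a single form each."""
    return text.translate(_NORMALIZATION_TABLE)


def precision_recall_f(system_edits: set, gold_edits: set, beta: float = 0.5):
    """Compute P, R, and F_beta from two sets of edits.

    Edits are compared by exact equality; an empty system or gold set
    is scored as 1.0 for P or R respectively (an assumed convention).
    """
    true_positives = len(system_edits & gold_edits)
    precision = true_positives / len(system_edits) if system_edits else 1.0
    recall = true_positives / len(gold_edits) if gold_edits else 1.0
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return precision, recall, 0.0
    f_beta = (1 + beta ** 2) * precision * recall / denominator
    return precision, recall, f_beta


if __name__ == "__main__":
    system = {(0, 1, "a"), (2, 3, "b")}
    gold = {(0, 1, "a"), (4, 5, "c"), (6, 7, "d")}
    print(precision_recall_f(system, gold))
```

With $\beta = 0.5$, precision is weighted more heavily than recall, so a system that proposes few but correct edits scores higher than one that proposes many edits of mixed quality.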
3 Shared Task Data
In this section, we describe the data we use in the
shared task.
3.1 The Arabic Parallel Gender Corpus
We use the publicly available Arabic Parallel Gender Corpus (APGC) – a parallel corpus of Arabic sentences with gender annotations and gender rewritten alternatives of sentences selected from