
The Shared Task on Gender Rewriting
Bashar Alhafni,1 Nizar Habash,1 Houda Bouamor,2 Ossama Obeid,1
Sultan Alrowili,3 Daliyah Alzeer,4 Khawlah M. Alshanqiti,5 Ahmed ElBakry,6
Muhammad ElNokrashy,6 Mohamed Gabr,6 Abderrahmane Issam,7
Abdelrahim Qaddoumi,8 K. Vijay-Shanker,3 Mahmoud Zyate9∗
1New York University Abu Dhabi, 2Carnegie Mellon University in Qatar,
3University of Delaware, 4Taif University, 5Umm Alqura University,
6Microsoft ATL Cairo, 7Archipel Cognitive, 8New York University, 9Leyton
alhafni@nyu.edu
Abstract
In this paper, we present the results and find-
ings of the Shared Task on Gender Rewriting,
which was organized as part of the Seventh
Arabic Natural Language Processing Work-
shop. The task of gender rewriting refers
to generating alternatives of a given sentence
to match different target user gender contexts
(e.g., female speaker with a male listener,
a male speaker with a male listener, etc.).
This requires changing the grammatical gen-
der (masculine or feminine) of certain words
referring to the users. In this task, we focus
on Arabic, a gender-marking morphologically
rich language. A total of five teams from four
countries participated in the shared task.
1 Introduction
The problem of gender bias in Natural Language
Processing (NLP) systems has been receiving a lot
of attention across a variety of tasks such as ma-
chine translation, co-reference resolution, and dia-
logue systems. Research has shown that NLP sys-
tems do not only have the ability to embed societal
biases, but they also amplify and propagate them
in ways that create representational harms and degrade
users’ experiences (Sun et al., 2019; Blodgett et al., 2020).
The main cause of this problem is usually
attributed to inherently biased data that is used
to build these systems and which mirrors the in-
equalities of the world we live in. Therefore, many
approaches were proposed to mitigate this problem
by either using counterfactual data augmentation
techniques (Lu et al., 2018; Hall Maudslay et al., 2019;
Zmigrod et al., 2019) or by debiasing pre-trained
representations learned from biased data
(Bolukbasi et al., 2016; Zhao et al., 2018; Manzini
et al., 2019; Zhao et al., 2020). However, even
the most balanced of models can still exhibit and
amplify bias if they are designed to produce a single
text output without taking their users’ gender
preferences into consideration (Habash et al., 2019;
Alhafni et al., 2020, 2022b). Therefore, to provide
the correct user-aware output, NLP systems should
be designed to produce outputs that are as gender-specific
as the user preferences they have access
to. Recently, Alhafni et al. (2022b) introduced the
task of gender rewriting, which refers to generating
alternatives of a given sentence to match different
target user gender contexts. To encourage more
researchers to work on this problem, we organized
the Shared Task on Gender Rewriting. We focus on
Modern Standard Arabic (MSA), a gender-marking
morphologically rich language, in contexts involving
two users.1
∗ The first four authors are the shared task organizers,
listed in order of contribution. The remaining authors are the
shared task participants in alphabetical order.
This shared task was organized as part of the Seventh
Arabic Natural Language Processing Workshop
(WANLP), co-located with EMNLP 2022.
This is the first shared task at WANLP in seven
years to target a language generation problem in
Arabic. A total of five teams from four countries
participated in the shared task. One team con-
tributed to a system description paper which is in-
cluded in the WANLP proceedings and cited in this
paper. We provide a description of all submitted
systems and the approaches they use. All of the
datasets created for this shared task will be made
publicly available to support further research on
gender rewriting.
This paper is organized as follows. We first provide
a description of the shared task (§2). We then
describe the data used in the shared task in §3, including
a newly created set which we used for evaluation.
Next, we provide a description of all submitted
systems in §4 and discuss the results in §5. Finally,
we discuss the lessons we learned from running
this shared task and provide recommendations to
the (Arabic) NLP community in §6.
1 http://gender-rewriting-shared-task.camel-lab.com/
arXiv:2210.12410v1 [cs.CL] 22 Oct 2022