to pull representations of the same class close to each other while pushing representations of different classes further apart (Figure 1). In this work, we investigate five different supervised contrastive learning methods in the context of Arabic social meaning. To the best of our knowledge, this is the first work that provides a comprehensive study of supervised contrastive learning on a wide range of Arabic social meaning tasks. We show that the performance of CL methods can be task-dependent, and we attempt to explain this behavior from the perspective of task specificity (i.e., how fine-grained the labels of a given task are). We also show that contrastive learning methods generally perform better than vanilla finetuning based on cross entropy (CE). Through an extensive experimental study, we further demonstrate that CL methods outperform CE finetuning under resource-limited constraints. Our work thus demonstrates the promise of CL methods in general, and in low-resource settings in particular.
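To make this objective concrete, below is a minimal PyTorch sketch of one common way to instantiate supervised contrastive finetuning, combining a batch-wise SupCon-style term with CE. The function names, temperature, and mixing weight are illustrative assumptions, not the exact configuration of the five methods we study.

# Minimal sketch (illustrative only) of supervised contrastive finetuning:
# a batch-wise SupCon-style loss combined with cross entropy. Hyperparameters
# (temperature tau, mixing weight lam) are assumed defaults.
import torch
import torch.nn.functional as F


def supcon_loss(features: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """features: (N, d) L2-normalized representations; labels: (N,) class ids."""
    sim = features @ features.T / tau
    # Exclude self-similarity on the diagonal.
    self_mask = torch.eye(labels.size(0), dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Positives: other examples in the batch that share the anchor's label.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0  # keep only anchors with at least one positive
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return -(pos_log_prob[valid] / pos_counts[valid]).mean()


def joint_loss(logits, features, labels, lam: float = 0.5):
    # Weighted combination of CE and the contrastive term (lam is assumed).
    return F.cross_entropy(logits, labels) + lam * supcon_loss(
        F.normalize(features, dim=-1), labels
    )

Normalizing the features before the contrastive term keeps the dot products bounded, which stabilizes the temperature-scaled softmax.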
To summarize, we offer the following contributions:

1. We study a comprehensive set of supervised CL methods for a wide range of Arabic social meaning tasks, including abusive language and hate speech detection, emotion and sentiment analysis, and identification of demographic attributes (e.g., age and gender).

2. We show that CL-based methods outperform generic CE-based vanilla finetuning for most of the tasks. To the best of our knowledge, this is the first work that provides an extensive study of supervised CL on Arabic social meaning.

3. We empirically find that the improvements CL methods yield are task-specific, and we attempt to understand this finding in the context of the different tasks we consider with regard to their label granularity.

4. We demonstrate that CL methods achieve better performance under limited-data constraints, emphasizing and quantifying how well these methods can work in low-resource settings.
2 Related Work
2.1 Arabic Social Meaning
We use the term social meaning (SM) to refer to meaning arising in real-world communication in social media (Thomas, 2014; Zhang et al., 2022b). SM covers tasks such as sentiment analysis (Abdul-Mageed et al., 2012; Abu Farha et al., 2021; Saleh et al., 2022; Alali et al., 2022), emotion recognition (Alhuzali et al., 2018; Mubarak et al., 2022c; Abu Shaqra et al., 2022; Mansy et al., 2022), age and gender identification (Abdul-Mageed et al., 2020c; Abbes et al., 2020; Mubarak et al., 2022b; Mansour Khoudja et al., 2022), hate speech and offensive language detection (Elmadany et al., 2020a; Mubarak et al., 2020, 2022a; Husain and Uzuner, 2022), and sarcasm detection (Farha and Magdy, 2020; Wafa'Q et al., 2022; Abdullah et al., 2022).
Most recent studies are transformer-based: they directly finetune pre-trained models such as mBERT (Devlin et al., 2018), MARBERT (Abdul-Mageed et al., 2021a), and AraT5 (Nagoudi et al., 2022) on SM datasets (Abdul-Mageed et al., 2020c; Alshehri et al., 2020; Abuzayed and Al-Khalifa, 2021; Nessir et al., 2022), using data augmentation (Elmadany et al., 2020b), ensembling (Mansy et al., 2022; Alzu'bi et al., 2022), and multi-task learning (Abdul-Mageed et al., 2020b; Shapiro et al., 2022; AlKhamissi and Diab, 2022). However, to the best of our knowledge, there is no published research studying CL on Arabic language understanding in general, nor on social meaning processing in particular.
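For reference, the vanilla CE finetuning recipe used by these studies reduces to a few lines with HuggingFace Transformers. The model id ("UBC-NLP/MARBERT") and the three-way label space below are illustrative assumptions:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load a pre-trained Arabic encoder with a fresh classification head.
# The model id and num_labels are assumptions for illustration.
tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/MARBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "UBC-NLP/MARBERT", num_labels=3
)

batch = tokenizer(["<tweet text>"], return_tensors="pt", padding=True, truncation=True)
labels = torch.tensor([1])
# Passing labels makes the model compute the CE loss internally.
loss = model(**batch, labels=labels).loss
loss.backward()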
2.2 Contrastive Learning
CL aims to learn effective embeddings by pulling semantically close neighbors together while pushing apart non-neighbors (Hadsell et al., 2006). It employs a similarity-based objective to learn representations in the embedding space (Chen et al., 2017; Henderson et al., 2017). In computer vision, Chen et al. (2020a) propose a framework for contrastive learning of visual representations without specialized architectures or a memory bank. Khosla et al. (2020) show that a supervised contrastive loss can outperform the self-supervised CL loss on ImageNet (Russakovsky et al., 2015).
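Concretely, the supervised contrastive objective of Khosla et al. (2020) is commonly written as

\[
\mathcal{L}^{sup} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)},
\]

where I indexes the examples in a batch, A(i) is the set of indices other than i, P(i) ⊆ A(i) is the set of positives that share the label of anchor i, z_i is the normalized representation of example i, and τ is a temperature hyperparameter.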
In NLP, similar methods have been explored in the context of sentence representation learning (Karpukhin et al., 2020; Gillick et al., 2019; Logeswaran and Lee, 2018; Zhang et al., 2022a). Among the most notable works, Gao et al. (2021) propose an unsupervised CL framework, SimCSE, that predicts the input sentence itself by augmenting it with dropout