A Benchmark Study of Contrastive Learning for Arabic Social Meaning
Md Tawkat Islam Khondaker, El Moatez Billah Nagoudi, AbdelRahim Elmadany,
Muhammad Abdul-Mageed, Laks V.S. Lakshmanan
Deep Learning & Natural Language Processing Group
The University of British Columbia
{tawkat@cs.,laks@cs.,muhammad.mageed@}ubc.ca
Abstract
Contrastive learning (CL) has brought significant progress to various NLP tasks. Despite this progress, CL has not been applied to Arabic NLP to date. Nor is it clear how much benefit it could bring to particular classes of tasks such as those involved in Arabic social meaning (e.g., sentiment analysis, dialect identification, hate speech detection). In this work, we present a comprehensive benchmark study of state-of-the-art supervised CL methods on a wide array of Arabic social meaning tasks. Through extensive empirical analyses, we show that CL methods outperform vanilla finetuning on most tasks we consider. We also show that CL can be data efficient and quantify this efficiency. Overall, our work demonstrates the promise of CL methods, including in low-resource settings.
1 Introduction
The proliferation of social media has resulted in unprecedented online user engagement. People around the world share their emotions, fears, hopes, opinions, etc. online on a daily basis (Farzindar and Inkpen, 2015; Zhang and Abdul-Mageed, 2022) on platforms such as Facebook and Twitter. Hence, these platforms offer excellent resources for social meaning tasks such as emotion recognition (Abdul-Mageed and Ungar, 2017; Mohammad et al., 2018), irony detection (Van Hee et al., 2018), sarcasm detection (Bamman and Smith, 2015), hate speech identification (Waseem and Hovy, 2016), and stance identification (Mohammad et al., 2016), among others. While the majority of previous social meaning studies were carried out on English, a fast-growing number of investigations focus on other languages. In this paper, we focus on Arabic.
Several works have been conducted on different Arabic social meaning tasks. Some of these focus on Modern Standard Arabic (MSA) (Abdul-Mageed et al., 2011, 2012), while others take Arabic dialects as their target (ElSahar and El-Beltagy, 2015; Al Sallab et al., 2015). While many works have focused on sentiment analysis, e.g., (Abdul-Mageed et al., 2012; Nabil et al., 2015; ElSahar and El-Beltagy, 2015; Al Sallab et al., 2015; Al-Moslmi et al., 2018; Al-Smadi et al., 2019; Al-Ayyoub et al., 2019; Farha and Magdy, 2019) and dialect identification (Elfardy and Diab, 2013; Zaidan and Callison-Burch, 2011, 2014; Cotterell and Callison-Burch, 2014; Zhang and Abdul-Mageed, 2019; Bouamor et al., 2018; Abdul-Mageed et al., 2020b,a, 2021b), others focused on detection of user demographics such as age and gender (Zaghouani and Charfi, 2018; Rangel et al., 2019), irony detection (Karoui et al., 2017; Ghanem et al., 2019), and emotion analysis (Abdul-Mageed et al., 2016; Alhuzali et al., 2018). Our interest in the current work is improving Arabic social meaning through representation learning.

Figure 1: Visual illustration of how supervised contrastive learning works. Representations from the same class are pulled close to each other while representations from different classes are pushed further apart.
In spite of recent progress in representation learning, most work in Arabic social meaning focuses on finetuning language models such as AraT5 (Nagoudi et al., 2022), CamelBERT (Inoue et al., 2021), MARBERT (Abdul-Mageed et al., 2021a), and QARIB (Abdelali et al., 2021), among others. In particular, Arabic social media processing has to date ignored the emerging sub-area of contrastive learning (CL) (Hadsell et al., 2006). Given a labeled dataset, CL (Khosla et al., 2020) attempts to pull representations of the same class close to each other while pushing representations of different classes further apart (Figure 1). In this work, we investigate five different supervised contrastive learning methods in the context of Arabic social meaning. To the best of our knowledge, this is the first work that provides a comprehensive study of supervised contrastive learning on a wide range of Arabic social meaning tasks. We show that the performance of CL methods can be task-dependent. We attempt to explain this performance from the perspective of task specificity (i.e., how fine-grained the labels of a given task are). We also show that contrastive learning methods generally perform better than vanilla finetuning based on cross entropy (CE). Through an extensive experimental study, we also demonstrate that CL methods outperform CE finetuning under resource-limited constraints. Our work demonstrates the promise of CL methods in general, and in low-resource settings in particular.
To summarize, we offer the following contributions:

1. We study a comprehensive set of supervised CL methods for a wide range of Arabic social meaning tasks, including abusive language and hate speech detection, emotion and sentiment analysis, and identification of demographic attributes (e.g., age and gender).

2. We show that CL-based methods outperform generic CE-based vanilla finetuning for most of the tasks. To the best of our knowledge, this is the first work that provides an extensive study of supervised CL on Arabic social meaning.

3. We empirically find that the improvements brought by CL methods are task-specific, and we attempt to understand this finding in the context of the different tasks we consider with regard to their label granularity.

4. We demonstrate that CL methods can achieve better performance under limited data constraints, emphasizing and quantifying how well these methods can work in low-resource settings.
2 Related Works
2.1 Arabic Social Meaning
We use the term social meaning (SM) to refer to meaning arising in real-world communication in social media (Thomas, 2014; Zhang et al., 2022b). SM covers tasks such as sentiment analysis (Abdul-Mageed et al., 2012; Abu Farha et al., 2021; Saleh et al., 2022; Alali et al., 2022), emotion recognition (Alhuzali et al., 2018; Mubarak et al., 2022c; Abu Shaqra et al., 2022; Mansy et al., 2022), age and gender identification (Abdul-Mageed et al., 2020c; Abbes et al., 2020; Mubarak et al., 2022b; Mansour Khoudja et al., 2022), hate speech and offensive language detection (Elmadany et al., 2020a; Mubarak et al., 2020, 2022a; Husain and Uzuner, 2022), and sarcasm detection (Farha and Magdy, 2020; Wafa'Q et al., 2022; Abdullah et al., 2022).
Most recent studies are transformer-based. They directly finetune pre-trained models such as mBERT (Devlin et al., 2018), MARBERT (Abdul-Mageed et al., 2021a), and AraT5 (Nagoudi et al., 2022) on SM datasets such as (Abdul-Mageed et al., 2020c; Alshehri et al., 2020; Abuzayed and Al-Khalifa, 2021; Nessir et al., 2022), using data augmentation (Elmadany et al., 2020b), ensembling (Mansy et al., 2022; Alzu'bi et al., 2022), and multi-task learning (Abdul-Mageed et al., 2020b; Shapiro et al., 2022; AlKhamissi and Diab, 2022). However, to the best of our knowledge, there is no published research studying CL on Arabic language understanding in general or social meaning processing in particular.
2.2 Contrastive Learning
CL aims to learn effective embeddings by pulling semantically close neighbors together while pushing apart non-neighbors (Hadsell et al., 2006). CL employs a similarity-based objective to learn the embedding representation in the hyperspace (Chen et al., 2017; Henderson et al., 2017). In computer vision, Chen et al. (2020a) propose a framework for contrastive learning of visual representations without specialized architectures or a memory bank. Khosla et al. (2020) show that a supervised contrastive loss can outperform the CL loss on ImageNet (Russakovsky et al., 2015). In NLP, similar methods have been explored in the context of sentence representation learning (Karpukhin et al., 2020; Gillick et al., 2019; Logeswaran and Lee, 2018; Zhang et al., 2022a). Among the most notable works is Gao et al. (2021), who propose an unsupervised CL framework, SimCSE, which predicts the input sentence itself, using dropout as noise-based augmentation.
Recent works have studied CL extensively for improving both semantic textual similarity (STS) and text classification tasks (Meng et al., 2021; Qu et al., 2020; Qiu et al., 2021; Janson et al., 2021). Fang et al. (2020) propose back-translation as a source of positive pairs for NLU tasks. Klein and Nabi (2022) argue that feature decorrelation between high- and low-dropout projected representations improves STS tasks. Zhou et al. (2022) design an instance weighting method to penalize false negatives and generate noise-based negatives to guarantee the uniformity of the representation space. Su et al. (2022) propose a token-aware CL method that contrasts tokens from the same sequence to improve the uniformity of the embedding space. We now formally introduce the CL methods we study and how we employ them in our work.
3 Methods
Given a set of training examples $\{x_i, y_i\}_{i=1,\dots,N}$ and an encoder $f$ based on a pre-trained language model (PLM), $f$ outputs the contextualized token representations of $x_i$:

$$H = \{h_{[CLS]}, h_1, h_2, \dots, h_{[SEP]}\} \quad (1)$$

where $H$ is the hidden representation of the final layer of the encoder.
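As a concrete illustration (ours, not from the paper), the token representations $H$ and the pooled $h_{[CLS]}$ can be obtained from a PLM roughly as follows. The use of the Hugging Face transformers API, the MARBERT checkpoint name, and the example sentence are assumptions made purely for this sketch.

import torch
from transformers import AutoModel, AutoTokenizer

# Minimal sketch (our own illustration): extract H and h_[CLS] from a pretrained encoder.
# The checkpoint name is an assumed example; any Arabic PLM could be substituted.
tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/MARBERT")
encoder = AutoModel.from_pretrained("UBC-NLP/MARBERT")

batch = tokenizer(["مثال تغريدة"], return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = encoder(**batch)

H = outputs.last_hidden_state   # (batch, seq_len, d_h): {h_[CLS], h_1, ..., h_[SEP]}
h_cls = H[:, 0, :]              # pooled [CLS] representation used downstream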
The standard practice of finetuning PLMs passes the pooled representation $h_{[CLS]}$ of the $[CLS]$ token to a softmax classifier to obtain the probability distribution over the set of classes $C$ (Figure 2a):

$$p(y_c \mid h_{[CLS]}) = \mathrm{softmax}(W h_{[CLS]}); \quad c \in C \quad (2)$$

where $W \in \mathbb{R}^{d_C \times d_h}$ are trainable parameters and $d_h$ is the hidden dimension. The model is trained with the objective of minimizing the cross-entropy (CE) loss:
$$\mathcal{L}_{CE} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c}\, \log\big(p(y_{i,c} \mid h_{i[CLS]})\big) \quad (3)$$

(Footnote: $h_{i[CLS]}$ and $h_i$ are used interchangeably in the rest of the paper.)
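For concreteness, a minimal sketch (ours, not the paper's implementation) of the vanilla CE finetuning objective in Eqs. (2)-(3) follows; the dimensions and the randomly generated batch are placeholders standing in for real encoder outputs and labels.

import torch
import torch.nn as nn

# Minimal sketch (our own illustration) of CE finetuning: a softmax classifier on h_[CLS].
d_h, num_classes, N = 768, 3, 16
classifier = nn.Linear(d_h, num_classes)         # W in Eq. (2)

h_cls = torch.randn(N, d_h, requires_grad=True)  # stand-in for the encoder's h_[CLS] outputs
labels = torch.randint(0, num_classes, (N,))

logits = classifier(h_cls)                       # W h_[CLS]
loss_ce = nn.functional.cross_entropy(logits, labels)  # Eq. (3), averaged over the batch
loss_ce.backward()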
3.1 Supervised Contrastive Loss (SCL)
The objective of the supervised contrastive loss (Khosla et al., 2020) is to pull the representations of the same class close to each other while pushing the representations of different classes further apart.
Following Gao et al. (2021), we adopt dropout-based data augmentation, where for each representation $h_i$ we produce an equivalent dropout-based representation $h_j$ and consider $h_j$ as having the same label as $h_i$ (Figure 2b). The model attempts to minimize the NT-Xent loss (Chen et al., 2020a). The purpose of the NT-Xent loss is to take each in-batch representation as an anchor and minimize the distance between the anchor ($h_i$) and the representations from the same class ($P_i$) while maximizing the distance between the anchor and the representations from different classes:
$$\mathcal{L}_{NTX} = \sum_{i=1}^{2N} \frac{-1}{|P_i|} \sum_{j \in P_i} \log \frac{e^{\mathrm{sim}(h_i, h_j)/\tau}}{\sum_{k=1}^{2N} \mathbb{1}_{[i \neq k]}\, e^{\mathrm{sim}(h_i, h_k)/\tau}} \quad (4)$$
where $\tau$ is the temperature parameter. The final loss for SCL is

$$\mathcal{L}_{SCL} = (1 - \lambda)\,\mathcal{L}_{CE} + \lambda\,\mathcal{L}_{NTX}$$

where $\lambda$ balances the two losses.
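The following is a minimal sketch (ours, not the authors' code) of the supervised NT-Xent loss in Eq. (4) and the combined SCL objective. We assume cosine similarity for sim(.,.), and the batch size, temperature, and lambda values are placeholders.

import torch
import torch.nn.functional as F

def sup_ntxent(h, labels, tau=0.1):
    # h: (2N, d) views; labels: (2N,) class labels. Cosine similarity is assumed for sim(.,.).
    h = F.normalize(h, dim=-1)
    sim = h @ h.t() / tau                                    # sim(h_i, h_k) / tau
    self_mask = torch.eye(h.size(0), dtype=torch.bool, device=h.device)
    sim = sim.masked_fill(self_mask, float("-inf"))          # enforce k != i in the denominator
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask   # positive set P_i
    log_prob = log_prob.masked_fill(~pos_mask, 0.0)          # keep only positive pairs
    return (-log_prob.sum(1) / pos_mask.sum(1).clamp(min=1)).sum()  # sum over the 2N anchors

# Two stochastic (dropout-active) forward passes give two views per example with the same
# label; random tensors stand in for those views here.
N, d, lam = 8, 768, 0.5
h_i = torch.randn(N, d, requires_grad=True)
h_j = torch.randn(N, d, requires_grad=True)      # dropout-based copies of h_i
y = torch.randint(0, 3, (N,))

loss_ce = torch.tensor(0.0)                      # placeholder for the CE loss of Eq. (3)
loss_ntx = sup_ntxent(torch.cat([h_i, h_j]), torch.cat([y, y]))
loss_scl = (1 - lam) * loss_ce + lam * loss_ntx  # L_SCL = (1 - lambda) L_CE + lambda L_NTX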
3.2 Contrastive Adversarial Training (CAT)
Instead of dropout-based augmentation, Pan et al. (2022) propose to generate adversarial examples by applying the fast gradient sign method (FGSM) (Goodfellow et al., 2015). Formally, FGSM attempts to maximize $\mathcal{L}_{CE}$ by adding a small perturbation $r$ bounded by $\epsilon$:

$$\max \mathcal{L}_{CE} = \arg\max_{r} \mathcal{L}\big(f(x_i + r, y_i)\big) \quad \text{s.t.}\ \|r\| < \epsilon,\ \epsilon > 0 \quad (5)$$
Goodfellow et al. (2015) approximate the perturbation $r$ with a linear approximation around $x_i$ and an L2 norm constraint. However, Pan et al. (2022) propose to approximate $r$ around the word embedding matrix $V \in \mathbb{R}^{d_V \times d_h}$ (Figure 2c), where $d_V$ is the vocabulary size. Hence, the adversarial perturbation is computed as:

$$r = \epsilon\, \frac{\nabla_V \mathcal{L}\big(f(x_i, y_i)\big)}{\big\|\nabla_V \mathcal{L}\big(f(x_i, y_i)\big)\big\|_2} \quad (6)$$
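A rough sketch (ours, not the paper's code) of how the perturbation in Eq. (6) can be computed on the word embedding matrix follows; the toy mean-pooling encoder, the epsilon value, and all dimensions are assumptions made purely for illustration.

import torch
import torch.nn as nn

# Minimal sketch (our own illustration) of the embedding-level perturbation in Eq. (6):
# gradient of the CE loss w.r.t. the embedding matrix V, normalized by its L2 norm.
vocab_size, d_h, num_classes = 1000, 64, 3
embedding = nn.Embedding(vocab_size, d_h)              # V in R^{d_V x d_h}
classifier = nn.Linear(d_h, num_classes)

token_ids = torch.randint(0, vocab_size, (4, 12))      # toy batch of token ids
labels = torch.randint(0, num_classes, (4,))

pooled = embedding(token_ids).mean(dim=1)              # toy mean-pooling stand-in for the PLM
loss = nn.functional.cross_entropy(classifier(pooled), labels)

grad_V = torch.autograd.grad(loss, embedding.weight)[0]
epsilon = 1e-2                                         # assumed perturbation bound
r = epsilon * grad_V / (grad_V.norm(p=2) + 1e-12)      # Eq. (6), with a flattened L2 norm

# The perturbed encoder f_{V+r} simply re-runs the forward pass with V + r.
perturbed_pooled = nn.functional.embedding(token_ids, embedding.weight + r).mean(dim=1)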
After receiving $x_i$, the perturbed encoder $f_{V+r}$ outputs the $[CLS]$ representation $h_j$, which is treated as the positive pair of $h_i$. Both $h_i$ and $h_j$ are passed through a non-linear projection layer, and the resulting representations are used to train the model with the InfoNCE loss (Oord et al., 2018):

$$z_i = W_2\,\mathrm{ReLU}(W_1 h_i) \quad (7)$$
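Below is a minimal sketch (ours, not the authors' implementation) of the projection in Eq. (7) followed by an InfoNCE loss over the $(h_i, h_j)$ positive pairs; the dimensions, temperature, and random stand-in representations are placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch (our own illustration): non-linear projection (Eq. 7) + InfoNCE loss.
d_h, d_proj, N, tau = 768, 128, 8, 0.1
W1 = nn.Linear(d_h, d_h, bias=False)
W2 = nn.Linear(d_h, d_proj, bias=False)

def project(h):
    return W2(F.relu(W1(h)))                     # z = W2 ReLU(W1 h), Eq. (7)

h_i = torch.randn(N, d_h)                        # clean [CLS] representations
h_j = torch.randn(N, d_h)                        # representations from the perturbed encoder f_{V+r}

z_i = F.normalize(project(h_i), dim=-1)
z_j = F.normalize(project(h_j), dim=-1)
logits = z_i @ z_j.t() / tau                     # pairwise similarities
targets = torch.arange(N)                        # z_j[k] is the positive for z_i[k]
loss_infonce = F.cross_entropy(logits, targets)  # InfoNCE (Oord et al., 2018)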