A Benchmark Study of Contrastive Learning for Arabic Social Meaning
Md Tawkat Islam Khondaker, El Moatez Billah Nagoudi, AbdelRahim Elmadany,
Muhammad Abdul-Mageed, Laks V.S. Lakshmanan
Deep Learning & Natural Language Processing Group
The University of British Columbia
{tawkat@cs.,laks@cs.,muhammad.mageed@}ubc.ca
Abstract
Contrastive learning (CL) has brought significant progress to various NLP tasks. Despite this progress, CL has not been applied to Arabic NLP to date. Nor is it clear how much benefit it could bring to particular classes of tasks such as those involved in Arabic social meaning (e.g., sentiment analysis, dialect identification, hate speech detection). In this work, we present a comprehensive benchmark study of state-of-the-art supervised CL methods on a wide array of Arabic social meaning tasks. Through extensive empirical analyses, we show that CL methods outperform vanilla finetuning on most tasks we consider. We also show that CL can be data efficient and quantify this efficiency. Overall, our work demonstrates the promise of CL methods, including in low-resource settings.
1 Introduction
The proliferation of social media has resulted in unprecedented online user engagement. People around the world share their emotions, fears, hopes, opinions, etc. online on a daily basis (Farzindar and Inkpen, 2015; Zhang and Abdul-Mageed, 2022) on platforms such as Facebook and Twitter. Hence, these platforms offer excellent resources for social meaning tasks such as emotion recognition (Abdul-Mageed and Ungar, 2017; Mohammad et al., 2018), irony detection (Van Hee et al., 2018), sarcasm detection (Bamman and Smith, 2015), hate speech identification (Waseem and Hovy, 2016), and stance identification (Mohammad et al., 2016), among others. While the majority of previous social meaning studies were carried out on English, a fast-growing number of investigations focus on other languages. In this paper, we focus on Arabic.
Several works have been conducted on different Arabic social meaning tasks. Some of these focus on Modern Standard Arabic (MSA) (Abdul-Mageed et al., 2011, 2012), while others take Arabic dialects as their target (ElSahar and El-Beltagy, 2015; Al Sallab et al., 2015). While many works have focused on sentiment analysis, e.g., (Abdul-Mageed et al., 2012; Nabil et al., 2015; ElSahar and El-Beltagy, 2015; Al Sallab et al., 2015; Al-Moslmi et al., 2018; Al-Smadi et al., 2019; Al-Ayyoub et al., 2019; Farha and Magdy, 2019) and dialect identification (Elfardy and Diab, 2013; Zaidan and Callison-Burch, 2011, 2014; Cotterell and Callison-Burch, 2014; Zhang and Abdul-Mageed, 2019; Bouamor et al., 2018; Abdul-Mageed et al., 2020b,a, 2021b), others focused on detection of user demographics such as age and gender (Zaghouani and Charfi, 2018; Rangel et al., 2019), irony detection (Karoui et al., 2017; Ghanem et al., 2019), and emotion analysis (Abdul-Mageed et al., 2016; Alhuzali et al., 2018). Our interest in the current work is improving Arabic social meaning through representation learning.

Figure 1: Visual illustration of how supervised contrastive learning works. Representations from the same class are pulled close to each other while representations from different classes are pushed further apart.
In spite of recent progress in representation learning, most work in Arabic social meaning focuses on finetuning language models such as AraT5 (Nagoudi et al., 2022), CamelBERT (Inoue et al., 2021), MARBERT (Abdul-Mageed et al., 2021a), and QARIB (Abdelali et al., 2021), among others. In particular, Arabic social media processing has to date ignored the emerging sub-area of contrastive learning (CL) (Hadsell et al., 2006). Given a labeled dataset, CL (Khosla et al., 2020) attempts to pull representations of the same class close to each other while pushing representations of different classes further apart (Figure 1). In this work, we investigate five different supervised contrastive learning methods in the context of Arabic social meaning. To the best of our knowledge, this is the first work that provides a comprehensive study of supervised contrastive learning on a wide range of Arabic social meaning tasks. We show that the performance of CL methods can be task-dependent. We attempt to explain this performance from the perspective of task specificity (i.e., how fine-grained the labels of a given task are). We also show that contrastive learning methods generally perform better than vanilla finetuning based on cross entropy (CE). Through an extensive experimental study, we also demonstrate that CL methods outperform CE finetuning under resource-limited constraints. Our work demonstrates the promise of CL methods in general, and in low-resource settings in particular.
To summarize, we offer the following contributions:

1. We study a comprehensive set of supervised CL methods for a wide range of Arabic social meaning tasks, including abusive language and hate speech detection, emotion and sentiment analysis, and identification of demographic attributes (e.g., age and gender).

2. We show that CL-based methods outperform generic CE-based vanilla finetuning for most of the tasks. To the best of our knowledge, this is the first work that provides an extensive study of supervised CL on Arabic social meaning.

3. We empirically find that the improvements brought by CL methods are task-specific, and we attempt to understand this finding in the context of the different tasks we consider with regard to their label granularity.

4. We demonstrate that CL methods can achieve better performance under limited data constraints, emphasizing and quantifying how well these methods can work in low-resource settings.
2 Related Works
2.1 Arabic Social Meaning
We use the term social meaning (SM) to refer to meaning arising in real-world communication in social media (Thomas, 2014; Zhang et al., 2022b). SM covers tasks such as sentiment analysis (Abdul-Mageed et al., 2012; Abu Farha et al., 2021; Saleh et al., 2022; Alali et al., 2022), emotion recognition (Alhuzali et al., 2018; Mubarak et al., 2022c; Abu Shaqra et al., 2022; Mansy et al., 2022), age and gender identification (Abdul-Mageed et al., 2020c; Abbes et al., 2020; Mubarak et al., 2022b; Mansour Khoudja et al., 2022), hate speech and offensive language detection (Elmadany et al., 2020a; Mubarak et al., 2020, 2022a; Husain and Uzuner, 2022), and sarcasm detection (Farha and Magdy, 2020; Wafa'Q et al., 2022; Abdullah et al., 2022).
Most recent studies are transformer-based. They directly finetune pre-trained models such as mBERT (Devlin et al., 2018), MARBERT (Abdul-Mageed et al., 2021a), and AraT5 (Nagoudi et al., 2022) on SM datasets such as (Abdul-Mageed et al., 2020c; Alshehri et al., 2020; Abuzayed and Al-Khalifa, 2021; Nessir et al., 2022), using data augmentation (Elmadany et al., 2020b), ensembling (Mansy et al., 2022; Alzu'bi et al., 2022), and multi-task learning (Abdul-Mageed et al., 2020b; Shapiro et al., 2022; AlKhamissi and Diab, 2022). However, to the best of our knowledge, there is no published research studying CL on Arabic language understanding in general or social meaning processing in particular.
2.2 Contrastive Learning
CL aims to learn effective embeddings by pulling semantically close neighbors together while pushing apart non-neighbors (Hadsell et al., 2006). CL employs a similarity-based objective to learn the embedding representation in the hyperspace (Chen et al., 2017; Henderson et al., 2017). In computer vision, Chen et al. (2020a) propose a framework for contrastive learning of visual representations without specialized architectures or a memory bank. Khosla et al. (2020) show that a supervised contrastive loss can outperform the CL loss on ImageNet (Russakovsky et al., 2015). In NLP, similar methods have been explored in the context of sentence representation learning (Karpukhin et al., 2020; Gillick et al., 2019; Logeswaran and Lee, 2018; Zhang et al., 2022a). Among the most notable works is Gao et al. (2021), who propose an unsupervised CL framework, SimCSE, which predicts the input sentence itself, using dropout as noise-based augmentation.
Recent works have studied CL extensively for improving both semantic textual similarity (STS) and text classification tasks (Meng et al., 2021; Qu et al., 2020; Qiu et al., 2021; Janson et al., 2021). Fang et al. (2020) propose back-translation as a source of positive pairs for NLU tasks. Klein and Nabi (2022) argue that feature decorrelation between high- and low-dropout projected representations improves STS tasks. Zhou et al. (2022) design an instance weighting method to penalize false negatives and generate noise-based negatives to guarantee the uniformity of the representation space. Su et al. (2022) propose a token-aware CL method that contrasts tokens from the same sequence to improve the uniformity of the embedding space. We now formally introduce the CL methods we study and how we employ them in our work.
3 Methods
Given a set of training examples $\{x_i, y_i\}_{i=1,\dots,N}$ and an encoder $f$ based on a pre-trained language model (PLM), $f$ outputs the contextualized token representations of $x_i$:

$$H = \{h_{[CLS]}, h_1, h_2, \dots, h_{[SEP]}\} \quad (1)$$

where $H$ is the hidden representation of the final layer of the encoder.
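As a concrete illustration (ours, not from the paper), the token representations $H$ and the pooled $h_{[CLS]}$ can be obtained from a PLM roughly as follows. The use of the Hugging Face transformers API, the MARBERT checkpoint name, and the example sentence are assumptions made purely for this sketch.

import torch
from transformers import AutoModel, AutoTokenizer

# Minimal sketch (our own illustration): extract H and h_[CLS] from a pretrained encoder.
# The checkpoint name is an assumed example; any Arabic PLM could be substituted.
tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/MARBERT")
encoder = AutoModel.from_pretrained("UBC-NLP/MARBERT")

batch = tokenizer(["مثال تغريدة"], return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = encoder(**batch)

H = outputs.last_hidden_state   # (batch, seq_len, d_h): {h_[CLS], h_1, ..., h_[SEP]}
h_cls = H[:, 0, :]              # pooled [CLS] representation used downstream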
The standard practice of finetuning PLMs passes the pooled representation $h_{[CLS]}$ of the $[CLS]$ token to a softmax classifier to obtain the probability distribution over the set of classes $C$ (Figure 2a):

$$p(y_c \mid h_{[CLS]}) = \mathrm{softmax}(W h_{[CLS]}); \quad c \in C \quad (2)$$

where $W \in \mathbb{R}^{d_C \times d_h}$ are trainable parameters and $d_h$ is the hidden dimension. The model is trained with the objective of minimizing the cross-entropy (CE) loss:
$$\mathcal{L}_{CE} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c}\, \log\big(p(y_{i,c} \mid h_{i[CLS]})\big) \quad (3)$$

(Footnote: $h_{i[CLS]}$ and $h_i$ are used interchangeably in the rest of the paper.)
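For concreteness, a minimal sketch (ours, not the paper's implementation) of the vanilla CE finetuning objective in Eqs. (2)-(3) follows; the dimensions and the randomly generated batch are placeholders standing in for real encoder outputs and labels.

import torch
import torch.nn as nn

# Minimal sketch (our own illustration) of CE finetuning: a softmax classifier on h_[CLS].
d_h, num_classes, N = 768, 3, 16
classifier = nn.Linear(d_h, num_classes)         # W in Eq. (2)

h_cls = torch.randn(N, d_h, requires_grad=True)  # stand-in for the encoder's h_[CLS] outputs
labels = torch.randint(0, num_classes, (N,))

logits = classifier(h_cls)                       # W h_[CLS]
loss_ce = nn.functional.cross_entropy(logits, labels)  # Eq. (3), averaged over the batch
loss_ce.backward()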
3.1 Supervised Contrastive Loss (SCL)
The objective of the supervised contrastive loss (Khosla et al., 2020) is to pull the representations of the same class close to each other while pushing the representations of different classes further apart.
Following Gao et al. (2021), we adopt dropout-based data augmentation, where for each representation $h_i$ we produce an equivalent dropout-based representation $h_j$ and consider $h_j$ as having the same label as $h_i$ (Figure 2b). The model attempts to minimize the NT-Xent loss (Chen et al., 2020a). The purpose of the NT-Xent loss is to take each in-batch representation as an anchor and minimize the distance between the anchor ($h_i$) and the representations from the same class ($P_i$) while maximizing the distance between the anchor and the representations from different classes:
$$\mathcal{L}_{NTX} = \sum_{i=1}^{2N} \frac{-1}{|P_i|} \sum_{j \in P_i} \log \frac{e^{\mathrm{sim}(h_i, h_j)/\tau}}{\sum_{k=1}^{2N} \mathbb{1}_{[i \neq k]}\, e^{\mathrm{sim}(h_i, h_k)/\tau}} \quad (4)$$
where $\tau$ is the temperature parameter. The final loss for SCL is

$$\mathcal{L}_{SCL} = (1 - \lambda)\,\mathcal{L}_{CE} + \lambda\,\mathcal{L}_{NTX}$$

where $\lambda$ balances the two losses.
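The following is a minimal sketch (ours, not the authors' code) of the supervised NT-Xent loss in Eq. (4) and the combined SCL objective. We assume cosine similarity for sim(.,.), and the batch size, temperature, and lambda values are placeholders.

import torch
import torch.nn.functional as F

def sup_ntxent(h, labels, tau=0.1):
    # h: (2N, d) views; labels: (2N,) class labels. Cosine similarity is assumed for sim(.,.).
    h = F.normalize(h, dim=-1)
    sim = h @ h.t() / tau                                    # sim(h_i, h_k) / tau
    self_mask = torch.eye(h.size(0), dtype=torch.bool, device=h.device)
    sim = sim.masked_fill(self_mask, float("-inf"))          # enforce k != i in the denominator
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask   # positive set P_i
    log_prob = log_prob.masked_fill(~pos_mask, 0.0)          # keep only positive pairs
    return (-log_prob.sum(1) / pos_mask.sum(1).clamp(min=1)).sum()  # sum over the 2N anchors

# Two stochastic (dropout-active) forward passes give two views per example with the same
# label; random tensors stand in for those views here.
N, d, lam = 8, 768, 0.5
h_i = torch.randn(N, d, requires_grad=True)
h_j = torch.randn(N, d, requires_grad=True)      # dropout-based copies of h_i
y = torch.randint(0, 3, (N,))

loss_ce = torch.tensor(0.0)                      # placeholder for the CE loss of Eq. (3)
loss_ntx = sup_ntxent(torch.cat([h_i, h_j]), torch.cat([y, y]))
loss_scl = (1 - lam) * loss_ce + lam * loss_ntx  # L_SCL = (1 - lambda) L_CE + lambda L_NTX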
3.2 Contrastive Adversarial Training (CAT)
Instead of dropout-based augmentation, Pan et al. (2022) propose to generate adversarial examples by applying the fast gradient sign method (FGSM) (Goodfellow et al., 2015). Formally, FGSM attempts to maximize $\mathcal{L}_{CE}$ by adding a small perturbation $r$ bounded by $\epsilon$:

$$\max \mathcal{L}_{CE} = \arg\max_{r} \mathcal{L}\big(f(x_i + r, y_i)\big) \quad \text{s.t.}\ \|r\| < \epsilon,\ \epsilon > 0 \quad (5)$$
Goodfellow et al. (2015) approximate the perturbation $r$ with a linear approximation around $x_i$ and an L2 norm constraint. However, Pan et al. (2022) propose to approximate $r$ around the word embedding matrix $V \in \mathbb{R}^{d_V \times d_h}$ (Figure 2c), where $d_V$ is the vocabulary size. Hence, the adversarial perturbation is computed as:

$$r = \epsilon\, \frac{\nabla_V \mathcal{L}\big(f(x_i, y_i)\big)}{\big\|\nabla_V \mathcal{L}\big(f(x_i, y_i)\big)\big\|_2} \quad (6)$$
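A rough sketch (ours, not the paper's code) of how the perturbation in Eq. (6) can be computed on the word embedding matrix follows; the toy mean-pooling encoder, the epsilon value, and all dimensions are assumptions made purely for illustration.

import torch
import torch.nn as nn

# Minimal sketch (our own illustration) of the embedding-level perturbation in Eq. (6):
# gradient of the CE loss w.r.t. the embedding matrix V, normalized by its L2 norm.
vocab_size, d_h, num_classes = 1000, 64, 3
embedding = nn.Embedding(vocab_size, d_h)              # V in R^{d_V x d_h}
classifier = nn.Linear(d_h, num_classes)

token_ids = torch.randint(0, vocab_size, (4, 12))      # toy batch of token ids
labels = torch.randint(0, num_classes, (4,))

pooled = embedding(token_ids).mean(dim=1)              # toy mean-pooling stand-in for the PLM
loss = nn.functional.cross_entropy(classifier(pooled), labels)

grad_V = torch.autograd.grad(loss, embedding.weight)[0]
epsilon = 1e-2                                         # assumed perturbation bound
r = epsilon * grad_V / (grad_V.norm(p=2) + 1e-12)      # Eq. (6), with a flattened L2 norm

# The perturbed encoder f_{V+r} simply re-runs the forward pass with V + r.
perturbed_pooled = nn.functional.embedding(token_ids, embedding.weight + r).mean(dim=1)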
After receiving $x_i$, the perturbed encoder $f_{V+r}$ outputs the $[CLS]$ representation $h_j$, which is treated as the positive pair of $h_i$. Both $h_i$ and $h_j$ are passed through a non-linear projection layer, and the resulting representations are used to train the model with the InfoNCE loss (Oord et al., 2018):

$$z_i = W_2\,\mathrm{ReLU}(W_1 h_i) \quad (7)$$
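Below is a minimal sketch (ours, not the authors' implementation) of the projection in Eq. (7) followed by an InfoNCE loss over the $(h_i, h_j)$ positive pairs; the dimensions, temperature, and random stand-in representations are placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch (our own illustration): non-linear projection (Eq. 7) + InfoNCE loss.
d_h, d_proj, N, tau = 768, 128, 8, 0.1
W1 = nn.Linear(d_h, d_h, bias=False)
W2 = nn.Linear(d_h, d_proj, bias=False)

def project(h):
    return W2(F.relu(W1(h)))                     # z = W2 ReLU(W1 h), Eq. (7)

h_i = torch.randn(N, d_h)                        # clean [CLS] representations
h_j = torch.randn(N, d_h)                        # representations from the perturbed encoder f_{V+r}

z_i = F.normalize(project(h_i), dim=-1)
z_j = F.normalize(project(h_j), dim=-1)
logits = z_i @ z_j.t() / tau                     # pairwise similarities
targets = torch.arange(N)                        # z_j[k] is the positive for z_i[k]
loss_infonce = F.cross_entropy(logits, targets)  # InfoNCE (Oord et al., 2018)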