Multilingual Auxiliary Tasks Training: Bridging the Gap between
Languages for Zero-Shot Transfer of Hate Speech Detection Models
Syrielle Montariol*, Arij Riabi*, Djamé Seddah
INRIA Paris, France
firstname.lastname@inria.fr
*These authors contributed equally.
Abstract
Zero-shot cross-lingual transfer learning has
been shown to be highly challenging for tasks
involving many linguistic specificities or
when a cultural gap is present between lan-
guages, such as in hate speech detection. In
this paper, we highlight this limitation for
hate speech detection in several domains and
languages using strict experimental settings.
Then, we propose to train on multilingual aux-
iliary tasks – sentiment analysis, named en-
tity recognition, and tasks relying on syntac-
tic information – to improve zero-shot trans-
fer of hate speech detection models across lan-
guages. We show how hate speech detection
models benefit from a cross-lingual knowledge
proxy brought by auxiliary-task fine-tuning
and highlight these tasks’ positive impact on
bridging the hate speech linguistic and cultural
gap between languages.
1 Introduction
Given the impact social media hate speech can
have on our society as a whole – leading to many
small-scale Overton window effects – the NLP
community has devoted considerable effort to
automatic hate speech detection using machine
learning-based approaches, and proposed differ-
ent benchmarks and datasets to evaluate their tech-
niques (Dinakar et al., 2011; Sood et al., 2012;
Waseem and Hovy, 2016; Davidson et al., 2017;
Fortuna and Nunes, 2018; Kennedy et al., 2020).
However, these systems are designed to be ef-
ficient at a given point in time for a specific type
of online content they were trained on. As hate
speech varies significantly diachronically (Florio
et al., 2020) and synchronically (Yin and Zubiaga,
2021), hate speech detection models need to be
constantly adapted to new contexts. For example,
as noted by Markov et al. (2021), the occurrence
of new hate speech domains and their associated
lexicons and expressions can be triggered by real-
world events, from local-scope incidents to world-
wide crises (e.g., hate speech towards Chinese
communities spiked in 2020 with the emergence
of the COVID-19 pandemic).
New annotated datasets are needed to
optimally capture all these domain-specific, target-
specific hate speech types. The possibility of creat-
ing and constantly updating exhaustively annotated
datasets, adapted to every possible language and
domain, is chimerical. Thus, the task of hate speech
detection is often faced with low-resource issues.
In this low-resource scenario for a given target
language and domain, if annotated data is available
in another language, the main option for most NLP
tasks is to perform zero-shot transfer using a mul-
tilingual language model (Conneau et al., 2020).
However, in our case, hate speech perception is
highly variable across languages and cultures; for
example, some slur expressions may be considered
inoffensive in one language, merely denoting an
informal register, but will be considered offensive,
if not hateful, in another (Nozza, 2021).
Despite the cross-lingual transfer paradigm being
extensively used in hate speech detection to cope
with the data scarcity issue (Basile and Rubagotti,
2018; van der Goot et al., 2018; Pamungkas and
Patti, 2019; Ranasinghe and Zampieri, 2020) or
even the use of models trained on a translation of
the initial training data (Rosa et al., 2021), this
strong hate speech cultural and linguistic variation
can lower the transferability of hate speech detec-
tion models across languages in a zero-shot setting.
To overcome this limitation, in the absence of
training data or efficient translation models for a
target language, the cultural and linguistic infor-
mation specific to this language needs to be found
elsewhere. In this paper, we propose to capture
this information by fine-tuning the language model
on resource-rich tasks in both the transfer’s source
and target language. Indeed, even though hate-
annotated datasets are not available in both lan-
guages, it is likely that similarly annotated data
in the source and target language exist for other
tasks. A language model jointly fine-tuned for this
other task in the two languages can learn some pat-
terns and knowledge, bridging the gap between the
languages, and helping the hate speech detection
model to be transferred between them.
In summary, our work focuses on zero-shot
cross-lingual multitask architectures where an-
notated hate speech data is available only for one
source language, but some annotated data for other
tasks can be accessed in both the source and target
languages. Using a multitask architecture (van der
Goot et al., 2021b) on top of a multilingual model,
we investigate the impact of auxiliary tasks oper-
ating at different linguistic levels of the sentence
(POS tagging, named entity recognition (NER),
dependency parsing, and sentiment analysis) on the trans-
fer effectiveness. Using Nozza (2021)’s original
set of languages and datasets (hate speech against
women and immigrants, from Twitter datasets in
English, Italian and Spanish), our main contribu-
tions are as follows.
• Building strictly comparable corpora across
languages (our comparable datasets are available at
https://github.com/ArijRB/Multilingual-Auxiliary-Tasks-Training-Bridging-the-Gap-between-Languages-for-Zero-Shot-Transfer-of-/),
leading to a thorough evaluation framework, we
highlight cases where zero-shot cross-lingual
transfer of hate speech detection models fails
and diagnose the effect of the choice of the
multilingual language model.
• We identify auxiliary tasks with a positive
impact on cross-lingual transfer when trained
jointly with hate speech detection: sentiment
analysis and NER. The impact of syntactic
tasks is more mixed.
• Using the HateCheck test suite (Röttger et al.,
2021, 2022), we identify which hate speech
classes of functionalities suffer the most from
cross-lingual transfer, highlighting the impact
of slurs; and which ones benefit from joint
training with multilingual auxiliary tasks.
2 Related Work
Intermediate task training.
In order to improve
the efficiency of a pre-trained language model for a
given task, this model can undergo preliminary
fine-tuning on an intermediate task before fine-
tuning again on the downstream task. This idea
was formalized as Supplementary Training on In-
termediate Labeled-data Tasks (STILT) by Phang
et al. (2018), who perform sequential task-to-task
pre-training. More recently, Pruksachatkun et al.
(2020) perform a survey of intermediate and target
task pairs to analyze the usefulness of this inter-
mediate fine-tuning, but only in a monolingual set-
ting. Phang et al. (2020) turn towards cross-lingual
STILT. They fine-tune a language model on nine
intermediate language-understanding tasks in En-
glish and apply it to a set of non-English target
tasks. They show that machine-translating interme-
diate task data for training or using a multilingual
language model does not improve the transfer com-
pared to English training data. However, to the best
of our knowledge, using intermediate task training
data in both the source and the target language for
transfer has not been tested in the literature.
Auxiliary tasks for hate speech detection.
Auxiliary task training for hate speech detection
has been done almost exclusively with the senti-
ment analysis task (Bauwelinck and Lefever, 2019;
Aroyehun and Gelbukh, 2021), and only in mono-
lingual scenarios. However, additional information
is sometimes injected into hate speech classifiers in
other ways. Gambino and Pirrone (2020), among
the best systems on the HaSpeeDe task of EVALITA
2020, use POS-tagged text as input to the classifi-
cation system, which is highly beneficial for Span-
ish and somewhat less so for German and English.
Furthermore, the effect of syntactic infor-
mation is also investigated by Narang and Brew
(2020), using classifiers based on the syntactic
structure of the text for abusive language detection.
Markov et al. (2021) evaluate the impact of manu-
ally extracted POS, stylometric and emotion-based
features on hate speech detection, showing that
the latter two are robust features for hate speech
detection across languages.
Zero-shot cross-lingual transfer for hate speech
detection.
Due to the lack of annotated data on
many languages and domains for hate speech detec-
tion, zero-shot cross-lingual transfer has received
considerable attention in the literature. Among the
most recent work, Pelicon et al. (2021) investigate
the impact of preliminarily training a classification
model on hate speech data in languages other than
the target language; they show that language mod-
els pre-trained on a small number of languages
benefit more from this intermediate training and
often outperform massively multilingual language
models.
To perform cross-lingual experiments, Glavaš et al.
(2020) create a dataset with aligned examples in
six different languages, avoiding the issue of hate
speech variation across languages that we tackle in
this paper. On their aligned test set, they show the
positive impact of intermediate masked language
model fine-tuning on abusive corpora in the target
language. Using aligned corpora allows the authors
to focus on the effect of the intermediate fine-tuning
without the noise of inter-language variability. In
contrast, we investigate the issue of the limited
transferability of hate speech detection models
across languages. Nozza (2021), upon which this
paper builds, demonstrates the limitation
of cross-lingual transfer for domain-specific hate
speech – in particular, hate speech towards women
– and explains it by showing examples of cultural
variation between languages. Some notable hate
speech vocabulary in one language may be used as
an intensifier in another language (for instance,
Nozza (2021) gives the example of the Spanish
word puta, often used as an intensifier without any
misogynistic connotation, while it translates to a
slang version of “prostitute” in English).
Stappen et al.
(2020) perform zero- and few-shot cross-lingual
transfer on some of the datasets we use in this
paper, with an attention-based classification model;
but contrary to us, they do not distinguish between
the hate speech targets.
3 The Bottleneck of Zero-shot
Cross-lingual Transfer
3.1 Hate speech corpora
We use the same hate speech datasets as Nozza
(2021), who relied on them to point out the lim-
itations of zero-shot cross-lingual transfer. The
corpora are in three languages: English (en), Span-
ish (es) and Italian (it); and two domains: hate
speech towards immigrants and hate speech to-
wards women. The corpora come from various
shared tasks. For English and Spanish, we use the
dataset from a shared task on hate speech against
immigrants and women on Twitter (HatEval). For
the Italian corpora, we use the automatic misog-
yny identification challenge (AMI) (Fersini et al.,
2018) for the women domain and the hate speech
detection shared task on Facebook and Twitter
(HaSpeeDe) (Bosco et al., 2018) for the immigrants
domain. Links to the resources are listed in Table 6
in Appendix A.
The hate speech detection task is a binary classi-
fication task where each dataset is annotated with
two labels: hateful and non-hateful. We train bi-
nary classification models on the train set of each
language and predict on the test set of each lan-
guage, investigating two settings: 1) monolingual,
i.e., training and testing on the same language and
domain; 2) zero-shot cross-lingual, i.e., training on
one language and testing on another. We evaluate
the models using macro-F1 as the metric.
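To make this protocol concrete, the snippet below sketches the metric and the transfer-failure criterion used in Section 3.2; the helper names are ours, and the example values are the m-BERT scores reported by Nozza (2021) in Table 1 (women domain, training on Spanish).

from sklearn.metrics import f1_score

def macro_f1(gold, pred):
    # Macro-averaged F1 over the two classes (hateful / non-hateful).
    return f1_score(gold, pred, average="macro")

def transfer_failed(mono_f1, zero_shot_f1, threshold=0.25):
    # A zero-shot transfer counts as "failed" when macro-F1 drops by more
    # than 25% relative to the monolingual score of the source (training)
    # language, the criterion behind the brown cells of Table 1.
    return (mono_f1 - zero_shot_f1) / mono_f1 > threshold

# Spanish monolingual: 83.9; Spanish -> Italian zero-shot: 33.7.
print(transfer_failed(0.839, 0.337))  # True: the transfer fails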
3.2 Original baseline results
The original results reported by Nozza (2021) can
be found in the first rows of Table 1. In the table,
we highlight in brown zero-shot cross-lingual cases
where the macro-F1 score drops by more than 25%
compared to the monolingual setting: these are
cases for which we consider that the cross-lingual
transfer failed. We observe the phenomenon that
raised the issue of zero-shot cross-lingual transfer:
in the women domain, the models trained on Span-
ish and Italian in a zero-shot setting have much
lower scores compared to the monolingual results;
4 out of the 6 cross-lingual cells are highlighted in
brown. One possible cause, as explained by Nozza
(2021), is the presence of language-specific offen-
sive interjections that lead the model to wrongly
classify text as hateful towards women.
On a side note, models trained and tested on the
English corpus on the immigrants domain have par-
ticularly low scores (macro-F1 of 36.8 in the mono-
lingual setting). This phenomenon was also ob-
served by Nozza (2021) and Stappen et al. (2020),
who attribute it to specific words and hashtags
used for scraping the tweets, which lead the model
to overfit, combined with a large discrepancy
between the train and test sets.
3.3 Experimental settings
Building comparable corpora.
We started this
work to investigate the failure of cross-lingual
transfer of hate speech detection models in the
women domain highlighted by Nozza (2021).
However, these experiments were not carried out
in comparable settings: the corpora do not have
the same size across languages and domains. Our
goal is to confirm these results under a strictly
comparable setting and a robust multi-seed
experimental framework.
Therefore, we build comparable corpora in each
language and domain to ensure the comparability
of the transfer settings. We reduce all datasets to
Model          Src lang | immigrants        | women
                        | en    es    it    | en    es    it
m-BERT         en       | 36.8  63.3  59.0  | 55.9  54.6  44.9
(Nozza, 2021)  es       | 59.6  63.0  68.3  | 55.8  83.9  33.7
               it       | 63.5  66.6  77.7  | 54.5  46.3  80.8

Comparable corpus size and new random split:

m-BERT         en       | 72.5  48.5  63.8  | 75.2  41.7  43.4
               es       | 59.4  80.9  58.5  | 54.5  76.9  40.5
               it       | 62.8  54.8  76.3  | 46.3  53.6  88.3
XLM-R          en       | 75.3  51.9  70.1  | 76.6  51.6  49.9
               es       | 62.0  83.4  65.4  | 63.4  77.8  46.9
               it       | 69.2  51.3  78.6  | 60.3  57.3  89.0
XLM-T          en       | 76.8  48.5  73.5  | 78.6  61.5  60.6
               es       | 65.9  84.2  60.7  | 72.5  80.3  51.9
               it       | 71.5  56.8  78.4  | 63.4  58.2  90.3

Table 1: Monolingual and cross-lingual hate speech detection macro-F1 scores on all corpora. All results except those from Nozza (2021) are macro-F1 (%) averaged over 5 runs, all trained for 20 epochs. Numbers in brown highlight cases where the loss in performance in the zero-shot cross-lingual case, compared to the monolingual case, is higher than 25%.
a total size of 2,591 tweets, the size of the small-
est one, sampling from each original split sepa-
rately; each train set has 1,618 tweets, each devel-
opment set 173, and each test set 800. We use
the Kolmogorov–Smirnov test to compare the sen-
tence length distribution (number of tokens) and
the percentage of hate speech between the sampled
and the original datasets, to make sure they stay
comparable. The sampling is done randomly until
the similarity conditions with the original dataset
are met. The original size for each dataset as well
as the sampling size for building the comparable
datasets and the percentage of hateful examples can
be found in Table 7 and Table 8 in Appendix A.
On top of this, before the sub-sampling of the
corpora, we merge the development, test and train
sets for each language and domain before per-
forming a new random split. This allows us to
overcome the train-test discrepancy observed in the
English-immigrants dataset we mentioned above.
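As an illustration, the sketch below implements such a comparability-constrained sub-sampling, assuming each split is a list of (tokens, label) pairs with label 1 for hateful tweets; the function name, significance level, and ratio tolerance are our own assumptions, not the authors' released code.

import random
from scipy.stats import ks_2samp

TRAIN_N, DEV_N, TEST_N = 1618, 173, 800  # per-split sizes used in the paper

def sample_comparable(split, n, alpha=0.05, ratio_tol=0.02, max_tries=1000):
    # Resample n examples until the sentence-length distribution (two-sample
    # Kolmogorov-Smirnov test) and the proportion of hateful examples stay
    # comparable to the original split.
    orig_lengths = [len(tokens) for tokens, _ in split]
    orig_ratio = sum(label for _, label in split) / len(split)
    for _ in range(max_tries):
        sample = random.sample(split, n)
        lengths = [len(tokens) for tokens, _ in sample]
        ratio = sum(label for _, label in sample) / n
        if (ks_2samp(orig_lengths, lengths).pvalue > alpha
                and abs(ratio - orig_ratio) <= ratio_tol):
            return sample
    raise RuntimeError("no comparable sample found within max_tries")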
Pre-processing.
We process the datasets by re-
placing all mentions and URLs with specific to-
kens, and segmenting the hashtags into words
using the Python package wordsegment. Given
the compositional nature of hashtags (a set of
concatenated words), hashtag segmentation is
frequently done as a pre-processing step in the
literature when handling tweets (e.g., Röttger
et al., 2021); it can improve tasks such as tweet
clustering (Gromann and Declerck, 2017).
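A minimal sketch of this pre-processing step, assuming raw tweet strings as input; the placeholder tokens are our choice, while the hashtag segmentation uses the wordsegment package named above.

import re
from wordsegment import load, segment

load()  # load wordsegment's word statistics once

def preprocess(tweet):
    tweet = re.sub(r"@\w+", "@USER", tweet)            # mask user mentions
    tweet = re.sub(r"https?://\S+", "HTTPURL", tweet)  # mask URLs
    # Split hashtags into their component words.
    return re.sub(r"#(\w+)", lambda m: " ".join(segment(m.group(1))), tweet)

print(preprocess("@someone check this #BuildTheWall https://t.co/xyz"))
# Expected: "@USER check this build the wall HTTPURL"
# (the exact segmentation depends on wordsegment's statistics)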
Models training.
For all our experiments, we
use the MACHAMP v0.2 framework (van der Goot
et al., 2021b; https://github.com/machamp-nlp/machamp,
under the MIT license), a multi-task toolkit based on Al-
lenNLP (Gardner et al.,2018). We keep most of the
default hyperparameters of MACHAMP for all ex-
periments, which the authors optimized on a wide
variety of tasks. We fine-tune a multilingual lan-
guage model on the hate speech detection task for
each of the six training corpora described in the pre-
vious section. We keep the best out of 20 epochs
for each run according to the macro-F1 score on
the development set.
Note that the new comparable test sets sampled
from the original corpora are relatively small (800
observations). To increase the robustness of the
results, we use five different seeds when fine-tuning
a language model on the hate speech detection task
and report the average macro-F1 over the five runs.
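To illustrate how such a run can be configured, below is a hedged sketch of a MaChAmp dataset configuration for the joint training described in the introduction: hate speech labels in the source language only, plus an auxiliary task (here, sentiment analysis) in both the source and target languages. File paths are placeholders, and the field names follow our reading of the MaChAmp v0.2 documentation; they should be checked against the repository.

import json

config = {
    "HATE_EN": {  # hate speech data, source language only
        "train_data_path": "data/hate/en.train.tsv",
        "validation_data_path": "data/hate/en.dev.tsv",
        "sent_idxs": [0],  # column holding the pre-processed tweet
        "tasks": {"hate": {"task_type": "classification", "column_idx": 1}},
    },
    "SENT_EN": {  # auxiliary task, source language
        "train_data_path": "data/sentiment/en.train.tsv",
        "validation_data_path": "data/sentiment/en.dev.tsv",
        "sent_idxs": [0],
        "tasks": {"sentiment": {"task_type": "classification", "column_idx": 1}},
    },
    "SENT_ES": {  # auxiliary task, target language: the cross-lingual proxy
        "train_data_path": "data/sentiment/es.train.tsv",
        "validation_data_path": "data/sentiment/es.dev.tsv",
        "sent_idxs": [0],
        "tasks": {"sentiment": {"task_type": "classification", "column_idx": 1}},
    },
}

with open("configs/hate_plus_aux.json", "w") as f:
    json.dump(config, f, indent=2)
# The hate-speech-only baseline of this section keeps just the HATE_EN entry.
# Training is then launched with MaChAmp's train.py pointing at this file;
# see the MaChAmp README for the exact command-line flags.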
Language Models.
We use two general-domain
large-scale multilingual language models: m-
BERT (Devlin et al., 2019), following Nozza (2021),
and XLM-R (Conneau et al., 2020). The former
is the multilingual version of BERT, trained on
Wikipedia content in 104 languages, with 100M
parameters. The latter has the same architecture
as RoBERTa (Liu et al., 2019) with 550M parame-
ters and is trained on the publicly available 2.5 TB
CommonCrawl Corpus, covering 100 languages.
Then, we experiment with XLM-T (Barbieri
et al., 2021), an off-the-shelf XLM-R model fine-
tuned on 200 million tweets (1,724 million tokens)
scraped between May 2018 and March 2020, in
more than 30 languages, including our three target
languages.
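For reference, a minimal sketch of loading these encoders with a binary classification head through the Hugging Face transformers library (the experiments themselves run through MaChAmp); the checkpoint identifiers are our assumptions and should be verified on the model hub. In particular, we map the 550M-parameter XLM-R to its large variant.

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Checkpoint names are our assumptions; verify them on the Hugging Face hub.
CHECKPOINTS = {
    "m-BERT": "bert-base-multilingual-cased",
    "XLM-R": "xlm-roberta-large",                    # 550M-parameter variant
    "XLM-T": "cardiffnlp/twitter-xlm-roberta-base",  # Barbieri et al. (2021)
}

name = CHECKPOINTS["XLM-T"]
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)  # hateful / non-hateful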
3.4 Setting a new baseline
We compare the scores for m-BERT from Nozza
(2021) to the scores obtained using our compara-
ble corpora, reported in Table 1. First, our experi-
ment with m-BERT on comparable corpora allows
us to highlight additional cases where zero-shot
cross-lingual transfer “fails” (macro-F1 dropping
by more than 25% compared to the monolingual
score) in the immigrants domain, which were not visible
in the previous study due to variations in training
corpus size. On top of this, with the new splits,