
then sample from non-hateful entries, so that the
proportion of hate in the two datasets increases to
50.0% and 22.0%, respectively.
2.1.2 Fine-Tuning 2: Target Language
For fine-tuning in the target language, we use one
of five datasets in five different target languages.
BAS19_ES, compiled by Basile et al. (2019) for SemEval 2019, contains 4,950 Spanish tweets, of which 41.5% are labelled as hateful. FOR19_PT by Fortuna et al. (2019) contains 5,670 Portuguese tweets, of which 31.5% are labelled as hateful. HAS21_HI, compiled by Modha et al. (2021) for HASOC 2021, contains 4,594 Hindi tweets, of which 12.3% are labelled as hateful. OUS19_AR by Ousidhoum et al. (2019) contains 3,353 Arabic tweets, of which 22.5% are labelled as hateful. SAN20_IT, compiled by Sanguinetti et al. (2020) for EvalIta 2020, contains 8,100 Italian tweets, of which 41.8% are labelled as hateful.
From each of these five target-language datasets, we randomly sample differently-sized subsets for target-language fine-tuning. As in English, we set aside 500 entries for development and 2,000 for testing (due to its limited size, we only set aside 300 development and 1,000 test entries for OUS19_AR, n=3,353). From the remaining data, we sample subsets in 12 different sizes – 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 1,000 and 2,000 entries – so that we can evaluate the effects of using more or less labelled data within and across different orders of magnitude. Every sample contains at least one hateful entry.
Zhao et al. (2021) show
that there can be large sampling effects when fine-
tuning on small amounts of data. To mitigate this
issue, we use 10 different random seeds for each
sample size, so that in total we have 120 different
samples in each language, and 600 samples across
the five non-English languages.
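To make the procedure concrete, the sketch below shows one way the subsampling could be implemented, assuming pandas DataFrames with a binary label column; the re-draw step is one possible way to guarantee that every sample contains at least one hateful entry, not necessarily our exact implementation.

```python
# Minimal sketch of the subsampling procedure: 12 sample sizes x 10 random
# seeds per target-language training set, drawn after the dev and test
# entries have been held out. Column names are illustrative assumptions.
import pandas as pd

SAMPLE_SIZES = [10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 1_000, 2_000]
SEEDS = range(10)

def make_samples(train_df: pd.DataFrame) -> dict:
    """Return {(size, seed): subsample} for one target-language training set."""
    samples = {}
    for size in SAMPLE_SIZES:
        for seed in SEEDS:
            state = seed
            sample = train_df.sample(n=size, random_state=state)
            # re-draw until the subset contains at least one hateful entry
            # (labels are assumed to be 0 = non-hateful, 1 = hateful)
            while sample["label"].sum() == 0:
                state += 1_000  # shift to a fresh random state
                sample = train_df.sample(n=size, random_state=state)
            samples[(size, seed)] = sample
    return samples
```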
2.2 Models
Multilingual Models
We fine-tune and evaluate XLM-T (Barbieri et al., 2022), an XLM-R model (Conneau et al., 2020) pre-trained on an additional 198 million Twitter posts in over 30 languages. XLM-R is a widely-used architecture for multilingual language modelling, which has been shown to achieve near state-of-the-art performance on multilingual hate speech detection (Banerjee et al., 2021; Modha et al., 2021). We chose XLM-T because it strongly outperformed XLM-R across our target-language test sets in initial experiments.
Monolingual Models
For each of the five target languages, we also fine-tune and evaluate a monolingual transformer model from HuggingFace. For Spanish, we use RoBERTuito (Pérez et al., 2021). For Portuguese, we use BERTimbau (Souza et al., 2020). For Hindi, we use Hindi BERT. For Arabic, we use AraBERT v2 (Antoun et al., 2020). For Italian, we use UmBERTo. Details on model training can be found in Appendix A.
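As a rough illustration of the training setup, the sketch below outlines sequential fine-tuning (English first, then the target language) with the HuggingFace Trainer API; the model identifier, hyperparameters and dataset variables are illustrative assumptions rather than our exact configuration, which is given in Appendix A.

```python
# Minimal sketch of sequential fine-tuning with the HuggingFace Trainer API.
# Model ID and hyperparameters are assumptions; see Appendix A for details.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "cardiffnlp/twitter-xlm-roberta-base"  # assumed hub ID for XLM-T

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

def finetune(model, train_dataset, output_dir):
    """Fine-tune the binary hate speech classifier on one tokenised dataset."""
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                             per_device_train_batch_size=16, learning_rate=2e-5)
    Trainer(model=model, args=args, train_dataset=train_dataset).train()
    return model

# e.g. X+KEN+AR: 20,000 KEN20_EN entries first, then one OUS19_AR sample
# (english_train and arabic_train would be tokenised datasets)
# model = finetune(model, english_train, "ckpt_en")
# model = finetune(model, arabic_train, "ckpt_ar")
```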
Model Notation
We denote all models by an additive code. The first part is either M for a monolingual model or X for XLM-T. For XLM-T, the second part of the code is DEN, FEN or KEN, for models fine-tuned on 20,000 entries from DYN21_EN, FOU18_EN or KEN20_EN. For all models, the final part of the code is ES, PT, HI, AR or IT, corresponding to the target language that the model was fine-tuned on. For example, M+IT denotes the monolingual Italian model, UmBERTo, fine-tuned on SAN20_IT, and X+KEN+AR denotes an XLM-T model fine-tuned first on 20,000 English entries from KEN20_EN and then on OUS19_AR.
2.3 Evaluation Setup
Held-Out Test Sets + MHC
We test all models on the held-out test sets corresponding to their target-language fine-tuning data, to evaluate their in-domain performance (§2.4). For example, we test X+KEN+IT, which was fine-tuned on SAN20_IT data, on the SAN20_IT test set. Additionally, we test all models on the matching target-language test suite from Multilingual HateCheck (MHC). MHC is a collection of around 3,000 test cases for different kinds of hate as well as challenging non-hate in each of ten different languages (Röttger et al., 2022). We use MHC to evaluate out-of-domain generalisability (§2.5).
Evaluation Metrics
We use macro F1 to evaluate model performance because most of our test sets as well as MHC are imbalanced. To give context for interpreting performance, we show baseline model results in all figures: macro F1 for always predicting the hateful class ("always hate"), for never predicting the hateful class ("never hate") and for predicting both classes with equal probability ("50/50"). We also show bootstrapped 95% confidence intervals around the average macro F1, which is calculated across the 10 random seeds for each sample size. These confidence intervals are expected to be wider for models fine-tuned on less data because of larger sampling effects.
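The sketch below illustrates how the baseline scores and bootstrapped confidence intervals can be computed with scikit-learn and NumPy; function names and the number of bootstrap resamples are illustrative assumptions.

```python
# Minimal sketch of the evaluation metrics: macro F1 for the three baselines
# and a bootstrapped 95% confidence interval over the per-seed macro F1 scores.
import numpy as np
from sklearn.metrics import f1_score

def baseline_macro_f1(y_true, seed=0):
    """Macro F1 for the 'always hate', 'never hate' and random '50/50' baselines."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)  # binary labels: 1 = hateful, 0 = non-hateful
    return {
        "always hate": f1_score(y_true, np.ones_like(y_true), average="macro"),
        "never hate": f1_score(y_true, np.zeros_like(y_true), average="macro"),
        "50/50": f1_score(y_true, rng.integers(0, 2, size=len(y_true)), average="macro"),
    }

def bootstrap_ci(scores, n_boot=10_000, alpha=0.05, seed=0):
    """95% CI around the mean macro F1 across the 10 seeds for one sample size."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores)
    means = [rng.choice(scores, size=len(scores), replace=True).mean()
             for _ in range(n_boot)]
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])
```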