Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages

Paul Röttger¹, Debora Nozza², Federico Bianchi³, and Dirk Hovy²

¹University of Oxford
²Bocconi University
³Stanford University
Abstract
Hate speech is a global phenomenon, but most hate speech datasets so far focus on English-language content. This hinders the development of more effective hate speech detection models in hundreds of languages spoken by billions across the world. More data is needed, but annotating hateful content is expensive, time-consuming and potentially harmful to annotators. To mitigate these issues, we explore data-efficient strategies for expanding hate speech detection into under-resourced languages. In a series of experiments with mono- and multilingual models across five non-English languages, we find that 1) a small amount of target-language fine-tuning data is needed to achieve strong performance, 2) the benefits of using more such data decrease exponentially, and 3) initial fine-tuning on readily-available English data can partially substitute target-language data and improve model generalisability. Based on these findings, we formulate actionable recommendations for hate speech detection in low-resource language settings.
Content warning: This article contains illustrative examples of hateful language.
1 Introduction
Hate speech is a global phenomenon, but most hate speech datasets so far focus on English-language content (Vidgen and Derczynski, 2020; Poletto et al., 2021). This hinders the development of effective models for detecting hate speech in other languages. As a consequence, billions of non-English speakers across the world are less protected against online hate, and even giant social media platforms have clear language gaps in their content moderation systems (Simonite, 2021; Marinescu, 2021).
Zero-shot cross-lingual transfer, where large multilingual language models are fine-tuned on one source language and then applied to another target language, may appear like a potential solution to the issue of language-specific resource scarcity. However, while this approach performs well on some tasks (Conneau et al., 2020; Barbieri et al., 2022), it fails on many others (Lauscher et al., 2020; Hu et al., 2020). For hate speech detection in particular, cross-lingual performance in zero-shot settings is lacking (Stappen et al., 2020; Leite et al., 2020). For example, zero-shot cross-lingual transfer cannot account for language-specific taboo expressions that play a key role in classification (Nozza, 2021). Conversely, hate speech detection models trained or fine-tuned directly on the target language, i.e. in few- and many-shot settings, are consistently found to perform best (Aluru et al., 2020; Pelicon et al., 2021).
So, how do we build hate speech detection models for hundreds more languages? We need at least some labelled data in the target language to make models effective, but data annotation is difficult, time-consuming and expensive. It requires resources that are often very limited for non-English languages. Annotating hateful content in particular also risks exposing annotators to harm in the process (Vidgen et al., 2019; Derczynski et al., 2022).
In this article, we explore strategies for hate speech detection in under-resourced languages that make efficient use of labelled data, to build effective models while also minimising annotation cost and risk of harm to annotators. For this purpose, we conduct a series of experiments using mono- and multilingual models fine-tuned on differently-sized random samples of labelled hate speech data in English as well as Arabic, Spanish, Hindi, Portuguese and Italian. Our key findings are:
1. A small amount of labelled target-language data is needed to achieve strong performance on held-out test sets.
2. The benefits of using more such data decrease exponentially.
3. Initial fine-tuning on readily-available English data can partially substitute target-language data and improve model generalisability.
Based on these findings, we formulate and discuss five recommendations for expanding hate speech detection into under-resourced languages:

1. Collect and label target-language data.
2. Start by labelling a small set, then iterate.
3. Use diverse data collection methods to increase the marginal benefits of annotation.
4. Use multilingual models to unlock readily-available data in high-resource languages for initial fine-tuning.
5. Evaluate out-of-domain performance to reveal potential weaknesses in generalisability.
With these recommendations, we hope to facilitate the development of new hate speech detection models for yet-unserved languages.¹
Definition of Hate Speech
Definitions of hate speech vary across cultural and legal settings. Following Röttger et al. (2021), we define hate speech as abuse that is targeted at a protected group or at its members for being a part of that group. Protected groups are based on characteristics such as gender identity, race or religion, which broadly reflects Western legal consensus, particularly the US 1964 Civil Rights Act, the UK's 2010 Equality Act and the EU's Charter of Fundamental Rights. Based on these definitions, we approach hate speech detection as the binary classification of content as either hateful or non-hateful.
¹ We make all data and code to reproduce our experiments available on GitHub.
2 Experiments
Figure 1: Overview of our experimental setup. We use ISO 639-1 codes to denote the different languages. MHC is Multilingual HateCheck (Röttger et al., 2022).

All our experiments follow the setup described in Figure 1. We start by loading a pre-trained mono- or multilingual transformer model for sequence classification. For multilingual models, there is an optional first phase of fine-tuning on English data. This is to simulate using readily-available data from a high-resource language that is not the target language. For all models, there is then a second phase of fine-tuning on differently-sized random samples of data in the target language. This is to simulate using scarce data from an under-resourced language. Finally, all models are evaluated on the held-out test set corresponding to the target-language dataset they were fine-tuned on, as well as the target-language test suite from Multilingual HateCheck (Röttger et al., 2022).
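To make this two-phase setup concrete, the sketch below shows how such sequential fine-tuning could look with the HuggingFace transformers library. The checkpoint name, file paths and hyperparameters are illustrative assumptions, not our exact configuration (see Appendix A for the actual training details).

```python
# Minimal sketch of the two-phase fine-tuning pipeline (illustrative;
# see Appendix A for the actual training configuration).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "cardiffnlp/twitter-xlm-roberta-base"  # an XLM-T checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint,
                                                           num_labels=2)

def fine_tune(model, csv_path, output_dir):
    """Run one fine-tuning phase on a CSV with 'text' and 'label' columns."""
    data = load_dataset("csv", data_files=csv_path)["train"]
    data = data.map(lambda batch: tokenizer(batch["text"], truncation=True),
                    batched=True)
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                             per_device_train_batch_size=16)
    # Passing the tokenizer enables dynamic padding of each batch.
    Trainer(model=model, args=args, train_dataset=data,
            tokenizer=tokenizer).train()
    return model

# Phase 1 (optional, multilingual models only): readily-available English data.
model = fine_tune(model, "dyn21_en_20k.csv", "checkpoints/phase1")
# Phase 2: a small random sample in the target language, e.g. N=200 Spanish.
model = fine_tune(model, "bas19_es_n200.csv", "checkpoints/phase2")
```

Monolingual models skip phase 1 and go straight to target-language fine-tuning.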
2.1 Data
For all our experiments, we use hate speech datasets from hatespeechdata.com, which was first introduced by Vidgen and Derczynski (2020) and is now the largest public repository of datasets annotated for hate, abuse and offensive language. At the time of our review in May 2022, the site listed 53 English datasets as well as 57 datasets in 24 other languages. From these datasets, we select for our experiments those that a) contain explicit labels for hate, and b) use a definition of hate for data annotation that aligns with our own (§1).
2.1.1 Fine-Tuning 1: English
For the optional first phase of fine-tuning in English, we use one of three English datasets. DYN21_EN by Vidgen et al. (2021) contains 41,255 entries, of which 53.9% are labelled as hateful. The entries were hand-crafted by annotators to be challenging to hate speech detection models, using the Dynabench platform (Kiela et al., 2021). FOU18_EN by Founta et al. (2018) contains 99,996 tweets, of which 4.97% are labelled as hateful. KEN20_EN by Kennedy et al. (2020) contains 39,565 comments from YouTube, Twitter and Reddit, of which 29.31% are labelled as hateful.
From each of these three English datasets, we sample 20,000 entries for the optional first phase of fine-tuning, plus another 500 entries for development and 2,000 for testing. To align the proportion of hate across English datasets, we use random sampling for DYN21_EN, while for KEN20_EN and FOU18_EN we retain all hateful entries and then sample from non-hateful entries, so that the proportion of hate in the two datasets increases to 50.0% and 22.0%, respectively.
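This proportion-aligning step can be sketched as follows. It is a minimal sketch of the described procedure, assuming the data is held in a pandas DataFrame with a binary 'label' column (1 = hateful); the column name and seed are illustrative.

```python
import pandas as pd

def downsample_non_hateful(df: pd.DataFrame, n_total: int,
                           seed: int = 42) -> pd.DataFrame:
    """Retain all hateful entries, then fill up to n_total with randomly
    sampled non-hateful entries, which raises the proportion of hate
    relative to the full dataset. Assumes a binary 'label' column."""
    hateful = df[df["label"] == 1]
    non_hateful = df[df["label"] == 0].sample(n=n_total - len(hateful),
                                              random_state=seed)
    # Shuffle so hateful entries are not clustered at the start.
    return pd.concat([hateful, non_hateful]).sample(frac=1, random_state=seed)
```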
2.1.2 Fine-Tuning 2: Target Language
For fine-tuning in the target language, we use one of five datasets in five different target languages. BAS19_ES, compiled by Basile et al. (2019) for SemEval 2019, contains 4,950 Spanish tweets, of which 41.5% are labelled as hateful. FOR19_PT by Fortuna et al. (2019) contains 5,670 Portuguese tweets, of which 31.5% are labelled as hateful. HAS21_HI, compiled by Modha et al. (2021) for HASOC 2021, contains 4,594 Hindi tweets, of which 12.3% are labelled as hateful. OUS19_AR by Ousidhoum et al. (2019) contains 3,353 Arabic tweets, of which 22.5% are labelled as hateful. SAN20_IT, compiled by Sanguinetti et al. (2020) for EvalIta 2020, contains 8,100 Italian tweets, of which 41.8% are labelled as hateful.
From each of these five target-language datasets, we randomly sample differently-sized subsets for target-language fine-tuning. Like in English, we set aside 500 entries for development and 2,000 for testing (due to limited dataset size, we only set aside 300 dev and 1,000 test entries for OUS19_AR, n=3,353). From the remaining data, we sample subsets in 12 different sizes – 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 1,000 and 2,000 entries – so that we are able to evaluate the effects of using more or less labelled data within and across different orders of magnitude; there is at least one hateful entry in every sample. Zhao et al. (2021) show that there can be large sampling effects when fine-tuning on small amounts of data. To mitigate this issue, we use 10 different random seeds for each sample size, so that in total we have 120 different samples in each language, and 600 samples across the five non-English languages.
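A minimal sketch of this subset-generation scheme is given below, again assuming a pandas DataFrame with a binary 'label' column. The re-draw strategy shown here is one simple way to guarantee at least one hateful entry per sample; it is an illustration rather than the exact mechanism we used.

```python
import pandas as pd

SAMPLE_SIZES = [10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 1_000, 2_000]

def make_subsets(df: pd.DataFrame, n_seeds: int = 10) -> dict:
    """Draw 12 sizes x 10 seeds = 120 fine-tuning subsets per language."""
    subsets = {}
    for n in SAMPLE_SIZES:
        for seed in range(n_seeds):
            state = seed
            sample = df.sample(n=n, random_state=state)
            # Re-draw with an offset seed until the sample contains
            # at least one hateful entry (illustrative re-draw strategy).
            while sample["label"].sum() == 0:
                state += 1_000
                sample = df.sample(n=n, random_state=state)
            subsets[(n, seed)] = sample
    return subsets
```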
2.2 Models
Multilingual Models
We fine-tune and evaluate XLM-T (Barbieri et al., 2022), an XLM-R model (Conneau et al., 2020) pre-trained on an additional 198 million Twitter posts in over 30 languages. XLM-R is a widely-used architecture for multilingual language modelling, which has been shown to achieve near state-of-the-art performance on multilingual hate speech detection (Banerjee et al., 2021; Modha et al., 2021). We chose XLM-T because it strongly outperformed XLM-R across our target-language test sets in initial experiments.
Monolingual Models
For each of the five target languages, we also fine-tune and evaluate a monolingual transformer model from HuggingFace. For Spanish, we use RoBERTuito (Pérez et al., 2021). For Portuguese, we use BERTimbau (Souza et al., 2020). For Hindi, we use Hindi BERT. For Arabic, we use AraBERT v2 (Antoun et al., 2020). For Italian, we use UmBERTo. Details on model training can be found in Appendix A.
Model Notation
We denote all models by an additive code. The first part is either M for a monolingual model or X for XLM-T. For XLM-T, the second part of the code is DEN, FEN or KEN, for models fine-tuned on 20,000 entries from DYN21_EN, FOU18_EN or KEN20_EN. For all models, the final part of the code is ES, PT, HI, AR or IT, corresponding to the target language that the model was fine-tuned on. For example, M+IT denotes the monolingual Italian model, UmBERTo, fine-tuned on SAN20_IT, and X+KEN+AR denotes an XLM-T model fine-tuned first on 20,000 English entries from KEN20_EN and then on OUS19_AR.
2.3 Evaluation Setup
Held-Out Test Sets + MHC
We test all models on the held-out test sets corresponding to their target-language fine-tuning data, to evaluate their in-domain performance (§2.4). For example, we test X+KEN+IT, which was fine-tuned on SAN20_IT data, on the SAN20_IT test set. Additionally, we test all models on the matching target-language test suite from Multilingual HateCheck (MHC). MHC is a collection of around 3,000 test cases for different kinds of hate as well as challenging non-hate in each of ten different languages (Röttger et al., 2022). We use MHC to evaluate out-of-domain generalisability (§2.5).
Evaluation Metrics
We use macro F1 to evaluate model performance because most of our test sets as well as MHC are imbalanced. To give context for interpreting performance, we show baseline model results in all figures: macro F1 for always predicting the hateful class ("always hate"), for never predicting the hateful class ("never hate"), and for predicting both classes with equal probability ("50/50"). We also show bootstrapped 95% confidence intervals around the average macro F1, which is calculated across the 10 random seeds for each sample size. These confidence intervals are expected to be wider for models fine-tuned on less data because of larger sampling effects.
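As an illustration, the baselines and bootstrapped intervals can be computed as in the following sketch; the function names are ours, and the single random draw for the "50/50" baseline is a simple approximation of its expected score.

```python
import numpy as np
from sklearn.metrics import f1_score

def baseline_macro_f1(y_true):
    """Macro F1 of the three trivial baselines shown in our figures."""
    y_true = np.asarray(y_true)
    rng = np.random.default_rng(0)
    preds = {"always hate": np.ones_like(y_true),
             "never hate": np.zeros_like(y_true),
             # One random draw approximates the expected 50/50 score.
             "50/50": rng.integers(0, 2, size=len(y_true))}
    return {name: f1_score(y_true, p, average="macro", zero_division=0)
            for name, p in preds.items()}

def bootstrap_ci(seed_scores, n_boot=10_000, alpha=0.05):
    """Bootstrapped 95% CI around the mean macro F1 over the seed runs."""
    rng = np.random.default_rng(0)
    scores = np.asarray(seed_scores)
    means = [rng.choice(scores, size=len(scores), replace=True).mean()
             for _ in range(n_boot)]
    return np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```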
Figure 2: Macro F1 on target-language held-out test sets across models fine-tuned on up to N=2,000 target-language entries. Model notation as described in §2.2. Confidence intervals and model baselines as described in §2.3. We provide larger versions of all five graphs in Appendix B.
         BAS19_ES             FOR19_PT             HAS21_HI             OUS19_AR             SAN20_IT
N        20    200   2,000    20    200   2,000    20    200   2,000    20    200   2,000    20    200   2,000
M        0.50  0.72  0.84*    0.46  0.67  0.73*    0.47  0.55  0.60*    0.48  0.66  0.72*    0.40  0.73* 0.79*
X        0.48  0.70  0.81     0.42  0.67  0.73*    0.47  0.47  0.56     0.43  0.69* 0.70     0.40  0.70  0.78
X+DEN    0.66* 0.75* 0.82     0.63  0.70  0.72     0.52  0.56* 0.60*    0.52  0.66  0.70     0.63  0.73* 0.77
X+FEN    0.59  0.68  0.80     0.66* 0.71* 0.73*    0.55* 0.56* 0.59     0.61* 0.68  0.70     0.66* 0.73* 0.76
X+KEN    0.61  0.70  0.79     0.65  0.69  0.71     0.52  0.56* 0.60*    0.60  0.67  0.69     0.64  0.71  0.76

Table 1: Macro F1 on respective held-out test sets for models fine-tuned on N target-language entries, averaged across 10 random seeds for each N. Best performance for a given N marked with *. Results across all N in Appendix B.
2.4 Testing on Held-Out Test Sets
When evaluating our mono- and multilingual models on their corresponding target-language test sets, we find a set of consistent patterns in model performance. We visualise overall performance in Figure 2 and highlight key data points in Table 1.
First, there is an enormous benefit from even very small amounts of target-language fine-tuning data N. Model performance increases sharply across models and held-out test sets up to around N=200. For example, the performance of X+DEN+ES increases from around 0.63 at N=10 to 0.70 at N=50, to 0.75 at N=200.
On the other hand, larger amounts of target-language fine-tuning data correspond to much less of an improvement in model performance. Across models and held-out test sets, there is a steep decrease in the marginal benefits of increasing N. M+IT, for example, improves by 0.33 macro F1 from N=20 to N=200, and by just 0.06 from N=200 to N=2,000. X+PT improves by 0.25 macro F1 from N=20 to N=200, and by just 0.06 from N=200 to N=2,000. We analyse these decreasing marginal benefits using linear regression in §2.6.
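One plausible form of such an analysis is to regress macro F1 on log-transformed N, as in the sketch below; the exact specification in §2.6 may differ, and the example numbers are the M model's results on SAN20_IT from Table 1.

```python
import numpy as np
from scipy.stats import linregress

sample_sizes = np.array([20, 200, 2_000])   # N values from Table 1
macro_f1 = np.array([0.40, 0.73, 0.79])     # M model on SAN20_IT
fit = linregress(np.log10(sample_sizes), macro_f1)
# A roughly constant gain per tenfold increase in N corresponds to
# sharply decreasing marginal benefits of each additional entry.
print(f"macro F1 ~ {fit.intercept:.2f} + {fit.slope:.2f} * log10(N)")
```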
Further, there is a clear benefit to a first phase of fine-tuning on English data when there is limited target-language data. Absolute performance differs across test sets, but multilingual models with initial fine-tuning on English data (i.e. X+DEN, X+FEN and X+KEN) perform substantially better than those without (i.e. M and X), up to around N=200. At N=20, there is up to 0.26 macro F1 difference between the former and the latter. Conversely, models without initial fine-tuning on English data need substantially more target-language data to achieve the same performance. For example, X+DEN+PT at N=100 performs as well as M+PT at N=300 on FOR19_PT.
Relatedly, there are clear differences in model performance based on which English dataset was used in the first fine-tuning phase, when there is limited target-language data. Among the three En-