Assessing the impact of contextual
information in hate speech detection
JUAN MANUEL PÉREZ1, FRANCO LUQUE2, DEMIAN ZAYAT7, MARTÍN KONDRATZKY10,
AGUSTÍN MORO6, 8, PABLO SANTIAGO SERRATI6, 9, JOAQUÍN ZAJAC6, 11, PAULA
MIGUEL6, 9, NATALIA DEBANDI12, AGUSTÍN GRAVANO4, 5, 6, and VIVIANA COTIK1, 3
1Instituto de Ciencias de la Computación, CONICET, UBA (e-mail: {jmperez, vcotik} at dc.uba.ar)
2Facultad de Astronomía, Matemática y Física, Universidad Nacional de Córdoba (e-mail: francolq at famaf.unc.edu.ar)
3Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires
4Laboratorio de Inteligencia Artificial, Universidad Torcuato Di Tella (e-mail: agravano at utdt.edu)
5Escuela de Negocios, Universidad Torcuato Di Tella
6Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
7Facultad de Derecho, Universidad de Buenos Aires (e-mail: dzayat at derecho.uba.ar)
8Universidad Nacional del Centro de la Provincia de Buenos Aires (e-mail: agustin.moro at azul.der.unicen.edu.ar)
9Instituto de Investigaciones Gino Germani, Facultad de Ciencias Sociales, Universidad de Buenos Aires
10Facultad de Filosofía y Letras, Universidad de Buenos Aires
11Escuela Interdisciplinaria de Altos Estudios Sociales, Universidad de San Martín
12Universidad Nacional de Río Negro (e-mail: nataliadebandi at gmail.com)
Corresponding author: Juan Manuel Pérez (e-mail: jmperez at dc.uba.ar).
ABSTRACT In recent years, hate speech has gained relevance in social networks and other digital media due to its intensity and its association with violent acts against members of protected groups. In the face of huge amounts of user-generated content, a great effort has been made to develop automatic tools to aid the analysis and moderation of this kind of speech, at least in its most threatening forms. One of the limitations of current approaches to automatic hate speech detection is the lack of context. The focus on isolated messages, without considering any type of conversational context or even the topic being discussed, severely restricts the information available to determine whether a post in a social network should be tagged as hateful or not. In this work, we assess the impact of adding contextual information to the hate speech detection task. In particular, we study a Twitter subdomain consisting of replies to posts by digital newspapers and media outlets, which provides a natural environment for contextualized hate speech detection. We built an original corpus in the Rioplatense dialect of Spanish, focused on hate speech associated with the COVID-19 pandemic. A sample of this corpus was manually annotated using carefully designed guidelines. Our classification experiments using state-of-the-art transformer-based machine learning techniques show evidence that adding contextual information improves the performance of hate speech detection for two proposed tasks, binary and multi-label prediction, increasing their Macro F1 by 4.2 and 5.5 points, respectively. These results highlight the importance of using contextual information in hate speech detection. Our code, models, and corpus have been made available for further research.
INDEX TERMS NLP, Text Classification, Hate Speech detection with contextual information, Spanish
annotated corpus, COVID-19 Hate Speech
I. INTRODUCTION
Hate speech can be described as speech containing denigration of, and violence towards, an individual or a group of individuals, based on certain characteristics protected by international treaties, such as gender, race, language, and others [1]. In recent years, this type of discourse has taken on great relevance due to its intensity and its prevalence on social media. Exposure to this phenomenon has been related to stress and depression in the victims [2], and also to the creation of a hostile and dehumanizing environment for immigrants, sexual and religious minorities, and other vulnerable groups [3]. In addition to the psychological effects, one of the most worrying aspects of hate speech on social media is its relationship with violent acts against members of these groups, such as the “Unite the Right” attacks in Charlottesville [4], the Pittsburgh synagogue shooting [5], and the Rohingya genocide in Myanmar [6, 7], among others.
As a result, states and supranational organizations such as the
European Union have enacted legislation that urges social
media companies to moderate and eliminate discriminatory
content, with a particular focus on that which encourages
physical violence [8].
The last two years have seen a dramatic increase in the prevalence of hate speech amid the COVID-19 pandemic, targeting Chinese, Asian, and Jewish people, among other nationalities and minorities, and blaming them for the spread of the virus or the increase in inequalities [9]. The dissemination of fake news related to conspiracy theories and other types of disinformation [10, 11] has been linked to an increase in violence against members of these groups [9].
Great effort has been made in recent years in the research
and development of automatic tools to aid the analysis and
moderation of hate speech, at least in its most threatening
forms [12, 13, 14, 15]. From a Natural Language Processing
(NLP) perspective, hate speech detection can be thought of
as a text classification task: given a document generated by
a user (e.g., a post on a social network), predict whether or
not it contains hateful content [14]. Additionally, it may be
of interest to predict other features, such as whether the text
contains a call to take some possibly violent action, whether
it is directed against an individual or a group, or which
characteristics are attacked [16], for example.
One of the limitations of current approaches to automatic hate speech detection is the lack of context. Most studies and resources work with data without any kind of context (i.e., isolated user messages with no information about the conversational thread or even the topic being discussed) [17]. This limits the information available to discern whether a comment is hateful or not, given that an expression can be injurious in certain contexts but not in others.
Another limitation is that most resources for hate speech detection are built in English, restricting research and applicability in other languages [14, 15]. While there are some datasets in Spanish [16, 18, 19], to the best of our knowledge, none is related to the COVID-19 pandemic, which shows distinctive features and targets in comparison to other hate speech events. Moreover, none of the existing datasets comes from the Rioplatense dialectal variety of Spanish, which has its own particularities and might express hate speech in a distinct way.
In the present work, we address the issues described above regarding hate speech detection: 1) we consider finer-grained distinctions that go beyond a binary detection of hateful vs. non-hateful speech, such as the identification of attacked characteristics and the detection of calls to action; 2) we study the impact of adding contextual information to the classification problems; and 3) we approach the problem in Spanish, a language with relatively few resources available for this task. We are especially interested in the second issue, regarding the usefulness of contextual information; this is the main research question of this work.
For these purposes, we built a dataset based on user responses to posts from media outlets on Twitter. This subdomain of social networks (i.e., responses to news posts) is particularly interesting because it provides a natural context for the discussion (the news post under debate) while also replicating the interactions of a news forum. We collected a Spanish dataset of news related to the COVID-19 pandemic and had it annotated by native speakers. Classification experiments using state-of-the-art techniques based on BETO [20], a Spanish version of BERT (Bidirectional Encoder Representations from Transformers) [21], show evidence that adding context improves detection both in a binary setting (predicting the presence or absence of hate speech) and in a fine-grained setting (predicting the attacked characteristics and whether there is a call to action). These results highlight the importance of contextual information for hate speech detection. Figure 1 provides a graphical, high-level overview of the work discussed in this paper.
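To make the notion of “adding context” concrete at the input level, the following is a minimal sketch of one standard way to feed a contextualized input to a BERT-style model: encoding the news post and the user reply as a sentence pair. It assumes the publicly available BETO checkpoint on the Hugging Face hub, and is an illustration rather than our exact experimental code.

    from transformers import AutoTokenizer

    # Assumed BETO checkpoint (dccuchile); experimental details may differ.
    tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-uncased")

    context = "China prohíbe el consumo de carne de perro"  # news post (context)
    reply = "hay que tirarles una bomba"                     # user reply to classify

    # Non-contextualized input: the reply alone.
    plain = tokenizer(reply, truncation=True, max_length=128)

    # Contextualized input: context and reply as a sentence pair; the tokenizer
    # inserts a [SEP] token between the two segments.
    contextual = tokenizer(context, reply, truncation=True, max_length=256)

    print(tokenizer.decode(contextual["input_ids"]))
    # [CLS] china prohibe el consumo ... [SEP] hay que tirarles una bomba [SEP]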
Our contributions are the following:
1) We describe the collection, curation and annotation
process of a novel corpus for hate speech detection
based on user responses to news posts from media
outlets on Twitter. This dataset is in the Rioplatense
dialectal variety of Spanish and focuses on hate speech
associated with the COVID-19 pandemic.
2) Through a series of classification experiments using state-of-the-art techniques, we show evidence that including contextual information improves the performance of hate speech detection, both in binary and fine-grained settings.
3) We make our code, models, and the annotated corpus available for further research.¹
The rest of the paper is organized as follows: Section II reviews previous work on automatic hate speech detection. Section III states the definition of hate speech used in this work, along with the targeted groups and the characteristics of interest. Section IV describes the process performed to collect and annotate our corpus, which is later used in Section V to conduct our classification experiments. Section VII discusses the results and Section VIII draws conclusions and outlines possible future work.
II. PREVIOUS WORK
Hate speech has attracted a lot of attention in recent years,
with literature from the legal and social domains studying its
definition and classification [22], the elements that enable its
identification, and its relationship to freedom of expression
and human rights [1, 23]. The automatic detection of this phenomenon is usually approached as a classification task, and is related to a family of other tasks such as the detection of cyberbullying, offensive language, abusive language, and toxic language, among others. Waseem et al. [24] propose a typology of these related tasks by asking whether the offensive content is directed at a specific entity or group, and whether the content is explicit or implicit.
¹Our code and corpus will be publicly available once the paper is published. If needed before, please write to the corresponding author.
[Figure 1: pipeline diagram. Panels: data collection from outlets and responses → sampling → annotation → contextualized corpus → classification experiments. The example contrasts a contextualized classifier, which sees both the context (“China bans dog consumption”) and the reply (“gotta drop ‘em a bomb”), with a non-contextualized classifier, which sees the reply alone.]
Figure 1: Work overview. The process starts with the collection of data from Twitter, according to a sampling procedure designed to achieve a balanced proportion of attacked characteristics. The dataset is then annotated by native speakers following carefully designed annotation guidelines. The annotated corpus is used to train and evaluate models for hate speech detection, both as a binary and as a multi-label classification task. Our experiments reveal that contextualized models outperform non-contextualized ones.
There is a plethora of resources for the automatic detection of hate speech. Interested readers can refer to Poletto et al. [17] for an extensive review of datasets for this task. In particular, Spanish corpora are scarce, despite Spanish being one of the most widely used languages on social media and the second language by number of native speakers worldwide [25]. To the best of our knowledge, all available datasets for this language have been published in the context of shared tasks. Fersini et al. [19] presented a 4k-tweet dataset for the Automatic Misogyny Identification (AMI) shared task (IberEval 2018²). The MEX-A3T task (IberEval 2018 and IberLEF 2019³) included a dataset of 11k Mexican Spanish tweets annotated for aggressiveness [26, 27]. Basile et al. [16] published a dataset of 6.6k tweets annotated for misogyny and xenophobia, in the context of the HatEval challenge (SemEval 2019⁴).
Due to the COVID-19 pandemic, a spike in the incidence of hate speech has been documented on social networks [28]. Some works have addressed its distinctive features, studying hateful dynamics in social networks [29] and also generating specific resources for the analysis and identification of this kind of toxic behavior [30]. AnonymousAuthors [31] describe work in progress on the detection of hate speech in Spanish tweets related to newspaper articles about the COVID-19 pandemic.
Regarding techniques for our specific task, classic machine learning approaches such as linear classifiers over handcrafted features and bags of words have been applied [12, 32, 33]. Lately, however, deep learning techniques such as recurrent neural networks and, more recently, pre-trained language models have become the state of the art [34, 35, 36, 37, 38, 39].
²IberEval 2018: https://sites.google.com/view/ibereval-2018?pli=1
³IberLEF 2019: https://sites.google.com/view/iberlef-2019/
⁴SemEval 2019: https://alt.qcri.org/semeval2019/
In spite of the great results achieved by these methods, Arango et al. [40] call some of them into question, suggesting that the reported performance may be due to overfitting.
Plaza-del Arco et al. [41] analyze the currently available
Spanish pre-trained models for hate speech detection tasks.
Since the appearance of GPT (Generative Pre-trained
Transformer) [42] and BERT [21], pre-trained language mod-
els based on transformers [43] have become state-of-the-art
for most NLP tasks. These techniques follow a transfer-learning approach: a large language model (hence the name) is first pre-trained on a large corpus and then fine-tuned for a specific task (e.g., sentiment analysis, question answering, or hate speech detection) [42, 44]. This approach has replaced
previous deep learning architectures for most NLP tasks,
which used to be based on recurrent neural networks and
word embeddings [45, 46].
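As a brief illustration of this pre-train/fine-tune recipe, the following sketch fine-tunes a pre-trained Spanish encoder for binary classification with the Hugging Face transformers library. The checkpoint name, toy data, and hyperparameters are assumptions for illustration, not a prescribed setup.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "dccuchile/bert-base-spanish-wwm-uncased"  # assumed BETO checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Toy labeled batch (1 = hateful); a real setup would iterate over a dataset.
    texts = ["comentario de ejemplo", "otro comentario"]
    labels = torch.tensor([0, 1])
    batch = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(3):  # a few gradient steps on the toy batch
        output = model(**batch, labels=labels)  # the head computes cross-entropy loss
        output.loss.backward()
        optimizer.step()
        optimizer.zero_grad()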
Pre-trained models have been built for different languages, and also for different domains (such as the biomedical [47] and legal [48] domains) and text sources (such as Twitter [49] and other social networks). In particular, Spanish pre-trained models include BETO [20], BERTin [50], RoBERTa-es [51] and RoBERTuito [52]. Nozza et al. [53] review BERT-based language models for different tasks and languages.⁵
⁵Note that the names BETO, BERTin, RoBERTa-es and RoBERTuito are not acronyms, but alterations of the original name BERT.
Few prior studies incorporate some kind of context into the user comments for hate speech or toxicity detection. Gao and Huang [54] analyze the impact of adding context to the task of hate speech detection for a dataset of comments from the Fox News site. As mentioned by Pavlopoulos et al. [55], this study has room for improvement: the dataset is rather small, with around 1.6k comments extracted from only 10 news
articles; its annotation process was mainly performed by a single person; and some of its methodological choices are open to discussion, such as including the name of the user as a predictive feature. Mubarak et al. [56] built a dataset of comments taken from the Al Jazeera website⁶ and annotated them together with the title of the article, but without including the entire thread of replies.
⁶https://www.aljazeera.com/
Pavlopoulos et al. [55] analyze the impact of adding context to the toxicity detection task. They find that, while humans seem to leverage conversational context to detect toxicity, their trained classification models were not able to improve their performance significantly by adding context. Following up, Xenos et al. [57] label each message with its “context sensitivity”, measured as the difference in judgments between two groups of annotators: those who saw the context and those who did not. With this measure, they observe that classifiers improve their performance on comments that are more sensitive to context.
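For instance, such a sensitivity score could be computed per comment as the gap between the toxic-vote fractions of the two annotator groups; the following sketch is our own hedged reading of that idea, not the exact formulation of Xenos et al.

    def context_sensitivity(frac_toxic_with_ctx: float, frac_toxic_without_ctx: float) -> float:
        """Gap between the fraction of context-aware annotators and the fraction
        of context-unaware annotators who judged the comment toxic; a larger gap
        means the comment's perceived toxicity depends more on its context."""
        return abs(frac_toxic_with_ctx - frac_toxic_without_ctx)

    # e.g., 0.8 of context-aware vs. 0.3 of context-unaware annotators voted toxic:
    print(context_sensitivity(0.8, 0.3))  # 0.5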
Further, Sheth et al. [58] explore some opportunities for
incorporating richer information sources into the toxicity
detection task, such as the interaction history between users,
some kind of social context, and other external knowledge
bases. Wiegand et al. [59] pose some questions and challenges regarding the detection of implicit toxicity, that is, subtle forms of abusive language not expressed as slurs.
Summing up: BERT-based models are the state of the art for this type of classification task; there have been various attempts to include context in distinct ways and with disparate success; there have been relatively few studies on Spanish data; and hate speech detection has typically been addressed as a binary task, making no distinction among the attacked characteristics or calls to action. In the present work, we assess the usefulness of adding context, working with BERT-based models on Spanish data and addressing both binary and fine-grained classification tasks.
III. DEFINITION OF HATE SPEECH
We say that there is hate speech in a comment if it contains statements of an intense and irrational nature expressing disapproval of and hatred against an individual or a group of people because of their identification with a group protected by domestic or international laws [1]. Protected traits or characteristics include color, race, national or social origin, gender identity, language, and sexual orientation, among others.
Hate speech can manifest itself explicitly, as direct insults, slurs, celebrations of crimes, or incitements to take action against an individual or group, or through more veiled expressions such as ironic content. Following this definition, we consider that an insult or aggression is not enough to constitute hate speech; it must also make an explicit or implicit appeal to at least one protected characteristic.
Under international law, hate speech has an extra element that differentiates it from merely offensive behavior: the promotion of violent actions against its targets. However, the NLP
Short name    Hate speech against ...
WOMEN         women
LGBTI         gay, lesbian, bisexual, transgender, and intersex people
RACISM        people based on their race, skin color, language, or national identity
CLASS         people based on their socioeconomic status
POLITICS      people based on their political affiliation or ideology
APPEARANCE    people based on their physical appearance (e.g., overweight or elderly people)
CRIMINAL      criminals and persons in conflict with the law
DISABLED      people with disabilities or mental health conditions
Table 1: Protected characteristics considered in this work. Short names are used throughout the paper to refer to these broad groups.
community does not usually require this “call to action” when
identifying hate speech. In the present work, we will adopt
this latter view, and we will explicitly state when we also
refer to calls to action.
Several characteristics are taken into account in this work. In addition to misogyny and racism (the most common traits considered in previous work), we also consider: homophobia and transphobia; social class hatred (sometimes known as aporophobia); hatred due to physical appearance (e.g., being overweight); hatred towards people with disabilities; political hate speech; and hate speech against criminals, prisoners, offenders, and other people in conflict with the law. For this selection, we take into account the definition of discrimination from international human rights treaties, which refers to discrimination motivated by race, color, sex, language, religion, political or other opinions, national or social origin, property, birth, or other status [60]. These eight characteristics are listed in Table 1 along with the reference names that will be used throughout the paper.
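This label scheme lends itself to a multi-label encoding. The following sketch (our own illustration, independent of the released corpus format) maps the short names of Table 1 to a multi-hot vector, a natural target representation for the fine-grained classification task:

    # The eight protected characteristics of Table 1, in a fixed order.
    CHARACTERISTICS = [
        "WOMEN", "LGBTI", "RACISM", "CLASS",
        "POLITICS", "APPEARANCE", "CRIMINAL", "DISABLED",
    ]

    def to_multi_hot(attacked):
        """Encode a set of attacked characteristics as a multi-hot vector."""
        return [int(c in attacked) for c in CHARACTERISTICS]

    # A comment attacking people based on national identity would be encoded as:
    print(to_multi_hot({"RACISM"}))  # [0, 0, 1, 0, 0, 0, 0, 0]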
IV. CORPUS
This section describes the collection, curation, and annotation process of the corpus. Our aim was to construct a dataset of user messages commenting on specific news articles, in a similar fashion to the reader forums present in many news outlet websites. Figure 2 offers a schematic illustration of our dataset, with a tweet from a news outlet about China banning the breeding of dogs for human consumption, its respective news article, and replies from users to the original tweet.
A. DATA COLLECTION
Our data collection process was targeted at the official Twitter accounts of a selected set of Argentinian news outlets: La Nación (@lanacion), Clarín (@clarincom), Infobae (@infobae), Perfil (@perfilcom), and Crónica (@cronica). These are the main national newspapers in the country and attract a vast volume of interaction on Twitter.
We considered a fixed time period of one year, starting in March 2020. We collected the replies to each post of the mentioned accounts using the Spritzer Twitter API, listening for any tweet mentioning one of their usernames.
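As a rough sketch of this collection step, a stream listening for mentions of the outlets could look as follows. This is a hedged illustration, not our exact collection code: it uses tweepy's v1.1 streaming interface (access to which Twitter has since restricted) with placeholder credentials and a placeholder output file.

    import json
    import tweepy

    OUTLETS = ["@lanacion", "@clarincom", "@infobae", "@perfilcom", "@cronica"]
    API_KEY, API_SECRET = "...", "..."            # placeholder credentials
    ACCESS_TOKEN, ACCESS_SECRET = "...", "..."

    class ReplyCollector(tweepy.Stream):
        def on_status(self, status):
            # Keep only direct replies to the outlets' own posts.
            if status.in_reply_to_screen_name in {u.lstrip("@") for u in OUTLETS}:
                with open("replies.jsonl", "a") as f:
                    f.write(json.dumps(status._json) + "\n")

    stream = ReplyCollector(API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_SECRET)
    stream.filter(track=OUTLETS)  # any tweet mentioning one of the usernames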
For the purpose of this work, we were only interested in the