Assessing the impact of contextual
information in hate speech detection
JUAN MANUEL PÉREZ1, FRANCO LUQUE2, DEMIAN ZAYAT7, MARTÍN KONDRATZKY10,
AGUSTÍN MORO6, 8, PABLO SANTIAGO SERRATI6, 9, JOAQUÍN ZAJAC6, 11, PAULA
MIGUEL6, 9, NATALIA DEBANDI12, AGUSTÍN GRAVANO4, 5, 6, and VIVIANA COTIK1, 3
1Instituto de Ciencias de la Computación, CONICET, UBA (e-mail: {jmperez, vcotik} at dc.uba.ar)
2Facultad de Astronomía, Matemática y Física, Universidad Nacional de Córdoba (e-mail: francolq at famaf.unc.edu.ar)
3Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires
4Laboratorio de Inteligencia Artificial, Universidad Torcuato Di Tella (e-mail: agravano at utdt.edu)
5Escuela de Negocios, Universidad Torcuato Di Tella
6Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
7Facultad de Derecho, Universidad de Buenos Aires (e-mail: dzayat at derecho.uba.ar)
8Universidad Nacional del Centro de la Provincia de Buenos Aires (e-mail: agustin.moro at azul.der.unicen.edu.ar)
9Instituto de Investigaciones Gino Germani, Facultad de Ciencias Sociales, Universidad de Buenos Aires
10Facultad de Filosofía y Letras, Universidad de Buenos Aires
11Escuela Interdisciplinaria de Altos Estudios Sociales, Universidad de San Martín
12Universidad Nacional de Río Negro (e-mail: nataliadebandi at gmail.com)
Corresponding author: Juan Manuel Pérez (e-mail: jmperez at dc.uba.ar).
ABSTRACT In recent years, hate speech has gained relevance in social networks and other digital media due to its intensity and its association with violent acts against members of protected groups. In the face of huge amounts of user-generated content, a great effort has been made to develop automatic tools to aid the analysis and moderation of this kind of speech, at least in its most threatening forms. One of the limitations of current approaches to automatic hate speech detection is the lack of context. The focus on isolated messages, without considering any type of conversational context or even the topic being discussed, severely restricts the information available to determine whether a post in a social network should be tagged as hateful or not. In this work, we assess the impact of adding contextual information to the hate speech detection task. In particular, we study a Twitter subdomain consisting of replies to posts by digital newspapers and media outlets, which provides a natural environment for contextualized hate speech detection. We built an original corpus in the Rioplatense dialect of Spanish, focused on hate speech associated with the COVID-19 pandemic. A sample of this corpus was manually annotated using carefully designed guidelines. Our classification experiments using state-of-the-art transformer-based machine learning techniques show evidence that adding contextual information improves the performance of hate speech detection for two proposed tasks, binary and multi-label prediction, increasing their Macro F1 by 4.2 and 5.5 points, respectively. These results highlight the importance of using contextual information in hate speech detection. Our code, models, and corpus have been made available for further research.
INDEX TERMS NLP, Text Classification, Hate Speech detection with contextual information, Spanish
annotated corpus, COVID-19 Hate Speech
I. INTRODUCTION
Hate speech can be described as speech containing denigration of, and violence towards, an individual or a group of individuals, based on certain characteristics protected by international treaties, such as gender, race, language, and others [1]. In recent years, this type of discourse has taken on great relevance due to its intensity and its prevalence on social media. Exposure to this phenomenon has been related to stress and depression in the victims [2], and also to the creation of a hostile and dehumanizing environment for immigrants, sexual and religious minorities, and other vulnerable groups [3]. In addition to the psychological effects, one of the most worrying aspects of hate speech on social media is its relationship with violent acts against members of these groups, such as the “Unite the Right” attacks in Charlottesville [4], the Pittsburgh synagogue shooting [5], and the Rohingya genocide in Myanmar [6, 7], among others.
As a result, states and supranational organizations such as the
European Union have enacted legislation that urges social
media companies to moderate and eliminate discriminatory
content, with a particular focus on that which encourages
physical violence [8].
The last two years have seen a dramatic increase in the prevalence of hate speech amid the COVID-19 pandemic, targeting Chinese, Asian, and Jewish people, among other nationalities and minorities, and blaming them for the spread of the virus or the increase in inequalities [9]. The dissemination of fake news related to conspiracy theories and other types of disinformation [10, 11] has been linked to an increase in violence against members of these groups [9].
Great effort has been made in recent years in the research
and development of automatic tools to aid the analysis and
moderation of hate speech, at least in its most threatening
forms [12, 13, 14, 15]. From a Natural Language Processing
(NLP) perspective, hate speech detection can be thought of
as a text classification task: given a document generated by
a user (e.g., a post on a social network), predict whether or
not it contains hateful content [14]. Additionally, it may be
of interest to predict other features, such as whether the text
contains a call to take some possibly violent action, whether
it is directed against an individual or a group, or which
characteristics are attacked [16], for example.
One of the limitations of current approaches to automatic hate speech detection is the lack of context. Most studies and resources work with data without any kind of context (i.e., isolated user messages with no information about the conversational thread or even the topic being discussed) [17]. This limits the information available to discern whether a comment is hateful or not, given that an expression can be injurious in certain contexts but not in others.
Another limitation is that most resources for hate speech detection are built in English, restricting research and applicability in other languages [14, 15]. While there are some datasets in Spanish [16, 18, 19], to the best of our knowledge, none is related to the COVID-19 pandemic, which shows distinctive features and targets in comparison to other hate speech events. Moreover, none of the existing datasets comes from the Rioplatense dialectal variety of Spanish, which has its own particularities and might express hate speech in a distinct way.
In the present work, we address the issues described above regarding hate speech detection: 1) we consider finer-grained distinctions that go beyond a binary detection of hateful vs. non-hateful speech, such as the identification of attacked characteristics and the detection of calls to action; 2) we study the impact of adding contextual information to the classification problems; and 3) we approach the problem in Spanish, a language with relatively few resources available for this task. We are especially interested in the second issue, regarding the usefulness of contextual information; this is the main research question of this work.
For these purposes, we built a dataset based on user responses to posts from media outlets on Twitter. This subdomain of social networks (i.e., responses to news posts) is particularly interesting because it provides a natural context for the discussion (the news post under debate) while also replicating the interactions of a news forum. We collected a Spanish dataset of news related to the COVID-19 pandemic and had it annotated by native speakers. Classification experiments using state-of-the-art techniques based on BETO [20], a Spanish version of BERT (Bidirectional Encoder Representations from Transformers) [21], show evidence that adding context improves detection both in a binary setting (predicting the presence or absence of hate speech) and in a fine-grained setting (predicting the attacked characteristics and whether there is a call to action). These results highlight the importance of contextual information for hate speech detection. Figure 1 provides a graphical, high-level overview of the work discussed in this paper.
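To make the notion of “adding context” concrete at the input level, the following is a minimal sketch of one standard way to feed a contextualized input to a BERT-style model: encoding the news post and the user reply as a sentence pair. It assumes the publicly available BETO checkpoint on the Hugging Face hub, and is an illustration rather than our exact experimental code.

    from transformers import AutoTokenizer

    # Assumed BETO checkpoint (dccuchile); experimental details may differ.
    tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-uncased")

    context = "China prohíbe el consumo de carne de perro"  # news post (context)
    reply = "hay que tirarles una bomba"                     # user reply to classify

    # Non-contextualized input: the reply alone.
    plain = tokenizer(reply, truncation=True, max_length=128)

    # Contextualized input: context and reply as a sentence pair; the tokenizer
    # inserts a [SEP] token between the two segments.
    contextual = tokenizer(context, reply, truncation=True, max_length=256)

    print(tokenizer.decode(contextual["input_ids"]))
    # [CLS] china prohibe el consumo ... [SEP] hay que tirarles una bomba [SEP]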
Our contributions are the following:
1) We describe the collection, curation and annotation
process of a novel corpus for hate speech detection
based on user responses to news posts from media
outlets on Twitter. This dataset is in the Rioplatense
dialectal variety of Spanish and focuses on hate speech
associated with the COVID-19 pandemic.
2) Through a series of classification experiments using state-of-the-art techniques, we show evidence that including contextual information improves the performance of hate speech detection, both in binary and fine-grained settings.
3) We make our code, models, and the annotated corpus available for further research.¹
The rest of the paper is organized as follows: Section II reviews previous work on automatic hate speech detection. Section III states the definition of hate speech used in this work, along with the targeted groups and the characteristics of interest. Section IV describes the process performed to collect and annotate our corpus, which is later used in Section V to conduct our classification experiments. Section VII discusses the results and Section VIII draws conclusions and outlines possible future work.
II. PREVIOUS WORK
Hate speech has attracted a lot of attention in recent years,
with literature from the legal and social domains studying its
definition and classification [22], the elements that enable its
identification, and its relationship to freedom of expression
and human rights [1, 23]. The automatic detection of this phenomenon is usually approached as a classification task, and is related to a family of other tasks such as the detection of cyberbullying, offensive language, abusive language, and toxic language, among others. Waseem et al. [24] propose a typology of these related tasks by asking whether the offensive content is directed at a specific entity or group, and whether the content is explicit or implicit.
¹Our code and corpus will be publicly available once the paper is published. If needed before, please write to the corresponding author.
[Figure 1: pipeline diagram. Panels: data collection from outlets and responses → sampling → annotation → contextualized corpus → classification experiments. The example contrasts a contextualized classifier, which sees both the context (“China bans dog consumption”) and the reply (“gotta drop ‘em a bomb”), with a non-contextualized classifier, which sees the reply alone.]
Figure 1: Work overview. The process starts with the collection of data from Twitter, according to a sampling procedure designed to achieve a balanced proportion of attacked characteristics. The dataset is then annotated by native speakers following carefully designed annotation guidelines. The annotated corpus is used to train and evaluate models for hate speech detection, both as a binary and as a multi-label classification task. Our experiments reveal that contextualized models outperform non-contextualized ones.
There is a plethora of resources for the automatic detection of hate speech. Interested readers can refer to Poletto et al. [17] for an extensive review of datasets for this task. In particular, Spanish corpora are scarce, despite Spanish being one of the most widely used languages on social media and the second language by number of native speakers worldwide [25]. To the best of our knowledge, all available datasets for this language have been published in the context of shared tasks. Fersini et al. [19] presented a 4k-tweet dataset for the Automatic Misogyny Identification (AMI) shared task (IberEval 2018²). The MEX-A3T task (IberEval 2018 and IberLEF 2019³) included a dataset of 11k Mexican Spanish tweets annotated for aggressiveness [26, 27]. Basile et al. [16] published a dataset of 6.6k tweets annotated for misogyny and xenophobia, in the context of the HatEval challenge (SemEval 2019⁴).
Due to the COVID-19 pandemic, a spike in the incidence of hate speech has been documented on social networks [28]. Some works have addressed its distinctive features, studying hateful dynamics in social networks [29] and also generating specific resources for the analysis and identification of this kind of toxic behavior [30]. AnonymousAuthors [31] describe work in progress on the detection of hate speech in Spanish tweets related to newspaper articles about the COVID-19 pandemic.
Regarding techniques for our specific task, classic machine learning approaches such as linear classifiers over handcrafted features and bags of words have been applied [12, 32, 33]. Lately, however, deep learning techniques such as recurrent neural networks and, more recently, pre-trained language models have become the state of the art [34, 35, 36, 37, 38, 39].
²IberEval 2018: https://sites.google.com/view/ibereval-2018?pli=1
³IberLEF 2019: https://sites.google.com/view/iberlef-2019/
⁴SemEval 2019: https://alt.qcri.org/semeval2019/
In spite of the great results achieved by these methods, Arango et al. [40] call some of them into question, suggesting that the reported performance may be due to overfitting.
Plaza-del Arco et al. [41] analyze the currently available
Spanish pre-trained models for hate speech detection tasks.
Since the appearance of GPT (Generative Pre-trained
Transformer) [42] and BERT [21], pre-trained language mod-
els based on transformers [43] have become state-of-the-art
for most NLP tasks. These techniques follow a transfer-learning approach: a large language model (hence the name) is first pre-trained on a large corpus and then fine-tuned for a specific task (e.g., sentiment analysis, question answering, or hate speech detection) [42, 44]. This approach has replaced
previous deep learning architectures for most NLP tasks,
which used to be based on recurrent neural networks and
word embeddings [45, 46].
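As a brief illustration of this pre-train/fine-tune recipe, the following sketch fine-tunes a pre-trained Spanish encoder for binary classification with the Hugging Face transformers library. The checkpoint name, toy data, and hyperparameters are assumptions for illustration, not a prescribed setup.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "dccuchile/bert-base-spanish-wwm-uncased"  # assumed BETO checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Toy labeled batch (1 = hateful); a real setup would iterate over a dataset.
    texts = ["comentario de ejemplo", "otro comentario"]
    labels = torch.tensor([0, 1])
    batch = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(3):  # a few gradient steps on the toy batch
        output = model(**batch, labels=labels)  # the head computes cross-entropy loss
        output.loss.backward()
        optimizer.step()
        optimizer.zero_grad()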
Pre-trained models have been built for different languages, and also for different domains (such as the biomedical [47] and legal [48] domains) and text sources (such as Twitter [49] and other social networks). In particular, Spanish pre-trained models include BETO [20], BERTin [50], RoBERTa-es [51] and RoBERTuito [52]. Nozza et al. [53] review BERT-based language models for different tasks and languages.⁵
⁵Note that the names BETO, BERTin, RoBERTa-es and RoBERTuito are not acronyms, but alterations of the original name BERT.
Few prior studies incorporate some kind of context into the user comments for hate speech or toxicity detection. Gao and Huang [54] analyze the impact of adding context to the task of hate speech detection for a dataset of comments from the Fox News site. As mentioned by Pavlopoulos et al. [55], this study has room for improvement: the dataset is rather small, with around 1.6k comments extracted from only 10 news
articles; its annotation process was mainly performed by a single person; and some of its methodological choices are open to discussion, such as including the name of the user as a predictive feature. Mubarak et al. [56] built a dataset of comments taken from the Al Jazeera website⁶ and annotated them together with the title of the article, but without including the entire thread of replies.
⁶https://www.aljazeera.com/
Pavlopoulos et al. [55] analyze the impact of adding context to the toxicity detection task. They find that, while humans seem to leverage conversational context to detect toxicity, their trained classification models were not able to improve their performance significantly by adding context. Following up, Xenos et al. [57] label each message with its “context sensitivity”, measured as the difference in judgments between two groups of annotators: those who saw the context and those who did not. With this measure, they observe that classifiers improve their performance on comments that are more sensitive to context.
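For instance, such a sensitivity score could be computed per comment as the gap between the toxic-vote fractions of the two annotator groups; the following sketch is our own hedged reading of that idea, not the exact formulation of Xenos et al.

    def context_sensitivity(frac_toxic_with_ctx: float, frac_toxic_without_ctx: float) -> float:
        """Gap between the fraction of context-aware annotators and the fraction
        of context-unaware annotators who judged the comment toxic; a larger gap
        means the comment's perceived toxicity depends more on its context."""
        return abs(frac_toxic_with_ctx - frac_toxic_without_ctx)

    # e.g., 0.8 of context-aware vs. 0.3 of context-unaware annotators voted toxic:
    print(context_sensitivity(0.8, 0.3))  # 0.5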
Further, Sheth et al. [58] explore some opportunities for
incorporating richer information sources into the toxicity
detection task, such as the interaction history between users,
some kind of social context, and other external knowledge
bases. Wiegand et al. [59] pose some questions and challenges regarding the detection of implicit toxicity, that is, subtle forms of abusive language not expressed as slurs.
Summing up: BERT-based models are the state of the art for this type of classification task; there have been various attempts to include context in distinct ways and with disparate success; there have been relatively few studies on Spanish data; and hate speech detection has typically been addressed as a binary task, making no distinction among the attacked characteristics or calls to action. In the present work, we assess the usefulness of adding context, working with BERT-based models on Spanish data and addressing both binary and fine-grained classification tasks.
III. DEFINITION OF HATE SPEECH
We say that there is hate speech in a comment if it contains statements of an intense and irrational nature expressing disapproval of and hatred against an individual or a group of people because of their identification with a group protected by domestic or international laws [1]. Protected traits or characteristics include color, race, national or social origin, gender identity, language, and sexual orientation, among others.
Hate speech can manifest itself explicitly, as direct insults, slurs, celebrations of crimes, or incitements to take action against an individual or group, or through more veiled expressions such as ironic content. Following this definition, we consider that an insult or aggression is not enough to constitute hate speech; it must also make an explicit or implicit appeal to at least one protected characteristic.
Under international law, hate speech has an extra element that differentiates it from merely offensive behavior: the promotion of violent actions against its targets. However, the NLP
Short name    Hate speech against ...
WOMEN         women
LGBTI         gay, lesbian, bisexual, transgender, and intersex people
RACISM        people based on their race, skin color, language, or national identity
CLASS         people based on their socioeconomic status
POLITICS      people based on their political affiliation or ideology
APPEARANCE    people based on their physical appearance (e.g., overweight or elderly people)
CRIMINAL      criminals and persons in conflict with the law
DISABLED      people with disabilities or mental health conditions
Table 1: Protected characteristics considered in this work. Short names are used throughout the paper to refer to these broad groups.
community does not usually require this “call to action” when
identifying hate speech. In the present work, we will adopt
this latter view, and we will explicitly state when we also
refer to calls to action.
Several characteristics are taken into account in this work. In addition to misogyny and racism (the most common traits considered in previous work), we also consider: homophobia and transphobia; social class hatred (sometimes known as aporophobia); hatred due to physical appearance (e.g., being overweight); hatred towards people with disabilities; political hate speech; and hate speech against criminals, prisoners, offenders, and other people in conflict with the law. For this selection, we take into account the definition of discrimination from international human rights treaties, which refers to discrimination motivated by race, color, sex, language, religion, political or other opinions, national or social origin, property, birth, or other status [60]. These eight characteristics are listed in Table 1 along with the reference names that will be used throughout the paper.
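This label scheme lends itself to a multi-label encoding. The following sketch (our own illustration, independent of the released corpus format) maps the short names of Table 1 to a multi-hot vector, a natural target representation for the fine-grained classification task:

    # The eight protected characteristics of Table 1, in a fixed order.
    CHARACTERISTICS = [
        "WOMEN", "LGBTI", "RACISM", "CLASS",
        "POLITICS", "APPEARANCE", "CRIMINAL", "DISABLED",
    ]

    def to_multi_hot(attacked):
        """Encode a set of attacked characteristics as a multi-hot vector."""
        return [int(c in attacked) for c in CHARACTERISTICS]

    # A comment attacking people based on national identity would be encoded as:
    print(to_multi_hot({"RACISM"}))  # [0, 0, 1, 0, 0, 0, 0, 0]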
IV. CORPUS
This section describes the collection, curation, and annotation process of the corpus. Our aim was to construct a dataset of user messages commenting on specific news articles, in a similar fashion to the reader forums present in many news outlet websites. Figure 2 offers a schematic illustration of our dataset, with a tweet from a news outlet about China banning the breeding of dogs for human consumption, its respective news article, and replies from users to the original tweet.
A. DATA COLLECTION
Our data collection process was targeted at the official Twitter accounts of a selected set of Argentinian news outlets: La Nación (@lanacion), Clarín (@clarincom), Infobae (@infobae), Perfil (@perfilcom), and Crónica (@cronica). These are the main national newspapers in the country and attract a vast volume of interaction on Twitter.
We considered a fixed time period of one year, starting in March 2020. We collected the replies to each post of the mentioned accounts using the Spritzer Twitter API, listening for any tweet mentioning one of their usernames.
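As a rough sketch of this collection step, a stream listening for mentions of the outlets could look as follows. This is a hedged illustration, not our exact collection code: it uses tweepy's v1.1 streaming interface (access to which Twitter has since restricted) with placeholder credentials and a placeholder output file.

    import json
    import tweepy

    OUTLETS = ["@lanacion", "@clarincom", "@infobae", "@perfilcom", "@cronica"]
    API_KEY, API_SECRET = "...", "..."            # placeholder credentials
    ACCESS_TOKEN, ACCESS_SECRET = "...", "..."

    class ReplyCollector(tweepy.Stream):
        def on_status(self, status):
            # Keep only direct replies to the outlets' own posts.
            if status.in_reply_to_screen_name in {u.lstrip("@") for u in OUTLETS}:
                with open("replies.jsonl", "a") as f:
                    f.write(json.dumps(status._json) + "\n")

    stream = ReplyCollector(API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_SECRET)
    stream.filter(track=OUTLETS)  # any tweet mentioning one of the usernames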
For the purpose of this work, we were only interested in the