How Hate Speech Varies by Target Identity: A Computational Analysis
Michael Miller Yoder¹, Lynnette Hui Xian Ng¹, David West Brown², Kathleen M. Carley¹
¹School of Computer Science  ²Department of English
Carnegie Mellon University
Pittsburgh, Pennsylvania, USA
{mamille3,huixiann,dwb2,carley}@andrew.cmu.edu
Abstract
This paper investigates how hate speech varies in systematic ways according to the identities it targets. Across multiple hate speech datasets annotated for targeted identities, we find that classifiers trained on hate speech targeting specific identity groups struggle to generalize to other targeted identities. This provides empirical evidence for differences in hate speech by target identity; we then investigate which patterns structure this variation. We find that the targeted demographic category (e.g. gender/sexuality or race/ethnicity) appears to have a greater effect on the language of hate speech than does the relative social power of the targeted identity group. We also find that words associated with hate speech targeting specific identities often relate to stereotypes, histories of oppression, current social movements, and other social contexts specific to identities. These experiments suggest the importance of considering targeted identity, as well as the social contexts associated with these identities, in automated hate speech classification.

Warning: This paper contains offensive and hateful terms and concepts. We have chosen to reproduce these terms for clarity in aiding efforts against hate speech.
1 Introduction
Researchers working in natural language processing (NLP) often treat hate speech as a binary, unified concept that can be detected from language alone. However, as a linguistic concept that relies heavily on social context, hate speech contains a variety of related phenomena (Brown, 2017). Hate speech is characterized by variation in linguistic features (e.g. implicit vs. explicit), context (e.g. platforms, prior conversations), and communities (social histories and hierarchies). This paper focuses on a crucial aspect of this variation: how hate speech varies by the identity groups it targets.

To study this variation, we analyze hate speech datasets that include annotations for which identity group is targeted. Drawing from several of these datasets, we sample new corpora that target the same identity group. These identity groups vary along several dimensions, including the relevant demographic category (e.g. gender, religion) and relative social power (e.g. socially marginalized or dominant). We empirically test which dimensions most clearly separate different forms of hate speech by evaluating how well classifiers trained on one set of identities generalize to hate speech directed at different sets of identities.
We find that hate speech varies most prominently by the targeted demographic category and less so by the social power of the targeted identity group. Theorists working in philosophy and sociolinguistics have drawn attention to how hate speech directed at marginalized groups differs from hate directed toward socially dominant groups (Butler, 1997; Lakoff, 2000). However, we do not find that hate speech toward dominant groups is sufficiently different to consistently increase classification performance when removed from existing datasets.

Analyzing the most representative terms in hate speech directed toward different identities, we find that many words reflect identity-specific context such as histories of oppression or stereotypes. These results have implications for NLP researchers building generalizable hate speech classifiers, as well as for a more general understanding of variation in hate speech.
Contributions
1. An empirical analysis of variation in hate speech by target identity: specifically, how well classifiers trained on hate speech directed toward specific identities generalize to hate speech directed at other identities.

2. An analysis of which dimensions of social difference (demographic category, power) among targeted identities reflect the most variation in hate speech.
3. A qualitative analysis of the hate speech terms most strongly associated with specific target identities.
2 Hate Speech
Hate speech is an example of a “thick concept” with a set of related but difficult-to-define meanings and understandings (Pohjonen and Udupa, 2017). Legal theorist Alexander Brown (2017) argues for a set of attributes that make an expression more or less likely to be considered hate speech, similar to Wittgenstein’s concept of “family resemblances”. Key attributes include an incitement of emotion and violence, and a direction of that incitement toward a targeted identity group (Sanguinetti et al., 2018; Poletto et al., 2021). Though others have studied the linguistic properties of this incitement (Marsters, 2019; Wiegand et al., 2021), we focus on how variation in the identity group targeted affects the linguistic characteristics of hate speech.
2.1 Variation by identity
Identities are central to hate speech. Classifiers often learn to associate the presence of identity terms, especially derogatory ones, with hate speech and abusive language (Dixon et al., 2017; Uyheng and Carley, 2021). Computational studies of the targets of online hate speech include measurements of its prevalence toward different identity groups. Silva et al. (2016) and Mondal et al. (2017) searched for templates such as “I hate ___” to measure hate toward different identity groups. We instead analyze datasets manually annotated with the targets of hate speech, which captures a broader range of hate speech, including indirect hate speech and stereotypes. ElSherief et al. (2018a,b) investigated differences between hate toward groups versus individual targets; in contrast, we compare differences among identity targets. Rieger et al. (2021) measured multiple types of variation, including by identity target, in hate speech from fringe platforms such as 4chan and 8chan. We test whether such differences affect the generalization of hate speech classifiers.
Many identities are involved in the production and recognition of hate speech, including the identities of those who produce hate speech and those who annotate hate speech datasets. The post history and inferred gender of social media users have been found to be useful in predicting hate speech (Waseem and Hovy, 2016; Unsvåg and Gambäck, 2018; Qian et al., 2018). Waseem (2016) finds differences in hate speech annotations between crowdworkers and experts, while Sap et al. (2022) find differences by the political ideology of annotators. We focus on identities presented in the hate speech itself.
2.2 Generalizability
In this paper, we evaluate the ability of hate speech classifiers to generalize across targeted identities. Gröndahl et al. (2018) find that hate speech models generally perform poorly on data that differs from their training data; we look at how shifts in the distribution of identity targets affect generalization. Swamy et al. (2019) examine generalizability across subtasks of abusive language detection and find that a larger proportion of hateful instances aids generalization. Pamungkas et al. (2020) and Fortuna et al. (2020) find that hate speech models using variants of BERT (Devlin et al., 2019) generalize better than other models. We thus use a variant of BERT in our generalization experiments. See Yin and Zubiaga (2021) for a more thorough survey of generalizability in hate speech detection.
3 Data
From surveys of hate speech datasets (Vidgen and Derczynski, 2020; Poletto et al., 2021) and the Hate Speech Dataset Catalogue (https://hatespeechdata.com/), we selected datasets with annotations for targeted identities. We only selected datasets that do not restrict target identities, in order to minimize differences in other properties (e.g., domain, year) when comparing across targeted identities. This excludes hate speech datasets and shared tasks that focus on particular targeted identity groups, such as women or immigrants (Kwok and Wang, 2013; Basile et al., 2019).
We also did not consider hate speech datasets that label the targeted demographic category, such as race or gender (Waseem, 2016), but do not specify the identity group targeted. Demographic category is just one of the dimensions of similarity and difference among identity groups that we wish to compare for their effect on hate speech. We included datasets from all domains, except those with synthetic data.
Since we only found one non-English dataset that contained unrestricted annotations for targeted identities (Ousidhoum et al., 2019), we focus on hate speech in English in this work.
For generalization analyses, we sampled corpora specific to identity groups across datasets large enough to contain a minimum number of instances of hate speech against enough groups (described in Section 4.1). These are the first four datasets noted in Table 1. All datasets are used in the analysis of removing dominant groups (Section 6.2).

Datasets are resampled to a 30/70 ratio of hate to non-hate to eliminate a source of variance among hate speech datasets known to affect generalization (Swamy et al., 2019). Non-hate instances are upsampled or downsampled to meet this ratio, which was chosen as typical of hate speech datasets (Vidgen and Derczynski, 2020). If they do not already contain a binary hate speech label, dataset labels are binarized as described in Appendix A.
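To make the resampling step concrete, the following is a minimal sketch, assuming each dataset is loaded as a pandas DataFrame with a binary `hate` column (the column and function names are ours, not from the original datasets):

```python
import pandas as pd

def resample_to_ratio(df: pd.DataFrame, hate_frac: float = 0.3,
                      seed: int = 0) -> pd.DataFrame:
    """Resample non-hate instances so hate makes up `hate_frac` of the data.

    Keeps all hate instances and up- or downsamples the non-hate
    instances to hit the target ratio (30/70 in the paper).
    """
    hate = df[df["hate"] == 1]
    non_hate = df[df["hate"] == 0]
    # Number of non-hate instances needed for a hate_frac : (1 - hate_frac) mix
    n_non_hate = round(len(hate) * (1 - hate_frac) / hate_frac)
    # replace=True permits upsampling when there are too few non-hate instances
    non_hate = non_hate.sample(n=n_non_hate,
                               replace=len(non_hate) < n_non_hate,
                               random_state=seed)
    return pd.concat([hate, non_hate]).sample(frac=1, random_state=seed)
```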
3.1 Target identity label normalization
Annotations for targeted identities vary considerably across datasets. Some of these differences are variations in naming conventions for identity groups with significant similarity (‘Caucasian’ and ‘white people’, for example). Other identities are subsets of broader identities, such as ‘trans men’ as a specific group within ‘LGBTQ+ people’.

To construct identity-based corpora across datasets, we normalized and grouped the identities annotated in each dataset. One of the authors, who has taken graduate-level courses on language and identity, manually normalized the most common identity labels in each dataset and assigned these normalized identity labels to broader identity groups (such as ‘LGBTQ+ people’). Intersectional identities, such as ‘Chinese women’, were assigned to multiple groups (in this case ‘Asian people’ and ‘women’). Hate speech was often directed at conflated, problematic groupings such as ‘Muslims and Arabs’. Though we do not condone these groupings, we use them as the most accurate descriptors of the identities targeted.
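The normalization itself was manual, but its output can be thought of as a two-stage lookup: naming variants collapse onto a normalized label, and each normalized label maps to one or more broader groups. The sketch below illustrates that structure with a handful of invented entries; the paper's actual tables were built by hand and are larger:

```python
# All entries here are illustrative examples, not the paper's full tables.
# Stage 1: collapse naming variants onto one normalized label.
NORMALIZE = {
    "caucasian": "white people",
    "whites": "white people",
}

# Stage 2: assign normalized labels to broader identity groups;
# intersectional labels map to every group they belong to.
GROUPS = {
    "white people": ["white people"],
    "trans men": ["LGBTQ+ people"],
    "chinese women": ["Asian people", "women"],
}

def identity_groups(raw_label: str) -> list[str]:
    """Map a raw dataset annotation to one or more normalized identity groups."""
    label = NORMALIZE.get(raw_label.lower(), raw_label.lower())
    return GROUPS.get(label, [label])

print(identity_groups("Caucasian"))      # ['white people']
print(identity_groups("Chinese women"))  # ['Asian people', 'women']
```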
4 Cross-Identity Generalization
We examine variation among hate speech targeting different identities in a bottom-up, empirical fashion. To do this, we construct corpora of hate speech directed at the most commonly annotated target identities, grouped and normalized as described in Section 3.1. We then train hate speech classifiers on each target identity corpus and evaluate them on corpora targeting other identities. Along with practical implications for the generalization of hate speech classifiers, this analysis suggests which similarities and differences among identities are most relevant for differentiating hate speech.
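Concretely, this design amounts to a grid of train/evaluate runs over the identity-specific corpora. A minimal sketch follows, with a TF-IDF plus logistic regression pipeline standing in for the DistilBERT model actually used (Section 4.2); the corpus format and function names are our own assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

def cross_identity_grid(corpora):
    """corpora: {identity: {"train": (texts, labels), "test": (texts, labels)}}.

    Train one classifier per target identity, then score it on every
    identity's test set to obtain the full generalization matrix.
    """
    scores = {}
    for train_id, c in corpora.items():
        # Stand-in classifier; the paper fine-tunes DistilBERT instead.
        model = make_pipeline(TfidfVectorizer(),
                              LogisticRegression(max_iter=1000))
        model.fit(*c["train"])
        for eval_id, c_eval in corpora.items():
            texts, labels = c_eval["test"]
            scores[(train_id, eval_id)] = f1_score(labels, model.predict(texts))
    return scores
```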
4.1 Data sampling
In order to have enough data targeting many identities, and to generalize beyond the particularities of specific datasets, we assembled identity-specific corpora from multiple source datasets. To mitigate dataset-specific effects, we uniformly sampled hate speech instances directed toward target identities from the first four datasets listed in Table 1. We selected these datasets since they contain enough data to train classifiers targeting a sufficient variety of identities. The corpus for each target identity contains an equal amount of hate speech drawn from each of these datasets, though the total number of instances may differ among corpora. Negative instances were also uniformly sampled across datasets, and were restricted to those with no target identity annotation or with an annotation that matched the target identity of the hate speech.
We selected target identities that contained a minimum of 900 instances labeled as hate across these four datasets after grouping and normalization. We selected this threshold as a balance between including a sufficient number of identities and having enough examples of hate speech toward each identity to train classifiers. In order to include a variety of identities in the analysis while maintaining uniform samples for each dataset, we upsample identity-specific hate speech from individual datasets up to 2 times if needed. Corpora are split into a 60/40 train/test split. Selected target identities and the size of each corpus can be found in Table 2. These identity-specific corpora, which are samples of existing publicly available datasets, are available at https://osf.io/53tfs/.
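The corpus assembly described above can be sketched as follows. This is one plausible reading of the sampling rules (equal per-dataset contributions, at most 2x upsampling, a 900-instance floor, 60/40 split); the DataFrame and column names are our assumptions, and negative instances, sampled analogously, are omitted for brevity:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

MIN_HATE = 900   # minimum hate instances per identity (Section 4.1)
MAX_UP = 2       # upsample a dataset's contribution at most 2x

def build_identity_corpus(datasets, identity, seed=0):
    """Draw an equal amount of hate speech targeting `identity` from each
    source dataset, upsampling small datasets by at most MAX_UP."""
    per_ds = [df[(df["hate"] == 1) & (df["target_group"] == identity)]
              for df in datasets]
    if sum(len(p) for p in per_ds) < MIN_HATE:
        raise ValueError(f"too few hate instances target '{identity}'")
    # Equal contribution per dataset, bounded by the 2x upsampling cap.
    n_per = min(MAX_UP * len(p) for p in per_ds)
    samples = [p.sample(n=n_per, replace=len(p) < n_per, random_state=seed)
               for p in per_ds]
    corpus = pd.concat(samples)
    return train_test_split(corpus, test_size=0.4, random_state=seed)  # 60/40
```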
4.2 Cross-identity hate speech classification
Due to the high performance of BERT-based models on hate speech classification (Mozafari et al., 2019; Samghabadi et al., 2020), we trained and evaluated a DistilBERT model (Sanh et al., 2019), which has been shown to perform very similarly to BERT on hate speech detection with fewer parameters (Vidgen et al., 2021). Models were trained with early stopping after no improvement for 5 epochs on a development set of 10% of the training set. An Adam optimizer was used with an initial learning rate of 10⁻⁶. Input data was lowercased.
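A minimal sketch of this training setup with the Hugging Face transformers and datasets libraries follows. The checkpoint name, toy data, and the use of Trainer's AdamW default in place of plain Adam are our assumptions (and some argument names have been renamed across transformers releases); only the learning rate, early-stopping patience, and 10% development split come from the setup above:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

# Uncased checkpoint, consistent with lowercasing the input data.
MODEL = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Toy placeholder data; in practice this is an identity-specific corpus.
texts = ["example hateful post", "example benign post"] * 50
labels = [1, 0] * 50

ds = Dataset.from_dict({"text": [t.lower() for t in texts], "label": labels})
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True), batched=True)
ds = ds.train_test_split(test_size=0.1, seed=0)  # 10% dev set for early stopping

args = TrainingArguments(
    output_dir="ckpts",
    learning_rate=1e-6,            # initial learning rate of 10^-6
    num_train_epochs=100,          # upper bound; early stopping cuts this short
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,   # required by EarlyStoppingCallback
    metric_for_best_model="loss",  # stop on dev-set loss
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    tokenizer=tokenizer,           # enables dynamic padding via default collator
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
)
trainer.train()
```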