For generalization analyses, we sampled identity-specific corpora from datasets large
enough to contain a minimum number of hate speech instances directed at a sufficient
variety of identity groups (described in Section 4.1). These are the first 4 datasets noted
in Table 1. All datasets are used in the analysis of
removing dominant groups (Section 6.2).
Datasets are resampled to a 30/70 ratio of hate to non-hate to eliminate a source of
variance among hate speech datasets known to affect generalization (Swamy et al., 2019).
Non-hate instances are upsampled or downsampled to meet this ratio, which was chosen as
typical of hate speech datasets (Vidgen and Derczynski, 2020). If a dataset does not
already contain a binary hate speech label, its labels are binarized as described in
Appendix A.
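A minimal sketch of this resampling step, assuming each dataset is loaded as a pandas DataFrame with a binary hate column (the column name and function below are illustrative assumptions, not the code used in this work):

```python
import pandas as pd

def resample_to_ratio(df: pd.DataFrame, hate_frac: float = 0.3, seed: int = 0) -> pd.DataFrame:
    """Up- or downsample non-hate instances so hate makes up `hate_frac` of the corpus."""
    hate = df[df["hate"] == 1]
    non_hate = df[df["hate"] == 0]
    # Number of non-hate instances needed for a hate_frac / (1 - hate_frac) ratio.
    n_non_hate = int(len(hate) * (1 - hate_frac) / hate_frac)
    # Sample with replacement (upsampling) only when there are too few non-hate instances.
    sampled = non_hate.sample(
        n=n_non_hate, replace=len(non_hate) < n_non_hate, random_state=seed
    )
    return pd.concat([hate, sampled]).sample(frac=1, random_state=seed)
```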
3.1 Target identity label normalization
Annotations for targeted identities vary consider-
ably across datasets. Some of these differences
are variations in naming conventions for identity
groups with significant similarity (‘Caucasian’ and
‘white people’, for example). Other identities are
subsets of broader identities, such as ‘trans men’ as
a specific group within ‘LGBTQ+ people’.
To construct identity-based corpora across
datasets, we normalized and grouped identities an-
notated in each dataset. One of the authors, who has
taken graduate-level courses on language and iden-
tity, manually normalized the most common iden-
tity labels in each dataset and assigned these nor-
malized identity labels into broader identity groups
(such as ‘LGBTQ+ people’). Intersectional iden-
tities, such as ‘Chinese women’, were assigned to
multiple groups (in this case ‘Asian people’ and
‘women’). Hate speech was often directed at con-
flated, problematic groupings such as ‘Muslims and
Arabs’. Though we do not condone these group-
ings, we use them as the most accurate descriptors
of identities targeted.
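One way such a normalization mapping could be represented is sketched below; the label strings are illustrative examples, not the full mapping used in this work:

```python
# Normalized identity labels mapped to broader identity groups; entries are illustrative.
# Intersectional labels map to multiple groups; conflated annotations are kept as-is.
IDENTITY_GROUPS = {
    "caucasian": ["white people"],
    "white people": ["white people"],
    "trans men": ["LGBTQ+ people"],
    "chinese women": ["Asian people", "women"],
    "muslims and arabs": ["Muslims and Arabs"],
}

def normalize(raw_label: str) -> list[str]:
    """Map a raw dataset annotation to zero or more broader identity groups."""
    return IDENTITY_GROUPS.get(raw_label.strip().lower(), [])
```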
4 Cross-Identity Generalization
We examine variation among hate speech target-
ing different identities in a bottom-up, empirical
fashion. In order to do this, we construct corpora
of hate speech directed at the most commonly an-
notated target identities, grouped and normalized
as described in Section 3.1. We then train hate speech classifiers on each target identity
corpus and evaluate them on corpora targeting other identities.
Along with practical implications for hate speech
classification generalization, this analysis suggests
which similarities and differences among identities
are most relevant for differentiating hate speech.
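Conceptually, this reduces to a train/evaluate loop over identity corpora. The sketch below assumes hypothetical train_fn and eval_fn callables standing in for the classifier pipeline described in Section 4.2:

```python
from typing import Callable, Dict, Tuple

def cross_identity_scores(
    corpora: Dict[str, dict],   # identity -> {"train": ..., "test": ...}
    train_fn: Callable,         # fits a classifier on a train split
    eval_fn: Callable,          # returns a score (e.g., F1) on a test split
) -> Dict[Tuple[str, str], float]:
    """Train on each identity corpus and evaluate on every identity's test split."""
    scores = {}
    for train_id, corpus in corpora.items():
        model = train_fn(corpus["train"])
        for eval_id, other in corpora.items():
            scores[(train_id, eval_id)] = eval_fn(model, other["test"])
    return scores
```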
4.1 Data sampling
In order to have enough data targeting many iden-
tities and to generalize beyond the particularities
of specific datasets, we assembled identity-specific
corpora from multiple source datasets. To mitigate
dataset-specific effects, we uniformly sampled hate
speech instances directed toward target identities
from the first 4 datasets listed in Table 1. We selected these datasets because they
contain enough data to train classifiers for a sufficient variety of target identities.
The corpus for each target identity contains
an equal amount of hate speech drawn from each of
these datasets, though the total number of instances
may differ among corpora. Negative instances were
also uniformly sampled across datasets, and were
restricted to those which had no target identity an-
notation or an annotation that matched the target
identity of the hate speech.
We selected target identities that contained a
minimum of 900 instances labeled as hate across
these four datasets after grouping and normaliza-
tion. We selected this threshold as a balance be-
tween including a sufficient number of identities
and having enough examples of hate speech toward
each identity to train classifiers. In order to in-
clude a variety of identities in the analysis while
maintaining uniform samples for each dataset, we
upsampled identity-specific hate speech from individual datasets up to 2 times where
needed. Corpora were divided into 60/40 train/test splits. Selected target
identities and the size of each corpus can be found
in Table 2. These identity-specific corpora, which
are samples of existing publicly available datasets,
are available at https://osf.io/53tfs/.
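A rough sketch of this corpus assembly under illustrative assumptions: the column names, the per-dataset sample size rule, and the omission of negative-instance sampling are simplifications, not the exact procedure used in this work:

```python
import pandas as pd

MIN_HATE_INSTANCES = 900   # minimum hate instances across the four source datasets
MAX_UPSAMPLE = 2           # per-dataset upsampling cap

def build_identity_corpus(datasets: dict[str, pd.DataFrame], identity: str, seed: int = 0):
    """Assemble an identity-specific corpus with equal hate counts from each dataset.

    Assumes (illustratively) that each DataFrame has binary `hate` and normalized
    `target_group` columns; negative-instance sampling is omitted for brevity.
    """
    per_dataset = {
        name: df[(df["hate"] == 1) & (df["target_group"] == identity)]
        for name, df in datasets.items()
    }
    if sum(len(df) for df in per_dataset.values()) < MIN_HATE_INSTANCES:
        return None  # identity excluded from the analysis

    # Draw the same number of hate instances from each dataset, upsampling with
    # replacement by at most MAX_UPSAMPLE where a dataset has too few instances.
    n_per_dataset = min(MAX_UPSAMPLE * len(df) for df in per_dataset.values())
    corpus = pd.concat(
        [
            df.sample(n=n_per_dataset, replace=len(df) < n_per_dataset, random_state=seed)
            for df in per_dataset.values()
        ]
    ).reset_index(drop=True)

    train = corpus.sample(frac=0.6, random_state=seed)   # 60/40 train/test split
    test = corpus.drop(train.index)
    return train, test
```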
4.2 Cross-identity hate speech classification
Due to the high performance of BERT-based mod-
els on hate speech classification (Mozafari et al.,
2019; Samghabadi et al., 2020), we trained and evaluated a DistilBERT model (Sanh et al.,
2019), which has been shown to perform very similarly to BERT on hate speech detection
with fewer parameters (Vidgen et al., 2021). Models were trained
with early stopping after no improvement for 5
epochs on a development set of 10% of the training
set. An Adam optimizer was used with an initial
learning rate of $10^{-6}$. Input data was lowercased