Spread Love Not Hate: Undermining the Importance
of Hateful Pre-training for Hate Speech Detection
Omkar Gokhale1, Aditya Kane1, Shantanu Patankar1, Tanmay Chavan1, Raviraj Joshi2
Pune Institute of Computer Technology, L3Cube1
Indian Institute of Technology Madras, L3Cube2
{omkargokhale2001,adityakane1,shantanupatankar2001}@gmail.com,
{chavantanmay1402,ravirajoshi}@gmail.com
Abstract
Pre-training large neural language models, such as BERT, has led to impressive
gains on many natural language processing (NLP) tasks. Although this method
has proven to be effective for many domains, it might not always provide desirable
benefits. In this paper, we study the effects of hateful pre-training on low-resource
hate speech classification tasks. While previous studies on the English language
have emphasized its importance, we aim to augment their observations with some
non-obvious insights. We evaluate different variations of tweet-based BERT models
pre-trained on hateful, non-hateful, and mixed subsets of a 40M tweet dataset. This
evaluation is carried out for the Indian languages Hindi and Marathi. This paper
provides empirical evidence that hateful pre-training is not the best pre-training option for
hate speech detection. We show that pre-training on non-hateful text from the target
domain provides similar or better results. Further, we introduce HindTweetBERT
and MahaTweetBERT, the first publicly available BERT models pre-trained on
Hindi and Marathi tweets, respectively. We show that they provide state-of-the-art
performance on hate speech classification tasks. We also release hateful BERT models for
the two languages and gold hate speech evaluation benchmarks, HateEval-Hi and
HateEval-Mr, consisting of 2000 manually labeled tweets each. The models and
data are available at https://github.com/l3cube-pune/MarathiNLP.
1 Introduction
Detecting hate speech in social media is a crucial task (Schmidt and Wiegand, 2017; Velankar et al.,
2022). The effect of hateful social media content on the mental health of society is still under
research, but it is undeniably negative (Kelly et al., 2018; De Choudhury and De, 2014). Twitter is
a powerful social media platform and has been quite popular in India for the past few years. It is used by
many politicians, activists, journalists, and businessmen as an official medium of communication with
the public. Though Article 19 of the Indian Constitution guarantees freedom of speech and expression,
identifying and curbing hateful tweets is essential to maintain harmony.
Hindi and Marathi are Indo-Aryan languages predominantly spoken in India. Both languages are
derived from Sanskrit and have 40+ dialects. Marathi is spoken by 83 million people, making it the
third most spoken language in India and the tenth most spoken in the world. Hindi, spoken by 528 million
people, is the most spoken language in India and the third most spoken in the world.
Hate speech identification has followed a common trend in NLP: manual feature-based
classifiers were followed by CNNs and LSTMs, which were then superseded by modern pre-trained
transformers (Mullah and Zainon, 2021; Badjatiya et al., 2017; Velankar et al., 2023). The
transformer-based masked language models (MLM) pre-trained on a variety of text data are suitable
for general-purpose use cases.
Introducing a domain-specific bias into the pre-training corpus has previously yielded state-of-the-art results
(Gururangan et al., 2020). Thus, in this paper, we study the impact of hateful pre-training
on hate speech classification. Previous work has shown a positive impact of using hateful
BERT models for downstream hate speech identification tasks (Caselli et al., 2020; Sarkar et al., 2021).
However, it remains to be verified whether the improvements were indeed due to the hateful nature of the
pre-training corpus or were simply a side effect of adaptation to target-domain text. Past work on
high-resource languages is thus incomplete and does not provide sufficient evidence to analyze the
impact of hateful pre-training. To complete the analysis, we pre-train our models on both hateful
and non-hateful data from the target domain. Moreover, there is no previous work on hateful
pre-training in low-resource Indic languages, and our work also tries to fill this gap.
While evaluating the impact of pre-training, we build some useful resources for Hindi and Marathi.
We introduce two new models, MahaTweetBERT and HindTweetBERT, pre-trained on 40 million
Marathi and Hindi tweets, respectively. We use these models along with MuRIL (Khanuja et al., 2021),
the state-of-the-art Indic multilingual BERT, to generate baseline results. To extract the most hateful
and least hateful tweets from these 40 million tweet corpora, we classify the tweets using previous state-of-
the-art models and choose the tweets with the highest confidence (most hateful) and the lowest confidence
(least hateful). We verify that the selected data is indeed hateful by randomly choosing 2000 samples
for each language and labeling them manually. To see whether hateful pre-training has
an impact, we compare the performance of models pre-trained on the most hateful, least hateful,
and random corpora against our baseline on downstream hate speech identification tasks. We show
that hateful pre-training is helpful when considered in isolation; however, non-hateful or random
pre-training is equally good. The improvement in performance with hateful pre-training could be
a side effect of target-domain adaptation and is not dependent on the hatefulness of the pre-training
corpus. The hateful models are termed MahaTweetBERT-Hateful and HindTweetBERT-Hateful.
The 40M tweet corpora are termed L3Cube-MahaTweetCorpus and HindTweetCorpus for Marathi
and Hindi, respectively. The datasets and models released as a part of this work are also documented
on GitHub (https://github.com/l3cube-pune/MarathiNLP).
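To make the confidence-based selection step concrete, the following is a minimal sketch that scores tweets with a previously fine-tuned binary hate speech classifier and keeps the highest- and lowest-confidence extremes. The checkpoint path, batch size, and the assumption that label index 1 is the hateful class are illustrative placeholders, not the exact configuration used in this paper.

```python
# Sketch: rank tweets by hate-classifier confidence and keep the extremes
# as the "most hateful" / "least hateful" pre-training subsets.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINT = "path/to/previous-sota-hate-classifier"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT).eval()

def hate_scores(tweets, batch_size=64):
    """Return P(hateful) for every tweet under the classifier."""
    scores = []
    for i in range(0, len(tweets), batch_size):
        batch = tokenizer(
            tweets[i:i + batch_size],
            padding=True, truncation=True, max_length=128, return_tensors="pt",
        )
        with torch.no_grad():
            logits = model(**batch).logits
        # Assumption: label index 1 corresponds to the "hateful" class.
        scores.extend(torch.softmax(logits, dim=-1)[:, 1].tolist())
    return scores

def select_extremes(tweets, k):
    """Return the k most hateful and k least hateful tweets by confidence."""
    ranked = sorted(zip(hate_scores(tweets), tweets), key=lambda pair: pair[0])
    least_hateful = [t for _, t in ranked[:k]]
    most_hateful = [t for _, t in ranked[-k:]]
    return most_hateful, least_hateful
```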
The main contributions of this work are as follows.
• We show that hateful BERT is not always desirable for hate speech detection tasks, and that a
BERT model pre-trained on non-hateful in-domain data yields similar or better performance.
• We release pre-trained Twitter BERT models MahaTweetBERT and HindTweetBERT for
Marathi and Hindi. We also release MahaTweetBERT-Hateful and HindTweetBERT-Hateful,
the hateful versions of the corresponding models. These models are obtained by further
pre-training the current state-of-the-art MahaBERT and HindBERT models on the corresponding
language tweet data (40M sentences); a sketch of this continued pre-training step follows this list.
• We release gold-standard benchmark hate speech detection datasets HateEval-Mr and
HateEval-Hi with 2000 manually labeled tweets each.
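As an illustration of how such tweet-level models can be obtained, the sketch below shows continued masked-language-model pre-training of a base BERT checkpoint on a tweet corpus with the HuggingFace Trainer. The base checkpoint name, corpus file path, and hyperparameters are assumptions for illustration, not the exact setup behind MahaTweetBERT or HindTweetBERT.

```python
# Sketch: continued MLM pre-training of a base BERT model on a tweet corpus.
# Checkpoint name, file path, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM, AutoTokenizer,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

BASE = "l3cube-pune/marathi-bert"  # assumed MahaBERT-style base checkpoint
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForMaskedLM.from_pretrained(BASE)

# One tweet per line in a plain-text file (placeholder path).
corpus = load_dataset("text", data_files={"train": "marathi_tweets.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Standard 15% token masking for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(
    output_dir="tweet-bert-continued",
    num_train_epochs=1,                 # illustrative schedule
    per_device_train_batch_size=64,
    learning_rate=2e-5,
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```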
2 Related Work
Pre-trained models have obtained remarkable results in many areas of NLP. Although these pre-
trained models are well suited to generalized tasks, they have limitations on domain-specific
tasks. To address this, numerous domain-specific models have been developed based on the BERT
architecture (Devlin et al., 2018). Domain-specific NLP models are pre-trained on in-domain data
that is unique to a specific category of text. For example, BioBERT (Lee et al., 2020) is a model
trained on large-scale biomedical corpora, and it outperforms the previous state-of-the-art models
on biomedical text mining tasks.