Spread Love Not Hate: Undermining the Importance
of Hateful Pre-training for Hate Speech Detection
Omkar Gokhale1, Aditya Kane1, Shantanu Patankar1, Tanmay Chavan1, Raviraj Joshi2
Pune Institute of Computer Technology, L3Cube1
Indian Institute of Technology Madras, L3Cube2
{omkargokhale2001,adityakane1,shantanupatankar2001}@gmail.com,
{chavantanmay1402,ravirajoshi}@gmail.com
Abstract
Pre-training large neural language models, such as BERT, has led to impressive
gains on many natural language processing (NLP) tasks. Although this method
has proven to be effective for many domains, it might not always provide desirable
benefits. In this paper, we study the effects of hateful pre-training on low-resource
hate speech classification tasks. While previous studies on the English language
have emphasized its importance, we aim to augment their observations with some
non-obvious insights. We evaluate different variations of tweet-based BERT models
pre-trained on hateful, non-hateful, and mixed subsets of a 40M tweet dataset. This
evaluation is carried out for the Indian languages Hindi and Marathi. This paper
provides empirical evidence that hateful pre-training is not the best pre-training option for
hate speech detection. We show that pre-training on non-hateful text from the target
domain provides similar or better results. Further, we introduce HindTweetBERT
and MahaTweetBERT, the first publicly available BERT models pre-trained on
Hindi and Marathi tweets, respectively. We show that they provide state-of-the-art
performance on hate speech classification tasks. We also release hateful BERT models for
the two languages and gold hate speech evaluation benchmarks, HateEval-Hi and
HateEval-Mr, consisting of 2000 manually labeled tweets each. The models and
data are available at https://github.com/l3cube-pune/MarathiNLP.
1 Introduction
Detecting hate speech in social media is a crucial task (Schmidt and Wiegand, 2017; Velankar et al.,
2022). The effect of hateful social media content on the mental health of society is still under
research, but it is undeniably negative (Kelly et al., 2018; De Choudhury and De, 2014). Twitter is
a powerful social media platform and has been quite popular in India for the past few years. It is used by
many politicians, activists, journalists, and businessmen as an official medium of communication with
the public. Though Article 19 of the Indian Constitution guarantees freedom of speech and expression,
identifying and curbing hateful tweets is essential to maintain harmony.
Hindi and Marathi are Indo-Aryan languages predominantly spoken in India. Both languages are
derived from Sanskrit and have 40+ dialects. Marathi is spoken by 83 million people, making it the
third most spoken language in India and the tenth most spoken in the world. Hindi, spoken by 528 million
people, is the most spoken language in India and the third most spoken in the world.
Hate speech identification has followed a common trend in NLP: manual feature-based
classifiers were followed by CNNs and LSTMs, which were then superseded by modern pre-trained
transformers (Mullah and Zainon, 2021; Badjatiya et al., 2017; Velankar et al., 2023). The
transformer-based masked language models (MLM) pre-trained on a variety of text data are suitable
for general-purpose use cases.
Introducing a domain-specific bias into the pre-training corpus has previously yielded state-of-the-art results
(Gururangan et al., 2020). Thus, in this paper, we study the impact of hateful pre-training
on hate speech classification. Previous work has shown a positive impact of using hateful
BERT models for downstream hate speech identification tasks (Caselli et al., 2020; Sarkar et al., 2021).
However, it remains to be verified whether the improvements were indeed due to the hateful nature of the
pre-training corpus or were simply a side effect of adaptation to target-domain text. Past work on
high-resource languages is thus incomplete and does not provide sufficient evidence to analyze the
impact of hateful pre-training. To complete the analysis, we pre-train our models on both hateful
and non-hateful data from the target domain. Moreover, there is no previous work on hateful
pre-training in low-resource Indic languages, and our work also tries to fill this gap.
While evaluating the impact of pre-training, we build some useful resources for Hindi and Marathi.
We introduce two new models, MahaTweetBERT and HindTweetBERT, pre-trained on 40 million
Marathi and Hindi tweets, respectively. We use these models along with MuRIL (Khanuja et al., 2021),
the state-of-the-art Indic multilingual BERT, to generate baseline results. To extract the most hateful
and least hateful tweets from these 40 million tweet corpora, we classify the tweets using previous state-of-
the-art models and choose the tweets with the highest confidence (most hateful) and the lowest confidence
(least hateful). We verify that the selected data is indeed hateful by randomly choosing 2000 samples
for each language and labeling them manually. To see whether hateful pre-training has
an impact, we compare the performance of models pre-trained on the most hateful, least hateful,
and random corpora against our baseline on downstream hate speech identification tasks. We show
that hateful pre-training is helpful when considered in isolation; however, non-hateful or random
pre-training is equally good. The improvement in performance with hateful pre-training could be
a side effect of target-domain adaptation and is not dependent on the hatefulness of the pre-training
corpus. The hateful models are termed MahaTweetBERT-Hateful and HindTweetBERT-Hateful.
The 40M tweet corpora are termed L3Cube-MahaTweetCorpus and HindTweetCorpus for Marathi
and Hindi, respectively. The datasets and models released as a part of this work are also documented
on GitHub (https://github.com/l3cube-pune/MarathiNLP).
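To make the confidence-based selection step concrete, the following is a minimal sketch that scores tweets with a previously fine-tuned binary hate speech classifier and keeps the highest- and lowest-confidence extremes. The checkpoint path, batch size, and the assumption that label index 1 is the hateful class are illustrative placeholders, not the exact configuration used in this paper.

```python
# Sketch: rank tweets by hate-classifier confidence and keep the extremes
# as the "most hateful" / "least hateful" pre-training subsets.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINT = "path/to/previous-sota-hate-classifier"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT).eval()

def hate_scores(tweets, batch_size=64):
    """Return P(hateful) for every tweet under the classifier."""
    scores = []
    for i in range(0, len(tweets), batch_size):
        batch = tokenizer(
            tweets[i:i + batch_size],
            padding=True, truncation=True, max_length=128, return_tensors="pt",
        )
        with torch.no_grad():
            logits = model(**batch).logits
        # Assumption: label index 1 corresponds to the "hateful" class.
        scores.extend(torch.softmax(logits, dim=-1)[:, 1].tolist())
    return scores

def select_extremes(tweets, k):
    """Return the k most hateful and k least hateful tweets by confidence."""
    ranked = sorted(zip(hate_scores(tweets), tweets), key=lambda pair: pair[0])
    least_hateful = [t for _, t in ranked[:k]]
    most_hateful = [t for _, t in ranked[-k:]]
    return most_hateful, least_hateful
```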
The main contributions of this work are as follows.
• We show that hateful BERT is not always desirable for hate speech detection tasks, and that a
BERT model pre-trained on non-hateful in-domain data yields similar or better performance.
• We release pre-trained Twitter BERT models MahaTweetBERT and HindTweetBERT for
Marathi and Hindi. We also release MahaTweetBERT-Hateful and HindTweetBERT-Hateful,
the hateful versions of the corresponding models. These models are obtained by further
pre-training the current state-of-the-art MahaBERT and HindBERT models on the corresponding
language tweet data (40M sentences); a sketch of this continued pre-training step follows this list.
• We release gold-standard benchmark hate speech detection datasets HateEval-Mr and
HateEval-Hi with 2000 manually labeled tweets each.
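As an illustration of how such tweet-level models can be obtained, the sketch below shows continued masked-language-model pre-training of a base BERT checkpoint on a tweet corpus with the HuggingFace Trainer. The base checkpoint name, corpus file path, and hyperparameters are assumptions for illustration, not the exact setup behind MahaTweetBERT or HindTweetBERT.

```python
# Sketch: continued MLM pre-training of a base BERT model on a tweet corpus.
# Checkpoint name, file path, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM, AutoTokenizer,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

BASE = "l3cube-pune/marathi-bert"  # assumed MahaBERT-style base checkpoint
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForMaskedLM.from_pretrained(BASE)

# One tweet per line in a plain-text file (placeholder path).
corpus = load_dataset("text", data_files={"train": "marathi_tweets.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Standard 15% token masking for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(
    output_dir="tweet-bert-continued",
    num_train_epochs=1,                 # illustrative schedule
    per_device_train_batch_size=64,
    learning_rate=2e-5,
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```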
2 Related Work
Pre-trained models have obtained remarkable results in many areas of NLP. Although these pre-
trained models are well suited to generalized tasks, they have limitations on domain-specific
tasks. To address this, numerous domain-specific models have been developed based on the BERT
architecture (Devlin et al., 2018). Domain-specific NLP models are pre-trained on in-domain data
that is unique to a specific category of text. For example, BioBERT (Lee et al., 2020) is a model
trained on large-scale biomedical corpora, and it outperforms the previous state-of-the-art models
on biomedical text mining tasks.