Hate Speech and Offensive Language Detection in Bengali
Mithun Das, Somnath Banerjee, Punyajoy Saha, Animesh Mukherjee
Indian Institute of Technology Kharagpur, West Bengal, India
mithundas@iitkgp.ac.in, som.iitkgpcse@kgpian.iitkgp.ac.in,
punyajoys@iitkgp.ac.in, animeshm@cse.iitkgp.ac.in
Abstract
Social media often serves as a breeding ground for various kinds of hateful and offensive content. Identifying such content on social media is crucial because of its adverse impact on people targeted on the basis of race, gender, or religion, and hence on an unprejudiced society. However, while there is extensive research on hate speech detection in English, there is a gap in hateful content detection for low-resource languages like Bengali. Besides, a current trend on social media is the use of Romanized Bengali for regular interactions. To overcome the limitations of existing research, in this study we develop an annotated dataset of 10K Bengali posts consisting of 5K actual and 5K Romanized Bengali tweets. We implement several baseline models for the classification of such hateful posts. We further explore interlingual transfer mechanisms to boost classification performance. Finally, we perform an in-depth error analysis by looking into the posts misclassified by the models. When training on the actual and Romanized datasets separately, we observe that XLM-RoBERTa performs the best. Further, we witness that with joint training and few-shot training, MuRIL outperforms the other models by interpreting the semantic expressions better. We make our code and dataset public for others [1].

[1] https://github.com/hate-alert/Bengali_Hate
1 Introduction
Social media websites like Twitter and Facebook have brought billions of people together and given them the opportunity to share their thoughts and opinions rapidly. On the one hand, this has facilitated communication and the growth of social networks; on the other, it has been exploited to propagate misinformation, violence, and hate speech (Mathew et al., 2019; Das et al., 2020) against users based on their gender, race, religion, or other characteristics. If such content is left unaddressed, it may result in widespread conflict and violence, raising concerns about the safety of human rights, the rule of law, and freedom of speech, all of which are crucial for the growth of an unprejudiced democratic society (Rizwan et al., 2020). Organizations such as Facebook have been blamed for serving as a forum for instigating anti-Muslim violence in Sri Lanka that resulted in the deaths of three individuals [2], and a UN report accused them of disseminating hate speech in a way that contributed significantly to the plausible genocide of the Rohingya population in Myanmar [3].
In order to reduce the dissemination of such harmful content, these platforms have developed certain guidelines [4] that their users ought to comply with. If these rules are not followed, the post can get deleted or the user's account might get suspended. To further diminish harmful content on their forums, these platforms also engage moderators (Newton, 2019) to manually review posts and keep the platform wholesome and people-friendly. However, this moderation strategy is constrained by the moderators' speed, their familiarity with jargon and evolving slang, and their ability to understand multilingual content. Moreover, given the sheer magnitude of streaming data, it is an ambitious endeavor to examine each post manually and filter out such harmful content. Hence, an automated technique for detecting hate speech and offensive language is extremely necessary.
It has already been witnessed that Facebook vigorously eliminated a considerable amount of malicious content from its platforms even before users reported it (Robertson, 2020). However, the hindrance is that these platforms can detect harmful content only in certain popular languages such as English, Spanish, etc. (Perrigo, 2019). So far, several investigations have been conducted to identify hate speech automatically, focusing mainly on the English language; therefore, an effort is required to detect and diminish such hateful content in low-resource languages.

[2] https://tinyurl.com/sriLankaRiots
[3] https://www.reuters.com/investigates/special-report/myanmar-facebook-hate
[4] https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
With more than 210 million speakers, Bengali is the seventh most widely spoken language [5], with around 100 million Bengali speakers in Bangladesh and 85 million in India. Apart from Bangladesh and India, Bengali is spoken in many countries, including the United Kingdom, the United States, and the Middle East [6]. Also, a current trend on social media platforms is that, apart from actual Bengali, people tend to write Bengali using the Latin script (English characters) and often use English phrases in the same conversation; for example, the Bengali sentence "আমি তোমাকে ভালোবাসি" may be typed as "ami tomake bhalobashi". This unique and informal communication dialect is called code-mixed Bengali or Roman Bengali. Code-mixing makes it easier for speakers to communicate with one another by providing a more comprehensive range of idioms and phrases. However, as emphasized by Chittaranjan et al. (2014), it has made the task of creating NLP tools more challenging. Along with these general challenges, the challenges specific to identifying hate speech in Roman Bengali include the absence of a hate speech dataset and the lack of benchmark models. Thus, there is a need to develop open, efficient datasets and models to detect hate speech in Bengali. Although a few studies have been conducted on developing Bengali hate speech datasets, most of them have been crawled from comments on Facebook pages, and all of them are in actual Bengali. Hence, there is a need to develop more benchmark datasets covering other popular platforms. To address these limitations, in this study we make the following contributions.
- First, we create a gold-standard dataset of 10K tweets, of which 5K are actual Bengali and 5K are Roman Bengali.
- Second, we implement several baseline models to automatically identify such hateful and offensive content in both actual and Roman Bengali tweets.
- Third, we explore several interlingual transfer mechanisms to boost classification performance.
- Finally, we perform an in-depth error analysis by looking into a sample of test instances misclassified by the models.

[5] https://www.berlitz.com/en-uy/blog/most-spoken-languages-world
[6] https://www.britannica.com/topic/Bengali-language
2 Related Work
Over the past few years, research on automated hate speech detection has evolved tremendously. Earlier efforts in developing resources for hate speech detection focused mainly on the English language (Waseem and Hovy, 2016; Davidson et al., 2017; Founta et al., 2018). Recently, in an effort to create multilingual hate speech datasets, several shared task competitions have been organized (HASOC (Mandl et al., 2019), OffensEval (Zampieri et al., 2019), TRAC (Kumar et al., 2020), etc.), and multiple datasets in languages such as Hindi (Modha et al., 2021), Danish (Sigurbergsson and Derczynski, 2020), Greek (Pitenis et al., 2020), Turkish (Çöltekin, 2020), Mexican Spanish (Aragón et al., 2019), etc. have been made public. There is also some work on detecting hate speech in actual Bengali. Ishmam et al. (Ishmam and Sharmin, 2019) collected and annotated 5K comments from Facebook into six classes: inciteful, hate speech, religious hatred, communal attack, religious comments, and political comments. However, the dataset is not publicly available. Karim et al. (Karim et al., 2021) provided a dataset of 8K hateful posts collected from multiple sources such as Facebook, news articles, blogs, etc. One of the problems with this dataset is that every comment belongs to one of the hate classes (personal, geopolitical, religious, and political); since there is no normal class, it cannot be used to build hate speech detection models that screen out hate speech. Romim et al. (2021) curated a dataset of 30K comments, making it one of the most extensive datasets for hateful statements. The authors achieved 87.5% accuracy on their test dataset using an SVM model. However, these datasets do not consider Roman Bengali posts, a prevalent communication medium on social media nowadays.
            Actual   Roman    Total
Hateful        825     510    1,335
Offensive    1,341   2,063    3,404
Normal       2,905   2,534    5,439
Total        5,071   5,107   10,178

Table 1: Dataset statistics of both actual and Roman tweets.

With regard to detection systems, earlier methods examined simple linguistic features such as character and word n-grams, POS tags, and tf-idf with traditional classifiers such as LR, SVM, decision trees, etc. (Davidson et al., 2017). With the development of larger datasets, researchers have shifted to data-hungry, complex models such as deep learning (Pitsilis et al., 2018; Zhang et al., 2018) and graph embedding techniques to enrich classifier performance.
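To make the classical baseline concrete, below is a minimal sketch of a tf-idf pipeline with a logistic regression classifier in the spirit of Davidson et al. (2017). It is not the authors' exact pipeline, and `load_bengali_tweets` is a hypothetical loader returning the tweet texts and their {hateful, offensive, normal} labels.

```python
# A minimal classical baseline sketch: word + character n-gram tf-idf
# features with logistic regression (not the paper's exact pipeline).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline, make_union

# Hypothetical loader: returns parallel lists of tweet texts and labels.
texts, labels = load_bengali_tweets()

features = make_union(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),     # word n-grams
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # char n-grams
)
model = make_pipeline(features, LogisticRegression(max_iter=1000))

X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels)
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```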
Recently, transformer-based (Vaswani et al., 2017) language models such as BERT (Devlin et al., 2019) and XLM-RoBERTa have become quite popular in several downstream tasks. It has already been observed that these transformer-based models outperform several earlier deep learning models (Mathew et al., 2021). Having observed their superior performance, we focus on building these models for our classification task.
Further, researchers have begun to explore few-shot classification. One of the most popular techniques for few-shot classification is transfer learning, where a model (pre-trained in a similar domain) is further fine-tuned on a few labeled samples in the target domain (Alyafeai et al., 2020). Keeping these experiments in mind, we also examine transfer learning capabilities between actual and Roman Bengali data, as sketched below.
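The sketch below illustrates this transfer-learning recipe with the Hugging Face transformers library: a multilingual checkpoint is fine-tuned on source-language data and then adapted on a few target-language samples. The checkpoint names are real, but the training loop, hyperparameters, and the `actual_*`/`roman_*` data variables are illustrative assumptions, not the authors' exact setup.

```python
# Illustrative transfer-learning sketch (assumptions, not the paper's
# exact setup): fine-tune a multilingual transformer, then adapt it on
# a few labeled samples from the other script.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "xlm-roberta-base"  # or "google/muril-base-cased"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=3)    # hateful / offensive / normal

def finetune(model, texts, labels, epochs=3, lr=2e-5):
    """Plain fine-tuning loop; `texts` are strings, `labels` are 0/1/2."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for text, label in zip(texts, labels):
            batch = tok(text, truncation=True, max_length=128,
                        return_tensors="pt")
            loss = model(**batch, labels=torch.tensor([label])).loss
            loss.backward()
            opt.step()
            opt.zero_grad()
    return model

# Stage 1: train on actual-Bengali tweets (hypothetical variables).
# model = finetune(model, actual_texts, actual_labels)
# Stage 2: few-shot adaptation on a handful of Roman-Bengali tweets.
# model = finetune(model, roman_texts[:64], roman_labels[:64])
```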
3 Dataset Creation
In this section, we describe the data collection procedure, the annotation strategy we followed, and the statistics of the collected dataset.
3.1 Dataset collection and sampling
In this paper, we collect our dataset from Twitter. Although Hatebase.org maintains the most extensive collection of multilingual hateful words, it still lacks such a lexicon base for Bengali [7]. To sample Bengali (actual and Romanized) tweets for annotation, we create a lexicon of 74 abusive terms [8]. This lexicon consists of derogatory keywords/slurs targeting individuals or different protected communities. We also include words based on the names of the targeted communities; this choice is made in order to also capture hateful/offensive tweets that do not contain any abusive words. Using the Twitter API, we searched for tweets containing phrases from the lexicon, which resulted in a sample of 500K tweets for actual Bengali and 150K tweets for Roman Bengali. To evade problems related to user distribution bias, as highlighted by Arango et al. (2019), we keep a maximum of 75 tweets per user. We also use no more than 500 tweets per month to avoid event-specific tweets in our dataset. A sketch of this sampling procedure is given below.

[7] https://hatebase.org/
[8] https://tinyurl.com/bengaliHate
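The following minimal sketch captures the filtering logic described above; the record fields (`text`, `user_id`, `created_at`) are assumptions about the API output, not the authors' exact schema.

```python
# Sketch of the lexicon-based sampling with per-user and per-month caps.
from collections import Counter

MAX_PER_USER = 75    # reduces user-distribution bias (Arango et al., 2019)
MAX_PER_MONTH = 500  # avoids event-specific tweets

def sample_tweets(tweets, lexicon):
    """`tweets`: iterable of dicts from the API; `lexicon`: abusive terms."""
    user_counts, month_counts = Counter(), Counter()
    sampled = []
    for tw in tweets:
        # Keep only tweets that match at least one lexicon term.
        if not any(term in tw["text"] for term in lexicon):
            continue
        month = tw["created_at"][:7]  # e.g. "2021-04", assuming ISO dates
        if user_counts[tw["user_id"]] >= MAX_PER_USER:
            continue
        if month_counts[month] >= MAX_PER_MONTH:
            continue
        user_counts[tw["user_id"]] += 1
        month_counts[month] += 1
        sampled.append(tw)
    return sampled
```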
3.2 Annotation procedure
We employed four undergraduate students for our annotation task. All of them are Computer Science majors and native Bengali speakers. They were recruited voluntarily through departmental emails and compensated via an Amazon gift card. Two Ph.D. students led the annotation process as expert annotators; both had previous experience working with malicious content on social media. Each tweet in our dataset carries two kinds of annotations: first, whether the text is hate speech, offensive speech, or normal; second, the target communities mentioned in the text. This additional annotation of the target community can help us measure bias in the model. Table 3 lists the target groups we have considered, and the sketch below illustrates the resulting record structure.
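The following is an illustrative structure for a single annotated instance; the field names are our assumption for exposition, not the released dataset's exact schema.

```python
# Hypothetical record layout: each tweet carries a class label and the
# list of communities it targets (empty for normal tweets).
from dataclasses import dataclass, field
from typing import List

@dataclass
class AnnotatedTweet:
    text: str                    # the (actual or Roman) Bengali tweet
    label: str                   # "hateful" | "offensive" | "normal"
    targets: List[str] = field(default_factory=list)  # e.g. ["religion"]

example = AnnotatedTweet(text="...", label="normal", targets=[])
```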
Annotation guidelines: The annotation scheme stated below constitutes the main guidelines for the annotators, while a codebook ensured a common understanding of the label descriptions. We construct our codebook (which contains the annotation guidelines for identifying hateful and offensive tweets) based on the definitions summarized as follows.

- Hate speech: Hate speech is language used to express hatred toward a targeted individual or group, or intended to be derogatory, humiliating, or insulting to the members of a group, on the basis of attributes such as race, religion, ethnic origin, sexual orientation, disability, caste, geographic location, or gender.
- Offensive: Offensive speech uses profanity or strongly impolite, rude, or vulgar language, expressed with fighting or hurtful words, to insult a targeted individual or group.
- Normal: This contains tweets that do not fall into the above two categories.
3.3 Dataset creation steps
As a first step for creating the dataset, we required a pilot gold-label dataset to instruct the annotators. Initially, the expert annotators annotated 100