Transformer-based Text Classification on Unified
Bangla Multi-class Emotion Corpus
Md Sakib Ullah Sourav1, Huidong Wang1,
Mohammad Sultan Mahmud2*, Hua Zheng2
1School of Management Science and Engineering, Shandong University of
Finance and Economics, Jinan, China.
2*College of Computer Science and Software Engineering, Shenzhen
University,
Shenzhen, 518060, China.
*Corresponding author(s). E-mail(s): sultan@szu.edu.cn;
Contributing authors: sakibsourav@outlook.com;
huidong.wang@ia.ac.cn;
zhenghua2017@email.szu.edu.cn;
Abstract
Emotion classification is a critical task because it helps in studying people's thoughts
on various Web 2.0 services. Most existing research focuses on the English language, with
little work on low-resource languages. Although sentiment analysis, and emotion
classification in particular, has received increasing attention for English in recent years,
little work has been done for Bangla, one of the world's most widely spoken languages. In
this research, we propose a complete set of approaches for identifying and extracting
emotions from Bangla texts. We provide a Bangla emotion classifier for six classes, i.e.,
anger, disgust, fear, joy, sadness, and surprise, built on transformer-based models, which
have recently shown strong results, especially for high-resource languages. The Unified
Bangla Multi-class Emotion Corpus (UBMEC) is used to assess the performance of our models.
UBMEC is created by combining two previously released, manually labelled datasets of Bangla
comments covering six emotion classes with fresh Bangla comments that we labelled manually.
The corpus and the code used in this work are publicly available.
Keywords: Bangla corpus, Bangla emotion analysis, Text classification, Multi-class
emotion classification, Natural language processing
1 Introduction
While considerable research has addressed emotion recognition from visual and auditory
data, emotion recognition from textual data is still a new and active research topic [4].
WeChat, Twitter, YouTube, Instagram, Facebook, and other Web 2.0 platforms or social
networks (SNs) have recently emerged as the most important platforms for social
communication [32], education [23], information exchange [31], and other purposes [2, 9,
10] among a wide variety of people. SN users connect, share their thoughts, feelings, and
ideas, and participate in discussion groups. Analysing text conversations, and more
specifically emotion classification (EC), is essential to understanding people's
activities, since the anonymity of the internet has made it possible for a single user to
engage in violent speech on SNs [19].
EC is a subset of sentiment analysis (SA). Text-based SA is usually divided into two
types: opinion-based and emotion-based. Opinion-based SA relies on text polarity, which
assigns a text or sentence a positive, negative, or neutral sentiment [5]. EC, by contrast,
extracts fine-grained emotions from speech, voice, image, or text data [8]. Because a
growing number of people on virtual platforms produce online material at a rapid rate,
understanding the emotion or sentiment behind a particular activity or trend in online
content is of significant value to businesses, consumers, corporate leaders, governments,
and other interested parties [20]. Text classification is also crucial in many
human-computer interaction (HCI) systems where text is the major form of communication. The
significant rise of SNs has shifted the focus of EC towards social media data analysis. It
is now common practice to employ computational linguistics, machine learning (ML), and deep
learning (DL) to assess the emotions or experiences expressed in user-written comments [27].
Classifying emotions in text is one of the most difficult problems in natural language
processing (NLP), a branch of artificial intelligence (AI) concerned with the natural
language understanding that many HCI applications require [24].
About 228 million people speak Bangla as their mother tongue, and another 37 million
do so as a second language, making it the fifth most widely used native language in the
world 1.
1 https://www.ethnologue.com/language/ben/
As with other major languages, the volume of Bangla data online has grown dramatically in
recent years with the rise of Web 2.0 applications and related services. This data typically
takes unstructured textual forms such as reviews, opinions, suggestions, ratings, comments,
and feedback. Because classification approaches are supervised, ML and DL models need large
amounts of labelled data for training to yield useful results. High-quality labelled Bangla
data is nevertheless scarce in many domains. In contrast to English and other Western
languages, which are regarded as resource-rich in terms of linguistics and technology,
analyzing these massive volumes of data with NLP to identify underlying sentiments or
emotions remains a challenging research topic for resource-constrained languages like
Bangla [12, 16]. Bangla is a highly inflected Indo-Aryan language, with 36 different noun
forms, 24 different pronoun forms, and more than 160 different verb forms, which makes EC
in Bangla exceptionally difficult.
BNEmo [28] and BEmoC [15] are two Bangla emotion corpora that contain 6327 and 7000
labelled reviews, respectively, divided into six classes, i.e., anger, disgust, fear, joy,
sadness, and surprise. However, developing an automated emotion classifier for Bangla text
requires a comparatively larger emotion corpus. Our proposed model is inspired by the
transformer model [3], which we use to conduct EC of Bangla texts in this study. That paper
presented a thorough analysis of transformer-based models for emotion detection in texts
using pre-trained Bidirectional Encoder Representations from Transformers (BERT) word
embeddings. Recently, transformer-based deep neural network architectures and their
variants have proven effective in a range of downstream NLP applications, especially for
resource-rich languages (e.g., English). Because Bangla EC has been the subject of some
prior studies, we want to assess it in the most efficient and reliable manner possible.
This study aims to answer the following two questions: (1) Is it possible to determine
the emotion expressed by a social network user in Bangla using a transformer-based DL
model? (2) Does the DL approach with BERT word embeddings work better than ML-based
approaches to EC for the Bangla language? To address the first question, we examine the use
of a pre-trained word embedding model for EC of Bangla texts. Unlike ML-based techniques,
Multilingual BERT is a DL model based on pre-trained word embeddings that captures semantic
relations between words. To answer the second question, we compare the DL models with
ML-based approaches to Bangla EC.
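To make the ML side of the second question concrete, the following is a minimal sketch of one such baseline, naive Bayes, written with the Python standard library only. The paper does not describe its features or implementation at this point, so the whitespace tokenization, the add-one smoothing, and the class name `NaiveBayesBaseline` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesBaseline:
    """Multinomial naive Bayes over whitespace tokens (illustrative assumption)."""

    def fit(self, texts, labels):
        labels = list(labels)
        self.total = len(labels)
        self.class_counts = Counter(labels)
        # Per-class token frequencies.
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.split())
        # Vocabulary is the set of all tokens seen during training.
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label), with add-one smoothing.
        counts = self.token_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / self.total)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, text):
        tokens = text.split()
        # Sorting the labels makes tie-breaking deterministic.
        return max(sorted(self.class_counts),
                   key=lambda label: self._log_score(tokens, label))
```

A model of this kind sees each text only as a bag of token counts, which is the contrast the paragraph above draws with pre-trained embeddings that encode relations between words.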
The main contributions of our research are as follows:
(1) A new, larger, unified six-class (anger, disgust, fear, joy, sadness, and surprise) EC
dataset named Unified Bangla Multi-class Emotion Corpus (UBMEC) has been constructed for
Bangla based on user reviews. It amalgamates two previously published, publicly available
Bangla emotion corpora, BNEmo and BEmoC, with additional data that we tagged manually,
resulting in an adequately sized Bangla emotion corpus. The data is gathered from various
domains such as food, software, entertainment, politics, sports, and others.
(2) A multilingual BERT model (m-BERT) is fine-tuned for Bangla EC. This model is based on
BERT base, with 12 layers, a hidden size of 768, 12 attention heads, and 110 M parameters,
and has been pre-trained on 104 languages, including Bangla.
(3) A set of baseline results from ML models, i.e., logistic regression (LR), naive Bayes
(NB), and XGBoost, is provided to create a benchmark for multi-class EC in Bangla.
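The corpus unification described in the first contribution can be sketched roughly as follows. This is a hedged illustration, not the authors' pipeline: the function names, the `LABEL_ALIASES` table, and the rule of dropping duplicate texts after whitespace normalization are assumptions, since the text here does not say how overlapping samples or differing label spellings were handled.

```python
from collections import Counter

# The six emotion classes named in the paper.
EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise")

# Hypothetical alias table mapping source-specific label spellings
# onto the six classes.
LABEL_ALIASES = {"sorrow": "sadness"}


def normalize_label(label):
    """Lower-case a label, apply aliases, and reject unknown classes."""
    cleaned = label.strip().lower()
    cleaned = LABEL_ALIASES.get(cleaned, cleaned)
    if cleaned not in EMOTIONS:
        raise ValueError(f"unknown emotion label: {label!r}")
    return cleaned


def unify_corpora(*corpora):
    """Merge (text, label) pairs, keeping the first copy of each distinct text."""
    seen = set()
    unified = []
    for corpus in corpora:
        for text, label in corpus:
            key = " ".join(text.split())  # collapse whitespace before comparing
            if not key or key in seen:
                continue  # skip empty texts and repeated texts
            seen.add(key)
            unified.append((key, normalize_label(label)))
    return unified


def class_distribution(corpus):
    """Count how many samples each emotion class has."""
    return Counter(label for _, label in corpus)
```

Calling `unify_corpora(bnemo, bemoc, new_samples)` on three lists of `(text, label)` pairs (hypothetical variable names) would return one merged list, and `class_distribution` then shows how balanced the six classes are in the result.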
The remainder of the paper is organized as follows: Section 2 discusses related work on
EC. Section 3 presents the proposed method. Section 4 examines the experimental findings
and evaluation criteria. Section 5 concludes the paper.
2 Related Works
2.1 Transformer models in text-based emotion classification (EC)
Researchers have focused on building state-of-the-art transformer models that detect
emotions from text in resource-rich languages, mostly English. Huang et al. [35] achieved
F1