

Transformer-based Text Classification on Unified Bangla Multi-class Emotion Corpus

Md Sakib Ullah Sourav1, Huidong Wang1, Mohammad Sultan Mahmud2*, Hua Zheng2

1School of Management Science and Engineering, Shandong University of Finance and Economics, Jinan, China.

2*College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China.

*Corresponding author(s). E-mail(s): sultan@szu.edu.cn;

Contributing authors: sakibsourav@outlook.com; huidong.wang@ia.ac.cn; zhenghua2017@email.szu.edu.cn;

Abstract

Emotion classification is a critical task because of its importance in studying people's opinions on various Web 2.0 services. Most existing research focuses on English, with little work on low-resource languages. Although sentiment analysis, and emotion classification in particular, has received increasing attention for English in recent years, little has been done for Bangla, one of the world's most widely spoken languages. In this research, we propose a complete set of approaches for identifying and extracting emotions from Bangla texts. We provide a Bangla emotion classifier for six classes (anger, disgust, fear, joy, sadness, and surprise) using transformer-based models, which have recently shown strong results, especially for high-resource languages. The Unified Bangla Multi-class Emotion Corpus (UBMEC) is used to assess the performance of our models. UBMEC is created by combining two previously released, manually labelled datasets of Bangla comments covering six emotion classes with new Bangla comments that we labelled manually. The corpus and code used in this work are publicly available.

Keywords: Bangla corpus, Bangla emotion analysis, Text classification, Multi-class emotion classification, Natural language processing

1 Introduction

While considerable research has been done on identifying emotions from visual and auditory data, emotion recognition from textual data is still a new and active research topic [4]. WeChat, Twitter, YouTube, Instagram, Facebook, and other Web 2.0 platforms or social networks (SNs) have recently emerged as the most important platforms for social communication [32], education [23], information exchange [31], and other purposes [2, 9, 10] among a wide variety of people. SN users connect, share their thoughts, feelings, and ideas, and participate in discussion groups. Analysing text conversations, and more specifically emotion classification (EC), is essential to understanding people's activities, since the invisible nature of the internet has made it possible for any single user to engage in violent speech on SNs [19].

EC is a subset of sentiment analysis (SA). Text-based SA is usually divided into two types: opinion-based and emotion-based. Opinions are classified by text polarity, which sorts texts or sentences into positive, negative, or neutral [5]. EC is a technique for extracting fine-grained emotions from speech, voice, picture, or text data [8]. Because a growing number of people on virtual platforms are producing online material at a rapid rate, understanding the emotion or sentiment behind a particular activity or trend in online content is of significant value to businesses, consumers, corporate leaders, governments, and other interested parties [20]. Text classification is also crucial in many human-computer interaction (HCI) systems where text is the major form of communication. The significant rise of SNs has shifted the focus of EC towards social media data analysis. It is now common practice to employ computational linguistics, machine learning (ML), and deep learning (DL) to assess the emotions or experiences expressed in user-written comments [27]. Classifying emotions in text is one of the most difficult problems in natural language processing (NLP), a branch of artificial intelligence (AI) that requires a comprehension of natural language for many HCI applications [24].

About 228 million people speak Bangla as their mother tongue, and another 37 million speak it as a second language, making it the fifth most widely used native language in the world.¹ As with other major languages, the volume of Bangla data stored online has increased dramatically in recent years with the rise of Web 2.0 apps and related services. This data typically appears in unstructured textual formats such as reviews, opinions, suggestions, ratings, comments, and feedback. Because classification approaches are supervised, ML and DL models need more labelled data for training to yield useful results. High-quality labelled Bangla data is nevertheless scarce in many domains. In contrast to English and other Western languages, which are acknowledged as rich in linguistic and technological resources, analyzing these massive volumes of data with NLP to identify underlying sentiments or emotions remains a challenging research topic for resource-constrained languages like Bangla [12, 16]. The highly inflected nature of this Indo-Aryan language, with its 36 different noun forms, 24 different pronoun forms, and more than 160 verb forms, makes EC in Bangla exceptionally difficult.

¹ https://www.ethnologue.com/language/ben/

BNEmo [28] and BEmoC [15] are two Bangla emotion corpora that contain 6327 and 7000 labelled reviews, respectively, divided into six classes: anger, disgust, fear, joy, sadness, and surprise. However, developing an automated emotion classifier for Bangla text requires a comparatively larger emotion corpus. Our proposed model for EC of Bangla text is inspired by the transformer-based approach of [3], which presented a thorough analysis of transformer-based models for emotion detection in texts using pre-trained Bidirectional Encoder Representations from Transformers (BERT) word embeddings. Transformer-based deep neural network architectures and their variants have recently proven effective across a range of downstream NLP applications, especially for resource-rich languages (e.g., English). Since Bangla EC has been the subject of some prior studies, we aim to evaluate it in the most efficient and reliable manner possible.

This study aims to answer the following two questions: (1) Is it possible to determine the emotion expressed by a social network user in Bangla using a transformer-based DL model? (2) Does the DL approach with BERT word embeddings work better than ML-based approaches to EC for the Bangla language? To address the first question, we examine the use of a pre-trained word embedding model for EC of Bangla texts. Unlike ML-based techniques, Multilingual BERT is a DL model built on pre-trained word embeddings that capture semantic relationships between words. To answer the second question, the DL models are compared with ML-based approaches to Bangla EC.
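To make the ML side of the second question concrete, the following is a minimal sketch of a bag-of-words multinomial naive Bayes classifier with add-one smoothing. It is an illustration only and not the authors' implementation: the class name `NaiveBayesBaseline`, the whitespace tokenizer, and the English toy sentences are all assumptions introduced here, and a real Bangla baseline would need proper text normalization and tokenization.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization, chosen only for brevity.
    return text.lower().split()


class NaiveBayesBaseline:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy English stand-ins for labelled comments (hypothetical data).
train_texts = [
    "i am so happy today",
    "what a wonderful happy day",
    "i am very angry with you",
    "this makes me angry and furious",
]
train_labels = ["joy", "joy", "anger", "anger"]

model = NaiveBayesBaseline().fit(train_texts, train_labels)
print(model.predict("happy day"))
print(model.predict("angry and furious"))
```

A classifier of this kind sees only word counts, whereas a model built on pre-trained embeddings also has access to contextual information about each word. That difference is what the second question sets out to measure.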

The main contributions of our research are as follows:

• A new, larger, unified six-class (anger, disgust, fear, joy, sadness, and surprise) EC dataset for Bangla, named the Unified Bangla Multi-class Emotion Corpus (UBMEC), constructed from user reviews. It combines two previously published, publicly available Bangla emotion corpora, BNEmo and BEmoC, with an additional corpus that we labelled manually, resulting in an adequately sized Bangla emotion corpus. The data is gathered from various domains such as food, software, entertainment, politics, sports, and others.

• A multilingual BERT model (m-BERT) fine-tuned for Bangla EC. This model is based on BERT base, with 12 layers, a hidden size of 768, and 110 M parameters, and has been pre-trained on 104 languages, including Bangla.

• A set of baseline results from ML models, i.e., logistic regression (LR), naive Bayes (NB), and XGBoost, to create a benchmark for multi-class EC in Bangla.
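The corpus unification described in the first contribution can be pictured with the sketch below, which merges several lists of (text, label) pairs into one six-class corpus. The merging procedure itself is not described in this part of the paper, so everything here is an assumption made for illustration: the function names `normalize_label` and `unify_corpora`, the alias table, the rule of keeping the first copy of a duplicated text, and the placeholder strings standing in for Bangla comments.

```python
from collections import Counter

# The six target classes named in the paper.
EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise")

# Hypothetical aliases; the actual source corpora may use other label spellings.
LABEL_ALIASES = {"sorrow": "sadness", "sad": "sadness", "happy": "joy"}


def normalize_label(label):
    """Map a raw label onto one of the six target classes."""
    label = label.strip().lower()
    label = LABEL_ALIASES.get(label, label)
    if label not in EMOTIONS:
        raise ValueError(f"unknown emotion label: {label!r}")
    return label


def unify_corpora(*corpora):
    """Merge (text, label) pairs, normalizing labels and dropping duplicate texts."""
    seen = set()
    unified = []
    for corpus in corpora:
        for text, label in corpus:
            key = " ".join(text.split())  # collapse runs of whitespace
            if not key or key in seen:
                continue  # assumption: the first occurrence of a text wins
            seen.add(key)
            unified.append((key, normalize_label(label)))
    return unified


# Placeholder records standing in for two existing corpora and a new one.
corpus_a = [("text one", "Joy"), ("text two", "sorrow")]
corpus_b = [("text  two", "sadness"), ("text three", "anger")]
corpus_new = [("text four", "fear")]

unified = unify_corpora(corpus_a, corpus_b, corpus_new)
print(len(unified))
print(Counter(label for _, label in unified))
```

This sketch does not resolve cases where the same text carries different labels in two sources. A real merge would need an explicit policy for such conflicts, for example manual re-annotation.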

The remainder of the paper is organized as follows. Section 2 discusses related work on EC. Section 3 presents the proposed method. Section 4 examines the experimental findings and evaluation criteria. Section 5 concludes the paper.

2 Related Works

2.1 Transformer models in text-based emotion classification (EC)

Researchers have put considerable effort into building state-of-the-art transformer models that detect emotions from text in resource-rich languages, mostly English. Huang et al. [35] achieved
