

POLYHOPE: TWO-LEVEL HOPE SPEECH DETECTION FROM TWEETS

Fazlourrahman Balouchzahi, Grigori Sidorov, Alexander Gelbukh

Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Mexico City, Mexico

{fbalouchzahi2021, sidorov, gelbukh}@cic.ipn.mx

ABSTRACT

Hope is characterized as openness of spirit toward the future: a desire, expectation, and wish for something to happen or to be true that remarkably affects a person’s state of mind, emotions, behaviors, and decisions. Hope is usually associated with concepts of desired expectations and possibility/probability concerning the future. Despite its importance, hope has rarely been studied as a social media analysis task. This paper presents a hope speech dataset that classifies each tweet first into “Hope” and “Not Hope”, then into three fine-grained hope categories: “Generalized Hope”, “Realistic Hope”, and “Unrealistic Hope” (along with “Not Hope”). English tweets from the first half of 2022 were collected to build this dataset. Furthermore, we describe our annotation process and guidelines in detail and discuss the challenges of classifying hope and the limitations of the existing hope speech detection corpora. In addition, we report several baselines based on different learning approaches, such as traditional machine learning, deep learning, and transformers, to benchmark our dataset. We evaluate our baselines using weighted-averaged and macro-averaged F1-scores. Observations show that a strict process for annotator selection and detailed annotation guidelines enhanced the dataset’s quality. This strict annotation process resulted in promising performance for simple machine learning classifiers with only bi-grams; however, binary and multiclass hope speech detection results reveal that contextual embedding models perform better on this dataset.

Keywords

Hope · Wish · Desire · Expectation · Machine Learning · Deep Learning · Transformers · Natural Language Processing

1 INTRODUCTION

Hope is one of the exceptional human capabilities that enables one to flexibly envision future events and their possible expected outcomes. Those visions significantly affect one’s emotions, behaviors, and state of mind, even though the desired outcome may have a low likelihood of happening [ ]. Snyder (2000) [ ] considers hope a powerful compensating feature in human psychology for facing challenges.

Nowadays, online social media platforms significantly affect human life, and people freely pen their thoughts on these social networks [ ]. The significant features of social media, such as rapid dissemination, low cost, accessibility, and anonymity, have increased the popularity of social media platforms [ ]. Since these platforms provide deep insight into people’s behavior, they are major sources for scientific research on Natural Language Processing (NLP) problems [5].

Hope is considered an essential determinant of well-being. Hence, analyzing hope in social media can provide potentially valuable insights into the trajectory of goal-directed behaviors, persistence in the face of misfortunes, and the processes underlying adjustment to positive and negative life changes.

Over the last few years, several researchers have explored psychological traits and other social media analysis tasks, such as emotion analysis (fear, anger, happiness, depression), hate speech and abusive language identification, and misogyny detection, through NLP techniques [ ]. However, hope speech on social media has rarely been explored as an NLP task. To the best of our knowledge, the only available corpora for hope speech detection are the Hope Speech dataset for Equality, Diversity, and Inclusion (HopeEDI) [ ], a multilingual corpus in English and code-mixed Dravidian languages, and the corpus presented by Palakodety et al. (2020) [ ] in English and Hindi. Both corpora model hope speech detection only as a binary Text Classification (TC) task with two classes, “Hope” and “Not Hope”. However, depending on the characteristics of hope, there may be different types of hope in texts. Hence, text belonging to the “Hope” class can be further classified into classes such as “rational and irrational or realistic and wishful hopes” [9, 10, 11].

arXiv:2210.14136v2 [cs.CL] 3 Nov 2022

Given that only the two datasets mentioned above exist for the hope speech detection task, our dataset provides opportunities for further research in this domain.

In a comprehensive and general categorization, hope has been identified either as “Particularized” or as “Generalized” hope [ ]. These modes of hoping are differentiated based on their objective, cognitive-affective activity, and behavioral characteristics. Particularized hope is always directed to specific outcomes and expectations, while Generalized hope lacks a concrete objective and forms open-ended expectancies towards the future [12].

According to Ezzy (2000) [ ] and Smith and Sparkes (2005) [ ], Particularized hope is similar to the typical definition of hope used in the psychological literature as the expectation and desire for specific events and outcomes (e.g., I hope the surgery will be successful). In contrast, Generalized hope is characterized by openness to events and outcomes (e.g., I hope I will get well). In the first example, “surgery” is a specific event that one may hope to be successful, while the second example names no specific event and only expresses hope for a better future. Although all types of hope represent future-oriented expectations (whether general or specific), they differ in how they influence human behaviors and decision-making ability [ ]. Therefore, distinguishing the different constructs of hope might be beneficial and open a new path in social media analysis tasks.

While there is no report on different types of Generalized hope in the literature, based on the characteristics of the desired outcomes, psychologists differentiate Particularized hope into two fine-grained sub-categories: hope as an expectation and hope as a want or wish [ ]. Other nomenclatures for these sub-categories include: realistic and wishful hopes [ ]; realistic and unrealistic hopes [ ]; realistic and false hopes [ ]; rational and irrational hopes [9].

One may hope for a high-possibility outcome (e.g., we just got engaged and are hoping to get married soon) or for an outcome with the knowledge that the likelihood of its happening is remote (e.g., winning a lottery) [ ]. Eaves et al. (2016) [ ] distinguish Particularized hopes according to “reasonable or probable outcome, in terms of normal or expected outcomes.”

Realistic hope can be characterized as a hope directed toward specific outcomes, involving a process of mental imaging along with occurrence probability calculation that prevents the person from losing their grip on reality [ ] (e.g., I have been studying a whole week, and I believe I can pass the test). In contrast, unrealistic hope is based on incomplete or incorrect information and hopes for something improbable that is unlikely to come true [ ] (e.g., I got very low marks and everyone says that I am already failed, but I am waiting for a miracle to happen). Regarding the associated behavioral response of Realistic hope, Webb (2007) [ ] argues that Realistic hope helps counteract risk aversion. Therefore, distinguishing between Realistic and Unrealistic hopes is an essential task [10].

In this paper, we first discuss the challenges, limitations, and annotation methods of the existing binary hope speech detection datasets. We then present a two-level annotated hope speech detection dataset of English tweets, built according to the definition of hope given by psychologists, together with learning models for hope speech detection trained on the new dataset. Our dataset and baselines will be available on request to the corresponding author.

The main contributions of this paper can be summarized as follows:

• Study of hope speech detection as a two-level Text Classification (TC) task,

• Critical review of existing datasets,

• Developing the guidelines for annotating binary and multiclass hope speech detection datasets,

• Building a binary and multiclass hope speech detection dataset,

• Modeling hope speech detection as a multiclass classification task for the first time,

• Performing a range of experiments on learning approaches as baselines that provide a benchmark for future research on hope speech detection tasks.

1.1 Task description

Inspired by the earlier literature about hope, in the present study, each tweet is first classified as “Hope” or “Not Hope”. The “Hope” class is, in turn, divided into three fine-grained categories: “Generalized Hope”, “Realistic Hope”, and “Unrealistic Hope”. The task consists of the following two subtasks for classifying hope speech in English tweets:

• Subtask A - Binary Hope Speech Detection: In this task, each tweet will be identified as either Hope or Not Hope,

• Subtask B - Multiclass Hope Speech Detection: In this task, each tweet will be classified into fine-grained hope categories: Generalized Hope, Realistic Hope, and Unrealistic Hope, along with Not Hope tweets.
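The relationship between the two subtasks can be sketched in a few lines of Python. This is only an illustration of the label scheme described above, not the authors' code: the names `HopeLabel`, `to_binary`, and `is_consistent` are hypothetical, and the sketch assumes that every Subtask A label can be derived from the corresponding Subtask B label.

```python
from enum import Enum


class HopeLabel(Enum):
    """Fine-grained Subtask B labels (class name is illustrative)."""
    NOT_HOPE = "Not Hope"
    GENERALIZED = "Generalized Hope"
    REALISTIC = "Realistic Hope"
    UNREALISTIC = "Unrealistic Hope"


def to_binary(label: HopeLabel) -> str:
    """Collapse a Subtask B label into the matching Subtask A label."""
    return "Not Hope" if label is HopeLabel.NOT_HOPE else "Hope"


def is_consistent(binary_label: str, fine_label: HopeLabel) -> bool:
    """Check that a (Subtask A, Subtask B) label pair does not contradict itself."""
    return to_binary(fine_label) == binary_label


if __name__ == "__main__":
    for fine in HopeLabel:
        print(f"{fine.value!r} -> {to_binary(fine)!r}")
```

Under this assumption, a tweet labelled “Hope” in Subtask A can never carry the “Not Hope” label in Subtask B, which is the kind of consistency check a two-level annotation scheme makes possible.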

The rest of the paper is organized as follows: Section 2 defines hope in detail; Section 3 (Related Work) discusses existing hope speech detection corpora, their limitations, and the techniques used for hope speech detection; Section 4 (Dataset Development) presents the steps in dataset creation; Sections 5 (Benchmarks) and 6 (Results) describe the baselines and their results, respectively, followed by a performance analysis of the baselines in Section 7 (Error Analysis). Finally, Section 8 (Discussion) describes the dataset’s characteristics and limitations, and Section 9 (Conclusion and Future Works) concludes the paper.

2 DEFINITIONS

Hope has been studied in psychology through cognitive-based [ ] and emotion-based [ ] models. Snyder et al. (1991) [ ] describe hope as a cognitive-based model defined in terms of a goal-setting framework, where a person is motivated to remain engaged with a future outcome and can anticipate a way to reach that outcome. Conversely, Averill et al. (2012) [ ] describe hope as an emotion-associated model that depends on the perceived likelihood of achieving an outcome.

Diverse definitions of hope are reported in the literature. Hope is defined as an integral part of being human [ ] and usually as a form of future-oriented thinking [ ]. It encourages a person to turn intentions into action and helps prevent despair and depression [9]. Verhaeghe et al. (2007) [11] describe hope as a psychological process of adapting to some unfortunate or unexpected event or situation. Eaves et al. (2016) [ ] believe that hope is a dynamic and multi-faceted mindset that can be considered a biological and supernatural medicine that directly impacts human health. In another definition, Maretha (2021) [ ] presents hope as personal feelings correlated with mental activities of desire and claims that it is an encouragement accompanied by desire, where the resulting tendency takes the form of real and unreal expectations. Snyder (2002) [ ], more generally, describes hope as a desire for something to happen or to be true, which is commonly associated with promise, potential, support, reassurance, suggestions, or inspiration during periods of illness, anger, stress, loneliness, and depression.

Hence, we conclude that hope is “a future-oriented expectation, desire or wish towards a general or specific event/outcome phenomenon that has a significant impact on human behavior, decision, and emotions.”

3 RELATED WORK

Hope is a partially subjective term that both psychologists and philosophers struggle to define [ ]. Hope analysis fits within the growing interest in social media analysis tasks. However, most of the ongoing research on social media is focused on controlling and eliminating harmful content, such as hate speech, abusive and offensive language, misogyny, and false information, or on emotion analysis tasks [ ]. Hope speech detection as a Natural Language Processing (NLP) task was introduced by Chakravarthi et al. (2020) [ ] and Palakodety et al. (2020) [ ], who proposed two multilingual corpora that classify each YouTube comment into Hope and Not Hope categories. Details of these corpora are presented in Table 1. The existing hope speech corpora and their limitations, followed by the techniques for hope speech detection, are described below:

3.1 Hope speech detection corpora

War-torn regions reveal a lot about the sentiments of people suffering and striving for peace. A comprehensive report [ ] on Kashmir (a disputed territory) revealed instances of hope speech in YouTube comments after the Pulwama terror attack on February 14, 2019. Palakodety et al. (2020) [ ] constructed a multilingual dataset of YouTube comments in English and Hindi, written in Roman and Devanagari scripts, respectively. They used a combination of polyglot embeddings from FastText (100 dimensions), sentiment score, and n-grams (1-3) with Logistic Regression (LR) to achieve the best macro-averaged F1-score of 78.51 (±2.24%). They modeled hope speech detection as a positive comment mining task (positive/negative sentiments), which reflects a very shallow understanding of hope as a subject. In reality, hope is a broad phenomenon involving various emotions.
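To make the “n-grams (1-3)” feature concrete, the following minimal sketch extracts all n-grams of length one to three from a comment. It assumes word-level n-grams over lowercased, whitespace-split tokens, which the text above does not specify; the function names `word_ngrams` and `ngram_counts` are illustrative and do not come from the cited work.

```python
from collections import Counter
from typing import List


def word_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> List[str]:
    """Return all word n-grams with n_min <= n <= n_max, shortest first.

    Tokenization is a plain lowercase whitespace split (an assumption made
    for this sketch, not a detail taken from the cited work).
    """
    tokens = text.lower().split()
    grams: List[str] = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def ngram_counts(text: str) -> Counter:
    """Bag-of-n-grams representation usable as sparse classifier features."""
    return Counter(word_ngrams(text))


if __name__ == "__main__":
    print(word_ngrams("I hope peace"))
```

Such counts would then be combined with the embedding and sentiment features before being passed to a linear classifier such as Logistic Regression.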

Chakravarthi et al. (2020) [ ] sparked further interest in hope speech detection on social media platforms by developing the HopeEDI corpus from YouTube comments in Dravidian and English languages. Initially, the corpus consisted of English and code-mixed Tamil-English and Malayalam-English datasets. Later, Chakravarthi et al. (2022) [ ] extended the work to Spanish and code-mixed Kannada-English texts.¹

Similar to Palakodety et al. (2020) [ ], the authors modeled the hope speech detection task as a binary classification task where each YouTube comment was identified as Hope or Not Hope. The HopeEDI corpus is a topic-based dataset consisting of the comments of all YouTube videos shared under specific domains such as STEM, LGBTQ individuals, racial minorities, or people with disabilities [ ]. The detailed statistics of the HopeEDI corpus are presented in Table 2.

Several traditional machine learning classifiers were evaluated as baselines using Term Frequency-Inverse Document Frequency (TF-IDF) vectors of word uni-grams, including LR, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Decision Tree (DT), and Multinomial Naive Bayes (MNB). The DT classifier obtained the highest performance for the English dataset, with a macro-averaged F1-score of 0.46. Similarly, the macro-averaged F1-scores for the other languages vary from 0.30 to 0.63 using machine learning classifiers.
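Because the scores above are macro-averaged, it may help to see how macro and weighted averaging differ on imbalanced data. The sketch below computes both from scratch using the standard F1 definition; the function names and the toy labels are illustrative assumptions, not data or code from any of the cited corpora.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 = 2*TP / (2*TP + FP + FN) for every label seen in either sequence."""
    scores: Dict[str, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class F1 weighted by the number of true instances per class."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


if __name__ == "__main__":
    # Toy imbalanced example: a classifier that always predicts the majority class.
    y_true = ["Not Hope"] * 9 + ["Hope"]
    y_pred = ["Not Hope"] * 10
    print(f"macro F1:    {macro_f1(y_true, y_pred):.3f}")
    print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

In this toy case the majority-class predictor keeps a high weighted score while its macro score drops sharply, which is why macro averaging is the stricter measure when the “Hope” class is much smaller than the “Not Hope” class.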

Some limitations of the existing corpora are listed as follows:

• Both corpora explore only YouTube comments, whereas Twitter is also a rich source of social media texts where users share their feelings and opinions.

• The low Inter-Annotator Agreement (IAA) of 0.63 for the English dataset in the HopeEDI corpus [ ] reveals a lack of confidence in the annotations.

• Palakodety et al. (2020) [ ] explore only positive and supportive comments about the conflict between India and Pakistan, which reveals two issues: (i) the corpus is biased towards the conflict between India and Pakistan over Kashmir, and (ii) the concept of hope itself is not considered.

• Both corpora are constructed for hope speech detection only as a binary classification task.

• The most significant criticism of both corpora concerns the annotation guidelines, which treat hope simply as a positive vibe and support; this may partially cover the concepts of generalized hope, optimism, and positivity. However, based on the definitions of hope [ ], the guidelines are not entirely in line with the idea of hope as an expectation and desire for something to happen. Therefore, both corpora are better described as supportive and positive comment detection corpora.
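The inter-annotator agreement mentioned in the list above can be illustrated with a small computation. The text does not state which agreement statistic lies behind the 0.63 figure, so the sketch below uses Cohen's kappa for two annotators purely as one common choice; the function name `cohen_kappa` and the toy annotations are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(annotator_a: Sequence[str], annotator_b: Sequence[str]) -> float:
    """Cohen's kappa: chance-corrected agreement between two annotators.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected if both annotators labelled at random
    according to their own label frequencies.
    """
    if len(annotator_a) != len(annotator_b) or not annotator_a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(annotator_a)
    observed = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n
    freq_a, freq_b = Counter(annotator_a), Counter(annotator_b)
    labels = set(annotator_a) | set(annotator_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    a = ["Hope", "Hope", "Not Hope", "Not Hope"]
    b = ["Hope", "Not Hope", "Not Hope", "Not Hope"]
    print(f"kappa = {cohen_kappa(a, b):.2f}")
```

A value near 1 indicates that annotators agree far more often than chance would predict, while lower values suggest that the annotation guidelines leave room for conflicting interpretations.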

| References | Languages | Script | Size | Source | Classification | Avg. Macro F1 (for English) |
|---|---|---|---|---|---|---|
| [8] | English / Hindi | Roman / Devanagari | 2,277 / 7,716 | YouTube | Binary | 0.78 |
| [7, 25] | English / Spanish / Tamil / Malayalam / Kannada | Roman / Code-mixed | 28,424 / 1,650 / 17,715 / 9,918 / 6,176 | YouTube | Binary | 0.46 |

Table 1: Available corpora in hope speech detection

| Class | English | Spanish | Tamil | Malayalam | Kannada |
|---|---|---|---|---|---|
| Hope | 2,234 | 660 | 7,084 | 1,858 | 1,909 |
| Not Hope | 23,347 | 660 | 8,870 | 6,989 | 3,649 |
| Test set | 2,843 | 330 | 1,761 | 1,071 | 618 |
| Total | 28,424 | 1,650 | 17,715 | 9,918 | 6,176 |

Table 2: Statistics of HopeEDI corpus (columns are languages)

3.2 Techniques for Hope Speech Detection

Chakravarthi et al. (2021, 2022) [ ] held two workshops on hope speech detection using the HopeEDI corpus [ ], and several participants submitted their methodologies to these workshops. The statistics of the corpus for all languages are given in Table 2. A brief description of the works carried out by several researchers is given below:

¹ Malayalam, Tamil, and Kannada are Dravidian languages widely used in India.

² In their final version of the corpus, they did not mention the label distribution on the test set.
