POLYHOPE: TWO-LEVEL HOPE SPEECH DETECTION FROM TWEETS
Fazlourrahman Balouchzahi, Grigori Sidorov, Alexander Gelbukh
Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Mexico City, Mexico
{fbalouchzahi2021, sidorov, gelbukh}@cic.ipn.mx
ABSTRACT
Hope is characterized as openness of spirit toward the future: a desire, expectation, and wish for something to happen or to be true that significantly affects humans' state of mind, emotions, behaviors, and decisions. Hope is usually associated with concepts of desired expectations and possibility/probability concerning the future. Despite its importance, hope has rarely been studied as a social media analysis task. This paper presents a hope speech dataset that classifies each tweet first into “Hope” and “Not Hope”, then into three fine-grained hope categories: “Generalized Hope”, “Realistic Hope”, and “Unrealistic Hope” (along with “Not Hope”). English tweets from the first half of 2022 were collected to build this dataset. Furthermore, we describe our annotation process and guidelines in detail and discuss the challenges of classifying hope and the limitations of the existing hope speech detection corpora. In addition, we report several baselines based on different learning approaches, such as traditional machine learning, deep learning, and transformers, to benchmark our dataset. We evaluated our baselines using weighted-averaged and macro-averaged F1-scores. Observations show that a strict process for annotator selection and detailed annotation guidelines enhanced the dataset's quality. This strict annotation process resulted in promising performance for simple machine learning classifiers with only bi-grams; however, binary and multiclass hope speech detection results reveal that contextual embedding models achieve higher performance on this dataset.
Keywords: Hope · Wish · Desire · Expectation · Machine Learning · Deep Learning · Transformers · Natural Language Processing
1 INTRODUCTION
Hope is one of the exceptional human capabilities: it enables one to flexibly envision future events and their possible expected outcomes. Those visions significantly affect one's emotions, behaviors, and state of mind, even when the desired outcome is far less likely to happen [1]. Snyder (2000) [2] considers hope a powerful compensating feature of human psychology for facing challenges.
Nowadays, online social media platforms significantly affect human life, and people freely express their thoughts on these social networks [3]. Key features of social media, such as rapid dissemination, low cost, accessibility, and anonymity, have increased the popularity of social media platforms [4]. Since social media provides deep insight into people's behavior, it is a major source of data for scientific research on Natural Language Processing (NLP) problems [5].
Hence, analyzing hope, which is considered an essential determinant of well-being, on social media can provide potentially valuable insights into the trajectory of goal-directed behaviors, persistence in the face of misfortune, and the processes underlying adjustment to positive and negative life changes.
Over the last few years, several researchers have explored psychological traits and other social media analysis tasks, such as emotion analysis (fear, anger, happiness, depression), hate speech and abusive language identification, and misogyny detection, through NLP techniques [6]. However, hope speech on social media has rarely been explored as an NLP task. To the best of our knowledge, the Hope Speech dataset for Equality, Diversity, and Inclusion (HopeEDI) [7], a multilingual hope speech detection corpus in English and code-mixed Dravidian languages, and the corpus presented by Palakodety et al. (2020) [8] in English and Hindi are the only available corpora for hope speech detection. Both corpora model hope speech detection only as a binary Text Classification (TC) task with two classes, “Hope” and “Not Hope”. However, depending on the characteristics of hope, texts may express different types of hope. Hence, text belonging to the “Hope” class can be further classified into classes such as “rational and irrational or realistic and wishful hopes” [9, 10, 11].
arXiv:2210.14136v2 [cs.CL] 3 Nov 2022
PolyHope/ Balouchzahi et al.
Given that only the two datasets mentioned above exist for the hope speech detection task, our dataset provides opportunities for further research in this domain.
In a comprehensive and general categorization, hope has been identified as either “Particularized” or “Generalized” hope [10, 12, 13, 14, 15]. These modes of hoping are differentiated by their objective, cognitive-affective activity, and behavioral characteristics. Particularized hope is always directed toward specific outcomes and expectations, whereas Generalized hope lacks a concrete objective and forms open-ended expectancies toward the future [12].
According to Ezzy (2000) [13] and Smith and Sparkes (2005) [15], Particularized hope is similar to the typical definition of hope used in the psychological literature: the expectation and desire for specific events and outcomes (e.g., I hope the surgery will be successful). In contrast, Generalized hope is characterized by openness to events and outcomes (e.g., I hope I will get well). In the first example, “surgery” is a specific event that one hopes will be successful, while the second example expresses only an open-ended hope for a better future. Although all types of hope represent future-oriented expectations (whether general or specific), they differ in how they influence human behavior and decision-making [16]. Therefore, distinguishing the different constructs of hope may be beneficial and open a new path in social media analysis tasks.
Although the literature reports no subtypes of Generalized hope, psychologists differentiate Particularized hope, based on the characteristics of the desired outcomes, into two fine-grained sub-categories: hope as an expectation and hope as a want or wish [10]. Other names for these sub-categories include realistic and wishful hopes [17]; realistic and unrealistic hopes [12, 18, 19]; realistic and false hopes [11]; and rational and irrational hopes [9].
One may hope for a highly probable outcome (e.g., we just got engaged and are hoping to get married soon) or for an outcome while knowing that the likelihood of its happening is remote (e.g., winning a lottery) [10]. Eaves et al. (2016) [17] distinguish Particularized hopes according to “reasonable or probable outcome, in terms of normal or expected outcomes.”
Realistic hope can be characterized as hope directed toward specific outcomes, involving a process of mental imaging along with a calculation of the probability of occurrence that prevents the person from losing their grip on reality [12] (e.g., I have been studying for a whole week, and I believe I can pass the test). In contrast, unrealistic hope is based on incomplete or incorrect information and is directed toward something improbable that is unlikely to come true [11] (e.g., I got very low marks and everyone says that I have already failed, but I am waiting for a miracle to happen). Regarding the behavioral response associated with Realistic hope, Webb (2007) [12] argues that Realistic hope helps counteract risk aversion. Therefore, distinguishing between Realistic and Unrealistic hopes is an essential task [10].
In this paper, we first discuss the challenges, limitations, and annotation guidelines of the existing binary hope speech detection datasets. We then present a two-level annotated hope speech detection dataset of English tweets, based on the definitions of hope given by psychologists, along with learning models for hope speech detection on the new dataset. Our dataset and baselines will be available on request from the corresponding author.
The main contributions of this paper can be summarized as follows:
• Studying hope speech detection as a two-level text classification task;
• Critically reviewing the existing datasets;
• Developing guidelines for annotating binary and multiclass hope speech detection datasets;
• Building a binary and multiclass hope speech detection dataset;
• Modeling hope speech detection as a multiclass classification task for the first time;
• Performing a range of experiments with different learning approaches as baselines that provide a benchmark for future research on hope speech detection.
1.1 Task description
Inspired by the earlier literature on hope, in the present study each tweet is first classified as “Hope” or “Not Hope”. Each “Hope” tweet is then assigned to one of three fine-grained categories: “Generalized Hope”, “Realistic Hope”, or “Unrealistic Hope”. The task consists of the following two subtasks for classifying hope speech in English tweets:
• Subtask A - Binary Hope Speech Detection: In this task, each tweet will be identified as either Hope or Not Hope.
• Subtask B - Multiclass Hope Speech Detection: In this task, each tweet will be classified into one of the fine-grained hope categories (Generalized Hope, Realistic Hope, or Unrealistic Hope) or as Not Hope.
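Because every fine-grained hope category is a subtype of “Hope”, the two label sets are nested: a Subtask B label determines the corresponding Subtask A label. A minimal sketch of that mapping, assuming the label strings are exactly the category names used above; `FINE_TO_BINARY` and `to_binary` are hypothetical names for illustration, not part of a released dataset API.

```python
# Hypothetical label strings: the category names from the task description.
FINE_TO_BINARY = {
    "Generalized Hope": "Hope",
    "Realistic Hope": "Hope",
    "Unrealistic Hope": "Hope",
    "Not Hope": "Not Hope",
}


def to_binary(fine_label: str) -> str:
    """Collapse a Subtask B (multiclass) label into its Subtask A (binary) label."""
    if fine_label not in FINE_TO_BINARY:
        raise ValueError(f"unknown fine-grained label: {fine_label!r}")
    return FINE_TO_BINARY[fine_label]


subtask_b = ["Realistic Hope", "Not Hope", "Generalized Hope", "Unrealistic Hope"]
subtask_a = [to_binary(label) for label in subtask_b]
print(subtask_a)  # ['Hope', 'Not Hope', 'Hope', 'Hope']
```

Keeping the mapping in one dictionary makes the hierarchy explicit and rejects misspelled labels instead of silently treating them as “Not Hope”.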
The rest of the paper is organized as follows: Section 2 (Definitions) defines hope in detail; Section 3 (RELATED WORK) discusses existing hope speech detection corpora, their limitations, and the techniques used for hope speech detection; Section 4 (DATASET DEVELOPMENT) presents the steps in dataset creation; Sections 5 (BENCHMARKS) and 6 (RESULTS) describe the baselines and results, respectively, followed by the performance analysis of the baselines in Section 7 (ERROR ANALYSIS). Finally, Section 8 (DISCUSSION) describes the dataset's characteristics and limitations, and Section 9 (CONCLUSION AND FUTURE WORKS) concludes the paper.
2 DEFINITIONS
In psychology, hope has been studied through cognitive-based [20] and emotion-based [21] models. Snyder et al. (1991) [20] described hope with a cognitive-based model, defining it in terms of a goal-setting framework in which a person is motivated to remain engaged with a future outcome and can anticipate a way to reach that outcome. Conversely, Averill et al. (2012) [21] described hope with an emotion-based model that depends on the perceived likelihood of achieving an outcome.
There are diverse definitions of hope in the literature. Hope has been defined as an integral part of being human [12, 22] and as usually future-oriented thinking [14]. It encourages a person to transform intentions into action and to prevent despair and depression [9]. Verhaeghe et al. (2007) [11] describe hope as a psychological process of adapting to unfortunate or unexpected events and situations. Eaves et al. (2016) [17] consider hope a dynamic and multi-faceted mindset that can be regarded as a biological and supernatural medicine that directly impacts human health. In another definition, Maretha (2021) [23] presents hope as a personal feeling correlated with mental activities of desire and claims that it is an encouragement accompanied by desire, from which real and unreal expectations arise. Snyder (2002) [24], more generally, describes hope as a desire for something to happen or to be true, which is commonly associated with promise, potential, support, reassurance, suggestions, or inspiration during periods of illness, anger, stress, loneliness, and depression.
Hence, we conclude that hope is “a future-oriented expectation, desire or wish towards a general or specific event/outcome phenomenon that has a significant impact on human behavior, decision, and emotions.”
3 RELATED WORK
Hope is a partially subjective term that both psychologists and philosophers struggle to define [24]. Hope analysis falls within the growing interest in social media analysis tasks. However, most ongoing research on social media focuses on controlling and eliminating harmful content, through tasks such as hate speech, abusive and offensive language, misogyny, and false information detection, or on emotion analysis [7]. Hope speech detection as an NLP task was introduced by Chakravarthi et al. (2020) [7] and Palakodety et al. (2020) [8], who proposed two multilingual corpora that classify each YouTube comment into the Hope and Not Hope categories. Details of these corpora are presented in Table 1. The existing hope speech corpora and their limitations, followed by the techniques for hope speech detection, are described below.
3.1 Hope speech detection corpora
War-torn regions reveal a lot about the sentiments of people who are suffering and striving for peace. A comprehensive report [8] on Kashmir (a disputed territory) revealed instances of hope speech in YouTube comments after the Pulwama terror attack on February 14, 2019. Palakodety et al. (2020) [8] constructed a multilingual dataset of YouTube comments in English and Hindi, written in the Roman and Devanagari scripts, respectively. They combined 100-dimensional polyglot embeddings from FastText, a sentiment score, and n-grams (1-3) with Logistic Regression (LR) to achieve the best averaged-macro F1-score of 78.51 (±2.24%). They modeled hope speech detection as a positive comment mining task (positive/negative sentiment), which reflects a very shallow understanding of hope as a subject. In reality, hope is a broad phenomenon involving various emotions.
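The n-gram component of the feature set described above can be illustrated with a short sketch. It assumes word-level n-grams and a plain lowercase whitespace tokenizer (the source says only “n-grams (1-3)”), and it omits the FastText embeddings, the sentiment score, and the LR classifier; `word_ngrams` is a hypothetical helper, not code from Palakodety et al.

```python
from collections import Counter


def word_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> Counter:
    """Count contiguous word n-grams of every length from n_min to n_max.

    Tokenization is a lowercase whitespace split, an illustrative assumption.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the sentence.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


features = word_ngrams("hope for peace hope for all")
print(features["hope for"])  # 2
print(len(features))         # 12 distinct uni-, bi-, and trigrams
```

In a full pipeline these sparse counts would be concatenated with the dense embedding and sentiment features before training the classifier.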
Chakravarthi et al. (2020) [7] sparked the other line of hope speech detection research on social media platforms by developing the HopeEDI corpus from YouTube comments in English and Dravidian languages. Initially, the corpus consisted of
English and code-mixed Tamil-English and Malayalam-English datasets. Later, Chakravarthi et al. (2022) [25] extended the work to Spanish and code-mixed Kannada-English texts¹.
Similar to Palakodety et al. (2020) [8], the authors modeled hope speech detection as a binary classification task in which each YouTube comment was identified as Hope or Not Hope. The HopeEDI corpus is a topic-based dataset consisting of the comments on all YouTube videos shared under specific domains such as STEM, LGBTQ individuals, racial minorities, or people with disabilities [7]. Detailed statistics of the HopeEDI corpus are presented in Table 2².
As baselines, several traditional machine learning classifiers were evaluated with Term Frequency-Inverse Document Frequency (TF-IDF) vectors of word unigrams, including LR, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Decision Tree (DT), and Multinomial Naive Bayes (MNB). The DT classifier obtained the highest performance on the English dataset, with an averaged-macro F1-score of 0.46. For the other languages, the machine learning classifiers achieved averaged-macro F1-scores ranging from 0.30 to 0.63.
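The scores above are averaged-macro F1, while the PolyHope baselines are evaluated with both weighted- and macro-averaged F1; the two averages can diverge sharply on imbalanced data. A minimal pure-Python sketch of both, for illustration only: the toy labels are invented, and in practice a library routine such as scikit-learn's `f1_score` with `average="macro"` or `average="weighted"` would normally be used.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 for every label seen in y_true or y_pred, via F1 = 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Toy imbalanced example: a classifier that always predicts the majority class.
y_true = ["Not Hope"] * 8 + ["Hope"] * 2
y_pred = ["Not Hope"] * 10
print(round(macro_f1(y_true, y_pred), 3))     # 0.444
print(round(weighted_f1(y_true, y_pred), 3))  # 0.711
```

The majority-class predictor never finds a single Hope comment, yet its weighted F1 stays high because the dominant class carries most of the weight; macro F1 exposes the failure on the minority class.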
Some limitations of the existing corpora are as follows:
• Both corpora explore only YouTube comments, whereas Twitter is also a rich source of social media text where users share their feelings and opinions.
• The low Inter-Annotator Agreement (IAA) of 0.63 for the English dataset in the HopeEDI corpus [7] reveals a lack of confidence in the annotations.
• Palakodety et al. (2020) [8] very specifically explore only positive and supportive comments about the conflict between India and Pakistan, which reveals two issues: (i) the corpus is biased toward the India-Pakistan conflict over Kashmir, and (ii) the concept of hope itself is not considered.
• Both corpora are constructed for hope speech detection only as a binary classification task.
• The most significant criticism of both corpora concerns their annotation guidelines, which treat hope simply as positive vibes and support; this may partially capture the concepts of generalized hope, optimism, and positivity. However, based on the definitions of hope [20, 2, 24, 26, 27, 10, 16, 9, 8], these guidelines are not entirely in line with the idea of hope as an expectation and desire for something to happen. Therefore, both corpora are closer to supportive and positive comment detection corpora.
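The IAA of 0.63 cited above is reported here without naming the coefficient. For two annotators and nominal labels, one widely used coefficient is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is chance agreement. This sketch assumes exactly two annotators; it is not necessarily the statistic behind the reported 0.63, and the annotations below are invented.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # p_o: observed proportion of items on which the annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: agreement expected by chance from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere, so kappa is 0/0;
        # treated here as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of ten comments (H = Hope, N = Not Hope).
ann_1 = ["H", "H", "N", "N", "H", "N", "N", "N", "H", "N"]
ann_2 = ["H", "N", "N", "N", "H", "N", "N", "H", "H", "N"]
print(round(cohen_kappa(ann_1, ann_2), 3))  # 0.583
```

Here the annotators agree on 80% of items, but 52% agreement would be expected by chance from their label frequencies alone, so kappa is only about 0.58.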
References | Languages | Script | Size | Source | Classification | Avg. Macro F1 (for English)
[8] | English; Hindi | Roman; Devanagari | 2,277; 7,716 | YouTube | Binary | 0.78
[7, 25] | English; Spanish; Tamil; Malayalam; Kannada | Roman; code-mixed | 28,424; 1,650; 17,715; 9,918; 6,176 | YouTube | Binary | 0.46
Table 1: Available corpora in hope speech detection
Class | English | Spanish | Tamil | Malayalam | Kannada
Hope | 2,234 | 660 | 7,084 | 1,858 | 1,909
Not Hope | 23,347 | 660 | 8,870 | 6,989 | 3,649
Test set | 2,843 | 330 | 1,761 | 1,071 | 618
Total | 28,424 | 1,650 | 17,715 | 9,918 | 6,176
Table 2: Statistics of HopeEDI corpus
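The Table 2 counts can be checked with a little arithmetic: for every language, Hope + Not Hope + Test set equals Total, which suggests that the Hope and Not Hope rows cover only the labelled (non-test) portion. A minimal sketch, with the counts copied from Table 2; `TABLE_2` is a hypothetical name.

```python
# Counts copied from Table 2 (HopeEDI). The Hope/Not Hope rows appear to exclude
# the test set, whose label distribution is not reported.
TABLE_2 = {
    "English":   {"hope": 2234, "not_hope": 23347, "test": 2843, "total": 28424},
    "Spanish":   {"hope": 660,  "not_hope": 660,   "test": 330,  "total": 1650},
    "Tamil":     {"hope": 7084, "not_hope": 8870,  "test": 1761, "total": 17715},
    "Malayalam": {"hope": 1858, "not_hope": 6989,  "test": 1071, "total": 9918},
    "Kannada":   {"hope": 1909, "not_hope": 3649,  "test": 618,  "total": 6176},
}

for lang, row in TABLE_2.items():
    labelled = row["hope"] + row["not_hope"]
    # Sanity check: labelled rows plus the test set should give the total.
    assert labelled + row["test"] == row["total"], lang
    share = row["hope"] / labelled
    print(f"{lang:<10} Hope share of labelled data: {share:.1%}")
```

With these counts, Hope makes up only about 8.7% of the labelled English comments, compared with exactly 50% for Spanish, which helps explain why macro- and weighted-averaged F1 can differ considerably across the HopeEDI languages.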
3.2 Techniques for Hope Speech Detection
Chakravarthi et al. (2021, 2022) [28, 25] held two workshops on hope speech detection using the HopeEDI corpus [7], and several participants submitted their methodologies to these workshops. The corpus statistics for all languages are given in Table 2. A brief description of the work carried out by several researchers is given below:
¹ Malayalam, Tamil, and Kannada are Dravidian languages widely spoken in India.
² The final version of the corpus does not report the label distribution of the test set.