Explainable Causal Analysis of Mental Health
on Social Media Data
Chandni Saxena1, Muskan Garg2, and Gunjan Ansari3
1The Chinese University of Hong Kong, Hong Kong SAR
csaxena@cse.cuhk.edu.hk
2University of Florida, Gainesville, Florida, USA
muskangarg@ufl.edu
3JSS Academy of Technical Education, Noida, India
gunjanansari@jssaten.ac.in
Abstract. With recent developments in Social Computing, Natural Language
Processing and Clinical Psychology, the social NLP research community addresses
the challenge of automating the analysis of mental illness on social media. A
recent extension to the problem of multi-class classification of mental health
issues is to identify the cause behind the user's intention. However, multi-class
causal categorization of mental health issues on social media faces a major
challenge of wrong predictions due to the problem of overlapping causal
explanations. There are two possible mitigation techniques for this problem:
(i) resolving inconsistency among causal explanations / inappropriate
human-annotated inferences in the dataset, and (ii) in-depth analysis of arguments
and stances in self-reported text using discourse analysis. In this research work,
we hypothesise that if there is inconsistency among the F1 scores of different
classes, there must be inconsistency among the corresponding causal explanations
as well. In this task, we fine-tune the classifiers and find explanations for the
multi-class causal categorization of mental illness on social media with the LIME
and Integrated Gradients (IG) methods. We test our methods on the CAMS dataset
and validate them against the annotated interpretations. A key contribution of
this research work is to find the reason behind the inconsistency in accuracy of
multi-class causal categorization. The effectiveness of our methods is evident
from the results obtained, with category-wise average scores of 81.29% and 0.906
using cosine similarity and word mover's distance, respectively.
Keywords: causal analysis · explainability · mental health · text categorization
1 Introduction
People express their thoughts more conveniently on social media than during in-
person (often analytical) sessions with experts. As per the National Institute of
Mental Health report of 2020¹, 52.9 million adults in the USA suffer from mental
illness. "The Health at a Glance Europe 2020" report² noted that the COVID-19
pandemic and the subsequent economic crisis caused a growing burden on the mental
well-being of citizens, with evidence of higher rates of stress, anxiety and
depression. Previous studies support social media's powerful role in measuring
the public's social well-being [8]. To this end, we obtain Reddit posts
demonstrating mental health issues for mental health analysis.
¹ https://www.nami.org/mhstats
² https://health.ec.europa.eu/system/files/2020-12/2020_healthatglance_rep_en_0.pdf
In this research work, we narrow down the problem of mental health analysis to
the identification of reasons behind users' intent in their social media posts.
Sequence-to-sequence (Seq2Seq) models are applied to solve the problem of causal
categorization over the CAMS dataset³. The ground truth of the CAMS dataset
contains two-fold annotations: (i) the causal category and (ii) interpretations.
The textual segments of the interpretations support decision making for
identifying causal categories. However, there exists a major challenge of
responsibility and explainability for multi-class causal analysis when applying
fine-tuned Seq2Seq models. In this context, we find explanations for the
inconsistency among the resulting accuracies of different classes/categories.
Another key contribution is to measure the distance between inferences and
explanations, obtaining semantic similarity over distributional word
representations with (i) cosine similarity and (ii) word mover's distance.
³ https://github.com/drmuskangarg/CAMS
Definition 1: Inferences - Inferences are the set of textual segments
interpreted by trained human annotators, which appear as ground-truth
information in the CAMS dataset.
Definition 2: Explanations - The set of top keywords obtained using explainable
AI approaches for the multi-class causal categorization of Reddit posts is
termed explanations.
We further discuss a potential instance to define this problem of explainable
causal analysis in this section. Consider a given sample A where a user U posts
A: "Five years now and still no job. I am done with my life." The user U is upset
about his financial problems/career due to unemployment. We consider this text as
user-generated social media data which demonstrates mental health issues. The
intent of the user is 'to end life', and a key challenge is to find the reason
behind this intent. This cause-and-effect relationship aids the causal
categorization. The category for sample A is identified as 'Jobs and careers'
because the reason is associated with unemployment. There are five causal
categories in the annotated CAMS dataset, namely, (i) bias or abuse, (ii) jobs
and careers, (iii) medication, (iv) relationships, and (v) alienation.
In this research work, we use the CAMS dataset for explanations on multi-
class causal categorization. We have made three major contributions in this
work. First, we fine-tune deep learning models for multi-class causal categoriza-
tion. Second, we obtain explainable text for causal categorization using Local
Interpretable Model-Agnostic Explanations (LIME) and IG. Third, two semantic
similarity measures, cosine similarity and word mover's distance, assist in
validating the resulting explainable snippets against the annotated inferences.
Our experimental results explain the inconsistency among the accuracies of
different classes and validate the consistency between the inferences made by
the model and those made by human annotators, thereby establishing the need for
discourse and pragmatics in this problem of causal analysis.
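To make the first contribution concrete, the following is a minimal sketch of how
a classifier could be fine-tuned for the five CAMS causal categories. It assumes
a BERT-style encoder trained with the HuggingFace transformers Trainer; the model
name, hyperparameters and column names are illustrative rather than the exact
configuration used in our experiments.

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# The five causal categories annotated in the CAMS dataset.
CAUSES = ["bias or abuse", "jobs and careers", "medication",
          "relationships", "alienation"]

MODEL_NAME = "bert-base-uncased"  # illustrative choice of pretrained encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(CAUSES))

def encode(batch):
    # "text" is an assumed column name for the post field in the CAMS splits.
    return tokenizer(batch["text"], truncation=True, max_length=128,
                     padding="max_length")

# train_ds / val_ds: datasets.Dataset objects built from the CAMS splits and
# mapped with `encode`, e.g. train_ds = train_ds.map(encode, batched=True).
args = TrainingArguments(output_dir="cams-causal", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=val_ds)
# trainer.train()

The fine-tuned model and tokenizer defined in this sketch are reused by the
explainability sketches in the next section.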
2 Background
Our task is defined as a domain-specific problem to find reasons behind the intent
of a user on social media. After extensive literature surveys, we observe minimal
work on this problem. A domain-specific dataset, the CAMS dataset [4], is
publicly available for examining the inferences (reasons) and the causal
categories (multi-class classification) for mental health data. The existing
solution to the causal analysis task is the use of machine learning and neural
models for multi-class categorization of causal categories. The resulting
F-measure values vary across classes and raise a new research question: to what
extent is causal categorization responsible? We choose to resolve this problem
by finding and validating the explainable texts.
To find the explanations for causal categorization, we explore existing ex-
plainable AI methods for natural language processing [6]. Some well-established
surveys and tutorials categorize explainable approaches into local vs. global,
post-hoc vs. self-explaining, and model-agnostic vs. model-specific [3]. We
choose to observe local explanations over given input features using post-hoc
interpretability methods, which require less information. To this end, we
identify two explainability approaches that are suitable for this study:
(i) LIME and (ii) IG.
LIME samples observations near the input and uses the model's estimates on them
to fit a logistic regression [7]. The parameters of the logistic regression
represent the importance measure: the larger a parameter, the greater the effect
of the corresponding feature on the output. IG assigns an attribution value to
each input feature, which measures the extent to which that input contributes to
the final prediction [12].
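As an illustration, the following is a minimal sketch of how LIME could be
queried for a single post, assuming the fine-tuned model and tokenizer from the
sketch at the end of Section 1; predict_proba is a hypothetical wrapper that
exposes class probabilities to LIME.

import torch
from lime.lime_text import LimeTextExplainer

CAUSES = ["bias or abuse", "jobs and careers", "medication",
          "relationships", "alienation"]

def predict_proba(texts):
    # Wrap the fine-tuned classifier so LIME can score perturbed texts.
    enc = tokenizer(list(texts), truncation=True, padding=True,
                    return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1).numpy()

explainer = LimeTextExplainer(class_names=CAUSES)
post = "Five years now and still no job. I am done with my life."
exp = explainer.explain_instance(post, predict_proba,
                                 num_features=6, top_labels=1)
top_label = exp.available_labels()[0]
# The highest-weighted words form the resulting explanation (RE) snippet.
print(exp.as_list(label=top_label))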
A recent study sets a benchmark over three representative NLP tasks (sentiment
analysis, textual similarity and reading comprehension) for the interpretability
of both neural models and saliency methods [14], thereby emphasising the
suitability of LIME and IG for downstream NLP tasks.
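The next sketch shows one way IG token attributions could be computed with the
Captum library's LayerIntegratedGradients, again assuming the BERT-style model
and tokenizer from the earlier fine-tuning sketch; the [PAD]-token baseline and
the number of interpolation steps are illustrative choices.

import torch
from captum.attr import LayerIntegratedGradients

def forward_logits(input_ids, attention_mask):
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

post = "Five years now and still no job. I am done with my life."
enc = tokenizer(post, return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]

# Baseline: the same sequence with every non-special token replaced by [PAD].
baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)
baseline_ids[0, 0] = tokenizer.cls_token_id
baseline_ids[0, -1] = tokenizer.sep_token_id

target = forward_logits(input_ids, attention_mask).argmax(dim=-1).item()
# model.bert.embeddings is the embedding layer of a BERT-style encoder;
# the attribute path differs for other architectures.
lig = LayerIntegratedGradients(forward_logits, model.bert.embeddings)
attributions = lig.attribute(input_ids, baselines=baseline_ids,
                             additional_forward_args=(attention_mask,),
                             target=target, n_steps=50)

# Sum over the embedding dimension to get one attribution score per token.
scores = attributions.sum(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
print(sorted(zip(tokens, scores.tolist()), key=lambda t: -abs(t[1]))[:6])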
The explainable methods give output in the form of important words/text segments
which serve as the most important input features. As human-annotated inferences
for causal categorization are available in textual form, we use these inferences
as ground-truth information (text-reference) and the resulting explanations (RE)
as text-observation. Thus, we use two semantic similarity measures to evaluate
the performance of the explainable methods for causal categorization:
cosine similarity and Word Mover's Distance (WMD). Cosine similarity [9]
calculates the similarity between two words, sentences, paragraphs or longer
pieces of text, and derives from the squared Euclidean distance measure used to
compare embedding vectors.
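A minimal sketch of this validation step is given below, assuming pretrained
GloVe vectors loaded through gensim; the chosen vector set, the example
inference and the example explanation keywords are illustrative.

import numpy as np
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")  # illustrative pretrained vectors

def tokens(text):
    # Keep only in-vocabulary words from a lower-cased, whitespace-split text.
    return [w for w in text.lower().split() if w in wv]

def cosine_similarity(reference, observed):
    # Average the word vectors of each text and compare their directions.
    a = np.mean([wv[w] for w in tokens(reference)], axis=0)
    b = np.mean([wv[w] for w in tokens(observed)], axis=0)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

inference = "still no job"          # human-annotated inference (text-reference)
explanation = "job life done"       # top keywords from LIME / IG (RE)
print("cosine similarity:", cosine_similarity(inference, explanation))
print("word mover's distance:", wv.wmdistance(tokens(inference),
                                              tokens(explanation)))

Note that WMD is a distance, so lower values indicate closer agreement between
an explanation and its annotated inference, whereas higher cosine similarity
indicates closer agreement.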