Explainable Causal Analysis of Mental Health
on Social Media Data
Chandni Saxena1, Muskan Garg2, and Gunjan Ansari3
1The Chinese University of Hong Kong, Hong Kong SAR
csaxena@cse.cuhk.edu.hk
2University of Florida, Gainesville, Florida, USA
muskangarg@ufl.edu
3JSS Academy of Technical Education, Noida, India
gunjanansari@jssaten.ac.in
Abstract. With recent developments in Social Computing, Natural Language
Processing and Clinical Psychology, the social NLP research community addresses
the challenge of automating the analysis of mental illness on social media. A
recent extension to the problem of multi-class classification of mental health
issues is to identify the cause behind the user's intention. However, multi-class
causal categorization of mental health issues on social media faces a major
challenge of wrong predictions due to the problem of overlapping causal
explanations. There are two possible mitigation techniques for this problem:
(i) resolving inconsistency among causal explanations / inappropriate
human-annotated inferences in the dataset, and (ii) in-depth analysis of arguments
and stances in self-reported text using discourse analysis. In this research work,
we hypothesise that if there is inconsistency among the F1 scores of different
classes, there must be inconsistency among the corresponding causal explanations
as well. In this task, we fine-tune the classifiers and find explanations for the
multi-class causal categorization of mental illness on social media with the LIME
and Integrated Gradients (IG) methods. We test our methods on the CAMS dataset
and validate them against the annotated interpretations. A key contribution of
this research work is to find the reason behind the inconsistency in accuracy of
multi-class causal categorization. The effectiveness of our methods is evident
from the results obtained, with category-wise average scores of 81.29% and 0.906
using cosine similarity and word mover's distance, respectively.
Keywords: causal analysis · explainability · mental health · text categorization
1 Introduction
People express their thoughts more conveniently on social media than during in-
person (often analytical) sessions with experts. As per the National Institute of
Mental Health report of 2020¹, 52.9 million adults in the USA suffer from mental
illness. "The Health at a Glance Europe 2020" report² noted that the COVID-19
pandemic and the subsequent economic crisis caused a growing burden on the mental
well-being of citizens, with evidence of higher rates of stress, anxiety and
depression. Previous studies support social media's powerful role in measuring
the public's social well-being [8]. To this end, we obtain Reddit posts
demonstrating mental health issues for mental health analysis.
¹ https://www.nami.org/mhstats
² https://health.ec.europa.eu/system/files/2020-12/2020_healthatglance_rep_en_0.pdf
In this research work, we narrow down the problem of mental health analysis to
the identification of reasons behind users' intent in their social media posts.
Sequence-to-sequence (Seq2Seq) models are applied to solve the problem of causal
categorization over the CAMS dataset³. The ground truth of the CAMS dataset
contains two-fold annotations: (i) the causal category and (ii) interpretations.
The textual segments of the interpretations support decision making for
identifying causal categories. However, there exists a major challenge of
responsibility and explainability for multi-class causal analysis when applying
fine-tuned Seq2Seq models. In this context, we find explanations for the
inconsistency among the resulting accuracies of different classes/categories.
Another key contribution is to measure the distance between inferences and
explanations, obtaining semantic similarity over distributional word
representations with (i) cosine similarity and (ii) word mover's distance.
³ https://github.com/drmuskangarg/CAMS
Definition 1: Inferences - Inferences are the set of textual segments
interpreted by trained human annotators, which appear as ground-truth
information in the CAMS dataset.
Definition 2: Explanations - The set of top keywords obtained using explainable
AI approaches for the multi-class causal categorization of Reddit posts is
termed explanations.
We further discuss a potential instance to define this problem of explainable
causal analysis in this section. Consider a given sample A where a user U posts
A: "Five years now and still no job. I am done with my life." The user U is upset
about his financial problems/career due to unemployment. We consider this text as
user-generated social media data which demonstrates mental health issues. The
intent of the user is 'to end life', and a key challenge is to find the reason
behind this intent. This cause-and-effect relationship aids the causal
categorization. The category for sample A is identified as 'Jobs and careers'
because the reason is associated with unemployment. There are five causal
categories in the annotated CAMS dataset, namely, (i) bias or abuse, (ii) jobs
and careers, (iii) medication, (iv) relationships, and (v) alienation.
In this research work, we use the CAMS dataset for explanations on multi-
class causal categorization. We have made three major contributions in this
work. First, we fine-tune deep learning models for multi-class causal categoriza-
tion. Second, we obtain explainable text for causal categorization using Local
Interpretable Model-Agnostic Explanations (LIME) and IG. Third, two semantic
similarity measures, cosine similarity and word mover's distance, assist in
validating the resulting explainable snippets against the annotated inferences.
Our experimental results explain the inconsistency among the accuracies of
different classes and validate the consistency between the inferences made by
the model and those made by human annotators, thereby establishing the need for
discourse and pragmatics in this problem of causal analysis.
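To make the first contribution concrete, the following is a minimal sketch of how
a classifier could be fine-tuned for the five CAMS causal categories. It assumes
a BERT-style encoder trained with the HuggingFace transformers Trainer; the model
name, hyperparameters and column names are illustrative rather than the exact
configuration used in our experiments.

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# The five causal categories annotated in the CAMS dataset.
CAUSES = ["bias or abuse", "jobs and careers", "medication",
          "relationships", "alienation"]

MODEL_NAME = "bert-base-uncased"  # illustrative choice of pretrained encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(CAUSES))

def encode(batch):
    # "text" is an assumed column name for the post field in the CAMS splits.
    return tokenizer(batch["text"], truncation=True, max_length=128,
                     padding="max_length")

# train_ds / val_ds: datasets.Dataset objects built from the CAMS splits and
# mapped with `encode`, e.g. train_ds = train_ds.map(encode, batched=True).
args = TrainingArguments(output_dir="cams-causal", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=val_ds)
# trainer.train()

The fine-tuned model and tokenizer defined in this sketch are reused by the
explainability sketches in the next section.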
2 Background
Our task is defined as a domain-specific problem to find reasons behind the intent
of a user on social media. After extensive literature surveys, we observe minimal
work on this problem. A domain-specific dataset, the CAMS dataset [4], is
publicly available for examining the inferences (reasons) and the causal
categories (multi-class classification) for mental health data. The existing
solution to the causal analysis task is the use of machine learning and neural
models for multi-class categorization of causal categories. The resulting
F-measure values vary across classes and raise a new research question: to what
extent is causal categorization responsible? We choose to resolve this problem
by finding and validating the explainable texts.
To find the explanations for causal categorization, we explore existing ex-
plainable AI methods for natural language processing [6]. Some well-established
surveys and tutorials categorize explainable approaches into local vs. global,
post-hoc vs. self-explaining, and model-agnostic vs. model-specific [3]. We
choose to observe local explanations over given input features using post-hoc
interpretability methods, which require less information. To this end, we
identify two explainability approaches that are suitable for this study:
(i) LIME and (ii) IG.
LIME samples observations near the input and uses the model's estimates on them
to fit a logistic regression [7]. The parameters of the logistic regression
represent the importance measure: the larger a parameter, the greater the effect
of the corresponding feature on the output. IG assigns an attribution value to
each input feature, which measures the extent to which that input contributes to
the final prediction [12].
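As an illustration, the following is a minimal sketch of how LIME could be
queried for a single post, assuming the fine-tuned model and tokenizer from the
sketch at the end of Section 1; predict_proba is a hypothetical wrapper that
exposes class probabilities to LIME.

import torch
from lime.lime_text import LimeTextExplainer

CAUSES = ["bias or abuse", "jobs and careers", "medication",
          "relationships", "alienation"]

def predict_proba(texts):
    # Wrap the fine-tuned classifier so LIME can score perturbed texts.
    enc = tokenizer(list(texts), truncation=True, padding=True,
                    return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1).numpy()

explainer = LimeTextExplainer(class_names=CAUSES)
post = "Five years now and still no job. I am done with my life."
exp = explainer.explain_instance(post, predict_proba,
                                 num_features=6, top_labels=1)
top_label = exp.available_labels()[0]
# The highest-weighted words form the resulting explanation (RE) snippet.
print(exp.as_list(label=top_label))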
A recent study sets a benchmark over three representative NLP tasks (sentiment
analysis, textual similarity and reading comprehension) for the interpretability
of both neural models and saliency methods [14], thereby emphasising the
suitability of LIME and IG for downstream NLP tasks.
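The next sketch shows one way IG token attributions could be computed with the
Captum library's LayerIntegratedGradients, again assuming the BERT-style model
and tokenizer from the earlier fine-tuning sketch; the [PAD]-token baseline and
the number of interpolation steps are illustrative choices.

import torch
from captum.attr import LayerIntegratedGradients

def forward_logits(input_ids, attention_mask):
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

post = "Five years now and still no job. I am done with my life."
enc = tokenizer(post, return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]

# Baseline: the same sequence with every non-special token replaced by [PAD].
baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)
baseline_ids[0, 0] = tokenizer.cls_token_id
baseline_ids[0, -1] = tokenizer.sep_token_id

target = forward_logits(input_ids, attention_mask).argmax(dim=-1).item()
# model.bert.embeddings is the embedding layer of a BERT-style encoder;
# the attribute path differs for other architectures.
lig = LayerIntegratedGradients(forward_logits, model.bert.embeddings)
attributions = lig.attribute(input_ids, baselines=baseline_ids,
                             additional_forward_args=(attention_mask,),
                             target=target, n_steps=50)

# Sum over the embedding dimension to get one attribution score per token.
scores = attributions.sum(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
print(sorted(zip(tokens, scores.tolist()), key=lambda t: -abs(t[1]))[:6])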
The explainable methods give output in the form of important words/text segments
which serve as the most important input features. As human-annotated inferences
for causal categorization are available in textual form, we use these inferences
as ground-truth information (text-reference) and the resulting explanations (RE)
as text-observation. Thus, we use two semantic similarity measures to evaluate
the performance of the explainable methods for causal categorization:
cosine similarity and Word Mover's Distance (WMD). Cosine similarity [9]
calculates the similarity between two words, sentences, paragraphs or longer
pieces of text, and derives from the squared Euclidean distance measure used to
compare embedding vectors.
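A minimal sketch of this validation step is given below, assuming pretrained
GloVe vectors loaded through gensim; the chosen vector set, the example
inference and the example explanation keywords are illustrative.

import numpy as np
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")  # illustrative pretrained vectors

def tokens(text):
    # Keep only in-vocabulary words from a lower-cased, whitespace-split text.
    return [w for w in text.lower().split() if w in wv]

def cosine_similarity(reference, observed):
    # Average the word vectors of each text and compare their directions.
    a = np.mean([wv[w] for w in tokens(reference)], axis=0)
    b = np.mean([wv[w] for w in tokens(observed)], axis=0)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

inference = "still no job"          # human-annotated inference (text-reference)
explanation = "job life done"       # top keywords from LIME / IG (RE)
print("cosine similarity:", cosine_similarity(inference, explanation))
print("word mover's distance:", wv.wmdistance(tokens(inference),
                                              tokens(explanation)))

Note that WMD is a distance, so lower values indicate closer agreement between
an explanation and its annotated inference, whereas higher cosine similarity
indicates closer agreement.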