Data-driven Approach to Differentiating between Depression and
Dementia from Noisy Speech and Language Data
Malikeh Ehghaghi1,2, Frank Rudzicz1,2,3,4,5, Jekaterina Novikova1
1Winterlight Labs, Toronto, ON
2Department of Computer Science, University of Toronto, ON
3Vector Institute for Artificial Intelligence, Toronto, ON
4Li Ka Shing Knowledge Institute, St Michael’s Hospital, Toronto, ON
5Surgical Safety Technologies Inc., Toronto, ON
{malikeh,jekaterina}@winterlightlabs.com,{frank}@spoclab.com
Abstract
A significant number of studies apply acoustic
and linguistic characteristics of human speech
as prominent markers of dementia and de-
pression. However, studies on discriminat-
ing depression from dementia are rare. Co-
morbid depression is frequent in dementia and
these clinical conditions share many overlap-
ping symptoms, but the ability to distinguish
between depression and dementia is essential
as depression is often curable. In this work, we
investigate the ability of clustering approaches
in distinguishing between depression and de-
mentia from human speech. We introduce a
novel aggregated dataset, which combines nar-
rative speech data from multiple conditions,
i.e., Alzheimer’s disease, mild cognitive im-
pairment, healthy control, and depression. We
compare linear and non-linear clustering ap-
proaches and show that non-linear clustering
techniques distinguish better between distinct
disease clusters. Our interpretability analysis
shows that the main differentiating symptoms
between dementia and depression are acoustic
abnormality, repetitiveness (or circularity) of
speech, word finding difficulty, coherence im-
pairment, and differences in lexical complex-
ity and richness.
1 Introduction
Depressive disorder and dementia are clinical
conditions that both impose a substantial cost
globally in terms of mortality and morbidity and
have a significant negative impact on social and
economic productivity (Jaeschke et al., 2021).
Distinguishing between these conditions has proven
to be a challenging task (Murray, 2010) as they
frequently co-occur and have many overlapping
symptoms such as apathy (Lee and Lyketsos, 2003),
changes in sleep patterns (Thorpe, 2009), and
concentration issues (Korczyn and Halperin, 2009).
However, depression is generally curable by either
psychotherapy or medication, while dementia is a
neurodegenerative disease, which is caused by
irreversible deterioration of the nervous system.
It is hence crucial to differentiate between these
two conditions (Fraser et al., 2016b).
Previous studies demonstrated that machine
learning methods and speech analysis are useful in
distinguishing dementia from depression (Fraser
et al., 2016b; Murray, 2010). However, the machine
learning methods used in prior studies suffer from
three main limitations:
Firstly, the datasets used in prior literature
comprise only Alzheimer’s disease (AD), healthy
control (HC), and depression (Depr) samples from
senior participants with similar demographic
distributions and recording environments (Fraser
et al., 2016b; Murray, 2010). In real-world
settings, datasets are very noisy because data
collection procedures vary. Additionally, dementia
is not necessarily of the AD type in all cases, and
other types of dementia like mild cognitive
impairment (MCI) can be included.
Secondly, to the best of our knowledge, previous
studies have only used classification approaches
to distinguish AD from HC (Pulido et al., 2020;
Balagopalan et al., 2021; Balagopalan and Novikova,
2021), Depr from HC (Wu et al., 2022), or AD
from Depr (Fraser et al., 2016b) using speech. This
might not be an ideal simulation of the real-world
diagnosis procedure. In clinical diagnosis, the
first step is to detect the symptoms and explore
the pattern changes in patient records before
diagnosing the disease (Regier et al., 2013), while
in classification, we first map the samples to the
disease labels and then apply interpretability
methods to explore the differentiating features
between the classes (Gordon, 1999).
arXiv:2210.03303v1 [cs.CL] 7 Oct 2022
Lastly, prior studies demonstrated that acoustic
and linguistic features extracted from spontaneous
speech provide valuable indicators of both mental
disorders such as depression (Low et al., 2020) and
cognitive impairment like AD or MCI (Fraser et al.,
2016a; Boschi et al., 2017). However, they did
not reach a strong conclusion about the main
speech-based symptoms that distinguish dementia
from depression (Fraser et al., 2016b).
To address the first limitation, we generate a
novel aggregated dataset, which combines several
speech datasets comprising AD, MCI, HC, and
Depr labels with a variety of data collection pro-
cedures. To address the second and third limita-
tions, we introduce a novel approach, which applies
clustering techniques to inspect what data-driven
feature categories (symptoms) are the main differ-
entiators between AD, MCI, Depr, and HC sam-
ples. We then use the distinguishing symptoms as
a feature selection technique to classify AD, MCI,
and Depr. Our key findings indicate that 1) the
non-linear clustering approaches outperform the
linear techniques in terms of separability level of
distinct disease clusters; 2) acoustic abnormalities,
variations in lexical complexity and richness, repet-
itiveness (or circularity) of speech, word finding
difficulty, and coherence impairment are the main
differentiating symptoms to distinguish between
different types of dementia (e.g., AD and MCI),
and Depr; 3) data-driven differentiators are able to
substantially improve performance of classification
across diseases.
2 Related Work
There has been a substantial number of studies on
detecting either dementia (e.g., MCI or AD) or
depression from spontaneous speech. However,
little has been done to distinguish dementia from
depression using discourse patterns.
To discriminate dementia from depression,
Fraser et al. (2016b) used speech data from the
Pitt corpus in the DementiaBank database (Becker
et al., 1994), elicited from elderly participants
through a picture description task, with ‘Cookie
Theft’ (Goodglass et al., 2001) used as the picture.
The samples were labeled as either AD or HC based
on a personal history and a neuropsychological
assessment battery (Iverson et al., 2008). A subset
of the samples was labeled as depressed or
non-depressed based on the established threshold on
Hamilton Depression Rating Scale (HAM-D) test
scores (Bagby et al., 2004). To explore the
distinguishing discourse patterns between AD and
Depr, Murray (2010) collected a speech dataset of
elderly participants (with Depr, AD, or HC labels)
who completed a picture description task, with
Norman Rockwell’s painting ‘The Soldier’ used as
the picture. Samples with Depr were diagnosed based
on DSM-IV criteria (Frances et al., 1995) and
samples with AD met NINCDS-ADRDA criteria (Tierney
et al., 1988) for probable AD. The datasets used in
these studies did not include other types of
dementia such as MCI, and all of their samples
followed the same data collection procedure. In
contrast, we create an aggregated dataset, which
consists of AD, MCI, HC, and Depr samples from
different speech datasets with various data
collection procedures.
Murray (2010) examined whether elderly indi-
viduals with depression can be distinguished from
those at early stages of AD through distinct patterns
in narrative speech. Based on their findings, indi-
viduals with AD generated less informative speech
compared to the depressed patients in their pic-
ture descriptions, while there were no significant
differences in the informativeness of the narratives
between HC and Depr samples. Furthermore, quan-
titative and syntactic measures of discourse did not
differ across the three groups. However, Murray
(2010) did not attempt to make predictions using
the data.
Fraser et al. (2016b) investigated whether
automated AD screening tools misclassify
cognitively healthy participants with Depr as AD
when using narrative speech. They also used
linguistic and acoustic features to distinguish
non-depressed AD subjects from those with comorbid
depression, using speech elicited through a picture
description task. In their study, they compared
logistic regression (LR) with support vector
machine (SVM) classification models. Their
performance in distinguishing between depressed and
non-depressed AD samples was moderate (accuracy =
0.658) due to a wide range of overlapping symptoms.
In addition, they only applied classification
approaches and did not derive the most informative
features discriminating between AD patients with
and without depression. In the present work, we
apply clustering approaches to cluster the diseases
based on the similarities in the discourse
patterns, and apply interpretability techniques to
explore the distinguishing feature categories
(symptoms) between distinct diagnosis labels (i.e.,
HC, AD, MCI, and Depr). We use the differentiating
symptoms as a feature selection technique to
classify the diseases.
3 Methods
3.1 Dataset
In this paper, we generated an aggregated
superset of the datasets listed in Table 1 that
contains speech recordings of English-speaking
participants describing pictures. All the audio
recordings were manually transcribed by trained
transcriptionists, using the CHAT protocol and
annotations (MacWhinney, 2014).
Dataset                               AD     MCI    Depr   HC
DementiaBank (Becker et al., 1994)    178    138    0      229
Healthy Aging                         0      214    0      211
ADReSS (Luz et al., 2020)             54     0      0      54
DEPAC+ (Tasnim et al., 2022)          0      0      222    532
AD Clinical Trial                     1616   0      0      0
Aggregated dataset                    1848   352    222    1026
Table 1: Speech datasets used. For each dataset, the
number of samples with each diagnosis label is
reported in the following columns.
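The aggregated row of Table 1 can be checked by summing the per-dataset counts. The snippet below is a minimal sketch for illustration only: the counts are copied from Table 1, while the dictionary layout and the helper name `aggregate_counts` are our own assumptions, not part of the original pipeline.

```python
# Per-dataset sample counts copied from Table 1 (labels: AD, MCI, Depr, HC).
TABLE_1 = {
    "DementiaBank": {"AD": 178, "MCI": 138, "Depr": 0, "HC": 229},
    "Healthy Aging": {"AD": 0, "MCI": 214, "Depr": 0, "HC": 211},
    "ADReSS": {"AD": 54, "MCI": 0, "Depr": 0, "HC": 54},
    "DEPAC+": {"AD": 0, "MCI": 0, "Depr": 222, "HC": 532},
    "AD Clinical Trial": {"AD": 1616, "MCI": 0, "Depr": 0, "HC": 0},
}


def aggregate_counts(table):
    """Sum the per-dataset counts for each diagnosis label (hypothetical helper)."""
    totals = {}
    for counts in table.values():
        for label, n in counts.items():
            totals[label] = totals.get(label, 0) + n
    return totals


totals = aggregate_counts(TABLE_1)
print(totals)
```

The resulting totals make the class imbalance visible: AD samples outnumber Depr samples by roughly eight to one.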
DementiaBank (Becker et al., 1994) and ADReSS
(Luz et al., 2020) are datasets of pathological
speech elicited from participants through a picture
description task, with ‘Cookie Theft’ (Goodglass
et al., 2001) used as the picture. The recordings
are labeled as AD, MCI, and HC.
Healthy Aging is a dataset of speech elicited
from community volunteers through a picture
description task, with the proprietary images
‘Family in the Kitchen’, ‘Man in the Living Room’,
‘Food Market’, ‘Picnic’, ‘Grandmother’s Birthday’,
and ‘Romantic Dinner’. The recordings are labeled
as possible HC and MCI. Soft labels are based on
the established threshold on the Montreal Cognitive
Assessment (Nasreddine et al., 2005) screening tool.
DEPAC+ is the extended version of the DEPAC
(Tasnim et al., 2022) dataset, with more samples
collected using the same data collection procedure.
This is a dataset of narrative speech elicited from
participants through a picture description task,
with ‘Family in the Kitchen’ and ‘Man Falling’
images. The recordings are labeled as HC and Depr.
Soft labels are based on the established threshold
on Patient Health Questionnaire-9 (PHQ-9) (Kroenke
et al., 2001) test scores.¹
¹ The participants with a PHQ-9 score ≤ 9 were
labeled as HC, and the remaining samples with a
PHQ-9 score ≥ 10 met criteria for symptoms of
depression.
AD Clinical Trial is a dataset of speech
recordings from the baseline and screening visits
of a clinical trial, elicited from participants
through a picture description task, with ‘Family in
the Kitchen’, ‘Man in the Living Room’,
‘Grandmother’s Birthday’, ‘Romantic Dinner’, and
‘Cookie Theft’ (Goodglass et al., 2001) images. All
the recordings are labeled as AD according to the
National Institute on Aging/Alzheimer’s Association
criteria (Frisoni et al., 2011).
All images other than ‘Cookie Theft’ (Goodglass
et al., 2001) were designed to match the ‘Cookie
Theft’ picture in style and the amount of
information content units according to picture
design principles described by Patel and Connaghan
(2014).
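The PHQ-9 soft-labeling rule described for DEPAC+ can be sketched as a simple threshold. This is a hedged illustration rather than the authors' code: it assumes the footnote's cut-off reads as "a score of 9 or below is HC, a score of 10 or above is Depr", and the function name `phq9_soft_label` and the range check (PHQ-9 totals run from 0 to 27) are our additions.

```python
# Assumed reading of the footnote: PHQ-9 >= 10 -> Depr, otherwise HC.
PHQ9_DEPRESSION_CUTOFF = 10


def phq9_soft_label(score):
    """Map a PHQ-9 total score to a soft diagnosis label (hypothetical helper)."""
    if not 0 <= score <= 27:
        # Nine items scored 0-3 each, so valid totals lie in [0, 27].
        raise ValueError(f"PHQ-9 total out of range: {score}")
    return "Depr" if score >= PHQ9_DEPRESSION_CUTOFF else "HC"


labels = [phq9_soft_label(s) for s in (0, 9, 10, 27)]
print(labels)
```

Because the label depends on a single self-reported score, samples near the cut-off are the ones most likely to be noisy, which is consistent with the paper calling these "soft" labels.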
3.2 Feature Extraction
We extracted 220 acoustic features from audio, and
325 linguistic features from the associated tran-
scripts. These features were classified into the fol-
lowing categories (the full list is in Appendix A):
Acoustic: This category includes spectral and
voicing-related features (e.g., Mel-Frequency
Cepstral Coefficients (MFCC) (Rudzicz et al., 2012),
fundamental frequency (F0), or statistical
functionals of Zero-Crossing Rate (ZCR) (Kulkarni,
2018)) describing the acoustic properties of the
sound wave.
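As an illustration of what "statistical functionals of ZCR" can mean, the sketch below computes a per-frame zero-crossing rate and then summarizes it with a mean and standard deviation. It is a minimal pure-Python example under our own assumptions: the frame length, hop size, choice of functionals, and function names are hypothetical and are not taken from the paper's feature extractor.

```python
from statistics import mean, pstdev


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def zcr_functionals(signal, frame_len, hop):
    """Mean and population std of ZCR over fixed-length frames."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    rates = [
        zero_crossing_rate(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
    return {"mean": mean(rates), "std": pstdev(rates)}


# Toy signal: one rapidly alternating frame followed by one constant frame.
toy = [1, -1, 1, -1, 1, 1, 1, 1]
stats = zcr_functionals(toy, frame_len=4, hop=4)
print(stats)
```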
Syntactic Complexity: This category comprises
variables like the frequencies of various
production rules from the constituency parsing tree
of the transcripts (Chae and Nenkova, 2009), or
Lu’s syntactic complexity features (Lu, 2010)
enumerating the rate of usage of different
syntactic structures.
Discourse Mapping: This category consists of
features such as utterance distances, or
speech-graph features (Mota et al., 2012) like
graph density (Mirheidari et al., 2018) to
calculate the repetitiveness or circularity of
speech.
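To make the idea of graph density concrete, the following sketch builds a simplified directed word graph (one node per distinct word, one edge per distinct transition between consecutive words) and computes its density. This is an assumption-laden simplification for illustration: the cited speech-graph work may define nodes, edges, and density differently, and the function name is hypothetical.

```python
def speech_graph_density(tokens):
    """Density of a simple directed word graph built from a token sequence.

    Nodes are distinct tokens; a directed edge (a, b) exists when b
    immediately follows a somewhere in the sequence. Self-loops are ignored.
    """
    nodes = set(tokens)
    edges = {(a, b) for a, b in zip(tokens, tokens[1:]) if a != b}
    n = len(nodes)
    if n < 2:
        return 0.0
    # A directed graph without self-loops has at most n * (n - 1) edges.
    return len(edges) / (n * (n - 1))


circular = speech_graph_density("a b a b".split())
linear = speech_graph_density("a b c".split())
print(circular, linear)
```

In this toy setting, a speaker who keeps returning to the same words in both directions yields a denser graph than one who moves through new words linearly.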
Lexical Complexity and Richness: This category
accounts for variables like frequency of words, or
measures of vocabulary diversity such as type-token
ratio (Richards, 1987) describing the lexical
complexity and vocabulary richness of the
transcripts.
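The type-token ratio mentioned above is the number of distinct words (types) divided by the total number of words (tokens). The sketch below is a minimal illustration; the lowercase regex tokenizer is our assumption and is much cruder than what a CHAT-transcript pipeline would use.

```python
import re


def type_token_ratio(transcript):
    """Distinct word count divided by total word count (0.0 for empty input)."""
    # Naive tokenizer (assumption): lowercase runs of ASCII letters/apostrophes.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


ttr = type_token_ratio("The cat saw the dog")
print(ttr)
```

Note that the ratio tends to fall as transcripts get longer, so comparing it across picture descriptions of very different lengths needs care.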
Information Content Units: This category
includes variables such as the number of objects,
subjects, locations, and actions used to measure
the number of items correctly named in the picture
description task, previously found to be associated
with memory impairment (Croisile et al., 1996).
Sentiment: This category contains features such