and linguistic features extracted from spontaneous
speech provide valuable indicators of both mental
disorders such as depression (Low et al., 2020) and
cognitive impairment like AD or MCI (Fraser et al.,
2016a; Boschi et al., 2017). However, they did
not derive a strong conclusion about the main
distinguishing speech-based symptoms in classifying
dementia from depression (Fraser et al., 2016b).
To address the first limitation, we generate a
novel aggregated dataset, which combines several
speech datasets comprising AD, MCI, HC, and
Depr labels with a variety of data collection
procedures. To address the second and third
limitations, we introduce a novel approach, which applies
clustering techniques to inspect which data-driven
feature categories (symptoms) are the main
differentiators between AD, MCI, Depr, and HC
samples. We then use the distinguishing symptoms as
the basis for feature selection to classify AD, MCI,
and Depr. Our key findings indicate that 1) the
non-linear clustering approaches outperform the
linear techniques in terms of the separability of
distinct disease clusters; 2) acoustic abnormalities,
variations in lexical complexity and richness,
repetitiveness (or circularity) of speech, word-finding
difficulty, and coherence impairment are the main
symptoms differentiating between
different types of dementia (e.g., AD and MCI)
and Depr; 3) data-driven differentiators
substantially improve classification performance
across diseases.
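The idea of ranking feature categories by how well they separate diagnosis groups can be illustrated with a minimal sketch. It is not the clustering or interpretability pipeline used in this work: it substitutes the silhouette coefficient as a simple separability measure, and the feature categories ("acoustic", "lexical", "noise"), the feature values, and the function names are all hypothetical, chosen only for illustration.

```python
from math import dist
from statistics import mean


def silhouette(points, labels):
    """Mean silhouette coefficient of `points` grouped by `labels`.

    Assumes at least two distinct labels are present.
    """
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) < 2:
            scores.append(0.0)  # common convention for singleton groups
            continue
        # a: mean distance to the other members of the same group
        # (the distance from p to itself is 0, so divide by len - 1)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other group
        b = min(
            mean(dist(p, q) for q in other)
            for other_lab, other in groups.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


def rank_categories(samples, labels, categories):
    """Rank feature categories by how well each one alone separates the labels."""
    scored = []
    for name, idx in categories.items():
        sub = [tuple(s[i] for i in idx) for s in samples]
        scored.append((silhouette(sub, labels), name))
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical toy data: one column per feature, invented values.
samples = [
    (0.9, 0.30, 0.50), (0.8, 0.32, 0.10),   # AD
    (0.2, 0.70, 0.40), (0.3, 0.68, 0.20),   # HC
    (0.5, 0.50, 0.45), (0.6, 0.52, 0.15),   # Depr
]
labels = ["AD", "AD", "HC", "HC", "Depr", "Depr"]
# Hypothetical mapping from category name to feature column indices.
categories = {"acoustic": [0], "lexical": [1], "noise": [2]}

ranking = rank_categories(samples, labels, categories)
print(ranking)
```

On this toy data the uninformative "noise" category ranks last, so a selection step that keeps only the top-ranked categories would drop it before classification.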
2 Related Work
There has been a substantial number of studies on
detecting either dementia (e.g., MCI or AD) or
depression from spontaneous speech. However,
little has been done to distinguish dementia from
depression using discourse patterns.
To discriminate dementia from depression,
Fraser et al. (2016b) used speech data from the
Pitt corpus in the DementiaBank database (Becker
et al., 1994), elicited from elderly participants
through a picture description task, with ‘Cookie
Theft’ (Goodglass et al., 2001) used as the picture.
The samples were labeled as either AD or HC based
on a personal history and a neuropsychological
assessment battery (Iverson et al., 2008). A subset
of the samples were labeled as depressed or
non-depressed based on the established threshold on
the Hamilton Depression Rating Scale (HAM-D) test
scores (Bagby et al., 2004). To explore the
distinguishing discourse patterns between AD and Depr,
Murray (2010) collected a speech dataset of elderly
participants (with Depr, AD, or HC labels) who
completed a picture description task, with Norman
Rockwell’s painting ‘The Soldier’ used as the picture.
Samples with Depr were diagnosed based on DSM-IV
criteria (Frances et al., 1995) and samples with
AD met NINCDS-ADRDA criteria (Tierney et al.,
1988) for probable AD. The datasets used in these
studies did not include other types of dementia such
as MCI, and all of their samples followed the same
data collection procedure, whereas we create an
aggregated dataset, which consists of AD, MCI, HC,
and Depr samples from different speech datasets
with various data collection procedures.
Murray (2010) examined whether elderly
individuals with depression can be distinguished from
those at early stages of AD through distinct patterns
in narrative speech. Based on their findings,
individuals with AD generated less informative speech
compared to the depressed patients in their
picture descriptions, while there were no significant
differences in the informativeness of the narratives
between HC and Depr samples. Furthermore,
quantitative and syntactic measures of discourse did not
differ across the three groups. However, Murray
(2010) did not attempt to make predictions using
the data.
Fraser et al. (2016b) investigated whether
automated AD screening tools misclassify cognitively
healthy participants with Depr as AD when using
narrative speech. They also used linguistic and
acoustic features to distinguish non-depressed AD
subjects from those with comorbid depression, using
speech elicited through a picture description task. In
their study, they compared logistic regression (LR)
and support vector machine (SVM) classification
models. Their performance in distinguishing
between depressed and non-depressed AD
samples was moderate (accuracy = 0.658) due to a
wide range of overlapping symptoms. In
addition, they only applied classification approaches
and did not derive the most informative
features discriminating between AD patients with and
without depression. In the present work, we apply
clustering approaches to cluster the diseases based
on the similarities in the discourse patterns, and
apply interpretability techniques to explore the
distinguishing feature categories (symptoms) between
distinct diagnosis labels (i.e., HC, AD, MCI, and
Depr). We use the differentiating symptoms as a