Data-driven Approach to Differentiating between Depression and Dementia from Noisy Speech and Language Data

Malikeh Ehghaghi (1,2), Frank Rudzicz (1,2,3,4,5), Jekaterina Novikova (1)

(1) Winterlight Labs, Toronto, ON
(2) Department of Computer Science, University of Toronto, ON
(3) Vector Institute for Artificial Intelligence, Toronto, ON
(4) Li Ka Shing Knowledge Institute, St Michael’s Hospital, Toronto, ON
(5) Surgical Safety Technologies Inc., Toronto, ON

{malikeh,jekaterina}@winterlightlabs.com, {frank}@spoclab.com

Abstract

A significant number of studies use acoustic and linguistic characteristics of human speech as prominent markers of dementia and depression. However, studies on discriminating depression from dementia are rare. Comorbid depression is frequent in dementia, and these clinical conditions share many overlapping symptoms, but the ability to distinguish between depression and dementia is essential as depression is often curable. In this work, we investigate the ability of clustering approaches to distinguish between depression and dementia from human speech. We introduce a novel aggregated dataset, which combines narrative speech data from multiple conditions, i.e., Alzheimer’s disease, mild cognitive impairment, healthy control, and depression. We compare linear and non-linear clustering approaches and show that non-linear clustering techniques distinguish better between distinct disease clusters. Our interpretability analysis shows that the main differentiating symptoms between dementia and depression are acoustic abnormality, repetitiveness (or circularity) of speech, word finding difficulty, coherence impairment, and differences in lexical complexity and richness.

1 Introduction

Depressive disorder and dementia are clinical conditions that both impose a substantial cost globally in terms of mortality and morbidity and have a significant negative impact on social and economic productivity (Jaeschke et al., 2021). Distinguishing between these conditions has proven to be a challenging task (Murray, 2010) as they frequently co-occur and have many overlapping symptoms such as apathy (Lee and Lyketsos, 2003), changes in sleep patterns (Thorpe, 2009), and concentration issues (Korczyn and Halperin, 2009). However, depression is generally curable by either psychotherapy or medication, while dementia is a neurodegenerative disease, which is caused by irreversible deterioration of the nervous system. It is hence crucial to differentiate between these two conditions (Fraser et al., 2016b).

Previous studies demonstrated that machine learning methods and speech analysis are useful in distinguishing dementia from depression (Fraser et al., 2016b; Murray, 2010). However, the machine learning methods used in prior studies suffer from three main limitations:

Firstly, the datasets used in prior literature only comprise Alzheimer’s disease (AD), healthy control (HC), and depression (Depr) samples of senior participants with similar demographic distributions and recording environments (Fraser et al., 2016b; Murray, 2010). In real-world settings, the datasets are very noisy due to variations in the data collection procedures. Additionally, dementia is not necessarily of the AD type in all cases, and other types of dementia like mild cognitive impairment (MCI) can be included.

Secondly, to the best of our knowledge, previous studies have only used classification approaches to distinguish AD from HC (Pulido et al., 2020; Balagopalan et al., 2021; Balagopalan and Novikova, 2021), Depr from HC (Wu et al., 2022), or AD from Depr (Fraser et al., 2016b) using speech. This might not be an ideal simulation of the real-world diagnosis procedure. In clinical diagnosis, the first step is to detect the symptoms and explore the pattern changes in patient records before diagnosing the disease (Regier et al., 2013), while in classification, we first map the samples to the disease labels and then apply interpretability methods to explore the differentiating features between the classes (Gordon, 1999).

Lastly, prior studies demonstrated that acoustic and linguistic features extracted from spontaneous speech provide valuable indicators of both mental disorders such as depression (Low et al., 2020) and cognitive impairment like AD or MCI (Fraser et al., 2016a; Boschi et al., 2017). However, they did not derive a strong conclusion about the main distinguishing speech-based symptoms in classifying dementia from depression (Fraser et al., 2016b).

arXiv:2210.03303v1 [cs.CL] 7 Oct 2022

To address the first limitation, we generate a novel aggregated dataset, which combines several speech datasets comprising AD, MCI, HC, and Depr labels with a variety of data collection procedures. To address the second and third limitations, we introduce a novel approach, which applies clustering techniques to inspect what data-driven feature categories (symptoms) are the main differentiators between AD, MCI, Depr, and HC samples. We then use the distinguishing symptoms as a feature selection technique to classify AD, MCI, and Depr. Our key findings indicate that 1) the non-linear clustering approaches outperform the linear techniques in separating distinct disease clusters; 2) acoustic abnormalities, variations in lexical complexity and richness, repetitiveness (or circularity) of speech, word finding difficulty, and coherence impairment are the main differentiating symptoms to distinguish between different types of dementia (e.g., AD and MCI), and Depr; 3) data-driven differentiators are able to substantially improve performance of classification across diseases.

2 Related Work

There has been a substantial number of studies on detecting either dementia (e.g., MCI or AD) or depression from spontaneous speech. However, little has been done to distinguish dementia from depression using discourse patterns.

To discriminate dementia from depression, Fraser et al. (2016b) applied speech data from the Pitt corpus in the DementiaBank database (Becker et al., 1994), elicited from elderly participants through a picture description task, with ‘Cookie Theft’ (Goodglass et al., 2001) used as a picture. The samples were labeled as either AD or HC based on a personal history and a neuropsychological assessment battery (Iverson et al., 2008). A subset of the samples were labeled as depressed or non-depressed based on the established threshold on the Hamilton Depression Rating Scale (HAM-D) test scores (Bagby et al., 2004). To explore the distinguishing discourse patterns between AD and Depr, Murray (2010) collected a speech dataset of elderly participants (with Depr, AD, or HC labels) who completed a picture description task, with Norman Rockwell’s painting ‘The Soldier’ used as a picture. Samples with Depr were diagnosed based on DSM-IV criteria (Frances et al., 1995) and samples with AD met NINCDS-ADRDA criteria (Tierney et al., 1988) for probable AD. The datasets used in these studies did not include other types of dementia such as MCI, and all of their samples followed the same data collection procedure. In contrast, we create an aggregated dataset, which consists of AD, MCI, HC, and Depr samples from different speech datasets with various data collection procedures.

Murray (2010) examined whether elderly individuals with depression can be distinguished from those at early stages of AD through distinct patterns in narrative speech. Based on their findings, individuals with AD generated less informative speech compared to the depressed patients in their picture descriptions, while there were no significant differences in the informativeness of the narratives between HC and Depr samples. Furthermore, quantitative and syntactic measures of discourse did not differ across the three groups. However, Murray (2010) did not attempt to make predictions using the data.

Fraser et al. (2016b) investigated whether automated AD screening tools misclassify cognitively healthy participants with Depr as AD when using narrative speech. They also used linguistic and acoustic features to distinguish non-depressed AD subjects from those with comorbid depression, based on speech elicited through a picture description task. In their study, they compared logistic regression (LR) with support vector machine (SVM) classification models. Their performance in distinguishing between depressed and non-depressed AD samples was moderate (accuracy = 0.658) due to a wide range of overlapping symptoms. In addition, they only applied classification approaches and did not derive the most informative features discriminating between AD patients with and without depression. In the present work, we apply clustering approaches to cluster the diseases based on the similarities in the discourse patterns, and apply interpretability techniques to explore the distinguishing feature categories (symptoms) between distinct diagnosis labels (i.e., HC, AD, MCI, and Depr). We use the differentiating symptoms as a feature selection technique to classify the diseases.

3 Methods

3.1 Dataset

In this paper, we generated an aggregated superset of the datasets listed in Table 1 that contains speech recordings of English-speaking participants describing pictures. All the audio recordings were manually transcribed by trained transcriptionists, using the CHAT protocol and annotations (MacWhinney, 2014).

Dataset                              AD    MCI   Depr  HC
DementiaBank (Becker et al., 1994)   178   138   0     229
Healthy Aging                        0     214   0     211
ADReSS (Luz et al., 2020)            54    0     0     54
DEPAC+ (Tasnim et al., 2022)         0     0     222   532
AD Clinical Trial                    1616  0     0     0
Aggregated dataset                   1848  352   222   1026

Table 1: Speech datasets used. For each dataset, the number of samples with each diagnosis label is reported in the following columns.
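The aggregated row of Table 1 can be reproduced by summing the per-dataset counts column by column. The following is a minimal sketch of that arithmetic; the dictionary layout and the function name `aggregate_counts` are illustrative assumptions, and only the numbers come from the table.

```python
# Per-dataset sample counts copied from Table 1, in the order AD, MCI, Depr, HC.
LABELS = ("AD", "MCI", "Depr", "HC")
TABLE_1 = {
    "DementiaBank": (178, 138, 0, 229),
    "Healthy Aging": (0, 214, 0, 211),
    "ADReSS": (54, 0, 0, 54),
    "DEPAC+": (0, 0, 222, 532),
    "AD Clinical Trial": (1616, 0, 0, 0),
}


def aggregate_counts(table):
    """Sum the per-dataset counts column by column (hypothetical helper)."""
    totals = [sum(column) for column in zip(*table.values())]
    return dict(zip(LABELS, totals))


totals = aggregate_counts(TABLE_1)
print(totals)
```

The totals also make the label imbalance visible: AD samples account for a little over half of the aggregated data, while Depr is the smallest group.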

DementiaBank (Becker et al., 1994) and ADReSS (Luz et al., 2020) are datasets of pathological speech elicited from participants through a picture description task, with ‘Cookie Theft’ (Goodglass et al., 2001) used as a picture. The recordings are labeled as AD, MCI, and HC.

Healthy Aging is a dataset of speech elicited from community volunteers through a picture description task, with the proprietary images ‘Family in the Kitchen’, ‘Man in the Living Room’, ‘Food Market’, ‘Picnic’, ‘Grandmother’s Birthday’, and ‘Romantic Dinner’. The recordings are labeled as possible HC and MCI. Soft labels are based on the established threshold on the Montreal Cognitive Assessment (Nasreddine et al., 2005) screening tool.

DEPAC+ is the extended version of the DEPAC (Tasnim et al., 2022) dataset, with more samples collected using the same data collection procedure. This is a dataset of narrative speech elicited from participants through a picture description task, with ‘Family in the Kitchen’ and ‘Man Falling’ images. The recordings are labeled as HC and Depr. Soft labels are based on the established threshold on Patient Health Questionnaire-9 (PHQ-9) (Kroenke et al., 2001) test scores (see footnote 1).

AD Clinical Trial is a dataset of speech recordings from the baseline and screening visits of a clinical trial, elicited from participants through a picture description task, with ‘Family in the Kitchen’, ‘Man in the Living Room’, ‘Grandmother’s Birthday’, ‘Romantic Dinner’, and ‘Cookie Theft’ (Goodglass et al., 2001) images. All the recordings are labeled as AD according to the National Institute on Aging/Alzheimer’s Association criteria (Frisoni et al., 2011).

Footnote 1: The participants with a PHQ-9 score ≤ 9 were labeled as HC, and the remaining samples with a PHQ-9 score ≥ 10 met criteria for symptoms of depression.
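The PHQ-9 soft-labeling rule stated for DEPAC+ (a score of 9 or below is HC, 10 or above is Depr) can be written as a small function. This is a sketch under assumptions: the function name `phq9_soft_label` is hypothetical, and the range check relies on the standard PHQ-9 total score range of 0 to 27, which the text itself does not state.

```python
def phq9_soft_label(score: int) -> str:
    """Map a PHQ-9 total score to the soft label used for DEPAC+.

    Scores of 9 or below map to "HC"; scores of 10 or above map to "Depr".
    The 0-27 bound is the standard PHQ-9 range (an assumption added here).
    """
    if not 0 <= score <= 27:
        raise ValueError(f"PHQ-9 total score out of range: {score}")
    return "HC" if score <= 9 else "Depr"


# Labels for a few illustrative (made-up) scores.
example_labels = [phq9_soft_label(s) for s in (0, 9, 10, 27)]
print(example_labels)
```

Because the label comes from a self-report screening threshold and not from a clinical diagnosis, it is best read as a soft label, as the text notes.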

All images other than ‘Cookie Theft’ (Goodglass et al., 2001) were designed to match the ‘Cookie Theft’ picture in style and the amount of information content units according to picture design principles described by Patel and Connaghan (2014).

3.2 Feature Extraction

We extracted 220 acoustic features from audio, and 325 linguistic features from the associated transcripts. These features were classified into the following categories (the full list is in Appendix A):

Acoustic: This category includes spectral and voicing-related features (e.g., Mel-Frequency Cepstral Coefficients (MFCC) (Rudzicz et al., 2012), fundamental frequency (F0), or statistical functionals of Zero-Crossing Rate (ZCR) (Kulkarni, 2018)) describing the acoustic properties of the sound wave.
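As an illustration of what "statistical functionals of ZCR" can mean, the sketch below computes a frame-level zero-crossing rate and summarizes it with its mean and standard deviation. The frame length, hop length, choice of functionals, and function names are all assumptions made for this example; the paper's actual extraction settings are not given in this section.

```python
import math
from statistics import mean, pstdev


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def zcr_functionals(signal, frame_length=400, hop_length=160):
    """Mean and standard deviation of the frame-level ZCR.

    frame_length and hop_length are in samples; the defaults (25 ms and
    10 ms at 16 kHz) are illustrative assumptions, not values from the paper.
    """
    starts = range(0, len(signal) - frame_length + 1, hop_length)
    rates = [zero_crossing_rate(signal[s:s + frame_length]) for s in starts]
    if not rates:
        raise ValueError("signal is shorter than one frame")
    return {"mean": mean(rates), "std": pstdev(rates)}


# Synthetic one-second, 100 Hz sine wave sampled at 16 kHz (toy input).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(sample_rate)]
stats = zcr_functionals(tone)
print(stats)
```

A 100 Hz tone crosses zero about 200 times per second, so the mean rate per sample pair is close to 200 / 16000.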

Syntactic Complexity: This category comprises variables like the frequencies of various production rules from the constituency parsing tree of the transcripts (Chae and Nenkova, 2009), or Lu’s syntactic complexity features (Lu, 2010) enumerating the rate of usage of different syntactic structures.

Discourse Mapping: This category consists of features such as utterance distances, or speech-graph features (Mota et al., 2012) like graph density (Mirheidari et al., 2018) to calculate the repetitiveness or circularity of speech.
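To make the speech-graph idea concrete, the sketch below builds a directed word graph from a token sequence (one node per distinct word, one edge per distinct consecutive word pair) and computes its density as edges divided by n(n - 1). This is one common construction, assumed here for illustration; the exact graph definition and density formula used by the authors are not given in this section, and the function names are hypothetical.

```python
def speech_graph(tokens):
    """Build a directed word graph: nodes are distinct words, edges link
    each word to the word that follows it. Self-loops are ignored."""
    nodes = set(tokens)
    edges = {(a, b) for a, b in zip(tokens, tokens[1:]) if a != b}
    return nodes, edges


def graph_density(nodes, edges):
    """Directed graph density: observed edges over n * (n - 1) possible edges."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1))


# Toy transcript fragment (made up for this example).
tokens = "the boy takes the cookie and the boy falls".split()
nodes, edges = speech_graph(tokens)
density = graph_density(nodes, edges)
print(len(nodes), len(edges), density)
```

In this toy fragment, returning to "the" and "boy" adds edges among words already in the graph without adding new nodes, which is the kind of revisiting that graph-based measures are meant to capture.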

Lexical Complexity and Richness: This category accounts for the variables like frequency of words, or measures of vocabulary diversity such as type-token ratio (Richards, 1987) describing the lexical complexity and vocabulary richness of the transcripts.
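The type-token ratio is the number of distinct word forms (types) divided by the total number of words (tokens). A minimal sketch follows; the lowercase regular-expression tokenizer is an assumption made for this example and is much simpler than what a transcript-processing pipeline would normally use.

```python
import re


def type_token_ratio(text):
    """Distinct word forms divided by total word count (0.0 for empty text)."""
    # Simplistic tokenizer (an assumption): lowercase runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Five tokens, four distinct types ("the" appears twice).
ttr = type_token_ratio("The boy takes the cookie")
print(ttr)
```

Type-token ratio tends to fall as a text gets longer, so values are most comparable between transcripts of similar length.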

Information Content Units: This category includes variables such as the number of objects, subjects, locations, and actions, used to measure the number of items correctly named in the picture description task, which was previously found to be associated with memory impairment (Croisile et al., 1996).

Sentiment: This category contains features such
