
Sparse Dynamical Features generation, application to Parkinson’s Disease diagnosis

Houssem Meghnoudj (a,∗), Bogdan Robu (a), Mazen Alamir (a)

(a) Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France

Abstract

In this study, we focus on the diagnosis of Parkinson’s Disease (PD) based on electroencephalogram (EEG) signals. We propose a new approach, inspired by the functioning of the brain, that uses the dynamics, frequency and temporal content of EEGs to extract new features that discriminate the disease. The method was evaluated on a publicly available dataset containing EEG signals recorded during a 3-oddball auditory task involving N = 50 subjects, 25 of whom suffer from PD. By extracting two features and separating them with a straight line using a Linear Discriminant Analysis (LDA) classifier, we can distinguish healthy from unhealthy subjects with an accuracy of 90 % (p < 0.03) using a single channel. By aggregating the information from three channels through majority voting, we obtain an accuracy of 94 %, a sensitivity of 96 % and a specificity of 92 %. The evaluation was carried out using a nested Leave-One-Out cross-validation procedure, which prevents data leakage and gives a less biased estimate of performance. Several tests were carried out to assess the validity and robustness of our approach, including one in which only half of the available data is used for training. Under this constraint, the model achieves an accuracy of 83.8 %.

Keywords: Parkinson’s Disease, Dynamical system, Electroencephalogram, Sparse features, Machine Learning

Highlights

• Features derived from the dynamical, frequency and temporal content of EEGs are relevant biomarkers for the diagnosis of PD.

• Two explainable features are sufficient for an LDA model to achieve a classification accuracy of 94 %.

• A few features combined with simple classifiers are more suitable for practical use, and are more explainable and trustworthy.

1. Introduction

Parkinson’s Disease (PD) is a chronic neurodegenerative disorder affecting more than 6 million people worldwide, as reported by the World Health Organization (WHO, 2006; Dorsey et al., 2018). It is primarily caused by a lack of dopamine in the brain, due to the slow death of dopaminergic cells (WHO, 2006; Balestrino and Schapira, 2020). PD is best known to the general public for its motor symptoms, such as tremor at rest, rigidity, bradykinesia and akinesia (WHO, 2006; Balestrino and Schapira, 2020). However, non-motor symptoms may accompany or precede the onset of motor symptoms, sometimes by as much as 20 years (Chaudhuri et al., 2005; Kalia and Lang, 2015). Non-motor symptoms affect patients on a daily basis and are varied: pain, fatigue, sleep disturbances, bradyphrenia and communication issues, to cite only a few (Pfeiffer, 2016; Witjas et al., 2002).

The diagnosis of PD is entirely clinical and is usually based on the manifestation of motor symptoms (Berardelli et al., 2013). Since patients already suffer before these symptoms appear, there is an urgent need for new biomarkers that allow the early diagnosis of PD. The diagnosis of PD is generally based either on pathological examination or on clinical diagnostic criteria, such as those of the United Kingdom PD Society Brain Bank (Gibb and Lees, 1988). The overall clinical diagnostic accuracy is of the order of 75 % according to the World Health Organization (WHO, 2006), or around 79 % according to Rizzo et al. (2016). Importantly, clinical diagnostic accuracy has not improved significantly in recent years, particularly in the early stages of the disease, where the response to dopaminergic treatment is less clear and less prominent (Rizzo et al., 2016). In early disease (< 5 years of disease duration), the clinical diagnostic accuracy is around 53 %, and it drops to around 26 % for patients with < 3 years of disease duration (Adler et al., 2014).

∗Corresponding author at: 11 Rue des Mathématiques, 38400 Saint-Martin-d’Hères, France

Email addresses: houssem.meghnoudj@gipsa-lab.fr (Houssem Meghnoudj), bogdan.robu@univ-grenoble-alpes.fr (Bogdan Robu), mazen.alamir@gipsa-lab.inpg.fr (Mazen Alamir)

arXiv:2210.11624v2 [eess.SY] 29 Mar 2023

Electroencephalography (EEG) is a non-invasive method for recording electrical activity on the scalp, which has been shown to reflect the macroscopic activity of the underlying brain. It is used in several studies to assess individuals’ health conditions, to study brain function in healthy individuals, and to diagnose various diseases that alter the brain’s electrical activity, such as Parkinson’s Disease, epilepsy, Alzheimer’s disease, sleep disorders and schizophrenia (Soufineyestani et al., 2020).

EEG signals are known to have a low signal-to-noise ratio and present many difficulties. EEG noise is defined as any measured signal whose source is not the brain activity of interest (Urigüen and Garcia-Zapirain, 2015). Unfortunately, in most cases the EEG signal is contaminated by various unwanted artefacts, even when care is taken to limit their occurrence during the recording session. These artefacts are entangled with the desired brain activity and can have an amplitude up to 100 times that of the brain activity. Most EEG recordings contain the following undesired artefacts: ocular, muscular, cardiac, perspiration, line noise, etc. (Luca; Urigüen and Garcia-Zapirain (2015) give more details).

Another difficulty encountered during EEG analysis is volume conduction, i.e. the transmission of electric fields from a primary current source through biological tissue towards the recording electrodes (Olejniczak, 2006). Because of volume conduction, unwanted artefacts affect a broader region and therefore contaminate more electrodes. In addition, we lose the ability to study a single source or brain region of interest: information is diluted, and the signal recorded at one electrode is a combination of all the electrical activities present elsewhere (Urigüen and Garcia-Zapirain, 2015).
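Volume conduction is commonly approximated, as a simplifying assumption, by an instantaneous linear mixing model X = A S, where each row of S is the time course of one source, each column of the mixing matrix A is that source’s spatial pattern over the electrodes, and each row of X is a recorded channel. The minimal NumPy sketch below uses purely synthetic sources and a hypothetical random mixing matrix (not data from this study) to show how a single high-amplitude ocular source ends up on every electrode, and how that source can be projected out once A is known. In practice A is unknown and has to be estimated, which is what ICA attempts in Section 2.2.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 500                                   # sampling rate in Hz (as in the dataset)
t = np.arange(0, 2.0, 1 / fs)              # 2 s of synthetic data

# Hypothetical sources: two brain rhythms and one large "blink" artefact.
brain_alpha = np.sin(2 * np.pi * 10 * t)
brain_theta = 0.5 * np.sin(2 * np.pi * 6 * t)
blink = 100 * np.exp(-((t - 1.0) ** 2) / (2 * 0.05 ** 2))   # ~100x brain amplitude
S = np.vstack([brain_alpha, brain_theta, blink])            # (n_sources, n_times)

# Hypothetical mixing matrix: every source reaches every electrode.
A = rng.uniform(0.1, 1.0, size=(4, 3))                      # (n_channels, n_sources)
X = A @ S                                                   # recorded EEG, (n_channels, n_times)


def remove_components(X, A, exclude):
    """Project out the excluded sources, assuming A is known with full column rank."""
    S_hat = np.linalg.pinv(A) @ X              # unmix: estimate source activations
    keep = [k for k in range(A.shape[1]) if k not in exclude]
    return A[:, keep] @ S_hat[keep]            # re-mix using the retained sources only


X_clean = remove_components(X, A, exclude={2})  # drop the blink source
print(np.abs(X[:, 500]).round(1))               # blink peak at t = 1 s, visible on all channels
print(np.abs(X_clean).max().round(2))           # residual brain activity only
```

Here the pseudo-inverse recovers the sources exactly only because the synthetic A has full column rank and there is no sensor noise; real recordings satisfy neither condition exactly.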

Parkinson’s disease diagnosis using EEG has been studied in several works. Cavanagh et al. (2018) use a selection of Fourier transform coefficients to achieve a maximum accuracy of 82 %; note that our study uses the same data as theirs. Oh et al. (2020) propose a fully automated approach based on a one-dimensional Convolutional Neural Network (1-D CNN), which directly classifies temporal EEG epochs and achieves an accuracy of 88.2 %. Bhurane et al. (2019) rely on correlation coefficients computed between channels, together with the coefficients of an AR model identified on the EEG, to reach a presumable accuracy of 99.1 %. Yuvaraj et al. (2018) use higher-order spectra, extracting thirteen features from the EEG frequency spectrum, and report a presumable accuracy of 99.25 %. Han et al. (2013) use the coefficients of an AR model and the wavelet packet entropy to investigate whether there is a difference between parkinsonian and healthy individuals, without attempting to classify the subjects. Finally, Liu et al. (2017) use entropy-based features from 10 channels and a three-way decision model to obtain a classification accuracy of 92.9 %; this last study would have been more relevant had the authors addressed the problem of their unbalanced dataset. We note that most studies rely only on the frequency features of the EEG and that few focus on temporal features, although the two domains should complement each other. Moreover, only a few of the features used are explainable, that is, features whose design basis we can understand and from which we can derive conclusions for future work.
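Why class imbalance matters can be made concrete with a toy computation. On a hypothetical cohort of 40 PD and 10 control subjects (numbers chosen purely for illustration, not taken from any of the cited studies), a trivial classifier that labels everyone as PD reaches 80 % accuracy while having learnt nothing. Balanced accuracy, the mean of sensitivity and specificity, exposes this. The helper names below are illustrative only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, TN, FP) for a binary problem; 1 = PD, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp, fn, tn, fp


def scores(y_true, y_pred):
    """Accuracy, sensitivity, specificity and balanced accuracy."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)        # PD subjects correctly detected
    specificity = tn / (tn + fp)        # controls correctly detected
    accuracy = (tp + tn) / len(y_true)
    balanced = (sensitivity + specificity) / 2
    return accuracy, sensitivity, specificity, balanced


# Hypothetical unbalanced cohort: 40 PD (1) and 10 controls (0).
y_true = [1] * 40 + [0] * 10
y_always_pd = [1] * 50                  # trivial "always PD" classifier
print(scores(y_true, y_always_pd))      # (0.8, 1.0, 0.0, 0.5)
```

With a balanced 25/25 cohort, as in this study, accuracy and balanced accuracy coincide: for instance, 24 of 25 PD subjects and 23 of 25 controls correctly classified give a sensitivity of 96 %, a specificity of 92 % and an accuracy of 94 %, consistent with the figures reported in the abstract.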

We strongly believe that some of the above-mentioned methods (Cavanagh et al., 2018; Oh et al., 2020; Bhurane et al., 2019; Yuvaraj et al., 2018) suffer from data leakage. Data leakage is defined as the use, during model training, of information that is not supposed to be available at the time of prediction (Kaufman et al., 2012). Such information would not be available in a real-life scenario, where new, unlabelled samples must be categorised. Data leakage therefore biases the evaluation: the model appears to perform well on the available data used for training but performs poorly on new data. The first type of data leakage affecting some of the proposed methods is group leakage, where correlated data from the same subject are present in both the training and the test sets (Ayotte et al., 2021). In this case, and with limited amounts of data, a complex model such as a 1-D CNN can even learn to identify the subject’s signature. The second type of data leakage consists of optimising hyper-parameters and performing feature selection directly on the test set, i.e. in the absence of a validation set (Kaufman et al., 2012).
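Both kinds of leakage can be avoided with a subject-level nested Leave-One-Out scheme. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors’ implementation: each subject is represented by a single two-dimensional feature vector (so holding out one sample holds out a whole subject), the classifier is a hand-written two-class Fisher LDA, and the tuned hyper-parameter is a hypothetical covariance-shrinkage factor standing in for whatever parameters a real pipeline selects. The inner loop only ever sees the outer-training subjects.

```python
import numpy as np


def fit_lda(X, y, shrinkage=0.0):
    """Two-class Fisher LDA with equal priors (assumes at least 2 features).

    Returns (w, b) such that w @ x + b > 0 predicts class 1 (PD).
    """
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, optionally shrunk towards a scaled identity.
    S = ((len(X0) - 1) * np.cov(X0, rowvar=False)
         + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X) - 2)
    d = X.shape[1]
    S = (1 - shrinkage) * S + shrinkage * np.trace(S) / d * np.eye(d)
    w = np.linalg.solve(S, m1 - m0)
    b = -w @ (m0 + m1) / 2            # threshold halfway between projected class means
    return w, b


def predict(w, b, X):
    return (X @ w + b > 0).astype(int)


def loo_accuracy(X, y, shrinkage):
    """Leave-one-out accuracy; only ever called on outer-training subjects."""
    hits = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        w, b = fit_lda(X[train], y[train], shrinkage)
        hits += int(predict(w, b, X[i:i + 1])[0] == y[i])
    return hits / len(y)


def nested_loo(X, y, grid=(0.0, 0.1, 0.5)):
    """Outer LOO estimates performance; the inner LOO picks the hyper-parameter
    from the outer-training subjects only, so the held-out subject never
    influences model selection."""
    preds = np.empty_like(y)
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        best = max(grid, key=lambda lam: loo_accuracy(X[train], y[train], lam))
        w, b = fit_lda(X[train], y[train], best)
        preds[i] = predict(w, b, X[i:i + 1])[0]
    return preds


# Synthetic example: one 2-D feature vector per subject, 25 controls (0) and 25 PD (1).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (25, 2)), rng.normal(2.5, 1.0, (25, 2))])
y = np.array([0] * 25 + [1] * 25)
preds = nested_loo(X, y)
print("nested LOO accuracy:", (preds == y).mean())
```

If several epochs per subject were used as separate samples instead, the outer split would have to hold out all epochs of a subject together; otherwise the group leakage described above reappears.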

The aim of this paper is to propose a method for PD diagnosis using EEG signals recorded during a 3-oddball auditory task. The data at our disposal comprise N = 50 subjects, 25 of whom are patients suffering from Parkinson’s disease. Our main focus is not to obtain the highest accuracy at any cost, but rather to develop a valid method with minimal bias. We aim to identify new biomarkers that go beyond the traditional EEG statistics and spectral content found in the literature, by considering instead the combination of the frequency content, dynamics and temporal aspects of the EEG.

The proposed method has several notable advantages:

• It is inspired by the current understanding of how the brain and cognitive processes work.

• It involves explainable features that may lead to future studies.

• It has the potential to work for both early- and late-stage disease diagnosis.

• It uses a simple and interpretable model with low computational demands.

• It has been rigorously constructed to avoid data leakage issues.

• It has been validated on a publicly available database, with openly accessible implementation code and clearly described execution steps.

In the present paper, the data we used and the pre-processing we applied are presented in Section 2. Section 3 describes the basic concepts of our method, their resemblance to the mechanisms underlying cognitive processes, and how the proposed idea can be put into practice from a mathematical point of view. In Section 4, the results obtained are presented along with various validity and robustness tests, as well as an evaluation in more constrained settings. Finally, the conclusion of our work is outlined in Section 5.

2. Dataset

First of all, note that our method is agnostic to the choice of dataset, as long as it contains EEG data. Moreover, it is straightforwardly applicable if the EEG was recorded during a 3-oddball auditory experiment. Several EEG datasets dealing with the diagnosis of PD exist; we examined these before selecting the one on which to evaluate and test our method. Since we wanted a sufficiently large number of patients as well as a large amount of data, the following dataset was chosen: http://predict.cs.unm.edu (ID: d001) (Cavanagh et al., 2017). For clarity and reproducibility, we tested our method on a publicly available dataset and made our implementation code publicly accessible at https://github.com/HoussemMEG/SDF_PD. An explanatory animation is also included.

2.1. Dataset description

The available EEG data were recorded from N = 50 participants: 25 suffering from PD and an equal number of sex- and age-matched participants serving as a control group (CTL). The PD group underwent the same experiment twice, once on medication and once off medication. In this document, we only consider the off-medication sessions, as they showed noticeably better separability from the CTL group than the on-medication sessions.

The severity of the disease in the PD group was assessed by neurologists using the Unified Parkinson’s Disease Rating Scale (UPDRS); the mean UPDRS score is 24.80 ± 8.66. All participants underwent a Mini-Mental State Exam (MMSE) and all scored above 26 (PD: 28.68 ± 1.03, CTL: 28.76 ± 1.05), confirming their ability to comprehend the task they would be given. Complete details regarding the subjects and the experimental procedure can be found in Cavanagh et al. (2018).

The experiment consisted of a 3-oddball auditory task, during which the subjects were presented with a series of 200 repetitive auditory stimuli (trials), infrequently interrupted by deviant stimuli. Three types of stimuli can be distinguished:

1. Standard (70 % of the trials).

2. Target (15 % of the trials).

3. Novel/Distractor (15 % of the trials).

During this task, the subjects had to count the number of target stimuli they heard throughout the whole experiment. Each auditory stimulus lasted 200 ms, and consecutive stimuli were separated by a random Inter-Trial Interval (ITI) drawn from a uniform distribution over [500, 1000] ms, to prevent habituation and anticipation. Figure 1 shows an example of an auditory stimulus sequence.

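[Figure 1: a sequence of 200 trials (total duration about 3 minutes) of Standard (S), Target (T) and Novel (N) stimuli, e.g. S1 S2 T1 S3 S4 N1 N2; each stimulus lasts 200 ms and consecutive stimuli are separated by an ITI in (500, 1000) ms.]

Figure 1. Example of a sequence of auditory stimuli.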
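The timing of such a sequence can be reproduced with a short simulation. This is a hypothetical generator, not the original stimulus software: it assumes the class counts follow the stated proportions exactly (140/30/30), that the order is a uniform random shuffle (the real protocol may impose ordering constraints not described here), and that the ITI runs from the end of one stimulus to the onset of the next. Under these assumptions the expected duration is 200 × 200 ms + 199 × 750 ms ≈ 189 s, consistent with the roughly 3 minutes shown in Figure 1.

```python
import random


def make_oddball_sequence(n_trials=200, p_target=0.15, p_novel=0.15,
                          stim_ms=200, iti_ms=(500, 1000), seed=None):
    """Hypothetical 3-oddball sequence generator.

    Returns (labels, onsets_ms, total_ms) with labels in {"S", "T", "N"}.
    """
    rng = random.Random(seed)
    n_target = round(n_trials * p_target)
    n_novel = round(n_trials * p_novel)
    n_standard = n_trials - n_target - n_novel
    labels = ["S"] * n_standard + ["T"] * n_target + ["N"] * n_novel
    rng.shuffle(labels)                       # assumption: unconstrained random order

    onsets_ms, t = [], 0.0
    for k in range(n_trials):
        onsets_ms.append(t)
        t += stim_ms                          # stimulus playback
        if k < n_trials - 1:
            t += rng.uniform(*iti_ms)         # random ITI before the next stimulus
    return labels, onsets_ms, t               # t is the total duration in ms


labels, onsets, total_ms = make_oddball_sequence(seed=0)
print(labels.count("S"), labels.count("T"), labels.count("N"))  # 140 30 30
print(f"total duration: {total_ms / 60000:.2f} min")
```

Drawing the ITI at random, rather than using a fixed interval, is what prevents subjects from anticipating the next stimulus.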

2.2. Data analysis and pre-processing

Throughout the experiment, the EEG signal was continuously recorded at a sampling rate of fs = 500 Hz by means of 64 electrodes (channels). The very ventral temporal sites were removed by Cavanagh et al. (2018), as they tend to be unreliable, leaving 60 channels. The data were then re-referenced to the average reference.
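Re-referencing to the average reference simply subtracts, at every time sample, the mean across all channels from each channel, so that any signal common to all electrodes (including the activity of the original reference) is removed. A minimal NumPy sketch on synthetic data follows; in MNE, which the authors used, the same operation is available through the `set_eeg_reference` method with `ref_channels="average"`.

```python
import numpy as np


def average_reference(data):
    """Re-reference EEG to the common average.

    data: array of shape (n_channels, n_times). At each time sample, the mean
    across channels is subtracted from every channel.
    """
    return data - data.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)
eeg = rng.normal(size=(60, 500))                      # synthetic: 60 channels, 1 s at 500 Hz
common = 5 * np.sin(2 * np.pi * np.arange(500) / 500)  # signal shared by all channels
eeg_ref = average_reference(eeg + common)             # the shared component cancels out
print(np.abs(eeg_ref.mean(axis=0)).max())             # ~0 up to floating-point error
```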

As mentioned in the introduction, EEG signals are known to be very noisy and present many practical difficulties. Indeed, the brain activity of interest has a low amplitude and is often drowned out by ambient noise, which makes the pre-processing stage mandatory. Despite the intrinsic complexity of EEGs and their noise content, the pre-processing steps we applied are very mild, since our method is robust to noise. Firstly, to separate the unwanted, high-amplitude ocular activity from the cerebral activity of interest, we performed an Independent Component Analysis (ICA) on the data (Tharwat, 2018). We analysed each independent component (IC) of each subject individually; the ICs containing eye blinks were removed by projection, following the guidelines and recommendations of (Luca) and (Cavanagh et al., 2018). Secondly, the data were band-pass filtered with a Hamming-windowed FIR filter, attenuating the frequencies outside the (1, 30) Hz interval. This interval was selected because many studies take 20 or 30 Hz as the upper filtering limit (Starkstein et al., 1989). We chose the wider of the two, knowing that our method remains valid even if the interval is widened further.
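A Hamming-windowed band-pass FIR filter can be built as the difference of two ideal low-pass (sinc) responses multiplied by a Hamming window. The NumPy sketch below is illustrative only: the tap count of 1651 is an assumption chosen to give a transition band of roughly 1 Hz (about 3.3·fs/numtaps for a Hamming window), and MNE, which the authors used, designs its filter with its own defaults, so this is not a reproduction of their exact filter. The 60 Hz test tone merely stands in for an out-of-band disturbance.

```python
import numpy as np


def hamming_bandpass(l_freq, h_freq, fs, numtaps):
    """Windowed-sinc band-pass FIR: difference of two ideal low-passes, Hamming window."""
    if numtaps % 2 == 0:
        raise ValueError("numtaps must be odd for a symmetric, linear-phase filter")
    n = np.arange(numtaps) - (numtaps - 1) / 2

    def ideal_lowpass(fc):
        return 2 * fc / fs * np.sinc(2 * fc / fs * n)

    return (ideal_lowpass(h_freq) - ideal_lowpass(l_freq)) * np.hamming(numtaps)


def apply_zero_phase(x, h):
    """Convolve and keep the centred part, which cancels the (numtaps - 1) / 2 delay."""
    return np.convolve(x, h, mode="same")


def gain(x, h, mid=slice(1000, 4000)):
    """Output/input RMS ratio, ignoring samples affected by the filter edges."""
    return np.std(apply_zero_phase(x, h)[mid]) / np.std(x[mid])


fs = 500
h = hamming_bandpass(1.0, 30.0, fs, numtaps=1651)
t = np.arange(0, 10, 1 / fs)
in_band = np.sin(2 * np.pi * 10 * t)       # 10 Hz: inside the pass-band
out_band = np.sin(2 * np.pi * 60 * t)      # 60 Hz: outside the pass-band
print(gain(in_band, h), gain(out_band, h))
```

The long filter is the price of a 1 Hz lower cut-off: a narrow transition band near 1 Hz requires many taps, which is also why the first and last ~1.65 s of each recording are affected by edge effects.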

Time windows (segments) from stimulus onset (0 ms) up to +500 ms post-stimulus were formed, resulting in 200 time-locked segments, one for each stimulus (see Fig. 2 for a more detailed graphical representation). An event-related potential (ERP) was also computed separately for each stimulus type and channel by averaging, at each time point, all the signal segments corresponding to that stimulus type and channel (see Fig. 3) (Luck, 2005). The aim of this step is to filter the signal: averaging events that occur at the same latency relative to the stimulus makes them stand out from the ambient noise, which is not time-locked and therefore tends to cancel out.
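Epoching and ERP averaging can be sketched in a few lines of NumPy. The example below is synthetic: the onsets, labels, noise level and the half-sine "target response" are hypothetical, and whether the +500 ms sample itself is included is an assumption (a half-open window of 250 samples at 500 Hz is used here). Averaging K trials reduces the standard deviation of non-time-locked noise by about √K, so with 30 target trials and unit-variance noise the residual noise in the target ERP is about 1/√30 ≈ 0.18.

```python
import numpy as np


def make_epochs(data, onsets, fs, tmax=0.5):
    """Cut post-stimulus windows [0, tmax) s from continuous data.

    data: (n_channels, n_times); onsets: stimulus onsets in samples.
    Returns an array of shape (n_epochs, n_channels, n_samples).
    """
    n_samples = int(round(tmax * fs))          # 250 samples for 500 ms at 500 Hz
    return np.stack([data[:, o:o + n_samples] for o in onsets])


def erp_by_type(epochs, labels):
    """Average, sample by sample, all epochs that share a stimulus type."""
    labels = np.asarray(labels)
    return {lab: epochs[labels == lab].mean(axis=0) for lab in np.unique(labels)}


# Synthetic continuous recording: 2 channels, one stimulus every second.
rng = np.random.default_rng(0)
fs, n_ch = 500, 2
onsets = 100 + 500 * np.arange(200)
labels = rng.permutation(["S"] * 140 + ["T"] * 30 + ["N"] * 30)
data = rng.normal(0.0, 1.0, size=(n_ch, 100_500))        # background noise
template = 3 * np.sin(np.pi * np.arange(250) / 250)      # hypothetical target response
for o, lab in zip(onsets, labels):
    if lab == "T":
        data[:, o:o + 250] += template                   # time-locked to target onsets

epochs = make_epochs(data, onsets, fs)
erps = erp_by_type(epochs, labels)
print(epochs.shape)                                      # (200, 2, 250)
print(np.sqrt(np.mean((erps["T"] - template) ** 2)))     # residual noise, roughly 0.18
```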

All pre-processing steps were performed using MNE (an open-source Python package for exploring, visualizing and analysing human neurophysiological data), version 0.24.1 (Gramfort et al., 2013).

3. Methodology

3.1. Idea and inspiration

To process, encode, retrieve and transmit information, biological neuronal networks oscillate (Ward, 2003; Buzsaki and Draguhn, 2004). The frequencies and timings (phases) of these oscillations are important, as they underlie the mechanisms of cognitive processes (Başar et al., 2001; Fries, 2005). As suggested by many studies, oscillation frequencies are task dependent (Ward, 2003). The timing of the oscillations is of great importance, since it carries information about the neuronal dynamics and is also what makes neuronal synchronization possible, which plays a crucial role in cognitive processes (Ward, 2003; Buzsaki and Draguhn, 2004). It should be noted that the EEG mainly measures the electrical potential of groups of neurons oscillating in synchrony. Due to volume conduction, the recorded EEG results from the combined activity of different electrical sources distributed across several brain regions. The mechanism of synchronization and desynchronization of groups of neurons suggests that the rhythms contributing to the EEG occur in a pulsatory manner (Olejniczak, 2006).
