Sparse Dynamical Features generation, application to Parkinson’s
Disease diagnosis
Houssem Meghnoudj a,*, Bogdan Robu a, Mazen Alamir a
a Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France
Abstract
In this study we focus on the diagnosis of Parkinson’s Disease (PD) based on electroencephalogram (EEG) signals. We propose
a new approach inspired by the functioning of the brain that uses the dynamics, frequency and temporal content of EEGs to
extract new demarcating features of the disease. The method was evaluated on a publicly available dataset containing EEG signals
recorded during a 3-oddball auditory task involving N = 50 subjects, of whom 25 suffer from PD. By extracting two features,
and separating them with a straight line using a Linear Discriminant Analysis (LDA) classifier, we can separate the healthy from
the unhealthy subjects with an accuracy of 90 % (p<0.03) using a single channel. By aggregating the information from three
channels and making them vote, we obtain an accuracy of 94 %, a sensitivity of 96 % and a specificity of 92 %. The evaluation
was carried out using a nested Leave-One-Out cross-validation procedure, thus preventing data leakage problems and giving a less
biased evaluation. Several tests were carried out to assess the validity and robustness of our approach, including the test where we
use only half the available data for training. Under this constraint, the model achieves an accuracy of 83.8 %.
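As a quick consistency check on the figures quoted above, the following minimal sketch shows a majority vote over three channels and recomputes the three metrics. The helper names are illustrative, and the confusion counts (24 of 25 PD and 23 of 25 CTL subjects correctly classified) are an assumption inferred from the reported rates and the 25/25 group sizes, not values stated in the text.

```python
from collections import Counter


def majority_vote(channel_predictions):
    """Return the label predicted by most channels (use an odd number of voters)."""
    return Counter(channel_predictions).most_common(1)[0][0]


def metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of PD subjects detected
    specificity = tn / (tn + fp)   # fraction of CTL subjects cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Assumed counts: 25 PD subjects (24 detected), 25 CTL subjects (23 cleared).
accuracy, sensitivity, specificity = metrics(tp=24, fn=1, tn=23, fp=2)

# Hypothetical per-channel decisions for one subject.
decision = majority_vote(["PD", "CTL", "PD"])
```

With these assumed counts the accuracy is 47/50, the sensitivity 24/25 and the specificity 23/25, which agrees with the 94 %, 96 % and 92 % quoted above.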
Keywords: Parkinson’s Disease, Dynamical system, Electroencephalogram, Sparse features, Machine Learning
Highlights
Features derived from the dynamical, frequency and temporal content of EEGs are relevant biomarkers for the
diagnosis of PD.
Two explainable features are sufficient for an LDA model to achieve a classification accuracy of 94 %.
Few features with simple classifiers are more suitable for practical use, more explainable and trustworthy.
1. Introduction
Parkinson’s Disease (PD) is a chronic neurodegenerative disorder affecting more than 6 million people worldwide, as
reported by the World Health Organization (WHO, 2006; Dorsey et al., 2018). It is primarily caused by the lack of
dopamine in the brain, due to the slow death of the dopaminergic cells (WHO, 2006; Balestrino and Schapira, 2020).
PD is known by the general public for its motor symptoms, such as tremor at rest, rigidity, bradykinesia, akinesia, etc.
(WHO, 2006; Balestrino and Schapira, 2020). However, non-motor symptoms may accompany or precede the onset
of motor symptoms, sometimes appearing as early as 20 years before them (Chaudhuri et al., 2005; Kalia and
Lang, 2015). Non-motor symptoms affect patients on a daily basis and are varied: pain, fatigue, sleep disturbances,
bradyphrenia, communication issues, to cite only a few (Pfeiffer, 2016; Witjas et al., 2002).
The diagnosis of PD is entirely clinical and is usually based on the manifestation of the motor symptoms (Berardelli
et al., 2013). Patients already suffer before these symptoms appear, which creates an urgent need for new biomarkers
allowing the early diagnosis of PD. The clinical diagnosis of PD is generally based on a pathological diagnosis or on a clinical
diagnosis criterion (such as the one from the United Kingdom PD Society Brain Research Center (Gibb and Lees, 1988)).
The overall clinical diagnosis accuracy is in the order of 75 % according to the World Health Organization (WHO,
2006), or around 79 % according to Rizzo et al. (2016). It is important to note that the clinical diagnosis accuracy
has not significantly improved in recent years, particularly in the early stages of the disease, where the response
to dopaminergic treatment is unclear and less prominent (Rizzo et al., 2016). Indeed, during the early disease
manifestation (<5 years of disease duration) the clinical diagnosis accuracy is around 53 %, and it is even lower, around
26 %, for patients with <3 years of disease duration (Adler et al., 2014).
* Corresponding author at: 11 Rue des Mathématiques, 38400 Saint-Martin-d’Hères, France
Email addresses: houssem.meghnoudj@gipsa-lab.fr (Houssem Meghnoudj), bogdan.robu@univ-grenoble-alpes.fr (Bogdan
Robu), mazen.alamir@gipsa-lab.inpg.fr (Mazen Alamir)
arXiv:2210.11624v2 [eess.SY] 29 Mar 2023
Electroencephalography (EEG) is a non-invasive method to record the electrical activity on the scalp, which has been
shown to represent the macroscopic activity of the brain underneath. It is used in several studies to assess individuals’
health conditions, to study brain function in healthy individuals, and to diagnose various diseases that alter
the brain’s electrical activity, such as Parkinson’s Disease, epilepsy, Alzheimer’s, sleep disorders, schizophrenia,
etc. (Soufineyestani et al., 2020).
EEG signals are known to have a low signal-to-noise ratio and present many difficulties. EEG noise is defined as
any measured signal whose source is not the coveted brain activity (Urigüen and Garcia-Zapirain, 2015). Unfortunately,
in most cases the EEG signal is contaminated by various unwanted artefacts, even though we try to limit their
occurrence during the recording session. These artefacts are entangled with the desired brain activity and can have
an amplitude up to 100 times that of the brain activity. In most EEG recordings we encounter the following undesired artefacts:
ocular, muscular, cardiac, perspiration, line noise, etc. (Luca; Urigüen and Garcia-Zapirain (2015) give more details).
Another difficulty that we may encounter during EEG analysis is volume conduction, i.e. the transmission of electric
fields from a primary current source through biological tissue towards the recording electrodes (Olejniczak, 2006).
Because of volume conduction, unwanted artefacts impact a broader region and therefore contaminate more
electrodes. In addition, we lose the ability to study a single source or brain region of interest: information is diluted,
and a signal recorded at one electrode is a combination of all the electrical activities present elsewhere (Urigüen and
Garcia-Zapirain, 2015).
Parkinson’s disease diagnosis using EEG has been studied in several works. Cavanagh et al. (2018) use a selection of
Fourier transform coefficients to achieve a maximum accuracy of 82 %. Note that in our study we use the
same data as that work. Oh et al. (2020) propose a fully automated approach based on a 1-Dimensional Convolutional
Neural Network (1-D CNN). The model directly classifies the temporal EEG epochs, achieving an accuracy of
88.2 %. To perform the diagnosis, Bhurane et al. (2019) rely on correlation coefficients calculated between channels
as well as the coefficients of an AR model identified on the EEG, yielding a purported accuracy of 99.1 %. Yuvaraj
et al. (2018) use high-order spectra to perform the diagnosis by extracting thirteen features from the EEG frequency
spectrum, achieving a purported accuracy of 99.25 %. Han et al. (2013) use the coefficients of an AR model and
the wavelet packet entropy to investigate whether there is a difference between parkinsonian and
healthy individuals, with no attempt to separate the subjects. Finally, Liu et al. (2017) utilise entropy-based features of
10 channels and a three-way decision model to obtain a classification accuracy of 92.9 %. This last study would have
been more relevant had the authors addressed the problem of the unbalanced data-set. We note that the majority of studies
are based only on the frequency features of the EEG and that few studies focus on the temporal features, although the two
domains should complement each other. Moreover, only a few of the features used are explainable, in the sense that their
design basis can be understood and used to derive conclusions for future work.
We strongly believe that some of the above-mentioned methods (Cavanagh et al., 2018; Oh et al., 2020; Bhurane et al.,
2019; Yuvaraj et al., 2018) are subject to data leakage problems. Data leakage is defined as the use of information in
the model training process that is not supposed to be available at the time of prediction (Kaufman et al., 2012). Such
information would not be available in a real-life scenario, where we receive new samples of unlabelled data that we need to
categorise. Data leakage biases the evaluation of the model, which will perform better on the available data used
for training but poorly on new data. The first type of data leakage that some of the proposed methods
suffer from is group leakage, where correlated data from the same subject are present in both the training and the test
sets (Ayotte et al., 2021). In this case, and with limited amounts of data, a complex model such as the 1-D CNN can
even identify the subject’s signature. The second type of data leakage consists in optimising hyper-parameters and
performing feature selection directly on the test set (absence of a validation set) (Kaufman et al., 2012).
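To make the notion of group leakage concrete, the sketch below shows one simple way to build subject-wise splits, so that every epoch of a held-out subject stays out of the training set. It is an illustrative helper written for this explanation (the function name and the toy subject identifiers are hypothetical), not the evaluation code used in this work.

```python
def leave_one_subject_out(subject_ids):
    """Yield (train_idx, test_idx) pairs in which no subject appears on both sides.

    subject_ids[i] is the subject that produced sample i.
    """
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield train_idx, test_idx


# Hypothetical toy data: 3 subjects contributing 2 samples each.
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
splits = list(leave_one_subject_out(subject_ids))

for train_idx, test_idx in splits:
    train_subjects = {subject_ids[i] for i in train_idx}
    test_subjects = {subject_ids[i] for i in test_idx}
    # No subject contributes data to both sets, so group leakage is excluded.
    assert train_subjects.isdisjoint(test_subjects)
```

The second type of leakage is avoided in the same spirit: any hyper-parameter tuning or feature selection is repeated inside each training set only (an inner loop over `train_idx`), and the held-out subject is used once, for the final prediction.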
The aim of this paper is to propose a method for PD diagnosis using EEG signals recorded during a 3-oddball auditory
task. The data at our disposal comprise N = 50 subjects, 25 of whom suffer from Parkinson’s
disease. Our main focus is not to reach the highest accuracy at any cost, but rather to develop a valid method with
minimal bias. We aim to identify new biomarkers that go beyond the traditional EEG statistics and spectral content
found in the literature, and that instead combine the frequency content, dynamics, and temporal aspects of
the EEG.
The proposed method has several notable advantages:
It is inspired by the current understanding of how cognitive processes and the brain work.
It involves explainable features that may lead to future studies.
It has the potential to work for the early and late stage disease diagnosis.
It utilizes a simple and interpretable model with low computational demands.
It has been rigorously constructed to avoid data leakage issues.
It has been validated on a publicly available database, with transparent and open access implementation code
and clearly described execution steps.
In the present paper, the data we used and the pre-processing steps we applied are presented in Section 2. Section 3
describes the basic concepts of our method and its resemblance to the mechanism underlying cognitive processes,
and shows how the proposed idea can be put into practice from a mathematical point of view.
Section 4 presents the results obtained along with the various validity and robustness tests, as well as
an evaluation in a more constrained setting. Finally, the conclusion of our work is outlined in Section 5.
2. Dataset
We first note that our method is agnostic to the dataset selection as long as it contains EEG data. Moreover,
it is straightforwardly applicable if the EEG was recorded during a 3-oddball auditory experiment. Several EEG
datasets dealing with the diagnosis of PD exist; we examined these before selecting the one on which to evaluate
and test our method. As we wanted a sufficiently large number of patients as well as a large amount of data, the
following dataset was chosen: http://predict.cs.unm.edu (ID: d001) (Cavanagh et al., 2017). For clarity and
reproducibility, we tested our method on a publicly available dataset and made our implementation code accessible to
the public through the link: https://github.com/HoussemMEG/SDF_PD. Additionally, an explanatory animation
is included.
2.1. Data-set description
The available experimental EEG data were recorded from N = 50 participants, 25 of whom were suffering from PD, with
an equal number of sex- and age-matched participants serving as a control group (CTL). The PD group underwent
the same experiment twice, once on-medication and once off-medication. In this document, we only consider
the off-medication sessions, as they showed a more noticeable separability from the CTL group than the
on-medication sessions.
The PD group was assessed with the Unified Parkinson’s Disease Rating Scale (UPDRS), which rates the severity of the
disease and was scored by neurologists; the mean UPDRS score is 24.80 ± 8.66. All participants underwent a Mini
Mental State Exam (MMSE), and all obtained a score above 26 (PD: 28.68 ± 1.03, CTL: 28.76 ± 1.05), confirming
their ability to comprehend the task they would be subjected to. Complete details and information regarding the
subjects and the experimental procedure can be found in Cavanagh et al. (2018).
The experiment consisted of a 3-Oddball auditory task, during which the subjects were presented with a series of
200 repetitive auditory stimuli (trials) infrequently interrupted by a deviant stimulus. Three types of stimuli can be
distinguished:
1. Standard (70 % of the trials).
2. Target (15 % of the trials).
3. Novel / Distractor (15 % of the trials).
During this task, the subjects had to count the number of target stimuli they had heard throughout the whole
experiment. The auditory stimuli were presented for a period of 200 ms and were separated by a random Inter-Trial
Interval (ITI) drawn from a uniform distribution over (500, 1000) ms, preventing subject habituation and anticipation.
Figure 1 shows an example of an auditory stimuli sequence.
[Figure 1: timeline of the 200 trials, e.g. S1, S2, T1, S3, S4, N1, N2, with S = Standard, T = Target, N = Novel; stimulus duration (200 ms); ITI (500, 1000) ms; total duration (3 minutes).]
Figure 1. Example of a sequence of auditory stimuli.
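The timing of the task described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the trial order is obtained by a plain random shuffle (the actual randomisation constraints of the experiment are not given here), the ITI is taken as the gap following each 200 ms stimulus, and the function name and seed are hypothetical.

```python
import random


def make_oddball_sequence(n_trials=200, seed=0):
    """Draw stimulus labels (70/15/15 %) and one uniform ITI in ms per trial."""
    rng = random.Random(seed)
    n_target = round(0.15 * n_trials)
    n_novel = round(0.15 * n_trials)
    n_standard = n_trials - n_target - n_novel
    labels = ["S"] * n_standard + ["T"] * n_target + ["N"] * n_novel
    rng.shuffle(labels)  # assumption: unconstrained random order
    itis_ms = [rng.uniform(500, 1000) for _ in labels]
    return labels, itis_ms


labels, itis_ms = make_oddball_sequence()

# Each trial lasts 200 ms of stimulus plus its ITI.
total_s = sum(200 + iti for iti in itis_ms) / 1000
```

With a mean ITI of 750 ms, the expected duration is 200 × (0.2 + 0.75) s = 190 s, which is consistent with the roughly three-minute duration indicated in Figure 1.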
2.2. Data analysis and pre-processing
Throughout the experiment, the EEG signal was continuously recorded at a sampling rate of fs = 500 Hz by means
of 64 electrodes (channels). Very ventral temporal sites were removed by Cavanagh et al. (2018) as they tend to be
unreliable, leaving 60 channels. The data were then re-referenced to an average reference.
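Re-referencing to an average reference amounts to subtracting, at every time sample, the mean over all channels from each channel. The following minimal sketch illustrates the operation on toy data; the function name and the values are illustrative, and in practice this step is handled by the EEG toolbox.

```python
def average_reference(data):
    """Subtract the across-channel mean from every channel, sample by sample.

    data: list of channels, each a list of samples of equal length.
    """
    n_channels = len(data)
    n_samples = len(data[0])
    means = [sum(ch[t] for ch in data) / n_channels for t in range(n_samples)]
    return [[ch[t] - means[t] for t in range(n_samples)] for ch in data]


# Toy recording: 3 channels, 2 samples.
toy = [[1.0, 2.0], [3.0, 6.0], [5.0, 1.0]]
rereferenced = average_reference(toy)
```

After this operation the channels sum to zero at every sample, which is the defining property of the average reference.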
As mentioned in the introduction, EEG signals are known to be very noisy and present many practical difficulties.
Indeed, the coveted brain activity is of low amplitude and is often drowned out by ambient noise, making the
pre-processing stage mandatory. Despite the intrinsic complexity of EEGs and their noise content, the pre-processing
steps we have applied are very mild, because our method is robust to noise. Firstly, to separate and disentangle
the unwanted, high-amplitude ocular activity from the coveted cerebral activity, we conducted an Independent
Component Analysis (ICA) on the data (Tharwat, 2018). We analyzed each independent component (IC) of each
subject individually; the ICs that contained eye blinking were removed by projection, following the guidelines and
recommendations of (Luca) and Cavanagh et al. (2018). Secondly, the data were band-pass filtered using a Hamming
window, attenuating the frequencies outside the (1–30) Hz interval. This frequency interval was selected because
many studies take 20 or 30 Hz as the upper filtering limit (Starkstein et al., 1989). We have taken the widest interval,
knowing that our method remains valid even if we widen this interval further.
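For readers unfamiliar with window-based filtering, the sketch below designs a generic windowed-sinc band-pass filter with a Hamming window and checks its gain at a few frequencies. It is an assumption-laden illustration: the filter length (1651 taps) and the helper names are chosen here for the example and are not taken from the paper, and the filter actually built by the toolbox may differ in length and transition bandwidths.

```python
import cmath
import math


def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def bandpass_fir(f_low, f_high, fs, numtaps):
    """Windowed-sinc band-pass: difference of two ideal low-pass responses
    (cut-offs f_high and f_low), tapered by a Hamming window."""
    assert numtaps % 2 == 1, "use an odd length for a symmetric filter"
    center = (numtaps - 1) // 2
    taps = []
    for n in range(numtaps):
        k = n - center
        ideal = (2 * f_high / fs) * sinc(2 * f_high / fs * k) \
            - (2 * f_low / fs) * sinc(2 * f_low / fs * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (numtaps - 1))
        taps.append(ideal * window)
    return taps


def gain(taps, freq, fs):
    """Magnitude of the filter's frequency response at freq (Hz)."""
    w = 2 * math.pi * freq / fs
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps)))


fs = 500.0  # sampling rate of the recordings
taps = bandpass_fir(1.0, 30.0, fs, numtaps=1651)  # assumed filter length
for f in (0.0, 10.0, 100.0):
    print(f, round(gain(taps, f, fs), 3))
```

The gain is close to one inside the pass band (e.g. at 10 Hz) and close to zero for the DC component and for frequencies well above 30 Hz. A long filter is needed here because the lower cut-off (1 Hz) is very small compared with the sampling rate.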
Time windows (segments) starting from stimulus onset (0 ms) up to +500 ms post-stimulus were formed, resulting
in 200 time-locked segments, one for each stimulus (see Fig. 2 for a more detailed graphical representation). An
event-related potential (ERP) was also calculated separately for each stimulus type by averaging, sample by sample, all the
signal segments corresponding to the same stimulus type and channel (see Fig. 3) (Luck, 2005). The aim of this step
is to filter the signal and sum up the events occurring at the same time, making them stand out from the ambient noise.
All the pre-processing steps were performed using MNE (an open-source Python package for exploring,
visualizing, and analyzing human neurophysiological data), version 0.24.1 (Gramfort et al., 2013).
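The segmentation and ERP averaging described above can be sketched as follows for a single channel. This is a simplified illustration with hypothetical helper names and toy data; in particular, whether the sample at exactly +500 ms is included (250 versus 251 samples at 500 Hz) is an assumption, and the toolbox's own epoching may differ on that point.

```python
def extract_epochs(signal, onsets, fs=500, t_max_s=0.5):
    """Cut one segment per stimulus, from onset (0 ms) to t_max_s after it."""
    n = int(round(t_max_s * fs))  # 250 samples at 500 Hz (endpoint excluded)
    return [signal[o:o + n] for o in onsets if o + n <= len(signal)]


def erp(epochs):
    """Average the segments sample by sample."""
    n_samples = len(epochs[0])
    return [sum(e[t] for e in epochs) / len(epochs) for t in range(n_samples)]


def erp_by_type(signal, onsets, labels, fs=500):
    """One ERP per stimulus type (e.g. 'S', 'T', 'N') for a single channel."""
    result = {}
    for stim in sorted(set(labels)):
        selected = [o for o, lab in zip(onsets, labels) if lab == stim]
        result[stim] = erp(extract_epochs(signal, selected, fs))
    return result


# Toy single-channel signal (a ramp) with three stimulus onsets, in samples.
signal = [float(i) for i in range(2000)]
onsets = [0, 500, 1000]
labels = ["S", "T", "S"]
erps = erp_by_type(signal, onsets, labels)
```

Because the averaged segments are time-locked to the stimulus, activity that occurs at a consistent latency adds up, whereas noise that is not locked to the stimulus tends to cancel out.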
3. Methodology
3.1. Idea and inspiration
To process, encode, retrieve and transmit information, biological neuronal networks oscillate (Ward, 2003; Buzsaki
and Draguhn, 2004). The frequencies and timings (phase) of these oscillations are important, as they are at the basis
of the mechanism underlying cognitive processes (Başar et al., 2001; Fries, 2005). As suggested by many studies,
oscillation frequencies are task dependent (Ward, 2003). The oscillation timing is of great importance since it carries
the information about the neuronal dynamics and is also what makes neuronal synchronization possible. The latter
plays a crucial role in cognitive processes (Ward, 2003; Buzsaki and Draguhn, 2004). It should be noted that the EEG
mainly measures the electrical potential of a group of neurons oscillating in synchrony. Due to volume conduction,
the recorded EEG results from the combined activity of different electrical sources distributed in several regions
of the brain. The mechanism of synchronization and desynchronization of a group of neurons suggests that the rhythms
contributing to the EEG occur in a pulsatory manner (Olejniczak, 2006).