On The Effects Of Data Normalisation For
Domain Adaptation On EEG Data
Andrea Apicella1,2, Francesco Isgrò1,2, Andrea Pollastro1,2, and Roberto
Prevete1,2
1Department of Electrical Engineering and Information Technology, University of
Naples Federico II, Naples, Italy
2Laboratory of Augmented Reality for Health Monitoring (ARHeMLab)
Abstract. In the Machine Learning (ML) literature, a well-known problem is
the Dataset Shift problem where, contrary to the standard ML hypothesis,
the data in the training and test sets can follow different probability
distributions, leading ML systems toward poor generalisation performance.
This problem is intensely felt in the Brain-Computer Interface (BCI)
context, where bio-signals such as Electroencephalographic (EEG) signals
are often used. In fact, EEG signals are highly non-stationary both over
time and between different subjects. To overcome this problem, several
proposed solutions are based on recent transfer learning approaches such as
Domain Adaptation (DA). In several cases, however, the actual causes of the
improvements remain ambiguous. This paper focuses on the impact of data
normalisation, or standardisation, strategies applied together with DA
methods. In particular, using the SEED, DEAP, and BCI Competition IV 2a EEG
datasets, we experimentally evaluated the impact of different normalisation
strategies applied with and without several well-known DA methods,
comparing the obtained performances. The results show that the choice of
the normalisation strategy plays a key role in classifier performance in DA
scenarios and, interestingly, in several cases the use of an appropriate
normalisation scheme alone outperforms the DA technique.
Keywords: BCI · EEG · domain shift · normalization · scaling · pre-processing
1 Introduction
In recent years, Brain-Computer Interfaces (BCIs) have emerged as a
technology that allows the human brain to communicate with external devices
without the use of peripheral nerves and muscles, enhancing the user's
ability to interact with the environment. BCI applications range from
rehabilitation for severely disabled persons to healthy subjects, for whom
BCIs enable new types
This paper has been published in its final version in the journal
Engineering Applications of Artificial Intelligence, with DOI
https://doi.org/10.1016/j.engappai.2023.106205
arXiv:2210.01081v3 [cs.LG] 10 Jul 2023
of applications [1]. In particular, BCI is attracting growing interest in
the scientific community thanks to its use in several medical fields, such
as assisting [2], monitoring [3], enhancing [4], or diagnosing patients'
emotional or physical states [5,6]. The current literature reports that
patients undergoing BCI-based rehabilitation show benefits and improvements
in their impaired capacities [7]. Several methods currently exist to allow
interaction between humans and machines. In particular, several BCI methods
based on Electroencephalographic (EEG) signals have been proposed. This is
because measuring and monitoring the brain's electrical activity can
provide important information about the brain's physiological, functional,
and pathological status. EEG signals are particularly suitable for this aim
thanks to essential qualities such as non-invasiveness and high temporal
resolution.
Modern Machine Learning (ML) methods such as Deep Neural Networks (DNNs)
are commonly used to process acquired EEG signals for several tasks, such
as emotion classification and engagement and attention detection. In
general, a supervised ML model learns from human-labelled data to
generalise to new, unseen data. The standard pipeline to develop an ML
system consists of i) data acquisition, ii) data preprocessing, iii)
feature extraction, iv) model learning, and v) model validation. However,
the performance obtained using classical ML methods in EEG-related tasks is
often poor [8]. This is mainly because the EEG signal is highly
non-stationary [9]: substantial differences exist between EEG signals
acquired at different times or from different subjects, even when the same
affect is felt.
In more detail, the starting hypothesis of traditional ML methods is that
all the data, whether used in the training process or not, come from the
same probability distribution. This assumption does not always hold in the
case of EEG signals. In the ML literature, this is an instance of the
Dataset Shift problem [10]. In a nutshell, a Dataset Shift arises when this
starting assumption is not valid, so that the distribution of the training
data differs from the distribution of the data encountered outside of the
training stage. For example, a model trained on a set of EEG data acquired
from a given subject at a specific time (or during a specific session) may
not work as expected in classifying EEG signals acquired from a different
subject or at a different time. In other words, the model has poor
generalisation performance. A first attempt to mitigate this problem is to
train a specific model for each subject (Subject-Dependent models) to
reduce the performance gap due to using the same ML system on different
users. However, the non-stationarity problems related to the user's
different physical and psychological conditions at different times remain.
Furthermore, a Subject-Dependent model is valid only for the subject who
provided the training data, making these models expensive, not very
versatile, and uncomfortable for the user, who is tied to initial
acquisition sessions before they can actually use the system for real
classifications.
For these reasons, newer studies [11,12] have tried to overcome the limits
imposed by Dataset Shift by taking into account the differences between the
probability distributions (domains) of data acquired at different times and
from different subjects. Several proposed solutions are based on Transfer
Learning (TL) [13], a set
of approaches aiming to transfer the knowledge learned by one system to
improve another. TL approaches can be categorised into several subfamilies.
One of the best known is the family of Domain Adaptation (DA) approaches
[12]. DA approaches start from the hypothesis that unlabelled data from the
target domain are also available during the training stage. For example, in
the case of EEG-based emotion recognition, class-labelled data can be
acquired in an initial session and labelled using a standardised labelling
protocol (e.g., questionnaires administered during the task). In contrast,
class-unlabelled data can be acquired in a later session. DA provides
several methods that exploit both labelled and unlabelled data to build an
ML model able to minimise the discrepancy between the two data
distributions, leading to better classification performance on unlabelled
data. Thus, performance improvements are often reported using DA methods in
several EEG-based classification studies. However, from a methodological
point of view, it is essential to note that the pipeline to develop and
evaluate an ML model consists of several steps which can influence each
other [14]. Consequently, in several cases [15] the causes of the
improvements can remain ambiguous. This paper focuses on the impact of data
normalisation, or standardisation, strategies applied together with DA
methods.
However, DA methods assume that all the class-labelled data used during
training come from the same source probability distribution (source
domain), i.e. all the labelled data belong to a single domain. This
assumption is often neglected in EEG-based works [16,17], which consider
all the labelled data together during the training stage. Indeed, in
several cross-subject/cross-session studies adopting DA strategies, it is
common to find attempts to generalise toward an unseen domain (a subject or
a session) using source data acquired from several different
sessions/subjects without considering their different probability
distributions, thus treating them as belonging to the same domain. Despite
this, performance improvements are often reported using DA methods in
several EEG-based classification studies. We hypothesise that this
improvement may be caused not by the DA method but by data normalisation or
standardisation strategies applied a priori.
In more detail, in ML applications, normalisation functions [18] are often
applied to pre-process the input features before they are fed to the ML
system. Normalisation functions are often adopted to scale or transform the
features such that each feature contributes uniformly to the ML pipeline.
In [18] it is shown that using a normalisation function may or may not
affect the final classification performance, depending on the features and
properties of the data. Nevertheless, several EEG-based ML studies applying
well-known normalisation functions (such as Z-score normalisation [18]) to
the input features have been proposed over the years (for example, [19]).
In many of these studies, the normalisation function is a de-facto standard
step of the EEG ML pipeline. In particular, one of the most used
normalisation strategies is Z-score normalisation, which consists of
translating and scaling the data with respect to their mean and standard
deviation. For instance, in [20,21,22,23,24] it is shown that using a
normalisation function can affect cross-subject performance.
In particular, the translation with respect to the mean can already be seen as a
simple form of domain adaptation.
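To make the last remark concrete, the following is a minimal sketch, not the authors' pipeline, of Z-score normalisation applied either separately to each domain or to the pooled data. The function names, the toy values, and the two hypothetical subjects are assumptions introduced only for illustration; the population standard deviation is used as the scaling factor.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score a list of values: subtract the mean, divide by the std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant feature: only the translation is meaningful.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def zscore_per_domain(data_by_domain):
    """Normalise each domain (e.g. each subject) with its own statistics."""
    return {name: zscore(vals) for name, vals in data_by_domain.items()}


def zscore_pooled(data_by_domain):
    """Normalise all domains with the statistics of the pooled data."""
    pooled = [v for vals in data_by_domain.values() for v in vals]
    mu, sigma = mean(pooled), pstdev(pooled)
    if sigma == 0:
        return {name: [0.0 for _ in vals] for name, vals in data_by_domain.items()}
    return {
        name: [(v - mu) / sigma for v in vals]
        for name, vals in data_by_domain.items()
    }


# Toy single-feature data (hypothetical): same shape, different offset.
toy = {"subject_A": [1.0, 2.0, 3.0], "subject_B": [11.0, 12.0, 13.0]}

per_domain = zscore_per_domain(toy)
pooled = zscore_pooled(toy)

# Distance between the domain means after each normalisation.
gap_per_domain = abs(mean(per_domain["subject_A"]) - mean(per_domain["subject_B"]))
gap_pooled = abs(mean(pooled["subject_A"]) - mean(pooled["subject_B"]))
print(gap_per_domain, gap_pooled)
```

In this toy case the per-domain variant brings both domain means to zero, whereas the pooled variant leaves the offset between the two subjects in place. This is only an illustration of why the translation step alone can already act like a simple alignment of domains.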
This study aims to investigate if and how some normalisation strategies
affect the performance of some of the DA methods applied to EEG signal
classification. The main contribution of this work is to show that, in
several EEG classification problems, the greatest impact in reducing the
domain shift seems to be due mainly to the data normalisation stage rather
than to the application of several DA methods commonly used in the
literature.
The paper is organised as follows: in Section 2 some of the best-known DA
methods are reported; in Section 4 the DA framework is described and our
hypothesis is stated; in Section 5 the experimental assessment and the
obtained results are reported; in Section 7 the obtained results are
discussed. Finally, Section 8 is left to the final remarks.
2 Related works
Since in this work we want to investigate the impact of input normalisation
strategies on DA methods, we first discuss DA approaches. Then, we present
the main standard data normalisation techniques in this context. Finally,
we highlight differences and similarities with related research studies.
More recently, Transfer Learning (TL) methods have been receiving strong
attention from the scientific community. TL methods are based on the
concept of Domain. Following the survey of Pan et al. [13], a Domain can be
defined as a set $D = \{F, P(X)\}$, where $F$ is a feature space and $P(X)$
is the marginal probability distribution of a specific dataset
$X = \{x_1, x_2, \ldots, x_n\} \in F$. Domain Adaptation methods start from
the hypothesis that data sampled from two different Domains are available,
called Source Domain and Target Domain, respectively. The main difference
between Source and Target is that, while both data and labels
$S_{Source} = \{(x_i, y_i)\}_{i=1}^{n}$ can be sampled from the Source
Domain, only feature data points
$X_{Target} = \{x_j\}_{j=1}^{m} \in F_{Target}$ sampled from the Target
Domain are available during the training stage, without any knowledge
(unsupervised DA) or with minimal knowledge (semi-supervised DA) of their
real labels.
DA methods are getting a great deal of attention in the scientific
community in different contexts, such as image classification, voice
recognition, etc., and several proposals have been made over the years. One
trend in the literature is to adapt DA methods originally proposed in one
context (e.g., image classification) to another one (e.g., EEG emotion
recognition). For example, in [25] methods to adapt DA strategies from the
image classification context to EEG emotion classification are proposed.
However, each context has its own characteristics and peculiarities, making
it non-trivial to adapt a DA method from one task to another. The
scientific community has attempted to adapt well-established DA methods to
tasks involving EEG signal processing in the emotion recognition field.
In [15], DA methods are divided into two main categories: i) shallow DA
methods, where a representation function projecting the source and the target
data is given a priori, and ii) deep DA methods, where the data
representation is learned as part of the DA strategy.
For instance, one of the best-known shallow DA methods is Transfer
Component Analysis (TCA, [26]). TCA searches for a data transformation
based on the Maximum Mean Discrepancy (MMD, [27]). MMD was proposed to test
the similarity between two probability distributions. An empirical estimate
of MMD is given by
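\[
\mathrm{MMD}(X_S, X_T) = \left\| \frac{1}{|X_S|} \sum_{i=1}^{|X_S|} \phi\big(x_S^{(i)}\big) - \frac{1}{|X_T|} \sum_{i=1}^{|X_T|} \phi\big(x_T^{(i)}\big) \right\|_{\mathcal{H}}^{2}
\]
where $X_S = \{x_S^{(i)}\}_{i=1}^{M}$ and $X_T = \{x_T^{(i)}\}_{i=1}^{N}$
are data sampled from the source and the target domain respectively, while
$\phi(\cdot)$ is an appropriate feature mapping.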
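As an illustration of the empirical MMD estimate, the sketch below computes the squared norm of the difference between the two feature-space means without building the feature mapping explicitly, by expanding the norm into pairwise kernel evaluations (the kernel trick). The function names, the choice of linear and RBF kernels, the `gamma` parameter, and the toy points are assumptions made for this example and are not taken from the paper.

```python
import math


def linear_kernel(x, y):
    """k(x, y) = <x, y>, i.e. the feature mapping is the identity."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2); gamma is an assumed setting."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2(xs, xt, kernel):
    """Biased empirical estimate of the squared MMD.

    Expands || mean(phi(xs)) - mean(phi(xt)) ||^2 into
    mean k(s, s') + mean k(t, t') - 2 * mean k(s, t).
    """
    m, n = len(xs), len(xt)
    k_ss = sum(kernel(a, b) for a in xs for b in xs) / (m * m)
    k_tt = sum(kernel(a, b) for a in xt for b in xt) / (n * n)
    k_st = sum(kernel(a, b) for a in xs for b in xt) / (m * n)
    return k_ss + k_tt - 2.0 * k_st


# Toy two-dimensional samples (hypothetical values).
source = [[0.0, 0.0], [1.0, 1.0]]
target = [[2.0, 2.0], [3.0, 3.0]]

mmd_linear = mmd2(source, target, linear_kernel)
mmd_rbf = mmd2(source, target, rbf_kernel)
mmd_same = mmd2(source, source, rbf_kernel)
print(mmd_linear, mmd_rbf, mmd_same)
```

With the linear kernel the estimate reduces to the squared Euclidean distance between the two sample means, so on these toy points it equals the squared norm of (2, 2). Comparing a sample with itself gives a value of zero up to rounding.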
Starting from the hypothesis that the data are sampled from two different
domains, TCA searches for a transformation of the data such that the data
variance is maximally preserved while, at the same time, the MMD between
the distributions of the two domains is reduced.
An evaluation of TCA on EEG data for emotion recognition was made in [16].
While it was not specifically proposed for Domain Adaptation, Kernel-PCA
(KPCA, [28]) can be viewed as another shallow DA strategy. In a nutshell,
KPCA uses the kernel trick to project the data into a suitable kernel
feature space and then applies PCA to the projected data.
On the other hand, many modern deep DA strategies rely on Domain
Adversarial Learning approaches, proposed in [15,29,30]. In a nutshell,
these proposals learn a DNN feature representation considering both the
desired task and the discrepancy between the Source and the Target domain.
The goal is to make the data distributions indistinguishable for an ad-hoc
domain discriminator. The final model is a deep neural network (Domain
Adversarial Neural Network, DANN) predicting, for each input, both the
corresponding class and the domain it belongs to. Therefore, a feature
mapping is learned that maximises both the class prediction performance and
the domain classification loss, so as to make the feature distributions as
similar as possible. Adversarial Discriminative Domain Adaptation (ADDA) is
another domain adversarial learning strategy, proposed in [31]. Differently
from DANN, ADDA learns two encoders, $E_S$ and $E_T$, to represent the
Source and the Target domains, respectively. First, $E_S$ is trained
together with a classifier $C$, exploiting the available labelled Source
domain data. Then, through an adversarial learning procedure, $E_T$ is
trained to map the Target domain data to the space of the $E_S$ outputs.
Finally, target data mapped into the output space of $E_S$ can be
classified by $C$.
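The adversarial objective described above for DANN can be written compactly as a saddle-point problem. The notation below is an assumption introduced for this sketch and does not appear in the text: $\theta_f$, $\theta_y$, and $\theta_d$ denote the parameters of the feature extractor, the class predictor, and the domain discriminator, $\mathcal{L}_y$ is the class prediction loss on the labelled Source data, $\mathcal{L}_d$ is the domain classification loss on Source and Target data, and $\lambda > 0$ is a trade-off weight.

```latex
% Assumed notation, for illustration only (see the lead-in above).
E(\theta_f, \theta_y, \theta_d)
  = \mathcal{L}_y(\theta_f, \theta_y) - \lambda \, \mathcal{L}_d(\theta_f, \theta_d)

(\hat{\theta}_f, \hat{\theta}_y)
  = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d),
\qquad
\hat{\theta}_d
  = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)
```

Under this formulation the discriminator is trained to minimise the domain classification loss, while the feature extractor is trained to increase it, which is what pushes the Source and Target feature distributions toward each other.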
Domain adversarial learning methods are widely used in several studies for
EEG data recognition, for example, in [31,32,33].
All the methods mentioned above only consider two domains: the Source and
the Target one.
However, simple methods used to reduce gaps between different data relied
on data normalisation schemes, such as min-max or z-score normalisation, where