
On The Effects Of Data Normalisation For Domain Adaptation On EEG Data

Andrea Apicella1,2, Francesco Isgrò1,2, Andrea Pollastro1,2, and Roberto Prevete1,2

1Department of Electrical Engineering and Information Technology, University of

Naples Federico II, Naples, Italy

2Laboratory of Augmented Reality for Health Monitoring (ARHeMLab)

Abstract. In the Machine Learning (ML) literature, a well-known problem is the Dataset Shift problem where, contrary to the standard ML hypothesis, the data in the training and test sets can follow different probability distributions, leading ML systems toward poor generalisation performance. This problem is strongly felt in the Brain-Computer Interface (BCI) context, where bio-signals such as Electroencephalographic (EEG) signals are often used. In fact, EEG signals are highly non-stationary both over time and between different subjects. To overcome this problem, several proposed solutions are based on recent transfer learning approaches such as Domain Adaptation (DA). In several cases, however, the actual causes of the improvements remain ambiguous. This paper focuses on the impact of data normalisation, or standardisation, strategies applied together with DA methods. In particular, using the SEED, DEAP, and BCI Competition IV 2a EEG datasets, we experimentally evaluated the impact of different normalisation strategies applied with and without several well-known DA methods, comparing the obtained performances. The results show that the choice of the normalisation strategy plays a key role in classifier performance in DA scenarios and, interestingly, in several cases the use of an appropriate normalisation scheme alone outperforms the DA technique.

Keywords: BCI · EEG · domain shift · normalization · scaling · pre-processing

1 Introduction

In recent years, Brain-Computer Interfaces (BCIs) have been emerging as a technology that allows the human brain to communicate with external devices without the use of peripheral nerves and muscles, enhancing the user's ability to interact with the environment. BCI applications range from severely disabled persons, for rehabilitation purposes, to healthy subjects, for devising new types

This paper has been published in its final version in the Engineering Applications of Artificial Intelligence journal with DOI https://doi.org/10.1016/j.engappai.2023.106205

arXiv:2210.01081v3 [cs.LG] 10 Jul 2023


of applications [1]. In particular, BCI is attracting growing interest in the scientific community thanks to its implications in several medical fields, such as assisting [2], monitoring [3], enhancing [4], or diagnosing patients' emotional or physical states [5,6]. The current literature reports that patients undergoing BCI-based rehabilitation methods show benefits and improvements in their injured capacities [7]. Several methods currently exist to allow interaction between humans and machines. In particular, several BCI methods based on Electroencephalographic (EEG) signals have been proposed. This is because measuring and monitoring the brain's electrical activity can provide important information related to the brain's physiological, functional, and pathological status. EEG signals are particularly suitable for this aim thanks to their essential qualities, such as non-invasiveness and high temporal resolution.

Modern Machine Learning (ML) methods such as Deep Neural Networks (DNNs) are commonly used to process acquired EEG signals for several tasks, such as emotion classification and engagement and attention detection. In general, a supervised ML model learns from human-labelled data in order to generalise to new, unseen data. The standard pipeline for developing an ML system consists of i) data acquisition, ii) data preprocessing, iii) feature extraction, iv) model learning, and v) model validation. However, the performance obtained using classical ML methods in EEG-related tasks is often poor [8]. This is mainly because the EEG signal is highly non-stationary [9]: substantial differences exist across EEG signals acquired at different times or from different subjects, even when the same affect is felt. In more detail, the starting hypothesis of traditional ML methods is that all the data, whether used in the training process or not, come from the same probability distribution. This assumption does not always hold for EEG signals. In the ML literature, this is an instance of the Dataset Shift problem [10]. In a nutshell, a Dataset Shift arises when this starting assumption is not valid, so that the distribution of the training data differs from the distribution of the data used outside the training stage. For example, a model trained on a set of EEG data acquired from a given subject at a specific time (or during a specific session) may not work as expected when classifying EEG signals acquired from a different subject or at a different time. In other words, the model has poor generalisation performance. A first attempt to mitigate this problem is to train a specific model for each subject (Subject-Dependent models), reducing the performance gap due to using the same ML system on different users. However, the non-stationarity problems related to the user's different physical and psychological conditions at different times remain. Furthermore, a Subject-Dependent model is valid only for the subject who provided the training data, making these models expensive, not very versatile, and uncomfortable for the user, who is tied to initial acquisition sessions before being able to actually use the system for real classifications.

For these reasons, more recent studies [11,12] have tried to overcome the limits imposed by Dataset Shift by taking into account the differences between the probability distributions (domains) of data acquired at different times and from different subjects. Several proposed solutions are based on Transfer Learning (TL) [13], a set


of approaches aiming to transfer the knowledge learned by one system to improve another. TL approaches can be categorised into several subfamilies, one of the best known being the Domain Adaptation (DA) family [12]. DA approaches start from the hypothesis that unlabelled data from the target domain are also available during the training stage. For example, in the case of EEG-based emotion recognition, class-labelled data can be acquired in an initial session and labelled using a standardised labelling protocol (e.g., questionnaires administered during the task), whereas class-unlabelled data can be acquired in a later session. DA provides several methods that exploit both labelled and unlabelled data to build an ML model able to minimise the discrepancy between the two data distributions, leading to better classification performance on the unlabelled data. Indeed, performance improvements are often reported using DA methods in several EEG-based classification studies. However, from a methodological point of view, it is essential to note that the pipeline to develop and evaluate an ML model consists of several steps which can influence each other [14]. Consequently, in several cases [15] the causes of the improvements can remain ambiguous. This paper focuses on the impact of data normalisation, or standardisation, strategies applied together with DA methods.

However, DA methods assume that all the class-labelled data used during training come from the same source probability distribution (source domain), i.e. all the labelled data belong to one single domain. This assumption is often neglected in EEG-based works [16,17], which consider all the labelled data together during the training stage. Indeed, in several cross-subject/cross-session studies adopting DA strategies, it is common to find attempts to generalise toward an unseen domain (a subject or a session) using source data acquired from several other subjects or sessions without considering their different probability distributions, thus treating them as belonging to the same domain. Despite this, performance improvements are often reported using DA methods in several EEG-based classification studies. We hypothesise that these improvements may be caused not by the DA method but by data normalisation or standardisation strategies applied a priori.

In more detail, in ML applications, normalisation functions [18] are often applied to pre-process the input features before they are fed to the ML system. Normalisation functions are often adopted to scale or transform the features such that each feature contributes uniformly to the ML pipeline. It is shown in [18] that using a normalisation function may or may not affect the final classification performance, depending on the features and properties of the data. Nevertheless, several works involving EEG and ML methods that apply well-known normalisation functions (such as Z-score normalisation [18]) to the input features have been proposed over the years (for example, [19]). In many of these studies, the normalisation function is a de-facto standard step of the EEG ML pipeline. In particular, one of the most used normalisation strategies is Z-score normalisation, which consists of a translation and a scaling of the data with respect to its mean and standard deviation. For instance, it is shown in [20,21,22,23,24] that using a normalisation function can affect cross-subject performance.


In particular, the translation with respect to the mean can already be seen as a

simple form of domain adaptation.
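The remark about mean translation can be illustrated with a minimal sketch, assuming that each subject or session is treated as its own domain and standardised separately. The function names, the synthetic data, and the per-domain grouping are hypothetical choices made here for illustration; this section does not state that the experiments normalise in exactly this way.

```python
import numpy as np

def zscore(features, eps=1e-8):
    """Standardise each feature column: subtract its mean, divide by its standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)  # eps guards against constant features

def zscore_per_domain(features, domain_ids):
    """Apply the z-score separately to the rows of each domain (e.g. each subject or session)."""
    features = np.asarray(features, dtype=float)
    domain_ids = np.asarray(domain_ids)
    out = np.empty_like(features)
    for d in np.unique(domain_ids):
        mask = domain_ids == d
        out[mask] = zscore(features[mask])
    return out

rng = np.random.default_rng(0)
# Two synthetic "subjects" whose features differ only by an offset and a scale (assumed data).
subject_a = rng.normal(loc=0.0, scale=1.0, size=(200, 4))
subject_b = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
X = np.vstack([subject_a, subject_b])
ids = np.array([0] * 200 + [1] * 200)

X_norm = zscore_per_domain(X, ids)
gap_before = np.abs(X[ids == 0].mean(axis=0) - X[ids == 1].mean(axis=0)).max()
gap_after = np.abs(X_norm[ids == 0].mean(axis=0) - X_norm[ids == 1].mean(axis=0)).max()
print(f"largest mean gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

After per-domain standardisation both synthetic subjects have zero-mean, unit-variance features, so the first- and second-order differences between the two domains disappear without any dedicated DA method being involved.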

This study aims to investigate whether and how some normalisation strategies affect the performance of some of the DA methods applied to EEG signal classification. The main contribution of this work is to show that, in several EEG classification problems, the reduction of the domain shift seems to be due mainly to the data normalisation stage rather than to the application of several DA methods commonly used in the literature.

The paper is organised as follows: Section 2 reports some of the best-known DA methods; Section 4 describes the DA framework and states our hypothesis; Section 5 reports the experimental assessment and the obtained results; Section 7 discusses the obtained results. Finally, Section 8 is left to the final remarks.

2 Related works

Since in this work we want to investigate the impact of input normalisation strategies on DA methods, we first discuss DA approaches. Then, we present the main standard data normalisation techniques in this context. Finally, we highlight differences and similarities with related research studies.

Recently, Transfer Learning (TL) methods have been receiving strong attention from the scientific community. TL methods are based on the concept of Domain. Following the survey of Pan et al. [13], a Domain can be defined as a set $\mathcal{D} = \{\mathcal{F}, P(X)\}$, where $\mathcal{F}$ is a feature space and $P(X)$ is the marginal probability distribution of a specific dataset $X = \{x_1, x_2, \dots, x_n\} \in \mathcal{F}$. Domain Adaptation methods start from the hypothesis that data sampled from two different Domains are available, called Source Domain and Target Domain, respectively. The main difference between Source and Target is that, while both data and labels $S_{Source} = \{(x_i, y_i)\}_{i=1}^{n}$ can be sampled from the Source Domain, only feature data points $X_{Target} = \{x_j\}_{j=1}^{m} \in \mathcal{F}_{Target}$ sampled from the Target Domain are available during the training stage, without any knowledge (unsupervised DA) or with minimal knowledge (semi-supervised DA) of their real labels. DA methods are receiving a great deal of attention from the scientific community in different contexts, such as image classification and voice recognition, and several proposals have been made over the years. One trend in the literature is to adapt DA methods originally proposed in one context (e.g., image classification) to another (e.g., EEG emotion recognition). For example, in [25] methods to adapt DA strategies from the image classification context to EEG emotion classification are proposed. However, each context has its own characteristics and peculiarities, making it non-trivial to adapt a DA method from one task to another. The scientific community has attempted to adapt well-established DA methods to tasks involving EEG signal processing in the emotion recognition field.

In [15], DA methods are divided into two main categories: i) shallow DA

methods, where a representation function projecting the source and the target


data is given a priori, and ii) deep DA methods, where the data representation is learned as part of the DA strategy.

For instance, one of the best-known shallow DA methods is Transfer Component Analysis (TCA, [26]). TCA searches for a data transformation based on the Maximum Mean Discrepancy (MMD, [27]). MMD was proposed to test the similarity between two probability distributions. An empirical estimate of MMD is given by

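$$\mathrm{MMD}(X_S, X_T) = \left\| \frac{1}{|X_S|} \sum_{i=1}^{|X_S|} \phi\big(x_S^{(i)}\big) - \frac{1}{|X_T|} \sum_{i=1}^{|X_T|} \phi\big(x_T^{(i)}\big) \right\|^2$$

where $X_S = \{x_S^{(i)}\}_{i=1}^{M}$ and $X_T = \{x_T^{(i)}\}_{i=1}^{N}$ are data sampled from the source and the target domain respectively, while $\phi(\cdot)$ is an appropriate feature mapping.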

Starting from the hypothesis that the data are sampled from two different domains, TCA searches for a transformation of the data that preserves as much of the data variance as possible while, at the same time, reducing the MMD between the distributions of the two domains.
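As a rough illustration of the empirical MMD estimate, the sketch below computes it in two ways: with the identity feature map, and with a feature map left implicit in an RBF kernel, using the identity that the squared norm of the difference of mean embeddings equals mean k(S,S) + mean k(T,T) - 2 mean k(S,T). The kernel choice, the `gamma` value, the function names, and the synthetic data are assumptions made here; this is not TCA itself, only the discrepancy that TCA tries to reduce.

```python
import numpy as np

def mmd_linear(xs, xt):
    """Squared MMD with the identity feature map phi(x) = x."""
    delta = xs.mean(axis=0) - xt.mean(axis=0)
    return float(delta @ delta)

def rbf_kernel(a, b, gamma):
    """Pairwise RBF kernel k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd_rbf(xs, xt, gamma=0.5):
    """Squared MMD (biased estimate) with phi implicit in an RBF kernel."""
    k_ss = rbf_kernel(xs, xs, gamma).mean()
    k_tt = rbf_kernel(xt, xt, gamma).mean()
    k_st = rbf_kernel(xs, xt, gamma).mean()
    return float(k_ss + k_tt - 2.0 * k_st)

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(300, 3))
same = rng.normal(0.0, 1.0, size=(300, 3))     # drawn from the source distribution
shifted = rng.normal(1.0, 1.0, size=(300, 3))  # mean-shifted "target" domain

print("linear MMD, same vs shifted:", mmd_linear(source, same), mmd_linear(source, shifted))
print("RBF MMD,    same vs shifted:", mmd_rbf(source, same), mmd_rbf(source, shifted))
```

With the identity map the estimate reduces to the squared distance between the two sample means, which is why a simple per-domain mean translation already drives this particular discrepancy to zero.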

An evaluation of TCA on EEG data for emotion recognition was made in [16]. While it was not specifically proposed for Domain Adaptation, Kernel PCA (KPCA, [28]) can be viewed as another shallow DA strategy. In a nutshell, KPCA uses the kernel trick to implicitly project the data into a suitable kernel-induced feature space and then applies PCA to the projected data.
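The two KPCA steps (kernel projection, then PCA) can be sketched as follows, assuming an RBF kernel and NumPy only. The function names, the `gamma` value, and the random input are hypothetical; the sketch returns the projections of the training points only and is not meant to reproduce the exact configuration used in any cited study.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Pairwise RBF kernel k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_pca(x, n_components=2, gamma=0.5):
    """Project the rows of x onto the leading principal components in kernel space."""
    n = x.shape[0]
    k = rbf_kernel(x, x, gamma)
    # Centre the kernel matrix, i.e. centre the (implicit) feature vectors.
    one_n = np.full((n, n), 1.0 / n)
    k_centered = k - one_n @ k - k @ one_n + one_n @ k @ one_n
    # PCA in feature space reduces to an eigendecomposition of the centred kernel.
    eigvals, eigvecs = np.linalg.eigh(k_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # The projection of training point i on component c is sqrt(lambda_c) * v_c[i].
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
z = kernel_pca(x, n_components=2, gamma=0.5)
print(z.shape)
```

In a DA setting, source and target samples would typically be stacked into `x` before the decomposition so that both domains share the same learned representation; that usage is an assumption here rather than something this section specifies.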

On the other hand, many modern deep DA strategies rely on Domain Adversarial Learning approaches, proposed in [15,29,30]. In a nutshell, these proposals learn a DNN feature representation considering both the desired task and the discrepancy between the Source and the Target domain. The goal is to make the data distributions indistinguishable to an ad-hoc domain discriminator. The final model is a deep neural network (Domain Adversarial Neural Network, DANN) predicting, for each input, both the corresponding class and the domain it belongs to. A feature mapping is therefore learned that maximises both the class prediction performance and the domain classification loss, so as to make the feature distributions of the two domains as similar as possible. Adversarial Discriminative Domain Adaptation (ADDA) is another Domain Adversarial Learning strategy, proposed in [31]. Differently from DANN, ADDA learns two encoders, $E_S$ and $E_T$, to represent the Source and the Target domains, respectively. $E_S$ is trained together with a classifier $C$, exploiting the available labelled Source domain data. Then, through an adversarial learning procedure, $E_T$ is trained to map the Target domain data to the space of the $E_S$ outputs. Finally, target data mapped into this space can be classified by $C$.
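The DANN trade-off described above (good class prediction, poor domain discrimination) is commonly written as a saddle-point problem. The formulation below is a sketch in notation introduced here, not taken from this paper: $G_f$, $G_y$, $G_d$ denote the feature extractor, label predictor, and domain discriminator with parameters $\theta_f$, $\theta_y$, $\theta_d$; $L_y$ and $L_d$ are the class and domain losses; $d_j$ is the domain label of sample $x_j$; and $\lambda$ is a trade-off weight.

```latex
E(\theta_f, \theta_y, \theta_d)
  = \sum_{(x_i, y_i) \in S_{Source}} L_y\big(G_y(G_f(x_i; \theta_f); \theta_y),\, y_i\big)
  \;-\; \lambda \sum_{x_j \in X_{Source} \cup X_{Target}} L_d\big(G_d(G_f(x_j; \theta_f); \theta_d),\, d_j\big)

(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d),
\qquad
\hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)
```

The discriminator parameters maximise $E$, which means minimising the domain loss, while the feature extractor minimises $E$, which means increasing the domain loss; this opposition is what pushes the source and target feature distributions toward each other.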

Domain adversarial learning methods are widely used in several studies for

EEG data recognition, for example, in [31,32,33].

All the methods mentioned above only consider two domains: the Source and

the Target one.

However, simple methods used to reduce gaps between different data relied on data normalisation schemes, such as min-max or z-score normalisation, where
