MAtt: A Manifold Attention Network for
EEG Decoding
Yue-Ting Pan Jing-Lun Chou Chun-Shu Wei
National Yang Ming Chiao Tung University, Hsinchu, Taiwan
wei@nycu.edu.tw
Abstract
Recognition of electroencephalographic (EEG) signals highly affects the efficiency
of non-invasive brain-computer interfaces (BCIs). While recent advances in deep-
learning (DL)-based EEG decoders offer improved performance, the development
of geometric learning (GL) has attracted much attention for its exceptional
robustness in decoding noisy EEG data. However, there is a lack of studies on the
merged use of deep neural networks (DNNs) and geometric learning for EEG de-
coding. We herein propose a manifold attention network (MAtt), a novel geometric
deep learning (GDL)-based model, featuring a manifold attention mechanism that
characterizes spatiotemporal representations of EEG data fully on a Riemannian
symmetric positive definite (SPD) manifold. Evaluation of the proposed MAtt
on both time-synchronous and time-asynchronous EEG datasets suggests its superiority
over other leading DL methods for general EEG decoding. Furthermore, analysis
of model interpretation reveals the capability of MAtt in capturing informative
EEG features and handling the non-stationarity of brain dynamics.
1 Introduction and related works
A brain-computer interface (BCI) is a type of human-machine interaction that bridges a pathway from the brain to external devices. Electroencephalography (EEG), a non-invasive neuromonitoring modality with high portability and affordability, has been widely used to explore practical applications of BCI in the real world [1, 2, 3]. For instance, disabled users can type messages through an EEG-based BCI that recognizes the steady-state visual evoked potential (SSVEP) induced by flickering visual targets presented on a screen [4, 5, 6]. Stroke patients who need restoration of motor function undergo motor-imagery (MI) BCI-controlled rehabilitation as an active training [7, 8]. Most EEG-based BCI systems are designed to detect/recognize reproducible time-asynchronous or time-synchronous EEG patterns of interest, depending on the scheme of the BCI [9]. For example, the MI EEG pattern is an endogenous oscillatory perturbation sourced from the motor cortex without an explicit onset time [10]. On the other hand, a time-synchronous EEG pattern is time-locked to a specific event; for example, the SSVEP pattern is synchronized to the change of brightness on a flickering visual target. The efficiency of BCI systems largely relies on the accuracy and robustness of the EEG decoder. However, due to the low signal-to-noise ratio (SNR) [11] and non-stationarity [12] of EEG, translating perplexing EEG signals into meaningful information has been a grand challenge in the field.
Recent advances in deep learning (DL) have contributed to the rapid development of DL-based EEG decoding techniques [13]. DL models are capable of extracting features automatically from given training data. The convolutional neural network (CNN) is one of the most common DL models and has achieved remarkable performance in tasks such as image recognition and object detection [14, 15, 16]. CNN models newly designed for EEG decoding use convolutional kernels that analogously function as conventional spatial and temporal filters but with extra flexibility to optimize the transformation of EEG data automatically through model training [17, 18, 19]. In addition to the fast growth of DL-based EEG decoders, geometric learning (GL) approaches, mostly based on Riemannian geometry (RG), have been adopted in the field of BCI [20]. RG is a type of non-Euclidean geometry that offers a different interpretation of Euclid's fifth postulate (i.e., the parallel postulate) [21]. In GL, the geodesic between points on the manifold is a critical feature for classification tasks in BCI. The power and spatial distribution of a segment of multi-channel EEG signals can be encoded into a covariance matrix that is symmetric positive definite (SPD) in general. The use of Riemannian geometry allows mapping of EEG data directly onto a Riemannian manifold where Riemannian metrics are insensitive to outliers and noise [22, 20]. RG also avoids the swelling effect [23], a common issue when employing a Euclidean metric. Furthermore, metrics on a Riemannian manifold have several types of invariance properties [24, 22], which give the model higher generalization capability for complex EEG signals. In 2010, Barachant et al. [25] proposed Minimum Distance to Mean (MDM), which maps target EEG data onto the SPD manifold to find the nearest class center. Later on, they developed TSLDA [26], which projects data from the manifold to a specific tangent space where Euclidean classifiers are applicable. RG-based classification for EEG decoding has shown extra robustness as the relationship between data samples can be stably preserved, leading to success in recent data competitions in the BCI field such as 'DecMEG2014'¹ and the 'BCI challenge'².

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.01986v1 [cs.LG] 5 Oct 2022
The nascent field of geometric deep learning (GDL) [27] has expanded through emerging techniques that generalize the use of deep neural networks to non-Euclidean structures, such as graphs and manifolds. Efforts have been made to transition useful operations from Euclidean to Riemannian spaces, including convolution [28, 29, 30], activation functions [28, 29], and batch normalization [31, 32], which facilitate the ongoing development of GDL tools. SPDNet [28] is a Riemannian network for non-linear SPD-based learning on Riemannian manifolds using bilinear mapping that mimics Euclidean convolution for visual classification tasks. ManifoldNet [29] offers high performance in medical image classification with a manifold autoencoder. [33] characterizes 3D movement via manifold polar coordinates with a geodesic CNN. [27] performs convolution on the manifold as a generalization of local graph or manifold pseudo-coordinates for vertex classification on graphs and shape correspondence tasks. In contrast to the vast development of GDL in many other scientific fields, only a few studies focus on decoding EEG data with a merged use of GL and DL. [34] proposed a network architecture that integrates the fusion of a Euclidean-based module and a manifold-based module with multiple LSTM and attention structures to extract spatiotemporal information from EEG. [35] proposes a Riemannian-embedding-banks method that separates the entire embedding into multiple sub-problems for learning spatial patterns of MI EEG signals based on the features extracted from SPDNet. [36] combines federated learning and transfer learning on a Riemannian manifold using the spatial information of EEG. [37] proposes deep optimal transport on the manifold to minimize the cost of domain adaptation from the source domain to the target domain. [38] extracts multi-view representations of EEG. These studies have established cornerstones for future GDL-based EEG decoding, but the performance gains are yet marginal. Most of the above-mentioned techniques cannot map the temporal information of EEG onto the manifold, or still rely on Euclidean tools to handle EEG features. We herein propose a manifold attention network, a novel GDL framework, which maps EEG features onto a Riemannian SPD manifold where spatiotemporal EEG patterns are fully characterized. The main contributions of the present study are the following:
- a manifold attention network proposed for decoding general types of EEG data;
- a lightweight, interpretable, and efficient GDL framework that is capable of capturing spatiotemporal EEG features across Euclidean and Riemannian spaces;
- an empirical validation of our proposed model demonstrating its generalizable superiority over leading DL approaches in EEG decoding;
- neuroscientific insights interpreted by the model that not only echo prior knowledge but also offer a new look into the dynamical brain.
This article is organized as follows: we first brief the essential background of RG and the manifold attention mechanism; next, we present the proposed MAtt architecture with details of the model design; we then validate our proposed model experimentally; lastly, we interpret our proposed model with neuroscientific insights. Our source code is released at https://github.com/CECNL/MAtt.
¹ DecMEG2014: https://www.kaggle.com/competitions/decoding-the-human-brain/leaderboard
² BCI challenge: https://www.kaggle.com/c/inria-bci-challenge
2 Preliminary

A manifold can be considered a generalization of curves and surfaces in Euclidean space. It is a topological space that can be locally regarded as an open set in a Hilbert space. If a manifold is endowed with a differential structure (i.e., a collection of charts with smooth transition mappings defined on the overlaps of charts), it is called a differentiable manifold [39]. A Riemannian manifold is a differentiable manifold equipped with a Riemannian metric. We consider the symmetric positive definite (SPD) manifold, which allows us to manipulate manifold-valued data on the manifold directly. The spatial information of an EEG signal can be represented as a covariance matrix, which records the relationships between channels and is a critical representation for understanding EEG signals. However, the Riemannian mean does not have a closed-form solution once the manifold is equipped with the affine-invariant metric (AIM); the mean must then be approximated iteratively [25, 29] until convergence conditions are satisfied. This iterative computation of the Riemannian mean may cause a heavy computational load in deep learning because of its high complexity. Therefore, we adopt an approximation based on the Log-Euclidean metric [24], as described below.
2.1 Notations

$GL(n, \mathbb{R}) := \{A \in \mathbb{R}^{n \times n} \mid \det(A) \neq 0\}$ is the general linear group, the set of all real non-singular square matrices. $(\mathcal{M}, g)$ denotes a connected Riemannian manifold. $Sym(n) := \{S \in M_{n \times n}(\mathbb{R}) \mid S^T = S\}$ is the space of all $n \times n$ real symmetric matrices, where $M_{n \times n}(\mathbb{R})$ specifies the space of all real square matrices and $(\cdot)^T$ is the transpose operator. $Sym^+(n) := \{P \in M_{n \times n}(\mathbb{R}) \mid P = P^T,\ v^T P v > 0 \ \forall v \in \mathbb{R}^n \setminus \{0\}\}$ is the set of all $n \times n$ symmetric positive definite (SPD) matrices. $\langle A, B \rangle_F$ denotes the Frobenius inner product, defined as $Tr(A^T B)$, where $Tr(\cdot)$ is the trace operator. $Log(\cdot)$ and $Exp(\cdot)$ are the principal logarithm operator for an SPD matrix [40] and the exponential operator for a symmetric matrix, respectively; both can be computed using orthogonal diagonalization. $Exp: Sym(n) \mapsto Sym^+(n)$ maps a symmetric matrix $S \in Sym(n)$ to $Sym^+(n)$ by:
$$Exp(S) = V \operatorname{diag}(\exp(\sigma_1), \ldots, \exp(\sigma_n)) V^T$$
where $V$ is the matrix of eigenvectors of $S$ and $\sigma_1, \ldots, \sigma_n$ are its eigenvalues. The inverse of the $Exp$ operation is the $Log$ operator, $Log: Sym^+(n) \mapsto Sym(n)$, which maps an SPD matrix $P \in Sym^+(n)$ to $Sym(n)$ by:
$$Log(P) = U \operatorname{diag}(\log(\sigma_1), \ldots, \log(\sigma_n)) U^T \quad (1)$$
where $U$ is the matrix of eigenvectors of $P$; since $P \in Sym^+(n)$, $\sigma_i > 0$ for $i = 1, \ldots, n$.
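Both operators reduce to a single eigendecomposition, as stated above. The following is a minimal NumPy sketch of these two maps (an illustration for this section, not the released MAtt code):

```python
import numpy as np

def spd_exp(S):
    """Exp: Sym(n) -> Sym+(n) via orthogonal diagonalization."""
    sigma, V = np.linalg.eigh(S)            # S symmetric => real eigenpairs
    return V @ np.diag(np.exp(sigma)) @ V.T

def spd_log(P):
    """Log: Sym+(n) -> Sym(n); requires all eigenvalues of P > 0."""
    sigma, U = np.linalg.eigh(P)
    return U @ np.diag(np.log(sigma)) @ U.T

S = np.array([[0.5, 0.2],
              [0.2, -0.3]])                 # arbitrary symmetric matrix
P = spd_exp(S)
print(np.allclose(spd_log(P), S))           # the two maps are mutual inverses
```

Note that `spd_exp` accepts any symmetric matrix (negative eigenvalues included), while `spd_log` is only defined on SPD inputs, mirroring the domains $Sym(n)$ and $Sym^+(n)$ above.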
2.2 Log-Euclidean metric

The Log-Euclidean metric (LEM) offers an elegant and efficient generalization for calculating the center on the SPD manifold, compared with the affine-invariant metric (AIM) [24, 41]. LEM is a bi-invariant metric on the Lie group structure of the SPD manifold [24]. The geodesic distance from $P_1$ to $P_2$ on $Sym^+(n)$ is given by [24]:
$$\delta_L(P_1, P_2) = \| Log(P_1) - Log(P_2) \|_F \quad (2)$$
Furthermore, we can define the Log-Euclidean mean $G$ via the Log-Euclidean distance:
$$G(P_1, \ldots, P_k) = \arg\min_{P \in Sym^+(n)} \sum_{l=1}^{k} \delta_L^2(P, P_l)$$
where $P_1, \ldots, P_k \in Sym^+(n)$. Fortunately, the solution to the formula above has a closed form, given by [42]:
$$G = Exp\left(\frac{1}{k} \sum_{l=1}^{k} Log(P_l)\right)$$
In our work, we utilize a weighted Log-Euclidean mean that endows each $P_l$ with a different weight. We denote the weight of each $P_l$ as $w_l$, where $l \in \{1, 2, \ldots, k\}$. Here, $\{w_l\}_{l=1}^{k}$
Figure 1: (a) Overview of the proposed model architecture. (b) E2R operation: split the latent feature into several epochs, and convert each one to a specific SPD matrix.
satisfies the convexity constraint (i.e., $\sum_{l=1}^{k} w_l = 1$ and $w_l > 0$). The corresponding weighted Log-Euclidean mean can then be defined and derived as:
$$G(P_1, \ldots, P_k) = \arg\min_{P \in Sym^+(n)} \sum_{l=1}^{k} w_l \, \delta_L^2(P, P_l)$$
and
$$G = Exp\left(\sum_{l=1}^{k} w_l \, Log(P_l)\right)$$
respectively.
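The geodesic distance (Eq. 2) and the closed-form weighted Log-Euclidean mean follow directly from the formulas above. A NumPy sketch (illustrative code, not the released implementation; the matrix size and weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def logm(P):
    """Principal matrix logarithm of an SPD matrix."""
    s, U = np.linalg.eigh(P)
    return U @ np.diag(np.log(s)) @ U.T

def expm(S):
    """Matrix exponential of a symmetric matrix."""
    s, V = np.linalg.eigh(S)
    return V @ np.diag(np.exp(s)) @ V.T

def le_distance(P1, P2):
    """Geodesic distance under the Log-Euclidean metric (Eq. 2)."""
    return np.linalg.norm(logm(P1) - logm(P2), "fro")

def weighted_le_mean(Ps, w):
    """Closed-form weighted Log-Euclidean mean: Exp(sum_l w_l Log(P_l))."""
    assert np.isclose(w.sum(), 1.0) and (w > 0).all()   # convexity constraint
    return expm(sum(wl * logm(P) for wl, P in zip(w, Ps)))

# three random SPD matrices and uniform weights
Ps = [(lambda B: B @ B.T + np.eye(4))(rng.standard_normal((4, 4)))
      for _ in range(3)]
w = np.full(3, 1 / 3)
G = weighted_le_mean(Ps, w)
print(bool((np.linalg.eigvalsh(G) > 0).all()))   # the mean is itself SPD
```

Because the mean is computed entirely in the tangent (log) domain and mapped back with a single `expm`, no iterative optimization is needed, unlike the AIM case discussed in Section 2.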
3 Methodology

As shown in Figure 1(a), the architecture of MAtt comprises four components: feature extraction (FE), the manifold attention module, the transition from Euclidean to Riemannian space (E2R), and the transition from Riemannian to Euclidean space (R2E).
3.1 Feature extraction of EEG signals

We adopt two convolutional layers to extract information from the raw EEG signals: the first convolutional layer performs spatial filtering on the multi-channel EEG signals, and the second extracts spatiotemporal features. Our parameter setting follows [19].
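This two-stage extraction can be sketched as follows. The filter counts, kernel length, and sampling rate below are illustrative placeholders, not the actual setting of [19]:

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_extraction(eeg, w_spatial, w_temporal):
    """Two-stage convolutional feature extraction (sketch).

    eeg:        (n_channels, n_samples) raw EEG segment
    w_spatial:  (n_spatial, n_channels) spatial filters, i.e. a
                1 x n_channels convolution across electrodes
    w_temporal: (n_temporal, kernel_len) temporal kernels applied to
                every spatially filtered signal
    Returns:    (n_spatial * n_temporal, n_out) feature embedding
    """
    # Stage 1: spatial filtering -- linear recombination of channels.
    s = w_spatial @ eeg                       # (n_spatial, n_samples)
    # Stage 2: temporal convolution ("valid" mode) on every row.
    feats = [np.convolve(row, k, mode="valid")
             for row in s for k in w_temporal]
    return np.stack(feats)

eeg = rng.standard_normal((22, 250))          # 22 channels, 1 s at 250 Hz
w_sp = rng.standard_normal((8, 22)) / np.sqrt(22)
w_tm = rng.standard_normal((4, 16)) / np.sqrt(16)
emb = feature_extraction(eeg, w_sp, w_tm)
print(emb.shape)                              # (32, 235)
```

In the actual model these weights are learned end-to-end rather than fixed, which is the "extra flexibility" over conventional spatial/temporal filters noted in the introduction.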
3.2 From Euclidean space to SPD manifold (E2R operation)
Figure 2: (a) The architecture of the proposed manifold attention module. $q_i, k_i, v_i$ refer to the query, key, and value of the $i$th input matrix $\tilde{x}_i$, respectively; $v'_i$ stands for the $i$th output of the proposed module. (b) Illustration of the Log-Euclidean mean operation used in the proposed module for $i = 1$ and three epochs; $q_i$ and $k_j$ refer to the $i$th query and $j$th key, respectively; $d_j$ denotes the distance between $q_1$ and $k_j$ on the SPD manifold $\mathcal{M}$; $T_I$ refers to the tangent space at the identity matrix $I$.
As illustrated in Figure 1(b), we convert the embeddings from the feature extraction stage into SPD data, mapping the feature embeddings from Euclidean space to the SPD manifold. Suppose $\tilde{f}$ denotes the embedding after the feature extraction stage. We divide the whole embedding into several epochs $\tilde{f}_1, \tilde{f}_2, \ldots, \tilde{f}_m$ and calculate the sample covariance matrix (SCM) of each $\tilde{f}_i$, $i \in \{1, 2, \ldots, m\}$. By doing so, we obtain a sequence of covariance matrices $SCM_{\tilde{f}_1}, SCM_{\tilde{f}_2}, \ldots, SCM_{\tilde{f}_m}$ that presents the temporal information of the embedding $\tilde{f}$ in the form of SPD data. We then apply trace normalization and add a small number $\epsilon$ to each main diagonal element of every $SCM_{\tilde{f}_i}$ (i.e., $SCM_{\tilde{f}_i} \rightarrow \frac{SCM_{\tilde{f}_i}}{Tr(SCM_{\tilde{f}_i})} + \epsilon I$), where $i \in \{1, 2, \ldots, m\}$, $I$ is the identity matrix, and $\epsilon$ is set to 1e-5 in our source code. The resulting SPD sequence is denoted as $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_m]$. Adding the small identity matrix guarantees that each $\tilde{x}_i$ is a well-defined SPD matrix.
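The E2R operation can be sketched in NumPy as follows (an illustration of the trace normalization and $\epsilon I$ regularization described above; the epoch count and feature dimension are arbitrary here):

```python
import numpy as np

rng = np.random.default_rng(0)

def e2r(embedding, m, eps=1e-5):
    """Map a Euclidean feature embedding to a sequence of SPD matrices.

    embedding: (d, T) latent feature from the extraction stage
    m:         number of temporal epochs to split into
    Returns:   (m, d, d) trace-normalized, regularized SCMs
    """
    d, _ = embedding.shape
    spd_seq = []
    for f_i in np.array_split(embedding, m, axis=1):
        f_i = f_i - f_i.mean(axis=1, keepdims=True)    # center per row
        scm = f_i @ f_i.T / (f_i.shape[1] - 1)         # sample covariance
        scm = scm / np.trace(scm) + eps * np.eye(d)    # trace-norm + eps*I
        spd_seq.append(scm)
    return np.stack(spd_seq)

x = e2r(rng.standard_normal((16, 240)), m=6)
# every matrix in the resulting sequence is symmetric positive definite
eigvals = np.linalg.eigvalsh(x)
print(x.shape, bool((eigvals > 0).all()))
```

The `eps * np.eye(d)` term implements the small identity matrix added to promise well-defined SPD matrices even when an epoch's SCM is rank-deficient.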
3.3 Manifold attention module
Forward procedure: The input of this layer is a sequence of SPD data. The overview of the manifold attention module is illustrated in Figure 2(a). Motivated by [28] and [43], we capture the spatiotemporal information on the manifold. Suppose the module takes a sequence of SPD matrices $[\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_m]$, denoted as $\tilde{X}$. Herein we have the query, key, and value in the form of SPD matrices on the manifold [43]. We convert each $\tilde{x}_i$ to $q_i$, $k_i$, and $v_i$ via bilinear mapping [28] to exploit non-linear and valid features from each segment. Suppose the shape of $\tilde{x}_i$ is $d_c \times d_c$, and $h_q$, $h_k$, and $h_v$ are the mappings from $\tilde{x}_i$ to $q_i$, $k_i$, and $v_i$, respectively. We have:
$$q_i = h_q(\tilde{x}_i; W_q) = W_q \tilde{x}_i W_q^T;\quad k_i = h_k(\tilde{x}_i; W_k) = W_k \tilde{x}_i W_k^T;\quad v_i = h_v(\tilde{x}_i; W_v) = W_v \tilde{x}_i W_v^T$$
where $\tilde{x}_i \in Sym^+(d_c)$ and $W_q, W_k, W_v \in \mathbb{R}^{d_u \times d_c}$ ($d_u < d_c$) denote transformation matrices. Moreover, to make sure the outputs $q_i$, $k_i$, and $v_i$ are also SPD matrices, the transformation matrices $W_q$, $W_k$, and $W_v$ are constrained to be row-full-rank matrices.
After obtaining $q_i$, $k_i$, and $v_i$ by bilinear mapping, we define a similarity measure between the SPD matrices $q_i$ and $k_j$. In Euclidean space, there are several ways to define similarity; the most common is the dot product between query and key [43]. However, our query, key, and value are SPD matrices instead of vectors as in regular attention. We therefore define the similarity based on the Log-Euclidean distance (Eq. 2) between query and key. Suppose we have $q_i$ and $k_j$ for some $i, j \in \{1, \ldots, m\}$. The similarity $sim(\cdot)$ is a strictly decreasing function of distance, $[0, \infty) \mapsto [0, 1]$, defined as:
$$sim(q_i, k_j) = \frac{1}{1 + \log(1 + \delta_L(q_i, k_j))} := \alpha_{ij}$$
Then, the attention matrix is:
$$A = [\alpha_{ij}]_{m \times m}$$
We then use the Softmax function to shrink the range along the row direction, making the values in each row satisfy the convexity constraint. The final attention probability matrix $A'$ is:
$$A' = Softmax(A) = Softmax([\alpha_{ij}]_{m \times m}) = [\alpha'_{ij}]_{m \times m}$$
where $\alpha'_{ij} = \frac{\exp(\alpha_{ij})}{\sum_{k=1}^{m} \exp(\alpha_{ik})}$, $\forall i, j \in \{1, \cdots, m\}$. Finally, we combine the attention probability matrix with $v_1, v_2, \ldots, v_m$ to get the final outputs $v'_1, v'_2, \ldots, v'_m$, defining each output $v'_i$ ($i = 1, 2, \ldots, m$) via the Log-Euclidean mean as:
$$v'_i = Exp\left(\sum_{l=1}^{m} \alpha'_{il} \, Log(v_l)\right)$$
The forward procedure of the proposed manifold attention module is illustrated in Algorithm 1.
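Putting the pieces together, the forward procedure can be sketched in NumPy (an illustrative re-implementation of the equations above, not the authors' released code; the toy dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def logm(P):
    """Principal matrix logarithm of an SPD matrix."""
    s, U = np.linalg.eigh(P)
    return U @ np.diag(np.log(s)) @ U.T

def expm(S):
    """Matrix exponential of a symmetric matrix."""
    s, V = np.linalg.eigh(S)
    return V @ np.diag(np.exp(s)) @ V.T

def manifold_attention(X, Wq, Wk, Wv):
    """Forward pass of the manifold attention module (sketch).

    X:          (m, dc, dc) sequence of SPD matrices
    Wq, Wk, Wv: (du, dc) row-full-rank transformation matrices
    Returns:    (m, du, du) sequence of SPD outputs v'_i
    """
    q = np.array([Wq @ x @ Wq.T for x in X])    # bilinear mappings
    k = np.array([Wk @ x @ Wk.T for x in X])
    v = np.array([Wv @ x @ Wv.T for x in X])
    log_q = np.array([logm(m_) for m_ in q])
    log_k = np.array([logm(m_) for m_ in k])
    log_v = np.array([logm(m_) for m_ in v])

    m = len(X)
    A = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            d = np.linalg.norm(log_q[i] - log_k[j], "fro")   # Eq. 2
            A[i, j] = 1.0 / (1.0 + np.log(1.0 + d))          # similarity
    A = np.exp(A) / np.exp(A).sum(axis=1, keepdims=True)     # row-wise softmax
    # weighted Log-Euclidean mean of the values, one per output index
    return np.array([expm(np.tensordot(A[i], log_v, axes=1))
                     for i in range(m)])

# toy input: 4 SPD matrices of size 6x6, mapped down to 3x3
X = np.array([(lambda B: B @ B.T + 1e-3 * np.eye(6))(rng.standard_normal((6, 6)))
              for _ in range(4)])
W = [rng.standard_normal((3, 6)) for _ in range(3)]
out = manifold_attention(X, *W)
print(out.shape, bool((np.linalg.eigvalsh(out) > 0).all()))
```

Because every aggregation happens in the log (tangent) domain before a final `expm`, the outputs remain on the SPD manifold, which is exactly what the Log-Euclidean mean guarantees in the formula for $v'_i$.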
Backward procedure: In order to perform gradient-descent parameter updates on the Riemannian manifold, we employ the Riemannian gradient descent method. The trainable parameters in this module are $W_q$, $W_k$, and $W_v$. We require the weights to be updated on the Stiefel manifold [44, 28], denoted as $St(p, n) = \{X \in \mathbb{R}^{n \times p} \mid X^T X = I_p\}$. Since our manifold attention module has a different mathematical architecture from those in Euclidean space, we herein extend Euclidean gradients onto a Riemannian space. To be precise, we constrain our gradients to the Stiefel manifold to generate valid orthogonal weights. The Euclidean gradients of $W_q$, $W_k$, and $W_v$ within the attention module can be derived by the chain rule. Suppose $L$ is the loss, and the query, key, and value generated in the manifold attention module are $q_i$, $k_i$, and $v_i$, $i = 1, \cdots, m$, respectively,
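One standard recipe for such a Stiefel-manifold update, projecting the Euclidean gradient onto the tangent space and retracting via QR factorization, can be sketched as follows. This is an assumed, generic recipe for illustration; the exact update rule used in the module follows [44, 28]:

```python
import numpy as np

rng = np.random.default_rng(0)

def stiefel_update(W, eucl_grad, lr=0.1):
    """One Riemannian gradient step on the Stiefel manifold (sketch).

    W:         (n, p) point with orthonormal columns (W.T @ W = I_p)
    eucl_grad: (n, p) Euclidean gradient dL/dW
    """
    # Tangent-space projection: remove the component of the gradient
    # that would break orthonormality (G - W * sym(W^T G)).
    WtG = W.T @ eucl_grad
    rgrad = eucl_grad - W @ (WtG + WtG.T) / 2
    # Retraction: map the updated point back onto the manifold via QR.
    Q, R = np.linalg.qr(W - lr * rgrad)
    # Flip column signs so the diagonal of R is positive (canonical QR).
    return Q * np.sign(np.diag(R))

# start from a random orthonormal (8, 3) matrix
W0, _ = np.linalg.qr(rng.standard_normal((8, 3)))
W1 = stiefel_update(W0, rng.standard_normal((8, 3)))
print(np.allclose(W1.T @ W1, np.eye(3)))   # orthonormality preserved
```

The key property is that the updated weight stays on $St(p, n)$ after every step, so the bilinear mappings keep producing valid SPD queries, keys, and values throughout training.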