MAtt: A Manifold Attention Network for
EEG Decoding
Yue-Ting Pan Jing-Lun Chou Chun-Shu Wei
National Yang Ming Chiao Tung University, Hsinchu, Taiwan
wei@nycu.edu.tw
Abstract
Recognition of electroencephalographic (EEG) signals highly affects the efficiency
of non-invasive brain-computer interfaces (BCIs). While recent advances in deep-
learning (DL)-based EEG decoders offer improved performance, the development
of geometric learning (GL) has attracted much attention for its exceptional
robustness in decoding noisy EEG data. However, there is a lack of studies on the
merged use of deep neural networks (DNNs) and geometric learning for EEG de-
coding. We herein propose a manifold attention network (MAtt), a novel geometric
deep learning (GDL)-based model, featuring a manifold attention mechanism that
characterizes spatiotemporal representations of EEG data fully on a Riemannian
symmetric positive definite (SPD) manifold. Evaluation of the proposed MAtt
on both time-synchronous and time-asynchronous EEG datasets suggests its superiority
over other leading DL methods for general EEG decoding. Furthermore, analysis
of model interpretation reveals the capability of MAtt in capturing informative
EEG features and handling the non-stationarity of brain dynamics.
1 Introduction and related works
A brain-computer interface (BCI) is a type of human-machine interaction that bridges a pathway from the brain to external devices. Electroencephalography (EEG), a non-invasive neuromonitoring modality with high portability and affordability, has been widely used to explore practical applications of BCI in the real world [1, 2, 3]. For instance, disabled users can type messages through an EEG-based BCI that recognizes the steady-state visual evoked potential (SSVEP) induced by flickering visual targets presented on a screen [4, 5, 6]. Stroke patients who need restoration of motor function undergo motor-imagery (MI) BCI-controlled rehabilitation as an active training [7, 8]. Most EEG-based BCI systems are designed to detect/recognize reproducible time-asynchronous or time-synchronous EEG patterns of interest, depending on the scheme of the BCI [9]. For example, the MI EEG pattern is an endogenous oscillatory perturbation sourced from the motor cortex without an explicit onset time [10]. On the other hand, a time-synchronous EEG pattern is time-locked to a specific event; for example, the SSVEP pattern is synchronized to the change of brightness on a flickering visual target. The efficiency of BCI systems largely relies on the accuracy and robustness of the EEG decoder. However, due to the low signal-to-noise ratio (SNR) [11] and non-stationarity [12] of EEG, translating perplexing EEG signals into meaningful information has been a grand challenge in the field.
Recent advances in deep learning (DL) have contributed to the rapid development of DL-based EEG decoding techniques [13]. DL models are capable of extracting features automatically from given training data. The convolutional neural network (CNN) is one of the most common DL models and has achieved remarkable performance in tasks such as image recognition and object detection [14, 15, 16]. CNN models newly designed for EEG decoding use convolutional kernels that analogously function as conventional spatial and temporal filters but with extra flexibility to optimize the transformation of EEG data automatically through model training [17, 18, 19]. In addition to the fast growth of DL-based EEG decoders, geometric learning (GL) approaches, mostly based on Riemannian geometry (RG), have been adopted in the field of BCI [20]. RG is a type of non-Euclidean geometry that offers a different interpretation of Euclid's fifth postulate (i.e., the parallel postulate) [21]. In GL, the geodesic between points on the manifold is a critical feature for classification tasks in BCI. The power and spatial distribution of a segment of multi-channel EEG signals can be encoded into a covariance matrix that is symmetric positive definite (SPD) in general. The use of Riemannian geometry allows mapping of EEG data directly onto a Riemannian manifold where Riemannian metrics are insensitive to outliers and noise [22, 20]. RG also avoids the swelling effect [23], a common issue when employing a Euclidean metric. Furthermore, metrics on a Riemannian manifold have several types of invariance properties [24, 22], which give the model higher generalization capability for complex EEG signals. In 2010, Barachant et al. [25] proposed Minimum Distance to Mean (MDM), which maps target EEG data onto the SPD manifold to find the nearest class center. Later on, they developed TSLDA [26], which projects data from the manifold to a specific tangent space where Euclidean classifiers are applicable. RG-based classification for EEG decoding has shown extra robustness as the relationship between data samples can be stably preserved, leading to success in recent data competitions in the BCI field such as 'DecMEG2014'¹ and the 'BCI challenge'².

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.01986v1 [cs.LG] 5 Oct 2022
The nascent field of geometric deep learning (GDL) [27] has expanded through emerging techniques that generalize the use of deep neural networks to non-Euclidean structures, such as graphs and manifolds. Efforts have been made to transition useful operations from Euclidean to Riemannian spaces, including convolution [28, 29, 30], activation functions [28, 29], and batch normalization [31, 32], which facilitate the ongoing development of GDL tools. SPDNet [28] is a Riemannian network for non-linear SPD-based learning on Riemannian manifolds using bilinear mapping that mimics Euclidean convolution for visual classification tasks. ManifoldNet [29] offers high performance in medical image classification with a manifold autoencoder. [33] characterizes 3D movement via manifold polar coordinates with a geodesic CNN. [27] performs convolution on the manifold as a generalization of local graph or manifold pseudo-coordinates for vertex classification on graphs and shape correspondence tasks. In contrast to the vast development of GDL in many other scientific fields, only a few studies focus on decoding EEG data with a merged use of GL and DL. [34] proposed a network architecture that integrates the fusion of a Euclidean-based module and a manifold-based module with multiple LSTM and attention structures to extract spatiotemporal information from EEG. [35] proposes a Riemannian-embedding-banks method that separates the entire embedding into multiple sub-problems for learning spatial patterns of MI EEG signals based on the features extracted from SPDNet. [36] combines federated learning and transfer learning on a Riemannian manifold using the spatial information of EEG. [37] proposes deep optimal transport on the manifold to minimize the cost of domain adaptation from the source domain to the target domain. [38] extracts multi-view representations of EEG. These studies have established cornerstones for future GDL-based EEG decoding, but the performance gains are yet marginal. Most of the above-mentioned techniques cannot map the temporal information of EEG onto the manifold, or still rely on Euclidean tools to handle EEG features. We herein propose a manifold attention network, a novel GDL framework, which maps EEG features onto a Riemannian SPD manifold where spatiotemporal EEG patterns are fully characterized. The main contributions of the present study are the following:
- a manifold attention network proposed for decoding general types of EEG data;
- a lightweight, interpretable, and efficient GDL framework that is capable of capturing spatiotemporal EEG features across Euclidean and Riemannian spaces;
- an empirical validation of our proposed model demonstrating its generalizable superiority over leading DL approaches in EEG decoding;
- neuroscientific insights interpreted by the model that not only echo prior knowledge but also offer a new look into the dynamical brain.
This article is organized as follows: we first brief the essential background of RG and the manifold attention mechanism; next, we present the proposed MAtt architecture with details of the model design; we then validate our proposed model experimentally; lastly, we interpret our proposed model with neuroscientific insights. Our source code is released at https://github.com/CECNL/MAtt.
¹ DecMEG2014: https://www.kaggle.com/competitions/decoding-the-human-brain/leaderboard
² BCI challenge: https://www.kaggle.com/c/inria-bci-challenge
2 Preliminary

A manifold can be considered a generalization of curves and surfaces in Euclidean space. It is a topological space that can be locally regarded as an open set in a Hilbert space. If a manifold is endowed with a differential structure (i.e., a collection of charts with smooth transition mappings defined on the overlaps of charts), it is called a differentiable manifold [39]. A Riemannian manifold is a differentiable manifold equipped with a Riemannian metric. We consider the symmetric positive definite (SPD) manifold, which allows us to manipulate manifold-valued data on the manifold directly. The spatial information of an EEG signal can be represented as a covariance matrix, which records the relationships between channels and is a critical representation for understanding EEG signals. However, the Riemannian mean does not have a closed-form solution once the manifold is equipped with the affine-invariant metric (AIM); the mean must then be approximated iteratively [25, 29] until convergence conditions are satisfied. This iterative computation of the Riemannian mean may cause a heavy computational load in deep learning because of its high complexity. Therefore, we adopt an approximation based on the Log-Euclidean metric [24], as described below.
2.1 Notations

$GL(n, \mathbb{R}) := \{A \in \mathbb{R}^{n \times n} \mid \det(A) \neq 0\}$ is the general linear group, the set of all real non-singular square matrices. $(\mathcal{M}, g)$ denotes a connected Riemannian manifold. $Sym(n) := \{S \in M_{n \times n}(\mathbb{R}) \mid S^T = S\}$ is the space of all $n \times n$ real symmetric matrices, where $M_{n \times n}(\mathbb{R})$ specifies the space of all real square matrices and $(\cdot)^T$ is the transpose operator. $Sym^+(n) := \{P \in M_{n \times n}(\mathbb{R}) \mid P = P^T,\ v^T P v > 0 \ \forall v \in \mathbb{R}^n \setminus \{0\}\}$ is the set of all $n \times n$ symmetric positive definite (SPD) matrices. $\langle A, B \rangle_F$ denotes the Frobenius inner product, defined as $Tr(A^T B)$, where $Tr(\cdot)$ is the trace operator. $Log(\cdot)$ and $Exp(\cdot)$ are the principal logarithm operator for an SPD matrix [40] and the exponential operator for a symmetric matrix, respectively; both can be computed using orthogonal diagonalization. $Exp: Sym(n) \mapsto Sym^+(n)$ maps a symmetric matrix $S \in Sym(n)$ to $Sym^+(n)$ by:
$$Exp(S) = V \operatorname{diag}(\exp(\sigma_1), \ldots, \exp(\sigma_n)) V^T$$
where $V$ is the matrix of eigenvectors of $S$ and $\sigma_1, \ldots, \sigma_n$ are its eigenvalues. The inverse of the $Exp$ operation is the $Log$ operator, $Log: Sym^+(n) \mapsto Sym(n)$, which maps an SPD matrix $P \in Sym^+(n)$ to $Sym(n)$ by:
$$Log(P) = U \operatorname{diag}(\log(\sigma_1), \ldots, \log(\sigma_n)) U^T \quad (1)$$
where $U$ is the matrix of eigenvectors of $P$; since $P \in Sym^+(n)$, $\sigma_i > 0$ for $i = 1, \ldots, n$.
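Both operators reduce to a single eigendecomposition, as stated above. The following is a minimal NumPy sketch of these two maps (an illustration for this section, not the released MAtt code):

```python
import numpy as np

def spd_exp(S):
    """Exp: Sym(n) -> Sym+(n) via orthogonal diagonalization."""
    sigma, V = np.linalg.eigh(S)            # S symmetric => real eigenpairs
    return V @ np.diag(np.exp(sigma)) @ V.T

def spd_log(P):
    """Log: Sym+(n) -> Sym(n); requires all eigenvalues of P > 0."""
    sigma, U = np.linalg.eigh(P)
    return U @ np.diag(np.log(sigma)) @ U.T

S = np.array([[0.5, 0.2],
              [0.2, -0.3]])                 # arbitrary symmetric matrix
P = spd_exp(S)
print(np.allclose(spd_log(P), S))           # the two maps are mutual inverses
```

Note that `spd_exp` accepts any symmetric matrix (negative eigenvalues included), while `spd_log` is only defined on SPD inputs, mirroring the domains $Sym(n)$ and $Sym^+(n)$ above.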
2.2 Log-Euclidean metric

The Log-Euclidean metric (LEM) offers an elegant and efficient generalization for calculating the center on the SPD manifold, compared with the affine-invariant metric (AIM) [24, 41]. LEM is a bi-invariant metric on the Lie group structure of the SPD manifold [24]. The geodesic distance from $P_1$ to $P_2$ on $Sym^+(n)$ is given by [24]:
$$\delta_L(P_1, P_2) = \| Log(P_1) - Log(P_2) \|_F \quad (2)$$
Furthermore, we can define the Log-Euclidean mean $G$ via the Log-Euclidean distance:
$$G(P_1, \ldots, P_k) = \arg\min_{P \in Sym^+(n)} \sum_{l=1}^{k} \delta_L^2(P, P_l)$$
where $P_1, \ldots, P_k \in Sym^+(n)$. Fortunately, the solution to the formula above has a closed form, given by [42]:
$$G = Exp\left(\frac{1}{k} \sum_{l=1}^{k} Log(P_l)\right)$$
In our work, we utilize a weighted Log-Euclidean mean that endows each $P_l$ with a different weight. We denote the weight of each $P_l$ as $w_l$, where $l \in \{1, 2, \ldots, k\}$. Here, $\{w_l\}_{l=1}^{k}$
Figure 1: (a) Overview of the proposed model architecture. (b) E2R operation: split the latent feature into several epochs, and convert each one to a specific SPD matrix.
satisfies the convexity constraint (i.e., $\sum_{l=1}^{k} w_l = 1$ and $w_l > 0$). The corresponding weighted Log-Euclidean mean can then be defined and derived as:
$$G(P_1, \ldots, P_k) = \arg\min_{P \in Sym^+(n)} \sum_{l=1}^{k} w_l \, \delta_L^2(P, P_l)$$
and
$$G = Exp\left(\sum_{l=1}^{k} w_l \, Log(P_l)\right)$$
respectively.
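The geodesic distance (Eq. 2) and the closed-form weighted Log-Euclidean mean follow directly from the formulas above. A NumPy sketch (illustrative code, not the released implementation; the matrix size and weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def logm(P):
    """Principal matrix logarithm of an SPD matrix."""
    s, U = np.linalg.eigh(P)
    return U @ np.diag(np.log(s)) @ U.T

def expm(S):
    """Matrix exponential of a symmetric matrix."""
    s, V = np.linalg.eigh(S)
    return V @ np.diag(np.exp(s)) @ V.T

def le_distance(P1, P2):
    """Geodesic distance under the Log-Euclidean metric (Eq. 2)."""
    return np.linalg.norm(logm(P1) - logm(P2), "fro")

def weighted_le_mean(Ps, w):
    """Closed-form weighted Log-Euclidean mean: Exp(sum_l w_l Log(P_l))."""
    assert np.isclose(w.sum(), 1.0) and (w > 0).all()   # convexity constraint
    return expm(sum(wl * logm(P) for wl, P in zip(w, Ps)))

# three random SPD matrices and uniform weights
Ps = [(lambda B: B @ B.T + np.eye(4))(rng.standard_normal((4, 4)))
      for _ in range(3)]
w = np.full(3, 1 / 3)
G = weighted_le_mean(Ps, w)
print(bool((np.linalg.eigvalsh(G) > 0).all()))   # the mean is itself SPD
```

Because the mean is computed entirely in the tangent (log) domain and mapped back with a single `expm`, no iterative optimization is needed, unlike the AIM case discussed in Section 2.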
3 Methodology

As shown in Figure 1(a), the architecture of MAtt comprises four components: feature extraction (FE), the manifold attention module, the transition from Euclidean to Riemannian space (E2R), and the transition from Riemannian to Euclidean space (R2E).
3.1 Feature extraction of EEG signals

We adopt two convolutional layers to extract information from the raw EEG signals: the first convolutional layer performs spatial filtering on the multi-channel EEG signals, and the second extracts spatiotemporal features. Our parameter setting follows [19].
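This two-stage extraction can be sketched as follows. The filter counts, kernel length, and sampling rate below are illustrative placeholders, not the actual setting of [19]:

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_extraction(eeg, w_spatial, w_temporal):
    """Two-stage convolutional feature extraction (sketch).

    eeg:        (n_channels, n_samples) raw EEG segment
    w_spatial:  (n_spatial, n_channels) spatial filters, i.e. a
                1 x n_channels convolution across electrodes
    w_temporal: (n_temporal, kernel_len) temporal kernels applied to
                every spatially filtered signal
    Returns:    (n_spatial * n_temporal, n_out) feature embedding
    """
    # Stage 1: spatial filtering -- linear recombination of channels.
    s = w_spatial @ eeg                       # (n_spatial, n_samples)
    # Stage 2: temporal convolution ("valid" mode) on every row.
    feats = [np.convolve(row, k, mode="valid")
             for row in s for k in w_temporal]
    return np.stack(feats)

eeg = rng.standard_normal((22, 250))          # 22 channels, 1 s at 250 Hz
w_sp = rng.standard_normal((8, 22)) / np.sqrt(22)
w_tm = rng.standard_normal((4, 16)) / np.sqrt(16)
emb = feature_extraction(eeg, w_sp, w_tm)
print(emb.shape)                              # (32, 235)
```

In the actual model these weights are learned end-to-end rather than fixed, which is the "extra flexibility" over conventional spatial/temporal filters noted in the introduction.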
3.2 From Euclidean space to SPD manifold (E2R operation)
Figure 2: (a) The architecture of the proposed manifold attention module. $q_i, k_i, v_i$ refer to the query, key, and value of the $i$th input matrix $\tilde{x}_i$, respectively; $v'_i$ stands for the $i$th output of the proposed module. (b) Illustration of the Log-Euclidean mean operation used in the proposed module for $i = 1$ and three epochs; $q_i$ and $k_j$ refer to the $i$th query and $j$th key, respectively; $d_j$ denotes the distance between $q_1$ and $k_j$ on the SPD manifold $\mathcal{M}$; $T_I$ refers to the tangent space at the identity matrix $I$.
As illustrated in Figure 1(b), we convert the embeddings from the feature extraction stage into SPD data, mapping the feature embeddings from Euclidean space to the SPD manifold. Suppose $\tilde{f}$ denotes the embedding after the feature extraction stage. We divide the whole embedding into several epochs $\tilde{f}_1, \tilde{f}_2, \ldots, \tilde{f}_m$ and calculate the sample covariance matrix (SCM) of each $\tilde{f}_i$, $i \in \{1, 2, \ldots, m\}$. By doing so, we obtain a sequence of covariance matrices $SCM_{\tilde{f}_1}, SCM_{\tilde{f}_2}, \ldots, SCM_{\tilde{f}_m}$ that presents the temporal information of the embedding $\tilde{f}$ in the form of SPD data. We then apply trace normalization and add a small number $\epsilon$ to each main diagonal element of every $SCM_{\tilde{f}_i}$ (i.e., $SCM_{\tilde{f}_i} \rightarrow \frac{SCM_{\tilde{f}_i}}{Tr(SCM_{\tilde{f}_i})} + \epsilon I$), where $i \in \{1, 2, \ldots, m\}$, $I$ is the identity matrix, and $\epsilon$ is set to 1e-5 in our source code. The resulting SPD sequence is denoted as $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_m]$. Adding the small identity matrix guarantees that each $\tilde{x}_i$ is a well-defined SPD matrix.
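The E2R operation can be sketched in NumPy as follows (an illustration of the trace normalization and $\epsilon I$ regularization described above; the epoch count and feature dimension are arbitrary here):

```python
import numpy as np

rng = np.random.default_rng(0)

def e2r(embedding, m, eps=1e-5):
    """Map a Euclidean feature embedding to a sequence of SPD matrices.

    embedding: (d, T) latent feature from the extraction stage
    m:         number of temporal epochs to split into
    Returns:   (m, d, d) trace-normalized, regularized SCMs
    """
    d, _ = embedding.shape
    spd_seq = []
    for f_i in np.array_split(embedding, m, axis=1):
        f_i = f_i - f_i.mean(axis=1, keepdims=True)    # center per row
        scm = f_i @ f_i.T / (f_i.shape[1] - 1)         # sample covariance
        scm = scm / np.trace(scm) + eps * np.eye(d)    # trace-norm + eps*I
        spd_seq.append(scm)
    return np.stack(spd_seq)

x = e2r(rng.standard_normal((16, 240)), m=6)
# every matrix in the resulting sequence is symmetric positive definite
eigvals = np.linalg.eigvalsh(x)
print(x.shape, bool((eigvals > 0).all()))
```

The `eps * np.eye(d)` term implements the small identity matrix added to promise well-defined SPD matrices even when an epoch's SCM is rank-deficient.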
3.3 Manifold attention module
Forward procedure: The input of this layer is a sequence of SPD data. The overview of the manifold attention module is illustrated in Figure 2(a). Motivated by [28] and [43], we capture the spatiotemporal information on the manifold. Suppose the module takes a sequence of SPD matrices $[\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_m]$, denoted as $\tilde{X}$. Herein we have the query, key, and value in the form of SPD matrices on the manifold [43]. We convert each $\tilde{x}_i$ to $q_i$, $k_i$, and $v_i$ via bilinear mapping [28] to exploit non-linear and valid features from each segment. Suppose the shape of $\tilde{x}_i$ is $d_c \times d_c$, and $h_q$, $h_k$, and $h_v$ are the mappings from $\tilde{x}_i$ to $q_i$, $k_i$, and $v_i$, respectively. We have:
$$q_i = h_q(\tilde{x}_i; W_q) = W_q \tilde{x}_i W_q^T;\quad k_i = h_k(\tilde{x}_i; W_k) = W_k \tilde{x}_i W_k^T;\quad v_i = h_v(\tilde{x}_i; W_v) = W_v \tilde{x}_i W_v^T$$
where $\tilde{x}_i \in Sym^+(d_c)$ and $W_q, W_k, W_v \in \mathbb{R}^{d_u \times d_c}$ ($d_u < d_c$) denote transformation matrices. Moreover, to make sure the outputs $q_i$, $k_i$, and $v_i$ are also SPD matrices, the transformation matrices $W_q$, $W_k$, and $W_v$ are constrained to be row-full-rank matrices.
After obtaining $q_i$, $k_i$, and $v_i$ by bilinear mapping, we define a similarity measure between the SPD matrices $q_i$ and $k_j$. In Euclidean space, there are several ways to define similarity; the most common is the dot product between query and key [43]. However, our query, key, and value are SPD matrices instead of vectors as in regular attention. We therefore define the similarity based on the Log-Euclidean distance (Eq. 2) between query and key. Suppose we have $q_i$ and $k_j$ for some $i, j \in \{1, \ldots, m\}$. The similarity $sim(\cdot)$ is a strictly decreasing function of distance, $[0, \infty) \mapsto [0, 1]$, defined as:
$$sim(q_i, k_j) = \frac{1}{1 + \log(1 + \delta_L(q_i, k_j))} := \alpha_{ij}$$
Then, the attention matrix is:
$$A = [\alpha_{ij}]_{m \times m}$$
We then use the Softmax function to shrink the range along the row direction, making the values in each row satisfy the convexity constraint. The final attention probability matrix $A'$ is:
$$A' = Softmax(A) = Softmax([\alpha_{ij}]_{m \times m}) = [\alpha'_{ij}]_{m \times m}$$
where $\alpha'_{ij} = \frac{\exp(\alpha_{ij})}{\sum_{k=1}^{m} \exp(\alpha_{ik})}$, $\forall i, j \in \{1, \cdots, m\}$. Finally, we combine the attention probability matrix with $v_1, v_2, \ldots, v_m$ to get the final outputs $v'_1, v'_2, \ldots, v'_m$, defining each output $v'_i$ ($i = 1, 2, \ldots, m$) via the Log-Euclidean mean as:
$$v'_i = Exp\left(\sum_{l=1}^{m} \alpha'_{il} \, Log(v_l)\right)$$
The forward procedure of the proposed manifold attention module is illustrated in Algorithm 1.
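Putting the pieces together, the forward procedure can be sketched in NumPy (an illustrative re-implementation of the equations above, not the authors' released code; the toy dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def logm(P):
    """Principal matrix logarithm of an SPD matrix."""
    s, U = np.linalg.eigh(P)
    return U @ np.diag(np.log(s)) @ U.T

def expm(S):
    """Matrix exponential of a symmetric matrix."""
    s, V = np.linalg.eigh(S)
    return V @ np.diag(np.exp(s)) @ V.T

def manifold_attention(X, Wq, Wk, Wv):
    """Forward pass of the manifold attention module (sketch).

    X:          (m, dc, dc) sequence of SPD matrices
    Wq, Wk, Wv: (du, dc) row-full-rank transformation matrices
    Returns:    (m, du, du) sequence of SPD outputs v'_i
    """
    q = np.array([Wq @ x @ Wq.T for x in X])    # bilinear mappings
    k = np.array([Wk @ x @ Wk.T for x in X])
    v = np.array([Wv @ x @ Wv.T for x in X])
    log_q = np.array([logm(m_) for m_ in q])
    log_k = np.array([logm(m_) for m_ in k])
    log_v = np.array([logm(m_) for m_ in v])

    m = len(X)
    A = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            d = np.linalg.norm(log_q[i] - log_k[j], "fro")   # Eq. 2
            A[i, j] = 1.0 / (1.0 + np.log(1.0 + d))          # similarity
    A = np.exp(A) / np.exp(A).sum(axis=1, keepdims=True)     # row-wise softmax
    # weighted Log-Euclidean mean of the values, one per output index
    return np.array([expm(np.tensordot(A[i], log_v, axes=1))
                     for i in range(m)])

# toy input: 4 SPD matrices of size 6x6, mapped down to 3x3
X = np.array([(lambda B: B @ B.T + 1e-3 * np.eye(6))(rng.standard_normal((6, 6)))
              for _ in range(4)])
W = [rng.standard_normal((3, 6)) for _ in range(3)]
out = manifold_attention(X, *W)
print(out.shape, bool((np.linalg.eigvalsh(out) > 0).all()))
```

Because every aggregation happens in the log (tangent) domain before a final `expm`, the outputs remain on the SPD manifold, which is exactly what the Log-Euclidean mean guarantees in the formula for $v'_i$.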
Backward procedure: In order to perform gradient-descent parameter updates on the Riemannian manifold, we employ the Riemannian gradient descent method. The trainable parameters in this module are $W_q$, $W_k$, and $W_v$. We require the weights to be updated on the Stiefel manifold [44, 28], denoted as $St(p, n) = \{X \in \mathbb{R}^{n \times p} \mid X^T X = I_p\}$. Since our manifold attention module has a different mathematical architecture from those in Euclidean space, we herein extend Euclidean gradients onto a Riemannian space. To be precise, we constrain our gradients to the Stiefel manifold to generate valid orthogonal weights. The Euclidean gradients of $W_q$, $W_k$, and $W_v$ within the attention module can be derived by the chain rule. Suppose $L$ is the loss, and the query, key, and value generated in the manifold attention module are $q_i$, $k_i$, and $v_i$, $i = 1, \cdots, m$, respectively,
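One standard recipe for such a Stiefel-manifold update, projecting the Euclidean gradient onto the tangent space and retracting via QR factorization, can be sketched as follows. This is an assumed, generic recipe for illustration; the exact update rule used in the module follows [44, 28]:

```python
import numpy as np

rng = np.random.default_rng(0)

def stiefel_update(W, eucl_grad, lr=0.1):
    """One Riemannian gradient step on the Stiefel manifold (sketch).

    W:         (n, p) point with orthonormal columns (W.T @ W = I_p)
    eucl_grad: (n, p) Euclidean gradient dL/dW
    """
    # Tangent-space projection: remove the component of the gradient
    # that would break orthonormality (G - W * sym(W^T G)).
    WtG = W.T @ eucl_grad
    rgrad = eucl_grad - W @ (WtG + WtG.T) / 2
    # Retraction: map the updated point back onto the manifold via QR.
    Q, R = np.linalg.qr(W - lr * rgrad)
    # Flip column signs so the diagonal of R is positive (canonical QR).
    return Q * np.sign(np.diag(R))

# start from a random orthonormal (8, 3) matrix
W0, _ = np.linalg.qr(rng.standard_normal((8, 3)))
W1 = stiefel_update(W0, rng.standard_normal((8, 3)))
print(np.allclose(W1.T @ W1, np.eye(3)))   # orthonormality preserved
```

The key property is that the updated weight stays on $St(p, n)$ after every step, so the bilinear mappings keep producing valid SPD queries, keys, and values throughout training.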