Self-supervised Learning for Label-Efficient Sleep
Stage Classification: A Comprehensive Evaluation
Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Chee-Keong Kwoh, and Xiaoli Li
Abstract—The past few years have witnessed remarkable advances in deep learning for EEG-based sleep stage classification (SSC). However, the success of these models relies on a massive amount of labeled data for training, which limits their applicability in real-world scenarios. In such scenarios, sleep labs can generate a massive amount of data, but labeling these data is expensive and time-consuming. Recently, the self-supervised learning (SSL) paradigm has emerged as one of the most successful techniques for overcoming the scarcity of labeled data. In this paper, we evaluate the efficacy of SSL in boosting the performance of existing SSC models in the few-labels regime. We conduct a thorough study on three SSC datasets and find that fine-tuning the pretrained SSC models with only 5% of the labels achieves performance competitive with supervised training on the fully labeled data. Moreover, self-supervised pretraining makes SSC models more robust to data imbalance and domain shift.
Index Terms—Sleep stage classification, EEG, self-supervised
learning, label-efficient learning
I. INTRODUCTION
Sleep stage classification (SSC) plays a key role in diagnosing many common diseases such as insomnia and sleep apnea [1]. To assess sleep quality or diagnose sleep disorders, overnight polysomnogram (PSG) readings are split into 30-second segments, i.e., epochs, each of which is assigned a sleep stage. This process is performed manually by specialists, who follow a set of rules, e.g., those of the American Academy of Sleep Medicine (AASM) [2], to identify the patterns and classify the PSG epochs into sleep stages. This manual process is tedious, exhausting, and time-consuming.
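For illustration, at a typical sampling rate of 100 Hz each 30-second epoch corresponds to a 3,000-sample vector per channel. The following sketch (hypothetical function and variable names; a single-channel recording at 100 Hz is assumed) shows how an overnight recording could be segmented into such epochs before scoring or model training:

```python
import numpy as np

def segment_into_epochs(eeg: np.ndarray, sfreq: int = 100, epoch_sec: int = 30) -> np.ndarray:
    """Split a 1-D overnight EEG recording into non-overlapping 30-second epochs.

    Returns an array of shape (n_epochs, epoch_sec * sfreq); trailing samples that
    do not fill a complete epoch are discarded.
    """
    epoch_len = epoch_sec * sfreq                  # samples per epoch (3000 at 100 Hz)
    n_epochs = len(eeg) // epoch_len
    return eeg[: n_epochs * epoch_len].reshape(n_epochs, epoch_len)

recording = np.random.randn(8 * 3600 * 100)        # 8 hours of single-channel EEG at 100 Hz
epochs = segment_into_epochs(recording)            # shape: (960, 3000), one row per scored epoch
```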
To overcome this issue, numerous deep learning-based SSC
models were developed to automate the data labeling process.
These models are trained on a massive labeled dataset and
applied to the dataset of interest. For example, Jadhav et
al. [3] explored different deep learning models to exploit raw
electroencephalogram (EEG) signals as well as their time-frequency spectra. Also, Phyo et al. [4] attempted to improve performance on the confusing transition epochs between stages. In addition, Phan et al. [5] proposed a transformer backbone that provides interpretable and uncertainty-quantified predictions. However, the success of these approaches hinges on a massive amount of labeled data to train the deep learning models, which might not be feasible. In practice, sleep labs can collect a vast amount of overnight recordings, but the difficulty of labeling the data limits the deployment of these data-hungry models. Thus, unfortunately, the SSC works developed in the past few years now face a bottleneck: the size, quality, and availability of labeled data.

Emadeldeen Eldele and Chee-Keong Kwoh are with the School of Computer Science and Engineering, Nanyang Technological University, Singapore (E-mail: {emad0002, asckkwoh}@ntu.edu.sg).
Mohamed Ragab and Zhenghua Chen are with the Institute for Infocomm Research (I2R) and the Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore (E-mail: {mohamedr002, chen0832}@e.ntu.edu.sg).
Min Wu is with the Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore (E-mail: wumin@i2r.a-star.edu.sg).
Xiaoli Li is with the Institute for Infocomm Research (I2R) and the Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore, and also with the School of Computer Science and Engineering, Nanyang Technological University, Singapore (E-mail: xlli@i2r.a-star.edu.sg).
The first author is supported by an A*STAR SINGA Scholarship.
Min Wu is the corresponding author.
One alternative solution to overcome this bottleneck is the self-supervised learning (SSL) paradigm, which has recently received increasing interest due to its ability to learn useful representations from unlabeled data. In SSL, the model is pretrained on a newly defined task that does not require any labeled data, since ground-truth pseudo labels can be generated for free. Such tasks are designed to teach the model to recognize general characteristics of the data without being guided by labels. Currently, SSL algorithms produce state-of-the-art performance on standard computer vision benchmarks [9], [11]–[13]. Consequently, the SSL paradigm has gained increasing interest for the sleep stage classification problem [8], [14].
Most prior works aim to propose novel SSL algorithms and show how they can improve the performance of sleep stage classification. Instead, in this work, our aim is to examine the efficacy of the SSL paradigm in re-motivating the deployment of existing SSC models in real-world scenarios, where only a few labeled samples are available. Therefore, we revisit a prominent subset of SSC models and perform an empirical study to evaluate their performance under the few-labeled data setting. Moreover, we explore the effect of different SSL algorithms on their performance and robustness. We also study the effect of sleep data characteristics, e.g., data imbalance and temporal relations, on the learned self-supervised representations. Finally, we assess the transferability of self-supervised representations against supervised ones and their robustness to domain shift. The overall framework is illustrated in Fig. 1. We perform an extensive set of experiments on three sleep staging datasets to systematically analyze the SSC models under the few-labeled data setting. The experimental results of this study aim to provide a solid and realistic assessment of existing sleep stage classification models.
Fig. 1. The architecture of our evaluation framework. We experiment with three sleep stage classification models, i.e., DeepSleepNet [6], AttnSleep [7], and
1D-CNN [8]. We also include four self-supervised learning algorithms, i.e., ClsTran, SimCLR [9], CPC [10], and TS-TCC [8]. The different experiments are
performed on Sleep-EDF, SHHS, and ISRUC datasets.
II. RELATED WORK
A. Sleep Stage Classification
A wide range of EEG-based sleep stage classification methods has been introduced in recent years, with different architecture designs. For example, some methods adopted multiple parallel convolutional neural network (CNN) branches to extract better features from EEG signals [4], [6], [7]. Also, some methods included residual CNN layers [15], [16], while others used graph-based CNN networks [17]. On the other hand, Phan et al. [18] proposed Long Short-Term Memory (LSTM) networks to extract features from EEG spectrograms. These methods also took different approaches to handling the temporal dependencies among EEG features. For instance, some works adopted recurrent neural networks (RNNs), e.g., bidirectional LSTM networks, as in [4], [6], [16]. Other works adopted multi-head self-attention as a faster and more efficient way to capture the temporal dependencies across timesteps, as in [7], [19].
Despite the proven performance of these architectures, they require huge labeled training datasets. None of these works studied the performance of their models in the few-labeled data regime, which is the scope of this work.
B. Self-supervised Learning Approaches
Self-supervised learning has received increasing attention recently because of its ability to learn useful representations from unlabeled data. The first SSL auxiliary tasks showed large improvements in downstream task performance. For example, Noroozi et al. proposed training the model to solve a jigsaw puzzle on a patched image [20]. In addition, Gidaris et al. proposed rotating the input images and training the model to predict the rotation angle [21]. The success of these auxiliary tasks motivated the adoption of contrastive learning algorithms, which proved more effective due to their ability to learn invariant features. The key idea behind contrastive learning is to define positive and negative pairs for each sample, then push the sample closer to its positive pairs and pull it away from its negative pairs. In general, contrastive approaches rely on data augmentations to generate positive and negative pairs. For example, SimCLR considered the augmented views of a sample as positive pairs, while all the other samples within the same mini-batch were considered negative pairs [9]. Also, MoCo increased the number of negative pairs by keeping samples from other mini-batches in a memory bank [11]. On the other hand, some recent algorithms neglected negative pairs and relied on positive pairs only, such as BYOL [13] and SimSiam [12].
C. Self-supervised learning for Sleep Staging
The success of SSL in computer vision applications motivated its adoption for sleep stage classification. For example, Mohsenvand et al. [22] and Jiang et al. [23] proposed SimCLR-like methodologies and applied EEG-related augmentations for sleep stage classification. Also, Banville et al. applied three pretext tasks, i.e., relative positioning, temporal shuffling, and contrastive predictive coding (CPC), to explore the underlying structure of unlabeled sleep EEG data [24]. The CPC algorithm [10] predicts future timesteps in the time-series signal, which motivated other works to build on it. For example, SleepDPC solved two problems, i.e., predicting future representations of epochs and distinguishing epochs from one another [25]. Also, TS-TCC proposed temporal and contextual contrasting approaches to learn instance-wise representations of the sleep EEG data [8]. In addition, SSLAPP developed a contrastive learning approach with attention-based augmentations in the embedding space to add more positive pairs [26]. Finally, CoSleep [14] and SleepECL [27] are two further contrastive methods that exploit information from EEG data, e.g., inter-epoch dependency and frequency-domain views, to obtain more positive pairs for contrastive learning.
III. EVALUATION FRAMEWORK
A. Preliminaries
In this section, we describe the SSL-related terminology, i.e., pretext tasks, contrastive learning, and downstream tasks.
1) Problem Formulation: We assume that the input is single-channel EEG data in $\mathbb{R}^d$, and each sample has one label from one of $C$ classes. The supervised downstream task has access to the inputs and the corresponding labels, while the self-supervised learning algorithms have access only to the inputs.
The SSC networks consist of three main parts. The first is the feature extractor $f_\phi: \mathbb{R}^d \rightarrow \mathbb{R}^{m_1}$, which maps the input data into the embedding space and is parameterized by neural network parameters $\phi$. The second is the temporal encoder (TE), an intermediate network that improves the temporal representations and may change the dimension of the embedded features, $f_\theta: \mathbb{R}^{m_1} \rightarrow \mathbb{R}^{m}$. Finally, the classifier $f_\gamma: \mathbb{R}^{m} \rightarrow \mathbb{R}^{C}$ produces the predictions. The SSL algorithms learn $\phi$ from unlabeled data, while fine-tuning learns $\theta$ and $\gamma$ while also updating $\phi$.
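For concreteness, the sketch below (PyTorch, with illustrative layer sizes rather than the exact architectures evaluated in this paper) composes an SSC network from these three parts; SSL pretraining would optimize only the feature extractor parameters $\phi$, whereas fine-tuning would optimize $\phi$, $\theta$, and $\gamma$ jointly.

```python
import torch
import torch.nn as nn

class SSCModel(nn.Module):
    """f_phi (feature extractor) -> f_theta (temporal encoder) -> f_gamma (classifier)."""

    def __init__(self, m1: int = 128, m: int = 64, n_classes: int = 5):
        super().__init__()
        # f_phi: maps a raw 30-s epoch (1 channel x 3000 samples at 100 Hz) to an embedding in R^{m1}
        self.feature_extractor = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=25, stride=6), nn.BatchNorm1d(32), nn.ReLU(),
            nn.AdaptiveAvgPool1d(4), nn.Flatten(), nn.Linear(32 * 4, m1),
        )
        # f_theta: intermediate temporal encoder, R^{m1} -> R^{m}
        self.temporal_encoder = nn.Sequential(nn.Linear(m1, m), nn.ReLU())
        # f_gamma: classifier, R^{m} -> R^{C}
        self.classifier = nn.Linear(m, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.feature_extractor(x)      # (batch, m1)
        z = self.temporal_encoder(z)       # (batch, m)
        return self.classifier(z)          # (batch, C) class logits

model = SSCModel()
logits = model(torch.randn(8, 1, 3000))                        # a mini-batch of 8 epochs
pretrain_params = list(model.feature_extractor.parameters())   # phi: updated during SSL pretraining
finetune_params = list(model.parameters())                     # phi, theta, gamma: updated during fine-tuning
```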
2) Pretext tasks: Pretext tasks refer to pre-designed tasks that teach the model generalized representations from unlabeled data. Here, we describe the two main types of pretext tasks, i.e., auxiliary and contrastive tasks.
a) Auxiliary tasks: This category involves defining a new task along with free-to-generate pseudo labels. These tasks can be formulated as classification, regression, or otherwise. In the context of time-series applications, a classification auxiliary task was defined in [28], [29] by generating several views of the signals using augmentations such as adding noise, rotation, and scaling. Each view was assigned a label, and the model was pretrained to classify these transformations. This approach showed success in learning underlying representations from unlabeled data. However, it is usually designed with heuristics that might limit the generality of the learned representations [8].
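As a simplified instance of such an auxiliary task (the transformations and the classification head below are illustrative choices, not the exact design of [28], [29]), the model can be pretrained to predict which transformation produced each view:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Pseudo-labeled transformations (illustrative): 0 = original, 1 = noise, 2 = negation, 3 = scaling
TRANSFORMS = [
    lambda x: x,
    lambda x: x + 0.05 * torch.randn_like(x),
    lambda x: -x,
    lambda x: 1.5 * x,
]

def auxiliary_pretrain_step(feature_extractor: nn.Module, head: nn.Module,
                            batch: torch.Tensor, optimizer: torch.optim.Optimizer) -> float:
    """One pretraining step: classify which transformation produced each view.

    `head` is a small classifier over the transformation labels (e.g., nn.Linear(m1, 4));
    `optimizer` should cover the parameters of both `feature_extractor` and `head`.
    """
    views, pseudo_labels = [], []
    for label, transform in enumerate(TRANSFORMS):
        views.append(transform(batch))                             # (B, 1, d) per view
        pseudo_labels.append(torch.full((batch.size(0),), label))  # free-to-generate labels
    x, y = torch.cat(views), torch.cat(pseudo_labels)              # (4B, 1, d), (4B,)
    loss = F.cross_entropy(head(feature_extractor(x)), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```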
b) Contrastive learning: In contrastive learning, representations are learned by comparing the similarity between samples. Specifically, we define positive and negative pairs for each sample. Next, the feature extractor is trained to achieve the contrastive objective, i.e., push the features of the sample towards its positive pairs and pull them away from its negative pairs. These pairs are usually generated via data augmentations. Notably, some studies [9], [30] relied on strong, successive augmentations and found them to be a key factor in the success of their contrastive techniques.
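The two views forming a positive pair are typically produced by transformations such as the following minimal sketch (jitter-plus-scaling and segment permutation are one common choice for EEG time series, e.g., in TS-TCC [8]; the parameter values are illustrative):

```python
import numpy as np

def jitter_scale(x: np.ndarray, sigma: float = 0.01, scale_range=(0.9, 1.1)) -> np.ndarray:
    """Weak augmentation: rescale the amplitude and add small Gaussian noise."""
    scale = np.random.uniform(*scale_range)
    return scale * x + np.random.normal(0.0, sigma, size=x.shape)

def permute_segments(x: np.ndarray, max_segments: int = 5) -> np.ndarray:
    """Strong augmentation: split the epoch into random segments and shuffle their order."""
    n_segments = np.random.randint(2, max_segments + 1)
    segments = np.array_split(x, n_segments)
    np.random.shuffle(segments)
    return np.concatenate(segments)

epoch = np.random.randn(3000)                                    # one 30-s single-channel EEG epoch at 100 Hz
view_i, view_j = jitter_scale(epoch), permute_segments(epoch)    # a positive pair for contrastive learning
```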
Formally, given a dataset with $N$ unlabeled samples, we generate two views $\{\hat{x}_i, \hat{x}_j\}$ for each sample $x$ using data augmentations. Therefore, in a multiviewed batch with $N$ samples per view, we have a total of $2N$ samples. Next, the feature extractor transforms them into the embedding space, and a projection head $h(\cdot)$ is used to obtain low-dimensional embeddings, i.e., $z_i = h(f_\phi(\hat{x}_i))$ and $z_j = h(f_\phi(\hat{x}_j))$. For an anchor sample indexed by $i \in I \equiv \{1, \ldots, 2N\}$, let $A(i) \equiv I \setminus \{i\}$ denote the indices of all other samples. The objective of contrastive learning is to encourage the similarity between positive pairs and separate the negative pairs apart using the NT-Xent loss, defined as follows:
$$\mathcal{L}_{\text{NT-Xent}} = -\frac{1}{2N} \sum_{i \in I} \log \frac{\exp(z_i \cdot z_j / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}, \qquad (1)$$
where $\cdot$ denotes the inner (dot) product and $\tau$ is a temperature parameter.
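A minimal PyTorch sketch of Eq. (1) follows; it assumes the projections of the two views are stacked so that rows k and k + N of z form a positive pair, and that embeddings are L2-normalized before the dot product, which is a common choice in practice. It is an illustrative implementation rather than the exact code of the evaluated SSL methods.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """NT-Xent loss of Eq. (1). z: (2N, dim) projections; rows k and k + N form a positive pair."""
    two_n = z.size(0)
    n = two_n // 2
    z = F.normalize(z, dim=1)                      # L2-normalize so the dot product is a cosine similarity
    sim = (z @ z.t()) / temperature                # (2N, 2N) scaled pairwise similarities z_i . z_a / tau
    eye = torch.eye(two_n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))      # exclude a = i, i.e., restrict the sum to A(i)
    # index of the positive partner j for each anchor i
    pos_idx = torch.cat([torch.arange(n, two_n), torch.arange(0, n)]).to(z.device)
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)   # log of the ratio inside Eq. (1)
    return -log_prob[torch.arange(two_n, device=z.device), pos_idx].mean()

# Usage: z_i, z_j are the (N, dim) projections of the two augmented views of a batch
# loss = nt_xent_loss(torch.cat([z_i, z_j], dim=0))
```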
3) Downstream tasks: Downstream tasks are the main tasks of interest, which lack a sufficient amount of labeled data for training deep learning models. In this paper, the downstream task is sleep stage classification, i.e., classifying the PSG epochs into one of five classes: W, N1, N2, N3, and REM. In general, however, the downstream task can differ and is defined by the application. Notably, different pretext tasks can have a different impact on the same downstream task. Therefore, it is important to design a pretext task relevant to the problem of interest in order to learn better representations. Despite the numerous methods proposed in self-supervised learning, identifying the proper pretext task is still an open research question [31].
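Putting the pieces together, a label-efficient pipeline first pretrains the feature extractor on unlabeled epochs and then fine-tunes the full model on the few available labels. The sketch below takes the SSL pretraining routine as a generic callable (any of the algorithms above could be plugged in); the small labeled loader mirrors the few-labels setting (e.g., 5% of the labels) studied in this paper.

```python
import torch
import torch.nn.functional as F

def label_efficient_training(model, ssl_pretrain_fn, unlabeled_loader, labeled_loader,
                             epochs: int = 50, lr: float = 1e-4):
    """Two-stage pipeline: SSL pretraining of f_phi, then supervised fine-tuning of phi, theta, gamma."""
    # Stage 1: self-supervised pretraining on the unlabeled epochs (any pretext task).
    ssl_pretrain_fn(model.feature_extractor, unlabeled_loader)
    # Stage 2: fine-tune the whole model on the small labeled subset (e.g., 5% of the labels).
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)      # updates phi, theta, and gamma
    for _ in range(epochs):
        for x, y in labeled_loader:                              # only the few labeled epochs
            loss = F.cross_entropy(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```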
B. Sleep Stage Classification Models
We perform our experiments on three sleep stage classification models, i.e., DeepSleepNet [6], AttnSleep [7], and 1D-CNN [8].