
Multimodality Multi-Lead ECG Arrhythmia
Classification using Self-Supervised Learning
1st Thinh Phan
Department of AI Convergence
Chonnam National University
Gwangju, South Korea 61186
phantrandacthinh2382@gmail.com
2nd Duc Le
Department of CSCE
University of Arkansas,
Fayetteville, Arkansas 72703
minhducl@uark.edu
3rd Patel Brijesh
Department of Cardiology
West Virginia University
Morgantown, WV 26506
brijesh.patel@wvumedicine.org
4th Donald Adjeroh
Department of CSEE
West Virginia University
Morgantown, WV 26506
don@csee.wvu.edu
5th Jingxian Wu
Department of ELEG
University of Arkansas,
Fayetteville, Arkansas 72703
wuj@uark.edu
6th Morten Olgaard Jensen
Department of Biomedical Engineering
University of Arkansas,
Fayetteville, Arkansas 72703
mojensen@uark.edu
7th Ngan Le
Department of CSCE
University of Arkansas,
Fayetteville, Arkansas 72703
thile@uark.edu
Abstract—The electrocardiogram (ECG) signal is one of the most
effective sources of information for diagnosing and predicting
cardiovascular diseases (CVDs) associated with abnormalities in
heart rhythm. A single ECG modality (i.e., time series) cannot
convey the complete characteristics of the signal; thus, both the
time and time-frequency modalities, in the form of time-series data
and spectrograms, need to be exploited. Leveraging
the cutting-edge self-supervised learning (SSL) technique on
unlabeled data, we propose SSL-based multimodality ECG classification.
Our proposed network follows the SSL paradigm and
consists of two modules corresponding to the pre-stream task and
the down-stream task, respectively. In the SSL pre-stream task, we
utilize self-knowledge distillation (KD) techniques with no labeled
data, on various transformations and in both the time and frequency
domains. In the down-stream task, which is trained on labeled
data, we propose a gate fusion mechanism to fuse information
from the multiple modalities. To evaluate the effectiveness of our
approach, ten-fold cross validation on the 12-lead PhysioNet 2020
dataset has been conducted. Code is available at
https://github.com/UARK-AICV/ECG_SSL_12Lead.
Index Terms—ECG classification, self-supervised learning, con-
trastive learning, multimodalities, multi-lead
I. INTRODUCTION
CVDs are the leading cause of death globally. The mortality
rate can be considerably reduced by early treatment if occult
signs linked with CVDs are detected by ECG. ECG
signals, which record cardiac electrical activity, are widely
adopted to diagnose abnormal heart rhythms and intra-cardiac
conduction abnormalities.
Traditionally, cardiac feature extraction [1] and pattern
classification [2] are performed as separate steps. Despite their
decent performance, such pipelines are difficult to apply to real-life
cases because of their high time consumption and complexity. Recent
approaches based on deep neural networks (DNNs) have made remarkable
progress in various domains [3]–[5]. DNN-based ECG classification
can generally be categorized into single-lead classification
[6], [7] and multi-lead classification [8], [9]. Our proposed
method belongs to the second category. In this group, [8]
proposed a simple residual neural network for classifying
6 types of abnormalities on an in-house 12-lead database,
where some of its testing metrics are better than those of
expert cardiologists. [9] trained a CNN-based structure on
multi-lead ECG data to diagnose myocardial infarction. Due
to the high complexity of the higher-dimensional 12-lead
ECG data, different architectures have been adopted to model the
temporal correlation among ECG sample points [10], [11]. Besides
time-series data, time-frequency representations have played an
important role in ECG analysis. [12] used spectrograms based on the
short-time Fourier transform (STFT) and 2D
CNNs for ECG arrhythmia classification. [7] proposed using
the STFT and the stationary wavelet transform (SWT)
to obtain two-dimensional (2-D) matrix inputs suitable for deep
CNNs. [13] proposed a deep bidirectional LSTM network model that
operates on wavelet sequences.
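As an illustration of the time-frequency modality discussed above, the following minimal sketch computes a log-magnitude STFT spectrogram for each lead of a multi-lead recording. It is not the authors' implementation: the function name `stft_spectrogram`, the 500 Hz sampling rate, the window length, the overlap, and the synthetic sinusoidal input are all assumptions chosen for illustration.

```python
import numpy as np

def stft_spectrogram(ecg, fs=500, nperseg=128, noverlap=64):
    """Log-magnitude STFT spectrogram for each lead of a multi-lead ECG.

    ecg: array of shape (n_leads, n_samples).
    Returns (freqs, times, spec), where spec has shape
    (n_leads, n_freqs, n_frames).
    All default parameter values are illustrative assumptions.
    """
    ecg = np.asarray(ecg, dtype=float)
    hop = nperseg - noverlap
    n_frames = 1 + (ecg.shape[-1] - nperseg) // hop
    window = np.hanning(nperseg)
    # Index matrix of overlapping frames: (n_frames, nperseg)
    idx = np.arange(nperseg)[None, :] + hop * np.arange(n_frames)[:, None]
    # Windowed frames: (n_leads, n_frames, nperseg)
    frames = ecg[:, idx] * window
    # One-sided FFT of each frame, then put frequency before time
    spec = np.abs(np.fft.rfft(frames, axis=-1)).transpose(0, 2, 1)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + nperseg / 2) / fs
    # log(1 + x) compresses the dynamic range and keeps values non-negative
    return freqs, times, np.log1p(spec)

# Synthetic stand-in for a 12-lead recording: 10 s at 500 Hz (assumed)
fs = 500
t = np.arange(10 * fs) / fs
ecg = np.stack([np.sin(2 * np.pi * (1.0 + 0.1 * k) * t) for k in range(12)])
freqs, times, spec = stft_spectrogram(ecg, fs=fs)
print(spec.shape)
```

Each lead yields one 2-D time-frequency map, so a 12-lead recording becomes a 12-channel image-like input that a 2D CNN can consume alongside the raw time series.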
Furthermore, with the growing demand for medical examination
and treatment, the healthcare industry steadily accumulates
vast amounts of data, but these unlabeled
data are of limited use for most tasks. To address this
limitation, we leverage recent advances in SSL techniques,
i.e., contrastive learning [14]. A common workflow to apply
SSL is to train the network in an unsupervised manner by
learning a pre-stream task, and then fine-tune the pre-trained
network on a target downstream task. Suitable pre-stream
tasks fall into four categories: context-based [15],
generation-based [16], free semantic label-based
[17], [18], and cross-modal-based [19], [20]. Recently, SSL-based
DNNs have also been applied in ECG classification. [21]
proposed a self-designed loss that brings representations from
the same patient closer together and studies the shared context of
individual recordings across time and scenarios. [22] customized an
SSL model that captures the differences among segments
from individual patients and the dissimilarities between patients’
recordings from the same category. [23] conducted extensive
experiments on four SSL methods with multiple combinations of
transformations and demonstrated the improvement in macro
arXiv:2210.06297v1 [eess.SP] 30 Sep 2022