
networks with flexible structure and internal configuration.
This work copes with the variation of ECG signals
over time and across individuals, achieving 97% classification
accuracy on MIT-BIH.
Fig. 1. Standard fiducial points in the ECG. The ECG signal consists of
five major deflections, including P, Q, R, S and T, plus a small deflection,
known as the U wave [19].
Although these methods work well, they are difficult to
implement and use because most are based on manually de-
signed features. Different feature extraction techniques are
used to cope with different diseases due to the importance of
proper representation of the ECG signal. However, there are many
diseases, and it is impractical to design features for each of
them. Compared with traditional methods, deep learning
can automatically extract features and perform complex
data preprocessing. In recent years, deep learning has made
impressive achievements in computer-aided medical diag-
nosis [20] and is used in ECG classification [21]. In [22], ECG
signals are transformed into grayscale images and computer
vision techniques are used. In [23], an end-to-end deep learning
algorithm for ECG analysis is proposed, which employs a
deep neural network (DNN) to classify ECG signals.
In practice, many diseases must be diagnosed from
the holistic ECG signal, as some of them manifest over time.
However, most previous works focus on the classification
of individual heartbeats rather than the holistic ECG signal,
and very little effort has been devoted to classifying the
holistic signal. This motivates us to address the classification
of the holistic ECG signal. As the ECG signal has
temporal variation and unique individual characteristics,
which means that the same type of ECG signal varies
among patients under different physical conditions, a two-stream
architecture is proposed in this paper. The architecture
incorporates identified and temporal networks and
accurately classifies the holistic ECG over a long period.
Specifically, identified networks are used to extract features
of individual heartbeats, while temporal networks are
employed to extract temporal correlations between heartbeats,
thereby taking into account both the temporal variation and the
unique individual characteristics of the ECG. To demonstrate
the generalization and effectiveness of
our architecture, seven detailed categories of heartbeat are
collected, each containing data from a thousand different
adults, which also expands the study of ECG with deep learning.
The work operates under an inter-patient paradigm rather than
an intra-patient paradigm.
Section 2 mainly introduces the proposed architecture
for ECG recognition and provides a detailed description of
its internal structure. In Section 3, the datasets used and
the preprocessing are described, and the implementation
details of the experiments are outlined. In Section 4, the
performance of the architecture is evaluated on the MIT-BIH
dataset and in real-life settings. The architecture is discussed in Section
5 and this paper is summarized in Section 6.
2 TWO-STREAM ARCHITECTURE FOR ECG
RECOGNITION
The ECG signal is influenced by both the subject and time, which
means that the same type of ECG signal varies among patients
under different physical conditions. Inspired by
the two-stream networks for action recognition [24], in which
an action is decomposed into spatial and temporal streams, a
two-stream architecture for ECG recognition is proposed.
In this architecture, the ECG signal can be decomposed
into identified and temporal components. The identified
part, in the form of an individual heartbeat (represented
as the P-QRS-T complex), contains information about the
unique individual characteristics represented in the ECG
signal. The temporal part, in the form of a holistic ECG
signal (represented as the combination of multiple P-QRS-
T complexes), transmits the symptoms of the ECG and
its changes over time. The corresponding ECG recognition
architecture is designed accordingly and divided into two
streams, as shown in Fig. 2. Each stream is implemented
with a neural network, and the results of the two streams are combined by late
fusion. Two fusion methods are considered: averaging the
output scores and training a fully connected layer on the
stacked features extracted from each stream.
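The two late-fusion options can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy scores and features, and the hand-set weights are all assumptions made for illustration, and in practice the fully connected layer would be trained on features produced by the two streams.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_by_averaging(scores_a, scores_b):
    """Late fusion, option 1: element-wise mean of the two streams' class scores."""
    return [(a + b) / 2.0 for a, b in zip(scores_a, scores_b)]

def fuse_by_fc(features_a, features_b, weights, bias):
    """Late fusion, option 2: one fully connected layer on the stacked features.

    `weights` holds one row per class and one column per stacked feature.
    """
    stacked = list(features_a) + list(features_b)
    logits = [sum(w * x for w, x in zip(row, stacked)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

# Toy inputs: 3 classes, 2 features per stream (made-up numbers).
scores_identified = [0.7, 0.2, 0.1]
scores_temporal = [0.5, 0.4, 0.1]
avg_scores = fuse_by_averaging(scores_identified, scores_temporal)

feat_identified = [1.0, 0.0]
feat_temporal = [0.0, 2.0]
fc_weights = [[1.0, 0.0, 0.0, 0.0],  # placeholder values; normally learned
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 0.0]]
fc_bias = [0.0, 0.0, 0.0]
fc_scores = fuse_by_fc(feat_identified, feat_temporal, fc_weights, fc_bias)
predicted_class = max(range(len(fc_scores)), key=fc_scores.__getitem__)
```

Averaging needs no extra training but weights both streams equally, whereas the fully connected layer can learn how much each stream should contribute per class.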
Most studies in the field of heartbeat classification focus
on individual heartbeats and use an intra-patient paradigm.
In this scheme, heartbeats from the same patient appear in
both the training and testing subsets, which makes the evaluation results
overly optimistic [25]. Moreover, this scheme does not take
into account the temporal variation and unique individual characteristics
of the ECG. Our architecture takes these aspects
into consideration and operates under an inter-patient paradigm
rather than an intra-patient paradigm.
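The difference between the two paradigms comes down to how the data are split. The sketch below shows one way an inter-patient split could be made, by assigning whole patients to either the training or the testing subset. The record layout `(patient_id, beat, label)`, the function name, and the test fraction are hypothetical and not taken from the paper.

```python
import random

def inter_patient_split(records, test_fraction=0.25, seed=0):
    """Split (patient_id, beat, label) records so that no patient
    contributes heartbeats to both the training and testing subsets."""
    patients = sorted({pid for pid, _, _ in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [r for r in records if r[0] not in test_patients]
    test = [r for r in records if r[0] in test_patients]
    return train, test

# Toy data: 8 patients with 5 dummy beats each (made-up values).
records = [(f"p{p}", [0.0] * 4, p % 2) for p in range(8) for _ in range(5)]
train_set, test_set = inter_patient_split(records)
train_ids = {r[0] for r in train_set}
test_ids = {r[0] for r in test_set}
```

An intra-patient split would instead shuffle individual heartbeats, so beats from one patient could land on both sides and inflate the measured accuracy.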
2.1 Identified stream network
The ECG signal contains unique individual characteristics
and disease symptoms. The identified stream network operates
on individual heartbeats, efficiently extracting the
identity features and static features of each heartbeat. The
appearance of an individual heartbeat (its static characteristics)
is a useful clue, as many diseases can
be identified from a single heartbeat without the holistic
ECG signal. The identified stream network can therefore
classify individual heartbeats on its own, and this classification is
quite competitive for some specific classes. In addition,
because symptoms vary among patients, identity features matter.
The network can be pre-trained on an identification task so that
it acquires the identification capability on its
own.
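As an illustration of the input to this stream, the sketch below cuts a holistic signal into individual heartbeats. The paper does not describe its segmentation procedure in this section, so the approach is an assumption: R-peak positions are taken as already known, and a fixed window is cut around each one. The function name, window sizes, and sample values are hypothetical.

```python
def segment_heartbeats(signal, r_peaks, before=2, after=3):
    """Cut one fixed-length window around each R-peak index.

    Peaks too close to either end of the signal are skipped, so every
    returned beat has exactly `before + after` samples.
    """
    beats = []
    for r in r_peaks:
        start, end = r - before, r + after
        if start >= 0 and end <= len(signal):
            beats.append(signal[start:end])
    return beats

# Toy signal with three peaks; the last one is too close to the end.
ecg = [0, 0, 1, 9, 1, 0, 0, 1, 8, 1, 0, 0, 2]
r_peak_indices = [3, 8, 12]
beats = segment_heartbeats(ecg, r_peak_indices)
```

Each window would be one input to the identified stream, while the ordered list of windows corresponds to the holistic signal seen by the temporal stream.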
The identified stream network is constructed as an 11-layer
1D convolutional neural network [26]. It consists of
seven alternating convolutions and average-pooling layers,