
availability of large training datasets of digitized ECGs that could be linked to concurrent diagnostic
information across various disease types. In this context, standardized administrative health data,
routinely generated at each encounter, provide a valuable opportunity to explore the full spectrum
of patient diagnoses. These data include the most responsible diagnosis, as well as any comorbidities
the patient has or develops during the encounter.
In this study, we use a population-based dataset of >250,000 patients with various medical conditions
and >2 million in-hospital ECGs. Here, we use diagnoses coded using the World Health Organization
International Classification of Diseases (ICD) [20]. The goal of our study is to identify which diseases
(with previously known or unknown associations with ECGs) can be accurately diagnosed from the
patient’s first ECG during an emergency department (ED) visit or hospitalization based on a learned
DL model. It aims to provide a proof of concept for high-throughput, ICD-wide screening of
diseases based on the ECG, and presents candidate diseases to be explored in future ECG studies
focused on specific diagnoses.
2 Method
This study used population-based datasets from 26 hospitals in Alberta, Canada (2007-2020), contain-
ing information on 772,932 healthcare episodes (hospitalization and ED visits) of 260,065 patients
who collectively had 13,179 unique ICD-10 codes/diseases [20]. We linked these episodes to a
dataset of 2,015,808 ECGs (Philips IntelliSpace system, 12-lead, 500 Hz, 10 s) using unique patient
identifiers and the timing of ECG acquisition. After data cleaning and exclusions (poor signal quality¹,
unlinked episodes, pacemakers and other devices, age < 18 years, etc.), we used 1,514,968 ECGs that were
linked to 724,074 episodes of 239,852 patients with 11,207 unique ICD codes. An ICD-10 code consists
of 3 to 7 characters specifying a particular disease, where the first 3 characters denote its general
category (e.g., 'I214' refers to 'Non-ST elevation (NSTEMI) myocardial infarction' and 'I21' to
its broader category, 'Acute myocardial infarction'). We used ICD codes and corresponding categories
as labels for prediction modelling. We found 1,319 ICD codes (full code, exact match) and 699 ICD
categories (matched on the first 3 characters) that were each linked to at least 1,000 ECGs.
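As a minimal illustration of this labelling scheme (a sketch in Python; the function names and toy counts below are ours, not from the study), a code's category is simply its first 3 characters, and labels can be filtered by the number of linked ECGs:

```python
# Illustrative sketch: mapping full ICD-10 codes to their 3-character
# categories and keeping only labels linked to enough ECGs.
from collections import Counter

def icd_category(code: str) -> str:
    # The first 3 characters of an ICD-10 code give its general category,
    # e.g. 'I214' (NSTEMI) -> 'I21' (acute myocardial infarction).
    return code[:3]

def frequent_labels(codes_per_ecg, min_ecgs=1000):
    # `codes_per_ecg`: iterable of labels, one entry per linked ECG.
    # Returns the labels linked to at least `min_ecgs` ECGs.
    counts = Counter(codes_per_ecg)
    return sorted(label for label, n in counts.items() if n >= min_ecgs)
```

For instance, `icd_category('I214')` yields `'I21'`; the same filter applies whether the labels are full codes or categories.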
We split our ECG dataset into the internal validation set (random 60%: 143,939 patients with 436,508
ECGs, used for training and internal validation) and external holdout set (remaining 40%: 95,913
patients with 287,566 ECGs), while ensuring that ECGs from the same patient were not shared
between the sets. Whenever there were multiple ECGs in an episode, we used only the first ECG
for evaluation, as it would be preferable in actual clinical practice to make a diagnostic prediction
at the first point of care in the ED or hospital. We trained two DL models: one for full ICD codes and
one for ICD categories. We first trained and evaluated performance with an 80%-20% split within the
internal set, and selected a list of top labels based on discriminative performance (area under the receiver
operating characteristic curve, AUROC). We then retrained the models on the entire internal set
and evaluated on the external set based on the selected labels. Our DL architecture was based on
ResNet [6], similar to the one used in an earlier ECG modeling study [12]. The 12-lead ECG traces
were input to the network, which consists of a convolutional layer (conv) and 4 residual blocks with 2 convs
per block, followed by a dense layer to which the age and sex features were concatenated. We used
batch normalization, ReLU, and dropout after each conv. The last block feeds into a dense layer
with sigmoid activation that outputs a 1,319- (resp., 699-) length vector of predicted probabilities for the
codes/diseases (resp., categories). We used the Adam optimizer, a learning rate of 0.001, a batch size of
512, and binary cross-entropy as the loss function.
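The architecture above can be sketched as follows (a minimal PyTorch sketch under our own assumptions: the channel width, kernel size, pooling, and dropout rate are illustrative, since the text specifies only the overall structure):

```python
# Hedged sketch of the described ResNet-style ECG classifier.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: 2 convs, each followed by BN, ReLU, and dropout."""
    def __init__(self, channels, kernel_size=15, dropout=0.2):
        super().__init__()
        pad = kernel_size // 2  # keep the temporal length unchanged
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels), nn.ReLU(), nn.Dropout(dropout),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels), nn.ReLU(), nn.Dropout(dropout),
        )

    def forward(self, x):
        return x + self.body(x)  # skip connection

class EcgNet(nn.Module):
    """Initial conv + 4 residual blocks + dense head with age/sex features."""
    def __init__(self, n_labels=1319, channels=64):
        super().__init__()
        self.stem = nn.Sequential(  # 12 input leads
            nn.Conv1d(12, channels, 15, padding=7),
            nn.BatchNorm1d(channels), nn.ReLU(),
        )
        self.blocks = nn.Sequential(*[ResBlock(channels) for _ in range(4)])
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.head = nn.Linear(channels + 2, n_labels)  # +2 for age and sex

    def forward(self, ecg, age_sex):
        # ecg: (batch, 12, samples), e.g. 5000 samples for 10 s at 500 Hz
        h = self.pool(self.blocks(self.stem(ecg))).squeeze(-1)
        h = torch.cat([h, age_sex], dim=1)
        return torch.sigmoid(self.head(h))  # multi-label probabilities

model = EcgNet()
loss_fn = nn.BCELoss()  # binary cross-entropy on the sigmoid outputs
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

For the category model, `n_labels=699` would be used instead; training then iterates over batches of 512 ECGs as stated above.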
3 Results
In our internal validation, we found 369 out of 1319 ICD codes and 170 out of 699 ICD categories
to have AUROC > 80%. Among these, 70 ICD codes and 29 ICD categories had AUROC > 90%.
However, several of these labels had low precision, so we restricted the list to labels with
at least 5% AUPRC (area under the precision-recall curve) or with an average precision at least
20 times greater than the prevalence of the condition. This yielded 151 ICD codes and 80 ICD
categories with AUROC > 80%; and 52 ICD codes and 18 ICD categories with AUROC > 90%.
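The selection rule above can be written compactly (a sketch; the function and argument names are ours, and we use a single AUPRC/average-precision value since the two metrics nearly coincide):

```python
# Sketch of the label-selection rule: good discrimination (AUROC)
# plus a precision floor, absolute (>= 5% AUPRC) or relative to
# prevalence (>= 20x the label's base rate).
def keep_label(auroc, auprc, prevalence, auroc_min=0.80):
    return auroc > auroc_min and (auprc >= 0.05 or auprc >= 20 * prevalence)
```

Raising `auroc_min` to 0.90 reproduces the stricter of the two reported lists.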
Finally, we examined the replication of these lists in the external validation, and found that 128 out of
¹Trace quality was assessed for muscle artifact, AC noise, baseline wander, QRS clipping, leads-off flags, etc.