ECG for high-throughput screening of multiple diseases: Proof-of-concept using multi-diagnosis deep learning from population-based datasets

Weijie Sun (1,2), Sunil Vasu Kalmady (1,3), Amir Salimi (2), Nariman Sepehrvand (1),
Eric Ly (1), Abram Hindle (2), Russell Greiner (2,3), Padma Kaul (1)
(1) Canadian VIGOUR Centre, Department of Medicine, University of Alberta, Alberta, Canada
(2) Department of Computing Science, University of Alberta, Alberta, Canada
(3) Alberta Machine Intelligence Institute, Alberta, Canada
weijie2@ualberta.ca, kalmady@ualberta.ca
Abstract
Electrocardiogram (ECG) abnormalities are linked to cardiovascular diseases, but
may also occur in other non-cardiovascular conditions such as mental, neurological,
metabolic and infectious conditions. However, most of the recent success of deep
learning (DL) based diagnostic predictions in selected patient cohorts has been
limited to a small set of cardiac diseases. In this study, we use a population-based
dataset of >250,000 patients with >1000 medical conditions and >2 million ECGs
to identify a wide range of diseases that could be accurately diagnosed from the
patient’s first in-hospital ECG. Our DL models uncovered 128 diseases and 68
disease categories with strong discriminative performance.
1 Introduction
The electrocardiogram (ECG) captures the propagation of the electrical signal in the heart and is one of
the most routinely used non-invasive modalities in healthcare to diagnose cardiovascular diseases [8].
However, ECG signals can be complex, making them challenging and time-consuming to interpret, even
for experts. In recent years, deep learning (DL) models have been successful in reaching near-human
levels of performance; however, most of these studies have been limited to typical ECG abnormalities
such as arrhythmias [1] and a limited set of heart diseases including valvulopathy, cardiomyopathy,
and ischaemia [16].
Several clinical studies have shown strong associations of ECG abnormalities with numerous diseases
beyond cardiovascular conditions, including but not limited to mental disorders: depression [18],
bipolar disorder [4]; infectious conditions: HIV [15], sepsis [14]; metabolic diseases: diabetes
type 2 [5], amyloidosis [2]; drug use: psychotropics [11], cannabis [23]; neurological disorders:
Alzheimer disease [24], cerebral palsy [10]; respiratory diseases: pneumoconiosis [22], chronic
obstructive pulmonary disease [7]; digestive system diseases: liver cirrhosis [17], alcoholic liver
disease [19]; miscellaneous conditions: chronic kidney disease [13], preterm labour [3], systemic
lupus erythematosus [9], etc. However, despite well established clinical associations of ECG changes
with multiple diseases, very few studies have explored the information contained in ECGs that could
be harnessed for prediction of non-cardiovascular conditions. A major challenge here is the lack of
availability of large training datasets of digitized ECGs that could be linked to concurrent diagnostic
information across various disease types. In this context, standardized administrative health data,
routinely generated at each encounter, provide a wonderful opportunity to explore the full spectrum
of patient diagnoses. These data include the most responsible diagnosis, as well as any comorbidities
the patient may have or develop during presentation.
35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia.
arXiv:2210.06291v1 [eess.SP] 6 Oct 2022
In this study, we use a population-based dataset of >250,000 patients with various medical conditions
and >2 million in-hospital ECGs. Diagnoses are coded using the World Health Organization
International Classification of Diseases (ICD) [20]. The goal of our study is to identify which diseases
(with previously known or unknown associations with ECGs) can be accurately diagnosed from the
patient’s first ECG during an emergency department (ED) visit or hospitalization based on a learned
DL model. The study aims to provide a proof-of-concept for high-throughput screening of an ICD-wide
range of diseases based on ECG, and presents disease candidates to be explored in future ECG studies
with focused investigation of specific diagnoses.
2 Method
This study used population-based datasets from 26 hospitals in Alberta, Canada (2007-2020), containing
information on 772,932 healthcare episodes (hospitalizations and ED visits) of 260,065 patients
who collectively had 13,179 unique ICD-10 codes/diseases [20]. We linked these episodes to a
dataset of 2,015,808 ECGs (Philips IntelliSpace system, 12-lead, 500 Hz, 10 s) using unique patient
identifiers and timing of ECG acquisition. After data cleaning and exclusions (poor signal quality
[footnote 1], unlinked episodes, pacemakers and devices, < 18 years old, etc.), we used 1,514,968 ECGs
that were linked to 724,074 episodes of 239,852 patients with 11,207 unique ICD codes. An ICD-10 code
consists of 3 to 7 characters that identify a specific disease, where the first 3 characters denote the
general category of disease (e.g., ‘I214’ refers to ‘Non-ST elevation (NSTEMI) myocardial infarction’
and ‘I21’ refers to its broader category ‘Acute myocardial infarction’). We used ICD codes and
corresponding categories as labels for prediction modelling. We found 1,319 ICD codes (full code,
exact match) and 699 ICD categories (match on the first 3 characters) that were each linked to at
least 1000 ECGs.
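The label construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`icd_category`, `frequent_labels`) and the input layout (one collection of ICD codes per ECG) are assumptions made for the example, and only the two rules stated in the text are implemented (the category is the first 3 characters of a code; a label is kept if it is linked to at least 1000 ECGs).

```python
from collections import Counter


def icd_category(code: str) -> str:
    """Return the 3-character ICD-10 category of a full code, e.g. 'I214' -> 'I21'."""
    return code[:3]


def frequent_labels(ecg_codes, min_ecgs=1000):
    """Keep full codes and categories that are linked to at least `min_ecgs` ECGs.

    `ecg_codes` is an iterable with one collection of ICD codes per ECG
    (a hypothetical input layout chosen for this sketch).
    """
    code_counts, category_counts = Counter(), Counter()
    for codes in ecg_codes:
        unique_codes = set(codes)
        code_counts.update(unique_codes)
        # A category is counted once per ECG even if several codes share it.
        category_counts.update({icd_category(c) for c in unique_codes})
    keep_codes = sorted(c for c, n in code_counts.items() if n >= min_ecgs)
    keep_categories = sorted(c for c, n in category_counts.items() if n >= min_ecgs)
    return keep_codes, keep_categories
```

Because a category pools all of its full codes, a category can pass the threshold even when none of its individual codes does.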
We split our ECG dataset into an internal set (random 60%: 143,939 patients with 436,508
ECGs, used for training and internal validation) and an external holdout set (remaining 40%: 95,913
patients with 287,566 ECGs), while ensuring that ECGs from the same patient were not shared
between the sets. Whenever there were multiple ECGs in an episode, we used only the first ECG
for evaluation, as it would be preferable in actual clinical practice to make a diagnostic prediction
at the first point of care in the ED or hospital. We trained two DL models, one for full ICD codes and
one for ICD categories. We first trained and evaluated the models with an 80%-20% split within the
internal set, and selected a list of top labels based on discriminative performance (area under the
receiver operating characteristic curve, AUROC). We then retrained the models on the entire internal
set and evaluated them on the external set for the selected labels. Our DL architecture was based on
ResNet [6], similar to the one used in an earlier ECG modeling study [12]. Here, 12-lead ECG traces
were input to the network, which consisted of a convolutional layer (conv) and 4 residual blocks with
2 conv per block, followed by a dense layer to which age and sex features were concatenated. We used
batch normalization, ReLU and dropout after each conv. The output of the last block was then fed into
a dense layer with sigmoid activation to output a 1319 (resp., 699) length vector of predicted
probabilities for the codes/diseases (resp., categories). We used the Adam optimizer with a learning
rate of 0.001, a batch size of 512, and binary cross-entropy as the loss function.
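The two data-handling rules in this paragraph (no patient shared between the internal and holdout sets, and only the first ECG of each episode used for evaluation) can be illustrated with a short sketch. The record fields `patient_id`, `episode_id` and `acquired_at`, the function names, and the fixed seed are all assumptions made for the example; the paper does not describe how the split was implemented.

```python
import random


def split_by_patient(records, internal_fraction=0.6, seed=0):
    """Randomly assign whole patients to the internal or holdout set.

    Splitting on patient IDs (not on ECGs) guarantees that no patient
    contributes ECGs to both sets.
    """
    patients = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_internal = round(internal_fraction * len(patients))
    internal_ids = set(patients[:n_internal])
    internal = [r for r in records if r["patient_id"] in internal_ids]
    holdout = [r for r in records if r["patient_id"] not in internal_ids]
    return internal, holdout


def first_ecg_per_episode(records):
    """Keep only the earliest ECG of each episode."""
    first = {}
    for r in records:
        key = r["episode_id"]
        if key not in first or r["acquired_at"] < first[key]["acquired_at"]:
            first[key] = r
    return list(first.values())
```

The same patient-level function could be reused with `internal_fraction=0.8` for the 80%-20% split inside the internal set, although the text does not state whether that inner split was also done at the patient level.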
3 Results
In our internal validation, we found 369 out of 1319 ICD codes and 170 out of 699 ICD categories
to have AUROC > 80%. Among these, 70 ICD codes and 29 ICD categories had AUROC > 90%.
However, several of these labels had low precision; we therefore restricted the list to the labels with
at least 5% AUPRC (area under the precision-recall curve) or with an average precision at least
20 times greater than the prevalence of the condition. This yielded 151 ICD codes and 80 ICD
categories with AUROC > 80%, and 52 ICD codes and 18 ICD categories with AUROC > 90%.
Finally, we examined the replication of these lists in the external validation, and found that 128 out of
Footnote 1: Trace quality was ensured on muscle artifact, AC noise, baseline wander, QRS clipping, leads-off flags, etc.
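The label-selection rule stated in the Results (AUROC above a threshold, combined with either at least 5% AUPRC or an average precision at least 20 times the prevalence) can be sketched in plain Python. This is an illustration under assumptions: AUPRC is approximated by average precision, ties in the average-precision ranking are broken by input order, each label is assumed to have at least one positive and one negative example, and the function names are hypothetical. The pairwise AUROC is quadratic in the number of ECGs and is meant for clarity only.

```python
def auroc(y_true, y_score):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean of the precision values at the rank of each positive example."""
    order = sorted(range(len(y_true)), key=lambda i: -y_score[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


def keep_label(y_true, y_score, min_auroc=0.80, min_ap=0.05, min_lift=20.0):
    """Apply the AUROC threshold plus the precision-based restriction."""
    prevalence = sum(y_true) / len(y_true)
    ap = average_precision(y_true, y_score)
    precise_enough = ap >= min_ap or ap >= min_lift * prevalence
    return auroc(y_true, y_score) > min_auroc and precise_enough
```

The second condition matters for rare conditions: when prevalence is below 0.25%, an average precision under 5% can still be 20 times better than chance.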