A Generalizable Artificial Intelligence Model for COVID-19
Classification Task Using Chest X-ray Radiographs: Evaluated Over Four
Clinical Datasets with 15,097 Patients
Ran Zhang1 Ph.D., Xin Tie1 MS, John W. Garrett3,1 Ph.D., Dalton Griner1 MS, Zhihua Qi2
Ph.D., Nicholas B. Bevins2 Ph.D., Scott B. Reeder3,1,4,5,6 MD/Ph.D., and Guang-Hong Chen1,3
Ph.D.
1. Department of Medical Physics, School of Medicine and Public Health, University of
Wisconsin, Madison, WI 53705, USA
2. Department of Radiology, Henry Ford Health, Detroit, MI 48202, USA
3. Department of Radiology, School of Medicine and Public Health, University of Wisconsin,
Madison, WI 53792, USA
4. Department of Biomedical Engineering, University of Wisconsin, 1550 Engineering Dr,
Madison, WI, 53706, USA
5. Department of Medicine, University of Wisconsin, 1685 Highland Ave, Madison, WI,
53792, USA
6. Department of Emergency Medicine, University of Wisconsin, 800 University Bay Dr
Suite 310, Madison, WI, 53705, USA
Address correspondence to:
Guang-Hong Chen, Ph.D.,
Department of Medical Physics and Department of Radiology,
School of Medicine and Public Health,
University of Wisconsin in Madison, Madison, WI 53705,
Email: gchen7@wisc.edu
Abstract
Purpose
To answer the long-standing question of whether a model trained on data from a single
clinical site can generalize to external sites.
Materials and Methods
A total of 17,537 chest x-ray radiographs (CXRs) from 3,264 COVID-19-positive patients and
4,802 COVID-19-negative patients were collected from a single site for AI model
development. The generalizability of the trained model was retrospectively evaluated
using four different real-world clinical datasets with a total of 26,633 CXRs from
15,097 patients (3,277 COVID-19-positive patients). The area under the receiver
operating characteristic curve (AUC) was used to assess diagnostic performance.
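As an illustration of the metric named above, the following minimal sketch computes an AUC from binary labels and model scores and attaches a 95% confidence interval. The text here does not state how the confidence intervals were obtained, so the percentile bootstrap used below is an assumption, as are the function names `auc_score` and `bootstrap_auc_ci` and the toy data. Resampling is done per sample; resampling per patient would be a different (also plausible) choice.

```python
import random


def auc_score(labels, scores):
    """AUC via the Mann-Whitney rank statistic; labels must be 0 or 1.

    Equals the probability that a random positive scores higher than a
    random negative, with ties counted as one half.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied scores share the average of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(lbl for _, lbl in pairs[i:j])
        i = j
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed CI method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in idx]
        # Skip resamples that contain only one class.
        if 0 < sum(boot_labels) < n:
            stats.append(auc_score(boot_labels, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int((alpha / 2) * (n_boot - 1))
    hi_idx = int(round((1 - alpha / 2) * (n_boot - 1)))
    return stats[lo_idx], stats[hi_idx]


# Hypothetical toy data: 1 = COVID-19 positive, 0 = negative.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.60]
print("AUC:", auc_score(toy_labels, toy_scores))
print("95% CI:", bootstrap_auc_ci(toy_labels, toy_scores))
```

The rank-based formulation avoids building the ROC curve explicitly and handles tied scores, which keeps the sketch dependency-free.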
Results
The AI model trained using a single-source clinical dataset achieved an AUC of 0.82
(95% CI: 0.80, 0.84) when applied to the internal temporal test set. When applied to
datasets from two external clinical sites, AUCs of 0.81 (95% CI: 0.80, 0.82) and 0.82
(95% CI: 0.80, 0.84) were achieved. An AUC of 0.79 (95% CI: 0.77, 0.81) was
achieved when applied to a multi-institutional COVID-19 dataset collected by the
Medical Imaging and Data Resource Center (MIDRC). A power-law dependence on the
training data size, with an exponent k empirically found to be -0.21 to -0.25, indicates
a relatively weak dependence of performance on the training data size.
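To make the reported power-law exponent concrete, the sketch below fits a power law of the form value ≈ a · N^k by least squares in log-log space. This is a generic fitting technique, not necessarily the procedure used in the study. The helper name `fit_power_law`, the symbols `a` and `N`, and the synthetic data points are all assumptions introduced for illustration; this section does not specify which performance quantity follows the power law, so the fitted values stand in for a generic error-like metric.

```python
import math


def fit_power_law(sizes, values):
    """Least-squares fit of values ~= a * sizes**k in log-log space.

    Returns (a, k). All sizes and values must be positive.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(mean_y - k * mean_x)  # intercept in log-log space = log(a)
    return a, k


# Synthetic, noise-free points generated with a hypothetical a = 0.5 and
# k = -0.23 (inside the reported exponent range); not data from the study.
train_sizes = [100, 300, 1000, 3000, 8000]
metric_values = [0.5 * n ** -0.23 for n in train_sizes]

a_hat, k_hat = fit_power_law(train_sizes, metric_values)
print("fitted a:", a_hat, "fitted k:", k_hat)
```

With an exponent between -0.21 and -0.25, a tenfold increase in training data scales the power-law quantity by 10^k, roughly 0.62 to 0.56, which is one way to read the statement that the dependence on training data size is relatively weak.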
Conclusion
A COVID-19 classification AI model trained using well-curated data from a single
clinical site is generalizable to external clinical sites without a significant drop in
performance.
Summary
An AI model trained using a properly curated single-source dataset is generalizable to
external sites for the classification of COVID-19 using CXRs, and performance is only
weakly dependent on the sample size of the training data.
Key Points
A COVID-19 chest x-ray classification model trained using data from a single
clinical site demonstrated generalization to external test cohorts, with an AUC
range of 0.79-0.82.
The model’s performance has a weak power-law relationship with the training
data size, with the exponent k ranging from -0.21 to -0.25.
Small training datasets (~100 patients) can be used to develop a baseline AI
model with good initial performance, suggesting the importance of data quality
over data size in medical AI model development for this application.
Abbreviations
AI: artificial intelligence
AUC: area under the receiver operating characteristic curve
COVID-19: coronavirus disease 2019
CXR: chest x-ray radiograph
RT-PCR: reverse transcriptase polymerase chain reaction
Introduction
In recent years, deep learning algorithms have shown great promise in medical image
analysis to help radiologists and clinicians in disease detection, classification, and
severity assessment, thanks to the development of hardware and algorithms in computer
vision (1,2). This promise has attracted tremendous research interest over the past two
years as the COVID-19 pandemic hit the world and put enormous pressure on
healthcare systems. Rapid diagnosis and patient triage play a vital role, especially in
the early stage of the pandemic and in resource-limited settings. In response to this
urgent need, researchers have rushed to develop AI-based diagnostic models using
chest x-rays (CXRs), as shown by the sharp increase in the number of publications on
this subject (3). While hundreds of AI models for COVID-19 diagnosis using CXRs
have been developed and reported to achieve excellent performance, most of them failed
to generalize when tested externally, as identified in several systematic reviews (4–6).
The poor generalizability of AI models is often attributed to the size and quality
of the training data. Without proper data collection and curation strategies, spurious
confounding factors, i.e., shortcuts, may exist in the training data (7,8). The model will
learn these shortcuts rather than the desired disease features. When shortcuts exist in
the training dataset, the trained model can achieve extremely high performance when
tested on the internal test set consisting of independent and identically distributed
samples from the training distribution. However, the performance may degrade to the
chance level when tested externally on real-world clinical datasets.
It is often assumed that increasing the data size and collecting data from
diversified sources, i.e., multiple institutions, will ensure generalizable models (9).
However, such strategies may be difficult to implement due to the regulatory challenges
in medical data collection and sharing practices. Furthermore, these strategies may
become sub-optimal in a pandemic where solutions must be generated quickly and
reliably (10).
This study addresses two fundamental questions in medical AI model
development: 1) Is it possible to develop generalizable AI models from a carefully
curated single-source dataset? Namely, can a COVID-19 classification model trained
on a carefully curated single-source dataset be generalized to external sites? 2) How
does the performance of a generalizable AI model depend on the sample size of the
well-curated training dataset?