Uncertainty estimation methods for a deep learning model to aid in clinical decision-making: a clinician's perspective
Michael Dohopolski1, Kai Wang1, Biling Wang1, Ti Bai1, Dan Nguyen1, David Sher1,
Steve Jiang1, Jing Wang1
1 Medical Artificial Intelligence and Automation Laboratory and Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX 75235, USA
michael.dohopolski@utsouthwestern.edu
github: https://github.com/MikeDoho/FT_Uncertainty_Comparison
Abstract
Prediction uncertainty estimation has clinical significance as it can potentially quantify prediction reliability. Clinicians may trust "black-box" models more if robust reliability information is available, which may lead to more models being adopted into clinical practice. There are several deep learning-inspired uncertainty estimation techniques, but few have been implemented on medical datasets, and fewer still on single-institutional datasets/models. We sought to compare dropout variational inference (DO), test-time augmentation (TTA), conformal predictions, and single deterministic methods for estimating uncertainty using our model trained to predict feeding tube placement for 271 head and neck cancer patients treated with radiation. We compared the area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) trends for each method at various cutoffs that sought to stratify patients into "certain" and "uncertain" cohorts. These cutoffs were obtained by calculating the percentile "uncertainty" within the validation cohort and were then applied to the testing cohort. Broadly, the AUC, sensitivity, and NPV increased as the predictions became more "certain," i.e., had lower uncertainty estimates. However, when a majority vote (implementing 2/3 criteria: DO, TTA, conformal predictions) or a stricter approach (3/3 criteria) was used, AUC, sensitivity, and NPV improved without a notable loss in specificity or PPV. Especially for smaller, single-institutional datasets, it may be important to evaluate multiple estimation techniques before incorporating a model into clinical practice.
Keywords: uncertainty estimation, deep learning, clinical decision-making
1 Introduction
Novel algorithms, such as convolutional neural networks, have been able to diagnose (predict) disease states or other medically important outcomes as well as the respective experts 1,2. Despite the successes of these models, we rarely see them implemented in
clinical practice. This phenomenon is multifactorial, but one concern is the validity of
a particular prediction. Can clinicians trust a prediction, and how can this be measured?
Can a model self-identify cases where it does not "know" whether its prediction is reliable, i.e., can it say "I do not know"? Uncertainty estimation is a method proposed to quantify the reliability of a prediction and might provide a means to reassure physicians regarding a model's prediction. Alternatively, if the model can communicate that it is "unsure," then the physician can ignore the prediction and rely entirely on their clinical judgment.
There are two primary sources of uncertainty: aleatoric and epistemic uncertainty 3. At a basic level, aleatoric uncertainty is associated with inherent noise within the data; clinically, a contributor to aleatoric uncertainty may be an artifact present on a CT image. Epistemic uncertainty represents a model's lack of knowledge. For example, a model trained to predict outcomes associated with a diverse array of head and neck cancers may be "uncertain" when expected to make a prediction on a poorly represented subset. Concretely, say a model was trained on 100 oropharyngeal cancer cases, 80 laryngeal cancer cases, and three nasopharyngeal cancer cases. When asked to make a prediction on a new nasopharyngeal cancer case, it might be more "uncertain" compared to a new oropharyngeal cancer case. Alternatively, epistemic uncertainty might be able to identify model misuse, where a prediction made by the model above would be more "uncertain" if it were exposed to a thyroid cancer case. While limited, uncertainty estimation has shown an increased prevalence in the medical machine learning literature, and there have been several proposed methods for estimating both aleatoric and epistemic uncertainty 3-9.
Most comparison studies on uncertainty estimation methods describe the pros and cons associated with various methods 10. For example, dropout variational inference and test-time augmentation methods are more computationally expensive when making predictions on test data than ensemble methods, which instead require significant computational effort during training. Single deterministic approaches are less computationally expensive than the prior methods but require the model to be retrained to predict the uncertainty distribution 5,6,8,9,11.
Excitingly, Berger et al. recently published work comparing out-of-distribution detection methods (i.e., epistemic uncertainty surrogates) using a large publicly available medical dataset (CheXpert) and made the critical observation that performance on traditional computer vision datasets does not always translate well when applied to medical datasets 4,12. They also briefly explored threshold selection for out-of-distribution identification using temperature scaling: high confidence/low uncertainty was associated with higher accuracy. However, accuracy is only one metric that is important to a clinician; sensitivity and specificity are key. Moreover, threshold selection is critical, as having too strict a threshold might limit a model's utility (i.e., decrease the patient sample size where the model is "certain") or negatively affect clinically used metrics such as sensitivity or specificity 13.
This study employs several epistemic and aleatoric uncertainty estimation methods and compares their performance at various cutoffs using AUC, sensitivity, and specificity for a model trained to predict a clinically significant event, feeding tube placement, in head and neck cancer patients treated with definitive radiation therapy. Feeding tube
placement is important for nutritional supplementation. If patients who may need a feeding tube are not accurately identified, treatment delays can occur, which are associated with worse survival 14,15. Meanwhile, feeding tube placement is a surgical procedure and can be associated with worse quality of life, so accurately predicting which patients need a feeding tube is important 16. Our particular clinical scenario aside, many medical dilemmas require reliable criteria for proper medical decision-making. We hope our uncertainty estimation analyses highlight practical challenges in implementing these methods for models trained on relatively small single-institutional datasets. In conducting our analyses, we introduce the idea of implementing multiple uncertainty estimation methods to improve general discriminative ability while not sacrificing other metrics like specificity or sensitivity, an application we have not previously seen; a minimal sketch of this voting idea is given below.
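To make the combination concrete, the sketch below (with hypothetical flags and variable names of our choosing) shows how per-case "uncertain" flags from DO, TTA, and conformal predictions could be combined under the 2/3 and 3/3 criteria:

```python
import numpy as np

# Hypothetical per-case "uncertain" flags, where True means that method's
# uncertainty estimate exceeded its validation-derived percentile cutoff.
do_flag = np.array([True, False, False, True])         # dropout variational inference
tta_flag = np.array([True, False, True, True])         # test-time augmentation
conformal_flag = np.array([False, False, True, True])  # conformal predictions

votes = do_flag.astype(int) + tta_flag.astype(int) + conformal_flag.astype(int)
uncertain_majority = votes >= 2  # 2/3 criteria: flagged by at least two methods
uncertain_strict = votes == 3    # 3/3 criteria: flagged by all three methods
```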
2 Methods
1. Data
This single-institutional dataset included 271 patients. The predicted outcome was feeding tube placement, or ≥10% weight loss if the patient declined feeding tube placement; this accounted for 42% of patients within the dataset. CT imaging, including the radiation planning CT and an on-treatment cone-beam CT, together with the radiation dose, was used as a three-channel input. The input was of size 150 × 80 × 80 to focus on the oral cavity, oropharynx, and esophagus (structures important for swallowing) 17.
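A minimal sketch of how such a three-channel input could be assembled; the array names, and the assumption that the volumes are already registered, cropped, and resampled, are ours rather than the paper's exact pipeline:

```python
import numpy as np

# Hypothetical per-patient volumes, each cropped/resampled to 150 x 80 x 80
# around the oral cavity, oropharynx, and esophagus.
planning_ct = np.zeros((150, 80, 80), dtype=np.float32)
cone_beam_ct = np.zeros((150, 80, 80), dtype=np.float32)
dose = np.zeros((150, 80, 80), dtype=np.float32)

# Stack into a three-channel input of shape (3, 150, 80, 80).
x = np.stack([planning_ct, cone_beam_ct, dose], axis=0)
assert x.shape == (3, 150, 80, 80)
```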
2. Model
Transfer learning using MedNet's ResNet50 architecture was employed 18,19. Five-fold cross-validation was performed: roughly 80% of patients were used for training/validation and 20% for testing. The original model was trained with stochastic gradient descent using a momentum of 0.9 and a weight decay of 0.001. Batch normalization and dropout were utilized. The model was trained over 120 epochs without early stopping. Cross entropy with class weights of 1/3 and 2/3 was used for class 0 and class 1, respectively. A separate model was trained using the same hyperparameters but a different loss function that incorporated a Kullback-Leibler divergence term to create the deterministic/evidential deep learning model 20.
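For orientation, a minimal PyTorch sketch of the reported loss and optimizer configuration; the stand-in model and the learning rate are assumptions, since the actual backbone was MedNet's ResNet50 and the learning rate is not stated here:

```python
import torch
import torch.nn as nn

# Stand-in model; the study's MedNet ResNet50 backbone is not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 150 * 80 * 80, 2))

# Cross entropy with class weights of 1/3 (class 0) and 2/3 (class 1).
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1 / 3, 2 / 3]))

# SGD with the reported momentum of 0.9 and weight decay of 0.001;
# the learning rate is a placeholder value.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=0.001)
```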
3. Uncertainty Estimation Methods
Dropout Variational Inference
The implementation of dropout (DO) variational inference was popularized by Kendall and Gal to approximate epistemic uncertainty. A model is trained with dropout and keeps dropout active at test time. Multiple predictions are made for a single image, and the class probabilities are used to calculate the informational entropy 9. We used 300 forward passes.
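A minimal sketch of this procedure; the function name and the handling of batch normalization are our assumptions:

```python
import torch
import torch.nn.functional as F

def mc_dropout_entropy(model, x, n_passes=300):
    # Enable dropout at test time. model.train() is a common shortcut, but it
    # also puts batch normalization in training mode; a careful implementation
    # would switch only the dropout modules to train().
    model.train()
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(model(x), dim=-1) for _ in range(n_passes)]
        )
    mean_probs = probs.mean(dim=0)
    # Predictive (informational) entropy of the averaged class probabilities.
    entropy = -(mean_probs * torch.log(mean_probs + 1e-12)).sum(dim=-1)
    return mean_probs, entropy
```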