experts 1,2. Despite the successes of these models, we rarely see them implemented in
clinical practice. This phenomenon is multifactorial, but one concern is the validity of
a particular prediction. Can clinicians trust a prediction, and how can this be measured?
Can a model self-identify cases where it does not "know" whether its prediction is reliable, i.e. can it say "I do not know"? Uncertainty estimation has been proposed as a way to quantify the reliability of a prediction and might provide a means to reassure physicians regarding a model's output. Alternatively, if the model can indicate that it is "unsure," the physician can ignore the prediction and rely entirely on their clinical judgment.
There are two primary sources of uncertainty: aleatoric and epistemic uncertainties
3. At a basic level, aleatoric uncertainty is associated with inherent noise within the
data; clinically, a contributor to aleatoric uncertainty may be artifact present on a CT
image. Epistemic uncertainty represents a model's lack of knowledge. For example, a
model trained to predict outcomes associated with a diverse array of head and neck
cancers may be "uncertain" when expected to make a prediction on a poorly represented
subset. Concretely, say a model was trained on 100 oropharyngeal cancer cases, 80
laryngeal cancer cases, and three nasopharyngeal cancer cases. When asked to make
another prediction on a new nasopharyngeal cancer case, it might be more "uncertain"
compared to a new oropharyngeal cancer case. Alternatively, epistemic uncertainty might help identify model misuse: the model above would likely be even more "uncertain" if exposed to a thyroid cancer case. While still limited, uncertainty estimation has become increasingly prevalent in the medical machine learning literature, and several methods have been proposed for estimating both aleatoric and epistemic uncertainty 3–9.
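To make the distinction concrete, the sketch below (illustrative only, not the study's code) shows one common way to separate the two sources from Monte Carlo samples of a classifier's softmax output, for example from dropout variational inference or an ensemble: the total predictive entropy splits into the expected entropy (an aleatoric surrogate) and the mutual information (an epistemic surrogate). The array shapes and variable names are assumptions made for the example.

```python
import numpy as np

def decompose_uncertainty(probs: np.ndarray, eps: float = 1e-12):
    """Split predictive uncertainty from Monte Carlo samples.

    probs: array of shape (T, C) holding T sampled softmax vectors over
           C classes for a single case (e.g., T stochastic forward
           passes with dropout kept active at test time).
    Returns (total, aleatoric, epistemic) entropy values in nats.
    """
    mean_p = probs.mean(axis=0)                                 # averaged predictive distribution
    total = -np.sum(mean_p * np.log(mean_p + eps))              # entropy of the mean prediction
    aleatoric = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))  # expected per-sample entropy
    epistemic = total - aleatoric                               # mutual information (disagreement between samples)
    return total, aleatoric, epistemic

# Toy example: 5 sampled predictions for a binary outcome
samples = np.array([[0.9, 0.1], [0.7, 0.3], [0.8, 0.2], [0.6, 0.4], [0.85, 0.15]])
print(decompose_uncertainty(samples))
```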
Most comparison studies on uncertainty estimation methods describe the pros and
cons associated with various methods 10. For example, dropout variational inference and test-time augmentation are more computationally expensive when making predictions on test data, whereas ensemble methods instead require significant computational effort during training. Single deterministic approaches are less computationally expensive than either of these but require the model to be retrained to predict the uncertainty distribution 5,6,8,9,11.
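As a rough illustration of where that cost falls, the toy sketch below (an assumption, not any of the cited implementations) contrasts the T stochastic forward passes that dropout variational inference or test-time augmentation needs for every new test case with a deep ensemble, whose M members are costly to train but each predict only once at test time. The stand-in model is a hypothetical placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_stochastic_model(x):
    """Stand-in for one stochastic forward pass of a binary classifier
    (e.g., a network with dropout left active at test time)."""
    logits = np.array([1.0, -1.0]) + rng.normal(scale=0.3, size=2)
    e = np.exp(logits - logits.max())
    return e / e.sum()

T = 30  # stochastic forward passes per test case (dropout VI / test-time augmentation)
M = 5   # ensemble members, each trained separately before deployment

x = None  # placeholder input for the toy model

# Dropout VI / TTA: the per-case test-time cost grows with T.
mc_samples = np.stack([toy_stochastic_model(x) for _ in range(T)])

# Deep ensemble: the training cost grows with M, but each member
# contributes only one forward pass per test case (often M << T).
ensemble_samples = np.stack([toy_stochastic_model(x) for _ in range(M)])

print("MC dropout mean/spread:", mc_samples.mean(axis=0), mc_samples.std(axis=0))
print("Ensemble mean/spread:  ", ensemble_samples.mean(axis=0), ensemble_samples.std(axis=0))
```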
Excitingly, Berger et al. recently compared out-of-distribution detection methods (i.e. epistemic uncertainty surrogates) on a large publicly available medical dataset (CheXpert) and made the critical observation that performance on traditional computer vision datasets does not always translate when applied to medical datasets 4,12. They also briefly explored threshold selection for out-of-distribution identification using temperature scaling: high confidence/low uncertainty was associated with higher accuracy. However, accuracy is only one metric that matters to a clinician; sensitivity and specificity are key. Moreover, threshold selection is critical, as too strict a threshold might limit a model's utility (i.e. decrease the number of patients for whom the model is "certain") or negatively affect clinically used metrics such as sensitivity or specificity 13.
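The trade-off described above can be made explicit with a short sketch (illustrative only; the data, cutoffs, and names are assumptions rather than the study's code): for each candidate uncertainty cutoff, predictions on "uncertain" cases are deferred, and sensitivity, specificity, and the fraction of patients retained are computed only on the cases the model keeps.

```python
import numpy as np

def metrics_at_cutoffs(y_true, y_pred, uncertainty, cutoffs):
    """For each cutoff, defer cases whose uncertainty exceeds it and
    report sensitivity/specificity on the retained ("certain") cases."""
    rows = []
    for c in cutoffs:
        keep = uncertainty <= c
        yt, yp = y_true[keep], y_pred[keep]
        tp = np.sum((yt == 1) & (yp == 1))
        tn = np.sum((yt == 0) & (yp == 0))
        fn = np.sum((yt == 1) & (yp == 0))
        fp = np.sum((yt == 0) & (yp == 1))
        sens = tp / (tp + fn) if (tp + fn) else np.nan
        spec = tn / (tn + fp) if (tn + fp) else np.nan
        rows.append((c, keep.mean(), sens, spec))  # cutoff, retained fraction, sensitivity, specificity
    return rows

# Toy data: a binary outcome, imperfect predicted labels, and a per-case uncertainty score
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 200)
y_pred = np.where(rng.random(200) < 0.8, y_true, 1 - y_true)  # ~80% accurate predictions
uncertainty = rng.random(200)
for row in metrics_at_cutoffs(y_true, y_pred, uncertainty, cutoffs=[0.25, 0.5, 0.75, 1.0]):
    print(row)
```

A stricter cutoff in this sketch shrinks the retained fraction of patients, which is exactly the utility cost weighed against any gain in sensitivity or specificity.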
This study employs several epistemic and aleatoric uncertainty estimation methods
and compares performance at various cutoffs using AUC, sensitivity, and specificity
for a model trained to predict a clinically significant event—feeding tube placement—
in head and neck cancer patients treated with definitive radiation therapy. Feeding tube