AutoPrognosis 2.0: Democratizing Diagnostic and Prognostic Modeling in Healthcare with Automated Machine Learning

Fergus Imrie^a, Bogdan Cebere^b, Eoin F. McKinney^c, Mihaela van der Schaar^{b,d}

^a Department of Electrical and Computer Engineering, University of California, Los Angeles, CA, USA

^b Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK

^c Department of Medicine, University of Cambridge, Cambridge, UK

^d The Alan Turing Institute, London, UK

Abstract

Diagnostic and prognostic models are increasingly important in medicine and inform many clinical decisions. Recently, machine learning approaches have shown improvement over conventional modeling techniques by better capturing complex interactions between patient covariates in a data-driven manner. However, the use of machine learning introduces a number of technical and practical challenges that have thus far restricted widespread adoption of such techniques in clinical settings. To address these challenges and empower healthcare professionals, we present a machine learning framework, AutoPrognosis 2.0, to develop diagnostic and prognostic models. AutoPrognosis leverages state-of-the-art advances in automated machine learning to develop optimized machine learning pipelines, incorporates model explainability tools, and enables deployment of clinical demonstrators, without requiring significant technical expertise. Our framework eliminates the major technical obstacles to predictive modeling with machine learning that currently impede clinical adoption. To demonstrate AutoPrognosis 2.0, we provide an illustrative application where we construct a prognostic risk score for diabetes using the UK Biobank, a prospective study of 502,467 individuals. The models produced by our automated framework achieve greater discrimination for diabetes than expert clinical risk scores. Our risk score has been implemented as a web-based decision support tool^1 and can be publicly accessed by patients and clinicians worldwide. In addition, AutoPrognosis 2.0 is provided as an open-source Python package. By open-sourcing our framework as a tool for the community, clinicians and other medical practitioners will be able to readily develop new risk scores, personalized diagnostics, and prognostics using modern machine learning techniques.

Preprint 2022. arXiv:2210.12090v1 [cs.LG] 21 Oct 2022

Software: https://github.com/vanderschaarlab/AutoPrognosis

1. Introduction

Machine learning (ML) systems have the potential to revolutionize medicine and become core clinical tools [1]. However, there is a diverse set of challenges that must be overcome prior to routine and widespread ML adoption [2, 3]. In particular, there are substantial technical challenges in developing, understanding, and deploying ML systems which currently render them largely inaccessible for medical practitioners [3, 4, 5, 6].

In an attempt to address this, we previously developed AutoPrognosis, an automated machine learning (AutoML) framework to train predictive models [7]. This framework has since been applied to derive prognostic models for cardiovascular disease [8], cystic fibrosis [9], and breast cancer [10], among a number of other indications [11, 12, 13, 14, 15, 16]. However, our initial approach had significant limitations from both algorithmic and usability perspectives.

Consequently, in this work, we describe AutoPrognosis 2.0, which addresses all major obstacles limiting the development, interpretation and deployment of ML methods in medicine and represents a step-change in diagnostic and prognostic modeling. In particular, we believe this is the world’s first method that can simultaneously: (1) solve classification, regression, and time-to-event problems; (2) optimize ML pipelines, determine the most appropriate models, and automatically tune hyperparameters; (3) identify key variables and novel risk factors, enabling clinicians to select different numbers of variables and understand the value of information; (4) provide a diverse range of model explanations, including feature-based, example-based, and closed-form risk equations; and (5) produce web-based applications, allowing models to be readily shared with the clinical community.

^1 https://autoprognosis-biobank-diabetes.streamlitapp.com/

In this paper, we outline the major challenges facing clinical development and translation of diagnostic and prognostic modeling. We then describe our approach, AutoPrognosis 2.0, and detail how it addresses each challenge. Finally, we demonstrate the application of AutoPrognosis 2.0 in an illustrative scenario: prognostic risk prediction of diabetes using a cohort of 502,467 individuals from UK Biobank. However, we emphasize that AutoPrognosis can be applied to construct diagnostic and prognostic models for any disease or clinical outcome, and is explicitly designed to make model building accessible to non-ML experts. We have open-sourced AutoPrognosis 2.0 as a tool for the community, allowing clinicians or non-expert users to adopt the automated framework to robustly and reproducibly develop optimized personalized diagnostics, prognostics, and risk scores using modern machine learning techniques.

2. Challenges in Diagnostic and Prognostic Modeling

There are numerous obstacles to developing and deploying diagnostic and prognostic models that currently prevent healthcare professionals from capitalizing on recent algorithmic advances [1]. Our work seeks to empower clinicians, medical researchers, epidemiologists, and biostatisticians through an accessible, automated framework capable of identifying optimal solutions to all major obstacles limiting ML model building with minimal need for technical expertise. We begin by describing the seven major challenges faced by these communities and how they are addressed by AutoPrognosis 2.0.

Challenge 1. Developing powerful ML pipelines

Developing performant ML models remains complex and typically involves significant time and effort, even for expert ML practitioners. Indeed, some estimates suggest over 95% of work is expended on software technicalities, leaving less than 5% for addressing the medical or scientific problem at hand [17]. This is further complicated by the myriad of choices that must be made when developing a new predictive model for diagnosis or prognosis, such as: what imputation strategy should be used; how should the data be preprocessed; what (ML) model is best suited for the specific task; what configuration of hyperparameters should be used. These decisions affect each other and thus cannot be made in isolation; further, the optimal choices not only vary between applications, but also can change over time as more data is collected and clinical practice changes [18].

Challenge 1. Developing powerful ML pipelines: AutoPrognosis uses AutoML to automate pipeline configuration, performing missing value imputation, feature processing, model selection, and hyperparameter optimization.

Challenge 2. Understanding the value of ML and when it is necessary: AutoPrognosis compares a range of ML methods to traditional approaches and automatically identifies what approach is best.

Challenge 3. Determining the value of information: AutoPrognosis can quantify the value of including additional predictors, enabling systematic identification of optimal variables.

Challenge 4. Understanding and debugging ML models: AutoPrognosis incorporates seven state-of-the-art interpretability methods, allowing models to be understood and debugged as they are generated.

Challenge 5. Making ML models accessible and usable: AutoPrognosis provides a platform to share model outputs by automating the creation of web-based applications.

Challenge 6. Deciding when and if to update clinical models: AutoPrognosis can quantify the benefit of additional data or new predictive variables, and automatically determine the optimal system for the new dataset.

Challenge 7. Transparent reproducibility: AutoPrognosis provides a standardized, publicly available framework, facilitating reproducibility.

Table 1: Major challenges facing clinical development of diagnostic and prognostic models and how these are addressed by AutoPrognosis. See Section 2 for more detail.

Few resources are available to help empirically define optimal computational pipelines. AutoPrognosis 2.0 addresses this by incorporating an AutoML approach within a standardized framework, automating the process of pipeline configuration. AutoPrognosis navigates a broad algorithmic search space in an efficient fashion, systematically performing missing value imputation, feature processing, model selection, and hyperparameter optimization in an unbiased manner without the need for human intervention or expert insight. This avoids arbitrary parameter selection and ensures standardization of pipelines, facilitating both reproducibility and optimized model performance. Critically, this democratizes the model building step, eliminating the requirement for expert ML knowledge and making cutting-edge methodology accessible to all, freeing healthcare domain experts to define and address the core clinical problems.
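To make the idea of joint pipeline configuration concrete, the following is a minimal sketch in plain Python, not the AutoPrognosis implementation or its API. It assumes a toy one-feature dataset with missing values, two hypothetical imputation strategies, and a threshold rule standing in for a model with a single hyperparameter. It also swaps in an exhaustive grid search for the more efficient search strategy the framework is described as using. What it illustrates is that the imputation choice and the model setting are scored together, by cross-validation, and not tuned in isolation.

```python
import itertools
import statistics

# Toy data (an assumption for illustration): one feature with missing values, binary label.
X = [1.0, None, 2.0, 8.0, 9.0, None, 1.5, 7.5, 2.5, 8.5, None, 9.5]
y = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1]


def impute(train_x, xs, strategy):
    """Fill missing entries in xs with a statistic computed on the training fold only."""
    observed = [v for v in train_x if v is not None]
    fill = statistics.mean(observed) if strategy == "mean" else statistics.median(observed)
    return [fill if v is None else v for v in xs]


def predict(xs, threshold):
    """Stand-in 'model': predict the positive class when the feature exceeds the threshold."""
    return [1 if v > threshold else 0 for v in xs]


def cv_accuracy(strategy, threshold, k=3):
    """Mean accuracy over k folds for one complete pipeline configuration."""
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_x = [X[i] for i in range(len(X)) if i % k != fold]
        test_x = impute(train_x, [X[i] for i in test_idx], strategy)
        preds = predict(test_x, threshold)
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return statistics.mean(scores)


# Hypothetical search space: every (imputer, threshold) pair is one pipeline.
search_space = {"imputer": ["mean", "median"], "threshold": [1.0, 5.0, 9.0]}
results = {
    (imp, thr): cv_accuracy(imp, thr)
    for imp, thr in itertools.product(search_space["imputer"], search_space["threshold"])
}
best_config = max(results, key=results.get)
print(best_config, results[best_config])
```

In a realistic setting the search space would also cover feature processing steps and several model families, at which point exhaustive enumeration becomes impractical and a guided search is needed.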

Challenge 2. Understanding the value of ML and when it is necessary

Traditional approaches, such as linear regression and Cox proportional hazards models [19], are widely used and accepted across healthcare. Before replacing these established methods, it is vital to understand whether ML is valuable for a given problem and quantify the benefit of ML systems. Indeed, there is no “free lunch” and we should not expect ML to always outperform existing approaches. Several recent examples exist that present settings where comparatively “simple” approaches outperformed ML [20, 21]. AutoPrognosis 2.0 can be used to compare a range of ML methods to traditional approaches at minimal technical cost to the user. Furthermore, since these solutions are included in the algorithmic search space, AutoPrognosis will automatically identify whether such approaches are indeed best or if more complex ML models are required.
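The comparison described above can be sketched as follows, assuming held-out labels and predicted risks are already available for each candidate. The model names and all numbers are made up for illustration and are not results from the paper. The sketch computes the area under the ROC curve (AUROC), a common measure of discrimination, for a traditional model and a more complex one, and selects whichever scores higher.

```python
def auroc(labels, scores):
    """Probability that a random positive case is scored above a random negative case.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels and predicted risks from two candidate models.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
candidates = {
    "logistic_regression": [0.1, 0.3, 0.7, 0.2, 0.8, 0.6, 0.4, 0.9],
    "gradient_boosting": [0.2, 0.1, 0.6, 0.5, 0.9, 0.4, 0.3, 0.8],
}

aurocs = {name: auroc(labels, scores) for name, scores in candidates.items()}
best_model = max(aurocs, key=aurocs.get)
print(aurocs, best_model)
```

On this toy data the simpler model happens to discriminate better, which is the kind of outcome the "no free lunch" remark anticipates. Because traditional models sit in the same candidate set as the ML models, no separate baseline experiment is needed.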

Challenge 3. Determining the value of information

Selecting which variables to include in a predictive model represents a key decision that not only impacts model performance but also the ease of subsequent clinical use since any feature used will need to be collected in an ongoing manner to use such systems. Thus, understanding the value of an individual variable and the information it provides is critical. Often, this is assessed by univariate statistical analysis or other selection methods such as forward selection or backwards elimination [22]. AutoPrognosis 2.0 provides
