AutoPrognosis 2.0: Democratizing Diagnostic and Prognostic Modeling in Healthcare with Automated Machine Learning
Fergus Imrie (a), Bogdan Cebere (b), Eoin F. McKinney (c), Mihaela van der Schaar (b, d)
(a) Department of Electrical and Computer Engineering, University of California, Los Angeles, CA, USA
(b) Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
(c) Department of Medicine, University of Cambridge, Cambridge, UK
(d) The Alan Turing Institute, London, UK
Abstract
Diagnostic and prognostic models are increasingly important in medicine and inform many clinical decisions. Recently, machine learning approaches have shown improvement over conventional modeling techniques by better capturing complex interactions between patient covariates in a data-driven manner. However, the use of machine learning introduces a number of technical and practical challenges that have thus far restricted widespread adoption of such techniques in clinical settings. To address these challenges and empower healthcare professionals, we present a machine learning framework, AutoPrognosis 2.0, to develop diagnostic and prognostic models. AutoPrognosis leverages state-of-the-art advances in automated machine learning to develop optimized machine learning pipelines, incorporates model explainability tools, and enables deployment of clinical demonstrators, without requiring significant technical expertise. Our framework eliminates the major technical obstacles to predictive modeling with machine learning that currently impede clinical adoption. To demonstrate AutoPrognosis 2.0, we provide an illustrative application in which we construct a prognostic risk score for diabetes using the UK Biobank, a prospective study of 502,467 individuals. The models produced by our automated framework achieve greater discrimination for diabetes than expert clinical risk scores. Our risk score has been implemented as a web-based decision support tool¹ and can be publicly accessed by patients and clinicians worldwide. In addition, AutoPrognosis 2.0 is provided as an open-source Python package. By open-sourcing our framework as a tool for the community, we enable clinicians and other medical practitioners to readily develop new risk scores, personalized diagnostics, and prognostics using modern machine learning techniques.
Software: https://github.com/vanderschaarlab/AutoPrognosis
Preprint 2022. arXiv:2210.12090v1 [cs.LG], 21 Oct 2022
1. Introduction
Machine learning (ML) systems have the potential to revolutionize medicine and become core clinical tools [1]. However, a diverse set of challenges must be overcome before ML can be adopted routinely and widely [2, 3]. In particular, there are substantial technical challenges in developing, understanding, and deploying ML systems, which currently render them largely inaccessible to medical practitioners [3, 4, 5, 6].
In an attempt to address this, we previously developed AutoPrognosis, an automated machine learning (AutoML) framework to train predictive models [7]. This framework has since been applied to derive prognostic models for cardiovascular disease [8], cystic fibrosis [9], and breast cancer [10], among a number of other indications [11, 12, 13, 14, 15, 16]. However, our initial approach had significant limitations from both algorithmic and usability perspectives.
Consequently, in this work, we describe AutoPrognosis 2.0, which addresses all major obstacles limiting the development, interpretation and deployment of ML methods in medicine and represents a step-change in diagnostic and prognostic modeling. In particular, we believe this is the world's first method that can simultaneously: (1) solve classification, regression, and time-to-event problems; (2) optimize ML pipelines, determine the most appropriate models, and automatically tune hyperparameters; (3) identify key variables and novel risk factors, enabling clinicians to select different numbers of variables and understand the value of information; (4) provide a diverse range of model explanations, including feature-based, example-based, and closed-form risk equations; and (5) produce web-based applications, allowing models to be readily shared with the clinical community.

¹ https://autoprognosis-biobank-diabetes.streamlitapp.com/
In this paper, we outline the major challenges facing the clinical development and translation of diagnostic and prognostic models. We then describe our approach, AutoPrognosis 2.0, and detail how it addresses each challenge. Finally, we demonstrate the application of AutoPrognosis 2.0 in an illustrative scenario: prognostic risk prediction of diabetes using a cohort of 502,467 individuals from UK Biobank. However, we emphasize that AutoPrognosis can be applied to construct diagnostic and prognostic models for any disease or clinical outcome, and is explicitly designed to make model building accessible to non-ML experts. We have open-sourced AutoPrognosis 2.0 as a tool for the community, allowing clinicians or non-expert users to adopt the automated framework to robustly and reproducibly develop optimized personalized diagnostics, prognostics, and risk scores using modern machine learning techniques.
2. Challenges in Diagnostic and Prognostic Modeling
There are numerous obstacles to developing and deploying diagnostic and prognostic models that currently prevent healthcare professionals from capitalizing on recent algorithmic advances [1]. Our work seeks to empower clinicians, medical researchers, epidemiologists, and biostatisticians through an accessible, automated framework capable of addressing all major obstacles limiting ML model building with minimal need for technical expertise. We begin by describing the seven major challenges faced by these communities and how they are addressed by AutoPrognosis 2.0 (summarized in Table 1).
Table 1: Major challenges facing clinical development of diagnostic and prognostic models and how these are addressed by AutoPrognosis. See Section 2 for more detail.

Challenge 1. Developing powerful ML pipelines
  AutoPrognosis uses AutoML to automate pipeline configuration, performing missing value imputation, feature processing, model selection, and hyperparameter optimization.
Challenge 2. Understanding the value of ML and when it is necessary
  AutoPrognosis compares a range of ML methods to traditional approaches and automatically identifies which approach is best.
Challenge 3. Determining the value of information
  AutoPrognosis can quantify the value of including additional predictors, enabling systematic identification of optimal variables.
Challenge 4. Understanding and debugging ML models
  AutoPrognosis incorporates seven state-of-the-art interpretability methods, allowing models to be understood and debugged as they are generated.
Challenge 5. Making ML models accessible and usable
  AutoPrognosis provides a platform to share model outputs by automating the creation of web-based applications.
Challenge 6. Deciding when and if to update clinical models
  AutoPrognosis can quantify the benefit of additional data or new predictive variables, and automatically determine the optimal system for the new dataset.
Challenge 7. Transparent reproducibility
  AutoPrognosis provides a standardized, publicly available framework, facilitating reproducibility.

Challenge 1. Developing powerful ML pipelines
Developing performant ML models remains complex and typically involves significant time and effort, even for expert ML practitioners. Indeed, some estimates suggest that over 95% of the work is expended on technical software tasks, leaving less than 5% for addressing the medical or scientific problem at hand [17]. This is further complicated by the myriad choices that must be made when developing a new predictive model for diagnosis or prognosis, such as: which imputation strategy should be used; how the data should be preprocessed; which (ML) model is best suited to the specific task; and which configuration of hyperparameters should be used. These decisions interact and thus cannot be made in isolation; further, the optimal choices not only vary between applications but can also change over time as more data is collected and clinical practice changes [18].
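Because these decisions interact, the space that has to be searched is the Cartesian product of the per-stage options rather than their sum. The Python sketch below enumerates a deliberately tiny, hypothetical space (three imputation strategies, three preprocessing options, and two model families with small hyperparameter grids) to make the combinatorics concrete. The option names are illustrative labels chosen for this example, not AutoPrognosis identifiers.

```python
from itertools import product

# Hypothetical per-stage options; the labels are illustrative, not a real API.
IMPUTERS = ["mean", "median", "most_frequent"]
PREPROCESSORS = ["none", "standardize", "min_max"]
MODELS = {
    "logistic_regression": {"C": [0.1, 1.0, 10.0]},
    "random_forest": {"n_trees": [100, 300], "max_depth": [3, 6, None]},
}


def enumerate_pipelines():
    """Yield every joint (imputer, preprocessor, model, hyperparameters) choice."""
    for imputer, preprocessor in product(IMPUTERS, PREPROCESSORS):
        for model, grid in MODELS.items():
            names = list(grid)
            # Each model's hyperparameter grid is crossed with every upstream choice.
            for values in product(*(grid[name] for name in names)):
                yield imputer, preprocessor, model, dict(zip(names, values))


configs = list(enumerate_pipelines())
print(len(configs))  # 81
```

Even this toy grid yields 3 × 3 × (3 + 6) = 81 joint pipelines, and every additional stage or finer hyperparameter grid multiplies the count again, so tuning each stage by hand quickly becomes impractical.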
Few resources are available to help empirically define optimal computational pipelines. AutoPrognosis 2.0 addresses this by incorporating an AutoML approach within a standardized framework, automating the process of pipeline configuration. AutoPrognosis navigates a broad algorithmic search space efficiently, systematically performing missing value imputation, feature processing, model selection, and hyperparameter optimization in an unbiased manner, without the need for human intervention or expert insight. This avoids arbitrary parameter selection and ensures standardization of pipelines, facilitating both reproducibility and optimized model performance. Critically, this democratizes the model building step: it eliminates the requirement for expert ML knowledge, makes cutting-edge methodology accessible to all, and frees healthcare domain experts to define and address the core clinical problems.
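As a rough, stdlib-only illustration of what jointly automating imputation, model selection, and hyperparameter choice involves, the Python sketch below scores every pipeline in a tiny, hypothetical search space (mean or median imputation, followed by k-nearest neighbors with k in {1, 5, 15} or a nearest-centroid classifier) using 5-fold cross-validation on synthetic data, and keeps the best. It is not AutoPrognosis's API or search strategy: all function names are invented for this example, feature processing is omitted, and a plain exhaustive grid over eight configurations stands in for the much broader search the framework performs.

```python
import random
from itertools import product
from statistics import mean, median


def make_toy_data(n=120, seed=0):
    """Synthetic two-covariate binary task with roughly 10% missing values (None)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        row = [rng.gauss(1.5 * label, 1.0), rng.gauss(0.0, 1.0)]
        X.append([None if rng.random() < 0.1 else v for v in row])
        y.append(label)
    return X, y


def fit_imputer(X, strategy):
    """Learn per-column fill values ("mean" or "median") from training rows only."""
    stat = mean if strategy == "mean" else median
    cols = range(len(X[0]))
    fills = [stat([r[j] for r in X if r[j] is not None]) for j in cols]
    return lambda rows: [[fills[j] if r[j] is None else r[j] for j in cols] for r in rows]


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn_predict(train_X, train_y, test_X, k):
    """Majority vote of the k nearest training rows (odd k avoids ties)."""
    preds = []
    for x in test_X:
        nearest = sorted(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))[:k]
        preds.append(int(2 * sum(train_y[i] for i in nearest) > k))
    return preds


def centroid_predict(train_X, train_y, test_X, k=None):
    """Assign each row to the class with the closest mean vector (k is unused)."""
    cents = {c: [mean(col) for col in zip(*[x for x, t in zip(train_X, train_y) if t == c])]
             for c in (0, 1)}
    return [min((0, 1), key=lambda c: sq_dist(x, cents[c])) for x in test_X]


# Hypothetical joint search space: (imputation strategy, model family, hyperparameter).
SEARCH_SPACE = [(imp, "knn", k) for imp, k in product(["mean", "median"], [1, 5, 15])]
SEARCH_SPACE += [(imp, "centroid", None) for imp in ["mean", "median"]]


def cv_accuracy(X, y, config, folds=5):
    """Mean accuracy over interleaved folds; the imputer is refit inside each fold."""
    strategy, model, k = config
    predict = knn_predict if model == "knn" else centroid_predict
    scores = []
    for f in range(folds):
        te = list(range(f, len(X), folds))
        tr = [i for i in range(len(X)) if i % folds != f]
        impute = fit_imputer([X[i] for i in tr], strategy)  # no test-fold leakage
        preds = predict(impute([X[i] for i in tr]), [y[i] for i in tr],
                        impute([X[i] for i in te]), k)
        scores.append(sum(p == y[i] for p, i in zip(preds, te)) / len(te))
    return mean(scores)


def auto_search(X, y):
    """Score every configuration and return the best one together with all scores."""
    results = {cfg: cv_accuracy(X, y, cfg) for cfg in SEARCH_SPACE}
    return max(results, key=results.get), results


X, y = make_toy_data()
best, results = auto_search(X, y)
print("best pipeline:", best, "CV accuracy:", round(results[best], 3))
```

Refitting the imputer inside each training fold is deliberate: computing fill values on the full dataset would leak information from the held-out rows into the score.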
Challenge 2. Understanding the value of ML and when it is necessary
Traditional approaches, such as linear regression and Cox proportional hazards models [19], are widely used and accepted across healthcare. Before replacing these established methods, it is vital to understand whether ML is valuable for a given problem and to quantify the benefit of ML systems. Indeed, there is no "free lunch", and we should not expect ML to always outperform existing approaches: there are several recent examples of settings in which comparatively "simple" approaches outperformed ML [20, 21]. AutoPrognosis 2.0 can be used to compare a range of ML methods to traditional approaches at minimal technical cost to the user. Furthermore, since these traditional approaches are included in the algorithmic search space, AutoPrognosis will automatically identify whether they are indeed best or whether more complex ML models are required.
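The stdlib-only Python sketch below illustrates why placing simple models inside the search space answers the "is ML necessary?" question directly. A logistic-regression baseline (standing in for the traditional models named above, fit here by plain gradient descent) and a k-nearest-neighbors classifier are scored on the same held-out split, the better one is selected, and the accuracy difference quantifies the benefit, if any, of the ML model. The synthetic data, the 70/30 split, and both model choices are assumptions made for illustration; this is not how AutoPrognosis implements the comparison.

```python
import math
import random


def make_linear_data(n=200, seed=1):
    """Synthetic binary outcome whose true log-odds are linear in two covariates."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        p = 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 1.0 * x[1])))
        data.append((x, int(rng.random() < p)))
    return data


def fit_logistic(train, lr=0.1, epochs=200):
    """Traditional baseline: logistic regression fit by batch gradient descent."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, t in train:
            err = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b))) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(train) for wi, g in zip(w, gw)]
        b -= lr * gb / len(train)
    return lambda x: int(w[0] * x[0] + w[1] * x[1] + b > 0)


def fit_knn(train, k=7):
    """More flexible ML candidate: k-nearest neighbors (odd k avoids ties)."""
    def predict(x):
        near = sorted(train, key=lambda r: (r[0][0] - x[0]) ** 2 + (r[0][1] - x[1]) ** 2)[:k]
        return int(2 * sum(t for _, t in near) > k)
    return predict


def accuracy(model, data):
    return sum(model(x) == t for x, t in data) / len(data)


def compare(data, split=0.7):
    """Score a baseline and an ML model on the same held-out split and pick the best."""
    cut = int(len(data) * split)
    train, test = data[:cut], data[cut:]
    # The baseline is listed first so that, on a tie, max() keeps the simpler model.
    candidates = {"logistic (baseline)": fit_logistic(train), "knn": fit_knn(train)}
    scores = {name: accuracy(model, test) for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    gain = scores["knn"] - scores["logistic (baseline)"]
    return best, scores, gain


best, scores, gain = compare(make_linear_data())
print(best, scores, "accuracy gain of ML over baseline:", round(gain, 3))
```

Because the simulated outcome follows a linear log-odds model, there is no a priori reason to expect k-nearest neighbors to win here. Whichever model scores higher is kept, and since Python's `max` returns the first maximal key, listing the baseline first resolves ties in favor of the simpler model.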
Challenge 3. Determining the value of information
Selecting which variables to include in a predictive model is a key decision that affects not only model performance but also the ease of subsequent clinical use, since every feature the model uses must be collected on an ongoing basis for the system to be used. Thus, understanding the value of an individual variable and the information it provides is critical. Often, this is assessed by univariate statistical analysis or by other selection methods such as forward selection or backward elimination [22]. AutoPrognosis 2.0 provides