Dynamic Treatment Regimes using Bayesian Additive Regression Trees for Censored Outcomes

Xiao Li, Brent R Logan, S M Ferdous Hossain, and Erica E M Moodie

October 25, 2022

arXiv:2210.13330v1 [stat.ME] 24 Oct 2022
Abstract
To achieve the goal of providing the best possible care to each patient, physicians need to customize treatments for patients with the same diagnosis, especially when treating diseases that can progress further and require additional treatments, such as cancer. Making decisions at multiple stages as a disease progresses can be formalized as a dynamic treatment regime (DTR). Most of the existing optimization approaches for estimating dynamic treatment regimes, including the popular method of Q-learning, were developed in a frequentist context. Recently, a general Bayesian machine learning framework that facilitates using Bayesian regression modeling to optimize DTRs has been proposed. In this article, we adapt this approach to censored outcomes using Bayesian additive regression trees (BART) for each stage under the accelerated failure time modeling framework, along with simulation studies and a real data example that compare the proposed approach with Q-learning. We also develop an R wrapper function that utilizes a standard BART survival model to optimize DTRs for censored outcomes. The wrapper function can easily be extended to accommodate any type of Bayesian machine learning model.
Keywords: Accelerated Failure Time (AFT), allogeneic hematopoietic cell transplantation, precision medicine,
individualized treatment rules, survival analysis
1 Introduction
Optimizing medical therapy often requires that the treatment be individually tailored to the patient initially, and that the treatment be adaptive to the patient's changing characteristics over time. Since patient responses can often be heterogeneous, it is challenging for physicians to customize treatments based on traditional clinical trial results, which lack the ability to identify subgroups with different treatment effects and rarely consider successions of treatments. For chronic diseases that can evolve, it is even more important, and more difficult, to choose the best sequence of therapies. To give a simple example, oncologists typically choose an initial immunosuppressant regime for patients with acute myeloid leukemia (AML) who are undergoing allogeneic hematopoietic cell transplantation (AHCT), to prevent a serious potential complication called graft-versus-host disease (GVHD). When such an initial regime fails, a salvage treatment is chosen based on the patient's prior treatments and responses. Such a multi-stage treatment decision has been formalized as a dynamic treatment regime (DTR) by Murphy (2003). Each decision rule in a DTR takes a patient's individual characteristics, treatment history, and possible intermediate outcomes observed up to a certain stage as inputs, and outputs a recommended treatment for that stage.
A number of approaches have been proposed for estimating and optimizing DTRs, including those by Robins (2004), Moodie et al. (2007), Qian and Murphy (2011), Zhao et al. (2015), Krakow et al. (2017), Murray et al. (2018), and Simoneau et al. (2020). Among the previous literature, the Bayesian machine learning (BML) method developed by Murray et al. (2018) innovatively bridges the gap between Bayesian inference and dynamic programming methods from machine learning. A key advantage of a Bayesian approach to estimation is the quantification of uncertainty in decision making through the resulting posterior distribution. A second benefit that arises specifically in the BML approach is the highly flexible estimation that is employed, which minimizes the risk of estimation errors due to model mis-specification.
However, the BML method has not yet been adapted to censored outcomes, which are among the most common types of outcomes in studies of chronic disease. Motivated by the study of optimal therapeutic choices to prevent and treat GVHD, in this paper we extend this approach to censored outcomes under the accelerated failure time (AFT) model framework. By modifying the data augmentation step in the BML method, the censored observations can be imputed in an informative way so that the observed censoring time is well utilized. This extension is illustrated using Bayesian additive regression trees (BART). We also implemented the proposed AFT-BML approach by developing an R function that utilizes standard BART survival software directly, without needing to modify the existing (complex) BART software. Parallel computing was used to speed up the computations. This R wrapper function can be easily adjusted to accommodate other types of Bayesian machine learning methods.
This paper is organized as follows. In Section 2, we briefly review related methods and algorithms, and describe the extended AFT-BML approach for optimizing DTRs. Section 3 presents simulation studies that demonstrate our model's performance by comparing it to estimation using Q-learning. An analysis of our motivating dataset of patients diagnosed with AML is given in Section 4. Finally, in Section 5 we discuss the advantages and disadvantages of our approach and provide some suggestions for future work.
2 Methods
2.1 Dynamic Treatment Regimes
A dynamic treatment regime (DTR) is a series of decision rules that assign treatment based on the patient’s
characteristics and history at each stage. Without loss of generality, we focus on a two-stage intervention problem.
Furthermore, we start by describing DTRs in the non-survival setting, before proceeding to the censored survival
setting later. Following Murray’s notation Murray et al. (2018), let o1∈ O1be the covariates observed before
Stage 1, and a1∈ A1be the action taken at Stage 1. Denote y1as the payoff observed after Stage 1 and before
Stage 2. {o2, a2, y2}are defined similarly for Stage 2. The total payoff is assumed to be y=y1+ηy2, where
ηis an indicator that the patient entered Stage 2. A general diagram to present the two-stage decision making
problem is
o1a1y1
if η=1
o2a2y2.
Denote the accumulated history before Stage 2 treatment as ¯o2= (o1, a1, y1, o2)¯
O2. In this setting, a DTR
consists of two decision rules, one for each stage,
d1:O1→ A1and d2:¯
O2→ A2.
Optimizing the two-stage DTR (d1, d2) is equivalent to finding the decision rules d1and d2that maximize the
expected total payoff E(y). That is
dopt
2(¯o2) = arg sup
a2∈A2
E(y2|¯o2, a2)¯o2¯
O2,
dopt
1(o1) = arg sup
a1∈A1
E(y|o1, a1, dopt
2)o1∈ O1.
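These two maximizations are solved by backward induction: estimate the Stage 2 value first, then propagate the optimal Stage 2 payoff into the Stage 1 regression. A minimal Python sketch on a toy problem with binary covariates and actions, where conditional means are estimated by empirical cell averages (the data-generating model below is a hypothetical illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical two-stage data with randomized binary actions.
o1 = rng.integers(0, 2, n)                            # Stage 1 covariate
a1 = rng.integers(0, 2, n)                            # Stage 1 action
y1 = 1.0 + 0.5 * (a1 == o1) + rng.normal(0, 0.1, n)  # Stage 1 payoff
o2 = rng.integers(0, 2, n)                            # Stage 2 covariate
a2 = rng.integers(0, 2, n)                            # Stage 2 action
y2 = 0.5 + 1.0 * (a2 != o2) + rng.normal(0, 0.1, n)  # Stage 2 payoff

# Stage 2: estimate E(y2 | o2, a2) by cell means, take argmax over a2.
q2 = {(s, a): y2[(o2 == s) & (a2 == a)].mean() for s in (0, 1) for a in (0, 1)}
d2_opt = {s: max((0, 1), key=lambda a: q2[(s, a)]) for s in (0, 1)}

# Pseudo-outcome: observed y1 plus the predicted Stage 2 payoff under d2_opt.
y_opt = y1 + np.array([q2[(s, d2_opt[s])] for s in o2])

# Stage 1: estimate E(y | o1, a1, d2_opt) by cell means, take argmax over a1.
q1 = {(s, a): y_opt[(o1 == s) & (a1 == a)].mean() for s in (0, 1) for a in (0, 1)}
d1_opt = {s: max((0, 1), key=lambda a: q1[(s, a)]) for s in (0, 1)}

print(d2_opt)  # recovers: choose a2 != o2
print(d1_opt)  # recovers: choose a1 == o1
```

The cell-mean regression stands in for the flexible Bayesian models used later in the paper; the backward-induction structure is the same.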
2.2 Bayesian Machine Learning for DTRs
Murray et al. (2018) described a new approach called Bayesian machine learning (BML) to optimize DTRs; the method requires fitting a series of Bayesian regression models in reverse sequential order under the approximate dynamic programming framework. The authors use potential outcomes notation to describe their approach, where $y(a_1, a_2)$ denotes the payoff observed when action $a_1$ is taken at Stage 1 and action $a_2$ is taken at Stage 2, and other potential outcomes ($y_2(a_1, a_2)$, $y_1(a_1)$, and $o_2(a_1)$) are similarly defined. Assuming the potential outcomes are consistent, the observed outcome corresponds to the potential outcome for the action actually followed, e.g. $y_1(a_1) = y_1$, $o_2(a_1) = o_2$, $y_2(a_1, a_2) = y_2$, and $y(a_1, a_2) = y$. The approach can be summarized as follows. The Stage 2 regression model for $y_2(a_1, a_2)$ is estimated first, using the observed covariates $(\bar{o}_2, a_2)$ and the observed response variable $y_2$. Based on the estimated Stage 2 model, the estimated optimal mapping from $\bar{\mathcal{O}}_2$ to $\mathcal{A}_2$, simply denoted $d_2^{\mathrm{opt}}$, can be identified, as well as the relevant potential payoff at Stage 2, denoted $y_2(a_1, d_2^{\mathrm{opt}})$. With $d_2^{\mathrm{opt}}$ and the potential payoff $y_2(a_1, d_2^{\mathrm{opt}})$, the response variable for Stage 1 can be constructed as $y(a_1, d_2^{\mathrm{opt}})$; this (potential) outcome is composed of the observed Stage 1 payoff $y_1$ and the potential Stage 2 payoff $y_2(a_1, d_2^{\mathrm{opt}})$. Note that if the observed action $a_2$ matches the optimal action according to $d_2^{\mathrm{opt}}$, then the potential payoff is simply the observed payoff $y = y_1 + \eta y_2$. Otherwise, the potential outcome is unobserved and must be imputed (in the BML method, it is actually sampled from the posterior predictive distribution, as described further below). Given imputed values, the Stage 1 regression model for $y(a_1, d_2^{\mathrm{opt}})$ can then be estimated with observed covariates $(o_1, a_1)$ to identify $d_1^{\mathrm{opt}}$. This type of backward induction strategy is used in several DTR estimation methods, including g-estimation, Q-learning, and dynamic weighted ordinary least squares (Robins, 2004; Moodie et al., 2007; Nahum-Shani et al., 2012; Goldberg and Kosorok, 2012; Simoneau et al., 2020).
Estimation of the terminal stage regression model is simply a typical model of outcome by predictors fit using
standard Bayesian methods. The estimation of the nonterminal stage models, on the other hand, is not easily
done with standard Bayesian software because of the counterfactual or potential payoff under the unobserved
optimal action at each subsequent stage, which contributes to the outcome at the current stage. To address this
problem, Murray et al. (2018) developed a backward induction Gibbs (BIG) sampler to implement the proposed BML approach in practice. It consists of three steps, repeated until convergence, using a hat above random variables to indicate sampled values in an MCMC algorithm:

Step 1 Draw a posterior sample of the parameters $\theta_2$ in the Stage 2 model and set the optimal action $\hat{a}_{i2}^{\mathrm{opt}} = \hat{d}_2^{\mathrm{opt}}(\bar{o}_{i2}; \theta_2)$, $i = 1, \ldots, n$.

Step 2 Compare the observed $a_{i2}$ and the optimal $\hat{a}_{i2}^{\mathrm{opt}}$. For $i = 1, \ldots, n$, if $a_{i2} = \hat{a}_{i2}^{\mathrm{opt}}$, then set $\hat{y}_{i2}^{\mathrm{opt}} = y_{i2}$; else, sample $\hat{y}_{i2}^{\mathrm{opt}}$ from the posterior predictive distribution of $y_2(a_{i1}, \hat{a}_{i2}^{\mathrm{opt}})$.

Step 3 Draw a posterior sample of the parameters $\theta_1$ in the Stage 1 model using outcome $y_{i1} + \eta_i \hat{y}_{i2}^{\mathrm{opt}}$.
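The three steps above can be sketched with a deliberately simple stand-in for the Bayesian regression models: a normal-means model with known error SD and a flat prior, so that a posterior draw of the four cell means plays the role of $\theta_2$. Everything below (the generative model, sample sizes, number of draws) is a hypothetical illustration, not the paper's BART implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 5_000, 0.2

# Hypothetical observed data: Stage 2 payoff is higher when the
# Stage 2 action matches the Stage 2 covariate.
o2 = rng.integers(0, 2, n)
a2 = rng.integers(0, 2, n)
y2 = 1.0 + 1.0 * (a2 == o2) + rng.normal(0, sigma, n)
y1 = rng.normal(1.0, sigma, n)

def posterior_mean_draw(y, sigma, rng):
    # Flat-prior posterior of a normal mean with known sigma: N(ybar, sigma^2/n).
    return rng.normal(y.mean(), sigma / np.sqrt(len(y)))

rule_draws, theta1_draws = [], []
for _ in range(200):
    # Step 1: draw Stage 2 parameters (cell means) and the implied optimal rule.
    theta2 = {(s, a): posterior_mean_draw(y2[(o2 == s) & (a2 == a)], sigma, rng)
              for s in (0, 1) for a in (0, 1)}
    a2_opt = np.array([max((0, 1), key=lambda a: theta2[(s, a)]) for s in o2])

    # Step 2: keep the observed payoff where the observed action was optimal;
    # otherwise sample the counterfactual payoff from the posterior predictive.
    counterfactual = (np.array([theta2[(s, a)] for s, a in zip(o2, a2_opt)])
                      + rng.normal(0, sigma, n))
    y2_opt = np.where(a2 == a2_opt, y2, counterfactual)

    # Step 3: draw Stage 1 parameters using the pseudo-outcome y1 + y2_opt
    # (a single overall mean here, for simplicity).
    theta1_draws.append(posterior_mean_draw(y1 + y2_opt, sigma, rng))
    rule_draws.append(a2_opt)

# Posterior draws of the Stage 2 rule concentrate on "match the covariate".
agreement = np.mean([np.mean(d == o2) for d in rule_draws])
print(round(agreement, 3))
```

In the paper, Steps 1 and 3 are BART fits and Step 2's predictive draw comes from the fitted trees; the alternation between model draws and counterfactual imputation is identical.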
2.3 AFT BART
Bayesian additive regression trees (BART) form a Bayesian nonparametric regression model developed by Chipman et al. (2010), based on an ensemble of trees. The accelerated failure time BART (Bonato et al., 2011) is an extension of the approach to accommodate censored outcomes, assuming the event time follows a log-normal distribution. Let $t_i$ be the event time and $c_i$ be the censoring time for individual $i$. Then, the observed survival time is $s_i = \min(t_i, c_i)$, and the event indicator is $\delta_i = I(t_i < c_i)$. Denote by $x_i = (x_{i1}, \ldots, x_{ip})$ the $p$-dimensional vector of predictors. The relationship between $t_i$ and $x_i$ is expressed as
$$\log t_i = \mu + f(x_i) + \varepsilon_i, \quad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2),$$
$$f \sim \text{BART prior}, \quad \sigma^2 \sim \nu\lambda/\chi^2_\nu,$$
where $f(x_i)$ is a sum of $m$ regression trees, $f(x_i) = \sum_{j=1}^{m} g(x_i; T_j, M_j)$, with $T_j$ denoting a binary tree with a set of internal and terminal nodes, and $M_j = \{\mu_{j1}, \ldots, \mu_{jb_j}\}$ denoting the set of parameter values on the terminal nodes of tree $T_j$. Full details of the BART model, including prior distributions and the MCMC sampling algorithm, can be found in Chipman et al. (2010). Since the $t_i$ of censored observations are not observable, an extra data augmentation step to impute $t_i$ is needed in each iteration when drawing Markov chain Monte Carlo (MCMC) posterior samples with Gibbs sampling. In particular, the unobserved event times are randomly sampled from a truncated normal distribution as
$$\log t_i \mid s_i, \delta_i = 0, f(x_i), \sigma^2 \sim N(\mu + f(x_i), \sigma^2) \times I(t_i > s_i).$$
After data augmentation, the complete log event times are treated as continuous outcomes and the standard BART MCMC draws can be applied.
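The truncated-normal imputation step can be sketched as follows. A plain rejection sampler is used here purely for illustration (the BART package handles this internally with its own routines), and the values of $\mu + f(x_i)$, $\sigma$, and the censoring time are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_censored_logtime(log_s, mu, sigma, rng, max_tries=10_000):
    """Sample log t from N(mu, sigma^2) truncated to (log_s, infinity).

    Rejection sampling is adequate when the censoring point is not far in
    the upper tail of the predictive distribution.
    """
    for _ in range(max_tries):
        z = rng.normal(mu, sigma)
        if z > log_s:
            return z
    return log_s  # fallback for pathological truncation points

# Hypothetical subject: censored at s = 2.0, predictive mean 0.5, sigma = 1.
draws = np.array([draw_censored_logtime(np.log(2.0), 0.5, 1.0, rng)
                  for _ in range(5_000)])
print(draws.min() > np.log(2.0))   # every imputed log t exceeds log s
print(round(draws.mean(), 3))      # close to the truncated-normal mean
```

Each imputed $\log t_i$ respects the information in the censoring time: it is always larger than the observed $\log s_i$.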
The AFT BART model with a log-normal survival distribution is implemented within the BART R package (Sparapani et al., 2021); additional details are found in Appendix B.
2.4 Proposed AFT-BML algorithm
Since the BML approach of Murray et al. (2018) is not directly applicable to censored observations, we extended it by modifying the BIG sampler so that censoring can be accommodated. Here we are interested in the time to an event (such as death) from the start of Stage 1. The Stage 2 treatment decision initiates at an intermediate event such as disease progression. This effectively separates the payoff or event time into two components: the time to the earliest of the event of interest and the intermediate event triggering Stage 2 ($t_1$), and, if the patient enters Stage 2 ($\eta = 1$), the time from the start of Stage 2 to the event of interest ($t_2$). Observed data accounting for censoring and entry to Stage 2 are denoted $(s_1, \delta_1)$ for Stage 1 and $(s_2, \delta_2)$ for Stage 2. Continuing with the potential outcomes notation, let $t(a_1, a_2)$ denote the time to the event of interest when action $a_1$ is taken at Stage 1 and action $a_2$ is taken at Stage 2. Similarly, let $t_2(a_1, a_2)$ denote the event time in Stage 2 (starting at the entry to Stage 2) under actions $(a_1, a_2)$. Finally, the potential time $t_1(a_1)$ is the time in Stage 1 until the first of the event of interest or entry to Stage 2. The corresponding payoffs on the log time scale are denoted $y(a_1, a_2) = \log t(a_1, a_2)$, $y_2(a_1, a_2) = \log t_2(a_1, a_2)$, and $y_1(a_1) = \log t_1(a_1)$. Under consistency, the observed outcome corresponds to the potential outcome for the action actually followed, e.g. $t_1(a_1) = t_1$, $t_2(a_1, a_2) = t_2$, and $t(a_1, a_2) = t$, and similarly for the $y = \log t$ versions.
Murray et al. (2018) recommended using Bayesian nonparametric regression models in Stages 1 and 2 for robustness. Here we illustrate our approach with AFT BART models in each stage. As before, we use a hat above random variables to indicate sampled values in an MCMC algorithm. The Stage 2 regression model for $t_2(a_1, a_2)$ is estimated first, using the observed covariates $(\bar{o}_2, a_2)$ and the observed time to event data $(s_2, \delta_2)$, according to the AFT BART model
$$\log t_{i2} = \mu_2 + f_2(\bar{o}_2, a_2) + \varepsilon_i, \quad \varepsilon_i \overset{iid}{\sim} N(0, \sigma_2^2),$$
$$f_2 \sim \text{BART prior}, \quad \sigma_2^2 \sim \nu\lambda/\chi^2_\nu.$$
We can run the Stage 2 BART model until convergence, draw 1000 posterior samples from the model, and then sample the optimal Stage 2 treatment rule for each MCMC sample according to
$$\hat{d}_2^{\mathrm{opt}}(\bar{o}_2) = \arg\sup_{a_2 \in \mathcal{A}_2} E(\log t_2 \mid \bar{o}_2, a_2) = \arg\sup_{a_2 \in \mathcal{A}_2} f_2(\bar{o}_2, a_2).$$
We can also implement a sampling procedure to generate potential outcomes for the total time from Stage 1 assuming optimal Stage 2 treatment, as $\hat{t}(a_1, \hat{d}_2^{\mathrm{opt}}) = t_1 + \hat{t}_2(a_1, \hat{d}_2^{\mathrm{opt}})$. Some of the potential outcomes resulting from this procedure may still be censored, and we denote the possibly censored version of these potential outcomes as $(\hat{s}, \hat{\delta})$. These event time data are then modeled as a function of the covariates $(o_1, a_1)$ using another AFT BART model given by
$$\log \hat{t}_i = \mu_1 + f_1(o_1, a_1) + \varepsilon_i, \quad \varepsilon_i \overset{iid}{\sim} N(0, \sigma_1^2),$$
$$f_1 \sim \text{BART prior}, \quad \sigma_1^2 \sim \nu\lambda/\chi^2_\nu.$$
For each sampled potential outcomes dataset, we run the Stage 1 AFT BART model until convergence, and then draw one posterior sample from each fitted BART model to determine a sample from the posterior of $d_1^{\mathrm{opt}}$ according to
$$\hat{d}_1^{\mathrm{opt}}(o_1) = \arg\sup_{a_1 \in \mathcal{A}_1} E(\log \hat{t} \mid o_1, a_1, d_2^{\mathrm{opt}}) = \arg\sup_{a_1 \in \mathcal{A}_1} f_1(o_1, a_1).$$
Details of the AFT-BML algorithm are as follows:

Step 1 Run BART on the Stage 2 data until convergence and draw 1000 samples from the posterior distribution of $f_2$ and $\sigma_2^2$. This implicitly involves the following two steps, which are performed automatically by the BART package.

Step 1a Draw the unobserved event time $t_{i2}$ for censored subjects ($\delta_{i2} = 0$) who reached Stage 2 ($\eta_i = 1$) using a truncated normal distribution,
$$\log t_{i2} \mid s_{i2}, \delta_{i2} = 0, f_2, \sigma_2^2 \sim N(\mu_2 + f_2(\bar{o}_{i2}, a_{i2}), \sigma_2^2) \times I(t_{i2} > s_{i2}).$$

Step 1b Update $f_2$ and $\sigma_2^2$ with the complete (uncensored) Stage 2 data.

Step 2 Draw 1000 samples of $(\hat{a}_{i2}^{\mathrm{opt}}, \hat{t}_{i2}^{\mathrm{opt}})$ for each subject and use each sample to create an augmented dataset for the Stage 1 analysis, as follows:

Step 2a The optimal action at Stage 2 is chosen as $\hat{a}_{i2}^{\mathrm{opt}} = \arg\max_{a_2} f_2(\bar{o}_{i2}, a_2)$.

Step 2b If $a_{i2} = \hat{a}_{i2}^{\mathrm{opt}}$ and the observation is an event ($\delta_{i2} = 1$), set $\hat{t}_{i2}^{\mathrm{opt}} = t_{i2}$; if $a_{i2} = \hat{a}_{i2}^{\mathrm{opt}}$ and the observation is censored ($\delta_{i2} = 0$), draw $\log \hat{t}_{i2}^{\mathrm{opt}} \sim N(\mu_2 + f_2(\bar{o}_{i2}, \hat{a}_{i2}^{\mathrm{opt}}), \sigma_2^2) \times I(\hat{t}_{i2}^{\mathrm{opt}} > s_{i2})$.

Step 2c If $a_{i2} \neq \hat{a}_{i2}^{\mathrm{opt}}$, draw $\log \hat{t}_{i2}^{\mathrm{opt}}$ for the counterfactual action $\hat{a}_{i2}^{\mathrm{opt}}$ from $N(\mu_2 + f_2(\bar{o}_{i2}, \hat{a}_{i2}^{\mathrm{opt}}), \sigma_2^2)$.

Step 2d For those who reached Stage 2 ($\eta_i = 1$), set the observed data for the Stage 1 model as the potential Stage 1 event time $\hat{t}_i$, i.e. $\hat{s}_i = \hat{t}_i = t_{i1} + \hat{t}_{i2}^{\mathrm{opt}}$, and set $\hat{\delta}_i = 1$. For those who did not reach Stage 2, set the observed data for the Stage 1 model as $\hat{s}_i = t_{i1}$ and $\hat{\delta}_i = \delta_{i1}$.

Step 3 Run BART on each of the augmented Stage 1 datasets until convergence and draw 1 sample from the posterior distribution of $f_1$ and $\sigma_1^2$ for each augmented Stage 1 dataset. As above, this requires the following two steps, which are performed automatically by the BART package:

Step 3a Draw the unobserved event time $\hat{t}_i$ for censored subjects ($\hat{\delta}_i = 0$) at Stage 1,
$$\log \hat{t}_i \mid \hat{s}_i, \hat{\delta}_i = 0, f_1, \sigma_1^2 \sim N(\mu_1 + f_1(o_{i1}, a_{i1}), \sigma_1^2) \times I(\hat{t}_i > \hat{s}_i).$$

Step 3b Update $f_1$ and $\sigma_1^2$ with the complete (uncensored) data at Stage 1.

Step 4 From each augmented dataset and the corresponding sampled $f_1$ and $\sigma_1^2$, draw one sample $\hat{a}_{i1}^{\mathrm{opt}}$ for each subject, based on
$$\hat{a}_{i1}^{\mathrm{opt}} = \arg\max_{a_{i1}} f_1(o_{i1}, a_{i1}).$$
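Step 2, the construction of one augmented Stage 1 dataset, can be sketched as follows. The function `f2` below is a toy stand-in for one posterior draw of the Stage 2 AFT BART fit (with $\mu_2$ absorbed into it), and all names and data are hypothetical illustrations:

```python
import numpy as np

rng = np.random.default_rng(3)

def draw_truncated(mu, sigma, lower, rng):
    # Rejection sampler for N(mu, sigma^2) truncated to (lower, inf).
    while True:
        z = rng.normal(mu, sigma)
        if z > lower:
            return z

def augment_stage1(s1, delta1, eta, s2, delta2, o2, a2, f2, sigma2, actions, rng):
    """One pass of Steps 2a-2d: build augmented Stage 1 data (s_hat, delta_hat)."""
    n = len(s1)
    s_hat = np.empty(n)
    delta_hat = np.empty(n, dtype=int)
    for i in range(n):
        if not eta[i]:
            # Step 2d, never reached Stage 2: keep the observed Stage 1 data.
            s_hat[i], delta_hat[i] = s1[i], delta1[i]
            continue
        # Step 2a: optimal Stage 2 action under this posterior draw.
        a_opt = max(actions, key=lambda a: f2(o2[i], a))
        if a2[i] == a_opt and delta2[i] == 1:
            t2_opt = s2[i]                                   # Step 2b, event
        elif a2[i] == a_opt:
            log_t2 = draw_truncated(f2(o2[i], a_opt), sigma2,
                                    np.log(s2[i]), rng)      # Step 2b, censored
            t2_opt = np.exp(log_t2)
        else:
            log_t2 = rng.normal(f2(o2[i], a_opt), sigma2)    # Step 2c
            t2_opt = np.exp(log_t2)
        # Step 2d: total potential event time, treated as uncensored.
        s_hat[i], delta_hat[i] = s1[i] + t2_opt, 1
    return s_hat, delta_hat

# Toy inputs: 4 subjects; f2 favors matching the Stage 2 covariate.
f2 = lambda o, a: 1.0 if a == o else 0.0  # one "posterior draw" of log-mean
s1     = np.array([2.0, 1.0, 1.5, 0.5])   # Stage 1 times (entry time if eta)
delta1 = np.array([0,   1,   1,   1])
eta    = np.array([0,   1,   1,   1], dtype=bool)
o2     = np.array([0,   1,   0,   1])
a2     = np.array([0,   1,   1,   1])
s2     = np.array([0.0, 3.0, 2.0, 5.0])
delta2 = np.array([0,   1,   1,   0])
s_hat, d_hat = augment_stage1(s1, delta1, eta, s2, delta2, o2, a2,
                              f2, 0.5, (0, 1), rng)
print(s_hat, d_hat)
```

In the full algorithm this routine runs once per posterior draw of $(f_2, \sigma_2^2)$, yielding 1000 augmented datasets, each of which is then fed to a Stage 1 AFT BART fit (Step 3); the paper's R wrapper parallelizes those fits.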