2.2 Bayesian Machine Learning for DTRs
Murray et al. (2018) described a new approach called Bayesian Machine Learning (BML) to optimize DTRs; the method requires fitting a series of Bayesian regression models in reverse sequential order under the approximate dynamic programming framework. The authors use potential outcomes notation to describe their approach, where $y(a_1, a_2)$ denotes the payoff observed when action $a_1$ is taken at Stage 1 and action $a_2$ is taken at Stage 2, and other potential outcomes ($y_2(a_1, a_2)$, $y_1(a_1)$, and $o_2(a_1)$) are similarly defined. Assuming the potential outcomes are consistent, the observed outcome corresponds to the potential outcome for the action actually followed, e.g., $y_1(a_1) = y_1$, $o_2(a_1) = o_2$, $y_2(a_1, a_2) = y_2$, and $y(a_1, a_2) = y$. The approach can be summarized as follows. The Stage 2 regression model for $y_2(a_1, a_2)$ is estimated first, using the observed covariates $(\bar{o}_2, a_2)$ and the observed response variable $y_2$. Based on the estimated Stage 2 model, the estimated optimal mapping from $\bar{O}_2$ to $A_2$, denoted simply as $d_2^{\text{opt}}$, can be identified, as well as the relevant potential payoff at Stage 2, denoted as $y_2(a_1, d_2^{\text{opt}})$. With $d_2^{\text{opt}}$ and potential payoff $y_2(a_1, d_2^{\text{opt}})$, the response variable for Stage 1 can be constructed as $y(a_1, d_2^{\text{opt}})$; this (potential) outcome is composed of the observed Stage 1 payoff $y_1$ and the potential Stage 2 payoff $y_2(a_1, d_2^{\text{opt}})$. Note that if the observed action $a_2$ matches the optimal action according to $d_2^{\text{opt}}$, then the potential payoff is simply the observed payoff $y = y_1 + \eta y_2$. Otherwise, the potential outcome is unobserved and must be imputed (in the BML method, it is sampled from the posterior predictive distribution, as described further below). Given imputed values, the Stage 1 regression model for $y(a_1, d_2^{\text{opt}})$ can then be estimated with observed covariates $(o_1, a_1)$ to identify $d_1^{\text{opt}}$. This type of backward induction strategy is used in several DTR estimation methods, including g-estimation, Q-learning, and dynamic weighted ordinary least squares (Robins 2004; Moodie et al. 2007; Nahum-Shani et al. 2012; Goldberg and Kosorok 2012; Simoneau et al. 2020).
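To make the backward induction concrete, the following sketch implements the two-stage recipe on simulated data. It uses ordinary least squares as a stand-in for the Bayesian regression models and a plug-in predicted mean in place of a posterior predictive draw for the unobserved potential payoff; the data-generating model, design matrices, and helper names are illustrative assumptions, not the specification of Murray et al. (2018).

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_fit(X, y):
    """Least-squares fit; a stand-in for a Bayesian regression model."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated two-stage data: covariates o1, o2; binary actions a1, a2 in {0, 1}.
n = 500
o1 = rng.normal(size=n)
a1 = rng.integers(0, 2, size=n)
y1 = o1 + a1 * (0.5 - o1) + rng.normal(scale=0.3, size=n)
o2 = 0.5 * o1 + rng.normal(size=n)
a2 = rng.integers(0, 2, size=n)
y2 = o2 + a2 * (o2 - 0.2) + rng.normal(scale=0.3, size=n)
eta = 1.0  # weight applied to the Stage 2 payoff

def design2(o1, a1, o2, a2):
    # Stage 2 design: history (o1, a1, o2), action a2, and their interaction.
    return np.column_stack([np.ones_like(o2), o1, a1, o2, a2, a2 * o2])

# Stage 2: fit the payoff model, then pick the action maximizing predicted payoff.
beta2 = ols_fit(design2(o1, a1, o2, a2), y2)
pred = lambda a: design2(o1, a1, o2, np.full(n, a)) @ beta2
a2_opt = (pred(1) > pred(0)).astype(int)

# Potential Stage 2 payoff: observed when the optimal action was actually taken,
# otherwise imputed here by the plug-in predicted mean (not a posterior draw).
y2_opt = np.where(a2 == a2_opt, y2, np.maximum(pred(0), pred(1)))

# Stage 1: regress the constructed outcome y1 + eta * y2_opt on (o1, a1),
# then read off the estimated optimal Stage 1 rule.
X1 = np.column_stack([np.ones_like(o1), o1, a1, a1 * o1])
beta1 = ols_fit(X1, y1 + eta * y2_opt)
a1_opt = (beta1[2] + beta1[3] * o1 > 0).astype(int)
```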
Estimation of the terminal stage regression model is straightforward: it is a typical regression of outcome on predictors, fit using standard Bayesian methods. Estimation of the nonterminal stage models, on the other hand, is not easily done with standard Bayesian software, because the counterfactual or potential payoff under the unobserved optimal action at each subsequent stage contributes to the outcome at the current stage. To address this problem, Murray et al. (2018) developed a backward induction Gibbs (BIG) sampler to implement the proposed BML approach in practice. It consists of three steps, repeated until convergence, with a hat over a random variable indicating a sampled value in the MCMC algorithm:
Step 1 Draw a posterior sample of the parameters $\theta_2$ in the Stage 2 model and set the optimal action $\hat{a}_{i2}^{\text{opt}} = \hat{d}_2^{\text{opt}}(\bar{o}_{i2}; \theta_2)$, $i = 1, \ldots, n$.

Step 2 Compare the observed $a_{i2}$ and the optimal $\hat{a}_{i2}^{\text{opt}}$. For $i = 1, \ldots, n$, if $a_{i2} = \hat{a}_{i2}^{\text{opt}}$, then set $\hat{y}_{i2}^{\text{opt}} = y_{i2}$; else, sample $\hat{y}_{i2}^{\text{opt}}$ from the posterior predictive distribution of $y_2(a_{i1}, \hat{a}_{i2}^{\text{opt}})$.

Step 3 Draw a posterior sample of the parameters $\theta_1$ in the Stage 1 model using the outcome $y_{i1} + \eta_i \hat{y}_{i2}^{\text{opt}}$.
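The following is a minimal sketch of the BIG sampler under strong simplifying assumptions: conjugate Bayesian linear models with known noise variance stand in for the flexible Bayesian regression models of Murray et al. (2018), the weight is taken as a common constant eta, and design2 is the hypothetical Stage 2 design helper from the previous sketch (repeated here for self-containment).

```python
import numpy as np

rng = np.random.default_rng(1)

def design2(o1, a1, o2, a2):
    # Hypothetical Stage 2 design matrix, as in the previous sketch.
    return np.column_stack([np.ones_like(o2), o1, a1, o2, a2, a2 * o2])

def draw_posterior(X, y, sigma2=0.3**2, tau2=10.0):
    """One draw of regression coefficients under a N(0, tau2 I) prior with
    known noise variance sigma2 (conjugate normal posterior)."""
    p = X.shape[1]
    prec = X.T @ X / sigma2 + np.eye(p) / tau2
    cov = np.linalg.inv(prec)
    mean = cov @ (X.T @ y) / sigma2
    return rng.multivariate_normal(mean, cov)

def big_sampler(o1, a1, y1, o2, a2, y2, eta=1.0, n_iter=2000):
    n = len(y1)
    draws = []
    for _ in range(n_iter):
        # Step 1: draw theta2 and compute the implied optimal Stage 2 action.
        theta2 = draw_posterior(design2(o1, a1, o2, a2), y2)
        mu0 = design2(o1, a1, o2, np.zeros(n)) @ theta2
        mu1 = design2(o1, a1, o2, np.ones(n)) @ theta2
        a2_opt = (mu1 > mu0).astype(int)

        # Step 2: keep y2 where the observed action was optimal; otherwise
        # sample the potential payoff from the posterior predictive.
        mu_opt = np.where(a2_opt == 1, mu1, mu0)
        y2_opt = np.where(a2 == a2_opt, y2,
                          mu_opt + rng.normal(scale=0.3, size=n))

        # Step 3: draw theta1 from the Stage 1 model for y1 + eta * y2_opt.
        X1 = np.column_stack([np.ones(n), o1, a1, a1 * o1])
        theta1 = draw_posterior(X1, y1 + eta * y2_opt)
        draws.append((theta1, theta2))
    return draws
```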
2.3 AFT BART
Bayesian additive regression trees (BART), developed by Chipman et al. (2010), form a Bayesian nonparametric regression model built as an ensemble of trees. The accelerated failure time (AFT) BART (Bonato et al. 2011) extends the approach to accommodate censored outcomes by assuming the event time follows a log-normal distribution. Let $t_i$ be the event time and $c_i$ the censoring time for individual $i$. Then the observed survival time is $s_i = \min(t_i, c_i)$, and the event indicator is $\delta_i = I(t_i < c_i)$. Denote by $x_i = (x_{i1}, \ldots, x_{ip})$ the $p$-dimensional vector of predictors. The relationship between $t_i$ and $x_i$ is expressed as
$$\log t_i = \mu + f(x_i) + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2),$$
$$f \overset{\text{prior}}{\sim} \text{BART}, \qquad \sigma^2 \overset{\text{prior}}{\sim} \nu\lambda\chi^{-2}(\nu),$$