2.2 Bayesian Machine Learning for DTRs
Murray et al. (2018) described a new approach called Bayesian Machine Learning (BML) to optimize DTRs; the method requires fitting a series of Bayesian regression models in reverse sequential order under the approximate dynamic programming framework. The authors use potential outcomes notation to describe their approach, where $y(a_1, a_2)$ denotes the payoff observed when action $a_1$ is taken at Stage 1 and action $a_2$ is taken at Stage 2, and other potential outcomes ($y_2(a_1, a_2)$, $y_1(a_1)$, and $o_2(a_1)$) are similarly defined. Assuming the potential outcomes are consistent, the observed outcome corresponds to the potential outcome for the action actually followed, e.g., $y_1(a_1) = y_1$, $o_2(a_1) = o_2$, $y_2(a_1, a_2) = y_2$, and $y(a_1, a_2) = y$. The approach can be summarized as follows. The Stage 2 regression model for $y_2(a_1, a_2)$ is estimated first, using the observed covariates $(\bar{o}_2, a_2)$ and the observed response variable $y_2$. Based on the estimated Stage 2 model, the estimated optimal mapping from $\bar{O}_2$ to $A_2$, denoted simply as $d_2^{\text{opt}}$, can be identified, as well as the relevant potential payoff at Stage 2, denoted as $y_2(a_1, d_2^{\text{opt}})$. With $d_2^{\text{opt}}$ and potential payoff $y_2(a_1, d_2^{\text{opt}})$, the response variable for Stage 1 can be constructed as $y(a_1, d_2^{\text{opt}})$; this (potential) outcome is composed of the observed Stage 1 payoff $y_1$ and the potential Stage 2 payoff $y_2(a_1, d_2^{\text{opt}})$. Note that if the observed action $a_2$ matches the optimal action according to $d_2^{\text{opt}}$, then the potential payoff is simply the observed payoff $y = y_1 + \eta y_2$. Otherwise, the potential outcome is unobserved and must be imputed (in the BML method, it is sampled from the posterior predictive distribution, as described further below). Given imputed values, the Stage 1 regression model for $y(a_1, d_2^{\text{opt}})$ can then be estimated with observed covariates $(o_1, a_1)$ to identify $d_1^{\text{opt}}$. This type of backward induction strategy is used in several DTR estimation methods, including g-estimation, Q-learning, and dynamic weighted ordinary least squares (Robins 2004; Moodie et al. 2007; Nahum-Shani et al. 2012; Goldberg and Kosorok 2012; Simoneau et al. 2020).
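To make the backward induction concrete, the following sketch implements the two-stage recipe on simulated data. It uses ordinary least squares as a stand-in for the Bayesian regression models and a plug-in predicted mean in place of a posterior predictive draw for the unobserved potential payoff; the data-generating model, design matrices, and helper names are illustrative assumptions, not the specification of Murray et al. (2018).

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_fit(X, y):
    """Least-squares fit; a stand-in for a Bayesian regression model."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated two-stage data: covariates o1, o2; binary actions a1, a2 in {0, 1}.
n = 500
o1 = rng.normal(size=n)
a1 = rng.integers(0, 2, size=n)
y1 = o1 + a1 * (0.5 - o1) + rng.normal(scale=0.3, size=n)
o2 = 0.5 * o1 + rng.normal(size=n)
a2 = rng.integers(0, 2, size=n)
y2 = o2 + a2 * (o2 - 0.2) + rng.normal(scale=0.3, size=n)
eta = 1.0  # weight applied to the Stage 2 payoff

def design2(o1, a1, o2, a2):
    # Stage 2 design: history (o1, a1, o2), action a2, and their interaction.
    return np.column_stack([np.ones_like(o2), o1, a1, o2, a2, a2 * o2])

# Stage 2: fit the payoff model, then pick the action maximizing predicted payoff.
beta2 = ols_fit(design2(o1, a1, o2, a2), y2)
pred = lambda a: design2(o1, a1, o2, np.full(n, a)) @ beta2
a2_opt = (pred(1) > pred(0)).astype(int)

# Potential Stage 2 payoff: observed when the optimal action was actually taken,
# otherwise imputed here by the plug-in predicted mean (not a posterior draw).
y2_opt = np.where(a2 == a2_opt, y2, np.maximum(pred(0), pred(1)))

# Stage 1: regress the constructed outcome y1 + eta * y2_opt on (o1, a1),
# then read off the estimated optimal Stage 1 rule.
X1 = np.column_stack([np.ones_like(o1), o1, a1, a1 * o1])
beta1 = ols_fit(X1, y1 + eta * y2_opt)
a1_opt = (beta1[2] + beta1[3] * o1 > 0).astype(int)
```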
Estimation of the terminal stage regression model is straightforward: it is a typical regression of outcome on predictors, fit using standard Bayesian methods. Estimation of the nonterminal stage models, on the other hand, is not easily done with standard Bayesian software, because the counterfactual or potential payoff under the unobserved optimal action at each subsequent stage contributes to the outcome at the current stage. To address this problem, Murray et al. (2018) developed a backward induction Gibbs (BIG) sampler to implement the proposed BML approach in practice. It consists of three steps, repeated until convergence, with a hat over a random variable indicating a sampled value in the MCMC algorithm:
Step 1 Draw a posterior sample of the parameters $\theta_2$ in the Stage 2 model and set the optimal action $\hat{a}_{i2}^{\text{opt}} = \hat{d}_2^{\text{opt}}(\bar{o}_{i2}; \theta_2)$, $i = 1, \ldots, n$.

Step 2 Compare the observed $a_{i2}$ and the optimal $\hat{a}_{i2}^{\text{opt}}$. For $i = 1, \ldots, n$, if $a_{i2} = \hat{a}_{i2}^{\text{opt}}$, then set $\hat{y}_{i2}^{\text{opt}} = y_{i2}$; else, sample $\hat{y}_{i2}^{\text{opt}}$ from the posterior predictive distribution of $y_2(a_{i1}, \hat{a}_{i2}^{\text{opt}})$.

Step 3 Draw a posterior sample of the parameters $\theta_1$ in the Stage 1 model using the outcome $y_{i1} + \eta_i \hat{y}_{i2}^{\text{opt}}$.
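The following is a minimal sketch of the BIG sampler under strong simplifying assumptions: conjugate Bayesian linear models with known noise variance stand in for the flexible Bayesian regression models of Murray et al. (2018), the weight is taken as a common constant eta, and design2 is the hypothetical Stage 2 design helper from the previous sketch (repeated here for self-containment).

```python
import numpy as np

rng = np.random.default_rng(1)

def design2(o1, a1, o2, a2):
    # Hypothetical Stage 2 design matrix, as in the previous sketch.
    return np.column_stack([np.ones_like(o2), o1, a1, o2, a2, a2 * o2])

def draw_posterior(X, y, sigma2=0.3**2, tau2=10.0):
    """One draw of regression coefficients under a N(0, tau2 I) prior with
    known noise variance sigma2 (conjugate normal posterior)."""
    p = X.shape[1]
    prec = X.T @ X / sigma2 + np.eye(p) / tau2
    cov = np.linalg.inv(prec)
    mean = cov @ (X.T @ y) / sigma2
    return rng.multivariate_normal(mean, cov)

def big_sampler(o1, a1, y1, o2, a2, y2, eta=1.0, n_iter=2000):
    n = len(y1)
    draws = []
    for _ in range(n_iter):
        # Step 1: draw theta2 and compute the implied optimal Stage 2 action.
        theta2 = draw_posterior(design2(o1, a1, o2, a2), y2)
        mu0 = design2(o1, a1, o2, np.zeros(n)) @ theta2
        mu1 = design2(o1, a1, o2, np.ones(n)) @ theta2
        a2_opt = (mu1 > mu0).astype(int)

        # Step 2: keep y2 where the observed action was optimal; otherwise
        # sample the potential payoff from the posterior predictive.
        mu_opt = np.where(a2_opt == 1, mu1, mu0)
        y2_opt = np.where(a2 == a2_opt, y2,
                          mu_opt + rng.normal(scale=0.3, size=n))

        # Step 3: draw theta1 from the Stage 1 model for y1 + eta * y2_opt.
        X1 = np.column_stack([np.ones(n), o1, a1, a1 * o1])
        theta1 = draw_posterior(X1, y1 + eta * y2_opt)
        draws.append((theta1, theta2))
    return draws
```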
2.3 AFT BART
Bayesian additive regression trees (BART), developed by Chipman et al. (2010), form a Bayesian nonparametric regression model built as an ensemble of trees. The accelerated failure time (AFT) BART (Bonato et al. 2011) extends the approach to accommodate censored outcomes by assuming the event time follows a log-normal distribution. Let $t_i$ be the event time and $c_i$ the censoring time for individual $i$. Then the observed survival time is $s_i = \min(t_i, c_i)$, and the event indicator is $\delta_i = I(t_i < c_i)$. Denote by $x_i = (x_{i1}, \ldots, x_{ip})$ the $p$-dimensional vector of predictors. The relationship between $t_i$ and $x_i$ is expressed as
$$\log t_i = \mu + f(x_i) + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2),$$
$$f \overset{\text{prior}}{\sim} \text{BART}, \qquad \sigma^2 \overset{\text{prior}}{\sim} \nu\lambda\chi^{-2}(\nu),$$