Heterogeneous Treatment Effect Estimation for Observational Data using Model-based Forests

Susanne Dandl
LMU München, MCML
Andreas Bender
LMU München, MCML
Torsten Hothorn
Universität Zürich
Abstract
The estimation of heterogeneous treatment effects (HTEs) has attracted considerable
interest in many disciplines, most prominently in medicine and economics. Contemporary
research has so far primarily focused on continuous and binary responses where HTEs
are traditionally estimated by a linear model, which allows the estimation of constant or
heterogeneous effects even under certain model misspecifications. More complex models
for survival, count, or ordinal outcomes require stricter assumptions to reliably estimate
the treatment effect. Most importantly, the noncollapsibility issue necessitates the joint
estimation of treatment and prognostic effects. Model-based forests allow simultaneous
estimation of covariate-dependent treatment and prognostic effects, but only for
randomized trials. In this paper, we propose modifications to model-based forests to address
the confounding issue in observational data. In particular, we evaluate an orthogonalization
strategy originally proposed by Robinson (1988, Econometrica) in the context of
model-based forests targeting HTE estimation in generalized linear models and transformation
models. We found that this strategy reduces confounding effects in a simulation study with
various outcome distributions. We demonstrate the practical aspects of HTE estimation
for survival and ordinal outcomes by an assessment of the potentially heterogeneous effect
of Riluzole on the progress of Amyotrophic Lateral Sclerosis.
Keywords: Heterogeneous treatment effects, personalized medicine, random forest,
observational data, censored survival data, generalized linear model, transformation model.
1. Introduction
Over the past years, there has been emerging interest in methods to estimate heterogeneous
treatment effects (HTEs) in various application fields. In healthcare, HTE estimation can
be understood as a core principle driving personalized medicine. As opposed to average
treatment effects, which assume a constant effect of a treatment on an outcome for the whole
population, HTEs account for the heterogeneity in the effect for subgroups or individuals
based on their characteristics. Most research on HTE estimation has focused on
continuous and binary response variables. These methods have typically built upon Rubin’s
potential outcomes framework, a statistical approach to formulating and inferring causal
effects in various designs (Rubin 1974, 2005).
arXiv:2210.02836v1 [stat.ME] 6 Oct 2022
Traditionally, statistical models were used to estimate the treatment effect, but machine
learning methods have increasingly been adapted for these tasks over the past decade. Machine
learning models rely on weaker assumptions and can automatically learn complex
relationships such as higher order interaction effects, resulting in greater predictive performance in
a variety of applications. In the case of continuous or binary responses, prominent methods
to estimate HTEs are based on random forests (Foster, Taylor, and Ruberg 2011; Lu, Sadiq,
Feaster, and Ishwaran 2018; Athey, Tibshirani, and Wager 2019; Powers, Qian, Jung, Schuler,
Shah, Hastie, and Tibshirani 2018; Su, Peña, Liu, and Levine 2018; Li, Levine, and Fan 2022),
Bayesian additive regression trees (BART) (Hill 2011; Hu, Gu, Lopez, Ji, and Wisnivesky
2020), or neural networks (Shalit, Johansson, and Sontag 2017; Curth, Lee, and van der
Schaar 2021; Chapfuwa, Assaad, Zeng, Pencina, Carin, and Henao 2021). Künzel, Sekhon,
Bickel, and Yu (2019) proposed general frameworks (T-learners, S-learners, U-learners, and
X-learners) that base treatment effect estimates on arbitrary machine learning models.
Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, Newey, and Robins (2018) coined the term
double/debiased machine learning, which uses machine learning models to estimate nuisance
parameters. The approach still relies on parametric models for estimating treatment
effects, but Nie and Wager (2021) derived so-called R-learners that allow for arbitrary
(nonparametric or semiparametric) models.
Beyond continuous or binary responses, research on machine learning methods for HTE
estimation has primarily focused on (right-censored) survival data. Methods have been proposed
based on Bayesian additive regression trees (BART) (Henderson, Louis, Rosner, and
Varadhan 2018), random forest-type methods (Cui, Kosorok, Sverdrup, Wager, and Ruoqing 2022;
Tabib and Larocque 2020), or deep learning approaches (Curth et al. 2021; Chapfuwa et al.
2021). Theoretically, any machine learning model for survival analysis, such as random
survival forests (Ishwaran, Kogalur, Blackstone, and Lauer 2008) or a Cox regression-based deep
neural network (deepSurv) (Katzman, Shaham, Cloninger, Bates, Jiang, and Kluger 2018),
can estimate HTEs (Hu, Ji, and Li 2021). These models can estimate survival or hazard
functions in both treatment groups separately; HTEs are then defined as the difference in derived
properties of the two functions, e.g., as differences in the median survival time. However, Hu
et al. (2021) found that methods specifically designed for HTE estimation, like the adapted
BART (Henderson et al. 2018), produce more reliable estimates.
In general, for a continuous or binary outcome $Y$ conditional on treatment $w$ and covariates
$x$, the conditional average treatment effect $\tau(x)$ (CATE) can be estimated from the model
$E(Y \mid W = w, X = x) = \mu(x) + \tau(x) w$ even if the model is misspecified, e.g., when the
prognostic effect $\mu(x)$ cannot be fully estimated due to missing covariate information. Beyond
mean regression, stricter assumptions are necessary both for randomized and for observational
studies to estimate HTEs. For example, under a true Cox model with survivor function
$\exp(-\exp(h(t) + \mu(x) + \tau w))$ with log-cumulative baseline hazard $h(t)$ at time $t$ and
log-hazard ratio $\tau$, the prognostic effect $\mu(x)$ must be specified correctly, even in a
randomized trial. Estimated marginal log-hazard ratios $\hat\tau$ (i.e., when the model is fitted
under the constraint $\mu(x) \equiv 0$) are shrunken towards zero if this constraint is unrealistic
(Aalen, Cook, and Røysland 2015). Naturally, this problem carries over to heterogeneous
log-hazard ratios $\tau(x)$.
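The shrinkage of marginal log-hazard ratios can be checked numerically. The following minimal sketch uses a hypothetical setup that is not taken from the paper: $h(t) = \log t$ (exponential event times), a prognostic effect $\mu(x)$ taking the values $-1$ and $1$ with equal probability, and a constant conditional log-hazard ratio $\tau = 1$. It computes the population marginal hazard in each arm, averaged over $\mu(x)$, and the resulting marginal log-hazard ratio at a few time points. This is a population quantity, not the Cox estimate itself, so it only illustrates the direction of the effect.

```python
import math

def marginal_hazard(t, tau, w, mu_values=(-1.0, 1.0)):
    """Marginal hazard at time t in arm w, averaging over equally likely mu(x).

    Conditional survivor function (assumed for illustration):
    S(t | x, w) = exp(-exp(log(t) + mu(x) + tau * w)),
    i.e. exponential event times with rate lam = exp(mu(x) + tau * w).
    """
    rates = [math.exp(mu + tau * w) for mu in mu_values]
    density = sum(lam * math.exp(-lam * t) for lam in rates) / len(rates)
    survival = sum(math.exp(-lam * t) for lam in rates) / len(rates)
    return density / survival

def marginal_log_hazard_ratio(t, tau):
    """Log of the ratio of marginal hazards (treated vs. control) at time t."""
    return math.log(marginal_hazard(t, tau, 1) / marginal_hazard(t, tau, 0))

tau = 1.0  # conditional log-hazard ratio (hypothetical value)
for t in (0.0, 0.5, 1.0, 2.0):
    print(f"t = {t:3.1f}: marginal log-HR = {marginal_log_hazard_ratio(t, tau):.3f}")
```

At $t = 0$ the marginal and conditional log-hazard ratios coincide. For $t > 0$ the marginal value lies strictly between $0$ and $\tau$ in this setup, although both arms share the same conditional effect.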
Consequently, HTE estimation in more complex models requires the simultaneous estimation
of both the prognostic part $\mu(x)$ and the predictive HTE $\tau(x)$. Model-based forests have been
demonstrated to allow estimation of $\mu(x)$ and $\tau(x)$ in randomized trials (Seibold, Zeileis, and
Hothorn 2016, 2018; Korepanova, Seibold, Steffen, and Hothorn 2020; Buri and Hothorn
2020; Fokkema, Smits, Zeileis, Hothorn, and Kelderman 2018; Hothorn and Zeileis 2021b).
In a nutshell, model-based forests combine the parametric modeling framework with random
forests to estimate individual treatment effects (Seibold et al. 2018). By using generalized
linear models and transformation models, model-based forests can be adapted for survival
data (Seibold et al. 2016, 2018; Korepanova et al. 2020), ordinal data (Buri and Hothorn
2020), or clustered data (Fokkema et al. 2018). A unique feature of model-based forests is the
simultaneous estimation of both treatment and prognostic effects in the same forest model.
In observational studies the treatment group assignment is not under control of the researcher
and confounding effects could bias the estimation of HTEs. In this work, we propose and
evaluate novel variants of model-based forests for HTE estimation in observational studies.
Adaptations of Robinson’s orthogonalization strategy for generalized linear models and
transformation models are discussed and implemented. We review key components of model-based
forests for HTE estimation in randomized trials in Section 2. In Section 3, we start by
introducing the orthogonalization approach by Robinson (1988), which is instrumental for achieving
robustness to confounding effects in the non-randomized situation. We motivate previous
developments using linear models (Dandl, Hothorn, Seibold, Sverdrup, Wager, and Zeileis 2022)
and leverage adaptations to more complex models discussed by Gao and Hastie (2022) to
define novel model-based forest variants suitable for HTE estimation in the observational
setting. These variants’ performances are empirically assessed in a simulation study with a
range of outcome distributions in Section 4. Finally, Section 5 presents a re-analysis of the
patient-specific effect of Riluzole in patients with Amyotrophic Lateral Sclerosis (ALS) and
discusses practical aspects of model estimation and interpretation.
2. Review of model-based forests for randomized trials
We are interested in estimating HTEs based on i.i.d. observations $(y, x, w)$, where $y$, $x$, and
$w$ are realizations of the outcome $Y$, covariates $X \in \mathcal{X}$, and control vs. treatment
indicator $W \in \{0, 1\}$. $Y(0)$ and $Y(1)$ denote the potential outcomes under the two treatment
conditions $W \in \{0, 1\}$. Throughout this paper, we assume that $X$ includes all relevant
variables to explain heterogeneity both in the treatment effect and the outcome $Y$, and that the
base model underlying model-based forests is correctly specified.
We review model-based forests for HTE estimation based on randomized trials as introduced
by Seibold et al. (2018) and Korepanova et al. (2020). Within this section, we only consider
settings where the treatment assignment is randomized and, therefore, follows a binomial
model $W \mid X = x \sim B(1, \pi(x))$ with constant propensities $\pi(x) \equiv \pi$. We omit
discussion of the abstract framework underlying model-based forests and instead discuss the
important linear, generalized linear (Seibold et al. 2018), and transformation models
(Korepanova et al. 2020) in detail.
2.1. Linear model
For a continuous outcome $Y \in \mathbb{R}$ with symmetric error distribution, a model-based forest
might be defined based on the model
$$(Y \mid X = x, W = w) = \mu(x) + \tau(x) w + \phi Z \tag{1}$$
where the residuals are given by the error term $\phi Z$ with $E(Z \mid X, W) = 0$ and standard
deviation $\phi > 0$ (Dandl et al. 2022). We are mainly interested in estimating $\tau(x)$, the
treatment effect that depends on predictive variables in $x$. With model-based forests, however,
we also obtain an estimated value for the prognostic effect $\mu(x)$, which depends on prognostic
variables in $x$. A variable might be predictive and prognostic at the same time. We refer to
these situations as “overlays”.
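Because we assume in this section that $\pi(x) \equiv \pi$ applies, $W \perp\!\!\!\perp X$ holds.
Consequently, $\tau(x)$ can be interpreted as a CATE
$$\tau(x) = \mathrm{CATE}(x) = E(Y(1) - Y(0) \mid X = x) \tag{2}$$
on the absolute scale. To estimate $(\mu(x), \tau(x))^\top$ the $L_2$ loss
$$\ell(\mu(x), \tau(x)) = \tfrac{1}{2} (Y - \mu(x) - \tau(x) w)^2 \tag{3}$$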
is minimized w.r.t. $\mu$ and $\tau$ using an ensemble of trees. Inspired by recursive partitioning
techniques (Hothorn, Hornik, and Zeileis 2006; Zeileis, Hothorn, and Hornik 2008), split
variable and split point selection are separated. The split variable is the variable that has
the lowest p-value in the bivariate permutation tests of the null hypothesis $H_0$ that $\mu$ and
$\tau$ are constant and independent of any split variable. The cut-point is the point of the chosen
split variable at which the score functions
$$s(\hat\mu, \hat\tau) := (Y - \hat\mu - \hat\tau w)(1, w)^\top$$
in the two resultant subgroups differ the most; details are available in Appendix 2 of Seibold
et al. (2018).
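The node-level computations described above can be sketched in a few lines. The example below is an illustration under simplifying assumptions, not the authors' implementation. It fits constants $(\hat\mu, \hat\tau)$ by least squares for a binary treatment, evaluates the bivariate scores, and then chooses a cut-point with a deliberately simplified rule, the largest squared distance between mean scores on either side of the cut. That rule stands in for the permutation-test statistics referenced in the text. All function names and the toy data are hypothetical.

```python
def fit_node(y, w):
    """Minimize the L2 loss (3) over constants (mu, tau) for binary w.

    With w in {0, 1}, the least-squares solution is the control-arm mean (mu)
    and the difference in arm means (tau).
    """
    y0 = [yi for yi, wi in zip(y, w) if wi == 0]
    y1 = [yi for yi, wi in zip(y, w) if wi == 1]
    mu = sum(y0) / len(y0)
    tau = sum(y1) / len(y1) - mu
    return mu, tau

def scores(y, w, mu, tau):
    """Bivariate score (Y - mu - tau * w) * (1, w) for each observation."""
    return [(yi - mu - tau * wi, (yi - mu - tau * wi) * wi) for yi, wi in zip(y, w)]

def best_cut_point(x, s):
    """Simplified stand-in for cut-point selection (not the permutation test):
    choose the cut that maximizes the squared distance between the mean
    scores of the left (x <= c) and right (x > c) subgroups."""
    best, best_stat = None, -1.0
    for c in sorted(set(x))[:-1]:
        left = [si for xi, si in zip(x, s) if xi <= c]
        right = [si for xi, si in zip(x, s) if xi > c]
        stat = sum(
            (sum(l[k] for l in left) / len(left)
             - sum(r[k] for r in right) / len(right)) ** 2
            for k in range(2)
        )
        if stat > best_stat:
            best, best_stat = c, stat
    return best

# Toy data: the treatment effect is 0 for x <= 2 and 2 for x > 2.
x = [1, 1, 2, 2, 3, 3, 4, 4]
w = [0, 1, 0, 1, 0, 1, 0, 1]
y = [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0]

mu_hat, tau_hat = fit_node(y, w)
s = scores(y, w, mu_hat, tau_hat)
cut = best_cut_point(x, s)
print(mu_hat, tau_hat, cut)
```

By construction the scores sum to zero over the node in both components. A split is informative when the scores stop averaging to zero within the candidate subgroups, which is the case here at the cut where the treatment effect changes.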
Once $B \in \mathbb{N}$ trees have been fitted to subsamples of the training data, predictions for
the treatment effect for a new observation $x$ are obtained via local maximum likelihood
aggregation (Hothorn, Lausen, Benner, and Radespiel-Tröger 2004; Meinshausen 2006; Lin and Jeon
2006; Athey et al. 2019; Hothorn and Zeileis 2021b). First, for the $i$-th training sample, the
frequency $\alpha_i(x)$ with which it falls in the same leaf as $x$ over all $B$ trees is measured.
The obtained weighting vector $(\alpha_1(x), \ldots, \alpha_n(x))$ is used as an input for minimizing
$$(\hat\mu(x), \hat\tau(x))^\top = \arg\min_{\mu, \tau} \sum_{i=1}^{n} \alpha_i(x) \ell_i(\mu, \tau) \tag{4}$$
where $\ell_i$ denotes the loss for the $i$-th sample. Model-based forests are easily adapted
when HTEs are to be estimated for an outcome variable $Y$ that is not well represented by
equation (1). In this case, model-based forests can build on generalized linear models or
transformation models in the recursive partitioning framework (Zeileis et al. 2008). As detailed in
the following sections, the loss function $\ell$ in equation (3) changes from the squared error to
the negative (partial) log-likelihood of some appropriate model.
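The aggregation step in equation (4) can be sketched as follows for the $L_2$ loss and a binary treatment. This is a minimal illustration under stated assumptions: leaf memberships are supplied as plain lists of leaf ids per tree (a hypothetical data layout), the weights are co-occurrence counts, which are proportional to the frequencies in the text, and the weighted least-squares solution is written in its closed form for $w \in \{0, 1\}$.

```python
def forest_weights(train_leaves, query_leaves):
    """alpha_i(x): number of trees in which training sample i shares a leaf with x.

    train_leaves[b][i] is the leaf id of training sample i in tree b;
    query_leaves[b] is the leaf id of the new observation x in tree b.
    """
    n = len(train_leaves[0])
    return [
        sum(1 for b, leaves in enumerate(train_leaves) if leaves[i] == query_leaves[b])
        for i in range(n)
    ]

def weighted_fit(y, w, alpha):
    """Minimize sum_i alpha_i * 0.5 * (y_i - mu - tau * w_i)^2 for binary w.

    The solution is the weighted control-arm mean (mu) and the difference of
    weighted arm means (tau); each arm needs positive total weight.
    """
    def weighted_mean(arm):
        num = sum(a * yi for a, yi, wi in zip(alpha, y, w) if wi == arm)
        den = sum(a for a, wi in zip(alpha, w) if wi == arm)
        return num / den

    mu = weighted_mean(0)
    tau = weighted_mean(1) - mu
    return mu, tau

# Toy forest with B = 3 trees and n = 6 training samples.
train_leaves = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
query_leaves = [1, 1, 1]  # leaf of the new observation x in each tree
w = [0, 1, 0, 1, 0, 1]
y = [1.0, 1.0, 2.0, 5.0, 2.0, 5.0]

alpha = forest_weights(train_leaves, query_leaves)
mu_x, tau_x = weighted_fit(y, w, alpha)
print(alpha, mu_x, tau_x)
```

Training samples that never share a leaf with $x$ receive weight zero and do not influence the local estimate. For other losses, such as the negative log-likelihoods discussed in the following sections, the closed form would be replaced by a numerical optimizer.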
2.2. Generalized linear models
When the conditional outcome distribution is better described through a generalized linear
model
$$(Y \mid X = x, W = w) \sim \mathrm{ExpFam}(\theta(\mu(x) + \tau(x) w), \phi)$$
with parameter $\theta$ depending on the additive function $\mu(x) + \tau(x) w$, the conditional mean
$$g(E(Y \mid X = x, W = w)) = \mu(x) + \tau(x) w =: \eta_w(x) \tag{5}$$
is linear on the scale of a link function $g$. Thus, the interpretation of $\tau(x)$ as CATE (2)
generally no longer holds. Instead, the predictive effect is understood as the difference in
natural parameters (DINA; Gao and Hastie 2022)
$$\tau(x) = \mathrm{DINA}(x) = \eta_1(x) - \eta_0(x). \tag{6}$$
In contrast to the linear model case, HTEs $\tau(x)$ are now defined on relative scales, such as
odds ratios in binary logistic regression models or multiplicative mean effects in a Poisson or
Gaussian model with a log-link. The negative log-likelihood contribution of some observation
$(Y, x, w)$ is
$$\ell(\mu, \tau, \phi) = -\log(f(Y \mid \theta(\mu(x) + \tau(x) w), \phi))$$
with $f$ as the conditional density of an exponential family distribution
$$f(Y \mid \theta(\mu(x) + \tau(x) w), \phi).$$
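To make the relative scale concrete, equations (5) and (6) can be written out for two common link functions. The following is a worked illustration under model (5), not an additional result from the paper; the shorthand $m_w(x) = E(Y \mid X = x, W = w)$ is introduced here only for convenience.

```latex
\begin{align*}
\text{logit link:}\quad
  \tau(x) &= \log\frac{m_1(x)}{1 - m_1(x)} - \log\frac{m_0(x)}{1 - m_0(x)}
  &&\Rightarrow\quad
  \exp(\tau(x)) = \frac{m_1(x) / (1 - m_1(x))}{m_0(x) / (1 - m_0(x))}, \\
\text{log link:}\quad
  \tau(x) &= \log m_1(x) - \log m_0(x)
  &&\Rightarrow\quad
  \exp(\tau(x)) = \frac{m_1(x)}{m_0(x)}.
\end{align*}
```

In the first case $\exp(\tau(x))$ is a conditional odds ratio, in the second a ratio of conditional means. Neither equals the difference $m_1(x) - m_0(x)$ that defines the effect on the absolute scale.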
Model-based trees and forests (Zeileis et al. 2008; Seibold et al. 2016, 2018) jointly estimate the
prognostic effect $\mu(x)$ and the predictive effect $\tau(x)$. The procedure simultaneously minimizes
the negative log-likelihood with respect to $\mu(x)$ and $\tau(x)$. In each node of the model-based
forest, $\mu$, $\tau$, and potentially $\phi$ are estimated by minimizing
$$\ell(\mu, \tau, \phi) = -\log(f(Y \mid \theta(\mu + \tau w), \phi)) \tag{7}$$
and regressing the bivariate gradient
$$\left. \frac{\partial \ell(\mu, \tau, \phi)}{\partial (\mu, \tau)} \right|_{\hat\mu, \hat\tau, \hat\phi}$$
on $x$. This means that one is not explicitly looking for changes in the scale parameter $\phi$, but
this could be implemented by looking at the three-variate gradient
$$\left. \frac{\partial \ell(\mu, \tau, \phi)}{\partial (\mu, \tau, \phi)} \right|_{\hat\mu, \hat\tau, \hat\phi}$$
for example, in a heteroscedastic normal linear model
$$(Y \mid X = x, W = w) = \mu(x) + \tau(x) w + \phi(x) Z.$$
After the tree fitting phase, an HTE is estimated with equation (4) with $\ell(\mu, \tau, \phi)$ of
equation (7) as the corresponding loss function.
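As a concrete instance of equation (7), the sketch below fits constants $(\hat\mu, \hat\tau)$ in a single node under a binary logistic model and evaluates the bivariate gradient of the negative log-likelihood for each observation. It is a minimal sketch, not the authors' implementation. It assumes a Bernoulli outcome with logit link (so there is no scale parameter $\phi$ to estimate) and a binary treatment, in which case the maximum likelihood fit reproduces the arm-wise event proportions. Function names and data are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic_node(y, w):
    """Maximum likelihood for constants (mu, tau) in a logistic model with binary w.

    The model is saturated in w, so the fitted probabilities equal the arm-wise
    event proportions (both must lie strictly between 0 and 1).
    """
    p0 = sum(yi for yi, wi in zip(y, w) if wi == 0) / sum(1 for wi in w if wi == 0)
    p1 = sum(yi for yi, wi in zip(y, w) if wi == 1) / sum(1 for wi in w if wi == 1)
    mu = logit(p0)
    tau = logit(p1) - mu
    return mu, tau

def neg_loglik_gradient(y, w, mu, tau):
    """Per-observation gradient of -log f with respect to (mu, tau): -(y - p) * (1, w)."""
    grads = []
    for yi, wi in zip(y, w):
        p = expit(mu + tau * wi)
        grads.append((-(yi - p), -(yi - p) * wi))
    return grads

# Toy node: event proportion 0.25 under control and 0.75 under treatment.
w = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]

mu_hat, tau_hat = fit_logistic_node(y, w)
grad = neg_loglik_gradient(y, w, mu_hat, tau_hat)
print(mu_hat, tau_hat, math.exp(tau_hat))
```

Here $\exp(\hat\tau)$ is the odds ratio between the two arms, in line with the interpretation of $\tau$ as DINA rather than as a difference in event probabilities. The per-observation gradients sum to zero at the fitted values, and it is their dependence on $x$ that the split selection examines.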
Thus, model-based forests can be directly applied to estimate HTEs on relative scales for
binary outcomes (binary logistic or probit regression, for example), counts (Poisson or
quasi-Poisson regression), or continuous outcomes where a multiplicative effect is of interest
(normal model with log-link).
2.3. Transformation models
More complex responses like ordered categorical or time-to-event outcomes are not covered
by generalized linear models but can be analysed using transformation models; corresponding