Heterogeneous Treatment Effect Estimation for Observational Data using Model-based Forests



Susanne Dandl

LMU München, MCML

Andreas Bender

LMU München, MCML

Torsten Hothorn

Universität Zürich

Abstract

The estimation of heterogeneous treatment effects (HTEs) has attracted considerable
interest in many disciplines, most prominently in medicine and economics. Contemporary
research has so far primarily focused on continuous and binary responses where HTEs
are traditionally estimated by a linear model, which allows the estimation of constant or
heterogeneous effects even under certain model misspecifications. More complex models
for survival, count, or ordinal outcomes require stricter assumptions to reliably estimate
the treatment effect. Most importantly, the noncollapsibility issue necessitates the joint
estimation of treatment and prognostic effects. Model-based forests allow simultaneous
estimation of covariate-dependent treatment and prognostic effects, but only for randomized
trials. In this paper, we propose modifications to model-based forests to address the
confounding issue in observational data. In particular, we evaluate an orthogonalization
strategy originally proposed by Robinson (1988, Econometrica) in the context of model-based
forests targeting HTE estimation in generalized linear models and transformation
models. We found that this strategy reduces confounding effects in a simulated study with
various outcome distributions. We demonstrate the practical aspects of HTE estimation
for survival and ordinal outcomes by an assessment of the potentially heterogeneous effect
of Riluzole on the progress of Amyotrophic Lateral Sclerosis.

Keywords: Heterogeneous treatment effects, personalized medicine, random forest,
observational data, censored survival data, generalized linear model, transformation model.

1. Introduction

Over the past years, there has been emerging interest in methods to estimate heterogeneous
treatment effects (HTEs) in various application fields. In healthcare, HTE estimation can
be understood as a core principle driving personalized medicine. As opposed to average
treatment effects, which assume a constant effect of a treatment on an outcome for the whole
population, HTEs account for the heterogeneity in the effect for subgroups or individuals
based on their characteristics. Most research on HTE estimation has focused on
continuous and binary response variables. These methods have typically built upon Rubin's
potential outcomes framework, a statistical approach to formulating and inferring causal
effects in various designs (Rubin 1974, 2005).

arXiv:2210.02836v1 [stat.ME] 6 Oct 2022

Traditionally, statistical models were used to estimate the treatment effect, but machine
learning methods have increasingly been adapted for these tasks over the past decade. Machine
learning models rely on weaker assumptions and can automatically learn complex relationships
such as higher order interaction effects, resulting in greater predictive performance in
a variety of applications. In the case of continuous or binary responses, prominent methods
to estimate HTEs are based on random forests (Foster, Taylor, and Ruberg 2011; Lu, Sadiq,
Feaster, and Ishwaran 2018; Athey, Tibshirani, and Wager 2019; Powers, Qian, Jung, Schuler,
Shah, Hastie, and Tibshirani 2018; Su, Peña, Liu, and Levine 2018; Li, Levine, and Fan 2022),
Bayesian additive regression trees (BART) (Hill 2011; Hu, Gu, Lopez, Ji, and Wisnivesky
2020), or neural networks (Shalit, Johansson, and Sontag 2017; Curth, Lee, and van der
Schaar 2021; Chapfuwa, Assaad, Zeng, Pencina, Carin, and Henao 2021). Künzel, Sekhon,
Bickel, and Yu (2019) proposed general frameworks (T-learners, S-learners, U-learners, and
X-learners) that base treatment effect estimates on arbitrary machine learning models.
Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, Newey, and Robins (2018) coined the term
double/debiased machine learning for an approach that uses machine learning models for
nuisance parameter estimation. The approach still relies on parametric models for estimating
treatment effects, but Nie and Wager (2021) derived so-called R-learners that allow for
arbitrary (nonparametric or semiparametric) models.

Beyond continuous or binary responses, research on machine learning methods for HTE
estimation has primarily focused on (right-censored) survival data. Methods have been proposed
based on Bayesian additive regression trees (BART) (Henderson, Louis, Rosner, and Varadhan
2018), random forest-type methods (Cui, Kosorok, Sverdrup, Wager, and Ruoqing 2022;
Tabib and Larocque 2020), or deep learning approaches (Curth et al. 2021; Chapfuwa et al.
2021). Theoretically, any machine learning model for survival analysis, such as random
survival forests (Ishwaran, Kogalur, Blackstone, and Lauer 2008) or a Cox regression-based deep
neural network (deepSurv) (Katzman, Shaham, Cloninger, Bates, Jiang, and Kluger 2018),
can estimate HTEs (Hu, Ji, and Li 2021). These models can estimate survival or hazard
functions in both treatment groups separately; HTEs are then defined as the difference in derived
properties of the two functions, e.g., as differences in the median survival time. However, Hu
et al. (2021) found that methods specifically designed for HTE estimation, like the adapted
BART (Henderson et al. 2018), produce more reliable estimates.

In general, for a continuous or binary outcome $Y$ conditional on treatment $w$ and covariates
$x$, the conditional average treatment effect (CATE) $\tau(x)$ can be estimated from the model
$E(Y \mid W = w, X = x) = \mu(x) + \tau(x) w$ even if the model is misspecified, e.g., when the
prognostic effect $\mu(x)$ cannot be fully estimated due to missing covariate information. Beyond
mean regression, stricter assumptions are necessary both for randomized and for observational
studies to estimate HTEs. For example, under a true Cox model with survivor function
$\exp(-\exp(h(t) + \mu(x) + \tau w))$ with log-cumulative baseline hazard $h(t)$ at time $t$ and
log-hazard ratio $\tau$, the prognostic effect $\mu(x)$ must be specified correctly, even in a
randomized trial. Estimated marginal log-hazard ratios $\hat\tau$, i.e., those obtained when the
model is fitted under the constraint $\mu(x) \equiv 0$, are shrunken towards zero if this
constraint is unrealistic (Aalen, Cook, and Røysland 2015). Naturally, this problem carries over
to heterogeneous log-hazard ratios $\tau(x)$.
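The contrast between the mean scale and a relative scale can be illustrated numerically. The sketch below is not taken from the paper: it uses a logistic model as a stand-in for the Cox model (odds ratios are noncollapsible in the same sense), and the two-point covariate distribution, the prognostic effect mu(x) = 2x, and tau = 1 are hypothetical values chosen for illustration.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

# Hypothetical setup: covariate x takes the values -1 and +1 with equal
# probability, prognostic effect mu(x) = 2x, constant conditional effect
# tau = 1, and randomized treatment (w independent of x).
x = np.array([-1.0, 1.0])
mu = 2.0 * x
tau = 1.0

# Linear model: E(Y | x, w) = mu(x) + tau * w.
# Averaging over x leaves the treatment contrast unchanged.
marginal_linear = np.mean(mu + tau) - np.mean(mu)

# Logistic model: logit P(Y = 1 | x, w) = mu(x) + tau * w.
# Averaging the probabilities over x first and then forming the log-odds
# ratio yields a marginal effect that lies between zero and tau.
p1 = np.mean(expit(mu + tau))
p0 = np.mean(expit(mu))
marginal_logit = logit(p1) - logit(p0)

print(marginal_linear, marginal_logit)
```

In this toy setting the marginal contrast of the linear model equals the conditional effect, whereas the marginal log-odds ratio is attenuated, although treatment is randomized and no confounding is present.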

Consequently, HTE estimation in more complex models requires the simultaneous estimation
of both the prognostic part $\mu(x)$ and the predictive HTE $\tau(x)$. Model-based forests have been
demonstrated to allow estimation of $\mu(x)$ and $\tau(x)$ in randomized trials (Seibold, Zeileis, and
Hothorn 2016, 2018; Korepanova, Seibold, Steffen, and Hothorn 2020; Buri and Hothorn
2020; Fokkema, Smits, Zeileis, Hothorn, and Kelderman 2018; Hothorn and Zeileis 2021b).
In a nutshell, model-based forests combine the parametric modeling framework with random
forests to estimate individual treatment effects (Seibold et al. 2018). By using generalized
linear models and transformation models, model-based forests can be adapted for survival
data (Seibold et al. 2016, 2018; Korepanova et al. 2020), ordinal data (Buri and Hothorn
2020), or clustered data (Fokkema et al. 2018). A unique feature of model-based forests is the
simultaneous estimation of both treatment and prognostic effects in the same forest model.

In observational studies, the treatment group assignment is not under the control of the
researcher, and confounding effects could bias the estimation of HTEs. In this work, we propose and
evaluate novel variants of model-based forests for HTE estimation in observational studies.
Adaptations of Robinson's orthogonalization strategy for generalized linear models and
transformation models are discussed and implemented. We review key components of model-based
forests for HTE estimation in randomized trials in Section 2. In Section 3, we start by
introducing the orthogonalization approach by Robinson (1988), which is instrumental for achieving
robustness to confounding effects in the non-randomized situation. We motivate previous
developments using linear models (Dandl, Hothorn, Seibold, Sverdrup, Wager, and Zeileis 2022)
and leverage adaptations to more complex models discussed by Gao and Hastie (2022) to
define novel model-based forest variants suitable for HTE estimation in the observational setting.
The performance of these variants is empirically assessed in a simulation study with a range of
outcome distributions in Section 4. Finally, Section 5 presents a re-analysis of the patient-specific
effect of Riluzole in patients with Amyotrophic Lateral Sclerosis (ALS) and discusses practical
aspects of model estimation and interpretation.

2. Review of model-based forests for randomized trials

We are interested in estimating HTEs based on i.i.d. observations $(y, x, w)$, where $y$, $x$ and
$w$ are realizations of the outcome $Y$, covariates $X \in \mathcal{X}$, and control vs. treatment
indicator $W \in \{0, 1\}$. $Y(0)$ and $Y(1)$ denote the potential outcomes under the two treatment
conditions $W \in \{0, 1\}$. Throughout this paper, we assume that $X$ includes all relevant
variables to explain heterogeneity both in the treatment effect and the outcome $Y$, and that the
base model underlying model-based forests is correctly specified.

We review model-based forests for HTE estimation based on randomized trials as introduced
by Seibold et al. (2018) and Korepanova et al. (2020). Within this section, we only consider
settings where the treatment assignment is randomized and, therefore, follows a binomial
model $W \mid X = x \sim B(1, \pi(x))$ with constant propensities $\pi(x) \equiv \pi$. We omit
discussion of the abstract framework underlying model-based forests and instead discuss the
important linear, generalized linear (Seibold et al. 2018), and transformation models
(Korepanova et al. 2020) in detail.

2.1. Linear model

For a continuous outcome $Y \in \mathbb{R}$ with symmetric error distribution, a model-based forest
might be defined based on the model

$$(Y \mid X = x, W = w) = \mu(x) + \tau(x) w + \phi Z \tag{1}$$

where the residuals are given by the error term $\phi Z$ with $E(Z \mid X, W) = 0$ and standard
deviation $\phi > 0$ (Dandl et al. 2022). We are mainly interested in estimating $\tau(x)$, the
treatment effect that depends on predictive variables in $x$. With model-based forests, however,
we also obtain an estimated value for the prognostic effect $\mu(x)$, which depends on prognostic
variables in $x$. A variable might be predictive and prognostic at the same time. We refer to
these situations as "overlays".

Because we assume in this section that $\pi(x) \equiv \pi$ applies, $W \perp\!\!\!\perp X$ holds.
Consequently, $\tau(x)$ can be interpreted as a CATE

$$\tau(x) = \mathrm{CATE}(x) = E(Y(1) - Y(0) \mid X = x) \tag{2}$$

on the absolute scale. To estimate $(\mu(x), \tau(x))^\top$, the $L_2$ loss

$$\ell(\mu(x), \tau(x)) = \tfrac{1}{2} (Y - \mu(x) - \tau(x) w)^2 \tag{3}$$

is minimized w.r.t. $\mu$ and $\tau$ using an ensemble of trees. Inspired by recursive partitioning
techniques (Hothorn, Hornik, and Zeileis 2006; Zeileis, Hothorn, and Hornik 2008), split
variable and split point selection are separated. The split variable is the variable that has
the lowest p-value for the bivariate permutation tests for the $H_0$-hypothesis that $\mu$ and
$\tau$ are constant and independent of any split variable. The cut-point is the point of the chosen
split variable at which the score functions

$$s(\hat\mu, \hat\tau) := (Y - \hat\mu - \hat\tau w)(1, w)^\top$$

in the two resultant subgroups differ the most; details are available in Appendix 2 of Seibold
et al. (2018).
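The connection between the $L_2$ loss (3) and the score function can be spelled out in one line. The derivation below is a sketch that assumes the score is taken as the negative gradient of the loss with respect to $(\mu, \tau)$, evaluated at the node-level estimates:

```latex
\begin{aligned}
\frac{\partial \ell(\mu, \tau)}{\partial \mu} &= -(Y - \mu - \tau w), \qquad
\frac{\partial \ell(\mu, \tau)}{\partial \tau} = -(Y - \mu - \tau w)\, w, \\
s(\hat\mu, \hat\tau) &= -\left.\frac{\partial \ell(\mu, \tau)}{\partial (\mu, \tau)^\top}\right|_{\hat\mu, \hat\tau}
  = (Y - \hat\mu - \hat\tau w)\,(1, w)^\top .
\end{aligned}
```

The first component is the ordinary residual and reacts to changes in the prognostic effect; the second component is the residual multiplied by the treatment indicator and reacts to changes in the treatment effect.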

Once $B \in \mathbb{N}$ trees have been fitted to subsamples of the training data, predictions for
the treatment effect for a new observation $x$ are obtained via local maximum likelihood aggregation
(Hothorn, Lausen, Benner, and Radespiel-Tröger 2004; Meinshausen 2006; Lin and Jeon 2006;
Athey et al. 2019; Hothorn and Zeileis 2021b). First, for the $i$-th training sample, the
frequency $\alpha_i(x)$ with which it falls in the same leaf as $x$ over all $B$ trees is measured.
The obtained weighting vector $(\alpha_1(x), \ldots, \alpha_n(x))$ is used as an input for minimizing

$$(\hat\mu(x), \hat\tau(x))^\top = \operatorname*{arg\,min}_{\mu, \tau} \sum_{i=1}^{n} \alpha_i(x)\, \ell_i(\mu, \tau) \tag{4}$$

where $\ell_i$ denotes the loss for the $i$-th sample. Model-based forests easily allow adaptations
if HTEs are to be estimated for an outcome variable $Y$ that is not well represented by
equation (1). In this case, model-based forests can build on generalized linear models or
transformation models in the recursive partitioning framework (Zeileis et al. 2008). As detailed in
the following sections, the loss function $\ell$ in equation (3) changes from the squared error to
the negative (partial) log-likelihood of some appropriate model.
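The aggregation step in equation (4) can be sketched for the squared error loss (3) as a weighted least-squares problem. The function names `forest_weights` and `local_linear_fit`, the leaf-id matrix, and the toy data are hypothetical and serve only as illustration; this is not the implementation used by the authors.

```python
import numpy as np

def forest_weights(train_leaves, new_leaves):
    """Count, per training sample, the trees in which it shares a leaf with x.

    train_leaves: (n, B) integer array, leaf id of each training sample per tree.
    new_leaves:   (B,) integer array, leaf id of the new observation x per tree.
    Counts and relative frequencies differ only by the factor 1/B, which does
    not change the minimizer of the weighted loss.
    """
    return (train_leaves == new_leaves[None, :]).sum(axis=1).astype(float)

def local_linear_fit(y, w, alpha):
    """Minimize sum_i alpha_i * 0.5 * (y_i - mu - tau * w_i)^2 over (mu, tau)."""
    X = np.column_stack([np.ones(len(w)), w])
    sw = np.sqrt(alpha)  # weighted least squares via row scaling
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1]

# Toy data (hypothetical): 6 training samples, B = 3 trees.
train_leaves = np.array([[1, 1, 2],
                         [1, 2, 2],
                         [2, 2, 1],
                         [1, 1, 1],
                         [2, 1, 2],
                         [2, 2, 2]])
new_leaves = np.array([1, 1, 2])
w = np.array([0, 1, 0, 1, 0, 1])
y = 1.0 + 2.0 * w  # noise-free outcome with mu = 1 and tau = 2

alpha = forest_weights(train_leaves, new_leaves)
mu_hat, tau_hat = local_linear_fit(y, w, alpha)
print(alpha, mu_hat, tau_hat)
```

For other base models, the weighted least-squares step would be replaced by a weighted fit of the corresponding negative log-likelihood.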

2.2. Generalized linear models

When the conditional outcome distribution is better described through a generalized linear
model

$$(Y \mid X = x, W = w) \sim \mathrm{ExpFam}(\theta(\mu(x) + \tau(x) w), \phi)$$

with parameter $\theta$ depending on the additive function $\mu(x) + \tau(x) w$, the conditional mean

$$g(E(Y \mid X = x, W = w)) = \mu(x) + \tau(x) w =: \eta_w(x) \tag{5}$$

is linear on the scale of a link function $g$. Thus, the interpretation of $\tau(x)$ as CATE (2)
generally no longer holds. Instead, the predictive effect is understood as the difference in
natural parameters (DINA; Gao and Hastie 2022)

$$\tau(x) = \mathrm{DINA}(x) = \eta_1(x) - \eta_0(x). \tag{6}$$

In contrast to the linear model case, HTEs $\tau(x)$ are now defined on relative scales, such as
odds ratios in binary logistic regression models or multiplicative mean effects in a Poisson or
Gaussian model with a log-link. The negative log-likelihood contribution of some observation
$(Y, x, w)$ is

$$\ell(\mu, \tau, \phi) = -\log(f(Y \mid \theta(\mu(x) + \tau(x) w), \phi))$$

with $f$ as the conditional density of an exponential family distribution

$$f(Y \mid \theta(\mu(x) + \tau(x) w), \phi).$$
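To make the difference between the DINA in equation (6) and the CATE in equation (2) concrete, the following sketch evaluates both for a binary logistic model. The grid of prognostic effects and the constant value tau(x) = 0.8 are hypothetical numbers chosen for illustration.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values: prognostic effect mu(x) at three covariate values,
# combined with a constant predictive effect tau(x) = 0.8 on the logit scale.
mu = np.array([-2.0, 0.0, 2.0])
tau = np.full_like(mu, 0.8)

eta0 = mu        # linear predictor under control, eta_0(x)
eta1 = mu + tau  # linear predictor under treatment, eta_1(x)

dina = eta1 - eta0                # log-odds ratio, as in equation (6)
cate = expit(eta1) - expit(eta0)  # risk difference, as in equation (2)

print(dina, cate)
```

Although the DINA is constant in this example, the risk difference varies with the prognostic effect, which is why the two quantities should not be interpreted interchangeably outside the linear model.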

Model-based trees and forests (Zeileis et al. 2008; Seibold et al. 2016, 2018) jointly estimate the
prognostic effect $\mu(x)$ and the predictive effect $\tau(x)$. The procedure simultaneously minimizes
the negative log-likelihood with respect to $\mu(x)$ and $\tau(x)$. In each node of the model-based
forest, $\mu$, $\tau$, and potentially $\phi$ are estimated by minimizing

$$\ell(\mu, \tau, \phi) = -\log(f(Y \mid \theta(\mu + \tau w), \phi)) \tag{7}$$

and regressing the bivariate gradient

$$\left.\frac{\partial \ell(\mu, \tau, \phi)}{\partial (\mu, \tau)}\right|_{\hat\mu, \hat\tau, \hat\phi}$$

on $x$. This means that one is not explicitly looking for changes in the scale parameter $\phi$, but
this could be implemented by looking at the three-variate gradient

$$\left.\frac{\partial \ell(\mu, \tau, \phi)}{\partial (\mu, \tau, \phi)}\right|_{\hat\mu, \hat\tau, \hat\phi},$$

for example, in a heteroscedastic normal linear model

$$(Y \mid X = x, W = w) = \mu(x) + \tau(x) w + \phi(x) Z.$$

After the tree fitting phase, an HTE is estimated with equation (4) with $\ell(\mu, \tau, \phi)$ of
equation (7) as the corresponding loss function.
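The node-level computation can be sketched for a binary logistic model, where no scale parameter is involved. The code below is a minimal illustration under assumed names (`fit_logistic_node`, `gradient_contributions`) and simulated data; it only produces the per-observation gradient contributions and does not reproduce the permutation tests used for split selection.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_node(y, w, n_iter=25):
    """Newton-Raphson for logit P(Y = 1 | w) = mu + tau * w within one node."""
    X = np.column_stack([np.ones(len(w)), w])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (p - y)                       # gradient of the negative log-likelihood
        hess = X.T @ (X * (p * (1 - p))[:, None])  # Hessian of the negative log-likelihood
        beta = beta - np.linalg.solve(hess, grad)
    return beta

def gradient_contributions(y, w, beta):
    """Per-observation gradient of the loss w.r.t. (mu, tau), evaluated at beta."""
    X = np.column_stack([np.ones(len(w)), w])
    return (expit(X @ beta) - y)[:, None] * X

# Simulated node data (hypothetical): the outcome depends on a covariate x
# that the node-level model ignores.
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(size=n)
w = rng.integers(0, 2, size=n)
y = rng.binomial(1, expit(-0.5 + 1.0 * w + 1.5 * (x > 0.5)))

beta_hat = fit_logistic_node(y, w)
G = gradient_contributions(y, w, beta_hat)  # (n, 2) matrix to be related to x
```

At the fitted values the columns of `G` sum to zero over the node; systematic patterns of `G` along a covariate indicate that $\mu$ or $\tau$ may not be constant in that covariate.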

Thus, model-based forests can be directly applied to estimate HTEs on relative scales for
binary outcomes (binary logistic or probit regression, for example), counts (Poisson or
quasi-Poisson regression), or continuous outcomes where a multiplicative effect is of interest
(normal model with log-link).

2.3. Transformation models

More complex responses like ordered categorical or time-to-event outcomes are not covered

by generalized linear models but can be analysed using transformation models; corresponding
