
Sensitivity Analysis for Marginal Structural Models

Matteo Bonvini, Edward H. Kennedy, Valérie Ventura and Larry Wasserman

Carnegie Mellon University, Pittsburgh, USA.

Summary. We introduce several methods for assessing sensitivity to unmeasured confounding in marginal structural models; importantly we allow treatments to be discrete or continuous, static or time-varying. We consider three sensitivity models: a propensity-based model, an outcome-based model, and a subset confounding model, in which only a fraction of the population is subject to unmeasured confounding. In each case we develop efficient estimators and confidence intervals for bounds on the causal parameters.

Keywords: Causal inference, sensitivity analysis, marginal structural models

1. Introduction

Marginal structural models (MSMs) [Robins, 1998, Robins et al., 2000, Robins, 2000] are a class of semiparametric models commonly used for causal inference. As is typical in causal inference, the parameters of the model are only identified under an assumption of no unmeasured confounding. Thus, it is important to quantify how sensitive the inferences are to this assumption. Most existing sensitivity analysis methods deal with binary point treatments. In contrast, in this paper we develop tools for assessing sensitivity for MSMs with both continuous (non-binary) and time-varying treatments.

For simplicity, consider the static treatment setting first. Extensions to time-varying treatments are described in Section 6. Suppose we have $n$ iid observations $(Z_1, \ldots, Z_n)$, with $Z_i = (X_i, A_i, Y_i)$ from a distribution $P$, where $Y \in \mathbb{R}$ is the outcome of interest, $A \in \mathbb{R}$ is a treatment (or exposure) and $X \in \mathbb{R}^d$ is a vector of confounding variables. Define the collection of counterfactual random variables (also called potential outcomes) $\{Y(a) : a \in \mathbb{R}\}$, where $Y(a)$ denotes the value that $Y$ would have if $A$ were set to $a$. The usual assumptions in causal inference are:

(A1) No interference: if $A = a$ then $Y = Y(a)$, meaning that a subject's potential outcomes only depend on their own treatment.

(A2) Overlap: $\pi(a \mid x) > 0$ for all $x$ and $a$, where $\pi(a \mid x)$ is the density of $A$ given $X = x$ (the propensity score). Overlap guarantees that all subjects have some chance of receiving each treatment level.

(A3) No unmeasured confounding: the counterfactuals $\{Y(a) : a \in \mathbb{R}\}$ are independent of $A$ given the observed covariates $X$. This assumption means that the treatment is as good as randomized within levels of the measured covariates; in other words, there are no unmeasured variables $U$ that affect both $A$ and $Y$.

arXiv:2210.04681v2 [stat.ME] 11 Oct 2022

Under these assumptions, the causal mean $E\{Y(a)\}$ is identified and equal to

$$\psi(a) \equiv \int \mu(x, a)\,dP(x), \tag{1}$$

where $\mu(x, a) = E[Y \mid X = x, A = a]$ is the outcome regression (causal parameters other than $E\{Y(a)\}$, e.g., cumulative distribution functions, are identified similarly). Equation (1) is a special case of the g-formula [Robins, 1986].
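As an illustration of equation (1), the integral over $dP(x)$ can be approximated by averaging a fitted outcome regression over the observed covariates. The following is only a minimal sketch under stated assumptions: the names `g_formula_plugin` and `mu_toy` are hypothetical, the outcome regression is taken as given rather than estimated, and this simple plug-in is not claimed to be the efficient estimator developed in the paper.

```python
import numpy as np

def g_formula_plugin(mu_hat, X, a):
    """Plug-in version of psi(a) = int mu(x, a) dP(x).

    mu_hat: callable (X, a_vec) -> predicted outcomes, assumed already fitted.
    X:      (n, d) array of observed covariates.
    a:      treatment level at which to evaluate the curve.
    """
    X = np.asarray(X, dtype=float)
    a_col = np.full(X.shape[0], a, dtype=float)
    # Replace dP(x) by the empirical distribution of the covariates.
    return float(np.mean(mu_hat(X, a_col)))

# Toy outcome regression (an assumption for illustration only).
def mu_toy(X, a):
    return 1.0 + 2.0 * a + X[:, 0]

X_toy = np.array([[0.0], [1.0], [2.0]])
psi_at_half = g_formula_plugin(mu_toy, X_toy, 0.5)
```

Evaluating `g_formula_plugin` over a grid of treatment levels traces out an estimate of the whole curve $\psi(a)$.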

A marginal structural model (MSM) is a semiparametric model assuming $\psi(a) = g(a; \beta)$ [Robins, 1998, Robins et al., 2000, Robins, 2000]. The MSM provides an interpretable model for the treatment effect and $\beta$ can be estimated using simple estimating equations. The model is semiparametric in the sense that it leaves the data generating distribution unspecified except for the restriction that $\int \mu(x, a)\,dP(x) = g(a; \beta)$. If $g$ is mis-specified, one can regard $g(a; \beta)$ as an approximation to $\psi(a)$, in which case one estimates the value $\beta^*$ that minimizes $\int (\psi(a) - g(a; \beta))^2\,\omega(a)\,da$, where $\omega$ is a user-provided weight function [Neugebauer and van der Laan, 2007].

In practice, there are often unmeasured confounders $U$ so that assumption (A3) fails. This is especially true for observational studies where treatment is not under investigators' control, but it can also occur in experiments in the presence of non-compliance. In these cases, $E\{Y(a)\}$ is no longer identified. We can still estimate the functional $\psi(a)$ in (1) but we no longer have $E\{Y(a)\} = \psi(a)$. Sensitivity analysis methods aim to assess how much $E\{Y(a)\}$ and the MSM parameter $\beta$ will change when such unmeasured confounders $U$ exist. In this paper, we will derive bounds for $E\{Y(a)\} \equiv g(a; \beta)$, as well as for $\beta$, under varying amounts of unmeasured confounding. We consider several sensitivity models for unmeasured confounding: a propensity-based model, an outcome-based model, and a subset confounding model, in which only a fraction of the population is subject to unmeasured confounding.

1.1. Related Work

Sensitivity analysis for causal inference began with Cornfield et al. [1959]. Theory and methods for sensitivity analysis were greatly expanded by Rosenbaum [1995]. Recently, there has been a flurry of interest in sensitivity analysis, including work by Chernozhukov et al. [2021], Kallus et al. [2019], Zhao et al. [2017], Yadlowsky et al. [2018], and Scharfstein et al. [2021], among others. We refer to Section 2 of Scharfstein et al. [2021] for a review. Most work deals with binary, static treatments.

The closest work to ours is Brumback et al. [2004], who study sensitivity for MSMs with binary treatments using parametric models for the sensitivity analysis. We instead consider nonparametric sensitivity models, for continuous rather than binary treatments. While we were completing this paper, Dorn and Guo [2021] appeared on arXiv; they independently derived bounds on treatment effects for nonparametric causal models that are similar to our bounds in Section 4.1, Lemma 2. Here we treat MSMs rather than nonparametric causal models, with Lemma 2 being an intermediate step to our results.

1.2. Outline

We first treat the static treatment setting. In Section 2 we review MSMs. In Section 3 we introduce our three sensitivity analysis models. We find bounds for the MSM $g(a; \beta)$ and for its parameter $\beta$ under propensity sensitivity in Section 4, under outcome sensitivity in Section 5 and under subset sensitivity in Appendix A.2. Then in Section 6, we extend our methods to the time series setting. We illustrate our methods on simulated data in Appendix A.1 and on observational data in Section 7. Section 8 contains concluding remarks. All proofs can be found in the Appendix.

1.3. Notation

We use the notation $P[f(Z)] = \int f(z)\,dP(z)$ and $U[f(Z_1, Z_2)] = \int f(z_1, z_2)\,dP(z_1, z_2)$ to denote expectations of a fixed function, and $P_n[f(Z)] = n^{-1} \sum_{i=1}^{n} f(Z_i)$ and $U_n[f(Z_1, Z_2)] = \{n(n-1)\}^{-1} \sum_{1 \le i \ne j \le n} f(Z_i, Z_j)$ to denote their sample counterparts, where $U_n$ is the usual U-statistic measure. We also let $\|f\|^2 = \int f^2(z)\,dP(z)$ denote the squared $L_2(P)$ norm of $f$ and $\|f\|_\infty = \sup_z |f(z)|$ denote the $L_\infty$ or sup-norm of $f$. For $\beta \in \mathbb{R}^k$ we let $\|\beta\|$ denote the Euclidean norm. For $f(z_1, z_2)$ we let $S_2[f] = \{f(z_1, z_2) + f(z_2, z_1)\}/2$ be the symmetrizing function. Then $U_n[f(Z_1, Z_2)] = U_n[S_2[f(Z_1, Z_2)]]$.
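The U-statistic measure and the symmetrization identity above can be checked numerically. The sketch below is a direct, brute-force implementation over all ordered pairs; the function names `u_statistic` and `symmetrize` and the toy kernel are assumptions introduced for illustration.

```python
from itertools import permutations

def u_statistic(f, z):
    """U_n[f] = {n(n-1)}^{-1} * sum over ordered pairs i != j of f(z_i, z_j)."""
    n = len(z)
    total = sum(f(z[i], z[j]) for i, j in permutations(range(n), 2))
    return total / (n * (n - 1))

def symmetrize(f):
    """S_2[f](z1, z2) = {f(z1, z2) + f(z2, z1)} / 2."""
    return lambda z1, z2: 0.5 * (f(z1, z2) + f(z2, z1))

# Toy asymmetric kernel (hypothetical, for illustration only).
kernel = lambda z1, z2: z1 * z2 ** 2
z_toy = [1.0, 2.0, 3.0]

u_raw = u_statistic(kernel, z_toy)
u_sym = u_statistic(symmetrize(kernel), z_toy)
```

Because the sum runs over every ordered pair, swapping the arguments of the kernel only reorders the terms, which is why `u_raw` and `u_sym` agree. The double loop costs $O(n^2)$ kernel evaluations, so this form is meant for small samples or for checking faster implementations.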

1.4. Some Inferential Issues

Here we briefly discuss three issues that commonly arise in this paper when constructing confidence intervals.

The first is that we often have to estimate quantities of the form $\nu = \int\!\!\int f(x, a)\,\pi(a)\,da\,dP(x)$ where $\pi(a)$ is the marginal density of $A$. This is not a usual expected value since the integral is with respect to a product of marginals, $\pi(a)\,dP(x)$, rather than the joint measure $P(x, a)$. Then $\nu$ can be written as

$$U[f(Z_1, Z_2)] \equiv \int\!\!\int \tfrac{1}{2}\,[f(x_1, a_2) + f(x_2, a_1)]\,dP(x_1, a_1)\,dP(x_2, a_2) = \int\!\!\int g(z_1, z_2)\,dP(z_1)\,dP(z_2)$$

where $Z_1 = (X_1, A_1, Y_1)$ and $Z_2 = (X_2, A_2, Y_2)$ are two independent draws and $g(z_1, z_2) = S_2[f] \equiv (f(x_1, a_2) + f(x_2, a_1))/2$. Under certain conditions, the limiting distribution of $\sqrt{n}\,\{U_n[\hat{f}(Z_1, Z_2)] - U[f(Z_1, Z_2)]\}$, where $\hat{f}$ is an estimate of $f$, is the same as that of $\sqrt{n}\,(U_n - U)[f(Z_1, Z_2)]$. More specifically, let $\alpha \in \mathbb{R}^k$, where $k$ is the dimension of $f$. By Theorem 12.3 in Van der Vaart [2000],

$$\sqrt{n}\,(U_n - U)[\alpha^T f(Z_1, Z_2)] \to N(0, 4\sigma^2),$$

where $\sigma^2 = \tfrac{1}{4}\,\alpha^T \Sigma \alpha$ and $\Sigma = \mathrm{var}\left\{\int S_2[f(Z_1, z_2)]\,dP(z_2)\right\}$. Therefore, by the Cramér-Wold device, $\sqrt{n}\,(U_n - U)[f(Z_1, Z_2)] \rightsquigarrow N(0, \Sigma)$. Thus, $\sqrt{n}\,(U_n - U)[S_2[f(Z_1, Z_2)]]$ has variance equal to the variance of the influence function of $\nu = \int\!\!\int f(x, a)\,\pi(a)\,da\,dP(x)$ and thus it is efficient.

The second issue is that calculating the variances of these estimators can be cumbersome. Instead, we construct confidence intervals using the HulC [Kuchibhotla et al., 2021], which avoids estimating variances. The dataset is randomly split into $B = \lceil \log(2/\alpha)/\log 2 \rceil$ subsamples ($B = 6$ when $\alpha = 5\%$) and the estimators are computed in each subsample. Then, the minimum (maximum) of the $B$ estimates is returned as the lower (upper) end of the confidence interval.
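The splitting procedure just described can be sketched in a few lines. This is an illustrative sketch rather than the reference HulC implementation: the names `hulc_num_splits` and `hulc_interval` are hypothetical, the number of splits is rounded up to an integer (an assumption that reproduces the six splits quoted for a 5% level), and the coverage guarantee additionally relies on the estimator being approximately median-unbiased.

```python
import math
import random

def hulc_num_splits(alpha):
    """Number of subsamples, log(2/alpha)/log(2), rounded up (assumption)."""
    return math.ceil(math.log(2 / alpha) / math.log(2))

def hulc_interval(data, estimator, alpha=0.05, seed=0):
    """Return (min, max) of the estimator over B random disjoint subsamples."""
    B = hulc_num_splits(alpha)
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)          # random split of the dataset
    folds = [[data[i] for i in idx[b::B]] for b in range(B)]
    estimates = [estimator(fold) for fold in folds]
    return min(estimates), max(estimates)

# Toy usage: interval for a mean (data are an assumption for illustration).
sample = [float(i) for i in range(60)]
lower, upper = hulc_interval(sample, lambda xs: sum(xs) / len(xs))
```

No variance estimate appears anywhere in the code, which is the practical appeal of the approach; the price is that each estimate uses only a fraction $1/B$ of the data.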

The third issue is that many of our estimators depend on nuisance functions such as the outcome model $\mu(x, a)$ and the conditional density $\pi(a \mid x)$. To avoid imposing restrictions on the complexity of the nuisance function classes, we analyze estimators based on cross-fitting. That is, unless otherwise stated, the nuisance functions are assumed to be estimated from a different sample than the sample used to compute the estimator. Such a construction can always be achieved by splitting the sample into $k$ folds, using all but one fold for training the nuisance functions and the remaining fold to compute the estimator. Then, the roles of the folds can be swapped, thus yielding $k$ estimates that are averaged to obtain a single estimate of the parameter. For simplicity, we will use $k = 2$, but our analysis can be easily extended to the case where multiple splits are performed.
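The cross-fitting scheme can be written generically. In the sketch below, `fit_nuisance` and `estimate` are hypothetical callables standing in for whatever nuisance learner and estimator are being used; the toy versions at the bottom are assumptions chosen only so the example runs.

```python
import numpy as np

def cross_fit(data, fit_nuisance, estimate, k=2, seed=0):
    """Average of k estimates, each using nuisances trained on the other folds.

    data:         array whose first axis indexes observations.
    fit_nuisance: callable(train_data) -> fitted nuisance object.
    estimate:     callable(eval_data, nuisance) -> scalar estimate.
    """
    n = len(data)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    estimates = []
    for j in range(k):
        # Train on all folds except fold j, evaluate on fold j.
        train_idx = np.concatenate([folds[m] for m in range(k) if m != j])
        nuisance = fit_nuisance(data[train_idx])
        estimates.append(estimate(data[folds[j]], nuisance))
    return float(np.mean(estimates))

# Toy usage: the "nuisance" is a training-fold mean, and the estimator is
# the evaluation-fold mean of the centred data.
toy_data = np.arange(10.0)
toy_result = cross_fit(
    toy_data,
    fit_nuisance=lambda d: d.mean(),
    estimate=lambda d, m: float(np.mean(d - m)),
)
```

With two equal folds the toy estimates are equal and opposite, so `toy_result` is zero up to rounding; the point of the example is the index bookkeeping, which guarantees that no observation is used both to fit a nuisance function and to evaluate the estimator built on it.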

2. Marginal Structural Models

In this section we review basic terminology and notation for marginal structural models. We focus for now on studies with one time point; we deal with time varying cases in Section 6. More detailed reviews can be found in Robins and Hernán [2009] and Hernán and Robins [2010]. Let

$$E\{Y(a)\} \equiv \psi(a) = g(a; \beta), \quad \beta \in \mathbb{R}^k, \tag{2}$$

be a model for the expected outcome under treatment regime $A = a$. An example is the linear model $g(a; \beta) = b^T(a)\beta$ for some specified vector of basis functions $b(a) = [b_1(a), \ldots, b_k(a)]$. It can be shown that $\beta$ in (2) satisfies the $k$-dimensional system of equations

$$E[h(A)\,w(A, X)\{Y - g(A; \beta)\}] = 0 \tag{3}$$

for any vector of functions $h(a) = [h_1(a), \ldots, h_k(a)]$, where $w(a, x)$ can be taken to be either $1/\pi(a \mid x)$ or $\pi(a)/\pi(a \mid x)$, and $\pi(a)$ is the marginal density of the treatment $A$. The latter weights are called stabilized weights and can lead to less variable estimators of $\beta$. We will use them throughout. The parameter $\beta$ can be estimated by solving the empirical analog of (3), leading to the estimating equations

$$P_n[h(A)\,\hat{w}(A, X)\{Y - g(A; \beta)\}] = 0, \tag{4}$$

where $\hat{w}(a, x) = \hat{\pi}(a)/\hat{\pi}(a \mid x)$, and $\hat{\pi}(a \mid x)$ and $\hat{\pi}(a)$ are estimates of $\pi(a \mid x)$ and $\pi(a)$. Under regularity conditions, including the correct specification of $\pi(a \mid x)$, confidence intervals based on $\sqrt{n}\,(\hat{\beta} - \beta) \rightsquigarrow N(0, \sigma^2)$, where $\sigma^2 = M^{-1}\,\mathrm{var}[h(A)\,w(A, X)\{Y - g(A; \beta)\}]\,M^{-1}$ and $M = E\{h(A)\,\nabla_\beta g(A; \beta)^T\}$, will be conservative.

Under model (2), every choice of $h(a)$ leads to a $\sqrt{n}$-consistent, asymptotically Normal estimator of $\beta$, though different choices lead to different standard errors. If the MSM is linear, i.e. $g(a; \beta) = b(a)^T \beta$, a common choice of $h(a)$ is $h(a) = b(a)$. In this case, the solution to the estimating equation (4) can be obtained by weighted regression, $\hat{\beta} = (B^T W B)^{-1} B^T W Y$, where $B$ is the $n \times k$ matrix with elements $B_{ij} = b_j(A_i)$, $W$ is diagonal with elements $\hat{W}_i \equiv \hat{w}(A_i, X_i)$ and $Y = (Y_1, \ldots, Y_n)$.
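The weighted regression formula for the linear MSM translates directly into a few lines of linear algebra. The sketch below assumes the weights have already been estimated (for instance as stabilized weights) and are passed in by the caller; the function name `fit_linear_msm` and the toy data are assumptions for illustration.

```python
import numpy as np

def fit_linear_msm(A, Y, weights, basis):
    """Solve (B^T W B) beta = B^T W Y for a linear MSM g(a; beta) = b(a)^T beta.

    A, Y:    length-n arrays of treatments and outcomes.
    weights: length-n array of estimated weights w_hat(A_i, X_i).
    basis:   list of k callables b_j, each mapping an array of a to an array.
    """
    B = np.column_stack([b(A) for b in basis])   # n x k design matrix
    WB = B * weights[:, None]                    # rows of B scaled by W_i
    # Solving the normal equations avoids forming an explicit inverse.
    return np.linalg.solve(B.T @ WB, WB.T @ Y)

# Toy usage with basis b(a) = [1, a] and an exactly linear outcome.
A_toy = np.array([0.0, 1.0, 2.0, 3.0])
Y_toy = 1.0 + 2.0 * A_toy
w_toy = np.array([0.5, 1.0, 2.0, 1.5])
basis_toy = [lambda a: np.ones_like(a), lambda a: a]
beta_hat = fit_linear_msm(A_toy, Y_toy, w_toy, basis_toy)
```

Since the toy outcome is exactly linear in the basis, any positive weights recover the same coefficients; with real data the weights change the answer, which is precisely the channel through which unmeasured confounding enters the sensitivity analysis.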

3. Sensitivity Models

We now describe three models for representing unmeasured confounding when treatments are continuous. Each model defines a class of distributions for $(U, X, A, Y)$ where $U$ represents unobserved confounders. Our goal is to find bounds on causal quantities, such as $\beta$ or $g(a; \beta)$, as the distribution varies over these classes.

3.1. Propensity Sensitivity Model

In the case of binary treatments $A \in \{0, 1\}$, a commonly used sensitivity model [Rosenbaum, 1995] is the odds ratio model, which consists of all $\pi(a \mid x, u)$ such that

$$\frac{1}{\gamma} \le \frac{\pi(1 \mid x, u)}{\pi(0 \mid x, u)} \cdot \frac{\pi(0 \mid x, \tilde{u})}{\pi(1 \mid x, \tilde{u})} \le \gamma \quad \text{for all } u, \tilde{u}, x$$

for $\gamma \ge 1$. When $A$ is continuous, it is arguably more natural to work with density ratios, and so we define

$$\Pi(\gamma) = \left\{ \pi(a \mid x, u) : \frac{1}{\gamma} \le \frac{\pi(a \mid x, u)}{\pi(a \mid x)} \le \gamma, \ \int \pi(a \mid x, u)\,da = 1, \ \text{for all } a, x, u \right\}. \tag{5}$$

We can think of $\Pi(\gamma)$ as defining a neighborhood around $\pi(a \mid x)$. This is related to the class in Tan [2006] but we consider density ratios rather than odds ratios. There are other constraints possible, such as $\int \pi(a \mid x, u)\,dP(u \mid x) = \pi(a \mid x)$; we leave enforcing these additional constraints, which can yield more precise bounds, for future work.
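To make the two constraints in (5) concrete, the following sketch checks them numerically for a single fixed $(x, u)$ on a grid of treatment values. Everything here is an assumption for illustration: the function names, the grid-based trapezoid rule, the tolerance, and the toy densities on $[0, 1]$.

```python
def trapezoid(y, x):
    """Trapezoid-rule approximation of the integral of y over the grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))

def in_propensity_class(pi_u, pi_obs, grid, gamma, tol=1e-6):
    """Check the density-ratio and normalization constraints of Pi(gamma).

    pi_u:   values of pi(a | x, u) on the grid, for one fixed (x, u).
    pi_obs: values of pi(a | x) on the same grid.
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    ratios_ok = all(1 / gamma <= pu / po <= gamma
                    for pu, po in zip(pi_u, pi_obs))
    normalized = abs(trapezoid(pi_u, grid) - 1.0) <= tol
    return ratios_ok and normalized

# Toy example: observed density uniform on [0, 1], candidate density 0.5 + a.
grid = [i / 100 for i in range(101)]
pi_obs = [1.0] * len(grid)
pi_u = [0.5 + a for a in grid]

inside_wide = in_propensity_class(pi_u, pi_obs, grid, gamma=2.0)
inside_narrow = in_propensity_class(pi_u, pi_obs, grid, gamma=1.2)
```

The candidate density has ratios ranging from 0.5 to 1.5, so it lies in the class for $\gamma = 2$ but not for $\gamma = 1.2$. A check of this kind only verifies membership on the chosen grid and for one $(x, u)$; it does not replace the bounds derived over the whole class.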

3.2. Outcome Sensitivity Model

For an outcome-based sensitivity model, we define a neighborhood around $\mu(x, a)$ given by

$$M(\delta) = \left\{ \mu(u, x, a) : |\Delta(a)| \le \delta, \ \Delta(a) = \int [\mu(u, x, a) - \mu(x, a)]\,dP(x, u) \right\},$$

which is the set of unobserved outcome regressions (on measured covariates, treatment, and unmeasured confounders) such that the unobserved and observed regressions differ by at most $\delta$ after averaging over measured and unmeasured covariates. We immediately have the simple nonparametric bound $E\{\mu(X, a)\} - \delta \le E\{Y(a)\} \le E\{\mu(X, a)\} + \delta$. For a given $\Delta(a)$ that is a known function, nonparametric bounds can be computed by regressing an estimate of $\Delta(A) + w(A, X)\{Y - \mu(X, A)\} + \int \mu(x, A)\,dP(x)$ on $A$ (see, e.g. Kennedy et al. [2017], Semenova
