Sensitivity Analysis for Marginal Structural Models
Matteo Bonvini, Edward H. Kennedy, Valérie Ventura and Larry Wasserman
Carnegie Mellon University, Pittsburgh, USA.
Summary. We introduce several methods for assessing sensitivity to unmeasured confounding in marginal structural models; importantly, we allow treatments to be discrete or continuous, static or time-varying. We consider three sensitivity models: a propensity-based model, an outcome-based model, and a subset confounding model, in which only a fraction of the population is subject to unmeasured confounding. In each case we develop efficient estimators and confidence intervals for bounds on the causal parameters.
Keywords: Causal inference, sensitivity analysis, marginal structural models
1. Introduction
Marginal structural models (MSMs) [Robins, 1998, Robins et al., 2000, Robins, 2000] are a class of semiparametric models commonly used for causal inference. As is typical in causal inference, the parameters of the model are identified only under an assumption of no unmeasured confounding. It is therefore important to quantify how sensitive the inferences are to this assumption. Most existing sensitivity analysis methods deal with binary point treatments. In contrast, in this paper we develop tools for assessing sensitivity for MSMs with both continuous (non-binary) and time-varying treatments.
For simplicity, consider the static treatment setting first. Extensions to time-varying treatments are described in Section 6. Suppose we have $n$ iid observations $(Z_1, \ldots, Z_n)$, with $Z_i = (X_i, A_i, Y_i)$, from a distribution $P$, where $Y \in \mathbb{R}$ is the outcome of interest, $A \in \mathbb{R}$ is a treatment (or exposure) and $X \in \mathbb{R}^d$ is a vector of confounding variables. Define the collection of counterfactual random variables (also called potential outcomes) $\{Y(a) : a \in \mathbb{R}\}$, where $Y(a)$ denotes the value that $Y$ would have if $A$ were set to $a$. The usual assumptions in causal inference are:
(A1) No interference: if $A = a$ then $Y = Y(a)$, meaning that a subject's potential outcomes only depend on their own treatment.
(A2) Overlap: $\pi(a \mid x) > 0$ for all $x$ and $a$, where $\pi(a \mid x)$ is the density of $A$ given $X = x$ (the propensity score). Overlap guarantees that all subjects have some chance of receiving each treatment level.
(A3) No unmeasured confounding: the counterfactuals $\{Y(a) : a \in \mathbb{R}\}$ are independent of $A$ given the observed covariates $X$. This assumption means that the treatment is as good as randomized within levels of the measured covariates; in other words, there are no unmeasured variables $U$ that affect both $A$ and $Y$.
arXiv:2210.04681v2 [stat.ME] 11 Oct 2022
Under these assumptions, the causal mean $\mathbb{E}\{Y(a)\}$ is identified and equal to
$$\psi(a) \equiv \int \mu(x, a)\, dP(x), \qquad (1)$$
where $\mu(x, a) = \mathbb{E}[Y \mid X = x, A = a]$ is the outcome regression (causal parameters other than $\mathbb{E}\{Y(a)\}$, e.g., cumulative distribution functions, are identified similarly). Equation (1) is a special case of the g-formula [Robins, 1986].
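As a concrete illustration of (1), a plug-in estimate replaces the integral over $dP(x)$ with an average over the observed covariates, $\hat\psi(a) = n^{-1}\sum_{i=1}^n \hat\mu(X_i, a)$. The Python sketch below assumes an outcome regression is available as a callable `mu_hat(X, a)` (a hypothetical name); for the toy check it uses a known regression rather than a fitted one, so it ignores the error from estimating $\mu$.

```python
import numpy as np


def g_formula_plugin(mu_hat, X, a):
    """Plug-in estimate of psi(a), the integral of mu(x, a) dP(x).

    mu_hat : callable (X, a) -> array of E[Y | X = x_i, A = a] values
             (hypothetical, user-supplied outcome regression)
    X      : (n, d) array of observed covariates
    a      : scalar treatment level
    The integral over dP(x) is replaced by the empirical mean over the X_i.
    """
    return float(np.mean(mu_hat(X, a)))


# Toy check with a known regression mu(x, a) = x_1 + 2a and E[X_1] = 1,
# so that psi(0.5) = E[X_1] + 1 = 2.
rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, size=(10_000, 1))
mu = lambda X, a: X[:, 0] + 2.0 * a
psi_half = g_formula_plugin(mu, X, a=0.5)  # close to 2
```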
A marginal structural model (MSM) is a semiparametric model assuming $\psi(a) = g(a; \beta)$ [Robins, 1998, Robins et al., 2000, Robins, 2000]. The MSM provides an interpretable model for the treatment effect, and $\beta$ can be estimated using simple estimating equations. The model is semiparametric in the sense that it leaves the data-generating distribution unspecified except for the restriction that $\int \mu(x, a)\, dP(x) = g(a; \beta)$. If $g$ is misspecified, one can regard $g(a; \beta)$ as an approximation to $\psi(a)$, in which case one estimates the value $\beta$ that minimizes $\int \{\psi(a) - g(a; \beta)\}^2 \omega(a)\, da$, where $\omega$ is a user-provided weight function [Neugebauer and van der Laan, 2007].
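When $g$ is misspecified, the projection target $\arg\min_\beta \int \{\psi(a) - g(a;\beta)\}^2 \omega(a)\, da$ can be computed numerically once $\psi$ is known. The sketch below assumes a linear working model $g(a; \beta) = b(a)^T\beta$ and approximates the integral by a Riemann sum on an equally spaced grid; `msm_projection` and its arguments are illustrative names. In the example, the weighted $L_2$ projection of $\psi(a) = a^2$ onto $\{1, a\}$ on $[0, 1]$ with $\omega \equiv 1$ is $-1/6 + a$.

```python
import numpy as np


def msm_projection(psi, basis, omega, grid):
    """Approximate argmin_beta of the integral of (psi(a) - b(a)^T beta)^2 omega(a) da.

    Assumes a linear working MSM g(a; beta) = b(a)^T beta and an equally
    spaced grid; the integral is replaced by a Riemann sum.
    """
    B = np.column_stack([b(grid) for b in basis])  # (m, k) basis matrix
    wts = omega(grid) * (grid[1] - grid[0])        # quadrature weights
    BtW = B.T * wts                                # B^T diag(wts) via broadcasting
    # Normal equations of the weighted least-squares problem.
    return np.linalg.solve(BtW @ B, BtW @ psi(grid))


grid = np.linspace(0.0, 1.0, 2001)
basis = [lambda a: np.ones_like(a), lambda a: a]
uniform = lambda a: np.ones_like(a)
beta_sq = msm_projection(lambda a: a ** 2, basis, uniform, grid)  # about (-1/6, 1)
```

If $\psi$ happens to lie in the working model (e.g. $\psi(a) = 1 + 2a$), the projection recovers its coefficients exactly, as it should.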
In practice, there are often unmeasured confounders $U$, so that assumption (A3) fails. This is especially true for observational studies where treatment is not under investigators' control, but it can also occur in experiments in the presence of non-compliance. In these cases, $\mathbb{E}\{Y(a)\}$ is no longer identified. We can still estimate the functional $\psi(a)$ in (1), but we no longer have $\mathbb{E}\{Y(a)\} = \psi(a)$. Sensitivity analysis methods aim to assess how much $\mathbb{E}\{Y(a)\}$ and the MSM parameter $\beta$ will change when such unmeasured confounders $U$ exist. In this paper, we will derive bounds for $\mathbb{E}\{Y(a)\} \equiv g(a; \beta)$, as well as for $\beta$, under varying amounts of unmeasured confounding.
We consider several sensitivity models for unmeasured confounding: a propensity-based model, an outcome-based model, and a subset confounding model, in which only a fraction of the population is subject to unmeasured confounding.
1.1. Related Work
Sensitivity analysis for causal inference began with Cornfield et al. [1959]. Theory and methods for
sensitivity analysis were greatly expanded by Rosenbaum [1995]. Recently, there has been a flurry
of interest in sensitivity analysis including Chernozhukov et al. [2021], Kallus et al. [2019], Zhao
et al. [2017], Yadlowsky et al. [2018], Scharfstein et al. [2021], among others. We refer to Section
2 of Scharfstein et al. [2021] for a review. Most work deals with binary, static treatments.
The closest work to ours is Brumback et al. [2004], who study sensitivity for MSMs with binary
treatments using parametric models for the sensitivity analysis. We instead consider nonparametric
sensitivity models, for continuous rather than binary treatments. While we were completing this paper, Dorn and Guo [2021] appeared on arXiv; they independently derived bounds on treatment effects for nonparametric causal models that are similar to our bounds in Section 4.1, Lemma 2. Here we treat MSMs rather than nonparametric causal models, with Lemma 2 being an intermediate step to our results.
1.2. Outline
We first treat the static treatment setting. In Section 2 we review MSMs. In Section 3 we introduce our three sensitivity analysis models. We find bounds for the MSM $g(a; \beta)$ and for its parameter $\beta$ under propensity sensitivity in Section 4, under outcome sensitivity in Section 5, and under subset sensitivity in Appendix A.2. Then, in Section 6, we extend our methods to the time-varying treatment setting. We illustrate our methods on simulated data in Appendix A.1 and on observational data in Section 7. Section 8 contains concluding remarks. All proofs can be found in the Appendix.
1.3. Notation
We use the notation $\mathbb{P}[f(Z)] = \int f(z)\, dP(z)$ and $\mathbb{U}[f(Z_1, Z_2)] = \int f(z_1, z_2)\, dP(z_1)\, dP(z_2)$ to denote expectations of a fixed function, and $\mathbb{P}_n[f(Z)] = n^{-1} \sum_{i=1}^n f(Z_i)$ and $\mathbb{U}_n[f(Z_1, Z_2)] = \{n(n-1)\}^{-1} \sum_{1 \le i \ne j \le n} f(Z_i, Z_j)$ to denote their sample counterparts, where $\mathbb{U}_n$ is the usual U-statistic measure. We also let $\|f\|^2 = \int f^2(z)\, dP(z)$ denote the $L_2(P)$ norm of $f$ and $\|f\|_\infty = \sup_z |f(z)|$ denote the $L_\infty$ or sup-norm of $f$. For $\beta \in \mathbb{R}^k$ we let $\|\beta\|$ denote the Euclidean norm. For $f(z_1, z_2)$ we let $S_2[f] = \{f(z_1, z_2) + f(z_2, z_1)\}/2$ be the symmetrizing function. Then $\mathbb{U}_n[f(Z_1, Z_2)] = \mathbb{U}_n[S_2[f(Z_1, Z_2)]]$.
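The sample U-statistic measure $\mathbb{U}_n$ and the identity $\mathbb{U}_n[f] = \mathbb{U}_n[S_2[f]]$ can be checked numerically. This NumPy sketch assumes the kernel `f` is vectorized over broadcast arrays of shape `(..., p)`; it forms the full $n \times n$ kernel matrix, so it is only meant for modest $n$. The identity holds because the sum over ordered pairs $i \ne j$ contains both $f(Z_i, Z_j)$ and $f(Z_j, Z_i)$.

```python
import numpy as np


def u_stat(f, Z):
    """U_n[f] = {n(n-1)}^{-1} * sum over i != j of f(Z_i, Z_j)."""
    n = Z.shape[0]
    F = f(Z[:, None, :], Z[None, :, :])  # F[i, j] = f(Z_i, Z_j), shape (n, n)
    return (F.sum() - np.trace(F)) / (n * (n - 1))  # drop the diagonal i == j


def symmetrize(f):
    """S_2[f](z1, z2) = {f(z1, z2) + f(z2, z1)} / 2."""
    return lambda z1, z2: 0.5 * (f(z1, z2) + f(z2, z1))


# An asymmetric kernel: first coordinate of z1 times second coordinate of z2.
f = lambda z1, z2: z1[..., 0] * z2[..., 1]
Z = np.random.default_rng(1).normal(size=(200, 2))
u_raw, u_sym = u_stat(f, Z), u_stat(symmetrize(f), Z)  # equal up to rounding
```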
1.4. Some Inferential Issues
Here we briefly discuss three issues that commonly arise in this paper when constructing confidence
intervals.
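The first is that we often have to estimate quantities of the form $\nu = \int\!\!\int f(x, a)\, \pi(a)\, da\, dP(x)$, where $\pi(a)$ is the marginal density of $A$. This is not a usual expected value, since the integral is with respect to a product of marginals, $\pi(a)\, da\, dP(x)$, rather than the joint measure $P(x, a)$. Then $\nu$ can be written as
$$\mathbb{U}[f(Z_1, Z_2)] \equiv \int\!\!\int \tfrac{1}{2}\left[f(x_1, a_2) + f(x_2, a_1)\right] dP(x_1, a_1)\, dP(x_2, a_2) = \int\!\!\int g(z_1, z_2)\, dP(z_1)\, dP(z_2),$$
where $Z_1 = (X_1, A_1, Y_1)$ and $Z_2 = (X_2, A_2, Y_2)$ are two independent draws and $g(z_1, z_2) = S_2[f] \equiv \{f(x_1, a_2) + f(x_2, a_1)\}/2$. Under certain conditions, the limiting distribution of $\sqrt{n}\{\mathbb{U}_n[\hat f(Z_1, Z_2)] - \mathbb{U}[f(Z_1, Z_2)]\}$, where $\hat f$ is an estimate of $f$, is the same as that of $\sqrt{n}(\mathbb{U}_n - \mathbb{U})[f(Z_1, Z_2)]$. More specifically, let $\alpha \in \mathbb{R}^k$, where $k$ is the dimension of $f$. By Theorem 12.3 in Van der Vaart [2000],
$$\sqrt{n}(\mathbb{U}_n - \mathbb{U})[\alpha^T f(Z_1, Z_2)] \rightsquigarrow N(0, 4\sigma^2),$$
where $\sigma^2 = \frac{1}{4}\alpha^T \Sigma \alpha$ and $\Sigma = \mathrm{var}\left\{2\int S_2[f(Z_1, z_2)]\, dP(z_2)\right\}$. Therefore, by the Cramér-Wold device, $\sqrt{n}(\mathbb{U}_n - \mathbb{U})[f(Z_1, Z_2)] \rightsquigarrow N(0, \Sigma)$. Thus, $\sqrt{n}(\mathbb{U}_n - \mathbb{U})[S_2[f(Z_1, Z_2)]]$ has variance equal to the variance of the influence function of $\nu = \int\!\!\int f(x, a)\, \pi(a)\, da\, dP(x)$, and so it is efficient.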
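To make the product-of-marginals integral concrete: the U-statistic with kernel $S_2[f]$ pairs the covariates of one observation with the treatment of another. The sketch below uses hypothetical names, scalar $X$ and $A$, and a known $f$ rather than an estimate $\hat f$, and compares the U-statistic with the naive average $\mathbb{P}_n[f(X, A)]$, which targets the joint measure instead. With $f(x, a) = xa$ we have $\nu = \mathbb{E}[X]\,\mathbb{E}[A]$, which differs from $\mathbb{E}[XA]$ when $X$ and $A$ are correlated.

```python
import numpy as np


def nu_hat(f, X, A):
    """U-statistic estimate of nu = integral of f(x, a) pi(a) da dP(x).

    Kernel S_2 f(z1, z2) = [f(x1, a2) + f(x2, a1)] / 2, averaged over i != j.
    """
    n = len(A)
    F = f(X[:, None], A[None, :])  # F[i, j] = f(X_i, A_j)
    S = 0.5 * (F + F.T)            # S[i, j] = [f(X_i, A_j) + f(X_j, A_i)] / 2
    return (S.sum() - np.trace(S)) / (n * (n - 1))


# Toy data: X ~ N(1, 1) and A = X + N(0, 1), so E[X] = E[A] = 1 but E[XA] = 2.
rng = np.random.default_rng(2)
X = rng.normal(loc=1.0, size=2000)
A = X + rng.normal(size=2000)
f = lambda x, a: x * a
nu = nu_hat(f, X, A)      # targets E[X] E[A] = 1
naive = np.mean(f(X, A))  # targets E[XA] = E[X^2] = 2
```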
The second issue is that calculating the variances of these estimators can be cumbersome. Instead, we construct confidence intervals using the HulC [Kuchibhotla et al., 2021], which avoids estimating variances. For a $(1-\alpha)$ confidence interval, the dataset is randomly split into $B = \lceil \log(2/\alpha)/\log 2 \rceil$ subsamples ($B = 6$ when $\alpha = 5\%$) and the estimators are computed in each subsample. Then, the minimum (maximum) of the $B$ estimates is returned as the lower (upper) end of the confidence interval.
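A minimal sketch of the basic HulC recipe described above. It assumes that the estimator is (approximately) median unbiased in each subsample, which is the condition under which the min/max interval has miscoverage at most $2^{1-B} \le \alpha$. The function names are hypothetical, and `estimator` is any callable applied to a NumPy array of observations.

```python
import math

import numpy as np


def hulc_num_splits(alpha):
    """Smallest B with 2^(1 - B) <= alpha, i.e. B = ceil(log2(2 / alpha))."""
    return math.ceil(math.log2(2.0 / alpha))


def hulc_interval(estimator, data, alpha=0.05, seed=None):
    """Basic HulC: min and max of the estimator over B random disjoint subsamples."""
    rng = np.random.default_rng(seed)
    B = hulc_num_splits(alpha)
    parts = np.array_split(rng.permutation(len(data)), B)
    estimates = [estimator(data[idx]) for idx in parts]
    return min(estimates), max(estimates)


y = np.random.default_rng(3).normal(loc=0.5, size=600)
lo, hi = hulc_interval(np.mean, y, alpha=0.05, seed=4)  # B = 6 subsamples of 100
```

No variance is estimated anywhere; the width of the interval comes entirely from the spread of the $B$ subsample estimates.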
The third issue is that many of our estimators depend on nuisance functions such as the outcome model $\mu(x, a)$ and the conditional density $\pi(a \mid x)$. To avoid imposing restrictions on the complexity of the nuisance function classes, we analyze estimators based on cross-fitting. That is, unless otherwise stated, the nuisance functions are assumed to be estimated from a different sample than the one used to compute the estimator. Such a construction can always be achieved by splitting the sample into $k$ folds, using all but one fold to train the nuisance functions and the remaining fold to compute the estimator. The roles of the folds are then swapped, yielding $k$ estimates that are averaged to obtain a single estimate of the parameter. For simplicity, we will use $k = 2$, but our analysis can easily be extended to the case where multiple splits are performed.
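The $k$-fold cross-fitting scheme can be sketched generically. Here `fit` and `evaluate` are hypothetical user-supplied callables: `fit` trains the nuisance functions (for instance $\hat\mu$ and $\hat\pi$) on the training folds, and `evaluate` computes the estimator on the held-out fold. The toy check uses a least-squares line as the "nuisance" purely to exercise the fold logic.

```python
import numpy as np


def cross_fit(fit, evaluate, data, k=2, seed=0):
    """Average of k fold estimates, each using nuisances trained on the other folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(data)), k)
    estimates = []
    for j in range(k):
        train = np.concatenate([folds[i] for i in range(k) if i != j])
        nuisance = fit(data[train])  # nuisances never see fold j
        estimates.append(evaluate(nuisance, data[folds[j]]))
    return float(np.mean(estimates))


# Toy check: rows are (x, y) with y = 2x + 1 exactly. The nuisance is a line
# fitted on the training fold; the estimator is the mean fitted value on the
# held-out fold, which targets E[Y] (here 2 * 4.5 + 1 = 10).
x = np.arange(10.0)
data = np.column_stack([x, 2 * x + 1])
fit = lambda d: np.polyfit(d[:, 0], d[:, 1], deg=1)
evaluate = lambda coef, d: np.mean(np.polyval(coef, d[:, 0]))
est = cross_fit(fit, evaluate, data, k=2)
```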
2. Marginal Structural Models
In this section we review basic terminology and notation for marginal structural models. We focus for now on studies with one time point; we deal with time-varying cases in Section 6. More detailed reviews can be found in Robins and Hernán [2009] and Hernán and Robins [2010]. Let
$$\mathbb{E}\{Y(a)\} \equiv \psi(a) = g(a; \beta), \quad \beta \in \mathbb{R}^k, \qquad (2)$$
be a model for the expected outcome under treatment regime $A = a$. An example is the linear model $g(a; \beta) = b^T(a)\beta$ for some specified vector of basis functions $b(a) = [b_1(a), \ldots, b_k(a)]$. It can be shown that $\beta$ in (2) satisfies the $k$-dimensional system of equations
$$\mathbb{E}[h(A)\, w(A, X)\{Y - g(A; \beta)\}] = 0 \qquad (3)$$
for any vector of functions $h(a) = [h_1(a), \ldots, h_k(a)]$, where $w(a, x)$ can be taken to be either $1/\pi(a \mid x)$ or $\pi(a)/\pi(a \mid x)$, and $\pi(a)$ is the marginal density of the treatment $A$. The latter weights are called stabilized weights and can lead to less variable estimators of $\beta$. We will use them throughout. The parameter $\beta$ can be estimated by solving the empirical analog of (3), leading to the estimating equations
$$\mathbb{P}_n[h(A)\, \hat w(A, X)\{Y - g(A; \beta)\}] = 0, \qquad (4)$$
where $\hat w(a, x) = \hat\pi(a)/\hat\pi(a \mid x)$, and $\hat\pi(a \mid x)$ and $\hat\pi(a)$ are estimates of $\pi(a \mid x)$ and $\pi(a)$. Under regularity conditions, including correct specification of the model for $\pi(a \mid x)$, confidence intervals based on $\sqrt{n}(\hat\beta - \beta) \rightsquigarrow N(0, \sigma^2)$, where $\sigma^2 = M^{-1}\, \mathrm{var}[h(A)\, w(A, X)\{Y - g(A; \beta)\}]\, (M^{-1})^T$ and $M = \mathbb{E}\{h(A)\, \nabla_\beta g(A; \beta)^T\}$, will be conservative.
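A sketch of the stabilized weights $\hat w(a, x) = \hat\pi(a)/\hat\pi(a \mid x)$. The text leaves the density estimators unspecified; purely for illustration we assume Gaussian working models, $A \mid X \sim N(\theta_0 + X^T\theta, s^2)$ fitted by least squares and $A \sim N(m, s_0^2)$ fitted by sample moments. Because $\mathbb{E}\{\pi(A)/\pi(A \mid X)\} = \int\!\!\int \pi(a)\, da\, dP(x) = 1$, the estimated weights should average to roughly one, which gives a quick sanity check.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def stabilized_weights(A, X):
    """w_hat(A_i, X_i) = pi_hat(A_i) / pi_hat(A_i | X_i).

    Assumption (illustration only): Gaussian working models for A | X
    (linear mean fitted by OLS) and for the marginal of A.
    """
    Xd = np.column_stack([np.ones(len(A)), X])  # design matrix with intercept
    theta, *_ = np.linalg.lstsq(Xd, A, rcond=None)
    resid = A - Xd @ theta
    cond = normal_pdf(A, Xd @ theta, resid.std())  # pi_hat(A_i | X_i)
    marg = normal_pdf(A, A.mean(), A.std())        # pi_hat(A_i)
    return marg / cond


rng = np.random.default_rng(6)
X = rng.normal(size=5000)
A = 0.5 * X + rng.normal(size=5000)  # treatment depends on the confounder
w = stabilized_weights(A, X)         # mean should be close to 1
```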
Under model (2), every choice of $h(a)$ leads to a $\sqrt{n}$-consistent, asymptotically Normal estimator of $\beta$, though different choices lead to different standard errors. If the MSM is linear, i.e., $g(a; \beta) = b(a)^T\beta$, a common choice is $h(a) = b(a)$. In this case, the solution to the estimating equation (4) can be obtained by weighted regression, $\hat\beta = (B^T W B)^{-1} B^T W Y$, where $B$ is the $n \times k$ matrix with elements $B_{ij} = b_j(A_i)$, $W$ is diagonal with elements $\hat W_i \equiv \hat w(A_i, X_i)$, and $Y = (Y_1, \ldots, Y_n)^T$.
3. Sensitivity Models
We now describe three models for representing unmeasured confounding when treatments are continuous. Each model defines a class of distributions for $(U, X, A, Y)$, where $U$ represents unobserved confounders. Our goal is to find bounds on causal quantities, such as $\beta$ or $g(a; \beta)$, as the distribution varies over these classes.
3.1. Propensity Sensitivity Model
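In the case of binary treatments $A \in \{0, 1\}$, a commonly used sensitivity model [Rosenbaum, 1995] is the odds ratio model
$$\left\{\pi(a \mid x, u) : \frac{1}{\gamma} \le \frac{\pi(1 \mid x, u)}{\pi(0 \mid x, u)}\, \frac{\pi(0 \mid x, \tilde u)}{\pi(1 \mid x, \tilde u)} \le \gamma \ \text{ for all } u, \tilde u, x\right\}$$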
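for $\gamma \ge 1$. When $A$ is continuous, it is arguably more natural to work with density ratios, and so we define
$$\Pi(\gamma) = \left\{\pi(a \mid x, u) : \frac{1}{\gamma} \le \frac{\pi(a \mid x, u)}{\pi(a \mid x)} \le \gamma,\ \int \pi(a \mid x, u)\, da = 1, \ \text{for all } a, x, u\right\}. \qquad (5)$$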
We can think of $\Pi(\gamma)$ as defining a neighborhood around $\pi(a \mid x)$. This is related to the class in Tan [2006], but we consider density ratios rather than odds ratios. Other constraints are possible, such as $\int \pi(a \mid x, u)\, dP(u \mid x) = \pi(a \mid x)$; we leave enforcing these additional constraints, which can yield more precise bounds, for future work.
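Membership in $\Pi(\gamma)$ can be checked numerically for a candidate $\pi(a \mid x, u)$ at a fixed $(x, u)$. The sketch below (hypothetical function name) evaluates both densities on a grid, checks the ratio bounds on that grid, and checks normalization with a trapezoid rule, so it is only an approximate check. As a toy example we take $\pi(a \mid x) = N(0, 1)$ and the tilted density $\pi(a \mid x, u) = \pi(a \mid x)\{1 + \epsilon \tanh(a)\}$, which integrates to one because $\tanh$ is odd and whose density ratio lies in $(1 - \epsilon, 1 + \epsilon)$; with $\gamma = 1.5$ it belongs to $\Pi(\gamma)$ for $\epsilon = 0.3$ but not for $\epsilon = 0.4$.

```python
import numpy as np


def in_density_ratio_class(pi_xu, pi_x, grid, gamma, tol=1e-3):
    """Approximate check of the constraints defining Pi(gamma) at one (x, u).

    pi_xu, pi_x : arrays of pi(a|x,u) and pi(a|x) evaluated on `grid`
    """
    ratio = pi_xu / pi_x
    ratio_ok = np.all((ratio >= 1.0 / gamma) & (ratio <= gamma))
    mass = np.sum(0.5 * (pi_xu[1:] + pi_xu[:-1]) * np.diff(grid))  # trapezoid rule
    return bool(ratio_ok and abs(mass - 1.0) < tol)


grid = np.linspace(-8.0, 8.0, 4001)
pi_x = np.exp(-0.5 * grid ** 2) / np.sqrt(2.0 * np.pi)  # N(0, 1) density
tilted = lambda eps: pi_x * (1.0 + eps * np.tanh(grid))
print(in_density_ratio_class(tilted(0.3), pi_x, grid, gamma=1.5))  # True
print(in_density_ratio_class(tilted(0.4), pi_x, grid, gamma=1.5))  # False: ratio dips to about 0.6 < 1/1.5
```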
3.2. Outcome Sensitivity Model
For an outcome-based sensitivity model, we define a neighborhood around $\mu(x, a)$ given by
$$\mathcal{M}(\delta) = \left\{\mu(u, x, a) : |\Delta(a)| \le \delta,\ \Delta(a) = \int [\mu(u, x, a) - \mu(x, a)]\, dP(x, u)\right\},$$
which is the set of unobserved outcome regressions (on measured covariates, treatment, and unmeasured confounders) that differ from the observed regression by at most $\delta$ after averaging over measured and unmeasured covariates. We immediately have the simple nonparametric bound $\mathbb{E}\{\mu(X, a)\} - \delta \le \mathbb{E}\{Y(a)\} \le \mathbb{E}\{\mu(X, a)\} + \delta$. For a given $\Delta(a)$, i.e., when $\Delta$ is treated as a known function, nonparametric bounds can be computed by regressing an estimate of $\Delta(A) + w(A, X)\{Y - \mu(X, A)\} + \int \mu(x, A)\, dP(x)$ on $A$ (see, e.g. Kennedy et al. [2017], Semenova