Data-driven Automated Negative Control Estimation
(DANCE): Search for, Validation of, and Causal
Inference with Negative Controls
Erich Kummerfeld1, Jaewon Lim2, and Xu Shi3
1Institute for Health Informatics, University of Minnesota
2Department of Biostatistics, University of Washington
3Department of Biostatistics, University of Michigan
Abstract
Negative control variables are increasingly used to adjust for unmeasured confound-
ing bias in causal inference using observational data. They are typically identified by
subject matter knowledge and there is currently a severe lack of data-driven methods
to find negative controls. In this paper, we present a statistical test for discovering
negative controls of a special type—disconnected negative controls—that can serve as
surrogates of the unmeasured confounder, and we incorporate that test into the Data-
driven Automated Negative Control Estimation (DANCE) algorithm. DANCE first
uses the new validation test to identify subsets of a set of candidate negative control
variables that satisfy the assumptions of disconnected negative controls. It then applies
a negative control method to each pair of these validated negative control variables,
and aggregates the output to produce an unbiased point estimate and confidence in-
terval for a causal effect in the presence of unmeasured confounding. We (1) prove the
correctness of this validation test, and thus of DANCE; (2) demonstrate via simula-
tion experiments that DANCE outperforms both naive analysis ignoring unmeasured
confounding and a negative control method with randomly selected candidate negative
controls; and (3) demonstrate the effectiveness of DANCE on a challenging real-world
problem.
Keywords: causal discovery; graphical models; negative control; unmeasured confounding;
vanishing tetrad.
arXiv:2210.00528v1 [stat.ME] 2 Oct 2022
1 Introduction
There are many causal questions in science and medicine that cannot be answered with randomized
experiments now or in the foreseeable future. For such questions, our best estimates
must thus rely on observational data instead. The rich field of causal inference has developed
in response to this, providing support for these efforts and developing methods that offer
some level of assurance and confidence for learning causal information from observational
data (Pearl 2009, Rubin 1974). Many causal inference methods assume that there are no
unmeasured common causes of treatment and outcome, but it is generally believed that in
reality unmeasured confounders are widespread. This is a serious limitation to the methods
that make such assumptions. One of the most frequently used approaches to mitigate unmeasured
confounding is the instrumental variable (IV) approach (Angrist & Krueger 1991,
Angrist et al. 1996, Hernán & Robins 2006), which has been studied extensively
(Greenland 2000, Baiocchi et al. 2014, Garabedian et al. 2014, Burgess et al. 2017, Swanson
et al. 2018).
A more recently developed strategy is negative control (NC) methods (Lipsitch et al.
2010, Shi et al. 2020a, Tchetgen et al. 2020). Negative controls are variables associated
with the unmeasured confounders but not causally related to either the treatment or out-
come variables of primary interest. One can detect residual confounding bias leveraging such
known null effects: presence of an association between the negative control and the exposure
or outcome constitutes compelling evidence of residual confounding bias, while the absence
of such association implies no empirical evidence of such bias. NCs have traditionally been
used to rule out non-causal explanations of empirical findings (Rosenbaum 1989, Weiss 2002,
Lipsitch et al. 2010, Glass 2014). Recently, a series of NC methods has been developed
to identify causal effects and correct for unmeasured confounding bias (Miao, Geng & Tch-
etgen Tchetgen 2018, Deaner 2018, Shi et al. 2020b, Singh 2020, Cui et al. 2020, Ying et al.
2021, Kallus et al. 2021, Dukes et al. 2021, Li et al. 2022).
A key challenge in the use of NC methods is that until now, NC variables have had
to be identified laboriously from background knowledge. It also had to be assumed that
the identified variables were genuine NCs, as no validation test existed unless one was willing
to make additional assumptions. Such situations are common in causal inference, e.g., the
assumption of no unmeasured confounding is also untestable. Nevertheless, we will show
that under certain conditions, it is possible to leverage certain subcovariance matrix rank
constraints to validate a particular class of NC variables, referred to as disconnected NCs,
which we formally define in Section 2.1 and which satisfy a specific causal structural model.
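As a rough illustration of the kind of rank constraint involved, the sketch below checks vanishing tetrad differences in simulated data. It is not the validation test proposed in this paper: the linear Gaussian model, the unit coefficients, the sample size, and the helper names `tetrad_differences` and `simulate` are all assumptions made for this example. When four variables are linear children of a single latent U and have no edges among themselves, every 2x2 off-diagonal subcovariance matrix has rank one, so all three tetrad differences are zero in the population; adding a direct edge between two of the variables breaks some of them.

```python
import numpy as np

def tetrad_differences(data):
    """Return the three tetrad differences of a four-column data matrix."""
    c = np.cov(data, rowvar=False)
    return np.array([
        c[0, 1] * c[2, 3] - c[0, 2] * c[1, 3],
        c[0, 2] * c[1, 3] - c[0, 3] * c[1, 2],
        c[0, 1] * c[2, 3] - c[0, 3] * c[1, 2],
    ])

def simulate(n, direct_effect, seed):
    """Four noisy linear children of one latent U; X2 may also depend on X1."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                      # latent common cause
    x1 = u + rng.normal(size=n)
    x2 = u + direct_effect * x1 + rng.normal(size=n)
    x3 = u + rng.normal(size=n)
    x4 = u + rng.normal(size=n)
    return np.column_stack([x1, x2, x3, x4])

# No edges among the children: all tetrad differences should be near zero.
valid = tetrad_differences(simulate(200_000, 0.0, seed=0))
# A direct edge X1 -> X2: tetrads that pair X1 with X2 no longer vanish.
invalid = tetrad_differences(simulate(200_000, 0.8, seed=0))

print("children disconnected:", np.round(valid, 3))
print("extra edge X1 -> X2:  ", np.round(invalid, 3))
```

A formal test would compare these sample quantities against their sampling distribution rather than eyeballing them; the sketch only shows which population quantities such a test targets.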
In this paper, we use some lesser-known theory regarding relationships between subcovariance
matrix rank constraints and the graphical structure of causal models to provide
both theory and algorithms for evaluating NC variables. First, we provide a statistical test
that can be used to determine whether a triplet of candidate NCs are real disconnected NCs
or not. Second, we provide a simple algorithm for searching among a set of candidate NCs,
and identifying subsets of those variables that collectively meet the conditions of being dis-
connected NCs. Third, we combine our proposed method for finding valid NC variables with
a recently developed double-NC method for causal inference (Miao, Shi & Tchetgen Tchet-
gen 2018, Shi et al. 2020b, Cui et al. 2020), creating an algorithm that accurately estimates
and makes inferences about causal effects from observational data. We refer to the proposed
method as the Data-driven Automated Negative Control Estimation (DANCE) algorithm.
We prove that our proposed methods are correct under fairly general assumptions, evaluate
their finite sample performance with a series of numerical experiments, and demonstrate
their usability on a real world data set.
The rest of the paper is organized as follows. In Section 2 we review the three main
topics that the work in this paper builds upon: negative controls, structural models, and
rank constraints. We then present a statistical validation test for disconnected NCs in Section
3, and prove its correctness in Section 3.3. Section 4 presents an algorithm that searches a
set of candidate NC variables to find sets of disconnected NCs which pass the validation test,
and Section 5 presents the DANCE algorithm, which combines this search with the double-NC method to
construct an all-in-one method for producing a valid causal effect estimate from a data set
containing a collection of candidate NC variables, some of which are not necessarily valid
disconnected NCs. Section 6 presents numerical experiments to evaluate our proposed test
and algorithms, and compares them to two methods: a simple regression method ignoring
unmeasured confounding and a random selection of candidate NCs followed by the double-
NC method. An application of DANCE to a real clinical data set is described in Section 7.
Section 8 summarizes the strengths and limitations of the methods presented in this paper,
and points towards promising directions for future work.
2 Background
2.1 Unmeasured Confounding and Negative Control Methods
We adopt the potential outcome framework under the Stable Unit Treatment Value As-
sumption (SUTVA) (Rubin 1974, 1980, Cox 1992) and let (O(1), O(0)) denote the pair of
potential outcomes under treatment and control conditions, respectively. We are interested
in estimating the average treatment effect (ATE), defined as $\Delta = E[O(1) - O(0)]$. It suffices
to identify the counterfactual mean $E[O(t)]$ for $t \in \{0, 1\}$. Let $O$ denote the observed outcome
and $T$ denote the binary treatment. We suppress measured covariates for simplicity;
adjustment for measured covariates is discussed in Section 5.1.
Instead of making the no unmeasured confounding assumption, we allow the presence of
an unmeasured confounder $U$ with a latent ignorability assumption that $O(t) \perp T \mid U$. If $U$
were measured, then $E[O(t)]$ would be identified under the ignorability assumption (Robins 1986).
However, when $U$ is unobserved and unadjusted, ATE estimation will be biased. In this
case, additional information is needed to identify and make inference about the ATE.
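The bias described above can be seen in a small simulation. This is a minimal sketch under assumed settings that do not come from the paper: a linear outcome model, a true ATE of 1.0, a confounder coefficient of 2.0, standard normal noise, and a threshold rule for the binary treatment. If U were measured, the standard adjustment formula E[O(t)] = E{E[O | T = t, U]} would apply, which in this linear model reduces to the coefficient on T in a regression of O on T and U.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
true_ate = 1.0  # assumed effect of T on O, chosen for illustration

u = rng.normal(size=n)                            # unmeasured confounder U
t = (u + rng.normal(size=n) > 0).astype(float)    # binary treatment driven by U
o = true_ate * t + 2.0 * u + rng.normal(size=n)   # outcome driven by T and U

# Naive analysis: difference in observed group means, ignoring U.
naive = o[t == 1].mean() - o[t == 0].mean()

# Oracle analysis: least squares of O on (1, T, U), possible only if U were measured.
design = np.column_stack([np.ones(n), t, u])
oracle = np.linalg.lstsq(design, o, rcond=None)[0][1]

print(f"true ATE {true_ate:.2f}, naive {naive:.2f}, oracle {oracle:.2f}")
```

The naive contrast absorbs the difference in U between treatment groups, while the oracle fit recovers the assumed effect; negative control methods aim to close that gap without observing U.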
An increasingly popular approach to mitigate bias due to unmeasured confounding is to
use its proxies. For example, as shown in Figure 1, if $U$ can be measured with error via
proxy variables $Z$ and $W$, then one can leverage $Z$ and $W$ to identify the confounding bias
due to $U$ and remove such bias from the estimated causal effect. Such proxy variables have
been referred to as negative controls (Lipsitch et al. 2010, Shi et al. 2020a). Formally, a
negative control outcome, denoted as W, is a variable known not to be causally affected by
the treatment of interest. Likewise, a negative control exposure, denoted as Z, is a variable
known not to causally affect the outcome of interest. The negative control exposure and
outcome variables should share a confounding mechanism with the exposure and outcome
variables of primary interest. In summary, $Z$ and $W$ satisfy
$$(T, Z) \perp (O(t), W) \mid U. \qquad (1)$$
There are a number of causal graphs that satisfy the NC assumptions (Shi et al. 2020a).
For example, both a valid instrumental variable independent of the unmeasured confounder
and an invalid instrumental variable associated with the unmeasured confounder are valid
negative control exposures. Alternative directed acyclic graphs encoding the NC assumptions
are available in Shi et al. (2020a).
Figure 1 presents a special case where $Z$ and $W$ are causally related to neither the
treatment nor the outcome of interest, hence $Z$ and $W$ can serve as either negative control
exposure or negative control outcome (Shi et al. 2020b, Tchetgen et al. 2020). We refer to
such a special class of NC variables as the disconnected NCs. Formally, the disconnected
NCs satisfy the following assumption
$$(Z, W) \perp (T, O) \mid U.$$
Compared to the fundamental NC assumption (1), the disconnected NCs satisfy the additional
assumptions that $Z \perp T \mid U$ and $W \perp O \mid U$.
[Graph omitted. Nodes: T (treatment), O (outcome), U (unmeasured confounders), Z (negative control), W (negative control).]
Figure 1: Causal graph of two disconnected NCs, Z and W, suppressing the measured
covariates X which is implicitly conditioned on in all arguments.
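To make the role of the two proxies in Figure 1 concrete, the following sketch simulates a linear version of that graph and recovers the treatment effect with a two-stage least squares fit that uses Z as an instrument for W. This is an illustrative linear special case, not the nonparametric double-NC estimator cited in this paper; the coefficients, the sample size, and the variable names are assumptions. The idea is that W equals a multiple of U plus independent noise, so O can be written as a linear function of T and W with an error that is uncorrelated with T and Z.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
true_ate = 1.0  # assumed effect of T on O, chosen for illustration

u = rng.normal(size=n)                            # unmeasured confounder U
t = (u + rng.normal(size=n) > 0).astype(float)    # binary treatment
o = true_ate * t + 2.0 * u + rng.normal(size=n)   # outcome
z = u + rng.normal(size=n)                        # disconnected NC used as exposure proxy
w = u + rng.normal(size=n)                        # disconnected NC used as outcome proxy

ones = np.ones(n)
naive = o[t == 1].mean() - o[t == 0].mean()

# Stage 1: linear projection of W on (1, T, Z).
stage1 = np.column_stack([ones, t, z])
w_hat = stage1 @ np.linalg.lstsq(stage1, w, rcond=None)[0]

# Stage 2: least squares of O on (1, T, fitted W); the T coefficient is the estimate.
stage2 = np.column_stack([ones, t, w_hat])
double_nc = np.linalg.lstsq(stage2, o, rcond=None)[0][1]

print(f"true ATE {true_ate:.2f}, naive {naive:.2f}, two-stage NC {double_nc:.2f}")
```

Because Z and W are exchangeable in this simulated graph, swapping their roles should give a similar estimate. Standard errors from the second-stage fit are not valid as reported by ordinary least squares and would need a two-stage correction or a bootstrap.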
Using a pair of negative control exposure and outcome variables, referred to as the double-
NC, Miao, Geng & Tchetgen Tchetgen (2018) established nonparametric identification of the
average treatment effect (ATE). Intuitively, having additional children of $U$ that are conditionally
independent of $T$ and $O$ allows for identification of the unmeasured confounding
bias due to the influence of $U$ on $T$ and $O$, and subsequently this quantity can be removed
from the association between $T$ and $O$, leaving an unbiased estimate of $T$'s effect on $O$. Recently,
the NC framework has been extended to proximal causal inference, which partitions
measured covariates into proxies satisfying NC conditions, acknowledging that covariate mea-
surements are at best proxies of the underlying confounding mechanisms (Tchetgen et al.