Sample-Specific Root Causal Inference with Latent Variables
Eric V. Strobl, Thomas A. Lasko
Abstract
Root causal analysis seeks to identify the set of initial perturbations that induce an unwanted outcome. In prior work, we defined sample-specific root causes of disease using exogenous error terms that predict a diagnosis in a structural equation model. We rigorously quantified predictivity using Shapley values. However, the associated algorithms for inferring root causes assume no latent confounding. We relax this assumption by permitting confounding among the predictors. We then introduce a corresponding procedure called Extract Errors with Latents (EEL) for recovering the error terms up to contamination by vertices on certain paths under the linear non-Gaussian acyclic model. EEL also identifies the smallest sets of dependent errors for fast computation of the Shapley values. The algorithm bypasses the hard problem of estimating the underlying causal graph in both cases. Experiments highlight the superior accuracy and robustness of EEL relative to its predecessors.
Keywords: causal inference, root cause, confounding, LiNGAM
1. Introduction
Causal inference refers to the process of inferring causal relations from data. Most scientists identify
causal relations by conducting randomized controlled trials (RCTs). RCTs can nevertheless fail to
distinguish between a cause and a root cause of disease, or the initial perturbation to an otherwise
healthy system that ultimately induces a diagnostic label. Identifying root causes is critical for
(a) understanding disease mechanisms and (b) discovering drug targets that eliminate disease at its
onset in a biological pathway.
Consider for example the directed graph in Figure 1 (a), where vertices in $X$ represent random variables and directed edges their direct causal relations; we have $X_i \to X_j$ when $X_i$ directly causes $X_j$. The lightning bolt in the figure denotes an exogenous perturbation of the root cause $X_2$, such as a virus, mutation or physical injury. This perturbation in turn affects many downstream variables, such as $\{X_3, X_4\}$, ultimately causing symptoms $\{X_5, X_6\}$ and physicians to label a patient with a diagnosis $D = 1$ indicating disease. The causes of $D$ include $X_1, \dots, X_6$, but we only seek to identify the root cause $X_2$ that may lie arbitrarily far upstream from $D$ in the general case.
Identifying root causes is further complicated by the existence of complex disease, where each
patient may have multiple root causes, and root causes may differ between patients even within the
same diagnostic category. The disease may also only affect certain tissues or cells in the body. We
therefore more specifically seek to identify sample-specific root causes, where a sample may denote
an arbitrary unit of granularity such as a patient, tissue or cell. Identifying sample-specific root
causes has the potential to help experimentalists rapidly identify interventions that target the very
beginnings of disease unique to each patient.
The above intuitive idea of a sample-specific root cause nevertheless lacks a rigorous mathematical definition. This in turn hinders the development of principled algorithms designed for their automated detection. In prior work, we therefore explicitly defined sample-specific root causes of disease as the error terms in a structural equation model that predict a diagnostic label (Strobl and Lasko, 2022a). We quantified predictivity using Shapley values. We also proposed methods to directly extract these error terms both in the linear and non-linear settings via regression residuals (Strobl and Lasko, 2022a,b). The methods do not require knowledge about the underlying graph and achieve sample efficiency by bypassing the hard problem of causal graph recovery (Chickering et al., 2004). These algorithms however rely on the unreasonable assumption that the dataset contains no unobserved confounders, which we relax in this paper by permitting confounding between the variables in $X$.
arXiv:2210.15340v1 [stat.ML] 27 Oct 2022
Figure 1: The lightning bolt in (a) denotes an exogenous perturbation of $X_2$ that eventually affects many downstream variables and causes a diagnosis $D$. In (b), we model the lightning bolt as a perturbation of $E_2$ to the value $e_2$ that impacts the values of all of its descendants and ultimately $D$.
We specifically make the following contributions in this paper:
• We introduce a strategy for identifying sample-specific root causes with confounding by extracting the error terms up to contamination on certain paths.
• We propose an algorithm called Extract Errors with Latents (EEL) that recovers the above error terms and computes an undirected graph summarizing their statistical dependencies.
• We use the graph to efficiently compute Shapley values of the error terms by averaging over small neighborhoods of dependence.
Experiments in Section 7 highlight the superiority of EEL relative to existing methods in the presence of confounding.
2. Structural Equation Models
We can formalize causal inference under the framework of structural equation models (SEMs). An SEM over a set of $p$ random variables $X$ refers to a set of deterministic equations in the following form:
$$X_i = f_i(\mathrm{Pa}(X_i), E_i), \quad \forall X_i \in X,$$
where $E$ denotes a random vector of $p$ mutually independent error terms, and $\mathrm{Pa}(X_i) \subseteq X$ the parents, or direct causes, of $X_i$. We can equivalently set the equality sign in the above equation to algorithmic assignment in order to emphasize that interventions on $\mathrm{Pa}(X_i)$ induce changes in the conditional probability distribution of $X_i$ given $\mathrm{Pa}(X_i)$.
EXTRACT ERRORS WITH LATENTS
We can associate an SEM with a directed graph $G$ containing at most one directed edge between any two variables in $X$. We have $X_i \to X_j$ when there exists a direct causal relation from $X_i \in \mathrm{Pa}(X_j)$ to $X_j$. We always have $E_i \to X_i$ in $G$ but only draw the vertices in $E$ and their outgoing edges when informative. We use the notation $\mathrm{Pa}_G(X_j)$ to emphasize the underlying graph $G$. A directed path is a sequence of directed edges. $X_i$ is an ancestor of $X_j$, and $X_j$ a descendant of $X_i$, if a directed path exists from $X_i$ to $X_j$. A directed acyclic graph (DAG) corresponds to a directed graph without cycles, where a cycle occurs when $X_i$ is an ancestor of $X_j$ and vice versa. A vertex $X_j$ is a collider on a path if we have $X_i \to X_j \leftarrow X_k$ on the path. Two vertices $X_i$ and $X_j$ are d-connected given $W \subseteq X \setminus \{X_i, X_j\}$ when there exists a path between $X_i$ and $X_j$ such that every collider has a descendant in $W$ and no non-collider is in $W$. The two vertices are likewise d-separated when they are not d-connected.
An SEM with an associated DAG $G$ can admit a density that factorizes according to the graph:
$$p(X) = \prod_{i=1}^{p} p(X_i \mid \mathrm{Pa}_G(X_i)).$$
The above factorization implies that, if $X_i$ and $X_j$ are d-separated given $W$ in $G$, then the two vertices are also conditionally independent given $W$, which we denote by $X_i \perp\!\!\!\perp X_j \mid W$ for shorthand (Lauritzen et al., 1990). D-separation faithfulness refers to the converse: if $X_i \perp\!\!\!\perp X_j \mid W$, then $X_i$ and $X_j$ are d-separated given $W$.
In this paper, we focus on linear SEMs with an associated DAG:
$$X_i = \sum_{j=1}^{p} X_j \beta_{ji} + E_i, \quad \forall X_i \in X, \qquad (1)$$
composed of a set of linear equations with coefficient matrix $\beta$ where $\beta_{ji} \neq 0$ if and only if $X_j \in \mathrm{Pa}_G(X_i)$. We assume $\mathbb{E}(X) = 0$ without loss of generality. The equations more specifically follow a Linear Non-Gaussian Acyclic Model (LiNGAM) when each error term is continuous non-Gaussian (Shimizu et al., 2006).
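As a rough illustration of Equation (1), the sketch below draws samples from a small linear SEM with non-Gaussian errors. The three-variable chain, the coefficient values, the uniform error distribution and the helper name `simulate_lingam` are assumptions made for the example only; they do not come from the paper.

```python
import numpy as np

def simulate_lingam(beta, n, rng):
    """Draw n samples from X = X beta + E, where beta encodes a DAG.

    beta[j, i] is non-zero if and only if X_j is a parent of X_i.
    """
    p = beta.shape[0]
    # Uniform errors on [-sqrt(3), sqrt(3)]: zero mean, unit variance, non-Gaussian.
    E = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, p))
    # Row-vector convention: X (I - beta) = E, so X = E (I - beta)^{-1}.
    X = E @ np.linalg.inv(np.eye(p) - beta)
    return X, E

# Hypothetical chain X1 -> X2 -> X3 (illustrative coefficients).
beta = np.array([[0.0, 0.8, 0.0],
                 [0.0, 0.0, -0.5],
                 [0.0, 0.0, 0.0]])
rng = np.random.default_rng(0)
X, E = simulate_lingam(beta, n=1000, rng=rng)

# Every structural equation in (1) should hold sample by sample.
residual = X - X @ beta - E
```

Solving for `X` with a matrix inverse is only one convenient choice for small graphs; sampling the variables one at a time in a topological order would give the same result.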
Most existing methods also assume that we observe all of the variables in $X$. We relax this assumption by dividing $X$ into a set of $q$ observed variables $O$ and a set of $m$ latent, or unobserved, common causes $L$. We can always write the following:
$$O_i = \sum_{j=1}^{q} O_j \beta_{ji} + \sum_{k=1}^{m} L_k \gamma_{ki} + E_i, \quad \forall O_i \in O. \qquad (2)$$
Each $L_k$ must have at least two children; otherwise we can absorb it into $E_i$. Without loss of generality, we may also assume that $T = L \cup E$ denotes a set of mutually independent random variables with no parents (Hoyer et al., 2008). We refer to Equation (2) as the canonical form.
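We can write Equation (2) in matrix notation:
$$O = O\beta + L\gamma + E.$$
Re-arranging terms yields:
$$O = (L\gamma + E)(I - \beta)^{-1} = E\lambda + L\gamma\lambda = T\theta,$$
where $\lambda = (I - \beta)^{-1}$ and $\theta = [\lambda; \gamma\lambda]$. Notice that $T$ is now ordered such that $T_j = E_j$ if $j \leq q$. The entry $\theta_{ji}$ quantifies the total effect of the latent variable or error term $T_j$ on $O_i$.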
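The re-arrangement above can be checked numerically. The following sketch assumes a hypothetical graph $O_1 \to O_2 \to O_3$ with one latent common cause of $O_1$ and $O_3$, Laplace-distributed errors and illustrative coefficients; none of these specifics are taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, m = 500, 3, 1

# Hypothetical structure: O1 -> O2 -> O3, with L1 -> O1 and L1 -> O3.
beta = np.zeros((q, q))
beta[0, 1] = 0.7
beta[1, 2] = -0.4
gamma = np.array([[0.9, 0.0, 0.6]])      # m x q latent-to-observed coefficients

E = rng.laplace(size=(n, q))             # non-Gaussian error terms
L = rng.laplace(size=(n, m))             # latent common cause

lam = np.linalg.inv(np.eye(q) - beta)    # lambda = (I - beta)^{-1}
theta = np.vstack([lam, gamma @ lam])    # theta = [lambda; gamma lambda]
T = np.hstack([E, L])                    # ordered so that T_j = E_j for j <= q

O = (L @ gamma + E) @ lam                # re-arranged canonical form
```

In this toy graph, `theta[0, 2]` is the total effect of $E_1$ on $O_3$ along $O_1 \to O_2 \to O_3$, namely $0.7 \times (-0.4)$.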
3. Sample-Specific Root Causes
We consider LiNGAM over $X$ and introduce an additional label $D$ representing a diagnosis; we have $D = 1$ for patients deemed to have a disease, and $D = 0$ for healthy controls. We then assume a DAG over $X \cup D$ such that $D$ is a terminal vertex, or a vertex without descendants, and linked to $X$ via a logistic function:
Assumption 1. $D$ is a terminal vertex such that $P(D \mid X) = \mathrm{logistic}(X\beta_{\cdot D} + \alpha)$.
This is a reasonable assumption because a scientist who seeks to identify the causes of $D$ will likely use datasets containing measurements of the non-descendants of the diagnosis, such as gene expression levels, clinical laboratory values or imaging. The logistic link also provides a natural extension of LiNGAM to handle a noisy binary variable.
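The logistic link in Assumption 1 can be sketched as follows. The predictor matrix, the coefficient vector `beta_D` and the intercept `alpha` are placeholder values chosen for illustration, and the predictors are drawn independently here only to keep the example short.

```python
import numpy as np

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
X = rng.laplace(size=(1000, 3))          # stand-in for the variables in X
beta_D = np.array([1.2, 0.0, -0.7])      # hypothetical coefficients into D
alpha = -0.5                             # hypothetical intercept

prob = logistic(X @ beta_D + alpha)      # P(D = 1 | X)
D = rng.binomial(1, prob)                # noisy binary diagnosis

# The log odds of the diagnosis are linear in X.
log_odds = np.log(prob / (1.0 - prob))
```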
We model a sample-specific perturbation first affecting the root cause $X_i \in X$ as a change in the value of its error term $E_i$. We may write the following for any healthy control:
$$X_i = \sum_{j=1}^{p} X_j \beta_{ji} + \tilde{e}_i, \qquad (3)$$
where we have set the value of $E_i$ in Equation (1) to $\tilde{e}_i$. Suppose however that an exogenous perturbation, such as a virus, mutation or physical injury, changes the value of $E_i$ from $\tilde{e}_i$ to $e_i$. This perturbation in turn affects the value of $X_i$ and all of its downstream effects, ultimately increasing the probability of developing disease $D = 1$ (Figure 1 (b)).
We can quantify the increase in the probability of developing disease using logistic regression. We in particular consider:
$$f(E) = \ln\left[\frac{P(D = 1 \mid E)}{P(D = 0 \mid E)}\right] = E\theta_{\cdot D} + \alpha,$$
where the last equality follows by Assumption 1. Let $v(W)$ denote the conditional expectation of the logistic regression model $\mathbb{E}(f(E) \mid W)$, initially where $W = \emptyset$. We then measure the change in probability when intervening on $E_i$ via the following difference:
$$\gamma_{E_i W} = \underbrace{v(E_i, W)}_{(a)} - \underbrace{v(W)}_{(b)}. \qquad (4)$$
We have $\gamma_{e_i W} > 0$ when $E_i = e_i$ increases the probability that $D = 1$ because (a) is larger than (b).
Expression (4) unfortunately only quantifies the effect of $E_i$ on $D$ in isolation. We however also want to quantify the joint effect of $E_i$ in conjunction with the other error terms in $E \setminus E_i$ when $W \neq \emptyset$. We therefore average over all possible combinations of the errors as follows:
$$S_i = \underbrace{\frac{1}{p} \sum_{W \subseteq (E \setminus E_i)} \frac{1}{\binom{p-1}{|W|}}}_{\text{Average over all possible combinations of } E \setminus E_i} \gamma_{E_i W}. \qquad (5)$$
The quantity corresponds precisely to the well-known Shapley value which, as the reader may recall, is the only value satisfying the linearity, efficiency, symmetry and null player properties (see e.g., Lundberg and Lee, 2017; Štrumbelj and Kononenko, 2014).
The following result holds:
Proposition 1. Under LiNGAM over $X$ and Assumption 1, the Shapley value $S_i$ corresponds to the sample-specific total effect of $E_i$ on $D$: $S_i = E_i \theta_{iD}$.¹
The proof follows directly from Corollary 1 of Lundberg and Lee (2017). This justifies the following definition of a sample-specific root cause:
Definition 1. $X_i \in \mathrm{Anc}_G(D)$ is a sample-specific root cause of disease if $S_i > 0$.
In other words, a sample-specific root cause of disease is a variable associated with an error term that increases the probability that $D = 1$ as quantified by the Shapley value $S_i > 0$. We do not consider $S_i \leq 0$ because $E_i$ decreases the probability that $D = 1$ (or likewise increases the probability that $D = 0$) when $S_i < 0$. Similarly, $E_i$ has no effect on increasing or decreasing the probability that $D = 1$ when $S_i = 0$. We have thus arrived at a concise definition of a sample-specific root cause as a variable associated with a positive Shapley value of its error.
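A brute-force evaluation of Equation (5) makes Proposition 1 concrete in the unconfounded linear case. The sketch below assumes that the total effects `theta_D` and the error values `e` of one sample are already known, and that the errors are mutually independent with zero mean, so that $v(W)$ reduces to a sum over the conditioned errors. The numbers and the function names `v` and `shapley` are illustrative, and this enumeration is not the EEL algorithm.

```python
from itertools import combinations
from math import comb

import numpy as np

def v(subset, e, theta_D, alpha):
    """E[f(E) | E_W = e_W] for f(E) = E theta_D + alpha.

    Assumes independent, zero-mean errors, so unconditioned errors contribute 0.
    """
    idx = np.array(list(subset), dtype=int)
    return float(e[idx] @ theta_D[idx]) + alpha

def shapley(i, e, theta_D, alpha):
    """Equation (5): weighted average of gamma_{E_i W} over all W in E \\ E_i."""
    p = len(e)
    others = [j for j in range(p) if j != i]
    total = 0.0
    for size in range(p):                        # |W| = 0, ..., p - 1
        for W in combinations(others, size):
            gain = v(W + (i,), e, theta_D, alpha) - v(W, e, theta_D, alpha)
            total += gain / comb(p - 1, size)
    return total / p

# Hypothetical sample: error values and total effects on D.
e = np.array([1.5, -0.2, 0.3])
theta_D = np.array([0.4, 1.0, -2.0])
alpha = -1.0

S = np.array([shapley(i, e, theta_D, alpha) for i in range(len(e))])
root_causes = np.flatnonzero(S > 0)              # Definition 1: S_i > 0
```

Enumerating every subset costs on the order of $2^{p-1}$ evaluations per error term, which is only practical for small $p$; it serves here to confirm that the average collapses to $e_i \theta_{iD}$.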
4. Inducing Paths & Terms
The definition of a sample-specific root cause implies that we must develop methods that can accurately extract the error terms in order to compute the Shapley value. We however cannot identify the error terms $E$ exactly when confounding exists. Consider for example the graph shown in Figure 2 (a), where we cannot partial out $L_1$ from $O_1$ and $O_2$ because $L_1$ is unobserved.
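The difficulty can be seen in a small simulation. The sketch below assumes a hypothetical graph $O_1 \to O_2$ with a latent $L_1$ that may also point into $O_2$ (not necessarily the graph of Figure 2 (a)), uniform errors, and ordinary least squares as the regression; the helper `residual_error_correlation` is invented for the example.

```python
import numpy as np

def residual_error_correlation(latent_effect, rng, n=20000, b=0.5):
    """Correlation between the residual of O2 regressed on O1 and the error E1.

    latent_effect is the coefficient of L1 on O2; 0 means no confounding,
    in which case L1 is simply part of the noise of O1.
    """
    a = np.sqrt(3.0)                      # uniform on [-a, a] has unit variance
    L1 = rng.uniform(-a, a, n)
    E1 = rng.uniform(-a, a, n)
    E2 = rng.uniform(-a, a, n)
    O1 = L1 + E1
    O2 = b * O1 + latent_effect * L1 + E2
    slope = (O1 @ O2) / (O1 @ O1)         # least squares without intercept
    R = O2 - slope * O1                   # regression residual
    return np.corrcoef(R, E1)[0, 1]

rng = np.random.default_rng(3)
corr_confounded = residual_error_correlation(1.0, rng)
corr_unconfounded = residual_error_correlation(0.0, rng)
```

With confounding, the residual is a mixture of $L_1$, $E_1$ and $E_2$ rather than $E_2$ alone, so it stays clearly correlated with $E_1$; without confounding the correlation is close to zero.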
We can however identify the error terms up to connection by directed inducing paths:
Definition 2. A directed inducing path to $O_i$ is a path between $O_i$ and $T_j \in T$ (possibly $i = j$) where every collider is an ancestor of $O_i$ and every non-collider is in $L$.
All colliders are directed to $O_i$. We only consider directed inducing paths from the error terms or latent variables to $O_i$. We provide an example in Figure 2 (b). Any error term incident on or latent variable lying on a directed inducing path to $O_i$ also has a directed inducing path to $O_i$. Only $E_i$ lies on a directed inducing path to $O_i$ in the unconfounded setting, but more error terms may lie on the path when confounding exists. Finally, the above definition corresponds to the directed analogue of an (undirected) inducing path utilized in constraint-based search with latent variables, where every collider is an ancestor of either endpoint $O_i$ or $T_j$ (or both) (Spirtes et al., 2000).
The following result elucidates the limits of error term recovery when assessing statistical independence with regression residuals. Consider the ideal scenario where we have access to $F_j = E_j + \sum_{L_k \in \mathrm{Pa}_G(O_j) \cap L} L_k \gamma_{kj}$ for each $O_j \in O$, which we collect into the set $F$.
Lemma 1. Under LiNGAM and d-separation faithfulness, if some entry in $W \subseteq F \setminus F_i$ corresponds to an observed vertex lying on a directed inducing path to $O_i$, then $R_{O_i W} \not\perp\!\!\!\perp F_j$ for some $F_j \in W$.
We delegate proofs to Appendix 9.4. The latent common causes lying on a directed inducing path to $O_i$ thus ensure that we cannot partial out the error terms incident on the path from $O_i$ in general, even if we identified all entries in $F \setminus F_i$.
We instead focus on identifying the error terms up to connection by a directed inducing path. Specifically, let $C_i \subseteq T$ denote the set of error terms and latent variables lying on any directed inducing path to $O_i$. We consider:
$$\widetilde{E}_i = C_i \theta_{C_i i}, \qquad (6)$$
1. If $D = X\beta_{\cdot D} + E_D$ is terminal and continuous, then we arrive at the same result when $\gamma_{E_i W} = \mathbb{E}(D \mid E_i, W) - \mathbb{E}(D \mid W)$. We focus on a binary target because this is the most common situation encountered by far.