
Sample-Specific Root Causal Inference with Latent Variables

Eric V. Strobl, Thomas A. Lasko

Abstract

Root causal analysis seeks to identify the set of initial perturbations that induce an unwanted outcome. In prior work, we defined sample-specific root causes of disease using exogenous error terms that predict a diagnosis in a structural equation model. We rigorously quantified predictivity using Shapley values. However, the associated algorithms for inferring root causes assume no latent confounding. We relax this assumption by permitting confounding among the predictors. We then introduce a corresponding procedure called Extract Errors with Latents (EEL) for recovering the error terms up to contamination by vertices on certain paths under the linear non-Gaussian acyclic model. EEL also identifies the smallest sets of dependent errors for fast computation of the Shapley values. The algorithm bypasses the hard problem of estimating the underlying causal graph in both cases. Experiments highlight the superior accuracy and robustness of EEL relative to its predecessors.

Keywords: causal inference, root cause, confounding, LiNGAM

1. Introduction

Causal inference refers to the process of inferring causal relations from data. Most scientists identify causal relations by conducting randomized controlled trials (RCTs). RCTs can nevertheless fail to distinguish between a cause and a root cause of disease, or the initial perturbation to an otherwise healthy system that ultimately induces a diagnostic label. Identifying root causes is critical for (a) understanding disease mechanisms and (b) discovering drug targets that eliminate disease at its onset in a biological pathway.

Consider for example the directed graph in Figure 1 (a), where vertices in $X$ represent random variables and directed edges represent their direct causal relations; we have $X_i \to X_j$ when $X_i$ directly causes $X_j$. The lightning bolt in the figure denotes an exogenous perturbation of the root cause $X_2$, such as a virus, mutation or physical injury. This perturbation in turn affects many downstream variables, such as $\{X_3, X_4\}$, ultimately causing symptoms $\{X_5, X_6\}$ and leading physicians to label a patient with a diagnosis $D = 1$ indicating disease. The causes of $D$ include $X_1, \dots, X_6$, but we only seek to identify the root cause $X_2$, which may lie arbitrarily far upstream from $D$ in the general case.

Identifying root causes is further complicated by the existence of complex disease, where each patient may have multiple root causes, and root causes may differ between patients even within the same diagnostic category. The disease may also only affect certain tissues or cells in the body. We therefore more specifically seek to identify sample-specific root causes, where a sample may denote an arbitrary unit of granularity such as a patient, tissue or cell. Identifying sample-specific root causes has the potential to help experimentalists rapidly identify interventions that target the very beginnings of disease unique to each patient.

The above intuitive idea of a sample-specific root cause nevertheless lacks a rigorous mathematical definition. This in turn hinders the development of principled algorithms designed for their automated detection. As a result, in prior work we explicitly defined sample-specific root causes of disease as the error terms in a structural equation model that predict a diagnostic label (Strobl and Lasko, 2022a). We quantified predictivity using Shapley values. We also proposed methods to directly extract these error terms in both the linear and non-linear settings via regression residuals (Strobl and Lasko, 2022a,b). The methods do not require knowledge about the underlying graph and achieve sample efficiency by bypassing the hard problem of causal graph recovery (Chickering et al., 2004). These algorithms, however, rely on the unreasonable assumption that the dataset contains no unobserved confounders, which we relax in this paper by permitting confounding between the variables in $X$.

arXiv:2210.15340v1 [stat.ML] 27 Oct 2022

[Figure 1 image: (a) a causal graph over $X_1, \dots, X_6$ and $D$; (b) the same graph with the error terms $E_1, \dots, E_6$ drawn explicitly and $E_2 = e_2$.]

Figure 1: The lightning bolt in (a) denotes an exogenous perturbation of $X_2$ that eventually affects many downstream variables and causes a diagnosis $D$. In (b), we model the lightning bolt as a perturbation of $E_2$ to the value $e_2$ that impacts the values of all of its descendants and ultimately $D$.

We specifically make the following contributions in this paper:

• We introduce a strategy for identifying sample-specific root causes with confounding by extracting the error terms up to contamination on certain paths.

• We propose an algorithm called Extract Errors with Latents (EEL) that recovers the above error terms and computes an undirected graph summarizing their statistical dependencies.

• We use the graph to efficiently compute Shapley values of the error terms by averaging over small neighborhoods of dependence.

Experiments in Section 7 highlight the superiority of EEL relative to existing methods in the presence of confounding.

2. Structural Equation Models

We can formalize causal inference under the framework of structural equation models (SEMs). An SEM over a set of $p$ random variables $X$ refers to a set of deterministic equations in the following form:

$$X_i = f_i(\mathrm{Pa}(X_i), E_i), \quad \forall X_i \in X,$$

where $E$ denotes a random vector of $p$ mutually independent error terms, and $\mathrm{Pa}(X_i) \subseteq X$ the parents, or direct causes, of $X_i$. We can equivalently replace the equality sign in the above equation with the algorithmic assignment $\leftarrow$ in order to emphasize that interventions on $\mathrm{Pa}(X_i)$ induce changes in the conditional probability distribution of $X_i$ given $\mathrm{Pa}(X_i)$.
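The following is a minimal Python sketch of drawing one sample from a (possibly nonlinear) SEM: each equation is evaluated in a topological order of the parent sets, so every $X_i$ is computed only after $\mathrm{Pa}(X_i)$. The graph, the functions in `funcs`, and the Laplace errors are hypothetical choices for illustration, not part of the method.

```python
import math
import random
from graphlib import TopologicalSorter

# Hypothetical SEM over X1 -> X2 -> X3 with X1 -> X3; parents[v] lists Pa(v).
parents = {"X1": [], "X2": ["X1"], "X3": ["X1", "X2"]}

# Hypothetical structural functions f_i(Pa(X_i), E_i); nonlinear on purpose.
funcs = {
    "X1": lambda pa, e: e,
    "X2": lambda pa, e: math.tanh(pa["X1"]) + e,
    "X3": lambda pa, e: pa["X1"] * pa["X2"] + e,
}


def sample_sem(parents, funcs, errors):
    """Evaluate X_i <- f_i(Pa(X_i), E_i), computing parents before children."""
    values = {}
    for v in TopologicalSorter(parents).static_order():
        pa = {p: values[p] for p in parents[v]}
        values[v] = funcs[v](pa, errors[v])
    return values


rng = random.Random(0)
# Mutually independent Laplace(0, 1) errors: difference of two Exp(1) draws.
errors = {v: rng.expovariate(1.0) - rng.expovariate(1.0) for v in parents}
x = sample_sem(parents, funcs, errors)
```

Swapping `funcs` for linear functions recovers the linear SEMs considered below.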

EXTRACT ERRORS WITH LATENTS

We can associate an SEM with a directed graph $G$ containing at most one directed edge between any two variables in $X$. We have $X_i \to X_j$ when there exists a direct causal relation from $X_i \in \mathrm{Pa}(X_j)$ to $X_j$. We always have $E_i \to X_i$ in $G$ but only draw the vertices in $E$ and their outgoing edges when informative. We use the notation $\mathrm{Pa}_G(X_j)$ to emphasize the underlying graph $G$. A directed path is a sequence of directed edges. $X_i$ is an ancestor of $X_j$, and $X_j$ a descendant of $X_i$, if a directed path exists from $X_i$ to $X_j$. A directed acyclic graph (DAG) corresponds to a directed graph without cycles, where a cycle exists whenever $X_i$ is an ancestor of $X_j$ and vice versa. A vertex $X_j$ is a collider on a path if we have $X_i \to X_j \leftarrow X_k$ on the path. Two vertices $X_i$ and $X_j$ are d-connected given $W \subseteq X \setminus \{X_i, X_j\}$ when there exists a path between $X_i$ and $X_j$ such that every collider is in $W$ or has a descendant in $W$, and no non-collider is in $W$. The two vertices are likewise d-separated when they are not d-connected.
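As a sketch of the d-separation definition, the code below tests whether two vertices are d-separated given $W$. Rather than enumerating paths, it uses the equivalent moralized ancestral graph criterion (Lauritzen et al., 1990): keep only the ancestors of $\{X_i, X_j\} \cup W$, connect parents of a common child, drop edge directions, delete $W$, and check whether $X_i$ can still reach $X_j$. The `parents` dictionary format is an assumption of this sketch.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents, x, y, w):
    """Return True if x and y are d-separated given the set w (x, y not in w)."""
    keep = ancestors(parents, {x, y} | set(w))
    adj = {v: set() for v in keep}
    for child in keep:
        pa = [p for p in parents.get(child, []) if p in keep]
        for p in pa:                      # drop edge directions
            adj[p].add(child)
            adj[child].add(p)
        for k, a in enumerate(pa):        # "marry" parents of a common child
            for b in pa[k + 1:]:
                adj[a].add(b)
                adj[b].add(a)
    blocked = set(w)
    seen, queue = {x}, deque([x])         # search the moral graph with w removed
    while queue:
        v = queue.popleft()
        if v == y:
            return False
        for u in adj[v] - blocked:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return True


collider = {"X1": [], "X2": [], "X3": ["X1", "X2"]}   # X1 -> X3 <- X2
chain = {"X1": [], "X2": ["X1"], "X3": ["X2"]}        # X1 -> X2 -> X3
```

For the collider, `d_separated(collider, "X1", "X2", set())` returns `True`, while conditioning on `{"X3"}` opens the path and returns `False`; the chain behaves the other way around.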

An SEM with an associated DAG $G$ can admit a density that factorizes according to the graph:

$$p(X) = \prod_{i=1}^{p} p(X_i \mid \mathrm{Pa}_G(X_i)).$$

The above factorization implies that, if $X_i$ and $X_j$ are d-separated given $W$ in $G$, then the two vertices are also conditionally independent given $W$, which we denote by $X_i \perp\!\!\!\perp X_j \mid W$ for shorthand (Lauritzen et al., 1990). D-separation faithfulness refers to the converse: if $X_i \perp\!\!\!\perp X_j \mid W$, then $X_i$ and $X_j$ are d-separated given $W$.

In this paper, we focus on linear SEMs with an associated DAG:

$$X_i = \sum_{j=1}^{p} X_j \beta_{ji} + E_i, \quad \forall X_i \in X, \tag{1}$$

comprised of a set of linear equations with coefficient matrix $\beta$, where $\beta_{ji} \neq 0$ if and only if $X_j \in \mathrm{Pa}_G(X_i)$. We assume $\mathbb{E}(X) = 0$ without loss of generality. The equations more specifically follow a Linear Non-Gaussian Acyclic Model (LiNGAM) when each error term is continuous and non-Gaussian (Shimizu et al., 2006).
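A short NumPy sketch of sampling from Equation (1) in the row-vector convention used in this section: stacking samples as rows, $X = X\beta + E$ gives $X = E(I - \beta)^{-1}$. The chain $X_1 \to X_2 \to X_3$, its coefficients, and the Laplace errors are hypothetical; any continuous non-Gaussian errors would satisfy the LiNGAM assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 5000

# Hypothetical coefficient matrix: beta[j, i] != 0 iff X_j is a parent of X_i.
beta = np.array([[0.0, 0.8, 0.0],
                 [0.0, 0.0, -1.5],
                 [0.0, 0.0, 0.0]])   # X1 -> X2 -> X3

# Non-Gaussian, zero-mean, mutually independent errors (rows are samples).
E = rng.laplace(size=(n, p))

# Row-vector form of Equation (1): X = X beta + E  =>  X = E (I - beta)^{-1}.
X = E @ np.linalg.inv(np.eye(p) - beta)
```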

Most existing methods also assume that we observe all of the variables in $X$. We relax this assumption by dividing $X$ into a set of $q$ observed variables $O$ and a set of $m$ latent (or unobserved) common causes $L$. We can always write the following:

$$O_i = \sum_{j=1}^{q} O_j \beta_{ji} + \sum_{k=1}^{m} L_k \gamma_{ki} + E_i, \quad \forall O_i \in O. \tag{2}$$

Each $L_k$ must have at least two children; otherwise, we could absorb it into the error term of its single child. Without loss of generality, we may also assume that $T = L \cup E$ denotes a set of mutually independent random variables with no parents (Hoyer et al., 2008). We refer to Equation (2) as the canonical form.

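We can write Equation (2) in matrix notation:

$$O = O\beta + L\gamma + E.$$

Re-arranging terms yields:

$$O = (L\gamma + E)(I - \beta)^{-1} = E\lambda + L\gamma\lambda = T\theta,$$

where $\lambda = (I - \beta)^{-1}$ and $\theta = [\lambda; \gamma\lambda]$. Notice that $T$ is now ordered such that $T_j = E_j$ if $j \leq q$. The entry $\theta_{ji}$ quantifies the total effect of the latent variable or error term $T_j$ on $O_i$.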
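The canonical form can be checked numerically. The sketch below builds $\lambda$ and $\theta = [\lambda; \gamma\lambda]$ for a hypothetical graph $O_1 \to O_2 \to O_3$ with one latent variable $L_1 \to O_1$, $L_1 \to O_3$ (so $L_1$ has the required two children), stacks $T = [E, L]$, and confirms that $O = T\theta$ satisfies Equation (2). All coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
q, m, n = 3, 1, 4   # observed variables, latent variables, samples

# Hypothetical graph: O1 -> O2 -> O3, and latent L1 -> O1, L1 -> O3.
beta = np.zeros((q, q))
beta[0, 1], beta[1, 2] = 0.7, -0.5
gamma = np.array([[1.2, 0.0, 0.9]])          # gamma[k, i]: direct effect of L_k on O_i

E = rng.uniform(-1, 1, size=(n, q))           # error terms (non-Gaussian)
L = rng.uniform(-1, 1, size=(n, m))           # latent common causes

lam = np.linalg.inv(np.eye(q) - beta)         # lambda = (I - beta)^{-1}
theta = np.vstack([lam, gamma @ lam])         # theta = [lambda; gamma lambda]
T = np.hstack([E, L])                         # T_j = E_j for j <= q

O = T @ theta
```

For example, `theta[3, 2]` equals $0.9 + 1.2 \cdot 0.7 \cdot (-0.5) = 0.48$: the direct effect of $L_1$ on $O_3$ plus the effect routed through $O_1 \to O_2 \to O_3$.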

3. Sample-Specific Root Causes

We consider LiNGAM over $X$ and introduce an additional label $D$ representing a diagnosis; we have $D = 1$ for patients deemed to have a disease, and $D = 0$ for healthy controls. We then assume a DAG over $X \cup D$ such that $D$ is a terminal vertex, or a vertex without descendants, linked to $X$ via a logistic function:

Assumption 1. $D$ is a terminal vertex such that $P(D = 1 \mid X) = \mathrm{logistic}(X\beta_{\cdot D} + \alpha)$.

This is a reasonable assumption because a scientist who seeks to identify the causes of $D$ will likely use datasets containing measurements of the non-descendants of the diagnosis, such as gene expression levels, clinical laboratory values or imaging. The logistic link also provides a natural extension of LiNGAM to handle a noisy binary variable.
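Because $X = E\lambda$ under LiNGAM, the logit in Assumption 1 can be rewritten in terms of the error terms: $X\beta_{\cdot D} = E\lambda\beta_{\cdot D}$, so the vector $\lambda\beta_{\cdot D}$ collects the total effects of the errors on the log-odds of $D$ and plays the role of $\theta_{\cdot D}$ in the unconfounded setting. The sketch below checks this on a hypothetical model; the coefficients, $\alpha$, and the sample `e` are assumptions chosen for illustration.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical LiNGAM over X1 -> X2 -> X3 (row-vector convention).
beta = np.array([[0.0, 0.8, 0.0],
                 [0.0, 0.0, -1.5],
                 [0.0, 0.0, 0.0]])
beta_D = np.array([0.0, 0.5, 2.0])   # direct effects of X on the log-odds of D
alpha = -1.0

lam = np.linalg.inv(np.eye(3) - beta)
theta_D = lam @ beta_D               # total effects of each E_i on the log-odds of D

e = np.array([0.3, -1.0, 0.4])       # one hypothetical sample of error terms
x = e @ lam                          # X = E lambda
p_direct = logistic(x @ beta_D + alpha)    # Assumption 1, written in terms of X
p_errors = logistic(e @ theta_D + alpha)   # the same probability, in terms of E
```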

We model a sample-specific perturbation first affecting the root cause $X_i \in X$ as a change in the value of its error term $E_i$. We may write the following for any healthy control:

$$X_i = \sum_{j=1}^{p} X_j \beta_{ji} + \tilde{e}_i, \tag{3}$$

where we have set the value of $E_i$ in Equation (1) to $\tilde{e}_i$. Suppose however that an exogenous perturbation, such as a virus, mutation or physical injury, changes the value of $E_i$ from $\tilde{e}_i$ to $e_i$. This perturbation in turn affects the value of $X_i$ and all of its downstream effects, ultimately increasing the probability of developing disease $D = 1$ (Figure 1 (b)).
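To make the perturbation concrete, the sketch below sets hypothetical healthy values $\tilde{e} = 0$, perturbs only $E_1$, and propagates the change through $X = E\lambda$. Every descendant of $X_1$ shifts by $(e_1 - \tilde{e}_1)\lambda_{1\cdot}$, and the probability that $D = 1$ rises. The three-variable chain and all values are illustrative assumptions.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical model: X1 -> X2 -> X3 with a logistic link to D.
beta = np.array([[0.0, 0.8, 0.0],
                 [0.0, 0.0, -1.5],
                 [0.0, 0.0, 0.0]])
beta_D = np.array([0.0, 0.5, 2.0])
alpha = -1.0
lam = np.linalg.inv(np.eye(3) - beta)

e_healthy = np.zeros(3)         # tilde e: hypothetical healthy error values
e_sick = e_healthy.copy()
e_sick[0] = -1.5                # the perturbation changes only E1 (root cause X1)

x_healthy, x_sick = e_healthy @ lam, e_sick @ lam
p_healthy = logistic(x_healthy @ beta_D + alpha)
p_sick = logistic(x_sick @ beta_D + alpha)
```

Here the log-odds of $D = 1$ move from $-1$ to $2$, even though only a single error term changed.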

We can quantify the increase in the probability of developing disease using logistic regression. In particular, we consider:

$$f(E) = \ln\left[\frac{P(D = 1 \mid E)}{P(D = 0 \mid E)}\right] = E\theta_{\cdot D} + \alpha,$$

where the last equality follows by Assumption 1. Let $v(W)$ denote the conditional expectation of the logistic regression model $\mathbb{E}(f(E) \mid W)$, initially where $W = \emptyset$. We then measure the change in probability when intervening on $E_i$ via the following difference:

$$\gamma_{E_i W} = \underbrace{v(E_i, W)}_{(a)} - \underbrace{v(W)}_{(b)} \tag{4}$$

We have $\gamma_{e_i W} > 0$ when $E_i = e_i$ increases the probability that $D = 1$, because (a) is larger than (b).
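For the linear log-odds above, $v(W)$ has a simple closed form when the errors are mutually independent with mean zero: errors outside $W$ average out, leaving $v(W) = \alpha + \sum_{E_j \in W} e_j\theta_{jD}$. The hypothetical helpers below compute $v$ and the difference in Equation (4), with errors indexed by position; the numbers are illustrative. Note that $\gamma_{E_i W}$ then does not depend on $W$ at all, which foreshadows Proposition 1.

```python
def v(W, e, theta_D, alpha):
    """v(W) = E[f(E) | E_W = e_W] for the linear log-odds f(E) = E theta_D + alpha.

    Errors outside W are independent of those in W and have mean zero,
    so their contribution averages out to zero.
    """
    return alpha + sum(e[j] * theta_D[j] for j in W)


def gamma_iw(i, W, e, theta_D, alpha):
    """Equation (4): change in v when E_i joins the conditioning set W (i not in W)."""
    return v(set(W) | {i}, e, theta_D, alpha) - v(W, e, theta_D, alpha)


theta_D = [-2.0, -2.5, 2.0]   # hypothetical total effects on the log-odds of D
alpha = -1.0
e = [-1.5, 0.2, 0.1]          # hypothetical values of E_1, E_2, E_3
```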

Expression (4) unfortunately only quantifies the effect of $E_i$ on $D$ in isolation. We, however, also want to quantify the joint effect of $E_i$ in conjunction with the other error terms in $E \setminus E_i$ when $W \neq \emptyset$. We therefore average over all possible combinations of the errors as follows:

$$S_i = \underbrace{\frac{1}{p} \sum_{W \subseteq (E \setminus E_i)} \binom{p-1}{|W|}^{-1}}_{\text{Average over all possible combinations of } E \setminus E_i} \gamma_{E_i W}. \tag{5}$$
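Equation (5) can be evaluated by brute force for small $p$. The sketch below enumerates every $W \subseteq E \setminus E_i$, weights each difference by $\frac{1}{p}\binom{p-1}{|W|}^{-1}$, and accepts any set function `value` standing in for $v$. It is exponential in $p$ and only meant to make the formula concrete; the function names and example value functions are hypothetical.

```python
from itertools import combinations
from math import comb


def shapley(i, p, value):
    """Equation (5): weighted average of value(W | {i}) - value(W) over W in E minus E_i.

    Errors are indexed 0, ..., p - 1. The loop is exponential in p, so this
    brute-force version is only practical for small p.
    """
    others = [j for j in range(p) if j != i]
    total = 0.0
    for size in range(p):                       # |W| = 0, ..., p - 1
        weight = 1.0 / (p * comb(p - 1, size))  # (1/p) * binom(p - 1, |W|)^{-1}
        for subset in combinations(others, size):
            W = set(subset)
            total += weight * (value(W | {i}) - value(W))
    return total


# Hypothetical linear setting: v(W) = alpha + sum_{j in W} e_j theta_jD.
theta_D, alpha, e = [-2.0, -2.5, 2.0], -1.0, [-1.5, 0.2, 0.1]


def linear_v(W):
    return alpha + sum(e[j] * theta_D[j] for j in W)


# Hypothetical nonlinear value: only the errors at indices 0 and 1 jointly matter.
def interaction_v(W):
    return 1.0 if {0, 1} <= W else 0.0


S_linear = [shapley(i, 3, linear_v) for i in range(3)]
S_interaction = [shapley(i, 3, interaction_v) for i in range(3)]
```

With the linear value function, the scores equal $e_i\theta_{iD}$ exactly; with the interaction function, the two interacting errors split the credit equally, and the scores sum to $v(E) - v(\emptyset)$ as the efficiency property requires.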

The quantity corresponds precisely to the well-known Shapley value which, as the reader may recall, is the only value satisfying the linearity, efficiency, symmetry and null player properties (see e.g., Lundberg and Lee, 2017; Štrumbelj and Kononenko, 2014).

The following result holds:


Proposition 1. Under LiNGAM over $X$ and Assumption 1, the Shapley value $S_i$ corresponds to the sample-specific total effect of $E_i$ on $D$: $S_i = E_i\theta_{iD}$.¹

The proof follows directly from Corollary 1 of Lundberg and Lee (2017). This justifies the following definition of a sample-specific root cause:

Definition 1. $X_i \in \mathrm{Anc}_G(D)$ is a sample-specific root cause of disease if $S_i > 0$.

In other words, a sample-specific root cause of disease is a variable associated with an error term that increases the probability that $D = 1$, as quantified by the Shapley value $S_i > 0$. We do not consider $S_i \leq 0$ because $E_i$ decreases the probability that $D = 1$ (or likewise increases the probability that $D = 0$) when $S_i < 0$. Similarly, $E_i$ has no effect on increasing or decreasing the probability that $D = 1$ when $S_i = 0$. We have thus arrived at a concise definition of a sample-specific root cause as a variable associated with a positive Shapley value of its error.
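Given Proposition 1, no subset enumeration is needed in the linear setting: the Shapley values for all samples are an elementwise product of the error matrix with $\theta_{\cdot D}$, and Definition 1 keeps the positive entries. The sketch below assumes the errors and $\theta_{\cdot D}$ are already known, which is precisely what is hard to obtain under confounding; the helper names and values are hypothetical.

```python
import numpy as np


def root_cause_scores(E, theta_D):
    """Proposition 1: S_i = E_i * theta_iD for every sample (row) of E at once."""
    return E * theta_D  # broadcasts the length-p vector across the n rows


def root_causes(E, theta_D, names):
    """Definition 1: for each sample, the variables whose error has S_i > 0."""
    S = root_cause_scores(E, theta_D)
    return [[names[i] for i in np.flatnonzero(row > 0)] for row in S]


theta_D = np.array([-2.0, -2.5, 2.0])   # hypothetical total effects on the log-odds of D
E = np.array([[-1.5, 0.2, 0.1],         # two hypothetical samples (rows)
              [0.4, -0.8, 0.0]])
causes = root_causes(E, theta_D, ["X1", "X2", "X3"])
```

Here the first sample has root causes $X_1$ and $X_3$, and the second has only $X_2$; in the second sample $S_3 = 0$, so $X_3$ is excluded.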

4. Inducing Paths & Terms

The definition of a sample-specific root cause implies that we must develop methods that can accurately extract the error terms in order to compute the Shapley value. We however cannot identify the error terms $E$ exactly when confounding exists. Consider for example the graph shown in Figure 2 (a), where we cannot partial out $L_1$ from $O_1$ and $O_2$ because $L_1$ is unobserved.

We can however identify the error terms up to connection by directed inducing paths:

Definition 2. A directed inducing path to $O_i$ is a path between $O_i$ and $T_j \in T$ (possibly $i = j$) where every collider is an ancestor of $O_i$ and every non-collider is in $L$.

All colliders thus have a directed path to $O_i$. We only consider directed inducing paths from the error terms or latent variables to $O_i$. We provide an example in Figure 2 (b). Any error term incident on, or latent variable lying on, a directed inducing path to $O_i$ also has a directed inducing path to $O_i$. Only $E_i$ lies on a directed inducing path to $O_i$ in the unconfounded setting, but more error terms may lie on the path when confounding exists. Finally, the above definition corresponds to the directed analogue of an (undirected) inducing path utilized in constraint-based search with latent variables, where every collider is an ancestor of either endpoint $O_i$ or $T_j$ (or both) (Spirtes et al., 2000).
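Definition 2 can be checked mechanically for a given path. The hypothetical helper below classifies each interior vertex as a collider or non-collider from the edge directions, then requires colliders to be ancestors of the endpoint $O_i$ and non-colliders to be latent. The example graph ($E_1 \to O_1$, $E_2 \to O_2$, $L_1 \to O_1$, $L_1 \to O_2$, $O_1 \to O_2$) is an assumption chosen for illustration and is not necessarily the graph in Figure 2.

```python
def proper_ancestors(parents, node):
    """All vertices with a directed path into `node` (excluding `node` itself)."""
    seen, stack = set(), [node]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def is_directed_inducing_path(path, parents, latents):
    """Check Definition 2 for `path`, listed from T_j to the target O_i = path[-1]."""
    def points_to(a, b):  # True if the edge a -> b is in the graph
        return a in parents.get(b, [])

    anc = proper_ancestors(parents, path[-1])
    # Consecutive vertices must be adjacent (in either direction).
    if any(not (points_to(a, b) or points_to(b, a)) for a, b in zip(path, path[1:])):
        return False
    for prev, v, nxt in zip(path, path[1:], path[2:]):
        if points_to(prev, v) and points_to(nxt, v):  # prev -> v <- nxt
            if v not in anc:
                return False  # a collider must be an ancestor of O_i
        elif v not in latents:
            return False      # a non-collider must be latent
    return True


# Hypothetical graph: E1 -> O1, E2 -> O2, L1 -> O1, L1 -> O2, O1 -> O2.
parents = {"O1": ["E1", "L1"], "O2": ["E2", "L1", "O1"]}
latents = {"L1"}

via_latent = is_directed_inducing_path(["E1", "O1", "L1", "O2"], parents, latents)
via_observed = is_directed_inducing_path(["E1", "O1", "O2"], parents, latents)
trivial = is_directed_inducing_path(["E2", "O2"], parents, latents)
```

In this graph, $E_1 \to O_1 \leftarrow L_1 \to O_2$ is a directed inducing path to $O_2$, so $E_1$ has a directed inducing path to $O_2$; the directed path $E_1 \to O_1 \to O_2$ does not qualify because $O_1$ is an observed non-collider.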

The following result elucidates the limits of error term recovery when assessing statistical independence with regression residuals. Consider the ideal scenario where we have access to $F_j = E_j + \sum_{L_k \in \mathrm{Pa}_G(O_j) \cap L} L_k \gamma_{kj}$ for each $O_j \in O$, which we collect into the set $F$.
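In the linear model, $\gamma_{kj} = 0$ whenever $L_k$ is not a parent of $O_j$, so the whole set $F$ is simply $E + L\gamma$, and Equation (2) becomes $O = F(I - \beta)^{-1}$. The sketch below checks this identity on a hypothetical graph $O_1 \to O_2 \to O_3$ with $L_1 \to O_1$ and $L_1 \to O_3$. Note that $F_1$ and $F_3$ share $L_1$ and are therefore dependent, unlike the entries of $E$.

```python
import numpy as np

rng = np.random.default_rng(2)
q, m, n = 3, 1, 6

# Hypothetical graph: O1 -> O2 -> O3, and latent L1 -> O1, L1 -> O3.
beta = np.zeros((q, q))
beta[0, 1], beta[1, 2] = 0.7, -0.5
gamma = np.array([[1.2, 0.0, 0.9]])

E = rng.laplace(size=(n, q))
L = rng.laplace(size=(n, m))

# F_j = E_j + sum of L_k gamma_kj over latent parents of O_j; since gamma_kj = 0
# for non-parents, all of F is one matrix expression.
F = E + L @ gamma
lam = np.linalg.inv(np.eye(q) - beta)
O = F @ lam   # the observed variables are a linear mixture of F
```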

Lemma 1. Under LiNGAM and d-separation faithfulness, if some entry in $W \subseteq F \setminus F_i$ corresponds to an observed vertex lying on a directed inducing path to $O_i$, then $R_{O_i W} \not\perp\!\!\!\perp F_j$ for some $F_j \in W$.

We delegate proofs to Appendix 9.4. The latent common causes lying on a directed inducing path to $O_i$ thus ensure that we cannot, in general, partial out from $O_i$ the error terms incident on the path, even if we identified all entries in $F \setminus F_i$.

We instead focus on identifying the error terms up to connection by a directed inducing path. Specifically, let $C_i \subseteq T$ denote the set of error terms and latent variables lying on any directed inducing path to $O_i$. We consider:

$$E^*_i = C_i \theta_{C_i i}, \tag{6}$$
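A minimal sketch of Equation (6), assuming $T$ and $\theta$ are known and the index set $C_i$ has already been found: $E^*_i$ keeps only the columns of $T$ in $C_i$. For the hypothetical graph $O_1 \to O_2 \to O_3$ with $L_1 \to O_1$ and $L_1 \to O_3$, working through Definition 2 by hand gives $C_3 = \{E_1, E_3, L_1\}$ (the path $E_1 \to O_1 \leftarrow L_1 \to O_3$ qualifies, while every path from $E_2$ passes through an observed non-collider). Consequently $O_3 - E^*_3$ is exactly the contribution of $E_2$.

```python
import numpy as np


def contaminated_error(T, theta, C_i, i):
    """Equation (6): E*_i = C_i theta_{C_i i}, using only the columns of T in C_i."""
    C_i = list(C_i)
    return T[:, C_i] @ theta[C_i, i]


# Hypothetical graph: O1 -> O2 -> O3, and latent L1 -> O1, L1 -> O3.
q = 3
beta = np.zeros((q, q))
beta[0, 1], beta[1, 2] = 0.7, -0.5
gamma = np.array([[1.2, 0.0, 0.9]])
lam = np.linalg.inv(np.eye(q) - beta)
theta = np.vstack([lam, gamma @ lam])      # rows ordered as E1, E2, E3, L1

rng = np.random.default_rng(3)
T = rng.laplace(size=(5, 4))               # columns ordered as E1, E2, E3, L1
O = T @ theta

# C_3 for O3, worked out by hand from Definition 2 for this graph: {E1, E3, L1}.
E_star_3 = contaminated_error(T, theta, [0, 2, 3], 2)
```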

1. If $D = X\beta_{\cdot D} + E_D$ is terminal and continuous, then we arrive at the same result when $\gamma_{E_i W} = \mathbb{E}(D \mid E_i, W) - \mathbb{E}(D \mid W)$. We focus on a binary target because this is by far the most common situation encountered.
