Independence Testing-Based Approach to Causal Discovery under Measurement Error and Linear Non-Gaussian Models

Haoyue Dai¹,²  Peter Spirtes¹  Kun Zhang¹,²
¹Department of Philosophy, Carnegie Mellon University
²Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence
hyda@cmu.edu  ps7z@andrew.cmu.edu  kunz1@cmu.edu
Abstract

Causal discovery aims to recover causal structures generating the observational data. Despite its success in certain problems, in many real-world scenarios the observed variables are not the target variables of interest, but rather imperfect measures of them. Causal discovery under measurement error aims to recover the causal graph among unobserved target variables from observations made with measurement error. We consider a specific formulation of the problem, where the unobserved target variables follow a linear non-Gaussian acyclic model, and the measurement process follows the random measurement error model. Existing methods for this formulation rely on non-scalable over-complete independent component analysis (OICA). In this work, we propose the Transformed Independent Noise (TIN) condition, which checks for independence between a specific linear transformation of some measured variables and certain other measured variables. By leveraging the non-Gaussianity and higher-order statistics of data, TIN is informative about the graph structure among the unobserved target variables. By utilizing TIN, the ordered group decomposition of the causal model is identifiable. In other words, we can achieve what once required OICA merely by conducting independence tests. Experimental results on both synthetic and real-world data demonstrate the effectiveness and reliability of our method.¹
1 Introduction
[Figure 1: Example of measurement error. Gray nodes are latent underlying variables and white nodes are observed ones.]
Discovery of causal relations is a fundamental goal of science. Identifying causal relations from observational data, known as causal discovery, has thus drawn much attention in various scientific fields, including economics, biology, and social science [45, 27, 11]. Methods for causal discovery can be roughly categorized into constraint-based ones (e.g., PC [43]), score-based ones (e.g., Greedy Equivalence Search (GES) [8]), and ones based on structural equation models (SEMs) [38, 17, 51]. Almost all these methods assume that the recorded values are values of the variables of interest, which, however, is usually not the case in real-world scenarios. Some variables may be impossible to observe or quantify, so the recorded values are actually proxies of them (e.g., measuring one's mental status by a survey questionnaire), and some variables, though quantifiable, may be subject to error introduced by instruments (e.g., measuring brain signals using functional magnetic resonance imaging (fMRI)). The difference between the quantities of interest and their measured values is termed measurement error [12].
¹An online demo and code are available at https://cmu.edu/dietrich/causality/tin.
Measurement error adversely impairs causal discovery [28, 24, 36]. The measuring process can be viewed as directed edges from the underlying variables of interest (unobservable) to the measured values (observable), and the d-separation patterns among the underlying variables typically do not hold among the measured ones. Consider the causal effects from factory emissions to air quality and then to residents' lung health, as shown in Figure 1, while we only have the corresponding measured quantities: chimney statistics, PM2.5, and hospital reported cases. Though factory emissions and lung health are independent given air quality, chimney statistics and hospital reported cases are, however, dependent given PM2.5. If the measurement error is severe, chimney statistics and hospital reported cases even tend to be marginally independent [53, 36], which makes PM2.5 look like a collider (common child). One might thus incorrectly infer that lung cancer causes air pollution. In fact, such measurement error is always a serious threat in environmental epidemiologic studies [30, 10].
Denote by $\tilde{\mathbf{X}} = \{\tilde{X}_i\}_{i=1}^n$ the latent measurement-error-free variables and $\mathbf{X} = \{X_i\}_{i=1}^n$ the observed ones. While there are different models for the measuring process [12, 6, 49], in this paper we consider the random measurement error model [36], where the observed variables are generated from the latent measurement-error-free variables $\tilde{X}_i$ with additive random measurement errors $\mathbf{E} = \{E_i\}_{i=1}^n$:

$$X_i = \tilde{X}_i + E_i. \qquad (1)$$
Measurement errors $\mathbf{E}$ are assumed to be mutually independent and independent of $\tilde{\mathbf{X}}$. We assume causal sufficiency relative to $\tilde{\mathbf{X}}$ (i.e., there is no confounder of $\tilde{\mathbf{X}}$ that does not have a respective measurement), and focus on the case where $\tilde{\mathbf{X}}$ is generated by a linear, non-Gaussian, acyclic model (LiNGAM [38], see §3.1). Note that here, w.l.o.g., the linear weights of $\{\tilde{X}_i \to X_i\}_{i=1}^n$ are assumed to be one (since we do not care about scaling). Generally, if observations are measured by $X_i = c_i \tilde{X}_i + E_i$ with weights $\{c_i\}_{i=1}^n$ not necessarily being one, all results in this paper still hold.
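To fix ideas, the following minimal sketch (not the paper's released code; all names and weight values are illustrative) simulates the random measurement error model of Equation (1) on a latent chain:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_chain(n_samples=2000, weights=(0.8, 0.9), noise_scale=0.5):
    """Latent chain X~1 -> X~2 -> ... (LiNGAM with uniform, hence non-Gaussian,
    noises), observed through X_i = X~_i + E_i as in Equation (1)."""
    n_vars = len(weights) + 1
    E_tilde = rng.uniform(-1, 1, size=(n_samples, n_vars))   # latent noises E~_i
    X_tilde = np.zeros_like(E_tilde)
    X_tilde[:, 0] = E_tilde[:, 0]
    for i, w in enumerate(weights):
        X_tilde[:, i + 1] = w * X_tilde[:, i] + E_tilde[:, i + 1]
    E = noise_scale * rng.uniform(-1, 1, size=(n_samples, n_vars))  # meas. errors
    return X_tilde + E, X_tilde   # observed X, latent X~

X, X_tilde = simulate_chain(weights=(0.8, 0.9, 0.7, 0.6))   # five variables
```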
The objective of causal discovery under measurement error is to recover the causal structure among the latent variables $\tilde{\mathbf{X}}$, denoted as $\tilde{G}$, a directed acyclic graph (DAG), from the contaminated observations $\mathbf{X}$. As illustrated by Figure 1, causal discovery methods that utilize (conditional) independence produce biased estimations (see Proposition 1 for details). SEM-based methods also typically fail to find correct directions, since the SEM for $\tilde{\mathbf{X}}$ usually does not hold on $\mathbf{X}$. The unobserved $\tilde{\mathbf{X}}$ are actually confounders of $\mathbf{X}$, and there exist approaches to deal with confounders, such as Fast Causal Inference (FCI [44]). However, they focus on the structure among the observed variables instead of the unobserved ones, which is what we aim to recover here. With interest in the latter, another line of research called causal discovery with latent variables has been developed, to which this paper also belongs. However, existing methods [41, 42, 47, 46, 1, 23, 50, 35] cannot be adopted either, since they typically require at least two measurements (indicators) for each latent variable, while we only have one for each here (which makes ours a more difficult task). Specifically on the measurement error problem, [16] proposes anchored causal inference in the binary setting. In the linear Gaussian setting, [53] presents identifiability conditions via factor analysis. A main difficulty here is the unknown variances of the measurement errors $\mathbf{E}$; otherwise, the covariance matrix of $\tilde{\mathbf{X}}$ could be obtained and readily used. To this end, [2] provides an upper bound on $\mathbf{E}$ and [34] develops a consistent partial-correlations estimator. In linear non-Gaussian settings (i.e., the setting of this paper), [54] shows that the ordered group decomposition of $\tilde{G}$, which contains major causal information, is identifiable. However, the corresponding method relies on over-complete independent component analysis (OICA [19]), which is notorious for suffering from local optima and high computational complexity [39, 18]. Hence, the identifiability results in [54], despite their theoretical correctness, are far from practically achievable.
The main contributions of this paper are as follows: 1) We define the Transformed Independent Noise (TIN) condition, which finds and checks for independence between a specific linear transformation (combination) of some variables and others. The existing Independent Noise (IN [40]) and Generalized Independent Noise (GIN [50]) conditions are special cases of TIN. 2) We provide graphical criteria for TIN, which might further improve the identifiability of causal discovery with latent variables. 3) We exploit TIN on a specific task, causal discovery under measurement error and LiNGAM, and identify the ordered group decomposition. This identifiability result once required the computationally and statistically ineffective OICA to achieve, while we achieve it merely by conducting independence tests. Evaluation on both synthetic and real-world data demonstrates the effectiveness of our method.
2 Motivation: Independence Condition and Structural Information

The example in Figure 1 illustrates how the (conditional) (in)dependence relations differ between the observed $\mathbf{X}$ and the latent $\tilde{\mathbf{X}}$, and thus lead to biased discovery results. To put it generally, we have:
Proposition 1 (rare d-separation). Suppose variables follow the random measurement error model defined in Equation (1). For disjoint sets of observed variables $\mathbf{Z}, \mathbf{Y}, \mathbf{S}$ and their respective latent ones $\tilde{\mathbf{Z}}, \tilde{\mathbf{Y}}, \tilde{\mathbf{S}}$, the d-separation $\mathbf{Z} \perp_d \mathbf{Y} \mid \mathbf{S}$ holds only when marginally $\tilde{\mathbf{Z}} \perp_d \tilde{\mathbf{Y}}$ and $\tilde{\mathbf{Z}} \perp_d \tilde{\mathbf{Y}} \mid \tilde{\mathbf{S}}$ hold.
By 'rare' we mean that the d-separation patterns among $\tilde{\mathbf{X}}$ usually do not hold among $\mathbf{X}$ (except for rare marginal ones), since the observed variables are not causes of any others (though the latent variables they intend to measure might be). For example, consider the underlying $\tilde{G}$ to be the chain structure (Figure 2a) or the fully connected DAG (Figure 2c). There exists no (conditional) independence on either, and the PC algorithm will output just a fully connected skeleton in both cases. Then, without (conditional) independence (which is non-parametric) to use directly, can we create independence by leveraging the parametric assumption (LiNGAM) and benefiting from non-Gaussianity?
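Before turning to that question, a minimal numeric illustration of the failure (a sketch reusing simulate_chain from above; the helper partial_corr is ours, not a prescribed procedure): the partial correlation of $X_1$ and $X_3$ given $X_2$ vanishes on the latent chain but not on the measured one, so a PC-style test sees a spurious edge.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing out z (1-D arrays)."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

X, X_tilde = simulate_chain(weights=(0.8, 0.9))  # three-variable chain
print(partial_corr(X_tilde[:, 0], X_tilde[:, 2], X_tilde[:, 1]))  # approx. 0
print(partial_corr(X[:, 0], X[:, 2], X[:, 1]))   # clearly nonzero
```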
Naturally we recall the Independent Noise (IN) condition proposed in Direct-LiNGAM [40]:
Definition 1 (IN condition). Let $Y_i$ be a single variable and $\mathbf{Z}$ be a set of variables. Suppose variables follow LiNGAM. We say $(\mathbf{Z}, Y_i)$ satisfies the IN condition, denoted by $\mathrm{IN}(\mathbf{Z}, Y_i)$, if and only if the residual of regressing $Y_i$ on $\mathbf{Z}$ is statistically independent of $\mathbf{Z}$. Mathematically, let $\tilde{\omega}$ be the vector of regression coefficients, i.e., $\tilde{\omega} := \mathrm{cov}(Y_i, \mathbf{Z})\,\mathrm{cov}(\mathbf{Z}, \mathbf{Z})^{-1}$; $\mathrm{IN}(\mathbf{Z}, Y_i)$ holds iff $Y_i - \tilde{\omega}^{\top}\mathbf{Z} \perp \mathbf{Z}$.
Here "cov" denotes the variance-covariance matrix. IN identifies exogenous (root) variables, based on which the causal ordering of the variables can be determined (Lemma 1 in [40]). However, IN cannot be applied to the measurement error model. With hidden confounders ($\tilde{\mathbf{X}}$) behind the observed $\mathbf{X}$, independence between regressor and residual typically does not exist for any regression among $\mathbf{X}$. In fact, $\mathbf{X}$ follows errors-in-variables models [15, 7], for which the identifiability w.r.t. $\tilde{G}$ is not clear.
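A minimal sketch of testing the IN condition, assuming distance correlation as a surrogate independence measure (the paper does not prescribe a particular test; in practice one would use a calibrated test such as HSIC or KCI):

```python
import numpy as np

def dist_corr(x, y):
    """Biased sample distance correlation of two 1-D arrays: near 0 under
    independence (up to sampling noise). O(n^2) memory, so keep n moderate."""
    def centered(a):
        d = np.abs(a[:, None] - a[None, :])
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = max((A * B).mean(), 0.0)
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

def in_condition(Z, y, thresh=0.1):
    """IN(Z, y) of Definition 1: the OLS residual of y on the columns of Z
    should be independent of Z (small distance correlation to every column)."""
    Zc, yc = Z - Z.mean(axis=0), y - y.mean()
    omega = np.linalg.lstsq(Zc, yc, rcond=None)[0]
    resid = yc - Zc @ omega
    return all(dist_corr(resid, Zc[:, j]) < thresh for j in range(Z.shape[1]))

# With measurement error, IN fails even where it would hold on the latent chain,
# e.g. in_condition(X[:, [0]], X[:, 1]). Note the residual-regressor dependence
# is higher-order only (the residual is uncorrelated with Z by construction),
# so detecting it reliably needs a calibrated test and enough samples; the
# fixed threshold here is only a crude stand-in.
```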
However, we might still benefit from this idea to leverage the non-Gaussianity of exogenous noises. Consider the Figure 1 example and abstract it to $\tilde{X}_1 \xrightarrow{a} \tilde{X}_2 \xrightarrow{b} \tilde{X}_3$ with $\{\tilde{X}_i \xrightarrow{1} X_i\}_{i=1}^3$. Although IN does not hold on any of $\mathbf{X}$, interestingly, there exists a linear transformation of observations, $bX_2 - X_3$, which contains only $\{\tilde{E}_3, E_2, E_3\}$ ($\tilde{E}_1$ and $\tilde{E}_2$ are cancelled out) and shares no common non-Gaussian noise term with $X_1$. Hence, by the Darmois–Skitovich theorem [21], $bX_2 - X_3$ is independent of $X_1$. This finding is echoed in the Generalized Independent Noise (GIN [50]) condition:
Definition 2 (GIN condition). Let $\mathbf{Z}$ and $\mathbf{Y}$ be two sets of random variables that follow LiNGAM. We say $(\mathbf{Z}, \mathbf{Y})$ satisfies the GIN condition, denoted by $\mathrm{GIN}(\mathbf{Z}, \mathbf{Y})$, if and only if the following two conditions are satisfied: 1) there exist nonzero solution vectors $\omega \in \mathbb{R}^{|\mathbf{Y}|}$ to the equation $\mathrm{cov}(\mathbf{Z}, \mathbf{Y})\,\omega = 0$, and 2) any such solution $\omega$ makes the linear transformation $\omega^{\top}\mathbf{Y}$ independent of $\mathbf{Z}$.
Here $|\mathbf{Y}|$ denotes the dimensionality of $\mathbf{Y}$. The intuition behind GIN is that, despite there being no independent residual from ordinary regression, it is possible to realize independent "pseudo-residuals" [5] by regressing with "reference variables". [50] shows IN to be a special case of GIN (Proposition 2) and further gives graphical criteria (Theorem 2), based on which a recursive learning algorithm is developed to solve the latent-variable problem. Each latent variable is required to have at least two measurements. Interestingly, we find that, in measurement error models, if each latent variable $\tilde{X}_i$ has two measurements $X_{i1}, X_{i2}$, then the GIN condition can readily be used to fully identify the structure of $\tilde{G}$, which is already a breakthrough over existing methods [42, 47, 46, 23]. See Appendix B for the whole procedure. With regard to our more challenging task, where each $\tilde{X}_i$ has only one measurement $X_i$, a natural question is: can GIN also help? Given the example in Figure 1 illustrated above, the answer seems to be affirmative: $\mathrm{GIN}(\{X_i\}, \mathbf{X}\setminus\{X_i\})$ only holds for $i = 1$, so the root can be identified. More generally:
[Figure 2: Graph structure $\tilde{G}$ examples: (a) a chain over $\tilde{X}_1, \cdots, \tilde{X}_5$ with edge weights $a, b, c, d$; (b) a chain with triangular head over $\tilde{X}_1, \cdots, \tilde{X}_5$ with edge weights $a, b, c, d, e$; (c) a fully connected DAG over $\tilde{X}_1, \cdots, \tilde{X}_4$ with edge weights $a, \cdots, f$; (d) a graph over $\tilde{X}_1, \cdots, \tilde{X}_5$. For simplicity, the observed $\mathbf{X}$ and measurement edges are omitted.]
Example 1 (GIN on chain structure). Consider cases where the underlying graph $\tilde{G}$ is a chain structure with $n$ ($n \ge 3$) vertices and directed edges $\{\tilde{X}_i \to \tilde{X}_{i+1}\}_{i=1}^{n-1}$. Figure 2a is an example with $n = 5$. We find that $\mathrm{GIN}(\mathbf{Z} = X_1, \mathbf{Y} = X_{2,3,4,5})$ holds (where $X_{2,3,4,5}$ denotes $\{X_2, X_3, X_4, X_5\}$; same below), with solution $\omega = [-bx - bcy - bcdz, x, y, z]^{\top}$, $x, y, z \in \mathbb{R}$. $\omega^{\top}\mathbf{Y}$ cancels the noise components in $\mathbf{Y}$ that are also shared by $\mathbf{Z}$, and thus $\omega^{\top}\mathbf{Y} \perp \mathbf{Z}$. However, $\mathrm{GIN}(X_i, \mathbf{X}\setminus\{X_i\})$ is violated for any other $i \neq 1$ (see Example 12 for a detailed derivation). With this asymmetry, the latent root $\tilde{X}_1$ can be identified. Furthermore, $\mathrm{GIN}(X_i, X_{i+1,\cdots,n})$ holds for any $i = 1, \cdots, n-2$.
Example 1 might give us the intuition that, by recursively testing GIN (over the newly found sub-root and the remaining variables), we could identify the causal ordering of the first $n-2$ variables for any DAG. However, this is over-optimistic: the success owes to the sparsity of the chain structure. Consider a denser structure:
Example 2 (GIN on fully connected DAG). Consider cases where $\tilde{G}$ is a fully connected DAG with $n$ ($n \ge 3$) vertices and directed edges $\{\tilde{X}_i \to \tilde{X}_j\}$ for every $i < j$. Figure 2c is an example with $n = 4$. We first find that $\mathrm{GIN}(X_1, X_{2,3,4})$ holds. However, in contrast to the chain structure, $\mathrm{GIN}(X_2, X_{3,4})$ does not hold: there is no way to cancel both $\tilde{E}_{1,2}$ from $X_{3,4}$. More generally, we have $\mathrm{GIN}(X_1, X_{2,\cdots,n}), \mathrm{GIN}(X_{1,2}, X_{3,\cdots,n}), \cdots, \mathrm{GIN}(X_{1,\cdots,k}, X_{k+1,\cdots,n})$, with $k = \lfloor (n-1)/2 \rfloor$.
Since a fully connected DAG is the densest extreme, Example 2 might give us the intuition that GIN can identify at least the causal ordering of the first half of the variables. Unfortunately, this is still over-optimistic, since we cannot know beforehand the structure type of $\tilde{G}$.
Example 3 (GIN on chain structure with triangular head). Figure 2b shows a variation of the chain structure, with edges $\tilde{X}_1 \to \tilde{X}_2$, $\tilde{X}_1 \to \tilde{X}_3$, $\tilde{X}_3 \to \tilde{X}_2$, and $\{\tilde{X}_i \to \tilde{X}_{i+1}\}_{i=3}^{n-1}$. We name it the "chain structure with triangular head". Interestingly, $\mathrm{GIN}(X_2, X_{3,\cdots,n})$, which is satisfied on Figure 2a, is also satisfied here. E.g., in Figure 2b ($n = 5$), the solution is $\omega = [-dx - dey, x, y]^{\top}$, $x, y \in \mathbb{R}$: $\omega^{\top}\mathbf{Y}$ cancels $\tilde{E}_{1,3}$ from $\mathbf{Y}$, and thus $\omega^{\top}\mathbf{Y} \perp \mathbf{Z}$. Actually, the chain structure with triangular head $\tilde{G}_{OC}$ is unidentifiable from the chain structure $\tilde{G}_C$ w.r.t. GIN conditions, i.e., for any two sets of observed variables $\mathbf{Z}, \mathbf{Y} \subseteq \mathbf{X}$, $\mathrm{GIN}(\mathbf{Z}, \mathbf{Y})$ holds on $\tilde{G}_{OC}$ if and only if $\mathrm{GIN}(\mathbf{Z}, \mathbf{Y})$ holds on $\tilde{G}_C$. Consequently, if one directly uses any recursive algorithm based on GIN as in Example 1, the output causal ordering would still be $\tilde{X}_1, \tilde{X}_2, \tilde{X}_3, \cdots$, which is incorrect due to $\tilde{X}_3 \to \tilde{X}_2$ in the triangular head.
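As a numeric check of this unidentifiability (a sketch reusing gin_holds and dist_corr from above; simulate_from_A and the weight values are illustrative):

```python
import numpy as np

def simulate_from_A(A, n_samples=2000, noise_scale=0.5, seed=1):
    """Measured LiNGAM: latent X~ = (I - A)^{-1} E~ with uniform noises,
    observed as X = X~ + E (Equation (1))."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    E_tilde = rng.uniform(-1, 1, size=(n_samples, n))
    X_tilde = E_tilde @ np.linalg.inv(np.eye(n) - A).T
    return X_tilde + noise_scale * rng.uniform(-1, 1, size=(n_samples, n))

# Chain with triangular head (Figure 2b): X~1->X~2 (a), X~3->X~2 (b),
# X~1->X~3 (c), X~3->X~4 (d), X~4->X~5 (e).
a, b, c, d, e = 0.8, 0.9, 0.7, 0.6, 0.5
A_oc = np.zeros((5, 5))
A_oc[1, 0], A_oc[1, 2], A_oc[2, 0], A_oc[3, 2], A_oc[4, 3] = a, b, c, d, e
X_oc = simulate_from_A(A_oc)
print(gin_holds(X_oc[:, [1]], X_oc[:, 2:]))  # True here too: GIN cannot tell
                                             # this graph from the plain chain
```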
The above examples show that using GIN for the one-measurement model is not as simple as it seems: only part of the causal ordering can be identified, and worse yet, rather complicated error correction is needed to deal with possibly incorrect orderings. However, after a closer look at the unidentifiable examples above, we find that more information can actually be uncovered beyond the GIN condition:
Example 4 (Asymmetry beyond GIN). 1) Consider Example 2, where only the root variable $\tilde{X}_1$ can be identified by GIN and $\tilde{X}_{2,3,4}$ are unidentifiable, i.e., any permutation of the labeling of $X_{2,3,4}$ will still preserve the GIN condition over any two subsets $\mathbf{Z}, \mathbf{Y}$. However, there actually exists an asymmetry between $X_2$ and $X_{3,4}$: we can construct a linear transformation of $X_{1,3,4}$, namely $\frac{cd-be}{d}X_1 + \frac{df+e}{d}X_3 - X_4$, s.t. it is independent of $X_2$, while there exists no such linear transformation of $X_{1,2,4}$ that is independent of $X_3$, and likewise none of $X_{1,2,3}$ for $X_4$. 2) Consider Examples 1 and 3, where $\tilde{G}_C$ and $\tilde{G}_{OC}$ are unidentifiable w.r.t. GIN conditions. Let $\mathbf{Z} := X_4$ and $\mathbf{Y} := X_{1,2,3}$; $\mathrm{GIN}(\mathbf{Z}, \mathbf{Y})$ is violated on both graphs. However, an asymmetry actually exists: on $\tilde{G}_{OC}$, we can construct $aX_1 - X_2 + bX_3$ (which cancels $\tilde{E}_{1,3}$) to be independent of $X_4$, while this is impossible on $\tilde{G}_C$.
Put simply, the motivation of independent "pseudo-residuals" behind GIN actually limits the power of non-Gaussianity, with the coefficient vector $\omega$ characterized only from the variance-covariance matrix (2nd-order statistics). There are actually two cases in which $\mathrm{GIN}(\mathbf{Z}, \mathbf{Y})$ is violated: 1) though not every solution $\omega$ makes $\omega^{\top}\mathbf{Y} \perp \mathbf{Z}$, there exists a non-zero $\omega$ s.t. $\omega^{\top}\mathbf{Y} \perp \mathbf{Z}$; and 2) there naturally exists no non-zero $\omega$ s.t. $\omega^{\top}\mathbf{Y} \perp \mathbf{Z}$. The original GIN cannot distinguish between these two cases. Hence, in this paper, we first aim to distinguish between the two, generalizing the GIN condition to the TIN condition.
3 With Transformed Independent Noise Condition
In the above discussion, one can see that the presence of measurement error affects the conditional independence relations among the variables and the independent noise condition. However, we will show in Theorem 3 (§4) that a specific type of independence condition is shared between the underlying error-free variables and the measured variables with error. In this section, we formulate such independence conditions and investigate their graphical implications for error-free variables (i.e., variables generated by the LiNGAM without measurement error). In §4, we will then extend the results to the measured variables. Please note that, in contrast to other sections, the notation used in this section, including $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{S}$, denotes error-free variables generated by the LiNGAM.
3.1 Notations
Let $G$ be a directed acyclic graph with vertex set $V(G) = [n] := \{1, 2, \cdots, n\}$ and edge set $E(G)$. A directed path $P = (i_0, i_1, \cdots, i_k)$ in $G$ is a sequence of vertices of $G$ where there is a directed edge from $i_j$ to $i_{j+1}$ for any $0 \le j \le k-1$. We use the notation $i \rightsquigarrow j$ to denote that there exists a directed path from vertex $i$ to $j$. Note that a single vertex is also a directed path, i.e., $i \rightsquigarrow i$ holds. Let $\mathbf{Z} \subseteq [n]$ be a subset of vertices. Define ancestors $\mathrm{Anc}(\mathbf{Z}) := \{j \mid \exists i \in \mathbf{Z}, j \rightsquigarrow i\}$. Note that $\mathbf{Z} \subseteq \mathrm{Anc}(\mathbf{Z})$. Further, let $\mathbf{S}$ be a subset of vertices. We use the notation $i \rightsquigarrow_{[\mathbf{S}]} j$ to denote that there exists a directed path from vertex $i$ to $j$ without passing through $\mathbf{S}$, i.e., there exists a directed path $P = (i, m_0, \cdots, m_k, j)$ in $G$ s.t. $i, j \notin \mathbf{S}$ and $m_l \notin \mathbf{S}$ for any $0 \le l \le k$. Define ancestors outside $\mathbf{S}$ accordingly: for two vertex sets $\mathbf{Y}, \mathbf{S}$, denote the ancestors of $\mathbf{Y}$ that have directed paths into $\mathbf{Y}$ without passing through $\mathbf{S}$ as $\mathrm{Anc}_{\mathrm{out}(\mathbf{S})}(\mathbf{Y}) := \{j \mid \exists i \in \mathbf{Y}, j \rightsquigarrow_{[\mathbf{S}]} i\}$. Note that the graphical definitions here can also be translated into the trek [47] language (see Appendix C for details).
Assume random variables $\mathbf{X} := \{X_i\}_{i \in [n]}$ are generated by a LiNGAM w.r.t. graph $G$, i.e.,

$$\mathbf{X} = \mathbf{A}\mathbf{X} + \mathbf{E} = \mathbf{B}\mathbf{E}, \quad \text{with } \mathbf{B} = (\mathbf{I} - \mathbf{A})^{-1}, \qquad (2)$$

where $\mathbf{E} = \{E_i\}_{i \in [n]}$ are the corresponding mutually independent exogenous noises. $\mathbf{A}$ is the adjacency matrix, where entry $\mathbf{A}_{j,i}$ is the linear weight of the direct causal effect of variable $X_i$ on $X_j$; $\mathbf{A}_{j,i} \neq 0$ if and only if there exists an edge $i \to j$. $\mathbf{X}$ can also be written directly as a mixture of exogenous noises, $\mathbf{X} = \mathbf{B}\mathbf{E}$. If the entry of the mixing matrix $\mathbf{B}_{j,i} \neq 0$, then $i \in \mathrm{Anc}(\{j\})$. Note that here and in what follows, we use boldface letters $\mathbf{A}, \mathbf{B}$ to denote matrices, and use boldface letters $\mathbf{S}, \mathbf{W}, \mathbf{X}, \mathbf{Y}, \mathbf{Z}$ with notation abuse: they can denote a set of vertices, the respective set of random variables, or a random vector. When we say "two variable sets $\mathbf{Z}, \mathbf{Y}$", if not otherwise specified, $\mathbf{Z}, \mathbf{Y}$ need not be disjoint.
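A small sketch of these graphical notions, assuming the DAG is given by its weighted adjacency matrix A (with A[j, i] != 0 iff edge i -> j; 0-based indices):

```python
import numpy as np

def ancestors(A, Z, S=()):
    """Anc_out(S)(Z): vertices j with a directed path j ~> i into some i in Z
    that avoids S; with S empty this is plain Anc(Z). Each vertex is a
    (trivial) path to itself, so the vertices of Z outside S are included."""
    anc = set(Z) - set(S)
    stack = list(anc)
    while stack:
        j = stack.pop()
        for i in np.nonzero(A[j, :])[0]:   # parents i of j (edge i -> j)
            if i not in anc and i not in S:
                anc.add(int(i))
                stack.append(int(i))
    return anc

def mixing_matrix(A):
    """B = (I - A)^{-1} of Equation (2); B[j, i] != 0 implies i in Anc({j})."""
    return np.linalg.inv(np.eye(A.shape[0]) - A)
```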
3.2 Independent Linear Transformation Subspace and its Characterization
We first give the definition and characterization of the independent linear transformation subspace.
Definition 3 (Independent linear transformation subspace). Let $\mathbf{Z}$ and $\mathbf{Y}$ be two subsets of random variables. Suppose the variables follow the linear non-Gaussian acyclic causal model. Denote:

$$\Omega_{\mathbf{Z};\mathbf{Y}} := \{\omega \in \mathbb{R}^{|\mathbf{Y}|} \mid \omega^{\top}\mathbf{Y} \perp \mathbf{Z}\}. \qquad (3)$$

By the property of independence, $\Omega_{\mathbf{Z};\mathbf{Y}}$ is closed under scalar multiplication and addition, and is thus a subspace of $\mathbb{R}^{|\mathbf{Y}|}$. In fact, $\Omega_{\mathbf{Z};\mathbf{Y}}$ can be characterized as a nullspace as follows:
Theorem 1 (Characterization of $\Omega_{\mathbf{Z};\mathbf{Y}}$). For two variable subsets $\mathbf{Z}$ and $\mathbf{Y}$, $\Omega_{\mathbf{Z};\mathbf{Y}}$ satisfies:

$$\Omega_{\mathbf{Z};\mathbf{Y}} = \mathrm{null}\big(\mathbf{B}^{\top}_{\mathbf{Y}, \mathrm{nzcol}(\mathbf{B}_{\mathbf{Z},:})}\big), \qquad (4)$$

where $\mathrm{null}(\cdot)$ denotes the nullspace. $\mathbf{B}_{\mathbf{Y}, \mathrm{nzcol}(\mathbf{B}_{\mathbf{Z},:})}$ denotes the submatrix of the mixing matrix $\mathbf{B}$ with rows indexed by $\mathbf{Y}$ and columns indexed by $\mathrm{nzcol}(\mathbf{B}_{\mathbf{Z},:})$, and $\mathrm{nzcol}(\mathbf{B}_{\mathbf{Z},:})$ denotes the column indices where the submatrix $\mathbf{B}_{\mathbf{Z},:}$ has non-zero entries. $\mathrm{nzcol}(\mathbf{B}_{\mathbf{Z},:})$ corresponds exactly to the exogenous noises that constitute $\mathbf{Z}$. In particular, if we assume "if $i \rightsquigarrow j$ then $\mathbf{B}_{j,i} \neq 0$", then $\mathrm{nzcol}(\mathbf{B}_{\mathbf{Z},:}) = \mathrm{Anc}(\mathbf{Z})$.
The proof of Theorem 1 is straightforward by the Darmois–Skitovich theorem [21]: the linear transformation $\omega^{\top}\mathbf{Y} \perp \mathbf{Z}$ if and only if $\omega^{\top}\mathbf{Y}$ shares no common non-Gaussian exogenous noise component with $\mathbf{Z}$.
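Given (oracle) access to the mixing matrix $\mathbf{B}$, Theorem 1's characterization is a one-line nullspace computation; a sketch (0-based indices; the chain weights are illustrative):

```python
import numpy as np
from scipy.linalg import null_space

def omega_basis(B, Z, Y, tol=1e-9):
    """Basis (as columns) of Omega_{Z;Y} = null(B_{Y, nzcol(B_{Z,:})}^T)."""
    nz = np.nonzero(np.any(np.abs(B[Z, :]) > tol, axis=0))[0]  # nzcol(B_{Z,:})
    return null_space(B[np.ix_(Y, nz)].T)

# Chain of Figure 2a (left matrix of Equation (5)) with weights a, b, c, d.
a, b, c, d = 0.8, 0.9, 0.7, 0.6
A = np.zeros((5, 5))
A[1, 0], A[2, 1], A[3, 2], A[4, 3] = a, b, c, d
B = np.linalg.inv(np.eye(5) - A)
W = omega_basis(B, Z=[0], Y=[1, 2, 3, 4])    # Omega_{X1; X_{2,3,4,5}}
print(W.shape[1])  # 3: matching the three-parameter omega family of Example 1
```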
Example 5 (Revisiting the examples in §2 from the $\Omega_{\mathbf{Z};\mathbf{Y}}$ perspective). For illustration, we now revisit the examples in §2 from the perspective of the independent linear transformation subspace.

$$\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ a & 1 & 0 & 0 & 0 \\ ab & b & 1 & 0 & 0 \\ abc & bc & c & 1 & 0 \\ abcd & bcd & cd & d & 1 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ a+bc & 1 & b & 0 & 0 \\ c & 0 & 1 & 0 & 0 \\ cd & 0 & d & 1 & 0 \\ cde & 0 & de & e & 1 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 & 0 & 0 \\ a & 1 & 0 & 0 \\ ad+b & d & 1 & 0 \\ a(df+e)+bf+c & df+e & f & 1 \end{bmatrix} \qquad (5)$$

Equation (5) shows the corresponding mixing matrices $\mathbf{B}$ for the graphs $\tilde{G}$ in Figures 2a to 2c, respectively. Suppose we have access to the underlying variables $\tilde{X}_i$ and focus only on $\tilde{G}$. In what follows, subscripted index sets denote submatrices of these $\mathbf{B}$. 1) For the fully connected DAG (Figure 2c, the right matrix), to identify the root $\tilde{X}_1$, $\mathrm{GIN}(\tilde{X}_1, \tilde{X}_{2,3,4})$ is satisfied, corresponding to $\Omega_{\mathbf{Z};\mathbf{Y}} = \mathrm{null}(\mathbf{B}^{\top}_{\{2,3,4\},\{1\}})$. For $(\mathbf{Z}, \mathbf{Y}) := (\tilde{X}_2, \tilde{X}_{3,4})$ or $(\tilde{X}_4, \tilde{X}_{1,2,3})$, there exists no non-zero $\omega$ s.t. $\omega^{\top}\mathbf{Y} \perp \mathbf{Z}$, because the submatrices $\mathbf{B}_{\{3,4\},\{1,2\}}$ and $\mathbf{B}_{\{1,2,3\},\{1,2,3,4\}}$ have full row rank. However, if we set $(\mathbf{Z}, \mathbf{Y}) := (\tilde{X}_2, \tilde{X}_{1,3,4})$, we actually have $\tilde{X}_2 \perp \frac{cd-be}{d}\tilde{X}_1 + \frac{df+e}{d}\tilde{X}_3 - \tilde{X}_4$, because the stacked submatrix $\mathbf{B}_{\{1,3,4\},\{1,2\}}$ has rank $2 < 3$; GIN is nevertheless still violated, because $\mathrm{cov}(\mathbf{Z}, \mathbf{Y})$ has rank $1 < 2$, so its nullspace strictly contains $\Omega_{\mathbf{Z};\mathbf{Y}}$ and condition 2) of Definition 2 fails. 2) For the chain structure $\tilde{G}_C$ (Figure 2a,