
Finding and Listing Front-door Adjustment Sets

Hyunchai Jeong

Purdue University

jeong3@purdue.edu

Jin Tian

Iowa State University

jtian@iastate.edu

Elias Bareinboim

Columbia University

eb@cs.columbia.edu

Abstract

Identifying the effects of new interventions from data is a significant challenge found across a wide range of the empirical sciences. A well-known strategy for identifying such effects is Pearl’s front-door (FD) criterion [ ]. The definition of the FD criterion is declarative, only allowing one to decide whether a specific set satisfies the criterion. In this paper, we present algorithms for finding and enumerating possible sets satisfying the FD criterion in a given causal diagram. These results are useful in facilitating the practical applications of the FD criterion for causal effects estimation and helping scientists to select estimands with desired properties, e.g., based on cost, feasibility of measurement, or statistical power.

1 Introduction

Learning cause and effect relationships is a fundamental challenge across data-driven fields. For example, health scientists developing a treatment for curing lung cancer need to understand how a new drug affects the patient’s body and the tumor’s progression. The distillation of causal relations is indispensable to understanding the dynamics of the underlying system and how to perform decision-making in a principled and systematic fashion [27, 37, 2, 30].

One of the most common methods for learning causal relations is through Randomized Controlled Trials (RCTs, for short) [ ]. RCTs are considered the “gold standard” in many fields of empirical research and are used throughout the health and social sciences as well as machine learning and AI. In practice, however, RCTs are often hard to perform due to ethical, financial, and technical issues. For instance, it may be unethical to submit an individual to a certain condition if that condition may have potentially negative effects (e.g., smoking). Whenever RCTs cannot be conducted, one needs to resort to analytical methods to infer causal relations from observational data, a task that appears in the literature as the problem of causal effect identification [26, 27].

The causal identification problem asks whether the effect of holding a variable $X$ at a constant value $x$ on a variable $Y$, written as $P(Y \mid do(X = x))$, or $P(Y \mid do(x))$, can be computed from a combination of observational data and causal assumptions. One of the most common ways of eliciting these assumptions is in the form of a causal diagram represented by a directed acyclic graph (DAG), whose nodes and edges describe the underlying data generating process. For instance, in Fig. 1a, three nodes $X, Z, Y$ represent variables, a directed edge $X \to Z$ indicates that $X$ causes $Z$, and a dashed-bidirected edge $X \leftrightarrow Y$ represents that $X$ and $Y$ are confounded by unmeasured (latent) factors.

Different methods can solve the identification problem and a number of its generalizations, including Pearl’s celebrated do-calculus [26] as well as different algorithmic solutions [40, 34, 12, 1, 23, 24]. In practice, researchers often rely on identification strategies that generate well-known identification formulas. Arguably one of the most popular strategies is identification by covariate adjustment. Whenever a set $\mathbf{Z}$ satisfies the back-door (BD) criterion [26] relative to the pair $X$ and $Y$, where $X$ and $Y$ represent the treatment and outcome variables, respectively, the causal effect $P(Y \mid do(x))$ can be evaluated through the BD adjustment formula $\sum_{\mathbf{z}} P(y \mid x, \mathbf{z}) P(\mathbf{z})$.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

arXiv:2210.05816v2 [stat.ME] 14 Oct 2022

Despite the popularity of the covariate adjustment technique for estimating causal effects, there are still settings in which no BD admissible set exists. For example, consider the causal diagram $G$ in Fig. 1a. There clearly exists no set that blocks the BD path from $X$ to $Y$ through the bidirected arrow $X \leftrightarrow Y$. One may surmise that this effect is not identifiable and that the only way of evaluating the interventional distribution is through experimentation. Still, this is not the case. The effect $P(Y \mid do(x))$ is identifiable from $G$ and the observed distribution $P(x, y, z)$ over $\{X, Y, Z\}$ by another classic identification strategy known as the front-door (FD) criterion [ ]. In particular, the following FD adjustment formula provides a way of evaluating the interventional distribution:

$$P(y \mid do(x)) = \sum_{z} P(z \mid x) \sum_{x'} P(y \mid x', z) P(x'). \tag{1}$$
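As a numerical sanity check on the FD adjustment formula in Eq. (1), the sketch below builds a small binary model compatible with Fig. 1a (a latent variable affecting both $X$ and $Y$, and a chain from $X$ through $Z$ to $Y$) and compares the FD estimand, computed only from the observational joint, with the interventional distribution computed from the model itself. All probability values and function names are hypothetical and chosen purely for illustration.

```python
from itertools import product

# Hypothetical parameters of a binary SCM compatible with Fig. 1a:
# U -> X and U -> Y (U is latent), X -> Z, Z -> Y.
P_U = {0: 0.6, 1: 0.4}
P_X1_GIVEN_U = {0: 0.2, 1: 0.7}               # P(X=1 | u)
P_Z1_GIVEN_X = {0: 0.1, 1: 0.8}               # P(Z=1 | x)
P_Y1_GIVEN_ZU = {(0, 0): 0.1, (0, 1): 0.5,    # P(Y=1 | z, u)
                 (1, 0): 0.6, (1, 1): 0.9}


def bern(p1, value):
    """Probability that a Bernoulli(p1) variable equals `value`."""
    return p1 if value == 1 else 1.0 - p1


# Observational joint P(x, y, z), with the latent U summed out.
joint = {}
for x, y, z in product((0, 1), repeat=3):
    joint[(x, y, z)] = sum(
        P_U[u] * bern(P_X1_GIVEN_U[u], x) * bern(P_Z1_GIVEN_X[x], z)
        * bern(P_Y1_GIVEN_ZU[(z, u)], y)
        for u in (0, 1)
    )


def marginal(**fixed):
    """Sum the observational joint over all entries matching `fixed`."""
    return sum(
        p for (x, y, z), p in joint.items()
        if all({"x": x, "y": y, "z": z}[k] == v for k, v in fixed.items())
    )


def front_door(y, x):
    """Eq. (1): sum_z P(z|x) sum_x' P(y|x',z) P(x'), from observed data only."""
    total = 0.0
    for z in (0, 1):
        p_z_given_x = marginal(x=x, z=z) / marginal(x=x)
        inner = sum(
            marginal(x=xp, y=y, z=z) / marginal(x=xp, z=z) * marginal(x=xp)
            for xp in (0, 1)
        )
        total += p_z_given_x * inner
    return total


def truth(y, x):
    """P(y | do(x)) computed from the model, with X held fixed at x."""
    return sum(
        P_U[u] * bern(P_Z1_GIVEN_X[x], z) * bern(P_Y1_GIVEN_ZU[(z, u)], y)
        for u in (0, 1) for z in (0, 1)
    )


def naive(y, x):
    """Plain conditional P(y | x), which ignores the confounding."""
    return marginal(x=x, y=y) / marginal(x=x)
```

Under these assumed parameters, `front_door(y, x)` agrees with `truth(y, x)` for every pair of values, whereas `naive(1, 1)` does not, which is the bias introduced by the latent confounder.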

We refer to Pearl and Mackenzie [28, Sec. 3.4] for an interesting account of the history of the FD criterion, which was the first graphical generalization of the BD case. The FD criterion has drawn increasing attention in recent years. For applications of the FD criterion, see, e.g., Hünermund and Bareinboim [13] and Glynn and Kashin [10]. Statistically efficient and doubly robust estimators have recently been developed for estimating the FD estimand in Eq. (1) from finite samples [ ]; such estimators are still elusive for arbitrary estimands identifiable in a diagram, despite recent progress [ ].

Figure 1: (a) $G$: a canonical example of the FD criterion, where $\{Z\}$ satisfies the FD criterion relative to $(\{X\}, \{Y\})$. (b) $G'$: four FD adjustment sets relative to $(\{X\}, \{Y\})$ are available: $\{A\}$, $\{A, B\}$, $\{A, C\}$, and $\{A, B, C\}$.

Both the BD and FD criteria are only descriptive, i.e., they specify whether a specific set $\mathbf{Z}$ satisfies the criteria or not, but do not provide a way to find an admissible set $\mathbf{Z}$. In addition, in many situations, multiple adjustment sets may exist. Consider, for example, the causal diagram in Fig. 1b and the task of identifying the effect of $X$ on $Y$. The distribution $P(Y \mid do(x))$ can indeed be identified by the FD criterion with the set $\mathbf{Z} = \{A, B, C\}$, given by the expression in Eq. (1) (with $Z$ replaced by $\{A, B, C\}$). Still, what if the variable $B$ is costly to measure or encodes personal information about patients that should not be shared due to ethical concerns? In this case, the set $\mathbf{Z} = \{A, C\}$ also satisfies the FD criterion and may be used. Even when both $B$ and $C$ are unmeasured, the set $\mathbf{Z} = \{A\}$ is still FD admissible.

This simple example shows that a target effect can be estimated using different adjustment sets, leading to different probability expressions over different sets of variables, which has important practical implications. Each variable poses different practical challenges in terms of measurement, such as cost, availability, and privacy. Each estimand has different statistical properties in terms of sample complexity and variance, which may play a key role in the study design [ ]. Algorithms for finding and listing all possible adjustment sets are hence very useful in practice, as they allow scientists to select an adjustment set that exhibits desirable properties. Indeed, algorithms have been developed in recent years for finding one or listing all BD admissible sets [ ]. However, no such algorithm is currently available for finding or listing FD admissible sets.

The goal of this paper is to close this gap to facilitate the practical applications of the FD criterion for causal effects estimation and help scientists to select estimands with certain desired properties.¹ Specifically, the contributions of this paper are as follows:

• We develop an algorithm that finds an admissible front-door adjustment set $\mathbf{Z}$ in a given causal diagram in polynomial time (if one exists). We solve a variant of the problem that imposes the constraint $\mathbf{I} \subseteq \mathbf{Z} \subseteq \mathbf{R}$ for given sets $\mathbf{I}$ and $\mathbf{R}$, which allows a scientist to constrain the search to include specific subsets of variables or to exclude variables from the search, perhaps due to cost, availability, or other technical considerations.

• We develop a sound and complete algorithm that enumerates all front-door adjustment sets with polynomial delay: the algorithm takes a polynomial amount of time to return each new admissible set, if one exists, or to return failure once it has exhausted all admissible sets.

¹ Code is available at https://github.com/CausalAILab/FrontdoorAdjustmentSets.

2 Preliminaries

Notation. We write a variable in capital letters ($X$) and its value as small letters ($x$). Bold letters, $\mathbf{X}$ or $\mathbf{x}$, represent a set of variables or values. We use kinship terminology to denote various relationships in a graph $G$ and denote the parents, ancestors, and descendants of $\mathbf{X}$ (including $\mathbf{X}$ itself) as $Pa(\mathbf{X})$, $An(\mathbf{X})$, and $De(\mathbf{X})$, respectively. Given a graph $G$ over a set of variables $\mathbf{V}$, a subgraph $G_{\mathbf{X}}$ consists of a subset of variables $\mathbf{X} \subseteq \mathbf{V}$ and their incident edges in $G$. A graph $G$ can be transformed: $G_{\overline{\mathbf{X}}}$ is the graph resulting from removing all incoming edges to $\mathbf{X}$, and $G_{\underline{\mathbf{X}}}$ is the graph with all outgoing edges from $\mathbf{X}$ removed. A DAG $G$ may be moralized into an undirected graph where all directed edges of $G$ are converted into undirected edges, and for every pair of nonadjacent nodes in $G$ that share a common child, an undirected edge that connects such pair is added [22].

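A path $\pi$ from a node $X$ to a node $Y$ is a sequence of edges where $X$ and $Y$ are the endpoints of $\pi$. A node $W$ on $\pi$ is said to be a collider if $W$ has converging arrows into $W$ in $\pi$, e.g., $\to W \leftarrow$ or $\leftrightarrow W \leftarrow$. The path $\pi$ is said to be blocked by a set $\mathbf{Z}$ if there exists a node $W$ on $\pi$ satisfying one of the following two conditions: 1) $W$ is a collider, and neither $W$ nor any of its descendants are in $\mathbf{Z}$, or 2) $W$ is not a collider, and $W$ is in $\mathbf{Z}$ [ ]. Given three disjoint sets $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ in $G$, $\mathbf{Z}$ is said to $d$-separate $\mathbf{X}$ from $\mathbf{Y}$ in $G$ if and only if $\mathbf{Z}$ blocks every path from a node in $\mathbf{X}$ to a node in $\mathbf{Y}$ according to the d-separation criterion [25], and we say that $\mathbf{Z}$ is a separator of $\mathbf{X}$ and $\mathbf{Y}$ in $G$.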
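The blocking conditions above can be checked mechanically. One classical route uses the moralization construction from the notation paragraph: restrict the graph to the ancestors of the three sets, moralize it, delete the conditioning set, and test whether the remaining undirected graph still connects the two sides. The following is a minimal Python sketch under assumptions of our own: the edge-list encoding, the function names, and the representation of each bidirected edge as an explicit latent parent are illustrative choices, and this is not the TESTSEP implementation used in the paper.

```python
from itertools import combinations


def parents_map(directed, bidirected):
    """Parent sets of a DAG in which each bidirected edge a <-> b is
    replaced by an explicit latent node with arrows into a and b."""
    pa = {}
    for a, b in directed:
        pa.setdefault(a, set())
        pa.setdefault(b, set()).add(a)
    for a, b in bidirected:
        latent = ("U", a, b)
        pa[latent] = set()
        pa.setdefault(a, set()).add(latent)
        pa.setdefault(b, set()).add(latent)
    return pa


def ancestors(pa, nodes):
    """All ancestors of `nodes`, including the nodes themselves."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(pa.get(n, ()))
    return seen


def d_separated(directed, bidirected, xs, ys, zs):
    """True if zs d-separates xs from ys (moralization criterion)."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    pa = parents_map(directed, bidirected)
    keep = ancestors(pa, xs | ys | zs)
    # Moralize the ancestral subgraph: link each node to its parents and
    # link every pair of parents that share a child.
    nbrs = {n: set() for n in keep}
    for child in keep:
        ps = pa.get(child, set())
        for p in ps:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # Delete zs, then test whether any node of ys is reachable from xs.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(m for m in nbrs[n] if m not in zs)
    return not (seen & ys)
```

For the diagram of Fig. 1a, encoded as `directed = [("X", "Z"), ("Z", "Y")]` and `bidirected = [("X", "Y")]`, the call `d_separated(directed, bidirected, {"X"}, {"Y"}, {"Z"})` reports that $\{Z\}$ is not a separator of $X$ and $Y$, because the latent confounder keeps the two nodes connected.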

Structural Causal Models (SCMs). We use Structural Causal Models (SCMs, for short) [ ] as our basic semantical framework. An SCM $M$ is a 4-tuple $\langle \mathbf{U}, \mathbf{V}, \mathbf{F}, P(\mathbf{u}) \rangle$, where 1) $\mathbf{U}$ is a set of exogenous (latent) variables, 2) $\mathbf{V}$ is a set of endogenous (observed) variables, 3) $\mathbf{F}$ is a set of functions $\{f_V\}_{V \in \mathbf{V}}$ that determine the value of endogenous variables, e.g., $v \leftarrow f_V(\mathbf{pa}_V, \mathbf{u}_V)$ is a function with $\mathbf{PA}_V \subseteq \mathbf{V} \setminus \{V\}$ and $\mathbf{U}_V \subseteq \mathbf{U}$, and 4) $P(\mathbf{u})$ is a joint distribution over the exogenous variables $\mathbf{U}$. Each SCM induces a causal diagram $G$ [ , Def. 13] where every variable $v \in \mathbf{V}$ is a vertex, directed edges in $G$ correspond to functional relationships as specified in $\mathbf{F}$, and dashed bidirected edges represent common exogenous variables between two vertices. Within the structural semantics, performing an intervention and setting $\mathbf{X} = \mathbf{x}$ is represented through the do-operator, $do(\mathbf{X} = \mathbf{x})$, which encodes the operation of replacing the original functions of $\mathbf{X}$ (i.e., $f_X(\mathbf{pa}_X, \mathbf{u}_X)$) by the constant $\mathbf{x}$ and induces a submodel $M_{\mathbf{x}}$ and an interventional distribution $P(\mathbf{v} \mid do(\mathbf{x}))$.

Classic Causal Effects Identification Criteria. Given a causal diagram $G$ over $\mathbf{V}$, an effect $P(\mathbf{y} \mid do(\mathbf{x}))$ is said to be identifiable in $G$ if $P(\mathbf{y} \mid do(\mathbf{x}))$ is uniquely computable from the observed distribution $P(\mathbf{v})$ in any SCM that induces $G$ [27, p. 77].

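A path between $\mathbf{X}$ and $\mathbf{Y}$ with an arrow into $\mathbf{X}$ is known as a back-door path from $\mathbf{X}$ to $\mathbf{Y}$. The celebrated back-door (BD) criterion [ ] provides a sufficient condition for effect identification from observational data, which states that if a set $\mathbf{Z}$ of non-descendants of $\mathbf{X}$ blocks all BD paths from $\mathbf{X}$ to $\mathbf{Y}$, then the causal effect $P(\mathbf{y} \mid do(\mathbf{x}))$ is identified by the BD adjustment formula:

$$P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{z}} P(\mathbf{y} \mid \mathbf{x}, \mathbf{z}) P(\mathbf{z}). \tag{2}$$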

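Another classic identification condition that is key to the discussion in this paper is known as the front-door criterion, which is defined as follows:

Definition 1. (Front-door (FD) Criterion [ ]) A set of variables $\mathbf{Z}$ is said to satisfy the front-door criterion relative to the pair $(\mathbf{X}, \mathbf{Y})$ if

1. $\mathbf{Z}$ intercepts all directed paths from $\mathbf{X}$ to $\mathbf{Y}$,
2. There is no unblocked back-door path from $\mathbf{X}$ to $\mathbf{Z}$, and
3. All back-door paths from $\mathbf{Z}$ to $\mathbf{Y}$ are blocked by $\mathbf{X}$, i.e., $\mathbf{X}$ is a separator of $\mathbf{Z}$ and $\mathbf{Y}$ in $G_{\underline{\mathbf{Z}}}$.

If $\mathbf{Z}$ satisfies the FD criterion relative to the pair $(\mathbf{X}, \mathbf{Y})$, then $P(\mathbf{y} \mid do(\mathbf{x}))$ is identified by the following FD adjustment formula [26]:

$$P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{z}} P(\mathbf{z} \mid \mathbf{x}) \sum_{\mathbf{x}'} P(\mathbf{y} \mid \mathbf{x}', \mathbf{z}) P(\mathbf{x}'). \tag{3}$$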
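For intuition, the FD adjustment formula can be read as the composition of two simpler adjustments, one for each link of the chain from $\mathbf{X}$ through $\mathbf{Z}$ to $\mathbf{Y}$. The following is an informal sketch of the standard textbook argument, not a proof taken from this paper; it glosses over the do-calculus steps that make each equality rigorous.

```latex
\begin{align*}
P(\mathbf{z} \mid do(\mathbf{x})) &= P(\mathbf{z} \mid \mathbf{x})
  && \text{(condition 2: no unblocked BD path from $\mathbf{X}$ to $\mathbf{Z}$)} \\
P(\mathbf{y} \mid do(\mathbf{z})) &= \sum_{\mathbf{x}'} P(\mathbf{y} \mid \mathbf{x}', \mathbf{z})\, P(\mathbf{x}')
  && \text{(condition 3: BD adjustment with $\mathbf{X}$ as the adjustment set)} \\
P(\mathbf{y} \mid do(\mathbf{x})) &= \sum_{\mathbf{z}} P(\mathbf{z} \mid do(\mathbf{x}))\, P(\mathbf{y} \mid do(\mathbf{z}))
  && \text{(condition 1, together with condition 3)} \\
&= \sum_{\mathbf{z}} P(\mathbf{z} \mid \mathbf{x}) \sum_{\mathbf{x}'} P(\mathbf{y} \mid \mathbf{x}', \mathbf{z})\, P(\mathbf{x}')
\end{align*}
```

The third line expresses that, once $\mathbf{Z}$ intercepts every directed path, $\mathbf{X}$ influences $\mathbf{Y}$ only through the value it induces on $\mathbf{Z}$; substituting the first two lines into it gives the right-hand side of the FD adjustment formula.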

3 Finding A Front-door Adjustment Set

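Algorithm 1 FINDFDSET($G, \mathbf{X}, \mathbf{Y}, \mathbf{I}, \mathbf{R}$)

    Input: $G$ a causal diagram; $\mathbf{X}, \mathbf{Y}$ disjoint sets of variables; $\mathbf{I}, \mathbf{R}$ sets of variables.
    Output: $\mathbf{Z}$ a set of variables satisfying the front-door criterion relative to $(\mathbf{X}, \mathbf{Y})$ with the constraint $\mathbf{I} \subseteq \mathbf{Z} \subseteq \mathbf{R}$.
    Step 1:
        $\mathbf{R}' \leftarrow$ GETCAND2NDFDC($G, \mathbf{X}, \mathbf{I}, \mathbf{R}$)
        if $\mathbf{R}' = \bot$ then: return $\bot$
    Step 2:
        $\mathbf{R}'' \leftarrow$ GETCAND3RDFDC($G, \mathbf{X}, \mathbf{Y}, \mathbf{I}, \mathbf{R}'$)
        if $\mathbf{R}'' = \bot$ then: return $\bot$
    Step 3:
        $G' \leftarrow$ GETCAUSALPATHGRAPH($G, \mathbf{X}, \mathbf{Y}$)
        if TESTSEP($G', \mathbf{X}, \mathbf{Y}, \mathbf{R}''$) = True then:
            return $\mathbf{Z} = \mathbf{R}''$
        else: return $\bot$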

In this section, we address the following question: given a causal diagram $G$, is there a set $\mathbf{Z}$ that satisfies the FD criterion relative to the pair $(\mathbf{X}, \mathbf{Y})$ and, therefore, allows us to identify $P(\mathbf{y} \mid do(\mathbf{x}))$ by the FD adjustment? We solve a more general variant of this question that imposes a constraint $\mathbf{I} \subseteq \mathbf{Z} \subseteq \mathbf{R}$ for given sets $\mathbf{I}$ and $\mathbf{R}$. Here, $\mathbf{I}$ are variables that must be included in $\mathbf{Z}$ ($\mathbf{I}$ could be empty) and $\mathbf{R}$ are variables that could be included in $\mathbf{Z}$ ($\mathbf{R}$ could be $\mathbf{V} \setminus (\mathbf{X} \cup \mathbf{Y})$). Note that a constraint forbidding certain variables from being included can be enforced by excluding those variables from $\mathbf{R}$. Solving this version of the problem allows scientists to put constraints on candidate adjustment sets based on practical considerations. In addition, this version forms a building block for an algorithm that enumerates all FD admissible sets in a given $G$: the algorithm LISTFDSETS (shown in Alg. 2 in Section 4) for listing all FD admissible sets utilizes this result during the recursive call.

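We have developed a procedure called FINDFDSET, shown in Alg. 1, that outputs an FD adjustment set $\mathbf{Z}$ relative to $(\mathbf{X}, \mathbf{Y})$ satisfying $\mathbf{I} \subseteq \mathbf{Z} \subseteq \mathbf{R}$, or outputs $\bot$ if none exists, given a causal diagram $G$, disjoint sets of variables $\mathbf{X}$ and $\mathbf{Y}$, and two sets of variables $\mathbf{I}$ and $\mathbf{R}$.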

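Example 1. Consider the causal graph $G'$, shown in Fig. 1b, with $\mathbf{X} = \{X\}$, $\mathbf{Y} = \{Y\}$, $\mathbf{I} = \emptyset$, and $\mathbf{R} = \{A, B, C, D\}$. Then, FINDFDSET outputs $\{A, B, C\}$. With $\mathbf{I} = \{C\}$ and $\mathbf{R} = \{A, C\}$, FINDFDSET outputs $\{A, C\}$. With $\mathbf{I} = \{D\}$ and $\mathbf{R} = \{A, B, C, D\}$, FINDFDSET outputs $\bot$, as no FD adjustment set that contains $D$ is available.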

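function GETCAND2NDFDC($G, \mathbf{X}, \mathbf{I}, \mathbf{R}$)
    Output: $\mathbf{R}'$ with $\mathbf{I} \subseteq \mathbf{R}' \subseteq \mathbf{R}$, the set of candidate variables consisting of all the variables $v \in \mathbf{R}$ such that there is no BD path from $\mathbf{X}$ to $v$
    $\mathbf{R}' \leftarrow \mathbf{R}$
    for all $v \in \mathbf{R}$:
        if TESTSEP($G_{\underline{\mathbf{X}}}, \mathbf{X}, v, \emptyset$) = False then:
            if $v \in \mathbf{I}$ then: return $\bot$
            else: $\mathbf{R}' \leftarrow \mathbf{R}' \setminus \{v\}$
    end for
    return $\mathbf{R}'$
end function

Figure 2: A function that outputs the set of candidate variables satisfying the second condition of the FD criterion.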

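FINDFDSET runs in three major steps. Each step identifies candidate variables that incrementally satisfy each of the conditions of the FD criterion relative to $(\mathbf{X}, \mathbf{Y})$. First, FINDFDSET constructs a set of candidate variables $\mathbf{R}'$, with $\mathbf{I} \subseteq \mathbf{R}' \subseteq \mathbf{R}$, such that every subset $\mathbf{Z}$ with $\mathbf{I} \subseteq \mathbf{Z} \subseteq \mathbf{R}'$ satisfies the second condition of the FD criterion (i.e., there is no BD path from $\mathbf{X}$ to $\mathbf{Z}$). Next, FINDFDSET generates a set of candidate variables $\mathbf{R}''$, with $\mathbf{I} \subseteq \mathbf{R}'' \subseteq \mathbf{R}'$, such that for every variable $v \in \mathbf{R}''$, there exists a set $\mathbf{Z}$ with $\mathbf{I} \subseteq \mathbf{Z} \subseteq \mathbf{R}'$ and $v \in \mathbf{Z}$ that further satisfies the third condition of the FD criterion, that is, all BD paths from $\mathbf{Z}$ to $\mathbf{Y}$ are blocked by $\mathbf{X}$. Finally, FINDFDSET outputs a set $\mathbf{Z}$ that further satisfies the first condition of the FD criterion: $\mathbf{Z}$ intercepts all causal paths from $\mathbf{X}$ to $\mathbf{Y}$.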
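The first condition, which FINDFDSET checks last, is a directed-reachability statement: a set intercepts all causal paths exactly when no outcome node can be reached from a treatment node along directed edges once the nodes of the set are deleted. A minimal Python sketch of that check follows, assuming a hypothetical edge-list encoding and function name. Alg. 1 itself performs this step through GETCAUSALPATHGRAPH and TESTSEP, whose details are not reproduced here.

```python
def intercepts_all_directed_paths(directed, xs, ys, zs):
    """True if every directed path from a node in xs to a node in ys
    passes through at least one node in zs."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    children = {}
    for a, b in directed:
        children.setdefault(a, set()).add(b)
    # Depth-first search from xs that never enters a node in zs.
    seen, stack = set(), list(xs - zs)
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(c for c in children.get(n, ()) if c not in zs)
    # Any node of ys reached here lies on a directed path that avoids zs.
    return not (seen & ys)
```

Bidirected edges play no role in this condition, so only the directed edges are passed in. With the directed edges of Fig. 1a, `[("X", "Z"), ("Z", "Y")]`, the set `{"Z"}` intercepts every causal path from `X` to `Y`, whereas the empty set does not.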

Step 1 of FINDFDSET

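In Step 1, FINDFDSET calls the function GETCAND2NDFDC (presented in Fig. 2) to construct a set $\mathbf{R}'$ that consists of all the variables $v \in \mathbf{R}$ such that there is no BD path from $\mathbf{X}$ to $v$ (the function returns $\bot$ instead if there is a BD path from $\mathbf{X}$ to a variable in $\mathbf{I}$). Then, there is no BD path from $\mathbf{X}$ to any set $\mathbf{Z}$ with $\mathbf{I} \subseteq \mathbf{Z} \subseteq \mathbf{R}'$ since, by definition, there is no BD path from $\mathbf{X}$ to $\mathbf{Z}$ if and only if there is no BD path from $\mathbf{X}$ to any $v \in \mathbf{Z}$.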
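The filtering performed in Step 1 can be sketched in a few lines of Python. The sketch below is an illustration under our own assumptions rather than the authors' code: the edge-list encoding and all function names are hypothetical, each bidirected edge is modelled as a latent common parent, and the call to TESTSEP with an empty conditioning set is replaced by an equivalent shortcut (with nothing conditioned on, two nodes are d-connected exactly when they share an ancestor, counting each node as its own ancestor). `None` stands in for the failure symbol $\bot$.

```python
def ancestors(pa, node):
    """Ancestors of `node` (including itself) given a parent map."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(pa.get(n, ()))
    return seen


def get_cand_2nd_fdc(directed, bidirected, xs, includes, candidates):
    """Return the candidates with no back-door path from xs, or None
    if a required variable in `includes` has such a path."""
    xs = set(xs)
    # Parent map of the graph with all edges out of xs removed;
    # each bidirected edge becomes a latent common parent.
    pa = {}
    for a, b in directed:
        if a not in xs:
            pa.setdefault(b, set()).add(a)
    for a, b in bidirected:
        latent = ("U", a, b)
        pa.setdefault(a, set()).add(latent)
        pa.setdefault(b, set()).add(latent)
    anc_x = set().union(*(ancestors(pa, x) for x in xs))
    kept = set(candidates)
    for v in candidates:
        # A shared ancestor means an open path given the empty set,
        # i.e. a back-door path from xs to v.
        if anc_x & ancestors(pa, v):
            if v in includes:
                return None
            kept.discard(v)
    return kept
```

As a usage example, take the diagram of Fig. 1a extended with a made-up node `W` that points into `Y` and is confounded with `X`: `directed = [("X", "Z"), ("Z", "Y"), ("W", "Y")]` and `bidirected = [("X", "Y"), ("X", "W")]`. With candidates `{"Z", "W"}` and no required variables, the function keeps only `Z`; requiring `W` makes it return `None`.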
