Finding and Listing Front-door Adjustment Sets Hyunchai Jeong Purdue University

2025-04-27 0 0 693.35KB 18 页 10玖币
侵权投诉
Finding and Listing Front-door Adjustment Sets
Hyunchai Jeong
Purdue University
jeong3@purdue.edu
Jin Tian
Iowa State University
jtian@iastate.edu
Elias Bareinboim
Columbia University
eb@cs.columbia.edu
Abstract
Identifying the effects of new interventions from data is a significant challenge
found across a wide range of the empirical sciences. A well-known strategy for
identifying such effects is Pearl’s front-door (FD) criterion [
26
]. The definition
of the FD criterion is declarative, only allowing one to decide whether a specific
set satisfies the criterion. In this paper, we present algorithms for finding and
enumerating possible sets satisfying the FD criterion in a given causal diagram.
These results are useful in facilitating the practical applications of the FD criterion
for causal effects estimation and helping scientists to select estimands with desired
properties, e.g., based on cost, feasibility of measurement, or statistical power.
1 Introduction
Learning cause and effect relationships is a fundamental challenge across data-driven fields. For
example, health scientists developing a treatment for curing lung cancer need to understand how a
new drug affects the patient’s body and the tumor’s progression. The distillation of causal relations is
indispensable to understanding the dynamics of the underlying system and how to perform decision-
making in a principled and systematic fashion [27, 37, 2, 30].
One of the most common methods for learning causal relations is through Randomized Controlled
Trials (RCTs, for short) [
8
]. RCTs are considered as the “gold standard” in many fields of empirical
research and are used throughout the health and social sciences as well as machine learning and AI.
In practice, however, RCTs are often hard to perform due to ethical, financial, and technical issues.
For instance, it may be unethical to submit an individual to a certain condition if such condition may
have some potentially negative effects (e.g., smoking). Whenever RCTs cannot be conducted, one
needs to resort to analytical methods to infer causal relations from observational data, which appears
in the literature as the problem of causal effect identification [26, 27].
The causal identification problem asks whether the effect of holding a variable
X
at a constant value
x
on a variable
Y
, written as
P(Y|do(X=x))
, or
P(Y|do(x))
, can be computed from a combination
of observational data and causal assumptions. One of the most common ways of eliciting these
assumptions is in the form of a causal diagram represented by a directed acyclic graph (DAG), where
its nodes and edges describe the underlying data generating process. For instance, in Fig. 1a, three
nodes
X, Z, Y
represent variables, a directed edge
XZ
indicates that
X
causes
Z
, and a dashed-
bidirected edge
XY
represents that
X
and
Y
are confounded by unmeasured (latent) factors.
Different methods can solve the identification problem and a number of generalizations, including
Pearl’s celebrated do-calculus [26] as well as different algorithmic solutions [40, 34, 12, 1, 23, 24].
In practice, researchers often rely on identification strategies that generate well-known identification
formulas. One of the arguably most popular strategies is identification by covariate adjustment.
Whenever a set Zsatisfies the back-door (BD) criterion [26] relative to the pair Xand Y, where X
and
Y
represent the treatment and outcome variables, respectively, the causal effect
P(Y|do(x))
can
be evaluated through the BD adjustment formula PzP(y|x, z)P(z).
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.05816v2 [stat.ME] 14 Oct 2022
Despite the popularity of the covariate adjustment technique for estimating causal effects, there are
still settings in which no BD admissible set exists. For example, consider the causal diagram
G
in Fig. 1a. There clearly exists no set to block the BD path from
X
to
Y
, through the bidirected
arrow,
XY
. One may surmise that this effect is not identifiable and the only one of evaluating
the interventional distribution is through experimentation. Still, this is not the case. The effect
P(Y|do(x))
is identifiable from
G
and the observed distribution
P(x, y, z)
over
{X, Y, Z}
by another
classic identification strategy known as the front-door (FD) criterion [
26
]. In particular, through the
following FD adjustment formula provides the way of evaluating the interventional distribution:
P(Y|do(x)) = X
z
P(z|x)X
x0
P(y|x0, z)P(x0).(1)
We refer to Pearl and Mackenzie
[28
, Sec. 3.4
]
for an interesting account of the history of the FD
criterion, which was the first graphical generalization of the BD case. The FD criterion is drawing
more attention in recent years. For applications of the FD criterion, see, e.g., Hünermund and
Bareinboim
[13]
and Glynn and Kashin
[10]
. Statistically efficient and doubly robust estimators have
recently been developed for estimating the FD estimand in Eq. (1) from finite samples [
9
], which are
still elusive for arbitrary estimands identifiable in a diagram despite recent progress [
18
,
19
,
5
,
20
,
43
].
X YZ
(a) G
X A B Y
C
D
(b) G0
Figure 1: (a) A canonical example of the FD crite-
rion where
{Z}
satisfies the FD criterion relative
to
({X},{Y})
. In (b), four FD adjustment sets rel-
ative to
({X},{Y})
are available:
{A}
,
{A, B}
,
{A, C}, and {A, B, C}.
Both the BD and FD criteria are only descriptive,
i.e., they specify whether a specific set
Z
satis-
fies the criteria or not, but do not provide a way
to find an admissible set
Z
. In addition, in many
situations, it is possible that multiple adjustment
sets exist. Consider for example the causal dia-
gram in Fig. 1b, and the task of identifying the
effect of Xon Y. The distribution P(Y|do(x))
can indeed be identified by the FD criterion with
a set
Z={A, B, C}
given by the expression in
Eq. (1) (with
Z
replaced with
{A, B, C}
). Still,
what if the variable
B
is costly to measure or en-
codes some personal information about patients
which is undesirable to be shared due to ethi-
cal concerns? In this case, the set
Z={A, C}
also satisfies the FD criterion and may be used.
Even when both
B
and
C
are unmeasured, the
set Z={A}is also FD admissible.
This simple example shows that a target effect can be estimated using different adjustment sets leading
to different probability expressions over different set of variables, which has important practical
implications. Each variable implies different practical challenges in terms of measurement, such
as cost, availability, privacy. Each estimand has different statistical properties in terms of sample
complexity, variance, which may play a key role in the study design [
31
,
11
,
32
,
36
]. Algorithms
for finding and listing all possible adjustment sets are hence very useful in practice, which will
allow scientists to select an adjustment set that exhibits desirable properties. Indeed, algorithms have
been developed in recent years for finding one or listing all BD admissible sets [
38
,
39
,
41
,
29
,
42
].
However, no such algorithm is currently available for finding/listing FD admissible sets.
The goal of this paper is to close this gap to facilitate the practical applications of the FD criterion
for causal effects estimation and help scientists to select estimand with certain desired properties
1
.
Specifically, the contributions of this paper are as follows:
1.
We develop an algorithm that finds an admissible front-door adjustment set
Z
in a given
causal diagram in polynomial time (if one exists). We solve a variant of the problem that
imposes constraints
IZR
for given sets
I
and
R
, which allows a scientist to constrain
the search to include specific subsets of variables or exclude variables from search perhaps
due to cost, availability, or other technical considerations.
2.
We develop a sound and complete algorithm that enumerates all front-door adjustment sets
with polynomial delay - the algorithm takes polynomial amount of time to return each new
admissible set, if one exists, or return failure whenever it exhausted all admissible sets.
1Code is available at https://github.com/CausalAILab/FrontdoorAdjustmentSets.
2
2 Preliminaries
Notation.
We write a variable in capital letters (
X
) and its value as small letters (
x
). Bold letters,
X
or
x
, represent a set of variables or values. We use kinship terminology to denote various
relationships in a graph
G
and denote the parents, ancestors, and descendants of
X
(including
X
itself) as
Pa(X),An(X)
, and
De(X)
, respectively. Given a graph
G
over a set of variables
V
, a
subgraph
GX
consists of a subset of variables
XV
and their incident edges in
G
. A graph
G
can be
transformed:
GX
is the graph resulting from removing all incoming edges to
X
, and
GX
is the graph
with all outgoing edges from
X
removed. A DAG
G
may be moralized into an undirected graph
where all directed edges of
G
are converted into undirected edges, and for every pair of nonadjacent
nodes in Gthat share a common child, an undirected edge that connects such pair is added [22].
A path
π
from a node
X
to a node
Y
in
G
is a sequence of edges where
X
and
Y
are the endpoints of
π
. A node
W
on
π
is said to be a collider if
W
has converging arrows into
W
in
π
, e.g.,
W
or
W
.
π
is said to be blocked by a set
Z
if there exists a node
W
on
π
satisfying one of the
following two conditions: 1)
W
is a collider, and neither
W
nor any of its descendants are in
Z
, or
2)
W
is not a collider, and
W
is in
Z
[
25
]. Given three disjoint sets
X,Y
, and
Z
in
G
,
Z
is said
to
d
-separate
X
from
Y
in
G
if and only if
Z
blocks every path from a node in
X
to a node in
Y
according to the d-separation criterion [25], and we say that Zis a separator of Xand Yin G.
Structural Causal Models (SCMs).
We use Structural Causal Models (SCMs, for short) [
27
] as
our basic semantical framework. An SCM is a 4-tuple
hU,V,F, P (u)i
, where 1)
U
is a set of
exogenous (latent) variables, 2)
V
is a set of endogenous (observed) variables, 3)
F
is a set of
functions
{fV}VV
that determine the value of endogenous variables, e.g.,
vfV(paV,uV)
is a function with
PAVV\ {V}
and
UVU
, and 4)
P(u)
is a joint distribution over the
exogenous variables
U
. Each SCM induces a causal diagram
G
[
3
, Def. 13] where every variable
vV
is a vertex and directed edges in
G
correspond to functional relationships as specified in
F
and dashed bidirected edges represent common exogenous variables between two vertices. Within
the structural semantics, performing an intervention and setting
X=x
is represented through the
do-operator,
do(X=x)
, which encodes the operation of replacing the original functions of
X
(i.e.,
fX(paX,uX)
) by the constant
x
and induces a submodel
Mx
and an interventional distribution
P(v|do(x)).
Classic Causal Effects Identification Criteria.
Given a causal diagram
G
over
V
, an effect
P(y|do(x))
is said to be identifiable in
G
if
P(y|do(x))
is uniquely computable from the observed
distribution P(v)in any SCM that induces G[27, p. 77].
A path between
X
and
Y
with an arrow into
X
is known as a back-door path from
X
to
Y
. The
celebrated back-door (BD) criterion [
26
] provides a sufficient condition for effect identification from
observational data, which states that if a set
Z
of non-descendants of
X
blocks all BD paths from
X
to Y, then the causal effect P(y|do(x)) is identified by the BD adjustment formula:
P(y|do(x)) = X
z
P(y|x,z)P(z)(2)
Another classic identification condition that is key to the discussion in this paper is known as the
front-door criterion, which is defined as follows:
Definition 1.
(Front-door (FD) Criterion [
26
]) A set of variables
Z
is said to satisfy the front-door
criterion relative to the pair (X,Y)if
1. Zintercepts all directed paths from Xto Y,
2. There is no unblocked back-door path from Xto Z, and
3.
All back-door paths from
Z
to
Y
are blocked by
X
, i.e.,
X
is a separator of
Z
and
Y
in
GZ
.
If
Z
satisfies the FD criterion relative to the pair
(X,Y)
, then
P(y|do(x))
is identified by the
following FD adjustment formula [26]:
P(y|do(x)) = X
z
P(z|x)X
x0
P(y|x0,z)P(x0).(3)
3
3 Finding A Front-door Adjustment Set
Algorithm 1 FINDFDSET (G,X,Y,I,R)
1: Input: G
a causal diagram;
X,Y
disjoint sets of
variables; I,Rsets of variables.
2: Output: Z
a set of variables satisfying the front-
door criterion relative to
(X,Y)
with the con-
straint IZR.
3: Step 1:
4: R0GETCAND2NDFDC(G,X,I,R)
5: if R0=then: return
6: Step 2:
7: R00 GETCAND3RDFDC(G,X,Y,I,R0)
8: if R00 =then: return
9: Step 3:
10: G0GETCAUSALPATHGRAPH(G,X,Y)
11: if TESTSEP(G0,X,Y,R00 ) = True then:
12: return Z=R00
13: else: return
In this section, we address the following ques-
tion: given a causal diagram
G
, is there a set
Z
that satisfies the FD criterion relative to
the pair
(X,Y)
and, therefore, allows us to
identify
P(y|do(x))
by the FD adjustment?
We solve a more general variant of this ques-
tion that imposes a constraint
IZR
for
given sets
I
and
R
. Here,
I
are variables that
must be included in
Z
(
I
could be empty) and
R
are variables that could be included in
Z
(
R
could be
V\(XY)
). Note the constraint
that variables in
W
cannot be included can
be enforced by excluding
W
from
R
. Solv-
ing this version of the problem will allow sci-
entists to put constraints on candidate adjust-
ment sets based on practical considerations.
In addition, this version will form a building
block for an algorithm that enumerates all FD
admissible sets in a given
G
- the algorithm
LISTFDSETS (shown in Alg. 2 in Section 4)
for listing all FD admissible sets will utilize
this result during the recursive call.
We have developed a procedure called FINDFDSET shown in Alg. 1 that outputs a FD adjustment set
Z
relative to
(X,Y)
satisfying
IZR
, or outputs
if none exists, given a causal diagram
G
,
disjoint sets of variables Xand Y, and two sets of variables Iand R.
Example 1.
Consider the causal graph
G0
, shown in Fig. 1b, with
X={X}
,
Y={Y}
,
I=
and
R={A, B, C, D}
. Then, FINDFDSET outputs
{A, B, C}
. With
I={C}
and
R={A, C}
,
FINDFDSET outputs
{A, C}
. With
I={D}
and
R={A, B, C, D}
, FINDFDSET outputs
as no
FD adjustment set that contains Dis available.
1: function GETCAND2NDFDC(G,X,I,R)
2: Output: R0
with
IR0R
, the set of
candidate variables consisting of all the variables
vR
such that there is no BD path from
X
to
v
.
3: R0R
4: for all vR:
5: if TESTSEP(GX,X, v, ) = False then:
6: if vIthen: return
7: else: R0R0\ {v}
8: end for
9: return R0
10: end function
Figure 2: A function that outputs the set of candidate
variables satisfying the second condition of the FD
criterion.
FINDFDSET runs in three major steps. Each
step identifies candidate variables that incre-
mentally satisfy each of the conditions of the
FD criterion relative to
(X,Y)
. First, FIND-
FDSET constructs a set of candidate vari-
ables
R0
, with
IR0R
, such that every
subset
Z
with
IZR0
satisfies the sec-
ond condition of the FD criterion (i.e., there
is no BD path from
X
to
Z
). Next, FIND-
FDSET generates a set of candidate variables
R00
, with
IR00 R0
, such that for every
variable
vR00
, there exists a set
Z
with
IZR0
and
vZ
that further satisfies
the third condition of the FD criterion, that
is, all BD paths from
Z
to
Y
are blocked by
X
. Finally, FINDFDSET outputs a set
Z
that
further satisfies the first condition of the FD
criterion -
Z
intercepts all causal paths from
Xto Y.
Step 1 of FINDFDSET
In Step 1, FINDFDSET calls the function GETCAND2NDFDC (presented in Fig. 2) to construct a set
R0
that consists of all the variables
vR
such that there is no BD path from
X
to
v
(
R0
is set to
empty if there is a BD path from
X
to
I
). Then, there is no BD path from
X
to any set
IZR0
since, by definition, there is no BD path from
X
to
Z
if and only if there is no BD path from
X
to
any vZ.
4
摘要:

FindingandListingFront-doorAdjustmentSetsHyunchaiJeongPurdueUniversityjeong3@purdue.eduJinTianIowaStateUniversityjtian@iastate.eduEliasBareinboimColumbiaUniversityeb@cs.columbia.eduAbstractIdentifyingtheeffectsofnewinterventionsfromdataisasignicantchallengefoundacrossawiderangeoftheempiricalscience...

展开>> 收起<<
Finding and Listing Front-door Adjustment Sets Hyunchai Jeong Purdue University.pdf

共18页,预览4页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:18 页 大小:693.35KB 格式:PDF 时间:2025-04-27

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 18
客服
关注