Disentangling Mixtures of Unknown Causal Interventions*
Abhinav Kumar
Microsoft Research, Bangalore
Karnataka, India
abhinavkumar.wk@gmail.com
Gaurav Sinha
Microsoft Research, Bangalore
Karnataka, India
sinhagaur88@gmail.com
Abstract
In many real-world scenarios, such as gene knockout experiments, targeted interventions are often accompanied by unknown interventions at off-target sites. Moreover, different units can get randomly exposed to different unknown interventions, thereby creating a mixture of interventions. Identifying different components of this mixture can be very valuable in some applications. Motivated by such situations, in this work, we study the problem of identifying all components present in a mixture of interventions on a given causal Bayesian Network. We construct an example to show that, in general, the components are not identifiable from the mixture distribution. Next, assuming that the given network satisfies a positivity condition, we show that, if the set of mixture components satisfies a mild exclusion assumption, then they can be uniquely identified. Our proof gives an efficient algorithm to recover these targets from the exponentially large search space of possible targets. In the more realistic scenario, where distributions are given via finitely many samples, we conduct a simulation study to analyze the performance of an algorithm derived from our identifiability proof.
1 Introduction
Motivation: Causal Bayesian Networks (CBN) ([Pea09, Spi10]) have become a popular choice to model causal relationships in many real-world systems. These models can simulate the effects of external interventions that forcibly fix target system variables to desired target values. The simulation is done via the do() operator ([Pea09]), wherein the CBN is altered by breaking incoming edges of the target variables and fixing them to desired target values. Pre-estimating the effect of interventions can help in decision making; for example, interventions on a CBN describing gene interactions can guide gene-editing experiments.
However, real-world interventions are not always precise and can mistakenly end up intervening on unintended targets. For example, gene knockout experiments via the CRISPR-Cas9 gene-editing technology perform unintended cleavage at unknown genome sites ([FFK+13, WWW+15]). Moreover, the unintended intervention targets can themselves be noisy, i.e., different individuals targeted by the same intervention might undergo completely different off-target interventions. For example, [AWL18] demonstrated that the same gene-editing experiment (using CRISPR-Cas9) on mice embryos exhibited different unintended cleavage for different mice. In such situations, units (samples) that underwent different unintended interventions are not segregated, and therefore the generated distribution becomes a mixture of individual interventional distributions. We ask the following natural question.
*Published as an oral paper [KS21] at the 37th Conference on Uncertainty in Artificial Intelligence (UAI), 2021.
Question 1.1. Given access to a mixture of interventional distributions, under what conditions can one identify all
the intervention targets? (From here onwards, we use the terms targets and components interchangeably.)
Our Contributions: First, we model the situation of identifying hidden off-target interventions as the problem of identifying individual components of a mixture of interventions. We assume an underlying CBN and model interventions via the do() operator described above. Second, by constructing examples, we show that, in general, for a given CBN and an input mixture distribution, components of the mixture might not be unique. Using this, we motivate the need for a mild positivity assumption (Assumption 3.2) on the distribution generated by the CBN and a mild and reasonable exclusion assumption (Assumption 3.1) on the structure of the intervention components present in the mixture. Third, we prove that, given access to a CBN satisfying positivity and any input mixture having intervention components satisfying exclusion, such intervention components generating the mixture can be uniquely identified from the mixture distribution. Fourth, given oracle access to marginals of the distributions generated by the CBN and the mixture, our identifiability proof gives an efficient algorithm to recover target components from an exponentially large space of possible components. Finally, in Section 5, we conduct a simulation study to analyze the performance of an algorithm (Algorithm 2) directly inspired by our identifiability proof, but with access to only finitely many samples. Even though the goal of our paper is to prove identifiability of these intervention targets, our simulation study indicates that our algorithm is promising in the realistic situation of finitely many samples.
Related Prior Work: Recently, [SWU20] considered the problem of causal discovery using unknown intervention targets and, as a crucial intermediate step, proved identifiability of these targets. They also designed two algorithms, UT-IGSP and JCI-GSP (based on the Joint Causal Inference framework in [MMC20]), to recover these targets from data. As discussed in our motivation, in many real situations, such as [AWL18], the off-target effects are themselves noisy and end up creating mixtures of multiple unknown interventions. Since [SWU20] assumes separate access to each unknown intervention, their algorithm cannot be used in our situation. Another line of work related to ours is the study of mixtures of Bayesian Networks. Perfect interventions, i.e., do() operators on the CBN, create new interventional CBNs (Definition 1.3.1 in [Pea09]), and therefore the input mixture in our setup is actually a mixture of Bayesian Networks. This is a more general problem and was tackled first in [TMCH98], which developed an Expectation-Maximization (EM) based heuristic to find individual Bayesian Network components. However, they do not investigate identifiability of the components. In our setting, we care about identifiability since the components correspond to the unknown interventions. Along with recovering the individual components of a mixture, there is also growing interest in developing techniques to understand conditional independence (CI) relationships among the variables in the mixture data. For example, some recent works try to build other graphical representations from which the CI relationships in the mixture can be easily understood ([Spi94, RSG11, Str19b, Str19a, SPU20]). Even though these new representations can identify some aspects of the components, none of these works prove or discuss the uniqueness and identifiability of the components, which is the main interest of our work. Finally, we would like to mention that the general area of causal discovery and inference using different kinds of unknown interventions has received a lot of attention lately ([EM07, SWU20, JKSB20, MMC20, RHPM15]). Even though many of these do not align with the goal of our paper, the growing interest in this area highlights the seriousness of the issue of unintended stochasticity in targeted interventions and the desire to design algorithms robust to it.
2 Preliminaries
Notation We use capital letters (e.g. $X$) to represent random variables and the corresponding lower case letter $x$ to denote the assignment $X = x$. The set of values taken by a random variable $X$ will be denoted by $\mathcal{C}_X$. Unless otherwise specified, all random variables in this paper are discrete and have finite support, i.e. $|\mathcal{C}_X| < \infty$. A tuple or set of random variables is denoted by a capital bold face letter (e.g. $\mathbf{X}$) and the corresponding lower case bold face letter $\mathbf{x}$ will denote the assignment $\mathbf{X} = \mathbf{x}$. Let $\mathcal{C}_{\mathbf{X}} = \times_{X_i \in \mathbf{X}} \mathcal{C}_{X_i}$ denote the set of all possible values that can be taken by $\mathbf{X}$. The probability of $\mathbf{X}$ taking the value $\mathbf{x}$ is denoted by $P(\mathbf{X} = \mathbf{x})$, or equivalently $P(\mathbf{x})$, and the probability of $\mathbf{X} = \mathbf{x}$ given $\mathbf{Y} = \mathbf{y}$ is denoted by $P(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y})$, or equivalently $P(\mathbf{x} \mid \mathbf{y})$. We use $[n]$ to denote the set $\{1, 2, \ldots, n\}$, $[m, n]$ to denote the set $\{m, m+1, \ldots, n\}$, and calligraphic capital letters (e.g. $\mathcal{S}$) to denote sets. The size of any set $\mathcal{S}$ is denoted by $|\mathcal{S}|$. $\mathbb{R}$, $\mathbb{R}^+$ and $\mathbb{R}^{\geq 0}$ denote the sets of real numbers, positive real numbers and non-negative real numbers, respectively.
Bayesian Network Let $\mathcal{G} = \{\mathbf{V}, E\}$ be a directed acyclic graph (DAG) with node set $\mathbf{V} = \{V_1, \ldots, V_n\}$, where each node $V_i$ represents a random variable. $\mathcal{G}$ is called a Bayesian Network if the following factorization of the joint probability of $\mathbf{V}$ holds:
$$P(\mathbf{V}) = \prod_{V_i \in \mathbf{V}} P(V_i \mid pa(V_i))$$
where $pa(V_i)$ are the parent nodes of $V_i$.
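To make the factorization concrete, here is a minimal Python sketch (our own illustration, not code from the paper): a network is stored as a dictionary mapping each node to its parents and a conditional probability table, and the joint probability is the product of the per-node conditional probabilities. The representation and the name joint_probability are our choices.

# A minimal sketch (illustrative, not from the paper). A Bayesian Network is a
# dict mapping each node to (parents, cpd), where cpd maps
# (value, parent_values) -> P(value | parent_values). Variables are binary here.

def joint_probability(bn, assignment):
    """P(V = v) as the product of P(v_i | pa(V_i)) over all nodes."""
    prob = 1.0
    for node, (parents, cpd) in bn.items():
        parent_vals = tuple(assignment[p] for p in parents)
        prob *= cpd[(assignment[node], parent_vals)]
    return prob

# Example: a two-node chain V1 -> V2.
bn = {
    "V1": ((), {(0, ()): 0.4, (1, ()): 0.6}),
    "V2": (("V1",), {(0, (0,)): 0.7, (1, (0,)): 0.3,
                     (0, (1,)): 0.2, (1, (1,)): 0.8}),
}
print(joint_probability(bn, {"V1": 1, "V2": 0}))  # 0.6 * 0.2 = 0.12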
A causal Bayesian Network is a Bayesian Network where all edges denote direct causal relationships. It allows for modeling the effect of external actions, called "interventions", by appropriate modification of the Bayesian Network. A formal definition of causal Bayesian Networks can be found in Definition 1.3.1 of [Pea09].
Interventions: As mentioned above, these capture external actions on a system under consideration, for example, dosage of medicines administered to a patient, providing subsidies to poorer sections of the population, etc. A natural way to model them in causal Bayesian Networks is to perform the act of causal surgery, wherein incoming edges into the node(s) to be intervened are removed and the node(s) is forcibly fixed to the desired value. As described in Definition 1.3.1 of [Pea09], the new network thus obtained is treated as the Bayesian Network modelling the effect of the intervention. Formally, following the notation in [Pea09], if we perform an intervention on nodes $\mathbf{X} \subseteq \mathbf{V}$ with a desire to set them to the value $\mathbf{x}^* \in \mathcal{C}_{\mathbf{X}}$, then the effect of this intervention (also known as the interventional distribution) is a probability distribution on $\mathbf{V}$ denoted by $P(\mathbf{v} \mid do(\mathbf{x}^*))$ (or $P_{\mathbf{x}^*}(\mathbf{v})$). In the intervened Bayesian Network, the conditional probability distribution (CPD) $P(X_i \mid pa(X_i))$ of each $X_i \in \mathbf{X}$ that is intervened and set to $x_i^*$ changes to the Kronecker delta function $\delta_{x_i, x_i^*}$, i.e. $P(X_i = x_i \mid pa(X_i)) = 1$ if $x_i = x_i^*$ and $0$ otherwise. The CPDs of the non-intervened nodes, i.e. $\mathbf{V} \setminus \mathbf{X}$, remain unchanged. Hence the interventional distribution factorizes as:
$$P_{\mathbf{x}^*}(\mathbf{V} = \mathbf{v}) = \prod_{V_i \notin \mathbf{X}} P(V_i = v_i \mid pa(V_i)) \prod_{V_i \in \mathbf{X}} \delta_{v_i, x_i^*}$$
Such interventions are called perfect interventions. They capture many real-world situations, like gene-editing experiments, where a certain target gene is spliced out and replaced with the desired gene. Other kinds of interventions, such as imperfect and uncertain interventions, have been defined in the literature (Section 2 in [EM07]). However, in this paper we only deal with perfect interventions.
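As a sketch of how causal surgery acts on the toy representation from Section 2 (again our own illustration, with names of our choosing): intervening on a node removes its parents and replaces its CPD with the Kronecker delta, leaving all other CPDs untouched.

def intervene(bn, targets):
    """Perfect intervention do(X = x*): break incoming edges of each target
    node and replace its CPD with a delta putting all mass on the forced value."""
    new_bn = dict(bn)
    for node, value in targets.items():
        domain = {v for (v, _) in bn[node][1]}  # values seen in the original CPD
        new_bn[node] = ((), {(v, ()): 1.0 if v == value else 0.0 for v in domain})
    return new_bn

# Reusing bn and joint_probability from the previous sketch:
# P_{V1=1}(V1=1, V2=0) = 1 * P(V2=0 | V1=1) = 0.2.
print(joint_probability(intervene(bn, {"V1": 1}), {"V1": 1, "V2": 0}))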
3 Problem Formulation and Main Theorem
As motivated in Section 1, the intended interventions performed during an experiment often have hidden
off-target effects, which could themselves be stochastic, leading to different hidden treatments on different
individuals. We can model such a situation as an unknown mixture of different interventions. Here is a
formal definition.
Definition 3.1 (Mixture of Interventions). Let $\mathcal{G} = \{\mathbf{V}, E\}$ be a causal Bayesian Network. A probability distribution $P_{mix}(\mathbf{V})$ is called a mixture of interventions if for some $m \in \mathbb{N}$, there exist subsets $\mathbf{T}_1, \ldots, \mathbf{T}_m \subseteq \mathbf{V}$, corresponding values $\mathbf{t}_i \in \mathcal{C}_{\mathbf{T}_i}$, and positive scalar weights $\pi_i \in \mathbb{R}^+$, $i \in [m]$, such that
$$P_{mix}(\mathbf{V}) = \sum_{i=1}^{m} \pi_i P_{\mathbf{t}_i}(\mathbf{V})$$
where $\mathbf{t}_i \neq \mathbf{t}_j$ for all $i \neq j \in [m]$ (if $\mathbf{t}_i = \mathbf{t}_j$, then $(\pi_i + \pi_j)P_{\mathbf{t}_i}(\mathbf{V})$ is one component). We allow $\mathbf{T}_i = \emptyset$, in which case $P_{\mathbf{t}_i}(\mathbf{V})$ is defined as $P(\mathbf{V})$. Note that for $P_{mix}$ to be a valid distribution, $\sum_{i=1}^{m} \pi_i = 1$. We refer to the set $\mathcal{T} = \{(\mathbf{t}_i, \pi_i), i \in [m]\}$ as a set of intervention tuples generating the mixture.
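Under this definition, a mixture is just a convex combination of interventional distributions. A small sketch continuing the earlier toy code (layout and names are ours):

def mixture_probability(bn, intervention_tuples, assignment):
    """Pmix(v) = sum_i pi_i * P_{t_i}(v). An empty target {} stands for
    T_i being the empty set, i.e. the observational distribution P(v)."""
    weights = [w for _, w in intervention_tuples]
    assert abs(sum(weights) - 1.0) < 1e-9  # weights must sum to one
    return sum(w * joint_probability(intervene(bn, t), assignment)
               for t, w in intervention_tuples)

# A mixture of do(V1=1), do(V2=0), and the observational distribution.
T = [({"V1": 1}, 0.5), ({"V2": 0}, 0.3), ({}, 0.2)]
print(mixture_probability(bn, T, {"V1": 1, "V2": 0}))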
Uniqueness and Identifiability: In our mixture model, each of the targets $\mathbf{t}_i$ corresponds to an intervention that intentionally or unintentionally transpired in the experiment. Since our ultimate goal is to recover them from the mixture distribution (see Question 1.1), the problem only makes sense if they "uniquely" define the mixture. Formally, there should not exist two distinct sets of intervention tuples $\mathcal{T}_1 = \{(\mathbf{t}^1_1, \pi^1_1), \ldots, (\mathbf{t}^1_n, \pi^1_n)\}$ and $\mathcal{T}_2 = \{(\mathbf{t}^2_1, \pi^2_1), \ldots, (\mathbf{t}^2_m, \pi^2_m)\}$ which generate the same mixture distribution, i.e.,
$$P_{mix}(\mathbf{V}) = \sum_{i=1}^{n} \pi^1_i P_{\mathbf{t}^1_i}(\mathbf{V}) = \sum_{j=1}^{m} \pi^2_j P_{\mathbf{t}^2_j}(\mathbf{V})$$
An immediate next question is that of "identifiability". Given access to a causal Bayesian Network and the joint distribution $P(\mathbf{V})$ it captures, does there exist an algorithm that takes as input the mixture distribution $P_{mix}(\mathbf{V})$ and exactly recovers the unknown set of intervention tuples that generated $P_{mix}(\mathbf{V})$? In the general case, the answer to both these questions is no! Using a very simple network with just one node, we show that the set of intervention tuples generating a mixture need not be unique, motivating the need for more assumptions. More complicated examples with multiple nodes can easily be created in the same way, but for a cleaner presentation we stick to this example, since its purpose is only to motivate an assumption we make next.
Example 3.1. Consider a causal Bayesian Network with a single binary variable $\mathbf{V} = \{V_1\}$, i.e. $\mathcal{C}_{V_1} = \{0, 1\}$, and denote $P(V_1 = 0)$, $P(V_1 = 1)$ by $p_0$, $p_1$ respectively. Define the mixture
$$P_{mix}(V_1) = \pi_0 P_0(V_1) + \pi_1 P_1(V_1) + (1 - \pi_0 - \pi_1) P(V_1)$$
On setting $V_1 = 0$ and then $V_1 = 1$ in the above equation, and rearranging the terms, we obtain
$$\begin{pmatrix} 1 - p_0 & -p_0 \\ p_0 - 1 & p_0 \end{pmatrix} \begin{pmatrix} \pi_0 \\ \pi_1 \end{pmatrix} = \begin{pmatrix} P_{mix}(V_1 = 0) - p_0 \\ P_{mix}(V_1 = 1) - p_1 \end{pmatrix}$$
The above $2 \times 2$ matrix is singular and has rank $1$, i.e. the system does not have a unique solution. In fact, when $0 < p_0 < 1$,
$$\pi_0 = \frac{P_{mix}(V_1 = 0) - p_0 + p_0 t}{1 - p_0}, \qquad \pi_1 = t$$
are all valid solutions whenever $t \leq 1 - P_{mix}(V_1 = 0)$ and $t \geq \max\left\{\frac{p_0 - P_{mix}(V_1 = 0)}{p_0}, 0\right\}$. Therefore, uniqueness of intervention tuples does not hold in general.
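The non-uniqueness can be checked numerically with the toy code from Section 2 (a sketch under our own representation, not the paper's code): two different sets of intervention tuples, related by the closed form above, produce the same mixture marginal.

# Example 3.1 numerically, with p0 = 0.4.
single = {"V1": ((), {(0, ()): 0.4, (1, ()): 0.6})}
p0 = 0.4

T1 = [({"V1": 0}, 0.3), ({"V1": 1}, 0.5), ({}, 0.2)]
pmix0 = mixture_probability(single, T1, {"V1": 0})

# A second solution from the closed form: pi_1 = t, pi_0 = (Pmix(0) - p0 + p0*t)/(1 - p0).
t = 0.6  # satisfies t <= 1 - Pmix(0) and t >= max{(p0 - Pmix(0))/p0, 0}
pi0 = (pmix0 - p0 + p0 * t) / (1 - p0)
T2 = [({"V1": 0}, pi0), ({"V1": 1}, t), ({}, 1 - pi0 - t)]

# Both sets of tuples induce the same mixture marginal (0.38 in both cases).
print(pmix0, mixture_probability(single, T2, {"V1": 0}))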
Even though the example looks very simple, it captures the main reason behind the non-identifiability of the set of intervention tuples. Exactly like the above example, for any mixture, we can obtain systems of linear equations by evaluating marginal probabilities of $P_{mix}$ for different settings of $\mathbf{V}$. Our goal then would be to find settings which help us solve these systems uniquely and recover the set of intervention tuples. Unfortunately, in this process, similar to the above example, the linear systems will have dependent equations and therefore infinitely many solutions. To get over this issue, we focus our attention on sets of intervention tuples where, for each variable, there exists some value that is missing from all of its intervention targets. In our main theorem, we show that any mixture generated by such a set cannot be generated by any other set of this kind. Next, we formally state the assumption and then discuss why it is extremely mild and reasonable in most real situations.
Assumption 3.1 (Exclusion). Let $\mathcal{T}$ be a set of intervention tuples as defined in Definition 3.1. We say that $\mathcal{T}$ satisfies exclusion if, for all $V_i \in \mathbf{V}$, there exists $\bar{v}_i \in \mathcal{C}_{V_i}$ such that $\bar{v}_i \notin \mathbf{t}$ for any target $\mathbf{t}$ belonging to any tuple in $\mathcal{T}$. We say that a mixture of interventions $P_{mix}(\mathbf{V})$ satisfies exclusion if some set of intervention tuples $\mathcal{T}$ generating it satisfies exclusion.
Remark 3.1. This assumption puts only a mild constraint on the set of mixtures we consider. For example, in a network with $n$ nodes, each node having $k$ possible values, excluding a fixed value of each node can still generate arbitrary mixtures over $\Omega(k^n)$ allowed targets. Without exclusion, there are $O((k+1)^n)$ possible targets that generate the mixtures. Therefore the reduction is minimal compared to the size of the space of targets we are searching in. In real-world applications, it is common for nodes to have a large number of possible values; therefore, for each node, the possibility of off-target interventions impacting all values becomes unlikely. We also emphasize that the values missing from the targets can be different for different input mixtures and are not known to our algorithms. Our identifiability algorithm only uses the existence of such missing values, making its applicability even more general.
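As an illustration of what the assumption asks for (a sketch with helper names of our own choosing), checking exclusion amounts to verifying, per node, that some value of its domain is never used by any target:

def satisfies_exclusion(intervention_tuples, domains):
    """Assumption 3.1: for every node, at least one value of its domain
    must never appear as that node's value in any intervention target."""
    for node, domain in domains.items():
        used = {t[node] for t, _ in intervention_tuples if node in t}
        if not (set(domain) - used):  # every value is hit by some target
            return False
    return True

domains = {"V1": {0, 1}, "V2": {0, 1}}
print(satisfies_exclusion([({"V1": 1}, 0.5), ({}, 0.5)], domains))         # True
print(satisfies_exclusion([({"V1": 0}, 0.4), ({"V1": 1}, 0.6)], domains))  # False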
Even though the above assumption helps us tackle the singularity problem outlined in Example 3.1, it is not enough to guarantee uniqueness of intervention tuples in general. We also make a simple "positivity" assumption on the causal Bayesian Network, which demands that the joint probability $P(\mathbf{v}) > 0$ for every setting $\mathbf{V} = \mathbf{v}$. In fact, using the same example as above (Example 3.1), we show that not assuming $p_0, p_1 > 0$ can lead to multiple sets of intervention tuples satisfying Assumption 3.1 and generating the same mixture. To see this, we consider the input mixture $P_{mix}(\mathbf{V}) = P(\mathbf{V})$. The set of intervention tuples $\mathcal{T}_1 = \{(\emptyset, 1)\}$ for it clearly satisfies Assumption 3.1, as the intervention targets $(V_1 = 0)$ and $(V_1 = 1)$ are excluded. Now, if $p_1 = 0$, then $P_0(V_1) = P(V_1)$, and for any $\pi_0 \in [0, 1]$ we can trivially write
$$P_{mix}(V_1) = \pi_0 P_0(V_1) + (1 - \pi_0) P(V_1)$$
implying that $\mathcal{T}_2 = \{(V_1 = 0, \pi_0), (\emptyset, 1 - \pi_0)\}$ is another set of intervention tuples for $P_{mix}$, implying non-uniqueness. Here is the statement of our assumption.
Assumption 3.2 (Positivity). Let $\mathbf{V}$ be the set of nodes in our causal Bayesian Network and $P(\mathbf{V})$ be the corresponding joint probability distribution. We assume that $P(\mathbf{v}) > 0$ for all $\mathbf{v} \in \mathcal{C}_{\mathbf{V}}$.
Remark 3.2. As a straightforward consequence of this assumption, for every random variable $V_i \in \mathbf{V}$, we can show that the conditional probability distributions are positive as well, i.e. $P(v_i \mid pa(v_i)) > 0$ for all $v_i \in \mathcal{C}_{V_i}$ and all settings $pa(v_i)$ of the parents. This positivity assumption is commonly made in many works related to causal graphs. For example, [HB12] assume positivity throughout their discussion when characterizing the Interventional Markov Equivalence class.
Having stated these assumptions, we are now ready to state the main theorem of this paper. A detailed
proof is provided in Section 4.
Theorem 1. Let $\mathcal{G} = \{\mathbf{V}, E\}$ be a causal Bayesian Network and $P(\mathbf{V})$ be the associated joint probability distribution satisfying Assumption 3.2. Let $P_{mix}(\mathbf{V})$ (Definition 3.1) be any mixture of interventions that satisfies Assumption 3.1. The following are true.