Disentangling Mixtures of Unknown Causal Interventions*
Abhinav Kumar
Microsoft Research, Bangalore
Karnataka, India
abhinavkumar.wk@gmail.com
Gaurav Sinha
Microsoft Research, Bangalore
Karnataka, India
sinhagaur88@gmail.com
Abstract
In many real-world scenarios, such as gene knockout experiments, targeted interventions are often accompanied by unknown interventions at off-target sites. Moreover, different units can get randomly exposed to different unknown interventions, thereby creating a mixture of interventions. Identifying different components of this mixture can be very valuable in some applications. Motivated by such situations, in this work, we study the problem of identifying all components present in a mixture of interventions on a given causal Bayesian Network. We construct an example to show that, in general, the components are not identifiable from the mixture distribution. Next, assuming that the given network satisfies a positivity condition, we show that, if the set of mixture components satisfies a mild exclusion assumption, then they can be uniquely identified. Our proof gives an efficient algorithm to recover these targets from the exponentially large search space of possible targets. In the more realistic scenario, where distributions are given via finitely many samples, we conduct a simulation study to analyze the performance of an algorithm derived from our identifiability proof.
1 Introduction
Motivation: Causal Bayesian Networks (CBN) ([Pea09, Spi10]) have become a popular choice to model causal relationships in many real-world systems. These models can simulate the effects of external interventions that forcibly fix target system variables to desired target values. The simulation is done via the do() operator ([Pea09]), wherein the CBN is altered by breaking incoming edges of the target variables and fixing them to desired target values. Pre-estimating the effect of interventions can help in decision making; for example, interventions on a CBN describing gene interactions can guide gene-editing experiments.
However, real-world interventions are not always precise and can mistakenly end up intervening on unintended targets. For example, gene knockout experiments via the CRISPR-Cas9 gene-editing technology perform unintended cleavage at unknown genome sites ([FFK+13, WWW+15]). Moreover, the unintended intervention targets can themselves be noisy, i.e., different individuals targeted by the same intervention might undergo completely different off-target interventions. For example, [AWL18] demonstrated that the same gene-editing experiment (using CRISPR-Cas9) on mice embryos exhibited different unintended cleavage for different mice. In such situations, units (samples) that underwent different unintended interventions are not segregated, and therefore the generated distribution becomes a mixture of individual interventional distributions. We ask the following natural question.
*Published as an oral paper [KS21] at the 37th Conference on Uncertainty in Artificial Intelligence (UAI), 2021.
Question 1.1. Given access to a mixture of interventional distributions, under what conditions can one identify all
the intervention targets? (From here onwards, we use the terms targets and components interchangeably.)
Our Contributions: First, we model the situation of identifying hidden off-target interventions as the problem of identifying individual components of a mixture of interventions. We assume an underlying CBN and model interventions via the do() operator described above. Second, by constructing examples, we show that, in general, for a given CBN and an input mixture distribution, components of the mixture might not be unique. Using this, we motivate the need for a mild positivity assumption (Assumption 3.2) on the distribution generated by the CBN and a mild and reasonable exclusion assumption (Assumption 3.1) on the structure of the intervention components present in the mixture. Third, we prove that, given access to a CBN satisfying positivity and any input mixture having intervention components satisfying exclusion, such intervention components generating the mixture can be uniquely identified from the mixture distribution. Fourth, given oracle access to marginals of the distributions generated by the CBN and the mixture, our identifiability proof gives an efficient algorithm to recover target components from an exponentially large space of possible components. Finally, in Section 5, we conduct a simulation study to analyze the performance of an algorithm (Algorithm 2) directly inspired by our identifiability proof, but with access to only finitely many samples. Even though the goal of our paper is to prove identifiability of these intervention targets, our simulation study indicates that our algorithm is promising in the realistic situation of finitely many samples.
Related Prior Work: Recently, [SWU20] considered the problem of causal discovery using unknown intervention targets and, as a crucial intermediate step, proved identifiability of these targets. They also designed two algorithms, UT-IGSP and JCI-GSP (based on the Joint Causal Inference framework in [MMC20]), to recover these targets from data. As discussed in our motivation, in many real situations, such as [AWL18], the off-target effects are themselves noisy and end up creating mixtures of multiple unknown interventions. Since [SWU20] assumes separate access to each unknown intervention, their algorithm cannot be used in our situation. Another line of work related to ours is the study of mixtures of Bayesian Networks. Perfect interventions, i.e., do() operators on the CBN, create new interventional CBNs (Definition 1.3.1 in [Pea09]), and therefore the input mixture in our setup is actually a mixture of Bayesian Networks. This is a more general problem and was tackled first in [TMCH98], which developed an Expectation-Maximization (EM) based heuristic to find individual Bayesian Network components. However, they do not investigate identifiability of the components. In our setting, we care about identifiability since the components correspond to the unknown interventions. Along with recovering the individual components of a mixture, there is also growing interest in developing techniques to understand conditional independence (CI) relationships among the variables in the mixture data. For example, some recent works try to build other graphical representations from which the CI relationships in the mixture can be easily understood ([Spi94, RSG11, Str19b, Str19a, SPU20]). Even though these new representations can identify some aspects of the components, none of these works prove or discuss the uniqueness and identifiability of the components, which is the main interest of our work. Finally, we would like to mention that the general area of causal discovery and inference using different kinds of unknown interventions has received a lot of attention lately ([EM07, SWU20, JKSB20, MMC20, RHPM15]). Even though many of these do not align with the goal of our paper, the growing interest in this area highlights the seriousness of the issue of unintended stochasticity in targeted interventions and the desire to design algorithms robust to it.
2 Preliminaries
Notation We use capital letters (e.g. $X$) to represent random variables and the corresponding lower case letter $x$ to denote the assignment $X = x$. The set of values taken by a random variable $X$ will be denoted by $\mathcal{C}_X$. Unless otherwise specified, all random variables in this paper are discrete and have finite support, i.e. $|\mathcal{C}_X| < \infty$. A tuple or set of random variables is denoted by a capital bold face letter (e.g. $\mathbf{X}$) and the corresponding lower case bold face letter $\mathbf{x}$ will denote the assignment $\mathbf{X} = \mathbf{x}$. Let $\mathcal{C}_{\mathbf{X}} = \times_{X_i \in \mathbf{X}} \mathcal{C}_{X_i}$ denote the set of all possible values that can be taken by $\mathbf{X}$. The probability of $\mathbf{X}$ taking the value $\mathbf{x}$ is denoted by $P(\mathbf{X} = \mathbf{x})$, or equivalently $P(\mathbf{x})$, and the probability of $\mathbf{X} = \mathbf{x}$ given $\mathbf{Y} = \mathbf{y}$ is denoted by $P(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y})$, or equivalently $P(\mathbf{x} \mid \mathbf{y})$. We use $[n]$ to denote the set $\{1, 2, \ldots, n\}$, $[m, n]$ to denote the set $\{m, m+1, \ldots, n\}$, and calligraphic capital letters (e.g. $\mathcal{S}$) to denote sets. The size of any set $\mathcal{S}$ is denoted by $|\mathcal{S}|$. $\mathbb{R}$, $\mathbb{R}^+$ and $\mathbb{R}^{\geq 0}$ denote the sets of real numbers, positive real numbers and non-negative real numbers, respectively.
Bayesian Network Let $\mathcal{G} = \{\mathbf{V}, E\}$ be a directed acyclic graph (DAG) with node set $\mathbf{V} = \{V_1, \ldots, V_n\}$, where each node $V_i$ represents a random variable. $\mathcal{G}$ is called a Bayesian Network if the following factorization of the joint probability of $\mathbf{V}$ holds:
$$P(\mathbf{V}) = \prod_{V_i \in \mathbf{V}} P(V_i \mid pa(V_i))$$
where $pa(V_i)$ are the parent nodes of $V_i$.
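To make the factorization concrete, here is a minimal Python sketch (our own illustration, not code from the paper): a network is stored as a dictionary mapping each node to its parents and a conditional probability table, and the joint probability is the product of the per-node conditional probabilities. The representation and the name joint_probability are our choices.

# A minimal sketch (illustrative, not from the paper). A Bayesian Network is a
# dict mapping each node to (parents, cpd), where cpd maps
# (value, parent_values) -> P(value | parent_values). Variables are binary here.

def joint_probability(bn, assignment):
    """P(V = v) as the product of P(v_i | pa(V_i)) over all nodes."""
    prob = 1.0
    for node, (parents, cpd) in bn.items():
        parent_vals = tuple(assignment[p] for p in parents)
        prob *= cpd[(assignment[node], parent_vals)]
    return prob

# Example: a two-node chain V1 -> V2.
bn = {
    "V1": ((), {(0, ()): 0.4, (1, ()): 0.6}),
    "V2": (("V1",), {(0, (0,)): 0.7, (1, (0,)): 0.3,
                     (0, (1,)): 0.2, (1, (1,)): 0.8}),
}
print(joint_probability(bn, {"V1": 1, "V2": 0}))  # 0.6 * 0.2 = 0.12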
A causal Bayesian Network is a Bayesian Network where all edges denote direct causal relationships. It allows for modeling the effect of external actions, called "interventions", by appropriate modification of the Bayesian Network. A formal definition of causal Bayesian Networks can be found in Definition 1.3.1 of [Pea09].
Interventions: As mentioned above, these capture external actions on a system under consideration, for example, dosage of medicines administered to a patient, providing subsidies to poorer sections of the population, etc. A natural way to model them in causal Bayesian Networks is to perform the act of causal surgery, wherein incoming edges into the node(s) to be intervened are removed and the node(s) is forcibly fixed to the desired value. As described in Definition 1.3.1 of [Pea09], the new network thus obtained is treated as the Bayesian Network modelling the effect of the intervention. Formally, following the notation in [Pea09], if we perform an intervention on nodes $\mathbf{X} \subseteq \mathbf{V}$ with a desire to set them to the value $\mathbf{x}^* \in \mathcal{C}_{\mathbf{X}}$, then the effect of this intervention (also known as the interventional distribution) is a probability distribution on $\mathbf{V}$ denoted by $P(\mathbf{v} \mid do(\mathbf{x}^*))$ (or $P_{\mathbf{x}^*}(\mathbf{v})$). In the intervened Bayesian Network, the conditional probability distribution (CPD) $P(X_i \mid pa(X_i))$ of each $X_i \in \mathbf{X}$ that is intervened and set to $x_i^*$ changes to the Kronecker delta function $\delta_{x_i, x_i^*}$, i.e. $P(X_i = x_i \mid pa(X_i)) = 1$ if $x_i = x_i^*$ and $0$ otherwise. The CPDs of the non-intervened nodes, i.e. $\mathbf{V} \setminus \mathbf{X}$, remain unchanged. Hence the interventional distribution factorizes as:
$$P_{\mathbf{x}^*}(\mathbf{V} = \mathbf{v}) = \prod_{V_i \notin \mathbf{X}} P(V_i = v_i \mid pa(V_i)) \prod_{V_i \in \mathbf{X}} \delta_{v_i, x_i^*}$$
Such interventions are called perfect interventions. They capture many real-world situations, like gene-editing experiments, where a certain target gene is spliced out and replaced with the desired gene. Other kinds of interventions, such as imperfect and uncertain interventions, have been defined in the literature (Section 2 in [EM07]). However, in this paper we only deal with perfect interventions.
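As a sketch of how causal surgery acts on the toy representation from Section 2 (again our own illustration, with names of our choosing): intervening on a node removes its parents and replaces its CPD with the Kronecker delta, leaving all other CPDs untouched.

def intervene(bn, targets):
    """Perfect intervention do(X = x*): break incoming edges of each target
    node and replace its CPD with a delta putting all mass on the forced value."""
    new_bn = dict(bn)
    for node, value in targets.items():
        domain = {v for (v, _) in bn[node][1]}  # values seen in the original CPD
        new_bn[node] = ((), {(v, ()): 1.0 if v == value else 0.0 for v in domain})
    return new_bn

# Reusing bn and joint_probability from the previous sketch:
# P_{V1=1}(V1=1, V2=0) = 1 * P(V2=0 | V1=1) = 0.2.
print(joint_probability(intervene(bn, {"V1": 1}), {"V1": 1, "V2": 0}))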
3 Problem Formulation and Main Theorem
As motivated in Section 1, the intended interventions performed during an experiment often have hidden
off-target effects, which could themselves be stochastic, leading to different hidden treatments on different
individuals. We can model such a situation as an unknown mixture of different interventions. Here is a
formal definition.
Definition 3.1 (Mixture of Interventions). Let $\mathcal{G} = \{\mathbf{V}, E\}$ be a causal Bayesian Network. A probability distribution $P_{mix}(\mathbf{V})$ is called a mixture of interventions if for some $m \in \mathbb{N}$, there exist subsets $\mathbf{T}_1, \ldots, \mathbf{T}_m \subseteq \mathbf{V}$, corresponding values $\mathbf{t}_i \in \mathcal{C}_{\mathbf{T}_i}$, and positive scalar weights $\pi_i \in \mathbb{R}^+$, $i \in [m]$, such that
$$P_{mix}(\mathbf{V}) = \sum_{i=1}^{m} \pi_i P_{\mathbf{t}_i}(\mathbf{V})$$
where $\mathbf{t}_i \neq \mathbf{t}_j$ for all $i \neq j \in [m]$ (if $\mathbf{t}_i = \mathbf{t}_j$, then $(\pi_i + \pi_j)P_{\mathbf{t}_i}(\mathbf{V})$ is one component). We allow $\mathbf{T}_i = \emptyset$, in which case $P_{\mathbf{t}_i}(\mathbf{V})$ is defined as $P(\mathbf{V})$. Note that for $P_{mix}$ to be a valid distribution, $\sum_{i=1}^{m} \pi_i = 1$. We refer to the set $\mathcal{T} = \{(\mathbf{t}_i, \pi_i), i \in [m]\}$ as a set of intervention tuples generating the mixture.
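Under this definition, a mixture is just a convex combination of interventional distributions. A small sketch continuing the earlier toy code (layout and names are ours):

def mixture_probability(bn, intervention_tuples, assignment):
    """Pmix(v) = sum_i pi_i * P_{t_i}(v). An empty target {} stands for
    T_i being the empty set, i.e. the observational distribution P(v)."""
    weights = [w for _, w in intervention_tuples]
    assert abs(sum(weights) - 1.0) < 1e-9  # weights must sum to one
    return sum(w * joint_probability(intervene(bn, t), assignment)
               for t, w in intervention_tuples)

# A mixture of do(V1=1), do(V2=0), and the observational distribution.
T = [({"V1": 1}, 0.5), ({"V2": 0}, 0.3), ({}, 0.2)]
print(mixture_probability(bn, T, {"V1": 1, "V2": 0}))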
Uniqueness and Identifiability: In our mixture model, each of the targets $\mathbf{t}_i$ corresponds to an intervention that intentionally or unintentionally transpired in the experiment. Since our ultimate goal is to recover them from the mixture distribution (see Question 1.1), the problem only makes sense if they "uniquely" define the mixture. Formally, there should not exist two distinct sets of intervention tuples $\mathcal{T}_1 = \{(\mathbf{t}^1_1, \pi^1_1), \ldots, (\mathbf{t}^1_n, \pi^1_n)\}$ and $\mathcal{T}_2 = \{(\mathbf{t}^2_1, \pi^2_1), \ldots, (\mathbf{t}^2_m, \pi^2_m)\}$ which generate the same mixture distribution, i.e.,
$$P_{mix}(\mathbf{V}) = \sum_{i=1}^{n} \pi^1_i P_{\mathbf{t}^1_i}(\mathbf{V}) = \sum_{j=1}^{m} \pi^2_j P_{\mathbf{t}^2_j}(\mathbf{V})$$
An immediate next question is that of "identifiability". Given access to a causal Bayesian Network and the joint distribution $P(\mathbf{V})$ it captures, does there exist an algorithm that takes as input the mixture distribution $P_{mix}(\mathbf{V})$ and exactly recovers the unknown set of intervention tuples that generated $P_{mix}(\mathbf{V})$? In the general case, the answer to both these questions is no! Using a very simple network with just one node, we show that the set of intervention tuples generating a mixture need not be unique, motivating the need for more assumptions. More complicated examples with multiple nodes can easily be created in the same way, but for a cleaner presentation we stick to this example, since its purpose is only to motivate an assumption we make next.
Example 3.1. Consider a causal Bayesian Network with a single binary variable $\mathbf{V} = \{V_1\}$, i.e. $\mathcal{C}_{V_1} = \{0, 1\}$, and denote $P(V_1 = 0)$, $P(V_1 = 1)$ by $p_0$, $p_1$ respectively. Define the mixture
$$P_{mix}(V_1) = \pi_0 P_0(V_1) + \pi_1 P_1(V_1) + (1 - \pi_0 - \pi_1) P(V_1)$$
On setting $V_1 = 0$ and then $V_1 = 1$ in the above equation, and rearranging the terms, we obtain
$$\begin{pmatrix} 1 - p_0 & -p_0 \\ p_0 - 1 & p_0 \end{pmatrix} \begin{pmatrix} \pi_0 \\ \pi_1 \end{pmatrix} = \begin{pmatrix} P_{mix}(V_1 = 0) - p_0 \\ P_{mix}(V_1 = 1) - p_1 \end{pmatrix}$$
The above $2 \times 2$ matrix is singular and has rank $1$, i.e. the system does not have a unique solution. In fact, when $0 < p_0 < 1$,
$$\pi_0 = \frac{P_{mix}(V_1 = 0) - p_0 + p_0 t}{1 - p_0}, \qquad \pi_1 = t$$
are all valid solutions whenever $t \leq 1 - P_{mix}(V_1 = 0)$ and $t \geq \max\left\{\frac{p_0 - P_{mix}(V_1 = 0)}{p_0}, 0\right\}$. Therefore, uniqueness of intervention tuples does not hold in general.
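The non-uniqueness can be checked numerically with the toy code from Section 2 (a sketch under our own representation, not the paper's code): two different sets of intervention tuples, related by the closed form above, produce the same mixture marginal.

# Example 3.1 numerically, with p0 = 0.4.
single = {"V1": ((), {(0, ()): 0.4, (1, ()): 0.6})}
p0 = 0.4

T1 = [({"V1": 0}, 0.3), ({"V1": 1}, 0.5), ({}, 0.2)]
pmix0 = mixture_probability(single, T1, {"V1": 0})

# A second solution from the closed form: pi_1 = t, pi_0 = (Pmix(0) - p0 + p0*t)/(1 - p0).
t = 0.6  # satisfies t <= 1 - Pmix(0) and t >= max{(p0 - Pmix(0))/p0, 0}
pi0 = (pmix0 - p0 + p0 * t) / (1 - p0)
T2 = [({"V1": 0}, pi0), ({"V1": 1}, t), ({}, 1 - pi0 - t)]

# Both sets of tuples induce the same mixture marginal (0.38 in both cases).
print(pmix0, mixture_probability(single, T2, {"V1": 0}))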
Even though the example looks very simple, it captures the main reason behind the non-identifiability of the set of intervention tuples. Exactly like the above example, for any mixture, we can obtain systems of linear equations by evaluating marginal probabilities of $P_{mix}$ for different settings of $\mathbf{V}$. Our goal then would be to find settings which help us solve these systems uniquely and recover the set of intervention tuples. Unfortunately, in this process, similar to the above example, the linear systems will have dependent equations and therefore infinitely many solutions. To get over this issue, we focus our attention on sets of intervention tuples where, for each variable, there exists some value that is missing from all of its intervention targets. In our main theorem, we show that any mixture generated by such a set cannot be generated by any other set of this kind. Next, we formally state the assumption and then discuss why it is extremely mild and reasonable in most real situations.
Assumption 3.1 (Exclusion). Let $\mathcal{T}$ be a set of intervention tuples as defined in Definition 3.1. We say that $\mathcal{T}$ satisfies exclusion if, for all $V_i \in \mathbf{V}$, there exists $\bar{v}_i \in \mathcal{C}_{V_i}$ such that $\bar{v}_i \notin \mathbf{t}$ for any target $\mathbf{t}$ belonging to any tuple in $\mathcal{T}$. We say that a mixture of interventions $P_{mix}(\mathbf{V})$ satisfies exclusion if some set of intervention tuples $\mathcal{T}$ generating it satisfies exclusion.
Remark 3.1. This assumption puts only a mild constraint on the set of mixtures we consider. For example, in a network with $n$ nodes, each node having $k$ possible values, excluding a fixed value of each node can still generate arbitrary mixtures over $\Omega(k^n)$ allowed targets. Without exclusion, there are $O((k+1)^n)$ possible targets that generate the mixtures. Therefore the reduction is minimal compared to the size of the space of targets we are searching in. In real-world applications, it is common for nodes to have a large number of possible values; therefore, for each node, the possibility of off-target interventions impacting all values becomes unlikely. We also emphasize that the values missing from the targets can be different for different input mixtures and are not known to our algorithms. Our identifiability algorithm only uses the existence of such missing values, making its applicability even more general.
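As an illustration of what the assumption asks for (a sketch with helper names of our own choosing), checking exclusion amounts to verifying, per node, that some value of its domain is never used by any target:

def satisfies_exclusion(intervention_tuples, domains):
    """Assumption 3.1: for every node, at least one value of its domain
    must never appear as that node's value in any intervention target."""
    for node, domain in domains.items():
        used = {t[node] for t, _ in intervention_tuples if node in t}
        if not (set(domain) - used):  # every value is hit by some target
            return False
    return True

domains = {"V1": {0, 1}, "V2": {0, 1}}
print(satisfies_exclusion([({"V1": 1}, 0.5), ({}, 0.5)], domains))         # True
print(satisfies_exclusion([({"V1": 0}, 0.4), ({"V1": 1}, 0.6)], domains))  # False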
Even though the above assumption helps us tackle the singularity problem outlined in Example 3.1, it is not enough to guarantee uniqueness of intervention tuples in general. We also make a simple "positivity" assumption on the causal Bayesian Network, which demands that the joint probability $P(\mathbf{v}) > 0$ for every setting $\mathbf{V} = \mathbf{v}$. In fact, using the same example as above (Example 3.1), we show that not assuming $p_0, p_1 > 0$ can lead to multiple sets of intervention tuples satisfying Assumption 3.1 and generating the same mixture. To see this, we consider the input mixture $P_{mix}(\mathbf{V}) = P(\mathbf{V})$. The set of intervention tuples $\mathcal{T}_1 = \{(\emptyset, 1)\}$ for it clearly satisfies Assumption 3.1, as the intervention targets $(V_1 = 0)$ and $(V_1 = 1)$ are excluded. Now, if $p_1 = 0$, then $P_0(V_1) = P(V_1)$, and for any $\pi_0 \in [0, 1]$ we can trivially write
$$P_{mix}(V_1) = \pi_0 P_0(V_1) + (1 - \pi_0) P(V_1)$$
implying that $\mathcal{T}_2 = \{(V_1 = 0, \pi_0), (\emptyset, 1 - \pi_0)\}$ is another set of intervention tuples for $P_{mix}$, implying non-uniqueness. Here is the statement of our assumption.
Assumption 3.2 (Positivity). Let $\mathbf{V}$ be the set of nodes in our causal Bayesian Network and $P(\mathbf{V})$ be the corresponding joint probability distribution. We assume that $P(\mathbf{v}) > 0$ for all $\mathbf{v} \in \mathcal{C}_{\mathbf{V}}$.
Remark 3.2. As a straightforward consequence of this assumption, for every random variable $V_i \in \mathbf{V}$, we can show that the conditional probability distributions are positive as well, i.e. $P(v_i \mid pa(v_i)) > 0$ for all $v_i \in \mathcal{C}_{V_i}$ and all settings $pa(v_i)$ of the parents. This positivity assumption is commonly made in many works related to causal graphs. For example, [HB12] assume positivity throughout their discussion when characterizing the Interventional Markov Equivalence class.
Having stated these assumptions, we are now ready to state the main theorem of this paper. A detailed
proof is provided in Section 4.
Theorem 1. Let $\mathcal{G} = \{\mathbf{V}, E\}$ be a causal Bayesian Network and $P(\mathbf{V})$ be the associated joint probability distribution satisfying Assumption 3.2. Let $P_{mix}(\mathbf{V})$ (Definition 3.1) be any mixture of interventions that satisfies Assumption 3.1. The following are true.