Disentangling Causal Effects from Sets of Interventions in the Presence of Unobserved Confounders

Olivier Jeunen
Amazon
Edinburgh, UK
Ciarán M. Gilligan-Lee
Spotify & UCL
London, UK
Rishabh Mehrotra
Sharechat
London, UK
Mounia Lalmas
Spotify
London, UK
Abstract
The ability to answer causal questions is crucial in many domains, as causal inference allows one to understand the impact of interventions. In many applications, only a single intervention is possible at a given time. However, in some important areas, multiple interventions are concurrently applied. Disentangling the effects of single interventions from jointly applied interventions is a challenging task, especially as simultaneously applied interventions can interact. This problem is made harder still by unobserved confounders, which influence both treatments and outcome. We address this challenge by aiming to learn the effect of a single intervention from both observational data and sets of interventions. We prove that this is not generally possible, but provide identification proofs demonstrating that it can be achieved under non-linear continuous structural causal models with additive, multivariate Gaussian noise, even when unobserved confounders are present. Importantly, we show how to incorporate observed covariates and learn heterogeneous treatment effects. Based on the identifiability proofs, we provide an algorithm that learns the causal model parameters by pooling data from different regimes and jointly maximizing the combined likelihood. The effectiveness of our method is empirically demonstrated on both synthetic and real-world data.
1 Introduction
The ability to answer causal questions is crucial in science, medicine, economics, and beyond; see [Gilligan-Lee, 2020] for a high-level overview. This is because causal inference allows one to understand the impact of interventions.²
In many applications, only a single intervention is possible at a given time, or interventions are applied sequentially. However, in some important areas, multiple interventions are concurrently applied. For instance, in medicine, patients with multiple comorbidities may need to be treated simultaneously with several prescriptions; in
computational advertising, people may be targeted by multiple concurrent campaigns; and in dietetics,
the nutritional content of meals can be considered a joint intervention from which we wish to learn
the effects of individual nutritional components.
Disentangling the effects of single interventions from jointly applied interventions is a challenging task, especially as simultaneously applied interventions can interact, leading to consequences not seen when considering single interventions separately. This problem is made harder still by the possible presence of unobserved confounders, which influence both treatments and outcome. This paper addresses this challenge by aiming to learn the effect of a single intervention from both observational data and sets of interventions. We prove that this is not generally possible, but provide identification proofs demonstrating it can be achieved in certain classes of non-linear continuous causal models with additive multivariate Gaussian noise, even in the presence of unobserved confounders. This reasonably weak additive noise assumption is prevalent in the causal inference and discovery literature [Rolland et al., 2022, Saengkyongam and Silva, 2020, Kilbertus et al., 2020]. Importantly, we show how to incorporate observed covariates, which can be high-dimensional, and hence learn heterogeneous treatment effects for single interventions. Our main contributions are:

1. A proof that without restrictions on the causal model, single-intervention effects cannot be identified from observations and joint interventions. (§3.1, 3.2)
2. Proofs that single interventions can be identified from observations and joint interventions when the causal model belongs to certain (but not all) classes of non-linear continuous structural causal models with additive, multivariate Gaussian noise. (§3.2, 3.3)
3. An algorithm that learns the parameters of the proposed causal model and disentangles single interventions from joint interventions. (§4)
4. An empirical validation of our method on both synthetic and real-world data.³ (§5)

Work done while the author was at Spotify.
² Causal inference also allows one to ask and answer counterfactual questions; see [Perov et al., 2020] and [Vlontzos et al., 2022].

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.05446v1 [stat.ML] 11 Oct 2022
2 Related Work
Disentangling multiple concurrent interventions:
[Parbhoo et al., 2021] study the question of disentangling multiple, simultaneously applied interventions from observational data. They propose a specially designed neural network for the problem and show good empirical performance on some datasets. However, they address neither the formal identification problem nor the possible presence of unobserved confounders. By contrast, our work derives the conditions under which identifiability holds. We moreover propose an algorithm that can disentangle multiple interventions even in the presence of unobserved confounders, as long as both observational and interventional data are available. Related work by [Parbhoo et al., 2020] investigated the intervention-disentanglement problem from a reinforcement learning perspective, where each intervention combination constitutes a different action that a reinforcement learning agent can take. Unlike this approach, our work explicitly focuses on modelling the interactions between interventions to learn their individual effects. Closer to our work is [Saengkyongam and Silva, 2020], who investigate identifiability of joint effects from observations and single-intervention data. They prove this is not generally possible, but provide an identification proof for non-linear causal models with additive Gaussian noise. Our work addresses a complementary question: we want to learn the effect of a single intervention from observational data and sets of interventions. Additionally, unlike [Saengkyongam and Silva, 2020], we consider identification of individual-level causal effects given observed covariates. In a precursor to the work by [Saengkyongam and Silva, 2020], [Nandy et al., 2017] developed a method to estimate the effect of joint interventions from observational data when the causal structure is unknown. This approach assumed linear causal models with Gaussian noise, and only proved identifiability in this case under a sparsity constraint. Like [Saengkyongam and Silva, 2020], our result needs neither the linearity assumption nor any sparsity constraints in its identification proof. Finally, others, including [Schwab et al., 2020, Egami and Imai, 2018, Lopez and Gutman, 2017, Ghassami et al., 2021], explored how to estimate causal effects of a single categorical- or continuous-valued treatment, where different intervention values can produce different outcomes. Unlike our work, they do not consider multiple concurrent interventions that can interact.
Combining observations and interventions:
[Bareinboim and Pearl, 2016] have investigated non-parametric identifiability of causal effects using both observational and interventional data, in a paradigm they call “data fusion”. More general results were studied by [Lee et al., 2020], who provided necessary and sufficient graphical conditions for identifying causal effects from arbitrary combinations of observations and interventions. Recent work in [Correa et al., 2021] explored identification of counterfactual, as opposed to interventional, distributions from combinations of observational and interventional data. Finally, [Ilse et al., 2021] investigated the most efficient way to combine observational and interventional data to estimate causal effects. They demonstrated that they could significantly reduce the number of interventional samples required to achieve a certain fit when adding sufficient observational training samples. However, they only prove their method theoretically in the linear-Gaussian case. In the non-linear case, they parameterise their model using normalising flows and demonstrate their method empirically. They only consider estimating single interventions, and do not deal with multiple, interacting interventions.

³ The code to reproduce our results is available at github.com/olivierjeunen/disentangling-neurips-2022.
Additive noise models:
While certain causal quantities may not be generally identifiable from observational and interventional data, imposing restrictions on the structural functions underlying causal models can yield semi-parametric identifiability results. One of the most common weak restrictions used in the causal inference community is the additive noise model (ANM), first studied in the context of causal discovery by [Hoyer et al., 2009] and still widely used today [Rolland et al., 2022]. ANMs limit the form of the structural equations to be additive with respect to latent noise variables, but allow non-linear interactions between causes. [Janzing et al., 2009] used ANMs to devise a method for inferring a latent confounder between two observed variables, which is otherwise not possible without additional assumptions on the underlying causal model. See [Lee and Spekkens, 2017], [Lee et al., 2019], and [Dhir and Lee, 2020] for extensions of this approach beyond ANMs. ANMs have also been employed by [Kilbertus et al., 2020] to investigate the sensitivity of counterfactual notions of fairness to the presence of unobserved confounding. Our work proves that in certain classes of ANMs, the effect of a single intervention can be identified from observational data and sets of interventions, even in the presence of unobserved confounders. Moreover, we show how to incorporate observed covariates in these ANMs to learn the heterogeneous effects of single interventions.
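As a small illustration of the ANM restriction (the functional forms below are invented for the example, not taken from any of the cited works), the structural equations are non-linear in the causes but strictly additive in the noise, so the residual of each variable given its parents recovers its noise term:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Mutually independent exogenous noise terms (a Markovian SCM).
u1 = rng.normal(size=n)
u2 = rng.normal(size=n)
uy = rng.normal(size=n)

# ANM structural equations: non-linear in the causes, additive in the noise.
x1 = u1
x2 = np.tanh(x1) + u2          # X2 := f2(X1) + U2
y = x1 * x2 + np.sin(x2) + uy  # Y  := fY(X1, X2) + UY; treatments interact

# Additivity means subtracting the parent function recovers the noise term.
residual = y - (x1 * x2 + np.sin(x2))
print(np.allclose(residual, uy))  # → True
```

This additivity is what makes residual-based reasoning possible in ANMs, while the causes themselves may still interact arbitrarily inside fY.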
3 Model Identifiability
Identifiability is a fundamental concept in parametric statistics that concerns which quantities can, or cannot, be learned from data [Rothenberg, 1971]. An estimand is said to be identifiable from data if it is theoretically possible to learn it given infinite samples: any two causal models that coincide on the data must also coincide on the value of the estimand in question. Hence, if one finds two causal models that agree on the data but disagree on the estimand, then the estimand is not identifiable unless further restrictions are imposed. In this section, we provide identification proofs for single-variable interventional effects from observational data and joint interventions, for several model classes. Our theoretical analysis provides insight into the fundamental limitations of causal inference, and into the assumptions required for identification.
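This style of argument can be made concrete with the classic linear-Gaussian counterexample (parameter values chosen purely for illustration): two SCMs that induce exactly the same observational distribution over (X, Y) yet disagree on E[Y | do(X = x)], so the causal effect is not identifiable from observational data alone.

```python
import numpy as np

def obs_cov(b, c, v2):
    """Observational covariance of (X, Y) for the linear-Gaussian SCM
    X := U1, Y := b * X + U2, with Var(U1) = 1, Var(U2) = v2, and
    Cov(U1, U2) = c (an unobserved confounder whenever c != 0)."""
    return np.array([[1.0, b + c],
                     [b + c, b ** 2 + 2 * b * c + v2]])

# Model A: causal slope b = 1.0, confounded (c = 0.5).
cov_a = obs_cov(b=1.0, c=0.5, v2=1.0)
# Model B: causal slope b = 1.5, unconfounded (c = 0.0).
cov_b = obs_cov(b=1.5, c=0.0, v2=0.75)

# Identical observational distributions (zero-mean Gaussians with equal
# covariance), yet E[Y | do(X = x)] = b * x differs: 1.0 vs 1.5.
print(np.allclose(cov_a, cov_b))  # → True
```

Both models fit the observational data equally well, which is exactly the situation the definition above rules out for an identifiable estimand.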
Problem Definition.
We adopt the Structural Causal Model (SCM) framework as introduced by [Pearl, 2009]. An SCM M is defined by ⟨{C, X, Y}, U, f, P_U⟩, where {C, X, Y} are endogenous variables separated into covariates C, treatments X, and the outcome Y; U are exogenous variables (possibly confounders); f are structural equations; and P_U defines a joint probability distribution over the exogenous variables.

The SCM M also induces a causal graph, where vertices represent endogenous variables and edges represent structural equations. Vertices with outgoing edges to an endogenous variable X_i are denoted the parent set of this variable, PA(X_i). Typically, the observed covariates C causally influence the treatments as well as the outcome, and are part of this set. Every endogenous variable X_i (including Y) is then a function of its parents in the graph, PA(X_i), and a latent noise term U_i, denoting the influence of factors external to the model:

    X_i := f_i(PA(X_i), U_i).    (1)

In Markovian SCMs, these latent noise terms are all mutually independent. In general, however, distinct noise terms can be correlated according to some global distribution P_U; such correlation is due to the presence of unobserved confounders.
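As an illustrative sketch (the structural equations and the noise covariance below are made up, not taken from the paper), one can sample such a non-Markovian SCM by drawing the exogenous noise jointly from P_U, with non-zero off-diagonal covariance playing the role of an unobserved confounder:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Joint exogenous distribution P_U: off-diagonal entries correlate the
# treatment noise with the outcome noise, i.e. unobserved confounding.
cov_u = np.array([[1.0, 0.2, 0.4],
                  [0.2, 1.0, 0.3],
                  [0.4, 0.3, 1.0]])
u = rng.multivariate_normal(np.zeros(3), cov_u, size=n)
u1, u2, uy = u.T

# Structural equations X_i := f_i(PA(X_i), U_i), additive in the noise.
x1 = u1
x2 = 0.5 * x1 + u2
y = np.sin(x1) + x1 * x2 + uy

# The outcome noise is correlated with the treatment, so conditioning on
# X1 in the observational regime is confounded.
print(abs(np.corrcoef(x1, uy)[0, 1] - 0.4) < 0.02)  # → True
```

Setting all off-diagonal entries of cov_u to zero recovers the Markovian case, where observational conditioning and intervening coincide for this graph.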
An intervention on variable X_i is denoted by do(X_i = x_i); it corresponds to replacing the variable's structural equation with a constant, or equivalently removing all incoming edges in the causal graph. The core question we wish to answer in this work is under which conditions the treatment effect of a single intervention can be disentangled from joint interventions and observational data. That is, given samples from the data regimes that induce

    E[Y | X_i = x_i, X_j = x_j, C = c]    and    E[Y | do(X_i = x_i, X_j = x_j), C = c],

when can we learn conditional average causal effects

    E[Y | do(X_i = x_i), X_j = x_j, C = c]    or    E[Y | X_i = x_i, do(X_j = x_j), C = c]?
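A small simulation can illustrate the gap between these regimes in the simplest confounded setting (a single treatment, no covariates; the structural equations are invented for the example): do(X_1 = x_1) is implemented by overwriting the structural equation of X_1 with a constant.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

def sample(do_x1=None):
    # Correlated exogenous noise: an unobserved confounder of X1 and Y.
    u = rng.multivariate_normal([0.0, 0.0],
                                [[1.0, 0.8], [0.8, 1.0]], size=n)
    u1, uy = u.T
    # do(X1 = x1): replace the structural equation of X1 by a constant.
    x1 = u1 if do_x1 is None else np.full(n, do_x1)
    y = 2.0 * x1 + uy
    return x1, y

# Observational regime: E[Y | X1 ≈ 1] picks up the confounder,
# since E[UY | U1 = 1] = 0.8, giving roughly 2.8.
x1, y = sample()
obs = float(y[np.abs(x1 - 1.0) < 0.05].mean())

# Interventional regime: E[Y | do(X1 = 1)] = 2.0, unconfounded.
_, y_do = sample(do_x1=1.0)
interv = float(y_do.mean())

print(round(obs, 1), round(interv, 1))
```

The two estimates differ by roughly the confounding strength; the identification question above asks when the interventional quantity can be recovered without being able to intervene on X_1 alone.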