
2.2 CAUSAL DISCOVERY
Structure learning in prior work refers to learning a DAG according to some optimization criterion, with or without a notion of causality (e.g., He et al. (2019)). The task of causal discovery, on the other hand, is more specific: it refers to learning the structure (and, in some cases, the parameters) of SCMs, and subscribes to the framework of causality and interventions of Pearl (2009). That is, these methods aim to estimate (G, Θ). They often rely on modular likelihood scores over causal variables, such as the BGe score (Geiger and Heckerman, 1994; Kuipers et al., 2022) and the BDe score (Heckerman et al., 1995), to learn the structure. However, these methods all assume a dataset of observed causal variables. Some of these approaches obtain a maximum likelihood estimate,
$$G^\ast = \arg\max_{G} \; p(Z \mid G) \qquad \text{or} \qquad (G^\ast, \Theta^\ast) = \arg\max_{G,\Theta} \; p(Z \mid G, \Theta). \tag{3}$$
Alternatively, in Bayesian causal discovery (Heckerman et al., 1997), variational inference is typically used to approximate the true posterior p(G, Θ | Z) with a joint distribution q_φ(G, Θ) by minimizing the KL divergence between the two,

$$D_{\mathrm{KL}}\big(q_\phi(G,\Theta) \,\|\, p(G,\Theta \mid Z)\big) = -\,\mathbb{E}_{(G,\Theta)\sim q_\phi}\!\left[\log p(Z \mid G,\Theta) - \log \frac{q_\phi(G,\Theta)}{p(G,\Theta)}\right] + \log p(Z), \tag{4}$$
where p(G, Θ) is a prior over the structure and parameters of the SCM, possibly encoding DAGness, sparse connections, or low-magnitude edge weights, and the evidence term log p(Z) is a constant with respect to φ. Figure 2 shows the Bayesian Network (BN) over which inference is performed for causal discovery tasks.
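To make Eq. (4) concrete, here is a minimal, self-contained sketch of the variational objective, again only an illustration under stated assumptions: q_φ factorizes into independent relaxed-Bernoulli edges (a Concrete relaxation, so samples of G are reparameterizable) and a diagonal Gaussian over Θ, the likelihood is linear-Gaussian, the edge prior is a sparse Bernoulli, and the KL for the relaxed edges is approximated by the Bernoulli KL on the edge probabilities. The DAGness term of the prior is omitted for brevity; all variable names (`edge_logits`, `theta_mu`, ...) are hypothetical.

```python
import torch

d, n = 3, 500
Z = torch.randn(n, d)  # stand-in for a dataset of observed causal variables

# Variational parameters of q_phi(G, Theta)
edge_logits = torch.zeros(d, d, requires_grad=True)  # q(G): one Bernoulli per edge
theta_mu = torch.zeros(d, d, requires_grad=True)     # q(Theta): diagonal Gaussian
theta_logstd = torch.zeros(d, d, requires_grad=True)
opt = torch.optim.Adam([edge_logits, theta_mu, theta_logstd], lr=1e-2)

p_prior = 0.2                # sparse Bernoulli prior over edges
mask = 1 - torch.eye(d)      # disallow self-loops

def neg_elbo(tau=0.5):
    # Reparameterized samples: Concrete relaxation for G, Gaussian for Theta
    u = torch.rand(d, d).clamp(1e-6, 1 - 1e-6)
    G = torch.sigmoid((edge_logits + u.log() - (1 - u).log()) / tau) * mask
    Theta = theta_mu + theta_logstd.exp() * torch.randn(d, d)
    # Linear-Gaussian likelihood: Z_j ~ N(sum_i G_ij * Theta_ij * Z_i, 1)
    loglik = torch.distributions.Normal(Z @ (G * Theta), 1.0).log_prob(Z).sum()
    # KL(q || p): Bernoulli KL on the edge probabilities (a common heuristic
    # for the relaxed edges) plus the closed-form Gaussian KL for Theta
    q = torch.sigmoid(edge_logits)
    kl_G = (mask * (q * (q / p_prior).log()
                    + (1 - q) * ((1 - q) / (1 - p_prior)).log())).sum()
    kl_Theta = (0.5 * (theta_logstd.exp() ** 2 + theta_mu ** 2 - 1)
                - theta_logstd).sum()
    return -loglik + kl_G + kl_Theta  # Eq. (4) up to the constant log p(Z)

for _ in range(2000):                 # minimize the KL in Eq. (4)
    opt.zero_grad()
    neg_elbo().backward()
    opt.step()
```

In practice one would also constrain q(G) to DAGs or add a differentiable acyclicity penalty through the prior; the sketch keeps the prior fully factorized for clarity.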
2.3 LATENT CAUSAL DISCOVERY
[Figure 3: BN for the latent causal discovery task that generalizes standard causal discovery setups; nodes x, z, G, Θ, with a plate over N.]
In more realistic scenarios, the learner does not directly observe the causal variables; they must instead be learned from low-level data. The causal variables, structure, and parameters are part of a latent SCM, and the goal of causal representation learning is to perform inference of, and generation from, this true latent SCM. Yang et al. (2021) propose CausalVAE, but in a supervised setup where labels on the causal variables are available and the focus is on disentanglement. Kocaoglu et al. (2017) present causal generative models trained adversarially, but assume observations of the causal variables; given the right causal structure as a prior, their work focuses on generation from conditional and interventional distributions.
In both the causal representation learning and causal generative modeling scenarios mentioned above, the Ground Truth (GT) causal graph and parameters of the latent SCM are arbitrarily defined on real datasets and the setting is supervised. Contrary to this, our setting is unsupervised and we are interested in recovering the GT underlying SCM and the causal variables that generate the low-level observed data; we define this as the problem of latent causal discovery, and the BN over which we perform inference is given in Figure 3. In the upcoming sections, we discuss related work, formulate our problem setup, propose an algorithm for Bayesian latent causal discovery, evaluate it with experiments on causally generated vector and image data, and sample from unseen interventional image distributions to showcase the generalization of learned latent SCMs.
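To fix intuition for the generative process in Figure 3, the following sketch is purely illustrative, not the model developed later: it assumes a linear-Gaussian latent SCM and a linear decoder from z to x, with `W` a hypothetical stand-in decoder. In latent causal discovery, only x is observed, and (z, G, Θ) must be inferred.

```python
import numpy as np

rng = np.random.default_rng(0)
d, obs_dim, n = 3, 16, 1000

# Latent SCM (G, Theta): upper-triangular adjacency guarantees acyclicity
G = np.triu((rng.random((d, d)) < 0.5).astype(float), k=1)
Theta = rng.normal(size=(d, d)) * G

# Ancestral sampling of latent causal variables z (nodes are already in
# topological order because G is upper triangular)
Z = np.zeros((n, d))
for j in range(d):
    Z[:, j] = Z @ Theta[:, j] + rng.normal(size=n)

# Low-level observations x = decoder(z) + noise; the learner sees only X
# and must recover (Z, G, Theta)
W = rng.normal(size=(d, obs_dim))
X = Z @ W + 0.1 * rng.normal(size=(n, obs_dim))
```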
3 RELATED WORK
Prior work can be classified into Bayesian (Koivisto and Sood, 2004; Heckerman et al., 2006; Friedman and Koller, 2013) or maximum likelihood (Brouillard et al., 2020; Wei et al., 2020; Ng et al., 2022) methods that learn the structure and parameters of SCMs using either score-based (Kass and Raftery, 1995; Barron et al., 1998; Heckerman et al., 1995) or constraint-based (Cheng et al., 2002; Lehmann and Romano, 2005) approaches.
Causal discovery and structure learning: Work in this category assumes causal variables are observed and does not operate on low-level data (Spirtes et al., 2000; Viinikka et al., 2020; Yu et al.,