CausalVAE and CCGM focus on causal discovery concurrently with simulation (i.e., reconstruction error-based training). But in many real-world applications, a causal model is available or readily hypothesized. It is often of interest to test various causal model hypotheses not only for in-distribution (ID) test-data performance, but also for generalization to out-of-distribution (OOD) test data. Thus we propose CSHTEST and CSVHTEST, causally constrained architectures that forgo structural causal discovery (but not the functional approximation) in favor of causal hypothesis testing. Combined with comprehensive non-random dataset splits that test generalization to non-overlapping distributions, they provide a systematic way to test structural causal hypotheses and to use those models to generate synthetic data outside the training distribution.
2 Background
2.1 Causality and Model Hypothesis Testing
The causality literature has detailed the benefits of interventions and counterfactual modeling once a causal model is known. Given a structural prior, a causal model can tell us which parameters are identifiable from observational data alone, subject to no-confounder and conditioning criteria determined by d-separation rules [1]. Because our structural priors are not known to be ground truth, we assume a more deterministic functional form and can make no assumptions about identifiability [8]. Instead, we rely on deep neural networks to approximate the functional relationships and use empirical results to demonstrate that this method reliably compares structural hypotheses in low-data environments.
Structural causal priors are primarily about the ordering of, and the absence of, connections between variables. It is the absence of an edge that prevents information flow, reducing the likelihood that spurious connections are learned within the training dataset distribution. Thus, when comparing our architecture to traditional deep-learning prediction and generative models, we show how hypothesized causal models may perform worse when tested within the same distribution as the training data, but generalize drastically better when the test and train distributions are split to have less overlap. This effect is most pronounced in small datasets, where traditional deep-learning methods, absent causal priors, can "memorize" spurious patterns in the data and vastly overfit the training distribution [9].
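To make such a non-random split concrete, a minimal sketch (with hypothetical names; any covariate and quantile could be chosen) holds out every sample beyond a quantile of one covariate, so the test set lies outside the training support along that axis:

```python
import numpy as np

def covariate_shift_split(X, col, q=0.8):
    """Split rows so the test set lies beyond the q-quantile of one column.

    Unlike a random split, the resulting test distribution does not
    overlap the training distribution along the chosen covariate.
    """
    threshold = np.quantile(X[:, col], q)
    train_mask = X[:, col] <= threshold
    return X[train_mask], X[~train_mask]

# Example: 1,000 samples of 4 tabular variables, split on variable 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
X_train, X_test = covariate_shift_split(X, col=0, q=0.8)
```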
Our architectures explore the use of the causal layer, provided with priors, as a hypothesis-testing space. Both CSHTEST and CSVHTEST accept non-parametric (structural only, with no functional form or parameters) causal priors as a binary Structural Causal Model (SCM) and use deep learning to approximate the functional relationships that minimize a mean-squared reconstruction error (MSE). Our empirical results show the benefits of testing structural priors with these architectures to establish a baseline for comparison where stronger causal assumptions cannot be satisfied.
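As an illustration of such a non-parametric prior, consider two competing structural hypotheses over three variables (a hypothetical toy system, not one of the paper's datasets):

```python
import numpy as np

# Hypothesis H1: a chain x0 -> x1 -> x2.
A1 = np.array([[0, 1, 0],
               [0, 0, 1],
               [0, 0, 0]])

# Hypothesis H2: a fork x0 -> x1, x0 -> x2.
A2 = np.array([[0, 1, 1],
               [0, 0, 0],
               [0, 0, 0]])

# Each binary matrix fixes only which edges exist; the functional
# form of every edge is left to learned MLPs, so either hypothesis
# can be scored by its held-out reconstruction MSE.
```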
3 Causal Hypothesis Gen and Variational Model
3.1 Causal Hypothesis Testing with CSHTEST
Our model CSHTEST uses a causal layer similar to that in both CCGM and CausalVAE [6, 7]. The causal layer consists of a structural prior matrix $S$ followed by non-linear functions defined by MLPs. We define the structural prior $S \in \{0,1\}^{d \times d}$ so that $S$ is the sum of a DAG term and a diagonal term:
\[
S = \underbrace{A}_{\text{DAG}} + \underbrace{D}_{\text{diag.}} \tag{1}
\]
$A$ represents a DAG adjacency matrix, usually referred to as the causal structural model in the literature, and $D$ has a 1 on the diagonal for exogenous variables and a 0 for endogenous ones. Then, given tabular inputs $\mathbf{x} \in \mathbb{R}^d$, $S_{ij}$ is an indicator determining whether variable $i$ is a parent of variable $j$.
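Continuing the toy chain hypothesis above (purely illustrative), $x_0$ has no parents and is exogenous, so $D$ carries it through while $A$ routes the remaining variables through their parents:

```python
import numpy as np

A = np.array([[0, 1, 0],   # x0 -> x1
              [0, 0, 1],   # x1 -> x2
              [0, 0, 0]])
D = np.diag([1, 0, 0])     # only x0 is exogenous
S = A + D

# With S[i, j] = 1 iff variable i is a parent of variable j, column j
# of S lists the inputs visible to output j: x0 sees only itself,
# x1 sees x0, and x2 sees x1.
```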
From the structural prior $S$, the input variables are "selected" as the parents of each output variable through a Hadamard product with the features $\mathbf{x}$. For each output variable, its parents are passed through a non-linear, fully connected neural network $\eta$. The $\eta$ networks are trained as general function approximators, learning to approximate the relationships between parent and child nodes:
\[
\hat{\mathbf{x}}_i = \eta_i(S_i \circ \mathbf{x}) \tag{2}
\]
where $S_i$ is the $i$-th column of $S$ and $\circ$ denotes the Hadamard product.
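A minimal PyTorch sketch of this layer (our illustrative reading of Eq. (2), not the authors' released code; layer sizes are arbitrary) masks the features with the column of $S$ belonging to each output variable and feeds the result to that variable's $\eta$ network:

```python
import torch
import torch.nn as nn

class CausalLayer(nn.Module):
    """Reconstruct each variable from its hypothesized parents.

    S is a fixed binary (d x d) structural prior with S[i, j] = 1
    iff variable i is a parent of variable j (plus the exogenous
    diagonal), so column j masks the inputs visible to output j.
    """

    def __init__(self, S, hidden=16):
        super().__init__()
        d = S.shape[0]
        self.register_buffer("S", S.float())
        # One eta network per output variable, as in Eq. (2).
        self.etas = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(d)
        )

    def forward(self, x):                      # x: (batch, d)
        outs = [eta(x * self.S[:, j])          # Hadamard mask selects parents of j
                for j, eta in enumerate(self.etas)]
        return torch.cat(outs, dim=1)          # x_hat: (batch, d)

# Usage sketch: fit on the training split by minimizing MSE, then
# score the structural hypothesis on the shifted test split.
S = torch.tensor([[1, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])                  # chain prior with x0 exogenous
layer = CausalLayer(S)
x = torch.randn(8, 3)
loss = nn.functional.mse_loss(layer(x), x)
```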