Causal Structural Hypothesis Testing and Data
Generation Models
Jeffrey Jiang
jimmery@ucla.edu
Omead Pooladzandi
opooladz@ucla.edu
Sunay Bhat
sunaybhat1@ucla.edu
Gregory Pottie
pottie@ee.ucla.edu
Abstract
A vast amount of expert and domain knowledge is captured by causal structural
priors, yet there has been little research on testing such priors for generalization
and data synthesis purposes. We propose a novel model architecture, Causal Struc-
tural Hypothesis Testing, that can use nonparametric, structural causal knowledge
and approximate a causal model’s functional relationships using deep neural net-
works. We use these architectures for comparing structural priors, akin to hy-
pothesis testing, using a deliberate (non-random) split of training and testing data.
Extensive simulations demonstrate the effectiveness of out-of-distribution generalization error as a proxy for causal structural prior hypothesis testing and offer a statistical baseline for interpreting results. We show that the variational version of the architecture, Causal Structural Variational Hypothesis Testing, can improve performance in low-SNR regimes. Due to the simplicity and low parameter count of the models, practitioners can test and compare structural prior hypotheses on small datasets and use the priors with the best generalization capacity to synthesize
much larger, causally-informed datasets. Finally, we validate our methods on a
synthetic pendulum dataset, and show a use-case on a real-world trauma surgery
ground-level falls dataset. Our code is available on GitHub.2
1 Introduction
In most scientific fields, causal information is considered an invaluable prior with strong generaliza-
tion properties and is the product of experimental intervention or domain expertise. These priors can
be in a structural causal model (SCM) form that instantiates unidirectional relationships between
variables using a Directed Acyclic Graph (DAG) [1]. Confidence in a causal model needs to be higher than in a statistical model, as its relationships are invariant and preserved outside the data
domain. In fields such as medicine or economics, where ground truth is often unavailable, domain
experts are relied on to hypothesize and test causal models using experiments or observational data.
Generative models have been crucial to solving many problems in modern machine learning [2]
and generating useful synthetic datasets. Causal generative models learn or use causal informa-
tion for generating data, producing more interpretable results, and tackling biased datasets [3–5].
Recently, [6] introduced a Causal Layer, which allows for direct interventions to generate images
outside the distribution of the training dataset in its CausalVAE framework. Another method, Causal
Counterfactual Generative Modeling (CCGM), in which exogeneity priors are included, extends the
counterfactual modeling capabilities to test alternative structures and “de-bias” datasets [7].
Equal Contribution, Department of Electrical and Computer Engineering, University of California, Los
Angeles
2https://github.com/SunayBhat1/Causal-Structural-Hypothesis-Testing
NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research.
arXiv:2210.11275v2 [cs.LG] 4 Nov 2022
CausalVAE and CCGM focus on causal discovery concurrently with simulation (i.e. reconstruction
error-based training). But in many real-world applications, a causal model is available or readily hy-
pothesized. It is often of interest to test various causal model hypotheses not only for in-distribution
(ID) test data performance, but for generalization to out-of-distribution (OOD) test data. Thus we
propose CSHTEST and CSVHTEST, which are causally constrained architectures that forgo struc-
tural causal discovery (but not the functional approximation) for causal hypothesis testing. Com-
bined with comprehensive non-random dataset splits to test generalization to non-overlapping dis-
tributions, we allow for a systematic way to test structural causal hypotheses and use those models
to generate synthetic data outside training distributions.
2 Background
2.1 Causality and Model Hypothesis Testing
Causality literature has detailed the benefits of interventions and counterfactual modeling once a
causal model is known. Given a structural prior, a causal model can tell us what parameters are
identifiable from observational data alone, subject to a no-confounders and conditioning criterion
determined by d-separation rules [1]. Because the structural priors are not known to be ground truth, we assume a more deterministic functional form and can make no assumptions about identifiability [8]. Instead, we rely on deep neural networks to approximate the functional relationships and
use empirical results to demonstrate the reliability of this method to compare structural hypotheses
in low-data environments.
Structural causal priors are primarily about the ordering and absence of connections between vari-
ables. It is the absence of a certain edge that prevents information flow, reducing the likelihood that
spurious connections are learned within the training dataset distribution. Thus, when comparing our
architecture to traditional deep learning prediction and generative models, we show how hypothe-
sized causal models might perform worse when testing within the same distribution as the training
data, but drastically improve generalization performance when splitting the test and train distributions to have less overlap. This effect is most pronounced in small datasets, where traditional deep learning methods, absent causal priors, can “memorize” spurious patterns in the data and vastly overfit the training distribution [9].
Our architectures explore the use of the causal layer, provided with priors, as a hypothesis-testing
space. Both CSHTEST and CSVHTEST accept non-parametric (structural only, no functional form or parameters) causal priors as a binary Structural Causal Model (SCM) and use deep learning to approximate the functional relationships that minimize a mean-squared reconstruction error (MSE).
Our empirical results show the benefits of testing structural priors using these architectures to estab-
lish a baseline for comparison where stronger causal assumptions cannot be satisfied.
3 Causal Hypothesis Gen and Variational Model
3.1 Causal Hypothesis Testing with CSHTEST
Our model CSHTEST uses a causal layer similar to those in CCGM and CausalVAE [6, 7]. The causal layer consists of a structural prior matrix S followed by non-linear functions defined by MLPs. We define the structural prior S \in \{0, 1\}^{d \times d} so that S is the sum of a DAG term and a diagonal term:

S = \underbrace{A}_{\text{DAG}} + \underbrace{D}_{\text{diag.}} \qquad (1)

A represents a DAG adjacency matrix, usually referred to as the causal structural model in the literature, and D has a 1 on the diagonal for exogenous variables and a 0 for endogenous ones. Then, given tabular inputs x \in \mathbb{R}^d, S_{ij} is an indicator determining whether variable i is a parent of variable j.
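The construction of S and the parent selection it encodes can be sketched in a few lines of NumPy. The four variables, adjacency matrix, and sample values below are hypothetical, chosen only to illustrate Eq. (1):

```python
import numpy as np

# Hypothetical four-variable example: x1, x2 are exogenous roots;
# x3, x4 are endogenous children of {x1, x2}.
A = np.array([[0, 0, 1, 1],      # A[i, j] = 1  iff  x_i is a parent of x_j
              [0, 0, 1, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])
D = np.diag([1, 1, 0, 0])        # 1 on the diagonal for exogenous variables
S = A + D                        # structural prior, Eq. (1)

x = np.array([0.57, 0.66, 0.43, 1.78])   # one tabular sample in R^d

# The parents of output variable j are selected by a Hadamard product of
# column S[:, j] with the features x; non-parents are zeroed out.
parents_of_x3 = S[:, 2] * x      # only x1 and x2 survive the mask
```

Each column of S thus acts as a binary mask over the inputs before the corresponding η network is applied.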
From the structural prior S, each of the input variables is “selected” to be a parent of output variables through a Hadamard product with the features x. For each output variable, its parents are passed through a non-linear, fully connected neural network η. The η networks are trained as general function approximators, learning to approximate the relationships between parent and child nodes:

\hat{x}_i = \eta_i(S_i \circ x) \qquad (2)
where S_i represents the i-th column vector of S, and \hat{x}_i is the i-th reconstructed output [10]. In the case of an exogenous variable x_i, a corresponding 1 at D_{ii} “leaks” the variable through, encouraging η to learn the identity function, while a 0 value forces the network to learn some functional relationship of its parents. The end-to-end structure, as seen in Figure 1, is trained on a reconstruction loss defined by \ell(x, \hat{x}). We use the L2 loss (Mean Squared Error):

\ell_{\text{CSHTEST}} = \| x - \eta_i(S_i \circ x) \|_2^2 \qquad (3)
CSHTEST can then be used as a structural hypothesis test mechanism for two structural causal models S and T. The basic idea is that if \ell_S < \ell_T across the majority of non-random OOD dataset splits for training and testing, then S is a more suitable hypothesis for the true causal structure of the data than T. In Section 4.3 we demonstrate the ID and OOD train/test splits used to test this generalization capacity, and our experimental results provide baselines for this approach.
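This comparison can be sketched end-to-end. The toy below stands in a degree-2 polynomial least-squares fit for the η networks (the actual models use MLPs) and compares two hypothetical two-variable structures under a deliberate, non-random split; the data-generating mechanism, split threshold, and both hypotheses are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy mechanism: x1 is exogenous and x2 = x1**2 (noiseless).
x1 = rng.uniform(0.0, 1.0, 400)
X = np.column_stack([x1, x1 ** 2])

# Deliberate (non-random) OOD split on x1, in the spirit of Section 4.3.
train, test = X[x1 < 0.7], X[x1 >= 0.7]

def features(v):
    # Degree-2 polynomial features: a cheap stand-in for the eta MLPs.
    return np.column_stack([v ** 2, v, np.ones_like(v)])

def ood_loss(S, train, test):
    # Sum of per-variable OOD reconstruction errors under prior S (Eq. 3).
    loss = 0.0
    for j in range(S.shape[0]):
        parent = np.flatnonzero(S[:, j])[0]   # single parent per variable here
        w, *_ = np.linalg.lstsq(features(train[:, parent]),
                                train[:, j], rcond=None)
        pred = features(test[:, parent]) @ w
        loss += np.mean((test[:, j] - pred) ** 2)
    return loss

S_hyp = np.array([[1, 1],     # hypothesis S: x1 exogenous (diagonal leak),
                  [0, 0]])    #               x1 -> x2
T_hyp = np.array([[0, 0],     # hypothesis T: the edge reversed,
                  [1, 1]])    #               x2 exogenous
loss_S, loss_T = ood_loss(S_hyp, train, test), ood_loss(T_hyp, train, test)
```

Under the correct structure the fit extrapolates exactly, so loss_S is near machine precision, while the reversed structure must approximate a square root with a polynomial fitted only on the training range and pays for it on the OOD split.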
[Figure 1 graphic omitted: an example DAG, the Structural Prior Matrix S, an input data batch x, and the trainable η networks, e.g. \hat{x}_4 = \eta_4(x_1, x_2).]
Figure 1: Causal Hypothesis Generative Architecture (CSHTEST), with an example of how the Structural Prior Matrix selects the parents of each variable, or the identity if it is exogenous. The η networks approximate the functional relationships in training.
3.2 Causal Variational Hypothesis Testing with CSVHTEST
We extend CSHTEST to a variational model, CSVHTEST, which includes sampling functionality like a VAE [2]. We do this primarily to obtain a more robust model in low Signal-to-Noise Ratio (SNR) regimes and to generate new data points that are not deterministic in the inputs, allowing for more dynamic synthetic data generation. CSVHTEST consists of an encoder, a CSHTEST causal layer, and a decoder. Further details are provided in Appendix A.3.1.
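As a minimal sketch of the sampling step only, assuming the standard VAE reparameterization trick (the encoder/decoder details are deferred to the appendix, and the dimensions below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_latent(mu, log_var):
    """VAE-style reparameterization: z = mu + sigma * eps, eps ~ N(0, I).
    This sampling step is what makes the generated outputs non-deterministic
    in the inputs; the surrounding encoder and decoder are omitted here."""
    mu, log_var = np.asarray(mu), np.asarray(log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

z = sample_latent(mu=np.zeros(4), log_var=np.zeros(4))  # sigma = 1 here
```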
4 Problem Setting
4.1 Structural Hamming Distance
In causal and graph discovery literature, the Structural Hamming Distance is a common metric
to differentiate causal models by the number of edge modifications (flips in a binary matrix) to
transform one graph to another [11, 12], often described as the norm of the difference between
adjacency matrices:
H = |A_i - A_j|_1 \qquad (4)
However, the Structural Hamming Distance does not account for the “causal asymmetry.” The absence of an edge is a more profound statement than its inclusion, as any included edge could have a weight of zero. Hence we define two types of hypotheses that are incorrect relative to ground truth but could have the same Structural Hamming Distance:
• Leaky hypotheses are causal hypotheses with extra links. In general, a leaky hypothesis will produce models that are more prone to overfitting, but with proper weighting, the solution space of a leaky causal hypothesis includes the ground-truth causal structure.

• Lossy hypotheses are causal hypotheses that are missing at least one link. Lossy hypotheses are much easier to detect because a lossy hypothesis results in lost information. As such, a lossy hypothesis should never do better than the true hypothesis, within finite sampling and noise errors.
From these definitions, we define the Positive Structural Hamming Distance and the Negative Structural Hamming Distance. For null hypothesis A_0 and alternative A_1,

H^+(A_1, A_0) = |A_1 > A_0|_1, \qquad H^-(A_1, A_0) = |A_1 < A_0|_1 \qquad (5)

where H^+ counts how leaky the alternative hypothesis is and H^- counts how lossy it is. One remark is that H = H^+ + H^-, but the “net” Hamming Distance H^+ - H^- can also be a naïve indicator of how much information is passed through the causal layer.
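The two distances of Eq. (5) are straightforward to compute from binary adjacency matrices; the sketch below uses a hypothetical pair of two-variable hypotheses:

```python
import numpy as np

def hamming_split(A1, A0):
    """Positive/negative Structural Hamming Distances of Eq. (5):
    H+ counts leaky (extra) edges in A1, H- counts lossy (missing) ones."""
    A1, A0 = np.asarray(A1), np.asarray(A0)
    return int(np.sum(A1 > A0)), int(np.sum(A1 < A0))

A0 = np.array([[0, 1],           # null hypothesis: x1 -> x2
               [0, 0]])
A1 = np.array([[0, 0],           # alternative: the edge reversed
               [1, 0]])
h_plus, h_minus = hamming_split(A1, A0)
# H = H+ + H- recovers the ordinary Structural Hamming Distance, Eq. (4)
total = h_plus + h_minus
```

A reversed edge is thus one unit leaky and one unit lossy at once, which an ordinary Hamming distance of 2 would not distinguish from, say, two extra edges.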
4.2 Baseline Models
4.2.1 Simulated DAG Baselines
We empirically test our theory that an incorrect hypothesis will result in worse OOD test error using extensive simulations. We use the same methodology as [13], simulating across multiple DAG node sizes, edge counts, OOD variable splits (described further in Section 4.3), and Structural Hamming Distances, with iterations at the ground-truth and modified DAG levels for robustness. In our experimental results, we calculate the probability that a hypothesis with H one step closer to ground truth has a lower OOD test error, as the ratio across our simulations:

\Pr(\ell_{\text{CSHTEST}}(S_j) < \ell_{\text{CSHTEST}}(S_i)), \qquad 1 = |A_i - A_{GT}|_1 - |A_j - A_{GT}|_1 \qquad (6)

where GT is ground truth, and similarly for differences of 2 and 3. In practice, we actually consider the probability conditional on a tuple of the positive and negative Hamming distances (H^+, H^-), thus allowing us to distinguish hypotheses that are leakier, lossier, or a specific mix of the two. Doing so allows us to better account for the fundamental asymmetry in causality. Full hyperparameters and test cases can be found in Appendix A.5.
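The ratio in Eq. (6) reduces to an empirical win rate across simulation runs; a minimal sketch with placeholder per-run losses (the numbers below are purely illustrative, not results from the paper):

```python
import numpy as np

def prob_closer_wins(losses_j, losses_i):
    """Empirical estimate of Pr(l(S_j) < l(S_i)) across simulation runs,
    where S_j is one Hamming step closer to ground truth than S_i."""
    losses_j, losses_i = np.asarray(losses_j), np.asarray(losses_i)
    return float(np.mean(losses_j < losses_i))

# Placeholder per-run OOD test losses for the two hypotheses:
p = prob_closer_wins([0.10, 0.12, 0.30, 0.08],
                     [0.20, 0.11, 0.45, 0.15])
```

Conditioning on the tuple (H^+, H^-) simply means grouping the runs by that tuple before averaging.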
4.2.2 Sun Pendulum Image Dataset
A synthetic pendulum image dataset is introduced in [6] and we use it here to produce a physics-
based tabular dataset where we know the ground truth DAG and can test the abilities of CSHTEST
and CSVHTEST. More about the dataset is described in Appendix A.2.1.
4.2.3 Medical Trauma Dataset
We also analyze our model on a real-world dataset of brain-trauma ground-level fall patients that
includes multiple health factors, with a focus on predicting a decision to proceed with surgery or
not. We used an initial SHAP analysis to select three variables of high prediction impact: Glasgow
Coma Scale/Score for head trauma severity (GCS), Diastolic Blood Pressure (DBP), the presence of
any Co-Morbidities (Co-Morb), one demographic variable Age, along with the Surgery outcome of
interest. Without the ground truth, we test two structural models shown in 2 based on knowledge of
the selected variables and how they may interact to inform the surgery decision.
[Figure 2 graphic omitted: two DAGs, H1 and H2, over the nodes GCS, Age, DBP, Co-Morb, and Surgery.]
Figure 2: Two hypothesized structural causal priors, H1 and H2, for a medical dataset on trauma patients and the decision to perform surgery.