Functional Bayesian Networks for Discovering Causality from Multivariate Functional Data

Fangting Zhou1,2, Kejun He2, Kunbo Wang3, Yanxun Xu3, and Yang Ni1
1Department of Statistics, Texas A&M University, College Station, Texas,
U.S.A.
2Institute of Statistics and Big Data, Renmin University of China, Beijing,
China
3Department of Applied Mathematics and Statistics, Johns Hopkins
University, Baltimore, Maryland, U.S.A.
Email: kejunhe@ruc.edu.cn, yni@stat.tamu.edu
Abstract
Multivariate functional data arise in a wide range of applications. One funda-
mental task is to understand the causal relationships among these functional objects
of interest, which has not yet been fully explored. In this article, we develop a
novel Bayesian network model for multivariate functional data where the conditional
independence and causal structure are both encoded by a directed acyclic graph.
Specifically, we allow the functional objects to deviate from the Gaussian process assumption adopted by most existing functional data analysis models. This more realistic non-Gaussian assumption is the key to unique causal structure identification even when the functions are measured with noise. A fully Bayesian framework is designed
to infer the functional Bayesian network model with natural uncertainty quantifica-
tion through posterior summaries. Simulation studies and real data examples are
used to demonstrate the practical utility of the proposed model.
Keywords: Causal discovery, Directed acyclic graphs, Multivariate longitudinal/functional data, Non-Gaussianity, Structure learning.
1 Introduction
This article develops a novel functional Bayesian network for modeling directed conditional
independence and causal relationships of multivariate functional data, which arise in a
wide range of applications. For example, learning brain effective connectivity networks
from electroencephalogram (EEG) records is crucial for understanding brain activities and
neuron responses. Another example is longitudinal medical studies where multiple clinical
arXiv:2210.12832v1 [stat.ME] 23 Oct 2022
variables are recorded at possibly distinct time points across variables and/or patients.
Knowing causal dependence of these clinical variables may help physicians decide the right
interventions. Functional data can also go beyond those defined on the time domain, e.g., to the spatial domain (environmental data, spatially-resolved genomics, etc.).
Joint analysis of multiple functional objects has attracted great attention in recent years
with focuses mainly on reducing dimensionality and capturing functional dependence. For
instance, Kowal et al. (2017) and Kowal (2019) proposed to model time-ordered func-
tional data through a time-varying parameterization for functional time series. Using basis
transformation strategies, Zhang et al. (2016) built an autoregressive model for spatially
correlated functional data, while Lee et al. (2018) modeled functional data in serial cor-
relation semiparametrically. Chiou and Müller (2014) developed a linear manifold model
characterizing the functional dependence between multiple random processes.
Functional Graphical Models In a similar but conceptually different manner, func-
tional graphical models have been recently proposed to model conditional independence of
multivariate functional data. Graphical models gives rise to compact probabilistic repre-
sentation of high-dimensional data through the graph-encoded conditional independence
constraints. One key challenge is that the graph is typically unknown and must be inferred
from data. While graphical models have been extensively studied for vector- and matrix-
variate data (Yuan and Lin, 2007; Wang and West, 2009; Leng and Tang, 2012; Ni et al.,
2017), only recently have there been developments for functional data. Zhu
et al. (2016) extended Markov and hyper Markov laws of decomposable undirected graphs
for random vectors to those for random functions. Qiao et al. (2019) adopted the group
lasso penalty on the precision matrix of coefficients extracted from the basis expansion of
functions. Zapata et al. (2022) introduced the idea of partial separability to reduce the
computational cost of Qiao et al. (2019). Qiao et al. (2020) further extended Qiao et al.
(2019) and proposed to characterize the time-varying conditional independence of random
functions through smoothing techniques. To relax the Gaussian process assumption of the
aforementioned methods, Li and Solea (2018), Solea and Li (2022), and Lee et al. (2022)
proposed models based on additive conditional independence and copula Gaussian models.
Despite these exciting developments of functional undirected graphical models, the work
on functional directed graphical models is sparse. Generally, undirected graphs admit a
different set of conditional independence constraints from directed graphs. For example,
the directed graph in Figure 1a implies the marginal independence X2 ⊥ X3 but not the conditional independence X2 ⊥ X3 | X1, yet there exists no undirected counterpart that admits the same set of conditional (in)dependence assertions.
More importantly, causal discovery (i.e., generation of plausible causal hypotheses) is only
possible with directed graphs given additional causal assumptions (Pearl, 2000). To the best
of our knowledge, the functional structural equation model recently proposed by Lee and
Li (2022) is the only work that infers directional relationships from multivariate functional
data. However, as will become evident in Section 3 and 4, our model differs from theirs in
several significant aspects.
Causal Discovery As hinted earlier, one of the two important problems we intend to
address in this work is discovering causality from functional observations. Causal discovery
is one of the first steps to investigate the physical mechanism that governs the operation
and dynamics of an unknown system. Given the learned causal knowledge, subsequent
causal inference (e.g., deriving the interventional and counterfactual distributions) can be
conducted under the celebrated do-calculus framework (Pearl, 2000). Therefore, inferring
causal relationships potentially has a more significant scientific impact than learning associations, since it may help answer fundamental questions about nature. Bayesian networks
paired with causal assumptions are among the most popular approaches in identifying
unknown causal structure represented by a directed acyclic graph (DAG). One pressing
obstacle of using Bayesian networks to discover causality from purely observational data
is that in general, only Markov equivalence classes (MEC) can be learned based on con-
ditional independence constraints alone. Causal interpretations of members in the same
MEC can be drastically different, and, generally, only bounds on causal effects can be cal-
culated (Maathuis et al., 2009). For example, the three DAGs in Figure 1b constitute an
MEC with the only conditional independence X2 ⊥ X3 | X1, but the causal directions are
completely reversed in the last graph compared to the first one.
Figure 1: Two Markov equivalence classes. (a) The collider X2 → X1 ← X3, whose MEC contains only itself and which implies X2 ⊥ X3. (b) The chain X2 → X1 → X3, the fork X2 ← X1 → X3, and the reversed chain X2 ← X1 ← X3, which together form an MEC and imply X2 ⊥ X3 | X1.
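The (in)dependence pattern of the collider in Figure 1a can be checked numerically. The following sketch (a hypothetical simulation, not from the paper) generates X2 and X3 independently, sets X1 = X2 + X3 + noise, and compares the marginal correlation of X2 and X3 with their partial correlation given X1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Collider X2 -> X1 <- X3: X2 and X3 are marginally independent
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = x2 + x3 + rng.normal(size=n)

# Marginal correlation of X2 and X3 is near zero
marginal = np.corrcoef(x2, x3)[0, 1]

# Partial correlation given X1: correlate OLS residuals of X2 ~ X1 and X3 ~ X1
def residual(y, x):
    beta = np.dot(x, y) / np.dot(x, x)
    return y - beta * x

partial = np.corrcoef(residual(x2, x1), residual(x3, x1))[0, 1]

print(f"marginal corr: {marginal:.3f}, partial corr given X1: {partial:.3f}")
```

Conditioning on the collider X1 induces dependence (the partial correlation is about -0.5 in this design), which is exactly why no undirected graph encodes the same constraints.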
Since 2006, however, numerous researchers have found that causal discovery (unique causal structure identification) is indeed possible with additional distributional assumptions on the data-generating process, at least for finite-dimensional data. Examples include but
are not limited to linear non-Gaussian models (LiNGAM, Shimizu et al. 2006), non-linear
additive noise models (Hoyer et al., 2008), and linear Gaussian models with equal error
variances (Peters and Bühlmann, 2014). See more related methods in the recent book of Peters et al. (2017). Although remarkable progress has been made in causal discovery for traditional finite-dimensional data, methods capable of discovering causality from general, purely observational, multivariate functional data remain lacking. We
remark that given a known causal graph, there are existing approaches that can be used to
infer causal effects. For example, Lindquist (2012) developed a causal mediation analysis
framework where the treatment and outcome are scalars and the mediator is a univariate
random function. Our scope is substantially different from this line of work in that we do
not assume the causal graph to be known; in fact, learning the causal graph structure is
precisely the focus of this paper.
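The LiNGAM-style identifiability referenced above can be illustrated with a toy bivariate example (a hypothetical sketch, not the paper's method): with non-Gaussian (here uniform) noise, regressing in the causal direction leaves residuals independent of the regressor, while regressing in the anti-causal direction does not, which breaks the symmetry that plagues the Gaussian case.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000

# True model: X -> Y with uniform (non-Gaussian) noise
x = rng.uniform(-1, 1, size=n)
y = x + rng.uniform(-1, 1, size=n)

def ols_residual(target, regressor):
    beta = np.dot(regressor, target) / np.dot(regressor, regressor)
    return target - beta * regressor

# A crude dependence score: correlation between squared residual and
# squared regressor (near zero if residual and regressor are independent)
def dependence(target, regressor):
    r = ols_residual(target, regressor)
    return abs(np.corrcoef(r**2, regressor**2)[0, 1])

forward = dependence(y, x)   # causal direction: residual independent of x
backward = dependence(x, y)  # anti-causal direction: residual depends on y

print(f"forward score: {forward:.3f}, backward score: {backward:.3f}")
```

In this design the forward score is near zero while the backward score is about 0.43, so the causal direction is identifiable from purely observational data; with Gaussian noise both scores would be near zero and the two directions indistinguishable.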
Proposed Functional Bayesian Networks We propose a novel functional Bayesian
network model for multivariate functional data for which the conditional independence
and causal relationships are represented by a DAG. As one would expect, the proposed
functional Bayesian network factorizes over the DAG and respects all directed Markov
properties (i.e., conditional independence constraints) encoded in the DAG via the notion
of d-separation. Then for ease of exposition, we reformulate the proposed Bayesian network
constructed in the functional space to an equivalent Bayesian network defined on the space
of basis coefficients via basis expansion. Because in practice functional data are almost always observed with noise, two essential ingredients are built into the proposed Bayesian
networks to capture the functional dependence and to learn the causal structure. First, we
capture the within-function dependence through a set of orthonormal basis functions chosen
in a data-driven way. The resulting basis functions are interpretable and computationally
efficient. Second, we encode the unknown causal structure by a structural equation model
on the basis coefficients. Due to the equivalence of probability measures on the functional
space and the space of basis coefficients, the conditional independence and causal rela-
tionships naturally transform back to the original random functions. To allow for unique
DAG identification, we move away from the Gaussian process assumption often adopted
by the existing functional graphical models and instead assume our random functions are
generated from a discrete scale mixture of Gaussian distributions. We theoretically prove
and empirically verify that unique DAG identification is indeed possible even when the functions are observed with noise.
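The first ingredient, a data-driven orthonormal basis, can be sketched generically as follows (an FPCA-style construction via the SVD, assuming densely observed curves on a common grid; the paper's actual basis-selection procedure is developed in Section 3). Each noisy curve is summarized by a few basis coefficients, on which the structural equation model then operates.

```python
import numpy as np

rng = np.random.default_rng(2)
n_curves, n_grid, n_basis = 200, 50, 3
t = np.linspace(0, 1, n_grid)

# Simulate noisy curves lying near a 3-dimensional function space,
# with non-Gaussian (Laplace) coefficients
scores = rng.laplace(size=(n_curves, n_basis))
true_basis = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t - 0.5])
curves = scores @ true_basis + 0.05 * rng.normal(size=(n_curves, n_grid))

# Data-driven orthonormal basis from the SVD of the centered data matrix
centered = curves - curves.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
basis = vt[:n_basis]              # rows are orthonormal on the grid
coef = centered @ basis.T         # basis coefficients for each curve
reconstruction = coef @ basis     # best rank-3 reconstruction

rel_err = np.linalg.norm(centered - reconstruction) / np.linalg.norm(centered)
print(f"relative reconstruction error with {n_basis} basis functions: {rel_err:.3f}")
```

The leading right singular vectors give an orthonormal basis adapted to the data, and the curves are reconstructed accurately from only three coefficients each.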
To conduct inference and uncertainty quantification from a finite amount of data, the
proposed model is based on a Bayesian hierarchical formulation with carefully chosen
prior distributions. Posterior inference is carried out through Markov chain Monte Carlo
(MCMC). We perform simulation studies to demonstrate the capability of the proposed
model in recovering the causal structure and key parameters of interest. A real data analysis with brain EEG records illustrates the applicability of the proposed framework in the real world. We also apply the proposed model to a COVID-19 multivariate longitudinal dataset
(shown in Section D of the Supplementary Material).
The rest of the paper is structured as follows. We provide an overview of Bayesian
networks in Section 2. The proposed functional Bayesian network is introduced in Section
3, which includes elaborations of the functional linear non-Gaussian model (Section 3.2) and
the causal identifiability theory (Section 3.3). Section 4 is devoted to Bayesian inference of
the proposed model. We provide simulation studies and applications in Sections 5 and 6,
respectively. The main contributions of this paper are summarized in Section 7 with some
concluding remarks.
2 Overview of Bayesian Networks
Throughout the paper, vectors and matrices are boldfaced whereas scalars and sets are not.
DAGs and Bayesian Networks Let X = (X1, . . . , Xp)^T ∈ X1 × · · · × Xp denote a p-dimensional random vector. Denote [m] := {1, . . . , m} for any integer m ≥ 1. Let XS = (Xj)j∈S be a subvector of X with S ⊆ [p]. A DAG G = (V, E) consists of a set of nodes V = [p] and a set of directed edges represented by a binary adjacency matrix E = (Ejℓ) where Ejℓ = 1 if and only if ℓ → j for ℓ ≠ j ∈ V. DAGs do not allow directed cycles j0 → j1 → · · · → jk = j0. Each node j ∈ V represents a random variable Xj ∈ Xj; we may use j and Xj interchangeably when no ambiguity arises. Each directed edge ℓ → j and the lack thereof represent conditional dependence and independence of Xℓ and Xj, respectively. Note that although Xj is often a scalar, it does not need to be; in fact, Xj is a random function or an infinite-dimensional random vector in this article. Denote paG(j) = {ℓ ∈ V : ℓ → j} the set of parents of j in graph G. A Bayesian network (BN)
B = (G, P) on X is a probability model where the joint probability distribution P of X factorizes with respect to G in the following manner,

P(X) = ∏_{j=1}^{p} Pj(Xj | XpaG(j)),    (1)
where Pj is the conditional distribution of Xj given XpaG(j) under P. Let deG(j) = {ℓ ∈ V : j → · · · → ℓ} denote the descendants of j in G and let ndG(j) = V \ deG(j) \ {j} denote the non-descendants of j. The BN factorization (1) directly implies the local directed Markov property – any variable is conditionally independent of its non-descendants given its parents: Xj ⊥ XndG(j)\paG(j) | XpaG(j) for all j ∈ [p]. In fact, the reverse is also true: if a distribution P respects the local Markov property according to a DAG G, then P must factorize over G as in (1). In summary, the BN factorization and the local Markov property are equivalent. We may omit the subscript G of paG(j) and ndG(j) and simply write pa(j) and nd(j) when G is clear from the context.
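The graph-theoretic notation above (adjacency matrix E with Ejℓ = 1 iff ℓ → j, parents pa(j), descendants de(j), non-descendants nd(j)) can be made concrete with a small helper; this is an illustrative sketch using the collider of Figure 1a, with nodes indexed 0, 1, 2 for X1, X2, X3.

```python
import numpy as np

# Adjacency convention from the text: E[j, l] = 1 iff l -> j
# Collider X2 -> X1 <- X3, with indices 0, 1, 2 for X1, X2, X3
E = np.array([[0, 1, 1],
              [0, 0, 0],
              [0, 0, 0]])
p = E.shape[0]

def parents(E, j):
    return {l for l in range(p) if E[j, l] == 1}

def descendants(E, j):
    # Breadth-first search along directed edges l -> child
    found, frontier = set(), {j}
    while frontier:
        children = {c for l in frontier for c in range(p) if E[c, l] == 1}
        children -= found
        found |= children
        frontier = children
    return found

def non_descendants(E, j):
    return set(range(p)) - descendants(E, j) - {j}

print(parents(E, 0), descendants(E, 1), non_descendants(E, 1))
```

For this collider, pa(X2) is empty and nd(X2) = {X3}, so the local Markov property reads X2 ⊥ X3, matching Figure 1a.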
Causal DAGs and Causal Bayesian Networks A causal DAG G is a DAG except that the directed edges are now interpreted causally, i.e., we say Xℓ is a direct cause (with respect to V) of Xj and Xj is a direct effect of Xℓ if ℓ → j. For simplicity, we will overload nd(j) and pa(j) to denote the noneffects and direct causes of j in a causal DAG. To define a causal BN, we begin by asserting the local causal Markov assumption (Spirtes et al., 2000; Pearl, 2000) – given a causal DAG G, a variable is conditionally independent of its noneffects given its direct causes. By noting the correspondence between noneffects and non-descendants, and between direct causes and parents in DAGs and causal DAGs, the local causal Markov assumption simply states that the distribution P of X respects the local Markov property of the causal DAG G, which in turn implies that P must also factorize over G (recall the equivalence between the BN factorization and the local Markov property). Therefore, a causal BN B = (G, P) is a probability model where P factorizes with respect to a causal DAG G in the same way as in (1).
Structural Equation Representation of Bayesian Networks A BN is often represented by a structural equation model (SEM),

Xj = fj(X, εj), j ∈ [p],

where the transformation fj depends on X only through its parents/direct causes Xpa(j), and the exogenous variables ε = (ε1, . . . , εp)^T ∼ Pε are assumed to be mutually independent. Denote the set of transformation functions as F = {f1, . . . , fp}. Since F and Pε induce the joint distribution P of X and it is not difficult to show that the induced distribution P factorizes over G, we can, with a slight abuse of notation, rewrite the BN as B = (G, F, Pε).
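A minimal concrete instance of the SEM representation (a hypothetical linear example, anticipating the linear non-Gaussian model of Section 3) takes each fj linear, Xj = Σ_{ℓ∈pa(j)} Bjℓ Xℓ + εj, so that X = (I − B)^{-1} ε with independent non-Gaussian exogenous noise:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 2000

# Weighted adjacency B with B[j, l] != 0 iff l -> j; here the chain X0 -> X1 -> X2
B = np.zeros((p, p))
B[1, 0] = 0.8
B[2, 1] = -0.5

# Independent Laplace (non-Gaussian) exogenous noise, one row per sample
eps = rng.laplace(size=(n, p))

# Solve X = B X + eps sample-wise: X = (I - B)^{-1} eps
X = eps @ np.linalg.inv(np.eye(p) - B).T

# Sanity check: X1 - 0.8 * X0 recovers eps1 exactly
print(np.allclose(X[:, 1] - 0.8 * X[:, 0], eps[:, 1]))
```

Because B is supported on a DAG, (I − B) is invertible and the induced distribution of X factorizes over G; the structural coefficients can be read off by regressing each Xj on its parents.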