Functional Bayesian Networks for Discovering Causality from Multivariate Functional Data

Fangting Zhou1,2, Kejun He2, Kunbo Wang3, Yanxun Xu3, and Yang Ni1
1Department of Statistics, Texas A&M University, College Station, Texas,
U.S.A.
2Institute of Statistics and Big Data, Renmin University of China, Beijing,
China
3Department of Applied Mathematics and Statistics, Johns Hopkins
University, Baltimore, Maryland, U.S.A.
Email: kejunhe@ruc.edu.cn, yni@stat.tamu.edu
Abstract
Multivariate functional data arise in a wide range of applications. One funda-
mental task is to understand the causal relationships among these functional objects
of interest, which has not yet been fully explored. In this article, we develop a
novel Bayesian network model for multivariate functional data where the conditional
independence and causal structure are both encoded by a directed acyclic graph.
Specifically, we allow the functional objects to deviate from the Gaussian process assumption adopted by most existing functional data analysis models. This more realistic non-Gaussian assumption is the key to unique causal structure identification even when the functions are measured with noise. A fully Bayesian framework is designed
to infer the functional Bayesian network model with natural uncertainty quantifica-
tion through posterior summaries. Simulation studies and real data examples are
used to demonstrate the practical utility of the proposed model.
Keywords: Causal discovery, Directed acyclic graphs, Multivariate longitudinal/functional data, Non-Gaussianity, Structure learning.
1 Introduction
This article develops a novel functional Bayesian network for modeling directed conditional
independence and causal relationships of multivariate functional data, which arise in a
wide range of applications. For example, learning brain effective connectivity networks
from electroencephalogram (EEG) records is crucial for understanding brain activities and
neuron responses. Another example is longitudinal medical studies where multiple clinical
arXiv:2210.12832v1 [stat.ME] 23 Oct 2022
variables are recorded at possibly distinct time points across variables and/or patients.
Knowing causal dependence of these clinical variables may help physicians decide the right
interventions. Functional data can also go beyond those defined on the time domain, e.g., to the spatial domain (environmental data, spatially-resolved genomics, etc.).
Joint analysis of multiple functional objects has attracted great attention in recent years
with focuses mainly on reducing dimensionality and capturing functional dependence. For
instance, Kowal et al. (2017) and Kowal (2019) proposed to model time-ordered func-
tional data through a time-varying parameterization for functional time series. Using basis
transformation strategies, Zhang et al. (2016) built an autoregressive model for spatially
correlated functional data, while Lee et al. (2018) modeled functional data in serial cor-
relation semiparametrically. Chiou and Müller (2014) developed a linear manifold model
characterizing the functional dependence between multiple random processes.
Functional Graphical Models In a similar but conceptually different manner, func-
tional graphical models have been recently proposed to model conditional independence of
multivariate functional data. Graphical models gives rise to compact probabilistic repre-
sentation of high-dimensional data through the graph-encoded conditional independence
constraints. One key challenge is that the graph is typically unknown and must be inferred
from data. While graphical models have been extensively studied for vector- and matrix-
variate data (Yuan and Lin, 2007; Wang and West, 2009; Leng and Tang, 2012; Ni et al.,
2017), only recently have there been developments for functional data. Zhu
et al. (2016) extended Markov and hyper Markov laws of decomposable undirected graphs
for random vectors to those for random functions. Qiao et al. (2019) adopted the group
lasso penalty on the precision matrix of coefficients extracted from the basis expansion of
functions. Zapata et al. (2022) introduced the idea of partial separability to reduce the
computational cost of Qiao et al. (2019). Qiao et al. (2020) further extended Qiao et al.
(2019) and proposed to characterize the time-varying conditional independence of random
functions through smoothing techniques. To relax the Gaussian process assumption of the
aforementioned methods, Li and Solea (2018), Solea and Li (2022), and Lee et al. (2022)
proposed models based on additive conditional independence and copula Gaussian models.
Despite these exciting developments of functional undirected graphical models, the work
on functional directed graphical models is sparse. Generally, undirected graphs admit a
different set of conditional independence constraints from directed graphs. For example,
the directed graph in Figure 1a implies the marginal independence X2 ⊥ X3 but not the conditional independence X2 ⊥ X3 | X1, yet there exists no undirected counterpart that admits the same set of conditional (in)dependence assertions.
More importantly, causal discovery (i.e., generation of plausible causal hypotheses) is only
possible with directed graphs given additional causal assumptions (Pearl, 2000). To the best
of our knowledge, the functional structural equation model recently proposed by Lee and
Li (2022) is the only work that infers directional relationships from multivariate functional
data. However, as will become evident in Section 3 and 4, our model differs from theirs in
several significant aspects.
Causal Discovery As hinted earlier, one of the two important problems we intend to
address in this work is discovering causality from functional observations. Causal discovery
is one of the first steps to investigate the physical mechanism that governs the operation
and dynamics of an unknown system. Given the learned causal knowledge, subsequent
causal inference (e.g., deriving the interventional and counterfactual distributions) can be
conducted under the celebrated do-calculus framework (Pearl, 2000). Therefore, inferring
causal relationships potentially has a more significant scientific impact than learning associations, since it may help answer fundamental questions about nature. Bayesian networks
paired with causal assumptions are among the most popular approaches in identifying
unknown causal structure represented by a directed acyclic graph (DAG). One pressing
obstacle of using Bayesian networks to discover causality from purely observational data
is that in general, only Markov equivalence classes (MEC) can be learned based on con-
ditional independence constraints alone. Causal interpretations of members in the same
MEC can be drastically different, and, generally, only bounds on causal effects can be cal-
culated (Maathuis et al., 2009). For example, the three DAGs in Figure 1b constitute an
MEC with the only conditional independence X2 ⊥ X3 | X1, but the causal directions are
completely reversed in the last graph compared to the first one.
Figure 1: Two Markov equivalence classes. (a) The collider X2 → X1 ← X3, whose MEC contains only itself and which implies X2 ⊥ X3. (b) The chain X2 → X1 → X3, the fork X2 ← X1 → X3, and the reversed chain X2 ← X1 ← X3, which together form an MEC and imply X2 ⊥ X3 | X1.
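The (in)dependence pattern of the collider in Figure 1a can be checked numerically. The following sketch (a hypothetical simulation, not from the paper) generates X2 and X3 independently, sets X1 = X2 + X3 + noise, and compares the marginal correlation of X2 and X3 with their partial correlation given X1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Collider X2 -> X1 <- X3: X2 and X3 are marginally independent
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = x2 + x3 + rng.normal(size=n)

# Marginal correlation of X2 and X3 is near zero
marginal = np.corrcoef(x2, x3)[0, 1]

# Partial correlation given X1: correlate OLS residuals of X2 ~ X1 and X3 ~ X1
def residual(y, x):
    beta = np.dot(x, y) / np.dot(x, x)
    return y - beta * x

partial = np.corrcoef(residual(x2, x1), residual(x3, x1))[0, 1]

print(f"marginal corr: {marginal:.3f}, partial corr given X1: {partial:.3f}")
```

Conditioning on the collider X1 induces dependence (the partial correlation is about -0.5 in this design), which is exactly why no undirected graph encodes the same constraints.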
Since 2006, however, numerous researchers have found that causal discovery (unique causal structure identification) is indeed possible with additional distributional assumptions on the data-generating process, at least for finite-dimensional data. Examples include but
are not limited to linear non-Gaussian models (LiNGAM, Shimizu et al. 2006), non-linear
additive noise models (Hoyer et al., 2008), and linear Gaussian models with equal error
variances (Peters and Bühlmann, 2014). See more related methods in the recent book of Peters et al. (2017). Although remarkable progress has been made in causal discovery for traditional finite-dimensional data, methods capable of discovering causality from general, purely observational, multivariate functional data remain lacking. We
remark that given a known causal graph, there are existing approaches that can be used to
infer causal effects. For example, Lindquist (2012) developed a causal mediation analysis
framework where the treatment and outcome are scalars and the mediator is a univariate
random function. Our scope is substantially different from this line of work in that we do
not assume the causal graph to be known; in fact, learning the causal graph structure is
precisely the focus of this paper.
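The LiNGAM-style identifiability referenced above can be illustrated with a toy bivariate example (a hypothetical sketch, not the paper's method): with non-Gaussian (here uniform) noise, regressing in the causal direction leaves residuals independent of the regressor, while regressing in the anti-causal direction does not, which breaks the symmetry that plagues the Gaussian case.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000

# True model: X -> Y with uniform (non-Gaussian) noise
x = rng.uniform(-1, 1, size=n)
y = x + rng.uniform(-1, 1, size=n)

def ols_residual(target, regressor):
    beta = np.dot(regressor, target) / np.dot(regressor, regressor)
    return target - beta * regressor

# A crude dependence score: correlation between squared residual and
# squared regressor (near zero if residual and regressor are independent)
def dependence(target, regressor):
    r = ols_residual(target, regressor)
    return abs(np.corrcoef(r**2, regressor**2)[0, 1])

forward = dependence(y, x)   # causal direction: residual independent of x
backward = dependence(x, y)  # anti-causal direction: residual depends on y

print(f"forward score: {forward:.3f}, backward score: {backward:.3f}")
```

In this design the forward score is near zero while the backward score is about 0.43, so the causal direction is identifiable from purely observational data; with Gaussian noise both scores would be near zero and the two directions indistinguishable.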
Proposed Functional Bayesian Networks We propose a novel functional Bayesian
network model for multivariate functional data for which the conditional independence
and causal relationships are represented by a DAG. As one would expect, the proposed
functional Bayesian network factorizes over the DAG and respects all directed Markov
properties (i.e., conditional independence constraints) encoded in the DAG via the notion
of d-separation. Then for ease of exposition, we reformulate the proposed Bayesian network
constructed in the functional space to an equivalent Bayesian network defined on the space
of basis coefficients via basis expansion. Because in practice functional data are almost always observed with noise, two essential ingredients are built into the proposed Bayesian
networks to capture the functional dependence and to learn the causal structure. First, we
capture the within-function dependence through a set of orthonormal basis functions chosen
in a data-driven way. The resulting basis functions are interpretable and computationally
efficient. Second, we encode the unknown causal structure by a structural equation model
on the basis coefficients. Due to the equivalence of probability measures on the functional
space and the space of basis coefficients, the conditional independence and causal rela-
tionships naturally transform back to the original random functions. To allow for unique
DAG identification, we move away from the Gaussian process assumption often adopted
by the existing functional graphical models and instead assume our random functions are
generated from a discrete scale mixture of Gaussian distributions. We theoretically prove
and empirically verify that unique DAG identification is indeed possible even when the functions are observed with noise.
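The first ingredient, a data-driven orthonormal basis, can be sketched generically as follows (an FPCA-style construction via the SVD, assuming densely observed curves on a common grid; the paper's actual basis-selection procedure is developed in Section 3). Each noisy curve is summarized by a few basis coefficients, on which the structural equation model then operates.

```python
import numpy as np

rng = np.random.default_rng(2)
n_curves, n_grid, n_basis = 200, 50, 3
t = np.linspace(0, 1, n_grid)

# Simulate noisy curves lying near a 3-dimensional function space,
# with non-Gaussian (Laplace) coefficients
scores = rng.laplace(size=(n_curves, n_basis))
true_basis = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t - 0.5])
curves = scores @ true_basis + 0.05 * rng.normal(size=(n_curves, n_grid))

# Data-driven orthonormal basis from the SVD of the centered data matrix
centered = curves - curves.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
basis = vt[:n_basis]              # rows are orthonormal on the grid
coef = centered @ basis.T         # basis coefficients for each curve
reconstruction = coef @ basis     # best rank-3 reconstruction

rel_err = np.linalg.norm(centered - reconstruction) / np.linalg.norm(centered)
print(f"relative reconstruction error with {n_basis} basis functions: {rel_err:.3f}")
```

The leading right singular vectors give an orthonormal basis adapted to the data, and the curves are reconstructed accurately from only three coefficients each.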
To conduct inference and uncertainty quantification from a finite amount of data, the
proposed model is based on a Bayesian hierarchical formulation with carefully chosen
prior distributions. Posterior inference is carried out through Markov chain Monte Carlo
(MCMC). We perform simulation studies to demonstrate the capability of the proposed
model in recovering the causal structure and key parameters of interest. A real data analysis with brain EEG records illustrates the applicability of the proposed framework in the real world. We also apply the proposed model to a COVID-19 multivariate longitudinal dataset
(shown in Section D of the Supplementary Material).
The rest of the paper is structured as follows. We provide an overview of Bayesian
networks in Section 2. The proposed functional Bayesian network is introduced in Section
3, which includes elaborations of the functional linear non-Gaussian model (Section 3.2) and
the causal identifiability theory (Section 3.3). Section 4 is devoted to Bayesian inference of
the proposed model. We provide simulation studies and applications in Sections 5 and 6,
respectively. The main contributions of this paper are summarized in Section 7 with some
concluding remarks.
2 Overview of Bayesian Networks
Throughout the paper, vectors and matrices are boldfaced whereas scalars and sets are not.
DAGs and Bayesian Networks Let X = (X1, . . . , Xp)^T ∈ X1 × · · · × Xp denote a p-dimensional random vector. Denote [m] := {1, . . . , m} for any integer m ≥ 1. Let XS = (Xj)j∈S be a subvector of X with S ⊆ [p]. A DAG G = (V, E) consists of a set of nodes V = [p] and a set of directed edges represented by a binary adjacency matrix E = (Ejℓ) where Ejℓ = 1 if and only if ℓ → j for ℓ ≠ j ∈ V. DAGs do not allow directed cycles j0 → j1 → · · · → jk = j0. Each node j ∈ V represents a random variable Xj ∈ Xj; we may use j and Xj interchangeably when no ambiguity arises. Each directed edge ℓ → j and the lack thereof represent conditional dependence and independence of Xℓ and Xj, respectively. Note that although Xj is often a scalar, it does not need to be; in fact, Xj is a random function or an infinite-dimensional random vector in this article. Denote paG(j) = {ℓ ∈ V : ℓ → j} the set of parents of j in graph G. A Bayesian network (BN)
B = (G, P) on X is a probability model where the joint probability distribution P of X factorizes with respect to G in the following manner,

P(X) = ∏_{j=1}^{p} Pj(Xj | XpaG(j)),    (1)
where Pj is the conditional distribution of Xj given XpaG(j) under P. Let deG(j) = {ℓ ∈ V : j → · · · → ℓ} denote the descendants of j in G and let ndG(j) = V \ deG(j) \ {j} denote the non-descendants of j. The BN factorization (1) directly implies the local directed Markov property – any variable is conditionally independent of its non-descendants given its parents: Xj ⊥ XndG(j)\paG(j) | XpaG(j) for all j ∈ [p]. In fact, the reverse is also true: if a distribution P respects the local Markov property according to a DAG G, then P must factorize over G as in (1). In summary, the BN factorization and the local Markov property are equivalent. We may omit the subscript G of paG(j) and ndG(j) and simply write pa(j) and nd(j) when G is clear from the context.
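The graph-theoretic notation above (adjacency matrix E with Ejℓ = 1 iff ℓ → j, parents pa(j), descendants de(j), non-descendants nd(j)) can be made concrete with a small helper; this is an illustrative sketch using the collider of Figure 1a, with nodes indexed 0, 1, 2 for X1, X2, X3.

```python
import numpy as np

# Adjacency convention from the text: E[j, l] = 1 iff l -> j
# Collider X2 -> X1 <- X3, with indices 0, 1, 2 for X1, X2, X3
E = np.array([[0, 1, 1],
              [0, 0, 0],
              [0, 0, 0]])
p = E.shape[0]

def parents(E, j):
    return {l for l in range(p) if E[j, l] == 1}

def descendants(E, j):
    # Breadth-first search along directed edges l -> child
    found, frontier = set(), {j}
    while frontier:
        children = {c for l in frontier for c in range(p) if E[c, l] == 1}
        children -= found
        found |= children
        frontier = children
    return found

def non_descendants(E, j):
    return set(range(p)) - descendants(E, j) - {j}

print(parents(E, 0), descendants(E, 1), non_descendants(E, 1))
```

For this collider, pa(X2) is empty and nd(X2) = {X3}, so the local Markov property reads X2 ⊥ X3, matching Figure 1a.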
Causal DAGs and Causal Bayesian Networks A causal DAG G is a DAG except that the directed edges are now interpreted causally, i.e., we say Xℓ is a direct cause (with respect to V) of Xj and Xj is a direct effect of Xℓ if ℓ → j. For simplicity, we will overload nd(j) and pa(j) to denote the noneffects and direct causes of j in a causal DAG. To define a causal BN, we begin by asserting the local causal Markov assumption (Spirtes et al., 2000; Pearl, 2000) – given a causal DAG G, a variable is conditionally independent of its noneffects given its direct causes. By noting the correspondence between noneffects and non-descendants, and between direct causes and parents in DAGs and causal DAGs, the local causal Markov assumption simply states that the distribution P of X respects the local Markov property of the causal DAG G, which in turn implies that P must also factorize over G (recall the equivalence between the BN factorization and the local Markov property). Therefore, a causal BN B = (G, P) is a probability model where P factorizes with respect to a causal DAG G in the same way as in (1).
Structural Equation Representation of Bayesian Networks A BN is often represented by a structural equation model (SEM),

Xj = fj(X, εj), j ∈ [p],

where the transformation fj depends on X only through its parents/direct causes Xpa(j), and the exogenous variables ε = (ε1, . . . , εp)^T ∼ Pε are assumed to be mutually independent. Denote the set of transformation functions as F = {f1, . . . , fp}. Since F and Pε induce the joint distribution P of X and it is not difficult to show that the induced distribution P factorizes over G, we can, with a slight abuse of notation, rewrite the BN as B = (G, F, Pε).
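A minimal concrete instance of the SEM representation (a hypothetical linear example, anticipating the linear non-Gaussian model of Section 3) takes each fj linear, Xj = Σ_{ℓ∈pa(j)} Bjℓ Xℓ + εj, so that X = (I − B)^{-1} ε with independent non-Gaussian exogenous noise:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 2000

# Weighted adjacency B with B[j, l] != 0 iff l -> j; here the chain X0 -> X1 -> X2
B = np.zeros((p, p))
B[1, 0] = 0.8
B[2, 1] = -0.5

# Independent Laplace (non-Gaussian) exogenous noise, one row per sample
eps = rng.laplace(size=(n, p))

# Solve X = B X + eps sample-wise: X = (I - B)^{-1} eps
X = eps @ np.linalg.inv(np.eye(p) - B).T

# Sanity check: X1 - 0.8 * X0 recovers eps1 exactly
print(np.allclose(X[:, 1] - 0.8 * X[:, 0], eps[:, 1]))
```

Because B is supported on a DAG, (I − B) is invertible and the induced distribution of X factorizes over G; the structural coefficients can be read off by regressing each Xj on its parents.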