being a measurable space to set interventions, $E = \prod_{j \in J} E_j$, or just $E_J$ for short, is the product
space of the exogenous variables with $E_j$ being a measurable space, $P_E$ is a probability
measure on $E$ for the exogenous unstructural disturbance noise, and $f : X \times E \to X$ is a
measurable function that specifies the causal mechanism encoded in structural equations [12].
This modeling allows us to represent interventions unambiguously by changing the
causal mechanisms that target specific endogenous variables, and to encode the structural
properties of the functional relations in a graph with index sets. A pair of random variables
$(X_I, E_J)$ is a solution of the SCM $M = \langle I, J, X, E, f, P_E \rangle$ if the distribution of the
perturbation $E_J$ is equal to $P_{E_J}$ (i.e., $P^{E_J} = P_{E_J}$) and the structural equations are
satisfied (i.e., $X_I = f(X_I, E_J)$ a.s.).
The endogenous $X_I$ is observable, while the exogenous random variables $E_J$ are latent. For
a solution $X_I$, we call the distribution $P^{X_I}$ the observational distribution of $M$ associated with
$X_I$. SCMs arise in genetics [66], econometrics [28], electrical engineering [39,40] and the
social sciences [18,26]. They are widely used for causal modeling [11,49,51,57], and
corresponding statistical methods have been developed for causal inference [13,34,43,44,52].
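As an illustration of the definition above, the following minimal Python sketch builds a hypothetical two-variable SCM. The index sets, the linear mechanism `f`, the coefficient 2.0, and the standard normal noise are all assumptions chosen for illustration and are not taken from the text. The sketch draws $E_J$ from $P_E$, solves $X_I = f(X_I, E_J)$ for an acyclic mechanism by repeated substitution, and represents an intervention by replacing one component of the mechanism.

```python
import random

# Hypothetical SCM with endogenous indices I = {1, 2} and exogenous indices J = {1, 2}.
# Structural equations (assumed for illustration only):
#   X1 = E1
#   X2 = 2.0 * X1 + E2

def f(x, e):
    """Causal mechanism f: X x E -> X (right-hand sides of the structural equations)."""
    x1, _x2 = x
    e1, e2 = e
    return (e1, 2.0 * x1 + e2)

def f_do_x1(value):
    """Mechanism after the intervention do(X1 = value): only the first equation changes."""
    def f_intervened(x, e):
        x1, _x2 = x
        _e1, e2 = e
        return (value, 2.0 * x1 + e2)
    return f_intervened

def sample_exogenous(rng):
    """Draw E_J = (E1, E2) from P_E, here assumed to be independent standard normals."""
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

def solve(mechanism, e, n_vars=2):
    """Solve X = mechanism(X, E) by repeated substitution.

    For an acyclic mechanism with n_vars endogenous variables, n_vars
    substitutions from any starting point reach the unique solution.
    """
    x = (0.0,) * n_vars
    for _ in range(n_vars):
        x = mechanism(x, e)
    return x

rng = random.Random(0)
e = sample_exogenous(rng)
x_obs = solve(f, e)              # one draw from the observational distribution
x_int = solve(f_do_x1(1.0), e)   # same noise, intervened mechanism

# Both draws satisfy their own structural equations exactly.
assert x_obs == f(x_obs, e)
assert x_int == f_do_x1(1.0)(x_int, e)
```

Only `x_obs` would be observable in practice; the noise `e` is kept here solely to check that the pair is a solution. Repeated substitution is adequate for this acyclic toy model, but it is not a general solver for cyclic mechanisms.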
The research questions in these areas continue to grow in complexity
as technology advances. As higher-order co-occurrences (e.g., co-occurrences in
triples, quadruples, etc.) are observed, the intrinsic data sparseness problem of co-occurrence
data becomes more urgent than it is for co-occurrence in pairs [31]. The complexity of encoding
the model and describing the raw data conditioned on that model is captured in a class
of index sets. The class of index sets includes parents of variables and sufficient or
admissible sets for adjustment in the causal inference context [49]. The class of index sets plays
a critical role in the identifying assumptions underlying all causal inferences, the languages
used in formulating the assumptions, the conditional nature of all causal and counterfactual
claims, and the methods developed for the assessment of such claims [50]. Index sets are also
critical in capturing the significant variations important to the process being modeled and
understanding what is measured and perceived [24]. A physical, biological or social process
cannot be modeled and identified successfully to answer causal questions unless data are
available at appropriate indices and with appropriate structure. Many researchers have shown that
different scales of index set, and hence different aggregations, often lead to contradictory interpretations;
such paradoxes are referred to as “ecological fallacies” [32,55] or as the modifiable
areal unit problem (MAUP) in spatial analysis and geographical information science
[3,48,59]. The measure-theoretic treatment of the sparseness problem provides insights into
the problem of co-occurrence data in a unified foundation. In the culture of data science, we
pursue the fundamental interpretation and understanding of scientific problems arising from
co-occurrence events.
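The sparseness of higher-order co-occurrences mentioned above can be made concrete with a small count. The sketch below is only illustrative: the four records, the six item labels, and the helper `coverage` are hypothetical. It compares the fraction of all possible pairs that appear in at least one record with the corresponding fraction for triples.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def coverage(records, items, k):
    """Fraction of all k-subsets of `items` that co-occur in at least one record."""
    observed = set()
    for record in records:
        # Sort so that each k-subset has a single canonical tuple representation.
        observed.update(combinations(sorted(record), k))
    return Fraction(len(observed), comb(len(items), k))

# Hypothetical co-occurrence data: each record is a set of items observed together.
items = {"a", "b", "c", "d", "e", "f"}
records = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"c", "d", "e"},
    {"a", "e", "f"},
]

pair_coverage = coverage(records, items, 2)     # 11 of the 15 possible pairs
triple_coverage = coverage(records, items, 3)   # 4 of the 20 possible triples
```

With the same four records, most pairs are observed at least once while most triples are never observed, which is the sparseness effect in miniature.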
Let $(\Omega, \mathcal{F}, P)$ be a given probability space. The space might in practice be geographic
space, or socio-economic space, or more generally network space as an abstraction of reality.
We usually call an element $A \in \mathcal{F}$ an event, and $P(A)$ is the probability of occurrence of
event $A$, abbreviated as the probability of event $A$. In many classical books on probability
or measure (see, e.g., [8], [19], [10], [67], [15], [4]), the definitions of conditional probability
and conditional expectation are given in detail. For example, given $A, B \in \mathcal{F}$ with $P(B) > 0$,
the conditional probability of event $A$ given event $B$ is $P(A \cap B)/P(B)$. Note that, since
$A \cap B \in \mathcal{F}$, $A \cap B$ is also an event, namely the event that $A$ and $B$ occur simultaneously.
$P(A \cap B)$ is thus naturally called the probability of the event that events $A$ and $B$ occur
simultaneously. We will extend $P(A \cap B)$ as the probability of co-occurrence of $A$ and $B$ in
Section 2 to accommodate the complex problems of co-occurrence emerging in modern data
science and proceed to the fundamental measure-theoretic treatment.
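The classical quantities above can be checked on a small finite probability space. The following sketch assumes a hypothetical example, two fair dice with the uniform measure on 36 outcomes; the events `A` and `B` and the helper names are chosen purely for illustration. It computes the probability $P(A \cap B)$ that both events occur simultaneously and the conditional probability $P(A \cap B)/P(B)$.

```python
from fractions import Fraction

# Hypothetical finite probability space: two fair dice, uniform measure on 36 outcomes.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    """P(A) for an event A given as a set of outcomes, under the uniform measure."""
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("conditional probability is undefined when P(B) = 0")
    return prob(a & b) / p_b

A = {w for w in omega if w[0] + w[1] == 8}   # the two dice sum to 8
B = {w for w in omega if w[0] % 2 == 0}      # the first die is even

p_joint = prob(A & B)           # probability that A and B occur simultaneously
p_a_given_b = cond_prob(A, B)   # conditional probability of A given B
```

Exact rational arithmetic via `Fraction` avoids floating-point rounding, so the identity $P(A \mid B)\,P(B) = P(A \cap B)$ holds exactly in this toy setting.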
We deal with many σ-fields in modern applications, not just the single σ-field that is the
concern of measure theory. Consider a random object $X(\omega)$ on a given probability space
$(\Omega, \mathcal{F}, P)$. There exists an objective measurable space $(\Omega_1, \mathcal{F}_1)$ such that $X : \Omega \to \Omega_1$ is