II. DATA AND CHALLENGES IN COMPLEX MANUFACTURING DOMAINS
In this section we describe the data sources and propose a
preprocessing of the data. Then, we explain the broad prior
knowledge in manufacturing domains. Finally, we mention
common challenges with production data.
A. Data Sources along the Production Line
Products are assembled on production lines. Each line consists of
several stations, which are passed in a fixed order and at which
process steps are carried out. During these process steps, the piece
is transformed or combined with other parts in order to achieve a
predefined outcome. All involved parts are assigned unique
identifiers. Data of different types is collected along the
production process:
• Process data: the stations take measurements of the
involved parts (e.g. thickness of the piece) and of the
machine parameters (e.g. weight of applied glue).
• End-of-Line (EoL) tests: these take additional quality
measurements of the intermediate or final products.
• Station information: at some production steps the pieces
are distributed across identical stations so that they can be
processed in parallel; every piece is assigned to one
of these stations.
• Bill of Material (BoM): the BoM records which pieces
were merged together and at which position each piece
was installed.
• Supplier data: suppliers transmit data on provided goods.
The preprocessing of the data, which is depicted in Figure 1,
consists of the following steps:
1) Collect the data for every intermediate product.
2) Iteratively merge the data of all subcomponents of a final
product.
Measurements of identical subcomponents, which are placed
in the same position, can be found in the same column.
Eventually, the final tabular data set contains all measurements
that can be associated with a final product.
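The two preprocessing steps can be sketched with pandas. This is only a minimal illustration under simplifying assumptions: the BoM has a single level (no nested intermediate products), and all table names, column names and values (`housing`, `rotor`, `part_id`, `final_id`, `position`) are hypothetical and not taken from the data described above.

```python
import pandas as pd

# Hypothetical measurements per intermediate product, keyed by part identifier.
housing = pd.DataFrame({"part_id": ["H1", "H2"], "thickness": [2.01, 1.98]})
rotor = pd.DataFrame({"part_id": ["R1", "R2", "R3", "R4"],
                      "glue_weight": [0.50, 0.52, 0.49, 0.51]})

# Hypothetical BoM: which part sits at which position of which final product.
bom = pd.DataFrame({
    "final_id": ["F1", "F1", "F1", "F2", "F2", "F2"],
    "part_id":  ["H1", "R1", "R2", "H2", "R3", "R4"],
    "position": ["housing", "rotor_1", "rotor_2",
                 "housing", "rotor_1", "rotor_2"],
})


def merge_subcomponents(bom, tables):
    """Build one row per final product from per-part measurement tables."""
    # Step 1: collect the measurements of every part in long format
    # (columns: part_id, measurement, value).
    long = pd.concat(
        [t.melt(id_vars="part_id", var_name="measurement") for t in tables],
        ignore_index=True,
    )
    # Step 2: attach each part to its final product and position via the BoM.
    long = bom.merge(long, on="part_id", how="inner")
    # Identical subcomponents at the same position share one column name.
    long["column"] = long["position"] + "." + long["measurement"]
    return long.pivot(index="final_id", columns="column", values="value")


wide = merge_subcomponents(bom, [housing, rotor])
print(wide)
```

Encoding the position in the column name is one possible way to ensure that measurements of identical subcomponents at the same position land in the same column; a multi-level BoM would require repeating the merge for each level.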
B. Prior Knowledge
As the stations are passed in a fixed order, we know that
CERs across different stations can only act forward in time.
Additionally, in many manufacturing organizations, tools such as
the Failure Mode and Effect Analysis (FMEA) [11] are
implemented to extract expert knowledge on CERs in the production
process and to provide the information in a structured form.
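The temporal constraint can be encoded as a mask of admissible edges before any structure learning takes place. The following is a minimal sketch; the variable names and the station indices in `station_of` are hypothetical, and edges between variables of the same station are left unconstrained because the statement above only concerns different stations.

```python
import numpy as np

# Hypothetical mapping from a measured variable to the index of its station
# in the fixed production order (smaller index = earlier station).
station_of = {"thickness": 0, "glue_weight": 1, "press_force": 1, "eol_leak": 2}


def allowed_edge_mask(station_of):
    """Return variable names and a boolean matrix with mask[i, j] = True
    if the edge i -> j does not point backward in time."""
    names = list(station_of)
    order = np.array([station_of[n] for n in names])
    # An edge i -> j is admissible if i's station is not later than j's.
    mask = order[:, None] <= order[None, :]
    # A variable cannot be its own parent.
    np.fill_diagonal(mask, False)
    return names, mask


names, mask = allowed_edge_mask(station_of)
idx = {n: k for k, n in enumerate(names)}
print(mask[idx["thickness"], idx["eol_leak"]])  # forward in time
print(mask[idx["eol_leak"], idx["thickness"]])  # backward in time
```

Expert knowledge, for instance from an FMEA, could be added to the same mask by switching individual entries on or off.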
C. Challenges of Data Analysis in Manufacturing
Often, similar information is recorded multiple times along
the production line, leading to multicollinearity [4]. Also,
sensors might deliver non-informative data by recording
implausible values. Industrial data is also reported to drift
over time. However, even in shorter time intervals, data of
a series production contains thousands of observations. This
distinguishes the manufacturing domain from other applications
of causal discovery such as medicine, genetics or the social
sciences.
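As an illustration of the multicollinearity issue, the sketch below flags pairs of columns whose absolute Pearson correlation exceeds a threshold. The simulated data, the column names and the threshold of 0.99 are assumptions made for this example only.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.99):
    """Return all column pairs with |Pearson correlation| >= threshold."""
    corr = df.corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] >= threshold:
                pairs.append((cols[i], cols[j]))
    return pairs


# Simulated example: the same thickness is recorded at two stations,
# the second time with a small amount of sensor noise.
rng = np.random.default_rng(0)
thickness = rng.normal(2.0, 0.05, size=1000)
df = pd.DataFrame({
    "thickness_station_1": thickness,
    "thickness_station_3": thickness + rng.normal(0.0, 0.001, size=1000),
    "glue_weight": rng.normal(0.5, 0.02, size=1000),
})

pairs = highly_correlated_pairs(df)
print(pairs)
```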
III. STRUCTURE LEARNING OF GRAPHICAL MODELS
A. Some Preliminaries on Graphical Models
Let $G = (V, E)$ be a directed acyclic graph (DAG) [12,
Chapter 6] with nodes $V = (V_1, \dots, V_p)$ and edges $E$. The
node $V_i$ is called a parent of $V_j$ if the edge $V_i \to V_j$ is in
$E$. We denote the set of all parents of $V_j$ as $\mathrm{pa}(V_j)$. A tuple
of nodes $(V_{j_1}, \dots, V_{j_\ell})$, such that $V_{j_k}$ is a parent of
$V_{j_{k+1}}$ for all $k = 1, \dots, (\ell - 1)$, is called a directed path.
Nodes that can be reached from $V_j$ through a directed path are called
the descendants of $V_j$.
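These definitions translate directly into code. The sketch below represents a DAG as a set of directed edges and computes parents and descendants; the example graph and the node labels are made up for illustration.

```python
def parents(edges, node):
    """All nodes i with an edge i -> node."""
    return {i for (i, j) in edges if j == node}


def descendants(edges, node):
    """All nodes reachable from `node` through a directed path."""
    children = {}
    for i, j in edges:
        children.setdefault(i, set()).add(j)
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical DAG: V1 -> V2 -> V3 and V1 -> V4.
edges = {("V1", "V2"), ("V2", "V3"), ("V1", "V4")}

print(parents(edges, "V3"))
print(descendants(edges, "V1"))
```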
In the following, we denote random vectors by bold letters
such as $\mathbf{Z}$ and random variables by plain letters such as $Z$.
Let $\mathbf{X} = (X_1, \dots, X_p)$ be a random vector representing the
data generating process. For a graph $G$ with nodes $X_1, \dots, X_p$,
we call $(\mathbf{X}, G)$ a Bayesian network if the local Markov property
holds, i.e.
$$X_i \perp X_j \mid \mathrm{pa}(X_i)$$
for any $X_j$ that is not a descendant of $X_i$ in $G$. Here,
$X \perp Y \mid \mathbf{Z}$ denotes the conditional independence of $X$ and
$Y$ given $\mathbf{Z}$. In that case, we can deduce additional conditional
independencies for $\mathbf{X}$ from the graph $G$ using the concept of
d-separation [12]. For a Bayesian network $(\mathbf{X}, G)$, it then holds
that $X_i \perp X_j \mid \mathbf{S}$ if $X_i$ and $X_j$ are d-separated by
$\mathbf{S}$ in $G$. On the other hand, if there is a graph $G$ such that
$X_i \perp X_j \mid \mathbf{S}$ implies that $X_i$ and $X_j$ are d-separated
given $\mathbf{S}$ in $G$, then $\mathbf{X}$ is called faithful with respect
to $G$. As multiple graphs can contain the same d-separations, this graph
$G$ is in general not unique.
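For readers who want to experiment with d-separation, the following sketch implements the standard check via the moralized ancestral graph: restrict the DAG to the ancestors of the involved nodes, connect parents that share a child, drop edge directions, remove the conditioning set and test whether the two nodes are still connected. The function names and the example graph are hypothetical, and the sketch assumes that neither of the two nodes is part of the conditioning set.

```python
from itertools import combinations


def ancestors_of(edges, nodes):
    """The given nodes together with all of their ancestors."""
    parents = {}
    for i, j in edges:
        parents.setdefault(j, set()).add(i)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(edges, x, y, given):
    """True if x and y are d-separated by `given` in the DAG `edges`."""
    given = set(given)
    keep = ancestors_of(edges, {x, y} | given)
    sub = [(i, j) for (i, j) in edges if i in keep and j in keep]

    # Moralize: undirected skeleton plus links between co-parents.
    neighbours = {n: set() for n in keep}
    parents = {}
    for i, j in sub:
        neighbours[i].add(j)
        neighbours[j].add(i)
        parents.setdefault(j, set()).add(i)
    for pa in parents.values():
        for a, b in combinations(pa, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Search from x while never passing through the conditioning set.
    seen, stack = {x}, [x]
    while stack:
        for n in neighbours[stack.pop()]:
            if n == y:
                return False
            if n not in seen and n not in given:
                seen.add(n)
                stack.append(n)
    return True


# Hypothetical DAG: chain X1 -> X2 -> X3 and collider X1 -> X4 <- X3.
edges = {("X1", "X2"), ("X2", "X3"), ("X1", "X4"), ("X3", "X4")}

print(d_separated(edges, "X1", "X3", {"X2"}))
print(d_separated(edges, "X1", "X3", set()))
print(d_separated(edges, "X1", "X3", {"X2", "X4"}))
```

In this example, conditioning on X2 blocks the chain, whereas additionally conditioning on the collider X4 opens a new connection between X1 and X3.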
To build intuition, assume that $\mathbf{X}$ has a joint density $f$.
Then $X_i \perp X_j \mid \mathbf{S}$ can be characterized by
$$f(x_i \mid X_j = x_j, \mathbf{S} = \mathbf{s}) = f(x_i \mid \mathbf{S} = \mathbf{s}),$$
where $f(x_i \mid \mathbf{Z} = \mathbf{z})$ denotes the conditional density
function of $X_i$ given $\mathbf{Z} = \mathbf{z}$. Thus, if we already know
$\mathbf{S}$, then $X_j$ does not provide additional information on $X_i$.
Assume that we are interested in which variable in $\{X_j\} \cup \mathbf{S}$
causes the variable $X_i$ to be out of the specification limits. Then we
know that the root causes can be found within $\mathbf{S}$.
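A small simulation illustrates this intuition. It assumes a linear Gaussian chain $X_j \to S \to X_i$ with arbitrarily chosen coefficients, which is not a model from this paper. For jointly Gaussian variables, a vanishing partial correlation is equivalent to conditional independence, so the partial correlation of $X_i$ and $X_j$ given $S$ should be close to zero even though the two variables are marginally correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Assumed chain X_j -> S -> X_i with arbitrary coefficients.
x_j = rng.normal(size=n)
s = 0.8 * x_j + rng.normal(size=n)
x_i = 1.5 * s + rng.normal(size=n)


def partial_corr(a, b, given):
    """Correlation of a and b after regressing both on `given`."""
    design = np.column_stack([np.ones_like(given), given])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]


marginal = np.corrcoef(x_i, x_j)[0, 1]
conditional = partial_corr(x_i, x_j, s)
print(f"marginal correlation:        {marginal:.3f}")
print(f"partial correlation given S: {conditional:.3f}")
```

Once $S$ is known, $X_j$ carries no further information on $X_i$, which is why a search for root causes of $X_i$ can be restricted to $S$ in this setting.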
B. Graph Learning with Structural Equation Models
While the PC algorithm is the classic approach for deriving
a Causal Bayesian Network, recent research focused on identi-
fying it using acyclic SEMs [10], [13]–[15]. They assume that
there exists a permutation $\Pi^0(1, \dots, p) = (\pi^0(1), \dots, \pi^0(p))$
and functions $\{f_\ell, \ell = 1, \dots, p\}$, such that
$$X_\ell = f_\ell(X_{\ell_1}, \dots, X_{\ell_v}, \varepsilon_\ell), \quad \ell = 1, \dots, p, \tag{1}$$
where $\pi^0(\ell_k) < \pi^0(\ell)$ for all $k = 1, \dots, v$ and
$\varepsilon_1, \dots, \varepsilon_p$ are i.i.d. noise terms. As the
estimation of $f_\ell$ in Equation (1) is difficult in high dimensions,
one typically restricts the function class and the distribution of the
noise terms. In this work, we assume that the functions follow the
additive form
$$f_\ell(X_{\ell_1}, \dots, X_{\ell_v}, \varepsilon_\ell) = c_\ell + \sum_{k:\, \pi^0(k) < \pi^0(\ell)} f_{k,\ell}(X_k) + \varepsilon_\ell, \tag{2}$$
where $\varepsilon_\ell \sim \mathcal{N}(0, \sigma_\ell)$ and $c_\ell \in \mathbb{R}$.
To ensure the uniqueness of the $f_{k,\ell}$ and without loss of
generality, we set $E(X_\ell) = 0$