portation infrastructure, in the second, the spreading of information is constrained by existing social
connections, while in the third case, available hyperlinks constrain the possible clickstreams of users.
Such constraints can limit the number of parameters of the model that we seek to infer. They can
thus help us to address the reliable inference of transition matrices and improve the results of down-
stream analyses. Therefore, in this paper, we address the following research questions:
Q1 How can we use information on topological constraints to improve the inference of transition
matrices from incomplete data on repeated interactions captured in weighted graphs?
Q2 How is the inference of transition matrices influenced by a partial knowledge of the underlying
topological constraints, which is often the case in real-world settings?
Q3 To what extent can our proposed approach to include topological constraints in the inference
of transition matrices improve the performance of downstream network analysis tasks like node
ranking and community detection?
The remainder of this article is structured as follows: In Section 2, we formally define the inference
problem that we address in our work and we introduce two methods that are commonly used to
infer transition matrices without leveraging topological constraints, namely maximum likelihood
estimation and a “naive” Bayesian approach. Addressing Q1, in Section 3 we introduce BaCon,
a Bayesian approach that uses a topological prior to improve the inference of transition matrices
in incomplete data on repeated interactions in a graph. In Section 4, we introduce the datasets
and the experimental setup that we use to evaluate our method. We next compare BaCon to
the methods introduced in Section 2 (i.e. frequentist and naive Bayes approach) that do not use
information on topological constraints. In Section 5, we evaluate the extent to which the inclusion of
the topological constraints improves the network inference, and address Q2, observing the effects of
partial knowledge of the constraint. In Section 6, we address Q3 and explore whether the effects of
the network inference carry over to the network analysis results. In Section 7, we show how inference
of diverse examples of real-world networks can be further improved with an appropriate choice of
prior parameters. In Section 8 we discuss how our work complements existing network analytic
methods in the field. We finally summarize our conclusions and outline future work in Section 9.
The results of our study show that (i) the inclusion of topological constraints considerably im-
proves the inference of transition matrices in network data, and (ii) that this improved inference
translates to an increased accuracy for downstream network analysis tasks. Our work highlights the
importance of treating the construction of network models from (partially observed) interactions as
an inference problem that can be addressed using Bayesian statistics. Our results further suggest
that constraints for the topology of interactions in complex systems should be given more attention
as their inclusion can significantly improve modelling accuracy, especially in the limit of small data.
Given the importance of network inference in partially observed or noisy data we expect our work to
be of interest for a broad interdisciplinary community. An implementation of our method is available
as an Open Source project [Zenodo].
2 Background
This section introduces the mathematical formalism for the inference of transition matrices from
observations on repeated interactions and outlines two existing strategies for addressing it. We
assume that we observe a multiset of pairwise interactions (i, j) between nodes i, j ∈V, we denote
it with Eand indicate the number of times the interaction (i, j) occurred with nij . In addition, we
are given a directed graph G= (V, E) that represents the topological constraints, i.e., which edges
can be observed and which are impossibile. Given these sources of data, our goal is to infer the
transition matrix Tthat best reflects the actual transition probabilities in the system.
3