Bayesian Inference of Transition Matrices from Incomplete Graph Data with a Topological Prior Vincenzo Perri1 Luka V Petrovi c1 Ingo Scholtes21

2025-05-06 0 0 1.99MB 22 页 10玖币

侵权投诉

Bayesian Inference of Transition Matrices from Incomplete

Graph Data with a Topological Prior

Vincenzo Perri 1?, Luka V Petrovi´c 1?, Ingo Scholtes2,1

1 Data Analytics Group, Department of Informatics, University of Zurich, Switzerland

2 Chair of Machine Learning for Complex Networks, Center for Artiﬁcial Intelligence and Data Science,

Julius-Maximilians-Universit¨at W¨urzburg, Germany

?equal contribution

October 28, 2022

Abstract

Many network analysis and graph learning techniques are based on discrete- or continuous-

time models of random walks. To apply these methods, it is necessary to infer transition

matrices that formalize the underlying stochastic process in an observed graph. For weighted

graphs, where weighted edges capture observations of repeated interactions between nodes, it

is common to estimate the entries of such transition matrices based on the (relative) weights of

edges. However in real-world settings we are often confronted with incomplete data, which turns

the construction of the transition matrix based on a weighted graph into an inference problem.

Moreover, we often have access to additional information, which capture topological constraints

of the system, i.e. which edges in a weighted graph are (theoretically) possible and which are

not. Examples include transportation networks, where we may have access to a small sample of

passenger trajectories as well as the physical topology of connections, or a limited set of observed

social interactions with additional information on the underlying social structure. Combining

these two diﬀerent sources of information to reliably infer transition matrices from incomplete

data on repeated interactions is an important open challenge, with severe implications for the

reliability of downstream network analysis tasks.

Addressing this issue, we show that including knowledge on such topological constraints can

considerably improve the inference of transition matrices, especially in situations where we only

have a small number of observed interactions. To this end, we derive an analytically tractable

Bayesian method that uses repeated interactions and a topological prior to perform data-eﬃcient

inference of transition matrices. We compare our approach against commonly used frequentist

and Bayesian approaches both in synthetic data and in ﬁve real-world datasets, and we ﬁnd

that our method recovers the transition probabilities with higher accuracy. Furthermore, we

demonstrate that the method is robust even in cases when the knowledge of the topological con-

straint is partial. Lastly, we show that this higher accuracy improves the results for downstream

network analysis tasks like cluster detection and node ranking, which highlights the practical

relevance of our method for interdisciplinary data-driven analyses of networked systems.

1 Introduction

Graph models of relational data have become a cornerstone in the analysis of complex systems [Boc-

caletti et al., 2006] and an important foundation for the application of machine learning to graph-

structured data from social, technical, and biological systems [Bronstein et al., 2017]. Many network

arXiv:2210.15410v1 [stat.ME] 27 Oct 2022

Available data:

Important nodes,

Relevant clusters,

...

Transition

matrix

Downstream

Analysis

Inference

Constraints

Observed

interactions

Figure 1: Steps in the analysis of network data. The ﬁrst step is the inference of the transition matrix

from the network data. In the second step, the transition matrix is used to detect the important

nodes, relevant clusters and other interesting features of the system. The goals of our work are

to extend the network inference to use the topological constraints, to investigate the robustness of

the new method to partial knowledge of the constraints, and to investigate whether this improves

downstream analysis tasks.

analysis and graph learning techniques are based on discrete- or continuous-time models of ran-

dom walks [Masuda et al., 2017] such as, e.g., community detection algorithms like InfoMap [Ros-

vall and Bergstrom, 2008] or WalkTrap [Pons and Latapy, 2006], node ranking techniques like

PageRank [Page et al., 1999], neural graph embeddings like DeepWalk [Perozzi et al., 2014] or

node2vec [Grover and Leskovec, 2016], walk-based similarity scores that are the basis for link pre-

diction [Liben-Nowell and Kleinberg, 2007], or heat kernels for graphs used for community detection

[Kloster and Gleich, 2014] and node ranking [Chung, 2007]. To apply these methods, it is necessary

to obtain a transition matrix that formalizes the underlying stochastic process in the observed graph.

This is trivial when we have full information on repeated interactions in the graph, which enables us

to estimate transition probabilities between nodes based on relative frequencies of observed interac-

tions. However in real-world settings we are often confronted with incomplete data, which turns the

construction of the transition matrix into an inference problem that we need to address to obtain

reliable results.

In this work, we consider situations where we have access to a possibly incomplete set of observed

repeated interactions between a set of nodes. Such data can be represented as weighted graph, where

the weight of an edge corresponds to the number of observed interactions between a given node pair.

For such weighted graphs, it is common to deﬁne transition probabilities proportional to the edge

weights. From a statistical inference point of view, this method to infer the transition matrix based

on observed interactions corresponds to a frequentist approach that uses a maximum likelihood

estimation. When few observations are available, this simple approach suﬀers from overﬁtting:

On the one hand, unobserved interactions translate to zero transition probabilities even though

transitions may actually be possible in the underlying graph. On the other hand, those interactions

that were observed are likely to translate to overestimated transition probabilities. This generates a

large variance in inferred transition probabilities that can severely distort the results of downstream

network analysis and graph learning tasks.

However, repeated interactions are often not the only information that we have about the net-

worked systems that we study. We often have additional knowledge about topological constraints

that determine which of the interactions are theoretically possible and which others are not. For ex-

ample, consider a transportation system, where observed interactions represent passengers travelling

between connected stations, or a social network, where interactions represent messages transferred

between users, or a Web graph, where interactions represent users clicking on a hyperlink between

two Web pages. In the ﬁrst case, the movement of passengers is constrained by the available trans-

portation infrastructure, in the second, the spreading of information is constrained by existing social

connections, while in the third case, available hyperlinks constrain the possible clickstreams of users.

Such constraints can limit the number of parameters of the model that we seek to infer. They can

thus help us to address the reliable inference of transition matrices and improve the results of down-

stream analyses. Therefore, in this paper, we address the following research questions:

Q1 How can we use information on topological constraints to improve the inference of transition

matrices from incomplete data on repeated interactions captured in weighted graphs?

Q2 How is the inference of transition matrices inﬂuenced by a partial knowledge of the underlying

topological constraints, which is often the case in real-world settings?

Q3 To what extent can our proposed approach to include topological constraints in the inference

of transition matrices improve the performance of downstream network analysis tasks like node

ranking and community detection?

The remainder of this article is structured as follows: In Section 2, we formally deﬁne the inference

problem that we address in our work and we introduce two methods that are commonly used to

infer transition matrices without leveraging topological constraints, namely maximum likelihood

estimation and a “naive” Bayesian approach. Addressing Q1, in Section 3 we introduce BaCon,

a Bayesian approach that uses a topological prior to improve the inference of transition matrices

in incomplete data on repeated interactions in a graph. In Section 4, we introduce the datasets

and the experimental setup that we use to evaluate our method. We next compare BaCon to

the methods introduced in Section 2 (i.e. frequentist and naive Bayes approach) that do not use

information on topological constraints. In Section 5, we evaluate the extent to which the inclusion of

the topological constraints improves the network inference, and address Q2, observing the eﬀects of

partial knowledge of the constraint. In Section 6, we address Q3 and explore whether the eﬀects of

the network inference carry over to the network analysis results. In Section 7, we show how inference

of diverse examples of real-world networks can be further improved with an appropriate choice of

prior parameters. In Section 8 we discuss how our work complements existing network analytic

methods in the ﬁeld. We ﬁnally summarize our conclusions and outline future work in Section 9.

The results of our study show that (i) the inclusion of topological constraints considerably im-

proves the inference of transition matrices in network data, and (ii) that this improved inference

translates to an increased accuracy for downstream network analysis tasks. Our work highlights the

importance of treating the construction of network models from (partially observed) interactions as

an inference problem that can be addressed using Bayesian statistics. Our results further suggest

that constraints for the topology of interactions in complex systems should be given more attention

as their inclusion can signiﬁcantly improve modelling accuracy, especially in the limit of small data.

Given the importance of network inference in partially observed or noisy data we expect our work to

be of interest for a broad interdisciplinary community. An implementation of our method is available

as an Open Source project [Zenodo].

2 Background

This section introduces the mathematical formalism for the inference of transition matrices from

observations on repeated interactions and outlines two existing strategies for addressing it. We

assume that we observe a multiset of pairwise interactions (i, j) between nodes i, j ∈V, we denote

it with Eand indicate the number of times the interaction (i, j) occurred with nij . In addition, we

are given a directed graph G= (V, E) that represents the topological constraints, i.e., which edges

can be observed and which are impossibile. Given these sources of data, our goal is to infer the

transition matrix Tthat best reﬂects the actual transition probabilities in the system.

Figure 2: Illustration of methods to infer transition matrices from repeated interactions on a

downstream task of clustering. The ﬁrst panel shows the ground truth clusters encoded in the

transition probabilities of a geometric random network with Euclidean metric in which we observe

small set of interactions. We use three diﬀerent inference methods to construct a transition matrix

and detect communities using InfoMap. The other panels show the clusters detected using the

transition matrix inferred with a frequentist, naive Bayesian, and our approach - BaCon. BaCon

ﬁnds clusters that are closer to the ones detected from the ground truth matrix.

The simplest option is to assign the transition probabilities in a frequentist way, i.e. proportion-

ally to the observation counts nij and thus πij =nij /Pk∈Vnik. This corresponds to a maximum

likelihood estimation (MLE) of a multinomial distribution p(j|i). The advantages of this approach

are its simplicity and good performance when we have a data set that is suﬃciently large considering

the space of possible interactions. In fact, this method is so common that it has become a standard

“preprocessing” step, which is rarely even mentioned as a method to “infer” a graph model from

observational data. However, this simple method can require a large number of observations to

assign non-zero probabilities to all edges that are possible based on the underlying topology of the

system. In a nutshell, when few observations are available, it is diﬃcult to determine whether an

edge with probability zero is not possible or whether it has not been observed yet. From a machine

learning point of view, the erroneous interpretation of edges with zero probability as evidence of the

absence of the associated edge corresponds to an overﬁtting of the weighted graph model.

An alternative to a frequentist approach is to use a Bayesian one. The naive Bayesian approach

addresses the issue of overﬁtting by recording the distribution of parameters πij for a given data

set. For every node i, we organize the parameters πij in vectors ~πi= (πij )j∈V. A priori, we

would assign, e.g., a uniform prior over the space of transition probabilities: p(( ~πi)i∈V) = const. =

Qi∈VDir(πi|~αi=~

1N),where Dir denotes the Dirichlet distribution, ~αi’s are its concentration param-

eters, and ~

1Nis the N-dimensional vector with all components equal to 1. This choice of parameters

corresponds to the uniform distribution. The naive Bayesian approach further uses Bayes’ rule to up-

date the prior distribution of transition probabilities: p((~πi)i∈V|E) = p(E|(~πi)i∈V)p((~πi)i∈V)/p(E).

An advantage of this method is that unobserved edges are still modeled with non-zero probabili-

ties. However, this also introduces problems: First, since all transitions are modelled with non-zero

probabilities, we cannot directly use sparse matrices, which complicates applications to large net-

works. Second, and more importantly, in the typical case of networks with sparse topologies, a

large amount of data is required to overcome the uniform prior on a fully connected graph. From a

machine learning point of view, this corresponds to an underﬁtting of the weighted graph model.

In the ﬁrst three panels of Fig. 2, we use a toy example to illustrate the issues of frequentist

and naive Bayesian network inferences. In the toy example, we consider a ground-truth transition

matrix constrained to a random geometric graph topology (ﬁrst panel). The network consists of

four clusters expressed in higher transition probabilities between nodes within the same cluster. We

draw a small random sample of repeated interactions from the distribution of edge probabilities in

the ground truth network. We apply the frequentist (second panel) and the naive Bayesian method

(third panel) to infer a transition matrix. We further visualize the result of the popular community

detection technique InfoMap on the resulting transition matrix. The frequentist approach detects

many spurious communities because of its overﬁtting issue. The Bayesian approach detects a single

community because the observed data is insuﬃcient for overcoming the prior.

Neither the frequentist approach, nor the naive Bayesian, use the information on the constraints

G= (V, E). Addressing this issue, in the next section, we formally introduce a Bayesian method that

leverages topological constraints (BaCon) that are often known in real-world networked systems. In

the fourth panel of Fig. 2, we show a representative example that illustrates how inclusion of such

constraints in the network inference improves the detection of clusters.

3 BaCon: Bayesian Constrained Network Inference

In this section, we show how the knowledge of topological constraints can be used as a prior to infer

transition probabilities from observations of interactions (Q1). We formally introduce a Bayesian

method (BaCon) that leverages topological constraints that are often available for real-world net-

worked systems.

We assume that we are given a set of nodes Vof size N, and a set of possible edges E⊂V×V

that represent a topological constraint for our inference task. I.e. we assume that an interaction

between node iand node jcannot occur if (i, j)/∈E. For each node i, we denote its possible

successors as S(i). We also assume that we are given a data set Eof interactions that were observed

in the system. Our goal is to infer the best directed weighted graph model from this data set. The

transition probabilities p(j|i) = πij from ito its successors j∈S(v) satisfy:

j∈S(i)

πij = 1,∀i, j : 0 < πij <1 (1)

We are interested in the probability density of parameters πij . We again organize them in vectors ~πi,

similarly to the naive Bayesian approach introduced in the previous section, but this time only over

the possible successors: ~πi= (πij )j∈S(i). We capture the information about impossible transitions

using a Dirac delta function as prior for the πij . Since those transitions can not occur in the data,

their probability distribution will not change. For the possible transitions from a node i, we assume

a uniform prior over the parameter space:

p(~πi|G) = Dir ~πi



~αi=α~

1|S(i)|(2)

where, for the uniform distribution, we make a choice of α= 1. The choice of α < 1 means that the

prior distribution of the probabilities is more skewed, and the choice of α > 1 means that the prior

distribution of the probabilities is more peaked around the value 1/|S(i)|. In the following we denote

all transition probabilities as π:= (~πi)i∈V. The likelihood of observing data E, which contains nij

observations of an edge (i, j), is given by the multinomial distribution:

p(E|π, α, G) = ZY

j∈S(i)

πnij

ij (3)

where Zdenotes the number of permutations of observations. We use Bayes’ rule to update our a

priori distribution after observing data E:

p(π|E, α, G) = p(E|π, α, G)×p(π|α, G)

p(E|α, G)(4)

As the multinomial and the Dirichlet distributions are conjugate distributions, the posterior is also

a Dirichlet distribution with parameters ~αi=~α0

i+~ni,where ~α0

idenotes the a priori concentration

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

BayesianInferenceofTransitionMatricesfromIncompleteGraphDatawithaTopologicalPriorVincenzoPerri1?,LukaVPetrovic1?,IngoScholtes2;11DataAnalyticsGroup,DepartmentofInformatics,UniversityofZurich,Switzerland2ChairofMachineLearningforComplexNetworks,CenterforArticialIntelligenceandDataScience,Julius-Max...

展开>> 收起<<

Bayesian Inference of Transition Matrices from Incomplete Graph Data with a Topological Prior Vincenzo Perri1 Luka V Petrovi c1 Ingo Scholtes21.pdf

共22页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Bayesian Inference of Transition Matrices from Incomplete Graph Data with a Topological Prior Vincenzo Perri1 Luka V Petrovi c1 Ingo Scholtes21

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: