
scoring suspect
ˆs= argmax
s
φ(s;t, β, G) (2)
being the most probable origin of the information. If it is true that ˆs=s∗then the localization is successful.
See Section 2.2.4 for further details on evaluating localization algorithms.
2.2.1 Maximum likelihood algorithm
The maximum likelihood algorithm has been proposed by Pinto et al in [8]. We will refer to it as LPTVA
(Limited Pinto-Thiran-Vitelli Algorithm), with limited signifying the fact that only infection times tiare used.
It is analytically derived for tree graphs and extended to general graphs in the way outlined below.
The unknown inception time t∗is accounted for by choosing a reference observer o1and constructing a
vector of infection delays t0with respect to t1
[t0]i=ti+1 −t1.(3)
If we assume that the graph in question is a tree Tand let P(vi, vj) denote the path between vertices vi,vj,
the expected value for the t0vector (given sis the source) can be written as
[µs]i=µ(|P(s, oi+1)|−|P(s, o1)|),(4)
where µis the mean time it takes for information to transmit through one edge. In the case of the SI model it
is equal to the mean of a geometric distribution µ=1
β. The covariance matrix Λ of t0is given by
[Λ]i,j =σ2|P(oi+1, o1)∩P(oj+1, o1)|,(5)
where σ2is the variance of the time it takes for information to transmit through one edge. In the case of SI
model it is equal to variance of a geometric distribution σ2=1−β
β2.
Using the central limit theorem, we can postulate that t0, as a sum of i.i.d. single-edge delays, follows a
multivariate normal distribution with mean µsand covariance matrix Λ. Taking a logarithm of this distribution
and stripping away s-independent normalization factors we arrive at the score function
φLPTVA(s) = −(t0−µs)TΛ−1(t0−µs)−ln |Λ|,(6)
which can be calculated for all nodes in the network.
If the graph in question is not a tree, the algorithm can be adjusted by calculating the score given by (6)
with respect to the breadth-first search (BFS) tree T(s)
BF S rooted at s. Such modification is equivalent to the
assumption that the information propagates along the shortest paths connecting the source with observers. This
holds exactly when β= 1 and contributes to significantly reduced algorithm performance for non-tree graphs
with βsignificantly lower than 1.
2.2.2 Gradient maximum likelihood algorithm
The LPTV algorithm, while well established and highly analytical, suffers from significant computation speed
issues. Matrix inversion in equation (6) must be performed for each suspect, rendering the method effectively
unusable for networks with a large number of nodes.
The gradient maximum likelihood algorithm (GMLA) [9] attempts to rectify this problem by prioritizing
the earliest observers and utilizing a gradient-like selection of suspects. As a consequence, unlike in the case of
LPTVA and Pearson correlation algorithm, the set of suspects is generally only a small subset of all nodes in
the network. All other nodes ev∈(V\S) are assigned the same score, effectively equal −∞. While the score
function used to rank suspects in GMLA is the same as (6), the choice of suspects is much more sophisticated.
Firstly, only Kfearliest-infected observers are used in the calculations. The performance of the algorithm
varies depending on the exact value of Kf, with the optimum usually estimated for each network model on
a case-by-case basis. In our calculations we have used Kf=√Nfor all considered networks, which is an
acceptable rule of thumb [9].
Secondly, the score function is not calculated for all Nvertices. Instead, it is first calculated for the neighbors
S(1) of the earliest observer o1. Next, the score is calculated for the previously unchecked neighbors of the highest
scoring ˆs(1) from S(1), creating another set of suspects S(2), from which the highest scoring ˆs(2) is chosen. This
process is repeated until there is no node in S(n)(neighbors of ˆs(n−1) not checked in previous steps) with score
higher than ˆs(n−1). In such case, the algorithm stops and returns scores calculated for all members of
S=
n
[
i=1
S(i).(7)
3