average fraction of neighbors outside the community of a
node. Low values of µ indicate well separated and pronounced communities; the larger µ, the weaker the community structure.
B. Independent cascade model
In this work, we focus our attention on the Independent
Cascade Model (ICM) which is one of the most stud-
ied spreading models in the context of influence maxi-
mization (IM) [8]. The ICM is a discrete-time conta-
gion model, similar in spirit to the Susceptible-Infected-
Recovered model [24]. In the initial configuration, all
nodes are in the susceptible state, except for the nodes
in the set of spreaders that are in the infected state. At a
given time step, each infected node first attempts to in-
fect its susceptible neighbors with probability p, and then
recovers. Recovered nodes no longer participate in the
dynamics. The dynamics proceeds by repeating the pre-
viously described iteration over the newly infected nodes.
The spreading process stops once there are no infected
nodes remaining in the network. The influence of the
set of spreaders is quantified as the size of the outbreak,
i.e., the number of nodes that are found in the recovered
state at the end of the dynamics. Clearly, this number
may differ from realization to realization of the model due
to the stochastic nature of the spreading events. The IM
problem consists in finding the set of spreaders leading
to the largest average value of the outbreak size [8]. The
optimization is constrained by the number of nodes that
can compose the set of spreaders. The typical setting in
practical applications consists in finding a small set of
spreaders in a very large network.
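The dynamics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function names are ours, and `adj` is a plain adjacency dictionary:

```python
import random

def icm_outbreak(adj, seeds, p, rng):
    """One realization of the Independent Cascade Model.

    adj   : dict mapping each node to a list of neighbors
    seeds : initial set of infected nodes
    p     : per-edge infection probability
    Returns the outbreak size (number of recovered nodes).
    """
    recovered = set()
    infected = set(seeds)
    while infected:
        newly_infected = set()
        for node in infected:
            for nbr in adj[node]:
                # Each infected node gets a single chance to infect each
                # still-susceptible neighbor, with probability p.
                if (nbr not in recovered and nbr not in infected
                        and nbr not in newly_infected and rng.random() < p):
                    newly_infected.add(nbr)
        recovered |= infected        # infected nodes recover after one step
        infected = newly_infected
    return len(recovered)

def average_influence(adj, seeds, p, n_runs=1000, seed=0):
    """Monte Carlo estimate of the IM objective: the average outbreak size."""
    rng = random.Random(seed)
    return sum(icm_outbreak(adj, seeds, p, rng) for _ in range(n_runs)) / n_runs
```

Because each realization is stochastic, the influence of a seed set is estimated by averaging `icm_outbreak` over many runs, which is exactly the quantity the IM problem maximizes.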
As a function of the spreading probability p, the ICM
displays a transition from a non-endemic regime, where
the size of the outbreak is small compared to the network
size, to an endemic regime, where the outbreak involves a
large portion of the nodes in the network. The IM prob-
lem is particularly challenging and interesting around the
point where such a change of regime occurs. We define
it as the pseudo-critical value p∗ of the ordinary bond-percolation model on the network. Specifically, p∗ represents the threshold between the non-endemic and endemic regimes for the ICM started from one randomly
chosen seed; this fact follows from the exact mapping of
critical SIR-like spreading to bond percolation on net-
works [25]. We stress that each network is characterized by a different p∗ value; the numerical estimation of a network's p∗ is performed using the Newman-Ziff algorithm [26, 27].
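The full Newman-Ziff algorithm computes observables at every bond-occupation number in a single sweep; the following simplified sketch retains only its core idea, sequential edge addition tracked with a union-find structure, and estimates p∗ as the edge fraction at which the giant cluster takes its largest one-step jump, a common finite-size proxy. The function names and this particular estimator are our own choices, not the authors':

```python
import random

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def percolation_threshold(nodes, edges, n_sweeps=100, seed=0):
    """Estimate the pseudo-critical bond-occupation probability p*.

    Edges are occupied one at a time in random order (as in Newman-Ziff);
    p* is estimated from the occupied-edge fraction at which the largest
    cluster grows the most in a single step, averaged over sweeps.
    """
    rng = random.Random(seed)
    m = len(edges)
    estimates = []
    for _ in range(n_sweeps):
        parent = {v: v for v in nodes}
        size = {v: 1 for v in nodes}
        order = edges[:]
        rng.shuffle(order)
        largest, best_jump, best_i = 1, 0, 0
        for i, (u, v) in enumerate(order):
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                if size[ru] < size[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru          # union by size
                size[ru] += size[rv]
                jump = size[ru] - largest
                if jump > best_jump:
                    best_jump, best_i = jump, i
                largest = max(largest, size[ru])
        estimates.append((best_i + 1) / m)
    return sum(estimates) / n_sweeps
```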
C. The divide-and-conquer algorithm
The input of our algorithm is an unweighted and undirected network G = (V, E), with set of nodes V and set of edges E. We denote the size of the network as N = |V|. The algorithm also requires choosing the number k of desired influential spreaders and the number S of sectors used to divide the network. The divide-and-conquer (DC) algorithm consists of two main components (see Figure I). First, we divide the network into S sectors, or vertex subsets, V1, V2, ..., VS. We have V = V1 ∪ V2 ∪ ... ∪ VS and Vi ∩ Vj = ∅ for all i ≠ j. Second, we form the set of k influential spreaders by adding one node at a time to
the set. Starting from an empty set, at each of the k
iterations, we first select a random sector, and then pick
the most influential node in the sector that is not already
included in the set of spreaders. One can use any suit-
able methodology to divide the network and any suitable
centrality metric to select influential spreaders from the
sectors. Clearly, for S= 1 no actual division of the net-
work into sectors is performed. In this case, the selection
of influential spreaders is made relying on the centrality
metric scores only, thus according to the standard pro-
cedure used in the literature [6]. For S=N, seed nodes
are randomly selected.
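The selection loop can be sketched as follows, assuming the sectors and a centrality score (degree, say) have already been computed; the function and variable names are ours:

```python
import random

def dc_select_seeds(sectors, centrality, k, seed=0):
    """Divide-and-conquer seed selection (sketch).

    sectors    : list of lists of nodes forming a partition of V
    centrality : dict node -> score (e.g., degree)
    k          : number of spreaders to select (k <= N assumed)
    """
    rng = random.Random(seed)
    # Rank each sector by decreasing centrality; working copies only,
    # so the caller's sectors are left untouched.
    ranked = [sorted(s, key=lambda v: centrality[v], reverse=True)
              for s in sectors]
    chosen = []
    while len(chosen) < k:
        # Pick a random sector that still has unselected nodes, then
        # take its most central remaining node.
        nonempty = [s for s in ranked if s]
        sector = rng.choice(nonempty)
        chosen.append(sector.pop(0))
    return chosen
```

With S = 1 this reduces to ranking by centrality alone; with S = N (singleton sectors) it reduces to random selection, matching the two limiting cases above.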
We note that the above procedure is conceptually iden-
tical to the one introduced by Chen et al. [20]. However,
there are a few important practical differences. First,
Chen et al. consider high-dimensional node2vec embed-
dings only [21]. node2vec requires a non-trivial calibra-
tion of several hyperparameters that is known to be es-
sential for task performance, but adds significant com-
putational burden to the procedure [28]. Also, the high dimensionality of the node2vec embedding space makes the identification of the sectors non-trivial. Finally, Chen et al. impose S = k, with one seed selected per sector. This fact implies that increasing the seed set from k to k + 1 requires redefining the sectors afresh, an operation that requires a time that grows at least linearly with the network size N. Since in IM problems
one typically uses a number of spreaders proportional to
the size of the system [6], the resulting complexity of the
algorithm is at least quadratic.
1. Dividing the network
We consider three possible methods of dividing a net-
work into sectors: (i) graph partitioning, (ii) graph hy-
perbolic embedding, and (iii) community structure. Be-
low, we briefly summarize each of these methods.
Graph partitioning consists in splitting a graph
into an arbitrarily chosen number of sectors of roughly
equal size, such that the total number of edges lying be-
tween the corresponding subgraphs is minimized [29, 30].
To perform graph partitioning, we take advantage of
METIS [30], i.e., the algorithm that implements the
multilevel partitioning technique introduced in Refs. [31]
and [32]. The computational time of METIS grows as S·N [30].
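METIS itself is an external multilevel library; as a self-contained illustration of the partitioning objective (few edges cut between roughly equal-sized parts), a simple spectral bisection via the sign of the Fiedler vector can be sketched with numpy. This is emphatically not the METIS scheme, only a small stand-in for the same objective:

```python
import numpy as np

def spectral_bisection(nodes, edges):
    """Split a connected graph into two parts by the sign of the
    Fiedler vector (eigenvector of the second-smallest eigenvalue
    of the graph Laplacian). Illustrative only: METIS achieves the
    same cut-minimization goal with a far more scalable multilevel
    coarsening-and-refinement scheme."""
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    L = np.zeros((n, n))
    for u, v in edges:
        i, j = idx[u], idx[v]
        L[i, j] -= 1.0
        L[j, i] -= 1.0
        L[i, i] += 1.0
        L[j, j] += 1.0
    # eigh returns eigenvalues in ascending order; column 1 is Fiedler.
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    part_a = [v for v in nodes if fiedler[idx[v]] >= 0]
    part_b = [v for v in nodes if fiedler[idx[v]] < 0]
    return part_a, part_b
```

On two triangles joined by a single bridge edge, the sign split recovers the two triangles, cutting only the bridge.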
Graph hyperbolic embedding is another representation that allows one to divide a network into sectors. Here,
sectors are given by groups of close-by nodes in vector