quantify how well a network approximates the ideal model of
a core-periphery structured graph are defined, where the ideal
model consists of a core-core block that is fully connected,
a periphery-periphery block with no connections, and a core-
periphery part that is either fully connected or not connected
at all. These measures are then used to develop algorithms
to estimate the core scores. In [8], the core score vector is
learnt by maximizing the correlation measure from [7] with
a constraint that the core score vector is a shuffled version
of an N-dimensional vector whose entries are fixed according
to a number of desired factors that affect the spread of the
core scores, such as the number of nodes in the core region
and the change in the core score from the core to peripheral
nodes. In [9], the edge weight of a node pair is modeled as the product of its core scores, and the core score vector is obtained by minimizing the sum, over all node pairs, of the squared differences between the known edge weights and the products of the corresponding core scores.
In [10], a recursive algorithm, called the k-core decomposition,
is proposed to iteratively partition a graph into nodes having
sparse and dense connections. In [11], an iterative algorithm derived from a standard random walk model is proposed; it is based on the so-called persistence probability, namely, the probability that a random walker currently at a node of a sub-network remains in that sub-network at the next step. Starting from a sub-network containing a single node with the weakest connectivity, nodes are iteratively added to the sub-network such that the increase in the persistence probability is minimal. The above-mentioned existing works, which we refer to as Rombach [8], MINRES [9], k-cores [10], and Random-Walk [11], serve as the baselines for the proposed methods.
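To make the last two baselines concrete, the snippet below computes k-core numbers with networkx and evaluates, for an undirected weighted graph, the persistence probability of a node subset as the fraction of the subset's total degree that stays inside the subset; this simplified formula is our reading of the standard random-walk definition and may differ in detail from the exact variant used in [11].

```python
import numpy as np
import networkx as nx

def persistence_probability(A, subset):
    """Probability that a random walker currently at a node of `subset`
    stays in `subset` at the next step (undirected, weighted graph).

    For undirected graphs this reduces to the internal degree of the subset
    divided by its total degree; an illustrative simplification.
    """
    subset = np.asarray(subset)
    internal = A[np.ix_(subset, subset)].sum()   # edge weight staying inside the subset
    total = A[subset, :].sum()                   # total edge weight incident to the subset
    return internal / total

# k-core decomposition baseline: networkx returns each node's core number
G = nx.karate_club_graph()
core_numbers = nx.core_number(G)                 # {node: order of the largest k-core containing it}

A = nx.to_numpy_array(G)
print(persistence_probability(A, [0, 1, 2, 3]))
```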
When only node attributes are available and the underlying
graph structure is unknown, we first need to learn the under-
lying graph. We may then infer the core scores using existing
methods. One of the standard approaches to infer graphs
from node attributes is graphical lasso, which solves an $\ell_1$-
regularized Gaussian maximum log-likelihood problem [13]
to learn a sparse connectivity pattern underlying a Gaussian
graphical model. However, graphical lasso imposes a uniform
$\ell_1$ penalty on each edge and does not account for the core-
periphery structure in networks.
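For concreteness, a standard graphical lasso fit is readily available in scikit-learn; the sketch below, with an arbitrary regularization level and placeholder data, recovers a sparse precision matrix from node-attribute samples while applying the same $\ell_1$ penalty to every off-diagonal entry.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# X: node-attribute samples, shape (n_samples, n_nodes); placeholder data here
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))

model = GraphicalLasso(alpha=0.1)          # alpha: one uniform l1 penalty for all entries
model.fit(X)
Theta = model.precision_                   # estimated sparse precision (inverse covariance)
edges = (np.abs(Theta) > 1e-6) & ~np.eye(10, dtype=bool)   # inferred connectivity pattern
```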
To summarize, when we have access to only node attributes
and the underlying graph is unavailable, the existing methods
cannot be used to estimate core scores. Furthermore, the exist-
ing works do not incorporate node attribute information while
inferring core scores. In this work, we propose generative
models and learning algorithms to address these limitations.
B. Main Results and Contributions
The main results and contributions of the paper are summa-
rized as follows.
•Generative Models: We propose probabilistic generative models that relate core-periphery structured graphs to their core scores and that relate node attributes to the respective core scores through an affine model. We refer to these models as Graph-Attributes-Affine (GA-Affine). In particular, we propose two models, namely, GA-Affine-Bool and GA-Affine-Real, to account for binary and real-valued node attributes, respectively (a toy sketch of this affine generative mechanism is given after this list). Next, we propose nonlinear generative
models that, though not as simple as the affine
models, incorporate information about spatial dis-
tances and capture any dependencies between node at-
tributes. Within the nonlinear models, we propose two
models, namely, Graph-Attributes-Nonlinear
(GA-Nonlinear) and Attributes-Only (AO)
to address two different cases, namely, both node at-
tributes and graph structure being available and only
the node attributes being available, respectively. Similar
to the GA-Affine model, the GA-Nonlinear model
assumes the graph is known. However, contrary to the
GA-Affine model, it establishes a nonlinear relation-
ship between the core scores and the node attributes.
On the other hand, the nonlinear AO model assumes
that the underlying graph is unknown and relates node
attributes to core scores through a latent graph structure.
In [14], which is a conference precursor of this work, we proposed the AO model; in this paper, we extend it to generative models that relate core scores to both graphs and node attributes.
•Algorithms: Whenever both the graph structure and the node attributes are available, we infer core scores by fitting one of the proposed Graph-Attributes models to the observed data, thereby using both the graph and the complementary information contained in the node attributes. The problems formulated using
models GA-Affine-Bool and GA-Affine-Real
are non-convex in the core scores and model parame-
ters, which we solve by alternating minimization. With
GA-Nonlinear, the inference problem is a convex
problem in the core scores for which we propose a pro-
jected gradient ascent based solver. Next, when only node attributes are available, we fit the AO model to the data to simultaneously learn the unknown latent graph structure and the core scores. The inference problem takes the form of the graphical lasso problem. However, in contrast to the standard graphical lasso, the $\ell_1$ regularization is not applied uniformly to all the edges but is weighted according to the latent core scores. We provide an alternating minimization based solver to jointly learn the underlying graph and the core scores (a toy sketch of such an alternating scheme is also given after this list).
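As a toy sketch of the affine generative mechanism referenced in the first bullet above, the code below draws edges with a probability that grows with the sum of the endpoint core scores and draws node attributes from an affine function of the core scores, with Gaussian noise for real-valued attributes and a Bernoulli model for binary ones; the logistic link, the offset, and the noise models are illustrative assumptions, not the exact GA-Affine specification.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generate_ga_affine(c, w, b, real_valued=True, noise_std=0.1, seed=0):
    """Toy generator: edges depend on core-score sums, attributes are affine in c.

    c : (N,) core scores;  w, b : (K,) affine parameters for K attributes.
    The logistic link, the offset of 1.0, and the noise models are
    illustrative assumptions, not the exact GA-Affine specification.
    """
    rng = np.random.default_rng(seed)
    N = c.shape[0]
    # graph: an edge (i, j) is more likely when c_i + c_j is large
    P = sigmoid(c[:, None] + c[None, :] - 1.0)
    A = np.triu((rng.random((N, N)) < P).astype(float), 1)
    A = A + A.T                                          # undirected, no self-loops
    # attributes: affine in the core scores, one column per attribute
    mean = c[:, None] * w[None, :] + b[None, :]          # shape (N, K)
    if real_valued:                                      # GA-Affine-Real-like
        X = mean + noise_std * rng.standard_normal(mean.shape)
    else:                                                # GA-Affine-Bool-like
        X = (rng.random(mean.shape) < sigmoid(mean)).astype(int)
    return A, X
```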
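Similarly, the alternating scheme referenced in the second bullet can be sketched as a proximal-gradient graphical-lasso step whose $\ell_1$ weights decrease with the core scores of the incident nodes, alternated with a heuristic core-score update from the current graph estimate; the exponential weight map and the normalized-degree core-score update below are placeholders chosen for illustration and do not reproduce the paper's actual update rules.

```python
import numpy as np

def soft_threshold(M, Lam):
    """Entry-wise soft-thresholding with entry-wise thresholds Lam."""
    return np.sign(M) * np.maximum(np.abs(M) - Lam, 0.0)

def project_pd(Theta, eps=1e-3):
    """Clip eigenvalues so that Theta stays positive definite."""
    Theta = (Theta + Theta.T) / 2.0
    vals, vecs = np.linalg.eigh(Theta)
    return (vecs * np.maximum(vals, eps)) @ vecs.T

def weighted_glasso_step(S, Theta, Lam, step=0.01):
    """One proximal-gradient step for min_Theta tr(S Theta) - logdet(Theta) + ||Lam * Theta||_1."""
    grad = S - np.linalg.inv(Theta)                       # gradient of the smooth part
    return project_pd(soft_threshold(Theta - step * grad, step * Lam))

def alternating_core_glasso(X, lam0=0.2, outer=10, inner=50):
    """Toy alternating scheme: core-score-weighted l1 penalty <-> core-score update.

    The exp(-(c_i + c_j)) penalty map and the normalized-strength core-score
    update are illustrative placeholders, not the paper's actual rules.
    """
    N = X.shape[1]
    S = np.cov(X, rowvar=False)
    Theta = np.eye(N)
    c = np.full(N, 0.5)
    for _ in range(outer):
        Lam = lam0 * np.exp(-(c[:, None] + c[None, :]))   # weaker penalty between core nodes
        np.fill_diagonal(Lam, 0.0)                        # never penalize the diagonal
        for _ in range(inner):
            Theta = weighted_glasso_step(S, Theta, Lam)
        W = np.abs(Theta) * (1.0 - np.eye(N))             # current off-diagonal edge strengths
        c = W.sum(axis=1) / max(W.sum(axis=1).max(), 1e-12)  # normalized strength as core score
    return Theta, c
```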
We test the proposed algorithms on several synthetic and
real-world datasets. For synthetic datasets, we observe that
the core scores estimated by the proposed algorithms are closer, in terms of cosine similarity, to the ground-truth core scores than those estimated by the existing methods that consider only the graph structure. We also perform various
numerical experiments to show the ability of the proposed
algorithms to correctly identify the core parts of various real-
world datasets such as citation, organizational, social, and
transportation networks. Further, when the graph structure is
unavailable, we show that our method learns the core scores
along with core-periphery structured graphs, outperforming
graphical lasso. We also use as inputs the core scores and