
2 THE POWER OF TWO MATRICES IN SPECTRAL ALGORITHMS FOR COMMUNITY RECOVERY
In this model, there are two clusters of approximate sizes
nρ
and
n
(1
−ρ
), and the edges inside
each of the clusters appear independently with probabilities
p1, p2
respectively, while edges between
the two clusters appear with probability
q
. Moreover, each edge status is revealed with probability
tlog n/n
for some fixed
t >
0. Thus the statuses of most edges are unknown. The censored model
was introduced to model the fact that in many social networks, not all of the connections between
individual nodes are known.
Given an instance of a censored graph with no vertex labels, the problem is to recover the partitions
exactly with high probability. This is often referred to as the exact recovery problem. We note that
some applications of spectral algorithms to the exact recovery problem use an additional combinatorial
clean-up stage (see e.g. [
7
,
17
,
18
]), but we follow [
1
,
8
,
9
] in studying spectral algorithms that do
not look at the graph after the top eigenvectors have been found. This is partially motivated by the
fact that most real applications of spectral algorithms do not include a combinatorial clean-up stage.
The classical case in the literature considers exact recovery in the Stochastic Block Model where
there is no censoring and
p1, p2, q
= Θ(
log n/n
). In order to achieve exact recovery up to the
information theoretic boundary, prior works used some trimming and post-processing steps together
with the spectral algorithm [
7
,
17
,
18
]. However, the question of whether a direct spectral algorithm
based on the top two eigenvectors of the of the adjacency matrix would be optimal remained open
until the recent resolution by Abbe, Fan, Wang, and Zhong [
1
] for
p1
=
p2
. In the censored SBM,
there are three possible observations (present, absent, or censored), so spectral recovery using a
binary-valued adjacency matrix is suboptimal. Instead, one can use a ternary-valued encoding
matrix. It was recently shown in [
8
,
9
] that, for some special cases of the planted partition model
such as the planted dense subgraph problem (
p2
=
q
)and the symmetric stochastic block model
(
p1
=
p2, ρ
= 1
/
2), a spectral algorithm based on the top two eigenvectors of a signed adjacency
matrix is optimal. This raises the question:
Are spectral algorithms based on the top eigenvectors of a signed adjacency matrix
optimal for all censored stochastic block models?
The main contributions of this article are as follows:
(1)
In contrast with the success stories in [
8
,
9
], whenever
p1, p2, q
are distinct, a spectral
algorithm based on the top two eigenvectors of a signed adjacency matrix is always suboptimal
(Theorem 1.7 Part (2)).
(2)
We propose spectral algorithms with two encoding matrices, where we take an appropriate
linear combination of the corresponding top eigenvectors. We show that these algorithms are
always optimal (Theorem 1.10). The optimality of spectral algorithms with two matrices is
also shown in the more general setting with k≥2communities (Theorem 1.12).
Thus, these results exhibit a strict separation between spectral algorithm classes with one versus
multiple encoding matrices, and this separation can be realized for even elementary planted partition
models. To our knowledge, this general phenomenon was not observed in the substantial prior
literature for recovery problems in the planted partition problems.
1.1. Model and Objective. We start by defining the Censored Stochastic Block Model.
Definition 1.1 (Censored Stochastic Block Model (CSBM)).Let
ρ∈
(0
,
1)
k
be such that
Pk
i=1 ρi
= 1
and let
P∈
(0
,
1)
k×k
be a symmetric matrix. Suppose we have
n
vertices and each vertex
v∈
[
n
]
is assigned a community assignment
σ0
(
v
)
∈
[
k
]according to the distribution
ρ
independently, i.e.,
P(σ0(v) = i) = ρifor i∈[k].
▷
For
u, v ∈
[
n
]and
u̸
=
v
, the edge
{u, v}
exists independently with probability
Pσ0(u)σ0(v)
.
Self-loops do not occur.
▷
For every pair of vertices
{u, v}
, its connectivity status is revealed independently with
probability tlog n
n, and is censored otherwise for some fixed t > 0.