THE POWER OF TWO MATRICES IN SPECTRAL ALGORITHMS FOR
COMMUNITY RECOVERY
SOUVIK DHARA⋆, JULIA GAUDIO†, ELCHANAN MOSSEL‡, COLIN SANDON|
Abstract.
Spectral algorithms are some of the main tools in optimization and inference problems
on graphs. Typically, the graph is encoded as a matrix and eigenvectors and eigenvalues of the
matrix are then used to solve the given graph problem. Spectral algorithms have been successfully
used for graph partitioning, hidden clique recovery and graph coloring. In this paper, we study
the power of spectral algorithms using two matrices in a graph partitioning problem. We use two
different matrices resulting from two different encodings of the same graph and then combine the
spectral information coming from these two matrices.
We analyze a two-matrix spectral algorithm for the problem of identifying latent community
structure in large random graphs. In particular, we consider the problem of recovering community
assignments exactly in the censored stochastic block model, where each edge status is revealed
independently with some probability. We show that spectral algorithms based on two matrices are
optimal and succeed in recovering communities up to the information theoretic threshold. Further,
we show that for most choices of the parameters, any spectral algorithm based on one matrix is
suboptimal. The latter observation is in contrast to our prior works (2022a, 2022b) which showed
that for the symmetric Stochastic Block Model and the Planted Dense Subgraph problem, a spectral
algorithm based on one matrix achieves the information theoretic threshold. We additionally provide
more general geometric conditions for the (sub)optimality of spectral algorithms.
1. Introduction
Spectral algorithms are some of the main tools in graph algorithms and combinatorial optimization.
Some famous and classical examples include spectral algorithms for the hidden clique problem [4],
graph bisection [6], and graph coloring [3, 5]. These algorithms encode the graph into a matrix by
recording the status of each present/absent edge of the graph as an entry of the matrix. The most
natural encoding is the adjacency matrix representation, where edges are encoded by the value 1 and
non-edges are encoded by the value 0. Given the encoding matrix, a small number of eigenvectors
of this matrix are used to solve the given graph problem.
Our interest in this work lies in graph problems for which using multiple matrix
representations gives an advantage over using a single matrix.
In particular, we are interested in the power of spectral algorithms in such a scenario in the context
of finding clusters in a planted partition model called the Censored Stochastic Block Model (CSBM).
(⋆)School of Industrial Engineering, Purdue University
(†)Department of Industrial Engineering and Management Sciences, Northwestern University
(‡)Department of Mathematics, Massachusetts Institute of Technology
(|)Department of Mathematics, École Polytechnique Fédérale de Lausanne
E-mail addresses: sdhara@purdue.edu, julia.gaudio@northwestern.edu, elmos@mit.edu, colin.sandon@epfl.ch.
Acknowledgement: The authors sincerely thank the anonymous reviewers for their helpful feedback. S.D., E.M. and
C.S. were partially supported by Vannevar Bush Faculty Fellowship ONR-N00014-20-1-2826. S.D. was supported by
Simons-Berkeley Research Fellowship and Vannevar Bush Faculty Fellowship ONR-N0014-21-1-2887. E.M. and C.S.
were partially supported by NSF award DMS-1737944. E.M. was partially supported by Simons Investigator award
(622132) and by ARO MURI W911NF1910217. J.G. was partially supported by NSF award CCF-2154100. Part of
this work was completed while S.D. and C.S. were at the MIT Mathematics Department, and also while S.D. was at
The Simons Institute for the Theory of Computing. J.G. thanks Nirmit Joshi for helpful discussions, and Charlie
Guan for catching some errors in the proof of Lemma 3.8.
arXiv:2210.05893v3 [math.ST] 24 Oct 2024
In this model, there are two clusters of approximate sizes $n\rho$ and $n(1-\rho)$, and the edges inside
each of the clusters appear independently with probabilities $p_1, p_2$ respectively, while edges between
the two clusters appear with probability $q$. Moreover, each edge status is revealed with probability
$t \log n / n$ for some fixed $t > 0$. Thus the statuses of most edges are unknown. The censored model
was introduced to model the fact that in many social networks, not all of the connections between
individual nodes are known.
Given an instance of a censored graph with no vertex labels, the problem is to recover the partitions
exactly with high probability. This is often referred to as the exact recovery problem. We note that
some applications of spectral algorithms to the exact recovery problem use an additional combinatorial
clean-up stage (see e.g. [7, 17, 18]), but we follow [1, 8, 9] in studying spectral algorithms that do
not look at the graph after the top eigenvectors have been found. This is partially motivated by the
fact that most real applications of spectral algorithms do not include a combinatorial clean-up stage.
The classical case in the literature considers exact recovery in the Stochastic Block Model where
there is no censoring and $p_1, p_2, q = \Theta(\log n / n)$. In order to achieve exact recovery up to the
information theoretic boundary, prior works used some trimming and post-processing steps together
with the spectral algorithm [7, 17, 18]. However, the question of whether a direct spectral algorithm
based on the top two eigenvectors of the adjacency matrix would be optimal remained open
until the recent resolution by Abbe, Fan, Wang, and Zhong [1] for $p_1 = p_2$. In the censored SBM,
there are three possible observations (present, absent, or censored), so spectral recovery using a
binary-valued adjacency matrix is suboptimal. Instead, one can use a ternary-valued encoding
matrix. It was recently shown in [8, 9] that, for some special cases of the planted partition model
such as the planted dense subgraph problem ($p_2 = q$) and the symmetric stochastic block model
($p_1 = p_2$, $\rho = 1/2$), a spectral algorithm based on the top two eigenvectors of a signed adjacency
matrix is optimal. This raises the question:

Are spectral algorithms based on the top eigenvectors of a signed adjacency matrix
optimal for all censored stochastic block models?
The main contributions of this article are as follows:
(1) In contrast with the success stories in [8, 9], whenever $p_1, p_2, q$ are distinct, a spectral
algorithm based on the top two eigenvectors of a signed adjacency matrix is always suboptimal
(Theorem 1.7 Part (2)).
(2) We propose spectral algorithms with two encoding matrices, where we take an appropriate
linear combination of the corresponding top eigenvectors. We show that these algorithms are
always optimal (Theorem 1.10). The optimality of spectral algorithms with two matrices is
also shown in the more general setting with $k \geq 2$ communities (Theorem 1.12).
Thus, these results exhibit a strict separation between spectral algorithm classes with one versus
multiple encoding matrices, and this separation arises even for elementary planted partition
models. To our knowledge, this general phenomenon was not observed in the substantial prior
literature on recovery problems in planted partition models.
1.1. Model and Objective. We start by defining the Censored Stochastic Block Model.
Definition 1.1 (Censored Stochastic Block Model (CSBM)). Let $\rho \in (0,1)^k$ be such that
$\sum_{i=1}^{k} \rho_i = 1$ and let $P \in (0,1)^{k \times k}$ be a symmetric matrix. Suppose we have $n$ vertices, and each
vertex $v \in [n]$ is assigned a community assignment $\sigma_0(v) \in [k]$ according to the distribution $\rho$
independently, i.e., $\mathbb{P}(\sigma_0(v) = i) = \rho_i$ for $i \in [k]$.
For $u, v \in [n]$ with $u \neq v$, the edge $\{u, v\}$ exists independently with probability
$P_{\sigma_0(u)\sigma_0(v)}$. Self-loops do not occur.
For every pair of vertices $\{u, v\}$, its connectivity status is revealed independently with
probability $t \log n / n$, and is censored otherwise, for some fixed $t > 0$.
The output is a random graph with edge statuses given by {present, absent, censored}. The
distribution of this random graph is called the Censored Stochastic Block Model. We write
$G \sim \mathrm{CSBM}^k_n(\rho, P, t)$ to denote a graph generated from the above model, with vertex labels
removed (i.e., $\sigma_0$ is unknown).
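To make the sampling procedure concrete, here is a minimal simulation sketch. The function name, the {+1, −1, 0} status encoding, and the use of NumPy are our own illustrative choices, not part of the paper.

```python
import numpy as np

def sample_csbm(n, rho, P, t, rng=None):
    """Sample a censored SBM. Returns (sigma0, status), where status[u, v] is
    +1 (present), -1 (absent), or 0 (censored); an illustrative encoding."""
    rng = np.random.default_rng(rng)
    k = len(rho)
    # Community labels drawn i.i.d. from the distribution rho.
    sigma0 = rng.choice(k, size=n, p=rho)
    status = np.zeros((n, n), dtype=int)
    reveal_prob = min(1.0, t * np.log(n) / n)
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < reveal_prob:  # pair {u, v} is revealed
                edge = rng.random() < P[sigma0[u], sigma0[v]]
                status[u, v] = status[v, u] = 1 if edge else -1
            # otherwise the pair stays censored (0); no self-loops
    return sigma0, status
```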
Definition 1.2 (Exact recovery). Consider the $n \times k$ membership matrix $S_0$, where
$(S_0)_{ui} = \mathbb{1}\{\sigma_0(u) = i\}$, i.e., the $u$-th row indicates the community membership of $u$. Given an
estimator $\hat\sigma$, construct $\hat S$ similarly as $\hat S_{ui} = \mathbb{1}\{\hat\sigma(u) = i\}$. We say that an estimator achieves
exact recovery if there exists a $k \times k$ permutation matrix $J$ such that $\hat S J = S_0$.
1.2. Information Theoretic Boundary. We start by discussing the information theoretic threshold.
The result will be stated in terms of a Chernoff–Hellinger divergence, introduced by Abbe and
Sandon [2].
Definition 1.3 (Chernoff–Hellinger divergence). Given two vectors $\mu, \nu \in (\mathbb{R}_+ \setminus \{0\})^l$, define
\[ \mathrm{CH}_{\xi}(\mu, \nu) = \sum_{i \in [l]} \bigl( \xi \mu_i + (1 - \xi)\nu_i - \mu_i^{\xi} \nu_i^{1-\xi} \bigr) \]
for $\xi \in [0, 1]$. The Chernoff–Hellinger divergence of $\mu$ and $\nu$ is defined as
\[ \Delta_+(\mu, \nu) = \max_{\xi \in [0,1]} \mathrm{CH}_{\xi}(\mu, \nu). \tag{1.1} \]
Define
\[ t_c := \Bigl( \min_{i \neq j} \Delta_+(\theta_i, \theta_j) \Bigr)^{-1}, \quad \text{where } \theta_i = \bigl(\rho_r P_{ri},\, \rho_r (1 - P_{ri})\bigr)_{r \in [k]} \in \mathbb{R}^{2k}. \tag{1.2} \]
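As a sanity check on these definitions, the divergence and the threshold $t_c$ can be approximated numerically. The sketch below maximizes $\mathrm{CH}_\xi$ over a grid of $\xi$ values, a simple stand-in for exact one-dimensional optimization; all function names are our own.

```python
import numpy as np

def ch_divergence(mu, nu, xi_grid=np.linspace(0.0, 1.0, 1001)):
    """Approximate Delta_+(mu, nu) by maximizing CH_xi over a grid of xi."""
    mu, nu = np.asarray(mu, float), np.asarray(nu, float)
    vals = [np.sum(xi * mu + (1 - xi) * nu - mu**xi * nu**(1 - xi))
            for xi in xi_grid]
    return max(vals)

def critical_t(rho, P):
    """t_c = (min_{i != j} Delta_+(theta_i, theta_j))^{-1}, with
    theta_i = (rho_r P_{ri}, rho_r (1 - P_{ri}))_{r in [k]}."""
    rho, P = np.asarray(rho, float), np.asarray(P, float)
    k = len(rho)
    thetas = [np.concatenate([rho * P[:, i], rho * (1 - P[:, i])])
              for i in range(k)]
    dmin = min(ch_divergence(thetas[i], thetas[j])
               for i in range(k) for j in range(k) if i != j)
    return 1.0 / dmin
```

Note that $\mathrm{CH}_0 = \mathrm{CH}_1 = 0$, so the grid maximum is always non-negative, and $\Delta_+(\mu, \nu) = \Delta_+(\nu, \mu)$ by the substitution $\xi \mapsto 1 - \xi$.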
Theorem 1.4 (Information theoretic threshold). Let $G \sim \mathrm{CSBM}^k_n(\rho, P, t)$. If $t < t_c$, then for any
estimator $\hat\sigma$,
\[ \lim_{n \to \infty} \mathbb{P}(\hat\sigma \text{ achieves exact recovery}) = 0. \]
1.3. Spectral Algorithms. For comparing the performance of spectral algorithms with one ma-
trix versus spectral algorithms with more than one matrix, we first specialize to the case of two
communities.
To define spectral algorithms formally, we first define the thresholding procedures that we allow to
be applied to vectors. These are the procedures that will be applied to the leading eigenvectors of
the encoding matrices.
Algorithm 1 Classify
Input: Censored graph $G$ on $n$ vertices, vectors $(u_i)_{i=1}^{m} \subset \mathbb{R}^n$, and scalars $a_1, \ldots, a_m, T \in \mathbb{R}$.
Output: Community classification.
1: Compute possible score vectors
\[ U = \Bigl\{ \sum_{i=1}^{m} s_i a_i u_i \ \text{for all}\ s_1, \ldots, s_m \in \{\pm 1\} \Bigr\}. \]
2: Compute possible assignments $\hat{\mathcal{S}}(U) = \{\hat\sigma = \mathrm{sign}(u - T) : u \in U\}$ and output a community
assignment $1 + (1 + \hat\sigma)/2$ that maximizes the posterior probability $\mathbb{P}(\hat\sigma \mid G)$ over $\hat\sigma \in \hat{\mathcal{S}}(U)$.
Since eigenvectors are determined up to a sign flip, Step 2 above is required in order to resolve
this sign ambiguity. This will be explained in more detail in Remark 1.13.
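A direct, brute-force rendering of Classify may help. The `posterior` argument below is a hypothetical scoring oracle standing in for $\mathbb{P}(\hat\sigma \mid G)$, whose exact computation is model-specific; everything else follows the two steps above.

```python
import numpy as np
from itertools import product

def classify(eigvecs, coeffs, T, posterior):
    """Sketch of Classify: enumerate every sign pattern for the eigenvectors,
    threshold the combined score vector at T, and keep the labelling that the
    (stand-in) posterior oracle scores highest."""
    best, best_score = None, -np.inf
    for signs in product([1, -1], repeat=len(eigvecs)):
        score = sum(s * a * u for s, a, u in zip(signs, coeffs, eigvecs))
        sigma_hat = np.where(score > T, 1, -1)   # entrywise sign(score - T)
        labels = 1 + (1 + sigma_hat) // 2        # map {-1, +1} -> {1, 2}
        p = posterior(labels)
        if p > best_score:
            best, best_score = labels, p
    return best
```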
Definition 1.5 (Signed adjacency matrix). Given $y > 0$ and a graph $G$ with edge statuses {present,
absent, censored}, define the signed adjacency matrix $A(G, y)$ as the $n \times n$ matrix with
\[ A_{ij} = \begin{cases} 1 & \text{if } \{i, j\} \text{ is present} \\ -y & \text{if } \{i, j\} \text{ is absent} \\ 0 & \text{if } \{i, j\} \text{ is censored.} \end{cases} \]
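A minimal sketch of this encoding, assuming the censored graph is given as a symmetric status matrix with entries +1 (present), −1 (absent), and 0 (censored); that input format is our own choice, not fixed by the paper.

```python
import numpy as np

def signed_adjacency(status, y):
    """Build A(G, y): present -> 1, absent -> -y, censored -> 0."""
    A = np.zeros(status.shape)
    A[status == 1] = 1.0
    A[status == -1] = -y
    return A
```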
Let us define the class of algorithms Spectral-One that use a single encoding matrix.
Definition 1.6 (Spectral-One). An algorithm $\mathcal{A}(G, y, a_1, a_2, T)$ in the Spectral-One class
takes as input a censored graph $G$, an encoding parameter $y \in \mathbb{R}_+$, and scalars $a_1, a_2, T \in \mathbb{R}$. The
algorithm then computes the top two eigenvectors $u_1, u_2$ of $A = A(G, y)$, and gives the output of
Classify$((u_i)_{i=1}^{2}, (a_i)_{i=1}^{2}, T)$. We denote the output of algorithm $\mathcal{A}$ in this class as $\hat\sigma_{\mathcal{A}}$.
For the two community case, we will always consider the parameters:
\[ P = \begin{pmatrix} p_1 & q \\ q & p_2 \end{pmatrix}, \qquad \bar\rho = (\rho, 1 - \rho), \qquad \text{and } \rho, p_1, p_2, q \in (0, 1). \tag{1.3} \]
Theorem 1.7 (Failure of Spectral-One in most cases). Let $G \sim \mathrm{CSBM}^2_n(\bar\rho, P, t)$ with $\bar\rho, P$ given
by (1.3).
(1) Suppose that $p_1, p_2, q$ are not distinct. If $p_1 = p_2 = p$, then assume $p + q \neq 1$.¹ There
exist explicitly computable constants $y \in \mathbb{R}_+$ and $\gamma_1, \gamma_2 \in \mathbb{R}$ such that the algorithm
$\mathcal{A} = \mathcal{A}(G, y, \gamma_1, \gamma_2, 0)$ from the class Spectral-One satisfies
\[ \lim_{n \to \infty} \mathbb{P}(\hat\sigma_{\mathcal{A}} \text{ achieves exact recovery}) = 1, \]
for any $t > t_c$. In particular, Algorithm 3 produces such an estimator.
(2) Suppose that $p_1, p_2, q$ are distinct. There exists $\delta_0 > 0$ such that, if $t < t_c + \delta_0$, then for any
$\mathcal{A} \in$ Spectral-One,
\[ \lim_{n \to \infty} \mathbb{P}(\hat\sigma_{\mathcal{A}} \text{ achieves exact recovery}) = 0. \]
For the case $p_1 = p_2$, Theorem 1.7 Part (1) generalizes the result of [9, Theorem 2.2] to the
case $\rho \neq 1/2$. Part (2) of the result is in sharp contrast with the results in [8, 9]; together, these
results essentially say that the censored planted dense subgraph problem ($p_2 = q$) and the symmetric
censored stochastic block model ($p_1 = p_2$) are remarkably the only cases where an algorithm from
Spectral-One is successful². The possible limitation of Spectral-One was shown in [8, Theorem
2.6] for the special case of $q = 1/2$, $p_1 = 1 - p_2$, and $\rho = 1/2$.
Remark 1.8. It is worthwhile to note that the choice of encoding parameters $\{1, -y, 0\}$ is completely
general, and one does not get a more powerful class of algorithms by allowing an arbitrary ternary
encoding. In fact, as our proof shows, if $p_1, p_2, q$ are distinct, then even if one allows arbitrary
encodings, the Spectral-One algorithms still fail sufficiently near the threshold (see Remark 5.2).
Next, we will show that spectral algorithms with two matrices are always optimal for the recovery
of two communities. Let us define the class of algorithms Spectral-Two that uses two encoding
matrices instead of one.
Definition 1.9 (Spectral-Two). An algorithm $\mathcal{A}(G, y_1, y_2, (a_i)_{i=1}^{4}, T)$ in the Spectral-Two class
takes as input a censored graph $G$, two encoding parameters $y_1, y_2 \in \mathbb{R}_+$ with $y_1 \neq y_2$, scalars
$(a_i)_{i=1}^{4} \in \mathbb{R}^4$, and $T \in \mathbb{R}$. The algorithm considers the two signed adjacency matrices $A_1 = A(G, y_1)$
and $A_2 = A(G, y_2)$, and computes their top two eigenvectors $u^r_1, u^r_2$, for $r = 1, 2$. Then the algorithm
outputs Classify$((u^r_i)_{i,r=1,2}, (a_i)_{i=1}^{4}, T)$. As before, we denote the output of algorithm $\mathcal{A}$ from this
class as $\hat\sigma_{\mathcal{A}}$.

¹The case $p_1 = p_2 = p$, $\rho = 1/2$ is covered in [9] without the assumption $p + q \neq 1$. In this case, spectral algorithms
succeed for $t > t_c$.
²For the edge case $p_1 = p_2 = p$ and $p + q = 1$, the rank of $\mathbb{E}[A]$ is 1 for the value of $y$ that we would want to use.
This is why it is ruled out in Theorem 1.7 Part (1).
Theorem 1.10 (Spectral-Two always succeeds in recovering two communities). Let $G \sim \mathrm{CSBM}^2_n(\bar\rho, P, t)$ with $\bar\rho, P$ given by (1.3). There exists a set $\mathcal{Y} \subset \mathbb{R}_+$ with $|\mathcal{Y}| \leq 3$ such that for any
$y_1 \neq y_2$ with $y_1, y_2 \notin \mathcal{Y}$, there exist explicit $(a_i)_{i=1}^{4} \in \mathbb{R}^4$ such that the algorithm $\mathcal{A}(G, y_1, y_2, (a_i)_{i=1}^{4}, 0)$
from the class Spectral-Two satisfies
\[ \lim_{n \to \infty} \mathbb{P}(\hat\sigma_{\mathcal{A}} \text{ achieves exact recovery}) = 1, \]
for any $t > t_c$. In particular, Algorithm 5 produces such an estimator.
Theorem 1.10 not only shows that Spectral-Two algorithms are always successful, but also
shows that the choice of the encoding parameters $y_1, y_2$ does not matter too much, as long as $y_1 \neq y_2$
and they both lie outside a finite exception set. For example, we can choose $y_1, y_2 \sim \mathrm{Uniform}[0, 1]$
independently. Avoiding the finite exception set helps us ensure that $A_1$ and $A_2$ both have two
eigenvectors with large, distinct eigenvalues. In contrast, the choice of the encoding is quite important
for Spectral-One algorithms in Theorem 1.7 (1). In fact, for $p_1 = p_2 = p$, or $p_1 = p$ and $p_2 = q$, the
only choice of $y$ that yields an optimal algorithm is $\log\bigl(\frac{1-q}{1-p}\bigr)/\log\frac{p}{q}$. Thus, Spectral-Two
leads to a much broader and more flexible class of algorithms as compared to Spectral-One.
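To illustrate the two-matrix approach end to end for $k = 2$: build $A(G, y_1)$ and $A(G, y_2)$, take the top two eigenvectors of each (here by eigenvalue magnitude, one of several reasonable conventions), and enumerate the thresholded sign combinations. The sketch returns all candidate labellings; the Classify step of the paper would then pick the posterior maximizer. Function names and the status-matrix input format are our own.

```python
import numpy as np
from itertools import product

def spectral_two_candidates(status, y1, y2, coeffs, T=0.0):
    """From a symmetric +1/-1/0 status matrix, form the two signed adjacency
    matrices, extract top-two eigenvectors of each, and return the 16
    candidate labellings over all sign patterns."""
    vecs = []
    for y in (y1, y2):
        A = np.where(status == 1, 1.0, np.where(status == -1, -y, 0.0))
        w, V = np.linalg.eigh(A)
        order = np.argsort(np.abs(w))[::-1]   # top two by |eigenvalue|
        vecs.extend([V[:, order[0]], V[:, order[1]]])
    candidates = []
    for signs in product([1, -1], repeat=4):
        score = sum(s * a * u for s, a, u in zip(signs, coeffs, vecs))
        candidates.append(np.where(score > T, 2, 1))
    return candidates
```

On a noiseless instance (all pairs revealed, edges exactly within communities), the true partition appears among the candidates because the leading eigenvector of each signed matrix aligns with the community vector.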
Finally, we show that Spectral-Two succeeds for the recovery of $k \geq 3$ communities, as long as
the parameters $P, \rho$ satisfy certain conditions. To this end, let us define Spectral-Two for general
$k$.
Algorithm 2 Classify-Multiple
Input: Censored graph $G$ on $n$ vertices, vectors $(u_i)_{i=1}^{m} \subset \mathbb{R}^n$, and weight vectors $(a_i)_{i=1}^{k} \subset \mathbb{R}^m$.
Output: Community classification.
1: Let $U$ be the $n \times m$ matrix whose $i$-th column is $u_i$.
2: For $s \in \{\pm 1\}^m$, let $D(s) := \mathrm{diag}(s)$. Compute the set of possible assignments $\hat{\mathcal{S}}$ consisting of
$\hat\sigma(\cdot\,; s)$ with $s \in \{\pm 1\}^m$ such that
\[ \hat\sigma(v; s) = \operatorname*{argmax}_{i \in [k]} \bigl\{ (U D(s) a_i)_v \bigr\} \quad \text{for each } v \in [n]. \]
3: Output the $\hat\sigma(\cdot\,; s)$ that maximizes the posterior probability $\mathbb{P}(\hat\sigma \mid G)$ over $\hat\sigma \in \hat{\mathcal{S}}$.
We will use this algorithm with $m = 2k$, and the top $k$ eigenvectors from each of two signed adjacency
matrices.
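A brute-force sketch of Classify-Multiple along these lines; the `posterior` argument is again a hypothetical scoring oracle standing in for $\mathbb{P}(\hat\sigma \mid G)$.

```python
import numpy as np
from itertools import product

def classify_multiple(U, weight_vecs, posterior):
    """Sketch of Classify-Multiple: U is n x m with eigenvectors as columns,
    weight_vecs is a list of k vectors in R^m. For each sign pattern s,
    assign each vertex v to the community i maximizing (U D(s) a_i)_v, then
    keep the labelling the posterior oracle scores highest."""
    n, m = U.shape
    best, best_score = None, -np.inf
    for s in product([1, -1], repeat=m):
        Us = U * np.array(s)                         # equals U D(s)
        scores = Us @ np.column_stack(weight_vecs)   # n x k score matrix
        labels = np.argmax(scores, axis=1) + 1       # communities 1..k
        p = posterior(labels)
        if p > best_score:
            best, best_score = labels, p
    return best
```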
Definition 1.11 (Spectral-Two for $k \geq 3$ communities). An algorithm $\mathcal{A}(G, y_1, y_2, (a_i)_{i=1}^{k}, T)$
in this class takes as input a censored graph $G$, two encoding parameters $y_1, y_2 \in \mathbb{R}_+$ with $y_1 \neq y_2$,
and weight vectors $(a_i)_{i=1}^{k} \subset \mathbb{R}^{2k}$. The algorithm considers two signed adjacency matrices $A_1 = A(G, y_1)$ and
$A_2 = A(G, y_2)$, and computes their top $k$ eigenvectors $(u^1_i)_{i \in [k]}, (u^2_i)_{i \in [k]}$. Then the algorithm outputs
Classify-Multiple$((u^r_i)_{i \in [k], r=1,2}, (a_i)_{i=1}^{k})$. As before, we denote the output of algorithm $\mathcal{A}$ from
this class as $\hat\sigma_{\mathcal{A}}$.
Theorem 1.12 (Success of Spectral-Two for $k \geq 3$ communities). Let $G \sim \mathrm{CSBM}^k_n(\rho, P, t)$
where $\rho \in (0,1)^k$ is such that $\sum_i \rho_i = 1$, and $P \in (0,1)^{k \times k}$ is a symmetric matrix. Further, suppose
that $P \cdot \mathrm{diag}(\rho)$ has exactly $k$ distinct non-zero eigenvalues. Then there exists a finite set $\mathcal{Y} \subset \mathbb{R}_+$ such
that for any $y_1 \neq y_2$ with $y_1, y_2 \notin \mathcal{Y}$, the following holds: there exist explicit vectors $(a_i)_{i=1}^{k} \subset \mathbb{R}^{2k}$
such that the algorithm $\mathcal{A}(G, y_1, y_2, (a_i)_{i=1}^{k})$ from the class Spectral-Two satisfies
\[ \lim_{n \to \infty} \mathbb{P}(\hat\sigma_{\mathcal{A}} \text{ achieves exact recovery}) = 1, \]
for any $t > t_c$. In particular, Algorithm 7 produces such an estimator.