Implications of sparsity and high triangle density for graph representation learning
Hannah Sansford Alexander Modell Nick Whiteley Patrick Rubin-Delanchy
University of Bristol University of Bristol University of Bristol University of Bristol
Abstract
Recent work has shown that sparse graphs containing many triangles cannot be reproduced using a finite-dimensional representation of the nodes, in which link probabilities are inner products. Here, we show that such graphs can be reproduced using an infinite-dimensional inner product model, where the node representations lie on a low-dimensional manifold. Recovering a global representation of the manifold is impossible in a sparse regime. However, we can zoom in on local neighbourhoods, where a lower-dimensional representation is possible. As our constructions allow the points to be uniformly distributed on the manifold, we find evidence against the common perception that triangles imply community structure.
1 INTRODUCTION
Network analysis is a common activity in many areas of science and technology, and has diverse goals as a result. Sometimes, it is of interest to estimate just a few salient parameters which generally drive connectivity, for example to inform policy (as the 'R' number did, in Britain, during the COVID-19 pandemic (UK-government, 2020)). A considerable amount of research has gone into developing simple models, such as the Erdős-Rényi graph, preferential attachment (Barabási and Albert, 1999), or the Chung-Lu model (Chung and Lu, 2002), to explain global properties of real-world networks. However, in machine learning, the goal is often to uncover hidden structure or make predictions about the nodes or edges, and these simple models are often felt to say too little about the data. Instead, the paradigm of graph embedding has emerged, in which each node is represented as a vector. There are several theoretical reasons to do this,
but the most straightforward is convenience, including compatibility with existing machine-learning techniques, which often take feature vectors as input.
In the design of graph embedding algorithms, providing a realistic generative model for the data has not always been a high priority. In fact, in some of the most popular approaches, such as DeepWalk (Perozzi et al., 2014) or Node2vec (Grover and Leskovec, 2016), there is no obvious generative model. But this is a problem for many applications, particularly in unsupervised settings, as there is no straightforward measure of goodness-of-fit. Moreover, data retention and privacy issues have increased the need for synthetic graph generation techniques. It would be useful to be able to harness the power of graph embeddings to reproduce rich structure present in real graphs, as well as their global properties.
There are approaches to graph embedding which do purport to model the data. Of particular interest here is the highly studied random dot product graph (RDPG) (Young and Scheinerman, 2007; Athreya et al., 2017), in which the nodes each have a latent position $X_i \in \mathbb{R}^d$, which we aim to estimate via graph embedding, and the inner product $\langle X_i, X_j \rangle$ provides the corresponding connection probability. The model has been particularly useful for understanding graph spectral embedding, where the node representations are obtained from the eigenvectors of some matrix representation of the graph, such as the normalised Laplacian (Tang and Priebe, 2018). However, a recent result has put its adequacy as a generative model in doubt. The random dot product graph is unable to produce graphs in which the nodes make few connections but still form large numbers of triangles (Seshadhri et al., 2020). These properties, known as sparsity and transitivity, are regarded as ubiquitous in network science (Newman, 2018) and are of course consistent with our own experience of social networks: most of us have few connections, relative to the size of the network, and yet, often, 'a friend of a friend is a friend'.

A follow-on paper (Chanpuriya et al., 2020), which proved an inspiration to us, was able to reproduce sparsity and high triangle density (formal definition to come), in theory, by modifying the connection probability to $\max\{0, \min(\langle X_i, Y_j \rangle, 1)\}$ and, in practice, using the kernel $\operatorname{logistic}(\langle X_i, Y_j \rangle)$ (the latter already in common use), assigning two latent positions, $X_i$ and $Y_i$, to each node $i$. More recently, Stolman et al. (2022) have attained the properties of interest using the softmax kernel over a low-dimensional embedding.
It is important to appreciate that, in our approach, the inner product is not an arbitrary kernel. It provides a default choice because, under mild assumptions, it can be used to represent any other positive definite kernel, although typically in infinite dimension, by Mercer's theorem, known in machine learning as the 'kernel trick' (Steinwart and Christmann, 2008). We find a transparent relationship between the eigenvalues of the kernel and the triangle density (see the supplementary material, Lemma 7), which shows that negative eigenvalues can only reduce the expected number of triangles in the graph. As a result, if a carefully crafted kernel seems to reproduce high triangle density (Chanpuriya et al., 2020), and negative eigenvalues aren't the reason, the likely problem with the RDPG is its finite-dimensional assumption.
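To see the mechanism numerically, here is a minimal sketch (our own illustration, not the formal statement of Lemma 7): for a symmetric edge-probability matrix $P$ with zero diagonal and conditionally independent edges, the expected triangle count equals $\operatorname{trace}(P^3)/6 = \sum_i \lambda_i^3/6$, so any negative eigenvalues can only pull it down.

```python
import numpy as np

rng = np.random.default_rng(0)

# A symmetric edge-probability matrix with entries in [0, 1] and zero diagonal.
n = 300
Z = rng.uniform(size=(n, 3))
P = Z @ Z.T / 3.0                       # inner-product probabilities in [0, 1]
np.fill_diagonal(P, 0.0)                # zeroing the diagonal typically creates negative eigenvalues

# Expected number of (unordered) triangles: sum_{i<j<k} P_ij P_jk P_ki = trace(P^3) / 6.
expected_triangles = np.trace(np.linalg.matrix_power(P, 3)) / 6

# The same quantity through the spectrum: trace(P^3) = sum_i lambda_i^3.
lam = np.linalg.eigvalsh(P)
print(expected_triangles, np.sum(lam ** 3) / 6)                          # identical up to rounding
print("dropping negative eigenvalues:", np.sum(lam[lam > 0] ** 3) / 6)   # always at least as large
```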
There would be little hope for practical inference in infinite dimension without structure. Instead, this paper shows that sparse graphs with high triangle density can be modelled using an infinite-dimensional inner product model, with representations on a low-dimensional manifold.
An implication of this manifold structure, which we later explore in depth, is the plausibility of global versus local embedding. Under levels of sparsity which are generally deemed reasonable, the manifold does not get populated everywhere, and perfect reconstruction is impossible. However, since a $d$-dimensional manifold (informally) locally resembles $d$-dimensional Euclidean space, we can 'zoom in' to the subgraph containing nodes in a small neighbourhood of the latent space, and form an embedding which represents a flat approximation of the manifold over that region. This encompasses, but generalises, the notion of finding local community structure (Rohe and Qin, 2013; Spielman and Teng, 2013; Chen et al., 2020). A key technical contribution of our work is to construct manifolds for which we can simultaneously control sparsity, triangle density, and the on-manifold distribution. One point of the latter is that we can place a uniform distribution on the manifold and still achieve sparse graphs with high triangle density. Because there is no reasonable notion of community structure in this case, we find evidence against the common perception that triangles imply community structure.
Related work
Several earlier works have proposed infinite-dimensional versions of the RDPG and/or the hypothesis that latent positions live on a low-dimensional manifold (Tang et al., 2013; Athreya et al., 2021; Rubin-Delanchy, 2020; Lei, 2021; Whiteley et al., 2021). The main goal of this paper, to generatively achieve sparsity and high triangle density, was not considered in these papers, and indeed they contain several assumptions which make the goal impossible, including a finite-rank or trace-class kernel. Whiteley et al. (2021) provides a crucial element of our theory, allowing a manifold construction for which we can control the on-manifold distribution. We principally focus on triangle density, rather than (the formal definition of) transitivity, to align our work with Seshadhri et al. (2020); however, our graphs also achieve high transitivity.
2 ACHIEVING HIGH TRIANGLE DENSITY IN SPARSE GRAPHS
Setup
We consider a sequence of simple undirected graphs, $A^{(n)}$, on nodes indexed $1, \dots, n$. We make no distinction between the graph and its adjacency matrix, $A^{(n)}$, which is binary, $n \times n$, and symmetric, with the standard convention that $A^{(n)}_{ij} = 1$ if nodes $i$ and $j$ have an edge, and zero otherwise.

We will assign to each node, $i$, an infinite-dimensional latent position, $X^{(n)}_i \in \mathbb{R}^{\mathbb{N}}$, from a probability distribution $F_n$ whose support $\mathcal{M}_n$ (the precise definition of which comes later) only allows inner products that are valid probabilities, that is,
$$\langle x, y \rangle := \sum_{k=1}^{\infty} x_k y_k \in [0, 1] \quad \text{for all } x, y \in \mathcal{M}_n. \qquad (1)$$
For convenience, the superscript $(n)$ will be suppressed from the latent positions, but there is no correspondence between the latent positions of graphs of different sizes, i.e. $X_i$ associated with $A^{(n)}$ is not the same as $X_i$ associated with $A^{(m)}$, for $n \neq m$.
Definition 1 (Infinite-dimensional random dot product graph). Let $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} F_n$. The matrix $A^{(n)}$ represents an infinite-dimensional random dot product graph (IRDPG) if
$$A^{(n)}_{ij} \overset{\text{ind}}{\sim} \operatorname{Bernoulli}(\langle X_i, X_j \rangle),$$
conditional on $X_1, \dots, X_n$, for $1 \leq i \leq j \leq n$.
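As a concrete illustration of Definition 1, the following is a minimal simulation sketch in Python/NumPy (our own code, not the authors'; the latent positions are finite-dimensional stand-ins whose pairwise inner products happen to lie in $[0,1]$, and self-loops are omitted, which, as noted under the model assumptions below, changes little):

```python
import numpy as np

def simulate_irdpg(X, rng):
    """Draw A_ij ~ Bernoulli(<X_i, X_j>) independently for i < j, then symmetrise."""
    P = X @ X.T                                    # matrix of inner products <X_i, X_j>
    assert P.min() >= 0 and P.max() <= 1, "inner products must be valid probabilities"
    upper = np.triu(rng.uniform(size=P.shape) < P, k=1).astype(int)
    return upper + upper.T                         # binary, symmetric, zero diagonal

rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 3)) / np.sqrt(3)        # finite-dimensional stand-in positions
A = simulate_irdpg(X, rng)
```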
This model has appeared, at different levels of generality, in many previous works (Young and Scheinerman, 2007; Tang et al., 2013; Athreya et al., 2017; Lei, 2021), and encodes several assumptions which we review at the end of this section. Of particular interest here is the trade-off between two key properties: the graph sparsity factor, $\rho_n$, and triangle density, $\Delta_n$, where (He, 2015; Seshadhri et al., 2020)
$$n\rho_n = n \iint \langle x, y \rangle \, dF_n(x) \, dF_n(y); \qquad n\Delta_n = n^3 \iiint \langle x, y \rangle \langle y, z \rangle \langle z, x \rangle \, dF_n(x) \, dF_n(y) \, dF_n(z),$$
are respectively the graph's expected degree, $\mathbb{E}(\sum_j A^{(n)}_{ij})$, and expected number of triangles, $\mathbb{E}(\sum_{i \neq j \neq k} A^{(n)}_{ij} A^{(n)}_{jk} A^{(n)}_{ki})$.
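Given a simulated adjacency matrix such as the one above, the empirical counterparts of these two quantities are easy to read off (a short sketch; note that the paper's triangle count ranges over ordered triples, so each unordered triangle is counted six times, which is exactly what $\operatorname{trace}(A^3)$ computes when the diagonal is zero):

```python
import numpy as np

def degree_and_triangle_density(A):
    """Empirical analogues of the expected degree (n * rho_n) and Delta_n."""
    n = A.shape[0]
    avg_degree = A.sum() / n                                  # average degree ~ n * rho_n
    n_delta = np.trace(np.linalg.matrix_power(A, 3))          # ordered-triple triangle count ~ n * Delta_n
    return avg_degree, n_delta / n

avg_deg, delta_hat = degree_and_triangle_density(A)           # A from the sketch above
print(f"average degree {avg_deg:.2f}, triangle density {delta_hat:.2f}")
```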
Definition 2 (Spectral embedding into $\mathbb{R}^d$). Given that $A^{(n)}$ has eigendecomposition $A^{(n)} = \sum_i \lambda_i u_i u_i^\top$, its adjacency spectral embedding into $d$ dimensions is given by the rows of $\hat{X} := (|\lambda_1|^{1/2} u_1, \dots, |\lambda_d|^{1/2} u_d)$.

The point cloud $\hat{X}$ can be interpreted as a geometric representation of the best rank-$d$ approximation of $A^{(n)}$, in the sense that, provided $\lambda_1, \dots, \lambda_d > 0$ (as is generally assumed throughout this paper),
$$\hat{X} = \operatorname*{argmin}_{X : \operatorname{rank}(X) \leq d} \| X X^\top - A^{(n)} \|_F.$$
The matrix input may also be rectangular (e.g. the 'core-periphery slice'; Section 3, local embedding), in which case singular values and left singular vectors replace eigenvalues and eigenvectors respectively.
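Definition 2 translates almost line by line into code; a minimal sketch (retaining, as is standard, the $d$ eigenvalues of largest magnitude):

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Rows of (|lambda_1|^{1/2} u_1, ..., |lambda_d|^{1/2} u_d)."""
    eigvals, eigvecs = np.linalg.eigh(A)                 # A is symmetric
    top = np.argsort(np.abs(eigvals))[::-1][:d]          # d eigenvalues largest in magnitude
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))

X_hat = adjacency_spectral_embedding(A, d=3)             # A from the earlier simulation sketch
```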
For the reader's convenience, we recall the standard rate-of-growth notation:
$o(n)$: 'strictly slower than $n$';
$O(n)$: 'slower than $n$';
$\omega(n)$: 'strictly faster than $n$';
$\Omega(n)$: 'faster than $n$';
$\Theta(n)$: 'as fast as $n$', i.e., $O(n)$ and $\Omega(n)$.
Question
Motivated by recent impossibility results concerning triangle densities under sparse finite-dimensional dot product models (Seshadhri et al., 2020), the question which is the starting point for our contribution is: can a sparse IRDPG, i.e. $\rho_n = o(1)$, have high triangle density, i.e., $\Delta_n = \Omega(1)$?
Main result
The answer is yes. Moreover, in contrast to the results of Seshadhri et al. (2020), which show that it is impossible to have sparsity and high triangle density when $F_n$ is supported on a finite-dimensional vector space, we show that sparsity and high triangle density are possible when $F_n$ is supported on a low-dimensional manifold $\mathcal{M}_n$ embedded in an infinite-dimensional space. We can even obtain the extreme regime $\rho_n = \Theta(1/n)$, $\Delta_n = \Theta(1)$, which includes asymptotically constant expected degree and triangle density.
Approach
Our proof is by example: we construct $\mathcal{M}_n$ through a non-distortive homeomorphism $\phi_n$ applied to a compact topological manifold $\mathcal{Z}_n \subset \mathbb{R}^d$. In our examples, $\mathcal{Z}_n$ is either a growing interval or a growing circle. By non-distortive we mean: the Euclidean length of a curve $\eta(t)$ on $\mathcal{Z}_n$ is equal to the (infinite-dimensional) Euclidean length of the image of this curve, $\phi_n\{\eta(t)\}$, on $\mathcal{M}_n$. If $\mathcal{Z}_n = [0,1]^2$, for example, we can think of a sheet of paper which is being bent or folded, but not cut or stretched. This property allows us to easily control the on-manifold distribution, an implication of which is given in Section 3 (triangles and community structure). Moreover, the map $\phi_n$ is chosen in such a way that we can calculate the sparsity and triangle density of the resulting graph. The technical details of the construction are delegated to the supplementary material.
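To illustrate what non-distortive means in the simplest case, the sketch below (our own illustration, not one of the constructions in the supplementary material) wraps an interval onto a circle of radius $r$ by arc length, $z \mapsto r(\cos(z/r), \sin(z/r))$, and checks numerically that the length of a curve is unchanged by the map.

```python
import numpy as np

def polyline_length(points):
    """Sum of the Euclidean lengths of consecutive segments."""
    return np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))

r = 5.0
z = np.linspace(0.0, 2 * np.pi * r, 20_000)[:, None]                         # a curve along the interval
image = np.column_stack([r * np.cos(z[:, 0] / r), r * np.sin(z[:, 0] / r)])  # its image on the circle

print(polyline_length(z), polyline_length(image))                            # both approximately 2*pi*r
```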
Example 1
Let $G_n$ be a Gaussian distribution with mean zero and variance $\sigma_n^2$, truncated to the interval $\mathcal{Z}_n = [-t_n, t_n]$, for $t_n > 0$. We let $t_n = \sigma_n^2$ so that the mass of $N(0, \sigma_n^2)$ on the complement of $\mathcal{Z}_n$ tends to zero as $n \to \infty$. For this example:

Lemma 3. There exists a non-distortive homeomorphism $\phi^{\mathrm{RBF}}_n : \mathcal{Z}_n \to \mathbb{R}^{\mathbb{N}}$ such that for $\mathcal{M}_n = \phi^{\mathrm{RBF}}_n(\mathcal{Z}_n)$ and with $F_n$ the pushforward by $\phi^{\mathrm{RBF}}_n$ of the truncated Gaussian distribution $G_n$,
$$\rho_n = \Theta(\sigma_n^{-1}), \qquad \Delta_n = \Theta(n^2 \sigma_n^{-2}),$$
so that $\rho_n = \Theta(1/n)$ and $\Delta_n = \Theta(1)$ if $\sigma_n = \Theta(n)$.
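The map $\phi^{\mathrm{RBF}}_n$ is constructed in the supplementary material; the superscript suggests a Gaussian radial basis function kernel, and the sketch below checks the rates of Lemma 3 by Monte Carlo under the assumption (ours, made purely for illustration) that the induced inner products are $\langle \phi^{\mathrm{RBF}}_n(z), \phi^{\mathrm{RBF}}_n(z') \rangle = \exp\{-(z - z')^2/2\}$:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(2)

def monte_carlo_rates(sigma, m=200_000, rng=rng):
    """Monte Carlo estimates of rho_n and of the triple integral Delta_n / n^2,
    under the *assumed* kernel k(z, z') = exp(-(z - z')^2 / 2) and
    z ~ N(0, sigma^2) truncated to [-sigma^2, sigma^2]."""
    a, b = -sigma, sigma                  # truncation points in standard-deviation units
    z1, z2, z3 = (truncnorm.rvs(a, b, scale=sigma, size=m, random_state=rng) for _ in range(3))
    k12 = np.exp(-0.5 * (z1 - z2) ** 2)
    k23 = np.exp(-0.5 * (z2 - z3) ** 2)
    k31 = np.exp(-0.5 * (z3 - z1) ** 2)
    return k12.mean(), (k12 * k23 * k31).mean()

for sigma in (10.0, 20.0, 40.0):
    rho, tri = monte_carlo_rates(sigma)
    print(f"sigma={sigma:5.1f}  sigma*rho={sigma * rho:.3f}  sigma^2*triple={sigma**2 * tri:.3f}")
# Both products stay roughly constant as sigma grows, consistent with
# rho_n = Theta(1 / sigma_n) and Delta_n = Theta(n^2 / sigma_n^2) under this stand-in kernel.
```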
Example 2
Let $G_n$ be the uniform distribution on the circle $\mathcal{Z}_n = \{x \in \mathbb{R}^2 : \|x\| = r_n\}$. For this example:

Lemma 4. There exists a non-distortive homeomorphism $\phi^{\mathrm{S}}_n : \mathcal{Z}_n \to \mathbb{R}^{\mathbb{N}}$ such that for $\mathcal{M}_n = \phi^{\mathrm{S}}_n(\mathcal{Z}_n)$ and $F_n$ the pushforward of $G_n$ by $\phi^{\mathrm{S}}_n$,
$$\rho_n = O(r_n^{-1}), \qquad \Delta_n = \Omega(n^2 r_n^{-2}),$$
so that $\rho_n = O(1/n)$ and $\Delta_n = \Omega(1)$ if $r_n = \Theta(n)$.
We note that in the context of Lemma 4, $F_n$ is the uniform probability measure on $\mathcal{M}_n$, since $G_n$ is uniform and $\phi^{\mathrm{S}}_n$ is non-distortive. This is also an example where the expected degrees are identical and therefore bounded for each node, as in Seshadhri et al. (2020).
The two examples are illustrated in Figure 1, with $n = 2000$, $\sigma_n = n/2000$ and $r_n = n/2000$. Seshadhri et al. (2020) note that for various real-world graphs, the subgraphs containing nodes of degree $c \in [10, 50]$ or less have triangle density greater than one (and in many cases greater than ten). Our examples, and several others, are able to produce graphs with (constant) expected degrees and triangle densities in the same range, which we demonstrate in the supplementary material. We also illustrate that the logistic model studied in Chanpuriya et al. (2020), and employed in Spotify's recommendation system (Johnson, 2014), forms a manifold in (indefinite) inner product space.
Figure 1: The upper plots show the manifolds of our concrete examples (red curve) with the spectral embeddings (blue points) of a graph simulated from the model, aligned using Procrustes analysis. The lower plots are histograms showing the intrinsic distribution of the $X_i$, with $n = 2000$, $\sigma_n = n/2000$ and $r_n = n/2000$.

Model assumptions
Allowing self-loops is not crucial, and is simply done to make the analysis more reader-friendly. Forbidding self-loops will not change the expected number of triangles, and can only reduce the expected degree by at most one.
That the latent positions are i.i.d., and that edges are conditionally independent with probability given by an inner product, are testable assumptions. Given an embedding, $\hat{X}_1, \dots, \hat{X}_n$, the first can be investigated using point process analysis techniques to detect dependence, such as repulsion (Baddeley et al., 2015). The second could be addressed using held-out information, such as time, to uncover edge correlations not captured by $\hat{X}_1, \dots, \hat{X}_n$. A recent paper has observed that edge independence can be problematic for reproducing triangle density while controlling a notion of overlap, for a fixed edge probability matrix (Chanpuriya et al., 2021). However, in our model, the edge probability matrix is random. Regarding the assumption of an inner product, the model cannot explain any large-magnitude negative eigenvalue in $A^{(n)}$.
These assumptions have been relaxed in many works (Caron and Fox, 2017; Xie and Xu, 2021; Padilla et al., 2019; Rubin-Delanchy et al., 2022; Lei, 2021). However, it is important to appreciate that they are appropriate for our goal: to show that sparsity and high triangle density are possible with node representations on a low-dimensional manifold, under a setup as close as possible to Seshadhri et al. (2020), contrasting their result. Moreover, if these properties are possible within a set of strong assumptions, they are possible more generally.
Relation to graphon models
Suppose the node representations are i.i.d. uniform on the unit interval, with connection probabilities given by an unknown function $g : [0,1] \times [0,1] \to [0,1]$. If the graphon $g$ is fixed, sparse graphs are impossible, since the expected degree grows as $\Theta(n)$. If we scale $g$ by a sparsity factor, as is common in the literature (Lunde and Sarkar, 2019; Lei, 2021; Wolfe and Olhede, 2013), then high triangle density becomes impossible. In particular, if $n\rho_n = o(n^{1/3})$, then $\Delta_n = o(1)$. The proof is a straightforward application of the definition of $\Delta_n$ and can be found in the supplementary material. Caron et al. (2017) prove that an extension of the graphon model, called graphex processes, can simultaneously achieve sparsity and constant clustering coefficients.
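A short calculation gives the flavour of that argument (a sketch on our part, using only $g \leq 1$ and the definition of $\Delta_n$ under the scaled graphon $\rho_n g$):
$$n\Delta_n = n^3 \iiint \rho_n g(u,v)\, \rho_n g(v,w)\, \rho_n g(w,u)\, du\, dv\, dw \;\leq\; n^3 \rho_n^3 = (n\rho_n)^3,$$
so $\Delta_n \leq (n\rho_n)^3 / n$, which is $o(1)$ whenever $n\rho_n = o(n^{1/3})$.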
3 IMPLICATIONS

We now turn our attention away from the generative properties of the IRDPG, i.e. producing sparsity and high triangle density, to the implications of our model for performing inference on network data.
Global manifold learning
Most real-world graphs are perceived to be very sparse, in that their average degree is much smaller than $n$, and it is common in network theory to equate this with the expected degree being $o(\log n)$ (Krivelevich and Sudakov, 2003; Amini et al., 2013; Le et al., 2017). The threshold $\log(n)$ has special significance because it is the rate under which an Erdős-Rényi graph is disconnected with high probability (Bollobás, 1998), and exact recovery of network communities is impossible (Abbe, 2017). The same threshold appears here, but for a different reason.

We can only expect to recover the manifold from $X_1, \dots, X_n$ if they don't leave large parts of $\mathcal{M}_n$ uncovered. We measure the largest gap using the Hausdorff distance
$$d_H(\{X_i\}, \mathcal{M}_n) := \sup_{x \in \mathcal{M}_n} \min_{i \in \{1, \dots, n\}} d(X_i, x),$$
where $d(x, y)$ denotes geodesic distance on $\mathcal{M}_n$, the length of the shortest curve between $x$ and $y$ on $\mathcal{M}_n$.
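For intuition, the following sketch (our own illustration, using the circle of Example 2, where geodesic distances on $\mathcal{M}_n$ coincide with arc lengths on $\mathcal{Z}_n$ because $\phi^{\mathrm{S}}_n$ is non-distortive) computes this Hausdorff distance for uniform samples on a growing circle: it is half the largest arc-length gap between consecutive sample points.

```python
import numpy as np

rng = np.random.default_rng(3)

def coverage_gap(n, radius, rng=rng):
    """d_H({X_i}, M_n) for n uniform points on a circle of the given radius:
    half the largest arc between consecutive points."""
    angles = np.sort(rng.uniform(0, 2 * np.pi, size=n))
    gaps = np.diff(np.concatenate([angles, [angles[0] + 2 * np.pi]]))
    return radius * gaps.max() / 2

for n in (2_000, 20_000, 200_000):
    print(n, coverage_gap(n, radius=n / 2000))   # radius grows like n, as in Example 2
# The gap does not shrink to zero: with constant expected degree the manifold never
# gets fully covered, echoing the message of Lemma 5 (stated there for an interval).
```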
Lemma 5. If $\mathcal{M}_n$ is homeomorphic to a closed interval, $F_n$ is uniform on $\mathcal{M}_n$, and there exists $c > 0$ such that $\|x\| \geq c$, for all $x \in \mathcal{M}_n$, then
$$d_H(\{X_i\}, \mathcal{M}_n) \overset{p}{\to} 0 \iff n\rho_n = \Omega\{\log(n)\}.$$
Given just $X_1, \dots, X_n$, finding a $d$-dimensional representation of the nodes in which geodesic distances are faithfully reproduced is therefore impossible in general, unless $n\rho_n = \Omega\{\log(n)\}$. In practice, we only have access to estimates $\hat{X}_1, \dots, \hat{X}_n$, which adds further complications to manifold learning.

In fact, in this semi-sparse regime, there are problems relating to estimating $X_i$. For example, existing works on spectral embedding in the infinite-dimensional case make heavy use of the trace-class assumption (Tang et al., 2013; Rubin-Delanchy, 2020; Lei, 2021), $\rho_n^{-1} \sum_{i=1}^{\infty} \lambda_i < \infty$, whereas $n\rho_n = O\{\log(n)\}$ and $\Delta_n = \Omega(1)$ require this sum to be unbounded in $n$.
Local embedding
When $\mathcal{M}_n$ is a manifold, as it is in the examples above, we now explore the idea that we can 'zoom in' to a particular location in latent space and faithfully represent the rich local structure with low-dimensional embeddings.

Given a ball with centre $x \in \mathcal{M}_n \setminus \{0\}$ and radius $r$, denoted $B_r(x)$, we consider two local views of the graph: the core, $A^{\mathrm{C}}$, is the subgraph on the nodes with latent position in $B_r(x)$; the core-periphery, $A^{\text{C-P}}$, is a graph on the full node set with adjacency matrix
$$A^{\text{C-P}}_{ij} = \begin{cases} A_{ij} & \text{if } X_i \in B_r(x) \text{ or } X_j \in B_r(x), \\ 0 & \text{otherwise}. \end{cases}$$
In what follows, we will refer to a 'slice' of this matrix,
$$A^{\text{C-P slice}} = A^{\text{C-P}}_{i : X_i \in B_r(x),\; j \in \{1, \dots, n\}}.$$
We propose to find a low-rank approximate factorisation of these 'local adjacency matrices', $A^{\mathrm{C}}$ and $A^{\text{C-P slice}}$, and argue that, under certain conditions, this low-rank assumption is reasonable. We provide some theoretical reasons to prefer the core-periphery; however, our experiments with real data find them comparable.
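As a computational sketch (variable names and the rank choice are ours; the mask indicating which nodes fall in $B_r(x)$ would come from a neighbourhood-finding step, discussed at the end of this section), the two local views and their low-rank factorisations can be formed as follows:

```python
import numpy as np

def local_views(A, in_ball):
    """Return (A_core, A_cp_slice) for a boolean mask `in_ball` over the nodes."""
    core_idx = np.flatnonzero(in_ball)
    A_core = A[np.ix_(core_idx, core_idx)]       # subgraph on nodes in B_r(x)
    A_cp_slice = A[core_idx, :]                  # rows of the core-periphery graph for core nodes
    return A_core, A_cp_slice

def truncated_svd_embedding(M, k):
    """Rank-k factorisation M ~ (U S^{1/2})(V S^{1/2})^T via the SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * np.sqrt(s[:k]), Vt[:k].T * np.sqrt(s[:k])

# Example with the earlier simulation sketch: a placeholder random mask stands in
# for the theoretical ball B_r(x).
rng = np.random.default_rng(4)
in_ball = rng.uniform(size=A.shape[0]) < 0.05
A_core, A_cp_slice = local_views(A, in_ball)
X_core, _ = truncated_svd_embedding(A_core, k=3)        # k = d + 1 with d = 2
X_cp, Y_cp = truncated_svd_embedding(A_cp_slice, k=3)
```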
To describe these graphs as $n$ grows, we have to assume the manifolds are nested, that is, $\mathcal{M}_n \subseteq \mathcal{M}_{n+1}$, for all $n$. We impose two other regularity conditions. First, $n / v_n \to \infty$, where $v_n = \int_{\mathcal{M}_n} 1 \, dx$ is the intrinsic volume of the manifold, which would preclude a constant expected degree in the examples of the previous section, but allows any faster rate. Second, there exists $R \in (0, \|x\|/6)$ and a constant $C > 0$ such that
$$\mathbb{P}\{X_i \in B_r(x)\} \geq \frac{C}{v_n} r^d, \quad \text{for all } r \leq R.$$
Under these assumptions, the core is dense, with high triangle density.

Theorem 6. Fix $x \in \mathcal{M}_n \setminus \{0\}$ and let $r_n = \omega\{(v_n/n)^{1/d}\}$, $r_n \leq R < \|x\|/6$, which could be fixed or shrinking. The graph core corresponding to the ball $B_{r_n}(x)$ has $\Omega(n r^d / v_n)$ nodes in expectation, expected degree $\Omega(n r^d / v_n)$ and triangle density $\Omega\{(n r^d / v_n)^2\}$.
If the manifold is smooth, a local neighbourhood $\mathcal{M}_n \cap B_r(x)$ can be approximated up to arbitrary precision by a $d$-dimensional plane, the approximation improving as $r \to 0$. However, we can go too far, and collect very few points $X_i \in B_r(x)$. Theorem 6 tells us how $r$ should shrink with $n$ to avoid this.
Under a flat approximation, the core and core-periphery slice have low rank. To be precise, the matrices $P^{\mathrm{C}} := P_{i, j \in \mathcal{C}}$ and $P^{\text{C-P slice}} := P_{i \in \mathcal{C},\, j \in \{1, \dots, n\}}$, where $P = [\langle X_i, X_j \rangle]_{i, j \in \{1, \dots, n\}}$ and $\mathcal{C} = \{i : X_i \in B_r(x)\}$, have rank at most $d + 1$. As a result, we can hope to recover local, finite-dimensional estimates of the $X_i \in B_r(x)$ using approximate matrix factorisation of $A^{\mathrm{C}}$ or $A^{\text{C-P slice}}$. Moreover, the approximate ranks of those matrices provide a local estimate of the manifold dimension.
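The rank bound is easy to check numerically: if the points in the ball lie (approximately) on a $d$-dimensional affine plane, their Gram matrix has rank at most $d + 1$. A small self-contained check (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

# Points on a d-dimensional affine plane embedded in ambient dimension D.
d, D, m = 2, 50, 200
origin = rng.normal(size=D)
basis = np.linalg.qr(rng.normal(size=(D, d)))[0]      # orthonormal d-frame
coords = rng.normal(size=(m, d))
X = origin + coords @ basis.T                         # m points on the plane

gram = X @ X.T                                        # analogue of P^C under a flat approximation
print(np.linalg.matrix_rank(gram))                    # d + 1 = 3
```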
We find, however, that given 'noise-free' factorisations $P^{\mathrm{C}} = X^{\mathrm{C}} (X^{\mathrm{C}})^\top$ and $P^{\text{C-P slice}} = X^{\text{C-P}} (Y^{\text{C-P}})^\top$, where $X^{\mathrm{C}}, X^{\text{C-P}} \in \mathbb{R}^{|\mathcal{C}| \times (d+1)}$ and $Y^{\text{C-P}} \in \mathbb{R}^{n \times (d+1)}$, the original $X_i \in B_r(x)$ are linear images of the rows of $X^{\text{C-P}}$, but are not necessarily recoverable from $X^{\mathrm{C}}$.
We have not discussed how a neighbourhood might be found in practice. There are many algorithms suited to this purpose (Haveliwala, 2003; Spielman and Teng, 2004; Andersen et al., 2006), and even a simple similarity measure such as 'number of shared neighbours' could be expected to be consistent (Chen and Xu, 2016). The supplementary material contains an experiment demonstrating that this measure forms a reasonable surrogate for the theoretical neighbourhood.
hood. There are two other ways a proxy for neighbourhood
could be available. First, the graph may have associated
covariates, such as time or location, which allow us to zoom
into particular subgraphs (see classification experiment).