Implications of sparsity and high triangle density for graph representation learning
Hannah Sansford Alexander Modell Nick Whiteley Patrick Rubin-Delanchy
University of Bristol University of Bristol University of Bristol University of Bristol
Abstract
Recent work has shown that sparse graphs containing many triangles cannot be reproduced using a finite-dimensional representation of the nodes, in which link probabilities are inner products. Here, we show that such graphs can be reproduced using an infinite-dimensional inner product model, where the node representations lie on a low-dimensional manifold. Recovering a global representation of the manifold is impossible in a sparse regime. However, we can zoom in on local neighbourhoods, where a lower-dimensional representation is possible. As our constructions allow the points to be uniformly distributed on the manifold, we find evidence against the common perception that triangles imply community structure.
1 INTRODUCTION
Network analysis is a common activity in many areas of science and technology, and has diverse goals as a result. Sometimes, it is of interest to estimate just a few salient parameters which generally drive connectivity, for example to inform policy (as the 'R' number did, in Britain, during the COVID-19 pandemic (UK-government, 2020)). A considerable amount of research has gone into developing simple models, such as the Erdős-Rényi graph, preferential attachment (Barabási and Albert, 1999), or the Chung-Lu model (Chung and Lu, 2002), to explain global properties of real-world networks. However, in machine learning, the goal is often to uncover hidden structure or make predictions about the nodes or edges, and these simple models are often felt to say too little about the data. Instead, the paradigm of graph embedding has emerged, in which each node is represented as a vector. There are several theoretical reasons to do this,
but the most straightforward is convenience, including compatibility with existing machine-learning techniques, which often take feature vectors as input.
In the design of graph embedding algorithms, providing a realistic generative model for the data has not always been a high priority. In fact, in some of the most popular approaches, such as DeepWalk (Perozzi et al., 2014) or Node2vec (Grover and Leskovec, 2016), there is no obvious generative model. But this is a problem for many applications, particularly in unsupervised settings, as there is no straightforward measure of goodness-of-fit. Moreover, data retention and privacy issues have increased the need for synthetic graph generation techniques. It would be useful to be able to harness the power of graph embeddings to reproduce rich structure present in real graphs, as well as their global properties.
There are approaches to graph embedding which do purport to model the data. Of particular interest here is the highly studied random dot product graph (RDPG) (Young and Scheinerman, 2007; Athreya et al., 2017), in which the nodes each have a latent position $X_i \in \mathbb{R}^d$, which we aim to estimate via graph embedding, and the inner product $\langle X_i, X_j \rangle$ provides the corresponding connection probability. The model has been particularly useful for understanding graph spectral embedding, where the node representations are obtained from the eigenvectors of some matrix representation of the graph, such as the normalised Laplacian (Tang and Priebe, 2018). However, a recent result has put its adequacy as a generative model in doubt. The random dot product graph is unable to produce graphs in which the nodes make few connections but still form large numbers of triangles (Seshadhri et al., 2020). These properties, known as sparsity and transitivity, are regarded as ubiquitous in network science (Newman, 2018) and are of course consistent with our own experience of social networks: most of us have few connections, relative to the size of the network, and yet, often, 'a friend of a friend is a friend'.

A follow-on paper (Chanpuriya et al., 2020), which proved an inspiration to us, was able to reproduce sparsity and high triangle density (formal definition to come), in theory, by modifying the connection probability to $\max\{0, \min(\langle X_i, Y_j \rangle, 1)\}$ and, in practice, using the kernel $\operatorname{logistic}(\langle X_i, Y_j \rangle)$ (the latter already in common use), assigning two latent positions, $X_i$ and $Y_i$, to each node $i$. More recently, Stolman et al. (2022) have attained the properties of interest using the softmax kernel over a low-dimensional embedding.
It is important to appreciate that, in our approach, the inner product is not an arbitrary kernel. It provides a default choice because, under mild assumptions, it can be used to represent any other positive definite kernel, although typically in infinite dimension, by Mercer's theorem, known in machine learning as the 'kernel trick' (Steinwart and Christmann, 2008). We find a transparent relationship between the eigenvalues of the kernel and the triangle density (see the supplementary material, Lemma 7), which shows that negative eigenvalues can only reduce the expected number of triangles in the graph. As a result, if a carefully crafted kernel seems to reproduce high triangle density (Chanpuriya et al., 2020), and negative eigenvalues aren't the reason, the likely problem with the RDPG is its finite-dimensional assumption.
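To see the mechanism numerically, here is a minimal sketch (our own illustration, not the formal statement of Lemma 7): for a symmetric edge-probability matrix $P$ with zero diagonal and conditionally independent edges, the expected triangle count equals $\operatorname{trace}(P^3)/6 = \sum_i \lambda_i^3/6$, so any negative eigenvalues can only pull it down.

```python
import numpy as np

rng = np.random.default_rng(0)

# A symmetric edge-probability matrix with entries in [0, 1] and zero diagonal.
n = 300
Z = rng.uniform(size=(n, 3))
P = Z @ Z.T / 3.0                       # inner-product probabilities in [0, 1]
np.fill_diagonal(P, 0.0)                # zeroing the diagonal typically creates negative eigenvalues

# Expected number of (unordered) triangles: sum_{i<j<k} P_ij P_jk P_ki = trace(P^3) / 6.
expected_triangles = np.trace(np.linalg.matrix_power(P, 3)) / 6

# The same quantity through the spectrum: trace(P^3) = sum_i lambda_i^3.
lam = np.linalg.eigvalsh(P)
print(expected_triangles, np.sum(lam ** 3) / 6)                          # identical up to rounding
print("dropping negative eigenvalues:", np.sum(lam[lam > 0] ** 3) / 6)   # always at least as large
```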
There would be little hope for practical inference in infinite dimension without structure. Instead, this paper shows that sparse graphs with high triangle density can be modelled using an infinite-dimensional inner product model, with representations on a low-dimensional manifold.
An implication of this manifold structure, which we later explore in depth, is the plausibility of global versus local embedding. Under levels of sparsity which are generally deemed reasonable, the manifold does not get populated everywhere, and perfect reconstruction is impossible. However, since a $d$-dimensional manifold (informally) locally resembles $d$-dimensional Euclidean space, we can 'zoom in' to the subgraph containing nodes in a small neighbourhood of the latent space, and form an embedding which represents a flat approximation of the manifold over that region. This encompasses, but generalises, the notion of finding local community structure (Rohe and Qin, 2013; Spielman and Teng, 2013; Chen et al., 2020). A key technical contribution of our work is to construct manifolds for which we can simultaneously control sparsity, triangle density, and the on-manifold distribution. One point of the latter is that we can place a uniform distribution on the manifold and still achieve sparse graphs with high triangle density. Because there is no reasonable notion of community structure in this case, we find evidence against the common perception that triangles imply community structure.
Related work
Several earlier works have proposed infinite-dimensional versions of the RDPG and/or the hypothesis that latent positions live on a low-dimensional manifold (Tang et al., 2013; Athreya et al., 2021; Rubin-Delanchy, 2020; Lei, 2021; Whiteley et al., 2021). The main goal of this paper, to generatively achieve sparsity and high triangle density, was not considered in these papers, and indeed they contain several assumptions which make the goal impossible, including a finite-rank or trace-class kernel. Whiteley et al. (2021) provides a crucial element of our theory, allowing a manifold construction for which we can control the on-manifold distribution. We principally focus on triangle density, rather than (the formal definition of) transitivity, to align our work with Seshadhri et al. (2020); however, our graphs also achieve high transitivity.
2 ACHIEVING HIGH TRIANGLE DENSITY IN SPARSE GRAPHS
Setup
We consider a sequence of simple undirected graphs, $A^{(n)}$, on nodes indexed $1, \dots, n$. We make no distinction between the graph and its adjacency matrix, $A^{(n)}$, which is binary, $n \times n$, and symmetric, with the standard convention that $A^{(n)}_{ij} = 1$ if nodes $i$ and $j$ have an edge, and zero otherwise.

We will assign to each node, $i$, an infinite-dimensional latent position, $X^{(n)}_i \in \mathbb{R}^{\mathbb{N}}$, from a probability distribution $F_n$ whose support $\mathcal{M}_n$ (the precise definition of which comes later) only allows inner products that are valid probabilities, that is,
$$\langle x, y \rangle := \sum_{k=1}^{\infty} x_k y_k \in [0, 1] \quad \text{for all } x, y \in \mathcal{M}_n. \qquad (1)$$
For convenience, the superscript $(n)$ will be suppressed from the latent positions, but there is no correspondence between the latent positions of graphs of different sizes, i.e. $X_i$ associated with $A^{(n)}$ is not the same as $X_i$ associated with $A^{(m)}$, for $n \neq m$.
Definition 1 (Infinite-dimensional random dot product graph). Let $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} F_n$. The matrix $A^{(n)}$ represents an infinite-dimensional random dot product graph (IRDPG) if
$$A^{(n)}_{ij} \overset{\text{ind}}{\sim} \operatorname{Bernoulli}(\langle X_i, X_j \rangle),$$
conditional on $X_1, \dots, X_n$, for $1 \leq i \leq j \leq n$.
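As a concrete illustration of Definition 1, the following is a minimal simulation sketch in Python/NumPy (our own code, not the authors'; the latent positions are finite-dimensional stand-ins whose pairwise inner products happen to lie in $[0,1]$, and self-loops are omitted, which, as noted under the model assumptions below, changes little):

```python
import numpy as np

def simulate_irdpg(X, rng):
    """Draw A_ij ~ Bernoulli(<X_i, X_j>) independently for i < j, then symmetrise."""
    P = X @ X.T                                    # matrix of inner products <X_i, X_j>
    assert P.min() >= 0 and P.max() <= 1, "inner products must be valid probabilities"
    upper = np.triu(rng.uniform(size=P.shape) < P, k=1).astype(int)
    return upper + upper.T                         # binary, symmetric, zero diagonal

rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 3)) / np.sqrt(3)        # finite-dimensional stand-in positions
A = simulate_irdpg(X, rng)
```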
This model has appeared, at different levels of generality, in many previous works (Young and Scheinerman, 2007; Tang et al., 2013; Athreya et al., 2017; Lei, 2021), and encodes several assumptions which we review at the end of this section. Of particular interest here is the trade-off between two key properties: the graph sparsity factor, $\rho_n$, and triangle density, $\Delta_n$, where (He, 2015; Seshadhri et al., 2020)
$$n\rho_n = n \iint \langle x, y \rangle \, dF_n(x) \, dF_n(y); \qquad n\Delta_n = n^3 \iiint \langle x, y \rangle \langle y, z \rangle \langle z, x \rangle \, dF_n(x) \, dF_n(y) \, dF_n(z),$$
are respectively the graph's expected degree, $\mathbb{E}(\sum_j A^{(n)}_{ij})$, and expected number of triangles, $\mathbb{E}(\sum_{i \neq j \neq k} A^{(n)}_{ij} A^{(n)}_{jk} A^{(n)}_{ki})$.
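Given a simulated adjacency matrix such as the one above, the empirical counterparts of these two quantities are easy to read off (a short sketch; note that the paper's triangle count ranges over ordered triples, so each unordered triangle is counted six times, which is exactly what $\operatorname{trace}(A^3)$ computes when the diagonal is zero):

```python
import numpy as np

def degree_and_triangle_density(A):
    """Empirical analogues of the expected degree (n * rho_n) and Delta_n."""
    n = A.shape[0]
    avg_degree = A.sum() / n                                  # average degree ~ n * rho_n
    n_delta = np.trace(np.linalg.matrix_power(A, 3))          # ordered-triple triangle count ~ n * Delta_n
    return avg_degree, n_delta / n

avg_deg, delta_hat = degree_and_triangle_density(A)           # A from the sketch above
print(f"average degree {avg_deg:.2f}, triangle density {delta_hat:.2f}")
```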
Definition 2 (Spectral embedding into $\mathbb{R}^d$). Given that $A^{(n)}$ has eigendecomposition $A^{(n)} = \sum_i \lambda_i u_i u_i^\top$, its adjacency spectral embedding into $d$ dimensions is given by the rows of $\hat{X} := (|\lambda_1|^{1/2} u_1, \dots, |\lambda_d|^{1/2} u_d)$.

The point cloud $\hat{X}$ can be interpreted as a geometric representation of the best rank-$d$ approximation of $A^{(n)}$, in the sense that, provided $\lambda_1, \dots, \lambda_d > 0$ (as is generally assumed throughout this paper),
$$\hat{X} = \operatorname*{argmin}_{X : \operatorname{rank}(X) \leq d} \| X X^\top - A^{(n)} \|_F.$$
The matrix input may also be rectangular (e.g. the 'core-periphery slice'; Section 3, local embedding), in which case singular values and left singular vectors replace eigenvalues and eigenvectors respectively.
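Definition 2 translates almost line by line into code; a minimal sketch (retaining, as is standard, the $d$ eigenvalues of largest magnitude):

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Rows of (|lambda_1|^{1/2} u_1, ..., |lambda_d|^{1/2} u_d)."""
    eigvals, eigvecs = np.linalg.eigh(A)                 # A is symmetric
    top = np.argsort(np.abs(eigvals))[::-1][:d]          # d eigenvalues largest in magnitude
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))

X_hat = adjacency_spectral_embedding(A, d=3)             # A from the earlier simulation sketch
```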
For the reader's convenience, we recall the standard rate-of-growth notation:
$o(n)$: 'strictly slower than $n$';
$O(n)$: 'slower than $n$';
$\omega(n)$: 'strictly faster than $n$';
$\Omega(n)$: 'faster than $n$';
$\Theta(n)$: 'as fast as $n$', i.e., $O(n)$ and $\Omega(n)$.
Question
Motivated by recent impossibility results concerning triangle densities under sparse finite-dimensional dot product models (Seshadhri et al., 2020), the question which is the starting point for our contribution is: can a sparse IRDPG, i.e. $\rho_n = o(1)$, have high triangle density, i.e., $\Delta_n = \Omega(1)$?
Main result
The answer is yes. Moreover, in contrast to the results of Seshadhri et al. (2020), which show that it is impossible to have sparsity and high triangle density when $F_n$ is supported on a finite-dimensional vector space, we show that sparsity and high triangle density are possible when $F_n$ is supported on a low-dimensional manifold $\mathcal{M}_n$ embedded in an infinite-dimensional space. We can even obtain the extreme regime $\rho_n = \Theta(1/n)$, $\Delta_n = \Theta(1)$, which includes asymptotically constant expected degree and triangle density.
Approach
Our proof is by example: we construct $\mathcal{M}_n$ through a non-distortive homeomorphism $\phi_n$ applied to a compact topological manifold $\mathcal{Z}_n \subset \mathbb{R}^d$. In our examples, $\mathcal{Z}_n$ is either a growing interval or a growing circle. By non-distortive we mean: the Euclidean length of a curve $\eta(t)$ on $\mathcal{Z}_n$ is equal to the (infinite-dimensional) Euclidean length of the image of this curve, $\phi_n\{\eta(t)\}$, on $\mathcal{M}_n$. If $\mathcal{Z}_n = [0,1]^2$, for example, we can think of a sheet of paper which is being bent or folded, but not cut or stretched. This property allows us to easily control the on-manifold distribution, an implication of which is given in Section 3 (triangles and community structure). Moreover, the map $\phi_n$ is chosen in such a way that we can calculate the sparsity and triangle density of the resulting graph. The technical details of the construction are delegated to the supplementary material.
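To illustrate what non-distortive means in the simplest case, the sketch below (our own illustration, not one of the constructions in the supplementary material) wraps an interval onto a circle of radius $r$ by arc length, $z \mapsto r(\cos(z/r), \sin(z/r))$, and checks numerically that the length of a curve is unchanged by the map.

```python
import numpy as np

def polyline_length(points):
    """Sum of the Euclidean lengths of consecutive segments."""
    return np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))

r = 5.0
z = np.linspace(0.0, 2 * np.pi * r, 20_000)[:, None]                         # a curve along the interval
image = np.column_stack([r * np.cos(z[:, 0] / r), r * np.sin(z[:, 0] / r)])  # its image on the circle

print(polyline_length(z), polyline_length(image))                            # both approximately 2*pi*r
```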
Example 1
Let $G_n$ be a Gaussian distribution with mean zero and variance $\sigma_n^2$, truncated to the interval $\mathcal{Z}_n = [-t_n, t_n]$, for $t_n > 0$. We let $t_n = \sigma_n^2$ so that the mass of $N(0, \sigma_n^2)$ on the complement of $\mathcal{Z}_n$ tends to zero as $n \to \infty$. For this example:

Lemma 3. There exists a non-distortive homeomorphism $\phi^{\mathrm{RBF}}_n : \mathcal{Z}_n \to \mathbb{R}^{\mathbb{N}}$ such that for $\mathcal{M}_n = \phi^{\mathrm{RBF}}_n(\mathcal{Z}_n)$ and with $F_n$ the pushforward by $\phi^{\mathrm{RBF}}_n$ of the truncated Gaussian distribution $G_n$,
$$\rho_n = \Theta(\sigma_n^{-1}), \qquad \Delta_n = \Theta(n^2 \sigma_n^{-2}),$$
so that $\rho_n = \Theta(1/n)$ and $\Delta_n = \Theta(1)$ if $\sigma_n = \Theta(n)$.
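The map $\phi^{\mathrm{RBF}}_n$ is constructed in the supplementary material; the superscript suggests a Gaussian radial basis function kernel, and the sketch below checks the rates of Lemma 3 by Monte Carlo under the assumption (ours, made purely for illustration) that the induced inner products are $\langle \phi^{\mathrm{RBF}}_n(z), \phi^{\mathrm{RBF}}_n(z') \rangle = \exp\{-(z - z')^2/2\}$:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(2)

def monte_carlo_rates(sigma, m=200_000, rng=rng):
    """Monte Carlo estimates of rho_n and of the triple integral Delta_n / n^2,
    under the *assumed* kernel k(z, z') = exp(-(z - z')^2 / 2) and
    z ~ N(0, sigma^2) truncated to [-sigma^2, sigma^2]."""
    a, b = -sigma, sigma                  # truncation points in standard-deviation units
    z1, z2, z3 = (truncnorm.rvs(a, b, scale=sigma, size=m, random_state=rng) for _ in range(3))
    k12 = np.exp(-0.5 * (z1 - z2) ** 2)
    k23 = np.exp(-0.5 * (z2 - z3) ** 2)
    k31 = np.exp(-0.5 * (z3 - z1) ** 2)
    return k12.mean(), (k12 * k23 * k31).mean()

for sigma in (10.0, 20.0, 40.0):
    rho, tri = monte_carlo_rates(sigma)
    print(f"sigma={sigma:5.1f}  sigma*rho={sigma * rho:.3f}  sigma^2*triple={sigma**2 * tri:.3f}")
# Both products stay roughly constant as sigma grows, consistent with
# rho_n = Theta(1 / sigma_n) and Delta_n = Theta(n^2 / sigma_n^2) under this stand-in kernel.
```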
Example 2
Let $G_n$ be the uniform distribution on the circle $\mathcal{Z}_n = \{x \in \mathbb{R}^2 : \|x\| = r_n\}$. For this example:

Lemma 4. There exists a non-distortive homeomorphism $\phi^{\mathrm{S}}_n : \mathcal{Z}_n \to \mathbb{R}^{\mathbb{N}}$ such that for $\mathcal{M}_n = \phi^{\mathrm{S}}_n(\mathcal{Z}_n)$ and $F_n$ the pushforward of $G_n$ by $\phi^{\mathrm{S}}_n$,
$$\rho_n = O(r_n^{-1}), \qquad \Delta_n = \Omega(n^2 r_n^{-2}),$$
so that $\rho_n = O(1/n)$ and $\Delta_n = \Omega(1)$ if $r_n = \Theta(n)$.
We note that in the context of Lemma 4, $F_n$ is the uniform probability measure on $\mathcal{M}_n$, since $G_n$ is uniform and $\phi^{\mathrm{S}}_n$ is non-distortive. This is also an example where the expected degrees are identical and therefore bounded for each node, as in Seshadhri et al. (2020).
The two examples are illustrated in Figure 1, with $n = 2000$, $\sigma_n = n/2000$ and $r_n = n/2000$. Seshadhri et al. (2020) note that for various real-world graphs, the subgraphs containing nodes of degree $c \in [10, 50]$ or less have triangle density greater than one (and in many cases greater than ten). Our examples, and several others, are able to produce graphs with (constant) expected degrees and triangle densities in the same range, which we demonstrate in the supplementary material. We also illustrate that the logistic model studied in Chanpuriya et al. (2020), and employed in Spotify's recommendation system (Johnson, 2014), forms a manifold in (indefinite) inner product space.
Figure 1: The upper plots show the manifolds of our concrete examples (red curve) with the spectral embeddings (blue points) of a graph simulated from the model, aligned using Procrustes analysis. The lower plots are histograms showing the intrinsic distribution of the $X_i$, with $n = 2000$, $\sigma_n = n/2000$ and $r_n = n/2000$.

Model assumptions
Allowing self-loops is not crucial, and is simply done to make the analysis more reader-friendly. Forbidding self-loops will not change the expected number of triangles, and can only reduce the expected degree by at most one.
That the latent positions are i.i.d., and that edges are conditionally independent with probability given by an inner product, are testable assumptions. Given an embedding, $\hat{X}_1, \dots, \hat{X}_n$, the first can be investigated using point process analysis techniques to detect dependence, such as repulsion (Baddeley et al., 2015). The second could be addressed using held-out information, such as time, to uncover edge correlations not captured by $\hat{X}_1, \dots, \hat{X}_n$. A recent paper has observed that edge independence can be problematic for reproducing triangle density while controlling a notion of overlap, for a fixed edge probability matrix (Chanpuriya et al., 2021). However, in our model, the edge probability matrix is random. Regarding the assumption of an inner product, the model cannot explain any large-magnitude negative eigenvalue in $A^{(n)}$.
These assumptions have been relaxed in many works (Caron and Fox, 2017; Xie and Xu, 2021; Padilla et al., 2019; Rubin-Delanchy et al., 2022; Lei, 2021). However, it is important to appreciate that they are appropriate for our goal: to show that sparsity and high triangle density are possible with node representations on a low-dimensional manifold, under a setup as close as possible to Seshadhri et al. (2020), contrasting their result. Moreover, if these properties are possible within a set of strong assumptions, they are possible more generally.
Relation to graphon models
Suppose the node representations are i.i.d. uniform on the unit interval, with connection probabilities given by an unknown function $g : [0,1] \times [0,1] \to [0,1]$. If the graphon $g$ is fixed, sparse graphs are impossible, since the expected degree grows as $\Theta(n)$. If we scale $g$ by a sparsity factor, as is common in the literature (Lunde and Sarkar, 2019; Lei, 2021; Wolfe and Olhede, 2013), then high triangle density becomes impossible. In particular, if $n\rho_n = o(n^{1/3})$, then $\Delta_n = o(1)$. The proof is a straightforward application of the definition of $\Delta_n$ and can be found in the supplementary material. Caron et al. (2017) prove that an extension of the graphon model, called graphex processes, can simultaneously achieve sparsity and constant clustering coefficients.
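A short calculation gives the flavour of that argument (a sketch on our part, using only $g \leq 1$ and the definition of $\Delta_n$ under the scaled graphon $\rho_n g$):
$$n\Delta_n = n^3 \iiint \rho_n g(u,v)\, \rho_n g(v,w)\, \rho_n g(w,u)\, du\, dv\, dw \;\leq\; n^3 \rho_n^3 = (n\rho_n)^3,$$
so $\Delta_n \leq (n\rho_n)^3 / n$, which is $o(1)$ whenever $n\rho_n = o(n^{1/3})$.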
3 IMPLICATIONS

We now turn our attention away from the generative properties of the IRDPG, i.e. producing sparsity and high triangle density, to the implications of our model for performing inference on network data.
Global manifold learning
Most real-world graphs are perceived to be very sparse, in that their average degree is much smaller than $n$, and it is common in network theory to equate this with the expected degree being $o(\log n)$ (Krivelevich and Sudakov, 2003; Amini et al., 2013; Le et al., 2017). The threshold $\log(n)$ has special significance because it is the rate under which an Erdős-Rényi graph is disconnected with high probability (Bollobás, 1998), and exact recovery of network communities is impossible (Abbe, 2017). The same threshold appears here, but for a different reason.

We can only expect to recover the manifold from $X_1, \dots, X_n$ if they don't leave large parts of $\mathcal{M}_n$ uncovered. We measure the largest gap using the Hausdorff distance
$$d_H(\{X_i\}, \mathcal{M}_n) := \sup_{x \in \mathcal{M}_n} \min_{i \in \{1, \dots, n\}} d(X_i, x),$$
where $d(x, y)$ denotes geodesic distance on $\mathcal{M}_n$, the length of the shortest curve between $x$ and $y$ on $\mathcal{M}_n$.
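For intuition, the following sketch (our own illustration, using the circle of Example 2, where geodesic distances on $\mathcal{M}_n$ coincide with arc lengths on $\mathcal{Z}_n$ because $\phi^{\mathrm{S}}_n$ is non-distortive) computes this Hausdorff distance for uniform samples on a growing circle: it is half the largest arc-length gap between consecutive sample points.

```python
import numpy as np

rng = np.random.default_rng(3)

def coverage_gap(n, radius, rng=rng):
    """d_H({X_i}, M_n) for n uniform points on a circle of the given radius:
    half the largest arc between consecutive points."""
    angles = np.sort(rng.uniform(0, 2 * np.pi, size=n))
    gaps = np.diff(np.concatenate([angles, [angles[0] + 2 * np.pi]]))
    return radius * gaps.max() / 2

for n in (2_000, 20_000, 200_000):
    print(n, coverage_gap(n, radius=n / 2000))   # radius grows like n, as in Example 2
# The gap does not shrink to zero: with constant expected degree the manifold never
# gets fully covered, echoing the message of Lemma 5 (stated there for an interval).
```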
Lemma 5. If $\mathcal{M}_n$ is homeomorphic to a closed interval, $F_n$ is uniform on $\mathcal{M}_n$, and there exists $c > 0$ such that $\|x\| \geq c$, for all $x \in \mathcal{M}_n$, then
$$d_H(\{X_i\}, \mathcal{M}_n) \overset{p}{\to} 0 \iff n\rho_n = \Omega\{\log(n)\}.$$
Given just $X_1, \dots, X_n$, finding a $d$-dimensional representation of the nodes in which geodesic distances are faithfully reproduced is therefore impossible in general, unless $n\rho_n = \Omega\{\log(n)\}$. In practice, we only have access to estimates $\hat{X}_1, \dots, \hat{X}_n$, which adds further complications to manifold learning.

In fact, in this semi-sparse regime, there are problems relating to estimating $X_i$. For example, existing works on spectral embedding in the infinite-dimensional case make heavy use of the trace-class assumption (Tang et al., 2013; Rubin-Delanchy, 2020; Lei, 2021), $\rho_n^{-1} \sum_{i=1}^{\infty} \lambda_i < \infty$, whereas $n\rho_n = O\{\log(n)\}$ and $\Delta_n = \Omega(1)$ require this sum to be unbounded in $n$.
Local embedding
When $\mathcal{M}_n$ is a manifold, as it is in the examples above, we now explore the idea that we can 'zoom in' to a particular location in latent space and faithfully represent the rich local structure with low-dimensional embeddings.

Given a ball with centre $x \in \mathcal{M}_n \setminus \{0\}$ and radius $r$, denoted $B_r(x)$, we consider two local views of the graph: the core, $A^{\mathrm{C}}$, is the subgraph on the nodes with latent position in $B_r(x)$; the core-periphery, $A^{\text{C-P}}$, is a graph on the full node set with adjacency matrix
$$A^{\text{C-P}}_{ij} = \begin{cases} A_{ij} & \text{if } X_i \in B_r(x) \text{ or } X_j \in B_r(x), \\ 0 & \text{otherwise}. \end{cases}$$
In what follows, we will refer to a 'slice' of this matrix,
$$A^{\text{C-P slice}} = A^{\text{C-P}}_{i : X_i \in B_r(x),\; j \in \{1, \dots, n\}}.$$
We propose to find a low-rank approximate factorisation of these 'local adjacency matrices', $A^{\mathrm{C}}$ and $A^{\text{C-P slice}}$, and argue that, under certain conditions, this low-rank assumption is reasonable. We provide some theoretical reasons to prefer the core-periphery; however, our experiments with real data find them comparable.
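As a computational sketch (variable names and the rank choice are ours; the mask indicating which nodes fall in $B_r(x)$ would come from a neighbourhood-finding step, discussed at the end of this section), the two local views and their low-rank factorisations can be formed as follows:

```python
import numpy as np

def local_views(A, in_ball):
    """Return (A_core, A_cp_slice) for a boolean mask `in_ball` over the nodes."""
    core_idx = np.flatnonzero(in_ball)
    A_core = A[np.ix_(core_idx, core_idx)]       # subgraph on nodes in B_r(x)
    A_cp_slice = A[core_idx, :]                  # rows of the core-periphery graph for core nodes
    return A_core, A_cp_slice

def truncated_svd_embedding(M, k):
    """Rank-k factorisation M ~ (U S^{1/2})(V S^{1/2})^T via the SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * np.sqrt(s[:k]), Vt[:k].T * np.sqrt(s[:k])

# Example with the earlier simulation sketch: a placeholder random mask stands in
# for the theoretical ball B_r(x).
rng = np.random.default_rng(4)
in_ball = rng.uniform(size=A.shape[0]) < 0.05
A_core, A_cp_slice = local_views(A, in_ball)
X_core, _ = truncated_svd_embedding(A_core, k=3)        # k = d + 1 with d = 2
X_cp, Y_cp = truncated_svd_embedding(A_cp_slice, k=3)
```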
To describe these graphs as $n$ grows, we have to assume the manifolds are nested, that is, $\mathcal{M}_n \subseteq \mathcal{M}_{n+1}$, for all $n$. We impose two other regularity conditions. First, $n / v_n \to \infty$, where $v_n = \int_{\mathcal{M}_n} 1 \, dx$ is the intrinsic volume of the manifold, which would preclude a constant expected degree in the examples of the previous section, but allows any faster rate. Second, there exists $R \in (0, \|x\|/6)$ and a constant $C > 0$ such that
$$\mathbb{P}\{X_i \in B_r(x)\} \geq \frac{C}{v_n} r^d, \quad \text{for all } r \leq R.$$
Under these assumptions, the core is dense, with high triangle density.

Theorem 6. Fix $x \in \mathcal{M}_n \setminus \{0\}$ and let $r_n = \omega\{(v_n/n)^{1/d}\}$, $r_n \leq R < \|x\|/6$, which could be fixed or shrinking. The graph core corresponding to the ball $B_{r_n}(x)$ has $\Omega(n r^d / v_n)$ nodes in expectation, expected degree $\Omega(n r^d / v_n)$ and triangle density $\Omega\{(n r^d / v_n)^2\}$.
If the manifold is smooth, a local neighbourhood $\mathcal{M}_n \cap B_r(x)$ can be approximated up to arbitrary precision by a $d$-dimensional plane, the approximation improving as $r \to 0$. However, we can go too far, and collect very few points $X_i \in B_r(x)$. Theorem 6 tells us how $r$ should shrink with $n$ to avoid this.
Under a flat approximation, the core and core-periphery slice have low rank. To be precise, the matrices $P^{\mathrm{C}} := P_{i, j \in \mathcal{C}}$ and $P^{\text{C-P slice}} := P_{i \in \mathcal{C},\, j \in \{1, \dots, n\}}$, where $P = [\langle X_i, X_j \rangle]_{i, j \in \{1, \dots, n\}}$ and $\mathcal{C} = \{i : X_i \in B_r(x)\}$, have rank at most $d + 1$. As a result, we can hope to recover local, finite-dimensional estimates of the $X_i \in B_r(x)$ using approximate matrix factorisation of $A^{\mathrm{C}}$ or $A^{\text{C-P slice}}$. Moreover, the approximate ranks of those matrices provide a local estimate of the manifold dimension.
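The rank bound is easy to check numerically: if the points in the ball lie (approximately) on a $d$-dimensional affine plane, their Gram matrix has rank at most $d + 1$. A small self-contained check (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

# Points on a d-dimensional affine plane embedded in ambient dimension D.
d, D, m = 2, 50, 200
origin = rng.normal(size=D)
basis = np.linalg.qr(rng.normal(size=(D, d)))[0]      # orthonormal d-frame
coords = rng.normal(size=(m, d))
X = origin + coords @ basis.T                         # m points on the plane

gram = X @ X.T                                        # analogue of P^C under a flat approximation
print(np.linalg.matrix_rank(gram))                    # d + 1 = 3
```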
We find, however, that given 'noise-free' factorisations $P^{\mathrm{C}} = X^{\mathrm{C}} (X^{\mathrm{C}})^\top$ and $P^{\text{C-P slice}} = X^{\text{C-P}} (Y^{\text{C-P}})^\top$, where $X^{\mathrm{C}}, X^{\text{C-P}} \in \mathbb{R}^{|\mathcal{C}| \times (d+1)}$ and $Y^{\text{C-P}} \in \mathbb{R}^{n \times (d+1)}$, the original $X_i \in B_r(x)$ are linear images of the rows of $X^{\text{C-P}}$, but are not necessarily recoverable from $X^{\mathrm{C}}$.
We have not discussed how a neighbourhood might be found in practice. There are many algorithms suited to this purpose (Haveliwala, 2003; Spielman and Teng, 2004; Andersen et al., 2006), and even a simple similarity measure such as 'number of shared neighbours' could be expected to be consistent (Chen and Xu, 2016). The supplementary material contains an experiment demonstrating that this measure forms a reasonable surrogate for the theoretical neighbourhood.
hood. There are two other ways a proxy for neighbourhood
could be available. First, the graph may have associated
covariates, such as time or location, which allow us to zoom
into particular subgraphs (see classification experiment).