
strategies, but also explain why GCL works by proposing the contrastive invariance theorem. Our work provides a deeper understanding of the nature of GCL. Secondly, we answer the question "how to utilize the augmentation rule for GCL". We show that the augmentation rule provides a novel perspective for evaluating current augmentation strategies. We propose the novel concept of an optimal contrastive pair and theoretically derive a general framework, SpCo, which is able to improve the performance of existing GCL methods. Lastly, we choose three typical GCL methods as target methods and plug SpCo into them. We validate the effectiveness of SpCo on five datasets, consistently gaining improvements over the target methods.
2 Preliminaries
Let $G = (V, \xi)$ represent an undirected attributed graph, where $V$ is the set of $N$ nodes and $\xi \subseteq V \times V$ is the set of edges. All edges formulate an adjacency matrix $A \in \{0,1\}^{N \times N}$, where $A_{ij} \in \{0,1\}$ denotes the relation between nodes $i$ and $j$ in $V$. The node degree matrix is $D = \mathrm{diag}(d_1, \ldots, d_N)$, where $d_i = \sum_{j \in V} A_{ij}$ is the degree of node $i \in V$. Graph $G$ is often associated with a node feature matrix $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{N \times d}$, where $x_i$ is the $d$-dimensional feature vector of node $i \in V$. Let $L = D - A$ be the unnormalized graph Laplacian of $G$. If we define the symmetric normalized adjacency matrix as $\hat{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, then $\hat{L} = I_N - \hat{A} = D^{-\frac{1}{2}}(D - A)D^{-\frac{1}{2}}$ is the symmetric normalized graph Laplacian.
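To make these definitions concrete, here is a minimal NumPy sketch (our illustration, not the paper's code) that builds $\hat{A}$ and $\hat{L}$ from a dense adjacency matrix; production implementations would typically use sparse matrices:

```python
import numpy as np

def normalized_laplacians(A: np.ndarray):
    """Return (A_hat, L_hat) for a dense 0/1 adjacency matrix A."""
    d = A.sum(axis=1).astype(float)          # degrees d_i = sum_j A_ij
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5     # guard against isolated nodes
    D_inv_sqrt = np.diag(d_inv_sqrt)
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt      # D^{-1/2} A D^{-1/2}
    L_hat = np.eye(A.shape[0]) - A_hat       # I_N - A_hat
    return A_hat, L_hat
```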
Since $\hat{L}$ is symmetric normalized, its eigen-decomposition is $U \Lambda U^{\top}$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$ and $U = [u_1^{\top}, \ldots, u_N^{\top}] \in \mathbb{R}^{N \times N}$ are the eigenvalues and eigenvectors of $\hat{L}$, respectively. Without loss of generality, assume $0 \le \lambda_1 \le \cdots \le \lambda_N < 2$ (where we approximate $\lambda_N \approx 2$ [11]).
Denote by $F_L = \{\lambda_1, \ldots, \lambda_{\lfloor N/2 \rfloor}\}$ the amplitudes of low-frequency components and by $F_H = \{\lambda_{\lfloor N/2 \rfloor + 1}, \ldots, \lambda_N\}$ the amplitudes of high-frequency components. The \emph{graph spectrum} is defined as these amplitudes of different frequency components, denoted as $\varphi(\lambda)$, indicating which parts of the frequency are enhanced or weakened [20]. Additionally, we rewrite $\hat{L} = \lambda_1 \cdot u_1 u_1^{\top} + \cdots + \lambda_N \cdot u_N u_N^{\top}$, where we define the term $u_i u_i^{\top} \in \mathbb{R}^{N \times N}$ as the eigenspace related to $\lambda_i$, denoted as $S_i$.
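Continuing the sketch above, the eigen-decomposition, the low/high-frequency split, and the eigenspaces $S_i$ can be computed as follows (again only illustrative; `L_hat` comes from the previous snippet):

```python
import numpy as np

# Eigen-decomposition L_hat = U diag(lambda_1, ..., lambda_N) U^T;
# np.linalg.eigh returns eigenvalues in ascending order.
lam, U = np.linalg.eigh(L_hat)

mid = len(lam) // 2                          # floor(N / 2)
F_L, F_H = lam[:mid], lam[mid:]              # low-/high-frequency amplitudes

# Eigenspace S_i = u_i u_i^T (a rank-one projector) associated with lambda_i.
S = [np.outer(U[:, i], U[:, i]) for i in range(len(lam))]

# Sanity check of the rewriting L_hat = sum_i lambda_i * S_i.
assert np.allclose(L_hat, sum(l * s for l, s in zip(lam, S)))
```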
Graph Contrastive Learning (GCL) [28, 9, 33] aims to learn discriminative embeddings without supervision, whose pipeline is shown in Fig. 1(a). We summarize representative GCL methods in Appendix E. Specifically, two augmentations are randomly extracted from $A$ in a predefined way and are encoded by a GCN [11] to obtain the node embeddings under these two augmentations. Then, for one target node, its embedding in one augmentation is learned to be close to the embeddings of its positive samples in the other augmentation and far away from those of its negative samples. Models built in this way are capable of discriminating similar nodes from dissimilar ones. For example, some graph contrastive methods [36, 32, 35] use the classical InfoNCE loss [19] as the optimization objective:
$$\mathcal{L}(h_i^{V_1}, h_i^{V_2}) = \log \frac{\exp\big(\theta(h_i^{V_1}, h_i^{V_2})/\tau\big)}{\exp\big(\theta(h_i^{V_1}, h_i^{V_2})/\tau\big) + \sum_{k \neq i} \exp\big(\theta(h_i^{V_1}, h_k^{V_2})/\tau\big)}, \qquad (1)$$
where $h_i^{V_1}$ and $h_i^{V_2}$ are the embeddings of node $i$ under augmentations $V_1$ and $V_2$, respectively, $\theta$ is the similarity metric, such as cosine similarity, and $\tau$ is a temperature parameter. The total loss is $\mathcal{L}_{\mathrm{InfoNCE}} = \sum_i \frac{1}{2}\big[\mathcal{L}(h_i^{V_1}, h_i^{V_2}) + \mathcal{L}(h_i^{V_2}, h_i^{V_1})\big]$.
3 Impact of Graph Augmentation: An Experimental Investigation
Figure 2: The case study model.
In this section, we aim to explore what information should be considered in two contrasted augmentations from the perspective of the graph spectrum. Specifically, we design a simple GCL framework, shown in Fig. 2. The two input augmentations are the adjacency matrix $A$ and a generated view $V$. Then, we utilize a shared one-layer GCN as the encoder to encode $A$ and $V$ and obtain their node embeddings $H^A$ and $H^V$. We train the GCN using the InfoNCE loss as in Eq. (1). More experimental settings can be found in Appendix B.1.
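A minimal sketch of this case-study pipeline, under our own illustrative assumptions (the hidden size, learning rate, and epoch count are placeholders, not the paper's settings; `A_hat` and `V_hat` denote the normalized adjacencies of $A$ and $V$, `X` the feature matrix, and `info_nce_loss` the sketch above):

```python
import torch
import torch.nn as nn

class OneLayerGCN(nn.Module):
    """Shared one-layer GCN encoder: H = ReLU(A_hat X W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, A_hat: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.W(A_hat @ X))

encoder = OneLayerGCN(X.shape[1], 64)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for epoch in range(200):
    H_A = encoder(A_hat, X)                  # embeddings under view A
    H_V = encoder(V_hat, X)                  # same weights, view V
    loss = info_nce_loss(H_A, H_V)
    opt.zero_grad(); loss.backward(); opt.step()
```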