
Revisiting Graph Contrastive Learning from the
Perspective of Graph Spectrum
Nian Liu1, Xiao Wang1,2, Deyu Bo1, Chuan Shi1,2, Jian Pei3
1Beijing University of Posts and Telecommunications
2Peng Cheng Laboratory
3Simon Fraser University
{nianliu, xiaowang, bodeyu, shichuan}@bupt.edu.cn, jpei@cs.sfu.ca
Abstract
Graph Contrastive Learning (GCL), learning node representations by augmenting graphs, has attracted considerable attention. Despite the proliferation of various graph augmentation strategies, some fundamental questions remain unclear: what information is essentially encoded into the learned representations by GCL? Are there general graph augmentation rules behind different augmentations? If so, what are they and what insights can they bring? In this paper, we answer these questions by establishing the connection between GCL and the graph spectrum. Through an experimental investigation in the spectral domain, we first identify the General grAph augMEntation (GAME) rule for GCL, i.e., the difference of the high-frequency parts between two augmented graphs should be larger than that of the low-frequency parts. This rule reveals a fundamental principle for revisiting current graph augmentations and designing new, effective ones. We then theoretically prove, via our contrastive invariance theorem, that GCL is able to learn invariant information; together with the GAME rule, this uncovers for the first time that the representations learned by GCL essentially encode low-frequency information, which explains why GCL works. Guided by this rule, we propose a spectral graph contrastive learning module (SpCo1), which is a general and GCL-friendly plug-in. We combine it with different existing GCL models, and extensive experiments demonstrate that it can further improve the performance of a wide variety of GCL methods.
1 Introduction
Graph Neural Networks (GNNs) learn node representations in a graph mainly by message passing. GNNs have attracted significant interest and found many applications [11, 23, 14]. Training high-quality GNNs heavily relies on task-specific labels, while it is well known that manually annotating nodes in graphs is costly and time-consuming [10]. Therefore, Graph Contrastive Learning (GCL) has been developed as a typical technique for self-supervised learning without the explicit use of labels [28, 9, 36].
The traditional GCL framework (Fig. 1 (a)) mainly includes three components: graph augmentation, graph representation learning by an encoder, and a contrastive loss. In essence, GCL aims to maximize agreement between augmentations to learn invariant representations [36].
Corresponding authors.
1Code available at https://github.com/liun-online/SpCo
Figure 1: (a) The general framework of GCL. (b) An illustration of our findings in the empirical study (Section 3). If two contrasted graphs have a larger margin between their high-frequency parts than between their low-frequency parts, they will boost GCL. We call such a pair of graphs an optimal contrastive pair.
Typical GCL methods have sought to elaborately design different graph augmentation strategies, for example, heuristic-based methods, including node or edge dropping [32], feature masking [33], and diffusion [9]; and learning-based methods, including InfoMin [26, 30], disentanglement [13], and adversarial training [31].
Although various graph augmentation strategies have been proposed, the fundamental augmentation mechanism is not well understood. What information should we preserve or discard in an augmented graph? Are there general rules across different graph augmentation strategies? How can we use such general rules to validate and improve current GCL methods? This paper explores these questions.
Essentially, an augmented graph is obtained by changing some components in the original graph, and thus the strength of frequencies [20] in the graph spectrum. This natural and intuitive connection between graph augmentation and graph spectrum inspires us to explore the effectiveness of augmentations from the spectral domain. We start with an empirical study (Section 3) to understand the importance of low-frequency and high-frequency information in GCL. Our findings indicate that both the lowest-frequency information and the high-frequency information are important in GCL. Retaining more high-frequency information is particularly helpful to improve the performance of GCL. However, as shown in Fig. 1 (b), the way of handling high-frequency information in the two contrasted graphs V1 and V2 should be different, which can finally be summarized as a general graph augmentation (GAME) rule: the difference of amplitudes of high frequencies in the two contrasted graphs should be larger than that of low frequencies.
To explain the GAME rule, we need to understand what information is encoded into the learned representations by GCL. We propose the contrastive invariance theorem (Theorem 1), which, for the first time, theoretically proves that GCL can learn the invariant information shared by two contrasted graphs. Meanwhile, as can be seen in Fig. 1 (b), because the difference of amplitudes of the lowest-frequency information is much smaller than that of the high-frequency information, the lowest-frequency information is the approximately invariant pattern between the two graphs V1 and V2. Therefore, with such two augmentations V1 and V2, we can conclude that the information learned by GCL is mainly the low-frequency information, whose usefulness has been well demonstrated [6]. This not only explains why GCL works, but also provides a clear and concise criterion for judging which augmentation strategy is better, as verified by the experiments in Section 4.
Based on our findings and theoretical analysis, we say that two augmentations satisfying the GAME rule form an optimal contrastive pair. We then propose spectral graph contrastive learning (SpCo), a general GCL framework that can boost existing GCL methods with optimal contrastive pairs. Specifically, to ensure that the learned augmented graph forms an optimal contrastive pair with the original adjacency matrix, we make the amplitudes of its high frequencies ascend while keeping the low frequencies the same as in the original structure. We model this process as an optimization objective based on matrix perturbation theory, which can be solved by Sinkhorn's Iteration [24] to finally obtain the augmented structure used for the downstream target GCL model.
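As a rough illustration of the last step, the sketch below shows plain Sinkhorn's Iteration, i.e., alternating row and column normalization until the matrix is approximately doubly stochastic. The function name and fixed iteration count are our own choices; SpCo's actual objective adds the spectral constraints described above and is given in the released code.

```python
import numpy as np

def sinkhorn(M, n_iters=50, eps=1e-8):
    """Alternately normalize rows and columns of a non-negative
    matrix M until it is approximately doubly stochastic."""
    M = np.asarray(M, dtype=float) + eps  # keep every entry strictly positive
    for _ in range(n_iters):
        M = M / M.sum(axis=1, keepdims=True)  # row normalization
        M = M / M.sum(axis=0, keepdims=True)  # column normalization
    return M
```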
Our contributions are summarized as follows. First, we answer the question "what information is learned by GCL and whether there exists a general augmentation rule". To the best of our knowledge, this is the first attempt to fundamentally explore augmentation strategies for GCL from the spectral domain. We not only reveal the general graph augmentation rule behind different augmentation strategies, but also explain why GCL works by proposing the contrastive invariance theorem. Our work provides a deeper understanding of the nature of GCL. Second, we answer the question "how to utilize the augmentation rule for GCL". We show that the augmentation rule provides a novel lens for assessing current augmentation strategies. We propose the novel concept of an optimal contrastive pair and theoretically derive a general framework, SpCo, which is able to improve the performance of existing GCL methods. Last, we choose three typical GCL methods as target methods and plug SpCo into them. We validate the effectiveness of SpCo on five datasets, consistently gaining improvements over those target methods.
2 Preliminaries
Let $\mathcal{G} = (\mathcal{V}, \xi)$ represent an undirected attributed graph, where $\mathcal{V}$ is the set of $N$ nodes and $\xi \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges. All edges formulate an adjacency matrix $A \in \{0,1\}^{N \times N}$, where $A_{ij} \in \{0,1\}$ denotes the relation between nodes $i$ and $j$ in $\mathcal{V}$. The node degree matrix is $D = \mathrm{diag}(d_1, \dots, d_N)$, where $d_i = \sum_{j \in \mathcal{V}} A_{ij}$ is the degree of node $i \in \mathcal{V}$. Graph $\mathcal{G}$ is often associated with a node feature matrix $X = [x_1, x_2, \dots, x_N] \in \mathbb{R}^{N \times d}$, where $x_i$ is a $d$-dimensional feature vector of node $i \in \mathcal{V}$. Let $L = D - A$ be the unnormalized graph Laplacian of $\mathcal{G}$. If we set the symmetric normalized adjacency matrix as $\hat{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, then $\hat{L} = I_N - \hat{A} = D^{-\frac{1}{2}} (D - A) D^{-\frac{1}{2}}$ is the symmetric normalized graph Laplacian.

Since $\hat{L}$ is real and symmetric, its eigendecomposition is $U \Lambda U^{\top}$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_N)$ and $U = [u_1, \dots, u_N] \in \mathbb{R}^{N \times N}$ hold the eigenvalues and eigenvectors of $\hat{L}$, respectively. Without loss of generality, assume $0 \le \lambda_1 \le \cdots \le \lambda_N < 2$ (where we approximate $\lambda_N \approx 2$ [11]). Denote by $F_L = \{\lambda_1, \dots, \lambda_{\lfloor N/2 \rfloor}\}$ the amplitudes of the low-frequency components and by $F_H = \{\lambda_{\lfloor N/2 \rfloor + 1}, \dots, \lambda_N\}$ the amplitudes of the high-frequency components. The graph spectrum is defined as the amplitudes of the different frequency components, denoted as $\phi(\lambda)$, indicating which parts of the frequency range are enhanced or weakened [20]. Additionally, we rewrite $\hat{L} = \lambda_1 \cdot u_1 u_1^{\top} + \cdots + \lambda_N \cdot u_N u_N^{\top}$, where we define the term $u_i u_i^{\top} \in \mathbb{R}^{N \times N}$ as the eigenspace related to $\lambda_i$, denoted as $S_i$.
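For concreteness, the following minimal sketch computes $\hat{L}$, its spectrum, and the $F_L$/$F_H$ split with NumPy. The function name and the toy graph are ours; real graphs would call for sparse eigensolvers.

```python
import numpy as np

def normalized_laplacian_spectrum(A):
    """Eigendecomposition of L_hat = I - D^{-1/2} A D^{-1/2}
    for a symmetric 0/1 adjacency matrix A (no self-loops)."""
    N = A.shape[0]
    d = A.sum(axis=1)
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    L_hat = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam, U = np.linalg.eigh(L_hat)  # ascending eigenvalues, all in [0, 2]
    return lam, U

# Toy graph: a triangle with one pendant node.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
lam, U = normalized_laplacian_spectrum(A)
F_L, F_H = lam[: len(lam) // 2], lam[len(lam) // 2:]  # low / high frequencies
```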
Graph Contrastive Learning (GCL) [28, 9, 33] aims to learn discriminative embeddings without supervision; its pipeline is shown in Fig. 1 (a). We summarize representative GCL methods in Appendix E. Specifically, two augmentations are randomly extracted from $A$ in a predefined way and are encoded by a GCN [11] to obtain node embeddings under the two augmentations. Then, for one target node, its embedding in one augmentation is trained to be close to the embeddings of its positive samples in the other augmentation and far away from those of its negative samples. Models built in this way are capable of discriminating similar nodes from dissimilar ones. For example, some graph contrastive methods [36, 32, 35] use the classical InfoNCE loss [19] as the optimization objective:

$$\mathcal{L}(h_i^{V_1}, h_i^{V_2}) = -\log \frac{\exp(\theta(h_i^{V_1}, h_i^{V_2})/\tau)}{\exp(\theta(h_i^{V_1}, h_i^{V_2})/\tau) + \sum_{k \neq i} \exp(\theta(h_i^{V_1}, h_k^{V_2})/\tau)}, \quad (1)$$
where $h_i^{V_1}$ and $h_i^{V_2}$ are the embeddings of node $i$ under augmentations $V_1$ and $V_2$, respectively, $\theta$ is the similarity metric, such as cosine similarity, and $\tau$ is a temperature parameter. The total loss is $\mathcal{L}_{InfoNCE} = \sum_i \frac{1}{2} \left[ \mathcal{L}(h_i^{V_1}, h_i^{V_2}) + \mathcal{L}(h_i^{V_2}, h_i^{V_1}) \right]$.
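A compact PyTorch sketch of this symmetric InfoNCE objective is given below, assuming row $i$ of the two embedding matrices corresponds to the same node and $\theta$ is cosine similarity; the function name and default $\tau$ are illustrative only.

```python
import torch
import torch.nn.functional as F

def info_nce(h1, h2, tau=0.5):
    """Symmetric InfoNCE of Eq. (1): for node i, the positive is its own
    embedding in the other view; all other nodes there are negatives."""
    z1, z2 = F.normalize(h1, dim=1), F.normalize(h2, dim=1)
    sim = z1 @ z2.t() / tau  # theta(., .) = cosine similarity, scaled by tau
    target = torch.arange(h1.size(0), device=h1.device)  # positives on the diagonal
    # cross_entropy over a row equals -log softmax at the positive entry,
    # i.e., exactly one InfoNCE term of Eq. (1)
    return 0.5 * (F.cross_entropy(sim, target) + F.cross_entropy(sim.t(), target))
```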
3 Impact of Graph Augmentation: An Experimental Investigation
Figure 2: The case study model.
In this section, we explore what information should be considered in two contrasted augmentations from the perspective of the graph spectrum. Specifically, we design the simple GCL framework shown in Fig. 2. The two input augmentations are the adjacency matrix $A$ and a generated view $V$. We then utilize a shared one-layer GCN as the encoder to encode $A$ and $V$ and obtain their node embeddings $H^A$ and $H^V$. We train the GCN with the InfoNCE loss of Eq. (1). More experimental settings can be found in Appendix B.1.
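A minimal sketch of this shared encoder, assuming dense, pre-normalized view matrices (a real implementation would use sparse operations):

```python
import torch
import torch.nn as nn

class OneLayerGCN(nn.Module):
    """Shared one-layer GCN encoder of the case study: H = ReLU(A_hat X W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, view, X):
        return torch.relu(view @ self.lin(X))

# The same weights encode both views:
# encoder = OneLayerGCN(X.size(1), 64)
# H_A, H_V = encoder(A_hat, X), encoder(V_hat, X)
```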
Figure 3: The generation of V.
Generating augmentation V. We construct the augmented graph by extracting information at different frequencies from the original graph, so that we can analyze the effect of each kind of information. This process is shown in Fig. 3. Specifically, we divide the eigenvalues of $\hat{L}$ into the $F_L$ and $F_H$ parts, and conduct augmentations in these two parts, respectively. Taking the augmentation in $F_L$ as an example, we keep the high-frequency part as $u_{N/2} u_{N/2}^{\top} + \cdots + u_N u_N^{\top}$. Then, we gradually add the eigenspaces in $F_L$ back at rates $[20\%, 40\%, 60\%, 80\%]$, starting from the lowest frequency. Therefore, the $V$ augmented with 20% of $F_L$ is $u_1 u_1^{\top} + \cdots + u_{0.2 \cdot N/2} u_{0.2 \cdot N/2}^{\top} + u_{N/2} u_{N/2}^{\top} + \cdots + u_N u_N^{\top}$. Similarly, the $V$ augmented with 20% of $F_H$ is $u_1 u_1^{\top} + \cdots + u_{N/2} u_{N/2}^{\top} + u_{(N+1)/2} u_{(N+1)/2}^{\top} + \cdots + u_{0.7N} u_{0.7N}^{\top}$. Please note that we set the graph spectrum of $V$ to $\phi_V(\lambda) = 1, \forall \lambda \in [0,2]$ above, because we just want to test the effect of the different $u_i u_i^{\top}$ and avoid the influence of the eigenvalues $\lambda$ [18].
Figure 4: Results of the case study on four datasets: (a) Cora, (b) Citeseer, (c) BlogCatalog, (d) Flickr. The x-axis is the addition rate within each frequency interval, and the y-axis is the accuracy (ACC). The performance of augmentations in $F_L$ is plotted against the left y-axis, and that in $F_H$ against the right y-axis.
Figure 5: The graph spectra of $A$ and $V$.
Results and analyses. We conduct node classification on four datasets: Cora, Citeseer [11], BlogCatalog, and Flickr [16]. The accuracy (ACC) is shown in Fig. 4. In Appendix B.2, we also report the results when both high- and low-frequency components are added back in high-to-low frequency order. Results: for each dataset, in the generated $V$, (1) when the lowest part of the frequencies is kept, the best performance is achieved; (2) when more frequencies in $F_H$ are involved, the performance generally rises. Analyses: from the graph spectra of $A$ and $V$ shown in Fig. 5, we can see that in the generated $V$, (1) when the lowest part of the frequencies is kept, the difference in amplitude, i.e., the graph spectrum, in $F_L$ between $A$ and $V$ becomes smaller; (2) when more frequencies in $F_H$ are involved, the margin of the graph spectrum in $F_H$ between $A$ and $V$ becomes larger. Combining these results and observations, we propose the following general Graph AugMEntation rule, called the GAME rule2:
The General Graph Augmentation Rule. Given two random augmentations $V_1$ and $V_2$, their graph spectra are $\phi_{V_1}(\lambda)$ and $\phi_{V_2}(\lambda)$. Then, $\forall \lambda_m \in [1,2]$ and $\forall \lambda_n \in [0,1]$, $V_1$ and $V_2$ are an effective pair of graph augmentations if the following condition is satisfied:

$$|\phi_{V_1}(\lambda_m) - \phi_{V_2}(\lambda_m)| > |\phi_{V_1}(\lambda_n) - \phi_{V_2}(\lambda_n)|.$$

We define such a pair of augmentations as an optimal contrastive pair.
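In code, the rule reduces to a simple comparison once the two spectra are sampled on a common frequency grid; the following checker is a sketch under that assumption, with names of our own choosing.

```python
import numpy as np

def satisfies_game_rule(lams, phi_v1, phi_v2):
    """GAME rule check: every amplitude gap at high frequencies
    (lambda in [1, 2]) must exceed every gap at low frequencies
    (lambda in [0, 1)). lams, phi_v1, phi_v2 share the same grid."""
    gap = np.abs(phi_v1 - phi_v2)
    low_gap, high_gap = gap[lams < 1.0], gap[lams >= 1.0]
    return high_gap.min() > low_gap.max()
```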
2Although this rule is derived from contrasting $A$ and $V$, the selection of these particular views does not limit the generality of the GAME rule. Considering that most augmentations are obtained from the raw adjacency matrix $A$, it is a natural setting that one view is fixed as $A$ and the other is an augmented one.
4 Analysis of the General Graph Augmentation Rule

In this section, we verify the correctness of the GAME rule, i.e., whether two contrasted augmentations that satisfy the rule perform better in downstream tasks, through both experimental and theoretical analysis.
Experimental analysis. We substitute the existing augmentations proposed by MVGRL [9], GCA [36], and GraphCL [32] for the augmentation $V$ in the case study. Specifically, MVGRL proposes the PPR matrix, the heat diffusion matrix, and the pair-wise distance matrix. GCA mainly drops edges randomly based on degree, eigenvector, and PageRank centrality. GraphCL adopts random node dropping, edge perturbation, and subgraph sampling. These nine augmentations cover most of the mainstream augmentations in GCL. To accurately depict the change of amplitude caused by these augmentations at a given $\lambda_i$, we turn to matrix perturbation theory3 [25]:

$$\Delta\lambda_i = \lambda_i' - \lambda_i = u_i^{\top} \Delta A\, u_i - \lambda_i u_i^{\top} \Delta D\, u_i + O(\|\Delta A\|), \quad (2)$$

where $\lambda_i'$ is the eigenvalue after the change, $\Delta A = A' - A$ represents the modification of edges after augmentation, and $\Delta D$ is the corresponding change in the degree matrix. With Eq. (2), we calculate the eigenvalues on Cora after each augmentation and plot their graph spectra in Fig. 6. Simultaneously, we use the GCL framework of Section 3 to separately contrast the adjacency matrix $A$ with each of these augmentations; the results are shown in Table 1. As shown in Fig. 6, the PPR matrix, the heat diffusion matrix, and the distance matrix accord better with the GAME rule: they differ little from $A$ in $F_L$ and differ largely in $F_H$. Therefore, they outperform the other augmentations in Table 1.
Figure 6: The graph spectra of the Laplacian, the adjacency matrix, and nine existing augmentations.
Table 1: Performance (ACC) of different existing augmentations to verify the GAME rule.

Method   | Type              | Result
---------|-------------------|----------
GraphCL  | Subgraph          | 34.9±3.5
GraphCL  | Node dropping     | 29.8±2.3
GraphCL  | Edge perturbation | 37.7±4.4
GCA      | Degree            | 40.2±4.1
GCA      | PageRank          | 38.5±5.0
GCA      | Eigenvector       | 42.1±4.9
MVGRL    | PPR               | 58.0±1.6
MVGRL    | Heat              | 49.9±4.2
MVGRL    | Distance          | 46.1±7.5
We also test the GAME rule in another setting, where we contrast three cases: $A$ with $A^2$ (the two-hop graph of $A$), $A$ with $A$, and $A^2$ with $A^2$. The results are given in Appendix C.
Theoretical analysis. We have the following theorem to depict the learning process of GCL.
Theorem 1 (Contrastive Invariance). Given the adjacency matrix $A$ and the generated augmentation $V$, the amplitudes of the $i$-th frequency of $A$ and $V$ are $\lambda_i$ and $\gamma_i$, respectively. With the optimization of the InfoNCE loss $\mathcal{L}_{InfoNCE}$, the following bound is established:

$$\mathcal{L}_{InfoNCE} \ge -\frac{1+N}{2} \sum_i \theta_i \left[ 2 - (\lambda_i - \gamma_i)^2 \right],$$

where $\theta_i$ is an adaptive weight of the $i$-th term. Intuitively, lowering this bound, and hence enabling a smaller loss, requires $(\lambda_i - \gamma_i)^2$ to shrink, i.e., the amplitudes of the two views become invariant across frequencies.
3Here, we do not use eigenvalue decomposition to obtain the $\lambda'$ of $A'$, because the obtained $\lambda'$ are unordered with respect to the previous $\lambda$ of $A$. That is to say, for a certain $\lambda_i$ of $A$, we cannot determine which eigenvalue of $A'$ matches it after decomposition, so we cannot calculate the change $\Delta\lambda_i$ for $\lambda_i$ in this case.