
Revisiting Graph Contrastive Learning from the
Perspective of Graph Spectrum
Nian Liu1, Xiao Wang1,2, Deyu Bo1, Chuan Shi1,2, Jian Pei3
1Beijing University of Posts and Telecommunications
2Peng Cheng Laboratory
3Simon Fraser University
{nianliu, xiaowang, bodeyu, shichuan}@bupt.edu.cn, jpei@cs.sfu.ca
Abstract
Graph Contrastive Learning (GCL), learning node representations by augmenting graphs, has attracted considerable attention. Despite the proliferation of various graph augmentation strategies, some fundamental questions remain unclear: what information is essentially encoded into the learned representations by GCL? Are there general graph augmentation rules behind different augmentations? If so, what are they and what insights can they bring? In this paper, we answer these questions by establishing the connection between GCL and the graph spectrum. Through an experimental investigation in the spectral domain, we first identify the General grAph augMEntation (GAME) rule for GCL, i.e., the difference of the high-frequency parts between two augmented graphs should be larger than that of the low-frequency parts. This rule reveals a fundamental principle for revisiting current graph augmentations and designing new, effective ones. We then theoretically prove, via our contrastive invariance theorem, that GCL is able to learn invariant information; together with the GAME rule, this uncovers for the first time that the representations learned by GCL essentially encode low-frequency information, which explains why GCL works. Guided by this rule, we propose a spectral graph contrastive learning module (SpCo1), which is a general and GCL-friendly plug-in. We combine it with different existing GCL models, and extensive experiments demonstrate that it can further improve the performance of a wide variety of GCL methods.
1 Introduction
Graph Neural Networks (GNNs) learn node representations in a graph mainly by message passing. GNNs have attracted significant interest and found many applications [11, 23, 14]. Training high-quality GNNs heavily relies on task-specific labels, while it is well known that manually annotating nodes in graphs is costly and time-consuming [10]. Therefore, Graph Contrastive Learning (GCL) has been developed as a typical technique for self-supervised learning without the explicit use of labels [28, 9, 36].
The traditional GCL framework (Fig. 1 (a)) mainly includes three components: graph augmentation, graph representation learning by an encoder, and a contrastive loss. In essence, GCL aims to maximize agreement between augmentations to learn invariant representations [36].
Corresponding authors.
1Code available at https://github.com/liun-online/SpCo
Figure 1: (a) The general framework of GCL. (b) An illustration of our findings in the empirical study (Section 3). If two contrasted graphs have a larger margin between their high-frequency parts than between their low-frequency parts, they will boost GCL. We call such a pair of graphs an optimal contrastive pair.
Typical GCL methods have sought to elaborately design different graph augmentation strategies, for example, heuristic-based methods, including node or edge dropping [32], feature masking [33], and diffusion [9]; and learning-based methods, including InfoMin [26, 30], disentanglement [13], and adversarial training [31].
Although various graph augmentation strategies have been proposed, the fundamental augmentation mechanism is not well understood. What information should we preserve or discard in an augmented graph? Are there general rules across different graph augmentation strategies? How can we use such general rules to validate and improve current GCL methods? This paper explores these questions.
Essentially, an augmented graph is obtained by changing some components in the original graph, and thus the strength of frequencies [20] in the graph spectrum. This natural and intuitive connection between graph augmentation and graph spectrum inspires us to explore the effectiveness of augmentations from the spectral domain. We start with an empirical study (Section 3) to understand the importance of low-frequency and high-frequency information in GCL. Our findings indicate that both the lowest-frequency information and the high-frequency information are important in GCL. Retaining more high-frequency information is particularly helpful to improve the performance of GCL. However, as shown in Fig. 1 (b), the way of handling high-frequency information in the two contrasted graphs V1 and V2 should be different, which can finally be summarized as a general graph augmentation (GAME) rule: the difference of amplitudes of high frequencies in the two contrasted graphs should be larger than that of low frequencies.
To explain the GAME rule, we need to understand what information is encoded into the learned representations by GCL. We propose the contrastive invariance theorem (Theorem 1), which, for the first time, theoretically proves that GCL can learn the invariant information shared by two contrasted graphs. Meanwhile, as can be seen in Fig. 1 (b), because the difference of amplitudes of the lowest-frequency information is much smaller than that of the high-frequency information, the lowest-frequency information is the approximately invariant pattern between the two graphs V1 and V2. Therefore, with such two augmentations V1 and V2, we can conclude that the information learned by GCL is mainly the low-frequency information, whose usefulness has been well demonstrated [6]. This not only explains why GCL works, but also provides a clear and concise criterion for judging which augmentation strategy is better, as verified by the experiments in Section 4.
Based on our findings and theoretical analysis, we say that two augmentations satisfying the GAME rule form an optimal contrastive pair. We then propose spectral graph contrastive learning (SpCo), a general GCL framework that can boost existing GCL methods with optimal contrastive pairs. Specifically, to ensure that the learned augmented graph forms an optimal contrastive pair with the original adjacency matrix, we make the amplitudes of its high frequencies ascend while keeping the low frequencies the same as in the original structure. We model this process as an optimization objective based on matrix perturbation theory, which can be solved by Sinkhorn's Iteration [24] to finally obtain the augmented structure used for the downstream target GCL model.
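As a rough illustration of the last step, the sketch below shows plain Sinkhorn's Iteration, i.e., alternating row and column normalization until the matrix is approximately doubly stochastic. The function name and fixed iteration count are our own choices; SpCo's actual objective adds the spectral constraints described above and is given in the released code.

```python
import numpy as np

def sinkhorn(M, n_iters=50, eps=1e-8):
    """Alternately normalize rows and columns of a non-negative
    matrix M until it is approximately doubly stochastic."""
    M = np.asarray(M, dtype=float) + eps  # keep every entry strictly positive
    for _ in range(n_iters):
        M = M / M.sum(axis=1, keepdims=True)  # row normalization
        M = M / M.sum(axis=0, keepdims=True)  # column normalization
    return M
```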
Our contributions are summarized as follows. First, we answer the question "what information is learned by GCL and whether there exists a general augmentation rule". To the best of our knowledge, this is the first attempt to fundamentally explore augmentation strategies for GCL from the spectral domain. We not only reveal the general graph augmentation rule behind different augmentation strategies, but also explain why GCL works by proposing the contrastive invariance theorem. Our work provides a deeper understanding of the nature of GCL. Second, we answer the question "how to utilize the augmentation rule for GCL". We show that the augmentation rule provides a novel lens for assessing current augmentation strategies. We propose the novel concept of an optimal contrastive pair and theoretically derive a general framework, SpCo, which is able to improve the performance of existing GCL methods. Last, we choose three typical GCL methods as target methods and plug SpCo into them. We validate the effectiveness of SpCo on five datasets, consistently gaining improvements over those target methods.
2 Preliminaries
Let $\mathcal{G} = (\mathcal{V}, \xi)$ represent an undirected attributed graph, where $\mathcal{V}$ is the set of $N$ nodes and $\xi \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges. All edges formulate an adjacency matrix $A \in \{0,1\}^{N \times N}$, where $A_{ij} \in \{0,1\}$ denotes the relation between nodes $i$ and $j$ in $\mathcal{V}$. The node degree matrix is $D = \mathrm{diag}(d_1, \dots, d_N)$, where $d_i = \sum_{j \in \mathcal{V}} A_{ij}$ is the degree of node $i \in \mathcal{V}$. Graph $\mathcal{G}$ is often associated with a node feature matrix $X = [x_1, x_2, \dots, x_N] \in \mathbb{R}^{N \times d}$, where $x_i$ is a $d$-dimensional feature vector of node $i \in \mathcal{V}$. Let $L = D - A$ be the unnormalized graph Laplacian of $\mathcal{G}$. If we set the symmetric normalized adjacency matrix as $\hat{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, then $\hat{L} = I_N - \hat{A} = D^{-\frac{1}{2}} (D - A) D^{-\frac{1}{2}}$ is the symmetric normalized graph Laplacian.

Since $\hat{L}$ is real and symmetric, its eigendecomposition is $U \Lambda U^{\top}$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_N)$ and $U = [u_1, \dots, u_N] \in \mathbb{R}^{N \times N}$ hold the eigenvalues and eigenvectors of $\hat{L}$, respectively. Without loss of generality, assume $0 \le \lambda_1 \le \cdots \le \lambda_N < 2$ (where we approximate $\lambda_N \approx 2$ [11]). Denote by $F_L = \{\lambda_1, \dots, \lambda_{\lfloor N/2 \rfloor}\}$ the amplitudes of the low-frequency components and by $F_H = \{\lambda_{\lfloor N/2 \rfloor + 1}, \dots, \lambda_N\}$ the amplitudes of the high-frequency components. The graph spectrum is defined as the amplitudes of the different frequency components, denoted as $\phi(\lambda)$, indicating which parts of the frequency range are enhanced or weakened [20]. Additionally, we rewrite $\hat{L} = \lambda_1 \cdot u_1 u_1^{\top} + \cdots + \lambda_N \cdot u_N u_N^{\top}$, where we define the term $u_i u_i^{\top} \in \mathbb{R}^{N \times N}$ as the eigenspace related to $\lambda_i$, denoted as $S_i$.
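For concreteness, the following minimal sketch computes $\hat{L}$, its spectrum, and the $F_L$/$F_H$ split with NumPy. The function name and the toy graph are ours; real graphs would call for sparse eigensolvers.

```python
import numpy as np

def normalized_laplacian_spectrum(A):
    """Eigendecomposition of L_hat = I - D^{-1/2} A D^{-1/2}
    for a symmetric 0/1 adjacency matrix A (no self-loops)."""
    N = A.shape[0]
    d = A.sum(axis=1)
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    L_hat = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam, U = np.linalg.eigh(L_hat)  # ascending eigenvalues, all in [0, 2]
    return lam, U

# Toy graph: a triangle with one pendant node.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
lam, U = normalized_laplacian_spectrum(A)
F_L, F_H = lam[: len(lam) // 2], lam[len(lam) // 2:]  # low / high frequencies
```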
Graph Contrastive Learning (GCL) [28, 9, 33] aims to learn discriminative embeddings without supervision; its pipeline is shown in Fig. 1 (a). We summarize representative GCL methods in Appendix E. Specifically, two augmentations are randomly extracted from $A$ in a predefined way and are encoded by a GCN [11] to obtain node embeddings under the two augmentations. Then, for one target node, its embedding in one augmentation is trained to be close to the embeddings of its positive samples in the other augmentation and far away from those of its negative samples. Models built in this way are capable of discriminating similar nodes from dissimilar ones. For example, some graph contrastive methods [36, 32, 35] use the classical InfoNCE loss [19] as the optimization objective:

$$\mathcal{L}(h_i^{V_1}, h_i^{V_2}) = -\log \frac{\exp(\theta(h_i^{V_1}, h_i^{V_2})/\tau)}{\exp(\theta(h_i^{V_1}, h_i^{V_2})/\tau) + \sum_{k \neq i} \exp(\theta(h_i^{V_1}, h_k^{V_2})/\tau)}, \quad (1)$$
where $h_i^{V_1}$ and $h_i^{V_2}$ are the embeddings of node $i$ under augmentations $V_1$ and $V_2$, respectively, $\theta$ is the similarity metric, such as cosine similarity, and $\tau$ is a temperature parameter. The total loss is $\mathcal{L}_{InfoNCE} = \sum_i \frac{1}{2} \left[ \mathcal{L}(h_i^{V_1}, h_i^{V_2}) + \mathcal{L}(h_i^{V_2}, h_i^{V_1}) \right]$.
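A compact PyTorch sketch of this symmetric InfoNCE objective is given below, assuming row $i$ of the two embedding matrices corresponds to the same node and $\theta$ is cosine similarity; the function name and default $\tau$ are illustrative only.

```python
import torch
import torch.nn.functional as F

def info_nce(h1, h2, tau=0.5):
    """Symmetric InfoNCE of Eq. (1): for node i, the positive is its own
    embedding in the other view; all other nodes there are negatives."""
    z1, z2 = F.normalize(h1, dim=1), F.normalize(h2, dim=1)
    sim = z1 @ z2.t() / tau  # theta(., .) = cosine similarity, scaled by tau
    target = torch.arange(h1.size(0), device=h1.device)  # positives on the diagonal
    # cross_entropy over a row equals -log softmax at the positive entry,
    # i.e., exactly one InfoNCE term of Eq. (1)
    return 0.5 * (F.cross_entropy(sim, target) + F.cross_entropy(sim.t(), target))
```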
3 Impact of Graph Augmentation: An Experimental Investigation
Figure 2: The case study model.
In this section, we explore what information should be considered in two contrasted augmentations from the perspective of the graph spectrum. Specifically, we design the simple GCL framework shown in Fig. 2. The two input augmentations are the adjacency matrix $A$ and a generated view $V$. We then utilize a shared one-layer GCN as the encoder to encode $A$ and $V$ and obtain their node embeddings $H^A$ and $H^V$. We train the GCN with the InfoNCE loss of Eq. (1). More experimental settings can be found in Appendix B.1.
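A minimal sketch of this shared encoder, assuming dense, pre-normalized view matrices (a real implementation would use sparse operations):

```python
import torch
import torch.nn as nn

class OneLayerGCN(nn.Module):
    """Shared one-layer GCN encoder of the case study: H = ReLU(A_hat X W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, view, X):
        return torch.relu(view @ self.lin(X))

# The same weights encode both views:
# encoder = OneLayerGCN(X.size(1), 64)
# H_A, H_V = encoder(A_hat, X), encoder(V_hat, X)
```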
Figure 3: The generation of V.
Generating augmentation V. We construct the augmented graph by extracting information at different frequencies from the original graph, so that we can analyze the effect of each kind of information. This process is shown in Fig. 3. Specifically, we divide the eigenvalues of $\hat{L}$ into the $F_L$ and $F_H$ parts, and conduct augmentations in these two parts, respectively. Taking the augmentation in $F_L$ as an example, we keep the high-frequency part as $u_{N/2} u_{N/2}^{\top} + \cdots + u_N u_N^{\top}$. Then, we gradually add the eigenspaces in $F_L$ back at rates $[20\%, 40\%, 60\%, 80\%]$, starting from the lowest frequency. Therefore, the $V$ augmented with 20% of $F_L$ is $u_1 u_1^{\top} + \cdots + u_{0.2 \cdot N/2} u_{0.2 \cdot N/2}^{\top} + u_{N/2} u_{N/2}^{\top} + \cdots + u_N u_N^{\top}$. Similarly, the $V$ augmented with 20% of $F_H$ is $u_1 u_1^{\top} + \cdots + u_{N/2} u_{N/2}^{\top} + u_{(N+1)/2} u_{(N+1)/2}^{\top} + \cdots + u_{0.7N} u_{0.7N}^{\top}$. Please note that we set the graph spectrum of $V$ to $\phi_V(\lambda) = 1, \forall \lambda \in [0,2]$ above, because we just want to test the effect of the different $u_i u_i^{\top}$ and avoid the influence of the eigenvalues $\lambda$ [18].
Figure 4: Results of the case study on four datasets: (a) Cora, (b) Citeseer, (c) BlogCatalog, (d) Flickr. The x-axis is the addition rate within each frequency interval, and the y-axis is the accuracy (ACC). The performance of augmentations in $F_L$ is plotted against the left y-axis, and that in $F_H$ against the right y-axis.
Figure 5: The graph spectra of $A$ and $V$.
Results and analyses. We conduct node classification on four datasets: Cora, Citeseer [11], BlogCatalog, and Flickr [16]. The accuracy (ACC) is shown in Fig. 4. In Appendix B.2, we also report the results when both high- and low-frequency components are added back in high-to-low frequency order. Results: for each dataset, in the generated $V$, (1) when the lowest part of the frequencies is kept, the best performance is achieved; (2) when more frequencies in $F_H$ are involved, the performance generally rises. Analyses: from the graph spectra of $A$ and $V$ shown in Fig. 5, we can see that in the generated $V$, (1) when the lowest part of the frequencies is kept, the difference in amplitude, i.e., the graph spectrum, in $F_L$ between $A$ and $V$ becomes smaller; (2) when more frequencies in $F_H$ are involved, the margin of the graph spectrum in $F_H$ between $A$ and $V$ becomes larger. Combining these results and observations, we propose the following general Graph AugMEntation rule, called the GAME rule2:
The General Graph Augmentation Rule. Given two random augmentations $V_1$ and $V_2$, their graph spectra are $\phi_{V_1}(\lambda)$ and $\phi_{V_2}(\lambda)$. Then, $\forall \lambda_m \in [1,2]$ and $\forall \lambda_n \in [0,1]$, $V_1$ and $V_2$ are an effective pair of graph augmentations if the following condition is satisfied:

$$|\phi_{V_1}(\lambda_m) - \phi_{V_2}(\lambda_m)| > |\phi_{V_1}(\lambda_n) - \phi_{V_2}(\lambda_n)|.$$

We define such a pair of augmentations as an optimal contrastive pair.
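In code, the rule reduces to a simple comparison once the two spectra are sampled on a common frequency grid; the following checker is a sketch under that assumption, with names of our own choosing.

```python
import numpy as np

def satisfies_game_rule(lams, phi_v1, phi_v2):
    """GAME rule check: every amplitude gap at high frequencies
    (lambda in [1, 2]) must exceed every gap at low frequencies
    (lambda in [0, 1)). lams, phi_v1, phi_v2 share the same grid."""
    gap = np.abs(phi_v1 - phi_v2)
    low_gap, high_gap = gap[lams < 1.0], gap[lams >= 1.0]
    return high_gap.min() > low_gap.max()
```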
2Although this rule is derived from contrasting $A$ and $V$, the selection of these particular views does not limit the generality of the GAME rule. Considering that most augmentations are obtained from the raw adjacency matrix $A$, it is a natural setting that one view is fixed as $A$ and the other is an augmented one.
4 Analysis of the General Graph Augmentation Rule

In this section, we verify the correctness of the GAME rule, i.e., whether two contrasted augmentations that satisfy the rule perform better in downstream tasks, through both experimental and theoretical analysis.
Experimental analysis. We substitute the existing augmentations proposed by MVGRL [9], GCA [36], and GraphCL [32] for the augmentation $V$ in the case study. Specifically, MVGRL proposes the PPR matrix, the heat diffusion matrix, and the pair-wise distance matrix. GCA mainly drops edges randomly based on degree, eigenvector, and PageRank centrality. GraphCL adopts random node dropping, edge perturbation, and subgraph sampling. These nine augmentations cover most of the mainstream augmentations in GCL. To accurately depict the change of amplitude caused by these augmentations at a given $\lambda_i$, we turn to matrix perturbation theory3 [25]:

$$\Delta\lambda_i = \lambda_i' - \lambda_i = u_i^{\top} \Delta A\, u_i - \lambda_i u_i^{\top} \Delta D\, u_i + O(\|\Delta A\|), \quad (2)$$

where $\lambda_i'$ is the eigenvalue after the change, $\Delta A = A' - A$ represents the modification of edges after augmentation, and $\Delta D$ is the corresponding change in the degree matrix. With Eq. (2), we calculate the eigenvalues on Cora after each augmentation and plot their graph spectra in Fig. 6. Simultaneously, we use the GCL framework of Section 3 to separately contrast the adjacency matrix $A$ with each of these augmentations; the results are shown in Table 1. As shown in Fig. 6, the PPR matrix, the heat diffusion matrix, and the distance matrix accord better with the GAME rule: they differ little from $A$ in $F_L$ and differ largely in $F_H$. Therefore, they outperform the other augmentations in Table 1.
Figure 6: The graph spectra of the Laplacian, the adjacency matrix, and nine existing augmentations.
Table 1: Performance (ACC) of different existing augmentations to verify the GAME rule.

Method   | Type              | Result
---------|-------------------|----------
GraphCL  | Subgraph          | 34.9±3.5
GraphCL  | Node dropping     | 29.8±2.3
GraphCL  | Edge perturbation | 37.7±4.4
GCA      | Degree            | 40.2±4.1
GCA      | PageRank          | 38.5±5.0
GCA      | Eigenvector       | 42.1±4.9
MVGRL    | PPR               | 58.0±1.6
MVGRL    | Heat              | 49.9±4.2
MVGRL    | Distance          | 46.1±7.5
We also test the GAME rule in another setting, where we contrast three cases: $A$ with $A^2$ (the two-hop graph of $A$), $A$ with $A$, and $A^2$ with $A^2$. The results are given in Appendix C.
Theoretical analysis. We have the following theorem to depict the learning process of GCL.
Theorem 1 (Contrastive Invariance). Given the adjacency matrix $A$ and the generated augmentation $V$, the amplitudes of the $i$-th frequency of $A$ and $V$ are $\lambda_i$ and $\gamma_i$, respectively. With the optimization of the InfoNCE loss $\mathcal{L}_{InfoNCE}$, the following bound is established:

$$\mathcal{L}_{InfoNCE} \ge -\frac{1+N}{2} \sum_i \theta_i \left[ 2 - (\lambda_i - \gamma_i)^2 \right],$$

where $\theta_i$ is an adaptive weight of the $i$-th term. Intuitively, lowering this bound, and hence enabling a smaller loss, requires $(\lambda_i - \gamma_i)^2$ to shrink, i.e., the amplitudes of the two views become invariant across frequencies.
3Here, we do not use eigenvalue decomposition to obtain the $\lambda'$ of $A'$, because the obtained $\lambda'$ are unordered with respect to the previous $\lambda$ of $A$. That is to say, for a certain $\lambda_i$ of $A$, we cannot determine which eigenvalue of $A'$ matches it after decomposition, so we cannot calculate the change $\Delta\lambda_i$ for $\lambda_i$ in this case.