GLCC: A General Framework for Graph-Level Clustering
Wei Ju1,*, Yiyang Gu1,*, Binqi Chen2, Gongbo Sun3, Yifang Qin2,
Xingyuming Liu2, Xiao Luo4,†, Ming Zhang1,†
1School of Computer Science, Peking University, China
2School of EECS, Peking University, China
3Beijing National Day School, China
4Department of Computer Science, University of California Los Angeles, USA
{juwei, yiyanggu, qinyifang, liuxym, mzhang_cs}@pku.edu.cn,
berkinchen@gmail.com, tommy2809@163.com, xiaoluo@cs.ucla.edu
Abstract
This paper studies the problem of graph-level clustering,
which is a novel yet challenging task. This problem is critical
in a variety of real-world applications such as protein cluster-
ing and genome analysis in bioinformatics. Recent years have
witnessed the success of deep clustering coupled with graph
neural networks (GNNs). However, existing methods focus
on clustering nodes within a single graph, while clustering
across multiple graphs remains under-explored. In
this paper, we propose a general graph-level clustering frame-
work named Graph-Level Contrastive Clustering (GLCC)
given multiple graphs. Specifically, GLCC first constructs an
adaptive affinity graph to explore instance- and cluster-level
contrastive learning (CL). Instance-level CL leverages graph
Laplacian based contrastive loss to learn clustering-friendly
representations while cluster-level CL captures discrimina-
tive cluster representations incorporating neighbor informa-
tion of each sample. Moreover, we utilize neighbor-aware
pseudo-labels to reward the optimization of representation
learning. The two steps can be trained alternately to col-
laborate and benefit each other. Experiments on a range of
well-known datasets demonstrate the superiority of our pro-
posed GLCC over competitive baselines.
Introduction
Clustering is a fundamental problem in graph machine learn-
ing, which has been widely studied for decades. It aims at
partitioning similar samples into the same group and dis-
similar samples into different groups. The clusters of sam-
ples provide global insight into the whole dataset, which has
various downstream applications, including anomaly detec-
tion (Sheng et al. 2019), domain adaptation (Tang, Chen, and
Jia 2020), community detection (Liu et al. 2020) and repre-
sentation learning (Xu et al. 2021; Luo et al. 2022b).
Over the past decades, traditional methods such as spec-
tral clustering (Ng, Jordan, and Weiss 2001) and subspace
clustering (Vidal 2011) have played a dominant role. How-
ever, the separation of representation learning and cluster-
ing unavoidably leads to sub-optimal solutions. Due to the
*Equal contribution with order determined by flipping a coin.
†Corresponding authors.
Copyright © 2023, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
strong representation learning capability of deep learning,
deep clustering approaches (Caron et al. 2018; Mukherjee
et al. 2019; Li et al. 2021; Zhong et al. 2021; de Mello,
Assunção, and Murai 2022) have recently achieved state-of-
the-art performance. The crucial characteristic of deep clus-
tering is to learn clustering-friendly representations with-
out manual feature extraction via deep neural networks in
an end-to-end fashion. A representative method, DeepClus-
ter (Caron et al. 2018), iteratively groups the features with
k-means and uses the cluster assignments as supervision to
update the deep neural networks. With deep clustering, rep-
resentation learning and clustering can be optimized in a
joint way to learn clustering-friendly representations.
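As an illustration of this alternation (not DeepCluster's actual implementation), the following minimal PyTorch sketch alternates k-means pseudo-labeling with supervised network updates; the encoder, data, and hyperparameters are toy placeholders.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Toy encoder and data; DeepCluster uses a ConvNet on images.
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
classifier = nn.Linear(32, 10)  # DeepCluster re-initializes this head after each reassignment
data = torch.randn(1000, 128)
optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(classifier.parameters()), lr=0.01
)

for epoch in range(10):
    # Step 1: cluster current features with k-means to obtain pseudo-labels.
    with torch.no_grad():
        feats = encoder(data).numpy()
    pseudo_labels = torch.from_numpy(
        KMeans(n_clusters=10, n_init=10).fit_predict(feats)
    ).long()

    # Step 2: use the cluster assignments as supervision to update the network.
    logits = classifier(encoder(data))
    loss = nn.functional.cross_entropy(logits, pseudo_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```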
With the advancement of graph neural networks (GNNs)
in achieving unprecedented success for graph-related tasks,
one promising direction to leverage GNNs is graph cluster-
ing (Bo et al. 2020; Cheng et al. 2021; Peng et al. 2021;
Zhao et al. 2021; Pan and Kang 2021; Liu et al. 2022). The
basic idea of graph clustering methods is to train GNNs for
learning effective cluster assignments to divide nodes into
different groups without human annotations. Specifically,
SDCN (Bo et al. 2020) firstly integrates the structural in-
formation into deep clustering combined with autoencoder
and GCN. To avoid representation collapse caused by over-
smoothing in GCN, DCRN (Liu et al. 2022) proposes a self-
supervised deep graph clustering method by reducing infor-
mation correlation in a dual manner.
Although existing graph clustering approaches have
achieved encouraging performance, they all focus on study-
ing clustering among nodes given a single graph. In other
words, they are tailored to node-level clustering. Neverthe-
less, to the best of our knowledge, clustering on multiple
graphs (also called graph-level clustering) remains largely
unexplored, and has a variety of real-world applications. For
example, protein clustering is a significant topic in bioin-
formatics, which is used to construct meaningful and stable
groups of similar proteins to be used for analysis and func-
tional annotation (Zaslavsky et al. 2016). Moreover, graph-
level clustering is a crucial yet challenging task: unlike node-
level clustering, where we can derive extra supervision for
each node from its neighbors via propagation, graphs are
individual instances isolated from each other, and thus we
cannot directly aggregate information from other graphs. As
such, we seek an approach tailored to graph-level
clustering that can learn clustering-friendly represen-
tations while capturing discriminative clustering in-
formation from neighboring graphs.
Towards this end, this work proposes a general frame-
work called Graph-Level Contrastive Clustering (GLCC)
for clustering multiple graphs. The key idea of GLCC is to exploit
the multi-granularity information to provide a global char-
acterization of graph instances. To achieve this goal effec-
tively, GLCC first constructs an adaptive affinity graph to in-
corporate neighbor knowledge, we then introduce two-level
contrastive learning (CL) based on the affinity graph, i.e.,
an instance-level CL and a cluster-level CL, respectively.
On the one hand, instance-level CL leverages graph Lapla-
cian based contrastive loss to learn clustering-friendly rep-
resentations for effective cluster assignments. On the other
hand, cluster-level CL incorporates neighbor information
of each sample to capture compact cluster representations
to achieve cluster-level consistency. Furthermore, neighbor-
aware pseudo-labels are generated to feed back the training
of representation learning, so that the clustering and repre-
sentation learning can be optimized alternately to coopera-
tively supervise and mutually enhance each other. By incor-
porating this multi-granularity information, our experiments
show that GLCC largely outperforms existing state-of-the-art
approaches on multiple real-life datasets. To summarize, the
main contributions of this work are as follows:
• General Aspects: To the best of our knowledge, this
could be the first work to investigate deep graph-level
clustering, which explores graph-level clustering on mul-
tiple graphs, different from existing works in studying
clustering among nodes given a single graph.
• Novel Methodologies: We propose a general framework
to explore instance- and cluster-level contrastive learning
based on the affinity graph. Instance-level CL aims to
learn clustering-friendly representations, while cluster-
level CL captures discriminative cluster representations.
• Multifaceted Experiments: We conduct comprehensive
experiments on various well-known datasets to demon-
strate the effectiveness of the proposed approach.
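As referenced above, here is a minimal sketch of constructing an affinity graph over graph-level embeddings; the cosine similarity, fixed k, and symmetrization below are illustrative assumptions and may differ from GLCC's adaptive construction.

```python
import torch
import torch.nn.functional as F

def build_affinity_graph(embeddings: torch.Tensor, k: int = 10) -> torch.Tensor:
    """Build a k-NN affinity graph over graph-level embeddings.

    Cosine similarity and a fixed k are illustrative choices; GLCC's
    adaptive construction may use a different similarity or sparsification.
    """
    z = F.normalize(embeddings, dim=1)        # unit-norm rows
    sim = z @ z.t()                           # pairwise cosine similarity
    sim.fill_diagonal_(-float('inf'))         # exclude self-similarity
    topk = sim.topk(k, dim=1).indices         # k nearest neighbors per graph
    n = embeddings.size(0)
    adj = torch.zeros(n, n)
    adj.scatter_(1, topk, 1.0)                # directed k-NN edges
    adj = torch.maximum(adj, adj.t())         # symmetrize
    return adj

# Usage: affinity graph over 100 graph embeddings of dimension 32.
A = build_affinity_graph(torch.randn(100, 32), k=5)
```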
Related Work
Graph Neural Networks. GNNs were originally introduced
by Gori, Monfardini, and Scarselli (2005) and are a typi-
cal class of deep neural networks that combine the topolog-
ical structure and associated features of a graph, thus pos-
sessing the powerful capability to process graph-structured
data. The basic idea is to learn the low-dimensional graph
representations through a recursive neighborhood aggrega-
tion scheme (Gilmer et al. 2017; Ju et al. 2022b; Luo et al.
2022a). The derived graph representations can be used to
serve various downstream tasks, such as node classifica-
tion (Kipf and Welling 2017), graph classification (Ju et al.
2022a), and graph clustering (Bo et al. 2020).
Deep Clustering. Our work is related to deep cluster-
ing, which has achieved impressive performance, benefit-
ing from the breakthroughs in deep learning. There has
been a surge of interest in employing deep neural net-
works to enhance clustering, which can be divided into two
main categories: (i) reconstruction based methods, and (ii)
self-augmentation based methods. The first category lever-
ages the auto-encoder (Rumelhart, Hinton, and
Williams 1985) paradigm to impose desired constraints on
feature learning in the latent space. For example, Deep Em-
bedded Clustering (DEC) (Xie, Girshick, and Farhadi 2016)
simultaneously learns feature representations and cluster as-
signments by minimizing the Kullback-Leibler divergence.
IDEC (Guo et al. 2017) improves the clustering by preserv-
ing the local structure of the data-generating distribution. For
the second category, the underlying concept is to train the
networks to achieve consistency between original samples
and their augmented samples. For instance, IIC (Ji, Hen-
riques, and Vedaldi 2019) maximizes the mutual informa-
tion of paired dataset samples to keep a consistent assign-
ment probability. However, these methods are not tailored
to graph-level clustering, and struggle to process
complex data structures such as graphs.
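To make the DEC objective above concrete, the sketch below follows the soft-assignment formulation of Xie, Girshick, and Farhadi (2016): a Student's t-kernel produces soft cluster assignments q, and the network is trained to minimize the KL divergence between q and a sharpened target distribution p. Variable names here are illustrative.

```python
import torch

def soft_assign(z: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    # Student's t-kernel (one degree of freedom) between embeddings and centers:
    # q_ij = (1 + ||z_i - mu_j||^2)^-1, normalized per sample.
    dist_sq = torch.cdist(z, centers).pow(2)
    q = 1.0 / (1.0 + dist_sq)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q: torch.Tensor) -> torch.Tensor:
    # Sharpen assignments and normalize by soft cluster frequency f_j = sum_i q_ij.
    p = q.pow(2) / q.sum(dim=0, keepdim=True)
    return p / p.sum(dim=1, keepdim=True)

# Clustering loss: KL(p || q), with p treated as a fixed target.
q = soft_assign(torch.randn(64, 16), torch.randn(10, 16))
p = target_distribution(q).detach()
loss = torch.nn.functional.kl_div(q.log(), p, reduction='batchmean')
```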
Graph Clustering. Another category of related work is
graph clustering. Benefiting from the strong capability of
GNNs in incorporating both node attributes and graph struc-
tures, GNNs have emerged as a powerful approach for graph
clustering, which aims to reveal the underlying graph struc-
ture and divide the nodes into several disjoint groups.
Similarly, most existing graph clustering approaches (Wang
et al. 2017, 2019; Pan et al. 2019; Fan et al. 2020; Bo
et al. 2020) also follow the framework of auto-encoder, in
which the graph auto-encoder (GAE) and the variational
graph auto-encoder (VGAE) are used to learn the graph-
structured data. For example, DAEGC (Wang et al. 2019)
utilizes an attention network to capture the importance of
the neighboring nodes, and further encodes the topologi-
cal structures and node contents to a compact representa-
tion based on GAE. The adversarially regularized graph au-
toencoder (ARGA) (Pan et al. 2019) enhances the cluster-
ing via introducing an adversarial learning scheme to learn
the graph embedding. Compared with existing methods for
node-level clustering, our work goes further and studies the
under-explored yet important task of graph-level clustering.
Problem Definition & Preliminary
Definition: Graph-level Clustering. A graph is denoted as
a topological graph $G = (\mathcal{V}, \mathcal{E}, X)$, where $\mathcal{V}$ is the set of
nodes and $\mathcal{E}$ is the set of edges. We use $X \in \mathbb{R}^{|\mathcal{V}| \times d}$ to de-
note the node feature matrix, where $d$ is the dimension of
features. Let $\mathcal{G} = \{G_1, \cdots, G_N\}$ denote a set of unlabeled
graphs from $K$ different categories. The goal of graph-level
clustering is to separate these graphs into $K$ different clus-
ters such that the graphs with the same semantic labels can
be grouped into the same cluster.
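To ground this definition, the snippet below shows the task shape with hypothetical graph-level embeddings: N graphs are partitioned into K clusters, and the semantic labels are used only to evaluate the clustering (here with NMI, a standard metric). The encoder producing the embeddings is left abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

# Hypothetical graph-level embeddings for N unlabeled graphs (any encoder works).
N, K = 200, 4
embeddings = np.random.randn(N, 32)

# Graph-level clustering: partition the N graphs into K clusters.
pred = KMeans(n_clusters=K, n_init=10).fit_predict(embeddings)

# Evaluation only: compare against held-out semantic labels via NMI.
true_labels = np.random.randint(0, K, size=N)
print(normalized_mutual_info_score(true_labels, pred))
```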
Graph Neural Networks
We build upon graph neural networks (GNNs) to learn ef-
fective graph-level representations. GNNs are powerful ar-
chitectures that iteratively aggregate over local node neigh-
borhoods via message-passing mechanisms (Gilmer et al.
2017).
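As a minimal, framework-agnostic illustration of this preliminary (not GLCC's specific encoder), the sketch below implements one mean-aggregation message-passing layer and a mean readout that pools node embeddings into a graph-level representation.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One GNN layer: aggregate neighbor features, then transform."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)   # avoid divide-by-zero
        neigh = adj @ x / deg                             # mean over neighbors
        return torch.relu(self.linear(torch.cat([x, neigh], dim=1)))

def graph_readout(node_embeddings: torch.Tensor) -> torch.Tensor:
    # Mean pooling: one vector per graph, usable for graph-level clustering.
    return node_embeddings.mean(dim=0)

# Usage on a toy graph with 5 nodes and feature dimension 8.
x = torch.randn(5, 8)
adj = (torch.rand(5, 5) > 0.5).float()   # toy adjacency matrix
h = MessagePassingLayer(8, 16)(x, adj)
z_graph = graph_readout(h)               # graph-level representation in R^16
```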