GLCC: A General Framework for Graph-Level Clustering
Wei Ju1,*, Yiyang Gu1,*, Binqi Chen2, Gongbo Sun3, Yifang Qin2,
Xingyuming Liu2, Xiao Luo4,†, Ming Zhang1,†
1School of Computer Science, Peking University, China
2School of EECS, Peking University, China
3Beijing National Day School, China
4Department of Computer Science, University of California Los Angeles, USA
{juwei, yiyanggu, qinyifang, liuxym, mzhang_cs}@pku.edu.cn,
berkinchen@gmail.com, tommy2809@163.com, xiaoluo@cs.ucla.edu
Abstract
This paper studies the problem of graph-level clustering,
which is a novel yet challenging task. This problem is critical
in a variety of real-world applications such as protein cluster-
ing and genome analysis in bioinformatics. Recent years have
witnessed the success of deep clustering coupled with graph
neural networks (GNNs). However, existing methods focus
on clustering nodes within a single graph, while clustering
across multiple graphs remains under-explored. In
this paper, we propose a general graph-level clustering frame-
work named Graph-Level Contrastive Clustering (GLCC)
given multiple graphs. Specifically, GLCC first constructs an
adaptive affinity graph to explore instance- and cluster-level
contrastive learning (CL). Instance-level CL leverages graph
Laplacian based contrastive loss to learn clustering-friendly
representations while cluster-level CL captures discrimina-
tive cluster representations incorporating neighbor informa-
tion of each sample. Moreover, we utilize neighbor-aware
pseudo-labels to reward the optimization of representation
learning. The two steps can be trained alternately to col-
laborate and benefit each other. Experiments on a range of
well-known datasets demonstrate the superiority of our pro-
posed GLCC over competitive baselines.
Introduction
Clustering is a fundamental problem in graph machine learn-
ing, which has been widely studied for decades. It aims at
partitioning similar samples into the same group and dis-
similar samples into different groups. The clusters of sam-
ples provide global insight into the whole dataset, which has
various downstream applications, including anomaly detec-
tion (Sheng et al. 2019), domain adaptation (Tang, Chen, and
Jia 2020), community detection (Liu et al. 2020) and repre-
sentation learning (Xu et al. 2021; Luo et al. 2022b).
Over the past decades, traditional methods such as spec-
tral clustering (Ng, Jordan, and Weiss 2001) and subspace
clustering (Vidal 2011) have played a dominant role. How-
ever, the separation of representation learning and cluster-
ing unavoidably leads to sub-optimal solutions. Due to the
*Equal contribution with order determined by flipping a coin.
†Corresponding authors.
Copyright © 2023, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
strong representation learning capability of deep learning,
deep clustering approaches (Caron et al. 2018; Mukherjee
et al. 2019; Li et al. 2021; Zhong et al. 2021; de Mello,
Assunção, and Murai 2022) have recently achieved state-of-
the-art performance. The crucial characteristic of deep clus-
tering is to learn clustering-friendly representations with-
out manual feature extraction via deep neural networks in
an end-to-end fashion. A representative method, DeepClus-
ter (Caron et al. 2018), iteratively groups the features with
k-means and uses the cluster assignments as supervision to
update the deep neural networks. With deep clustering, rep-
resentation learning and clustering can be optimized in a
joint way to learn clustering-friendly representations.
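As an illustration of this alternation (not DeepCluster's actual implementation), the following minimal PyTorch sketch alternates k-means pseudo-labeling with supervised network updates; the encoder, data, and hyperparameters are toy placeholders.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Toy encoder and data; DeepCluster uses a ConvNet on images.
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
classifier = nn.Linear(32, 10)  # DeepCluster re-initializes this head after each reassignment
data = torch.randn(1000, 128)
optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(classifier.parameters()), lr=0.01
)

for epoch in range(10):
    # Step 1: cluster current features with k-means to obtain pseudo-labels.
    with torch.no_grad():
        feats = encoder(data).numpy()
    pseudo_labels = torch.from_numpy(
        KMeans(n_clusters=10, n_init=10).fit_predict(feats)
    ).long()

    # Step 2: use the cluster assignments as supervision to update the network.
    logits = classifier(encoder(data))
    loss = nn.functional.cross_entropy(logits, pseudo_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```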
With the advancement of graph neural networks (GNNs)
in achieving unprecedented success for graph-related tasks,
one promising direction to leverage GNNs is graph cluster-
ing (Bo et al. 2020; Cheng et al. 2021; Peng et al. 2021;
Zhao et al. 2021; Pan and Kang 2021; Liu et al. 2022). The
basic idea of graph clustering methods is to train GNNs for
learning effective cluster assignments to divide nodes into
different groups without human annotations. Specifically,
SDCN (Bo et al. 2020) firstly integrates the structural in-
formation into deep clustering combined with autoencoder
and GCN. To avoid representation collapse caused by over-
smoothing in GCN, DCRN (Liu et al. 2022) proposes a self-
supervised deep graph clustering method by reducing infor-
mation correlation in a dual manner.
Although existing graph clustering approaches have
achieved encouraging performance, they all focus on study-
ing clustering among nodes given a single graph. In other
words, they are tailored to node-level clustering. Neverthe-
less, to the best of our knowledge, clustering on multiple
graphs (also called graph-level clustering) remains largely
unexplored, and has a variety of real-world applications. For
example, protein clustering is a significant topic in bioin-
formatics, which is used to construct meaningful and stable
groups of similar proteins to be used for analysis and func-
tional annotation (Zaslavsky et al. 2016). Moreover, graph-
level clustering is a crucial yet challenging task: unlike node-
level clustering, where we can derive extra supervision for
each node from its neighbors via propagation, graphs are
individual instances isolated from each other, and thus we
cannot directly aggregate information from other graphs. As
such, we seek an approach tailored to graph-level
clustering that can learn clustering-friendly represen-
tations while capturing discriminative clustering in-
formation from neighboring graphs.
Towards this end, this work proposes a general frame-
work called Graph-Level Contrastive Clustering (GLCC)
for clustering multiple graphs. The key idea of GLCC is to exploit
the multi-granularity information to provide a global char-
acterization of graph instances. To achieve this goal effec-
tively, GLCC first constructs an adaptive affinity graph to in-
corporate neighbor knowledge, we then introduce two-level
contrastive learning (CL) based on the affinity graph, i.e.,
an instance-level CL and a cluster-level CL, respectively.
On the one hand, instance-level CL leverages graph Lapla-
cian based contrastive loss to learn clustering-friendly rep-
resentations for effective cluster assignments. On the other
hand, cluster-level CL incorporates neighbor information
of each sample to capture compact cluster representations
to achieve cluster-level consistency. Furthermore, neighbor-
aware pseudo-labels are generated to feed back the training
of representation learning, so that the clustering and repre-
sentation learning can be optimized alternately to coopera-
tively supervise and mutually enhance each other. By incor-
porating this multi-granularity information, our experiments
show that GLCC largely outperforms existing state-of-the-art
approaches on multiple real-life datasets. To summarize, the
main contributions of this work are as follows:
• General Aspects: To the best of our knowledge, this
could be the first work to investigate deep graph-level
clustering, which explores graph-level clustering on mul-
tiple graphs, different from existing works in studying
clustering among nodes given a single graph.
• Novel Methodologies: We propose a general framework
to explore instance- and cluster-level contrastive learning
based on the affinity graph. Instance-level CL aims to
learn clustering-friendly representations, while cluster-
level CL captures discriminative cluster representations.
• Multifaceted Experiments: We conduct comprehensive
experiments on various well-known datasets to demon-
strate the effectiveness of the proposed approach.
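As referenced above, here is a minimal sketch of constructing an affinity graph over graph-level embeddings; the cosine similarity, fixed k, and symmetrization below are illustrative assumptions and may differ from GLCC's adaptive construction.

```python
import torch
import torch.nn.functional as F

def build_affinity_graph(embeddings: torch.Tensor, k: int = 10) -> torch.Tensor:
    """Build a k-NN affinity graph over graph-level embeddings.

    Cosine similarity and a fixed k are illustrative choices; GLCC's
    adaptive construction may use a different similarity or sparsification.
    """
    z = F.normalize(embeddings, dim=1)        # unit-norm rows
    sim = z @ z.t()                           # pairwise cosine similarity
    sim.fill_diagonal_(-float('inf'))         # exclude self-similarity
    topk = sim.topk(k, dim=1).indices         # k nearest neighbors per graph
    n = embeddings.size(0)
    adj = torch.zeros(n, n)
    adj.scatter_(1, topk, 1.0)                # directed k-NN edges
    adj = torch.maximum(adj, adj.t())         # symmetrize
    return adj

# Usage: affinity graph over 100 graph embeddings of dimension 32.
A = build_affinity_graph(torch.randn(100, 32), k=5)
```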
Related Work
Graph Neural Networks. GNNs were originally introduced
by Gori, Monfardini, and Scarselli (2005) and are a typi-
cal class of deep neural networks that combine the topolog-
ical structure and associated features of a graph, thus pos-
sessing the powerful capability to process graph-structured
data. The basic idea is to learn the low-dimensional graph
representations through a recursive neighborhood aggrega-
tion scheme (Gilmer et al. 2017; Ju et al. 2022b; Luo et al.
2022a). The derived graph representations can be used to
serve various downstream tasks, such as node classifica-
tion (Kipf and Welling 2017), graph classification (Ju et al.
2022a), and graph clustering (Bo et al. 2020).
Deep Clustering. Our work is related to deep cluster-
ing, which has achieved impressive performance, benefit-
ing from the breakthroughs in deep learning. There has
been a surge of interest in employing deep neural net-
works to enhance clustering, which can be divided into two
main categories: (i) reconstruction based methods, and (ii)
self-augmentation based methods. The first category lever-
ages the auto-encoder (Rumelhart, Hinton, and
Williams 1985) paradigm to impose desired constraints on
feature learning in the latent space. For example, Deep Em-
bedded Clustering (DEC) (Xie, Girshick, and Farhadi 2016)
simultaneously learns feature representations and cluster as-
signments by minimizing the Kullback-Leibler divergence.
IDEC (Guo et al. 2017) improves the clustering by preserv-
ing the local structure of the data-generating distribution. For
the second category, the underlying concept is to train the
networks to achieve consistency between original samples
and their augmented samples. For instance, IIC (Ji, Hen-
riques, and Vedaldi 2019) maximizes the mutual informa-
tion of paired dataset samples to keep a consistent assign-
ment probability. However, these methods are not tailored
to graph-level clustering, and struggle to process
complex data structures such as graphs.
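To make the DEC objective above concrete, the sketch below follows the soft-assignment formulation of Xie, Girshick, and Farhadi (2016): a Student's t-kernel produces soft cluster assignments q, and the network is trained to minimize the KL divergence between q and a sharpened target distribution p. Variable names here are illustrative.

```python
import torch

def soft_assign(z: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    # Student's t-kernel (one degree of freedom) between embeddings and centers:
    # q_ij = (1 + ||z_i - mu_j||^2)^-1, normalized per sample.
    dist_sq = torch.cdist(z, centers).pow(2)
    q = 1.0 / (1.0 + dist_sq)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q: torch.Tensor) -> torch.Tensor:
    # Sharpen assignments and normalize by soft cluster frequency f_j = sum_i q_ij.
    p = q.pow(2) / q.sum(dim=0, keepdim=True)
    return p / p.sum(dim=1, keepdim=True)

# Clustering loss: KL(p || q), with p treated as a fixed target.
q = soft_assign(torch.randn(64, 16), torch.randn(10, 16))
p = target_distribution(q).detach()
loss = torch.nn.functional.kl_div(q.log(), p, reduction='batchmean')
```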
Graph Clustering. Another category of related work is
graph clustering. Benefiting from the strong capability of
GNNs in incorporating both node attributes and graph struc-
tures, GNNs have emerged as a powerful approach for graph
clustering, which aims to reveal the underlying graph struc-
ture and divide the nodes into several disjoint groups.
Similarly, most existing graph clustering approaches (Wang
et al. 2017, 2019; Pan et al. 2019; Fan et al. 2020; Bo
et al. 2020) also follow the framework of auto-encoder, in
which the graph auto-encoder (GAE) and the variational
graph auto-encoder (VGAE) are used to learn the graph-
structured data. For example, DAEGC (Wang et al. 2019)
utilizes an attention network to capture the importance of
the neighboring nodes, and further encodes the topologi-
cal structures and node contents to a compact representa-
tion based on GAE. The adversarially regularized graph au-
toencoder (ARGA) (Pan et al. 2019) enhances the cluster-
ing via introducing an adversarial learning scheme to learn
the graph embedding. Compared with existing methods for
node-level clustering, our work goes further and studies the
under-explored yet important task of graph-level clustering.
Problem Definition & Preliminary
Definition: Graph-level Clustering. A graph is denoted as
a topological graph $G = (\mathcal{V}, \mathcal{E}, X)$, where $\mathcal{V}$ is the set of
nodes and $\mathcal{E}$ is the set of edges. We use $X \in \mathbb{R}^{|\mathcal{V}| \times d}$ to de-
note the node feature matrix, where $d$ is the dimension of
features. Let $\mathcal{G} = \{G_1, \cdots, G_N\}$ denote a set of unlabeled
graphs from $K$ different categories. The goal of graph-level
clustering is to separate these graphs into $K$ different clus-
ters such that the graphs with the same semantic labels can
be grouped into the same cluster.
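To ground this definition, the snippet below shows the task shape with hypothetical graph-level embeddings: N graphs are partitioned into K clusters, and the semantic labels are used only to evaluate the clustering (here with NMI, a standard metric). The encoder producing the embeddings is left abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

# Hypothetical graph-level embeddings for N unlabeled graphs (any encoder works).
N, K = 200, 4
embeddings = np.random.randn(N, 32)

# Graph-level clustering: partition the N graphs into K clusters.
pred = KMeans(n_clusters=K, n_init=10).fit_predict(embeddings)

# Evaluation only: compare against held-out semantic labels via NMI.
true_labels = np.random.randint(0, K, size=N)
print(normalized_mutual_info_score(true_labels, pred))
```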
Graph Neural Networks
We build upon graph neural networks (GNNs) to learn ef-
fective graph-level representations. GNNs are powerful ar-
chitectures that iteratively aggregate over local node neigh-
borhoods via message-passing mechanisms (Gilmer et al.
2017).
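As a minimal, framework-agnostic illustration of this preliminary (not GLCC's specific encoder), the sketch below implements one mean-aggregation message-passing layer and a mean readout that pools node embeddings into a graph-level representation.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One GNN layer: aggregate neighbor features, then transform."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)   # avoid divide-by-zero
        neigh = adj @ x / deg                             # mean over neighbors
        return torch.relu(self.linear(torch.cat([x, neigh], dim=1)))

def graph_readout(node_embeddings: torch.Tensor) -> torch.Tensor:
    # Mean pooling: one vector per graph, usable for graph-level clustering.
    return node_embeddings.mean(dim=0)

# Usage on a toy graph with 5 nodes and feature dimension 8.
x = torch.randn(5, 8)
adj = (torch.rand(5, 5) > 0.5).float()   # toy adjacency matrix
h = MessagePassingLayer(8, 16)(x, adj)
z_graph = graph_readout(h)               # graph-level representation in R^16
```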