Deep Clustering: A Comprehensive Survey
Yazhou Ren, Member, IEEE, Jingyu Pu, Zhimeng Yang, Jie Xu, Guofeng Li, Xiaorong Pu,
Philip S. Yu, Fellow, IEEE, Lifang He, Member, IEEE
Abstract—Cluster analysis plays an indispensable role in machine learning and data mining. Learning a good data representation is
crucial for clustering algorithms. Recently, deep clustering, which can learn clustering-friendly representations using deep neural
networks, has been broadly applied in a wide range of clustering tasks. Existing surveys for deep clustering mainly focus on the
single-view fields and the network architectures, ignoring the complex application scenarios of clustering. To address this issue, in this
paper we provide a comprehensive survey of deep clustering from the perspective of data sources. With different data sources and initial
conditions, we systematically distinguish the clustering methods in terms of methodology, prior knowledge, and architecture.
Concretely, deep clustering methods are introduced according to four categories, i.e., traditional single-view deep clustering,
semi-supervised deep clustering, deep multi-view clustering, and deep transfer clustering. Finally, we discuss the open challenges and
potential future opportunities in different fields of deep clustering.
Index Terms—Deep clustering; semi-supervised clustering; multi-view clustering; transfer learning
1 INTRODUCTION
WITH the development of online media, abundant data with
high complexity can be gathered easily. Through careful
analysis of these data, we can extract valuable insights and apply these
conclusions in many fields, such as face recognition [1], [2],
sentiment analysis [3], [4], intelligent manufacturing [5], [6], etc.
A model that can classify data with different
labels is the basis of many applications. For labeled data, it is
taken for granted to use the labels as the most important information
to guide learning. For unlabeled data, finding a quantifiable objective as
the guide of the model-building process is the key question of
clustering. Over the past decades, a large number of clustering
methods with shallow models have been proposed, including
centroid-based clustering [7], [8], density-based clustering [9],
[10], [11], [12], [13], distribution-based clustering [14], hierar-
chical clustering [15], ensemble clustering [16], [17], multi-view
clustering [18], [19], [20], [21], [22], [23], etc. These shallow
models are effective only when the features are representative,
while their performance on complex data is usually limited
due to their weak feature learning capability.
In order to map the original complex data to a feature space
that is easy to cluster, many clustering methods focus on feature
extraction or feature transformation, such as PCA [24], kernel
method [25], spectral method [26], deep neural network [27], etc.
Among these methods, the deep neural network is a promising ap-
proach because of its excellent nonlinear mapping capability and
its flexibility in different scenarios. A well-designed deep learning
based clustering approach (referred to as deep clustering) aims at
effectively extracting more clustering-friendly features from data
and performing clustering with learned features simultaneously.
Yazhou Ren, Jingyu Pu, Zhimeng Yang, Jie Xu, Guofeng Li, and Xiaorong
Pu are with the University of Electronic Science and Technology of China,
Chengdu 611731, China. Yazhou Ren is the corresponding author. E-mail:
yazhou.ren@uestc.edu.cn.
Philip S. Yu is with the University of Illinois at Chicago, IL 60607, USA.
Lifang He is with Lehigh University, PA 18015, USA.
Manuscript received Oct. 2022.

Much research has been done in the field of deep clustering
and there are also some surveys about deep clustering methods
[28], [29], [30], [31]. Specifically, existing systematic reviews for
deep clustering mainly focus on the single-view clustering tasks
and the architectures of neural networks. For example, Aljalbout
et al. [28] focus only on deep single-view clustering methods
which are based on deep autoencoder (AE or DAE). Min et
al. [29] classify deep clustering methods from the perspective
of different deep networks. Nutakki et al. [30] divide deep
single-view clustering methods into three categories according
to their training strategies: multi-step sequential deep clustering,
joint deep clustering, and closed-loop multi-step deep clustering.
Zhou et al. [31] categorize deep single-view clustering methods
by the way the feature learning and clustering modules interact.
However, in the real world, the data to be clustered often come with
associated information, e.g., a person's taste in reading is correlated
with their taste in movies, and side-face and full-face images of the
same person should be assigned the same label. For such data, deep
clustering methods based on semi-supervised learning, multi-view
learning, and transfer learning have also made significant progress.
Unfortunately, existing reviews do not discuss them in depth.
Therefore, it is important to classify deep clustering from
the perspective of data sources and initial conditions. In this
survey, we summarize deep clustering from the perspective of the
initial settings of the data combined with deep learning methodology.
We introduce the latest progress of deep clustering from the
perspective of network and data structure as shown in Fig. 1.
Specifically, we organize the deep clustering methods into the
following four categories:
Deep single-view clustering
For conventional clustering tasks, it is often assumed that
the data are of the same form and structure, also known as single-
view or single-modal data. The extraction of representations for
these data by deep neural networks (DNNs) is a significant
characteristic of deep clustering. However, what is more note-
worthy is the variety of deep learning techniques applied, which
are highly correlated with the structure of the DNNs. To compare the
technical routes of specific DNNs, we divide those algorithms into
five categories: deep autoencoder (DAE) based deep clustering,
6LQJOHYLHZ
'HHS
FOXVWHULQJ
6HPL
VXSHUYLVHG
0XOWLYLHZ
7UDQVIHU
OHDUQLQJ
'(&EDVHG
6XEVSDFH
FOXVWHULQJEDVHG
*11EDVHG
'11EDVHG *$1EDVHG
'DWD VWUXFWXUH
9$(EDVHG
'11EDVHG
'$(EDVHG
*$1EDVHG
*11EDVHG
1HWZRUN
Fig. 1: The directory tree of this survey.
deep neural network (DNN) based deep clustering, variational
autoencoder (VAE) based deep clustering, generative adversarial
network (GAN) based deep clustering, and graph neural network
(GNN) based deep clustering.
Deep clustering based on semi-supervised learning
When the data to be processed contain a small amount of
prior constraint information, traditional clustering methods cannot effectively
utilize this prior information, and semi-supervised clustering is an
effective way to address this problem. At present, research on
deep semi-supervised clustering has not been well explored.
However, semi-supervised deep clustering is worth studying because it is
feasible to turn a clustering method into a semi-supervised one
by adding the additional information as a constraint loss to the
model.
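To make this idea concrete, the following sketch (our own illustrative example, not a method from the surveyed literature) shows one simple way to turn pairwise prior information into a constraint loss on embedded features: must-link pairs are pulled together and cannot-link pairs are pushed at least a margin apart. The margin, the weighting, and the tensor shapes are assumptions made for the sketch; the matrix a corresponds to the pairwise constraint matrix A used later in this survey.

```python
import torch

def pairwise_constraint_loss(z, a, margin=1.0):
    """Toy semi-supervised penalty on embedded features (illustrative only).

    z: (n, d) tensor of embedded representations.
    a: (n, n) pairwise constraint matrix, a[i, j] = 1 for must-link,
       -1 for cannot-link, 0 when no prior information is given.
    """
    dist = torch.cdist(z, z)                       # pairwise Euclidean distances
    must_link = (a == 1).float()
    cannot_link = (a == -1).float()
    # Pull must-link pairs together; push cannot-link pairs at least `margin` apart.
    loss = (must_link * dist.pow(2)).sum() \
         + (cannot_link * torch.clamp(margin - dist, min=0).pow(2)).sum()
    n_constraints = must_link.sum() + cannot_link.sum()
    return loss / torch.clamp(n_constraints, min=1.0)

# A semi-supervised variant of a deep clustering loss could then be, e.g.,
# total_loss = clustering_loss + lambda_constraint * pairwise_constraint_loss(z, a)
```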
Deep clustering based on multi-view learning
In the real world, data are often obtained from different
feature collectors or have different structures. We call those data
“multi-view data” or “multi-modal data”, where each sample has
multiple representations. The purpose of deep clustering based on
multi-view learning is to utilize the consistent and complementary
information contained in multi-view data to improve clustering
performance. In addition, the idea of multi-view learning may have
guiding significance for deep single-view clustering. In this survey,
we summarize deep multi-view clustering into three categories:
deep embedded clustering based, subspace clustering based, and
graph neural network based.
Deep clustering based on transfer learning
For a task that has a limited number of instances and high
dimensionality, sometimes we can find an assistant task to offer additional
information. For example, if task A is similar to another task B and
B has more information for clustering than A (B is labeled or B is
easier to cluster than A), it is useful to transfer the information
from B to A. Transfer learning for unsupervised domain adaptation
(UDA) has been boosted in recent years; it involves two domains: a
labeled source domain and an unlabeled target domain.
The goal of transfer learning is to apply the knowledge or patterns
learned from the source task to a different but related target
task. Deep clustering methods based on transfer learning aim to
improve the performance of current clustering tasks by utilizing
information from relevant tasks.
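As a rough illustration of the UDA setting (a generic sketch, not a specific method reviewed later), a shared feature extractor can be trained with a supervised loss on the labeled source domain plus a simple alignment term, here a linear-kernel MMD between source and target features. The encoder, classifier, and weighting coefficient below are placeholders.

```python
import torch
import torch.nn.functional as F

def linear_mmd(zs, zt):
    """Squared distance between mean source and target features (linear-kernel MMD)."""
    return (zs.mean(dim=0) - zt.mean(dim=0)).pow(2).sum()

def uda_step(encoder, classifier, xs, ys, xt, optimizer, lam=0.1):
    """One illustrative UDA training step: source supervision plus feature alignment."""
    zs, zt = encoder(xs), encoder(xt)           # shared feature extractor
    loss = F.cross_entropy(classifier(zs), ys)  # supervised loss on the labeled source domain
    loss = loss + lam * linear_mmd(zs, zt)      # align source and target feature statistics
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```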
TABLE 1: Notations and their descriptions in this paper.
Notation        Description
i               a counter variable
j               a counter variable
| . |           the length of a set
|| . ||         the 2-norm of a vector
X               the data for clustering
X^s             the data in the source domain (UDA methods)
Y^s             the labels of source domain instances (UDA methods)
X^t             the data in the target domain (UDA methods)
D^s             the source domain of UDA methods
D^t             the target domain of UDA methods
x_i             the vector of an original data sample
X^i             the i-th view of X in multi-view learning
Ŷ               the predicted labels of X
S               the soft data assignments of X
R               the adjusted assignments of S
A               the pairwise constraint matrix
a_ij            the constraint of sample i and sample j
z_i             the vector of the embedded representation of x_i
ε               the noise used in generative models
E               the expectation
L_n             the network loss
L_c             the clustering loss
L_ext           the extra task loss
L_rec           the reconstruction loss of the autoencoder network
L_gan           the loss of GAN
L_ELBO          the loss of the evidence lower bound
k               the number of clusters
n               the number of data samples
µ               the mean of the Gaussian distribution
θ               the variance of the Gaussian distribution
KL(. || .)      the Kullback-Leibler divergence
p(.)            the probability distribution
p(.|.)          the conditional probability distribution
p(., .)         the joint probability distribution
q(.)            the approximate probability distribution of p(.)
q(.|.)          the approximate probability distribution of p(.|.)
q(., .)         the approximate probability distribution of p(., .)
f(.)            the feature extractor
φ_e(.)          the encoder network of AE or VAE
φ_r(.)          the decoder network of AE or VAE
φ_g(.)          the generative network of GAN
φ_d(.)          the discriminative network of GAN
Q               the graph adjacency matrix
D               the degree matrix of Q
C               the feature matrix of a graph
H               the node hidden feature matrix
W               the learnable model parameters
It is necessary to pay attention to the different characteristics
and conditions of the clustering data before studying the corre-
sponding clustering methods. In this survey, existing deep cluster-
ing methods are systematically classified from data sources and
initial conditions. The advantages, disadvantages, and applicable
conditions of different clustering methods are analyzed. Finally,
we present some interesting research directions in the field of deep
clustering.
2 DEFINITIONS AND PRELIMINARIES
We introduce the notations in this section. Throughout this
paper, we use uppercase letters to denote matrices and lowercase
letters to denote vectors. Unless otherwise stated, the notations
used in this paper are summarized in Table 1.
This survey will introduce four kinds of deep clustering
problems based on different background conditions. Here, we
define these problems formally. Given a set of data samples X,
we aim at finding a mapping function F which can map X into k
clusters. The mapping result is denoted by Ŷ. So the tasks we
cope with are:
(1) Deep single-view clustering:
$$F(X) \rightarrow \hat{Y}. \qquad (1)$$
(2) Semi-supervised deep clustering:
$$F(X, A) \rightarrow \hat{Y}, \qquad (2)$$
where $A$ is a constraint matrix.
(3) Deep multi-view clustering:
$$F(X^1, \ldots, X^n) \rightarrow \hat{Y}, \qquad (3)$$
where $X^i$ is the $i$-th view of $X$.
(4) Deep clustering with domain adaptation:
$$F(X^s, Y^s, X^t) \rightarrow \hat{Y}, \qquad (4)$$
where $(X^s, Y^s)$ is the labeled source domain and $X^t$ is the
unlabeled target domain.
3 DEEP SINGLE-VIEW CLUSTERING
The theory of representation learning [32] shows the impor-
tance of feature learning (or representation learning) in machine
learning tasks. However, deep representation learning is mostly
supervised learning that requires a large amount of labeled data. As we men-
tioned before, the obstacle of the deep clustering problem is what
can be used to guide the training process, like the labels in supervised
problems. The most “supervised” information in deep clustering is
the data itself. So how can we train an effective feature extractor to
get good representation? According to the way the feature extrac-
tor is trained, we divide deep single-view clustering algorithms
into five categories: DAE-based, DNN-based, VAE-based, GAN-
based, and GNN-based. The difference among these methods mainly
lies in the loss components, where the loss terms are defined in
Table 1 and explained below:
DAE-based / GNN-based: $L = L_{rec} + L_c$,
DNN-based: $L = L_{ext} + L_c$,
VAE-based: $L = L_{ELBO} + L_c$,
GAN-based: $L = L_{gan} + L_c$.
In unsupervised learning, the issue we cope with is to train
a reliable feature extractor without labels. There are mainly two
ways in existing works: 1) A loss function that optimizes the
pseudo labels according to the principle: narrowing the inner-
cluster distance and widening the inter-cluster distance. 2) An ex-
tra task that can help train the feature extractor. For the clustering
methods with specialized feature extractors, such as autoencoder,
the reconstruction loss Lrec can be interpreted as the extra task.
In this paper, the clustering-oriented loss L_c indicates the loss
of the clustering objective. DAE-based/GNN-based methods use
an autoencoder/graph autoencoder as the feature extractor, so their
loss functions are always composed of a reconstruction loss L_rec
and a clustering-oriented loss L_c. By contrast, DNN-based
methods optimize the feature extractor with extra tasks or other
strategies (L_ext). VAE-based methods optimize the loss of the evidence
lower bound (L_ELBO). GAN-based methods are based on the
generative adversarial loss L_gan. Based on these five categories,
existing deep single-view clustering methods are summarized in
Table 2 and Table 3.
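The pattern shared by these categories can be sketched as a single training loop in which a feature-extractor loss and a clustering-oriented loss are minimized together. The PyTorch-style skeleton below uses our own naming; the two loss callables stand for whichever combination from the list above a concrete method uses, and the weight gamma is an illustrative assumption, not a value prescribed by any surveyed method.

```python
def train_deep_clustering(model, loader, feature_loss_fn, clustering_loss_fn,
                          optimizer, epochs=50, gamma=0.1):
    """Generic single-view training skeleton: minimize L = L_feature + gamma * L_c.

    feature_loss_fn plays the role of L_rec (DAE-/GNN-based), L_ext (DNN-based),
    L_ELBO (VAE-based) or L_gan (GAN-based); clustering_loss_fn is L_c.
    Both callables are supplied by the concrete method.
    """
    for _ in range(epochs):
        for x in loader:
            z = model(x)                                        # learned representation
            loss = feature_loss_fn(model, x) + gamma * clustering_loss_fn(z)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```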
3.1 DAE-based
The autoencoder network [90] was originally designed for
unsupervised representation learning of data and can learn a highly
non-linear mapping function. Using deep autoencoder (DAE) [91]
is a common way to develop deep clustering methods. DAE aims
to learn a low-dimensional embedding feature space by minimiz-
ing the reconstruction loss of the network, which is defined as:
$$L_{rec} = \min \frac{1}{n}\sum_{i=1}^{n} \left\| x_i - \phi_r\big(\phi_e(x_i)\big) \right\|^2, \qquad (5)$$
where $\phi_e(\cdot)$ and $\phi_r(\cdot)$ represent the encoder network and decoder
network of the autoencoder, respectively. Using the encoder as a
feature extractor, various clustering objective functions have been
proposed. We summarize these deep autoencoder based cluster-
ing methods as DAE-based deep clustering. In DAE-based deep
clustering methods, there are two main ways to get the labels.
The first way embeds the data into low-dimensional features and
then clusters the embedded features with traditional clustering
methods such as the k-means algorithm [92]. The second way
jointly optimizes the feature extractor and the clustering results.
We refer to these two approaches as “separate analysis” and “joint
analysis” respectively, and elaborate on them below.
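As a minimal sketch of the first route, the code below pretrains a plain fully connected autoencoder with the reconstruction loss of Eq. (5) and then runs k-means on the embedded features. The layer sizes, optimizer, and training budget are illustrative assumptions rather than the settings of any particular surveyed method.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class AutoEncoder(nn.Module):
    """A plain fully connected autoencoder; layer sizes are illustrative."""
    def __init__(self, in_dim, latent_dim=10):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(in_dim, 500), nn.ReLU(),
                                   nn.Linear(500, latent_dim))   # encoder phi_e
        self.phi_r = nn.Sequential(nn.Linear(latent_dim, 500), nn.ReLU(),
                                   nn.Linear(500, in_dim))       # decoder phi_r

    def forward(self, x):
        z = self.phi_e(x)
        return z, self.phi_r(z)

def separate_analysis(x, k, epochs=100, lr=1e-3):
    """Step 1: train the DAE with the reconstruction loss of Eq. (5); Step 2: k-means on z."""
    model = AutoEncoder(x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        z, x_rec = model(x)
        loss = ((x - x_rec) ** 2).sum(dim=1).mean()   # L_rec of Eq. (5)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        z, _ = model(x)
    return KMeans(n_clusters=k, n_init=10).fit_predict(z.numpy())
```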
“Separate analysis” means that learning features and clus-
tering data are performed separately. In order to solve the prob-
lem that representations learned by “separate analysis” are not
cluster-oriented due to its innate characteristics, Huang et al.
propose a deep embedding network for clustering (DEN) [34],
which imposes two constraints based on DAE objective: locality-
preserving constraint and group sparsity constraint. Locality-
preserving constraint urges the embedded features in the same
cluster to be similar. Group sparsity constraint aims to diagonalize
the affinity of representations. These two constraints improve the
clustering performance by reducing the inner-cluster distance and
expanding the inter-cluster distance. The objectives of most clustering
methods based on DAE work on these two kinds of
distances. So, in Table 2, we summarize these methods from the
perspective of “characteristics”, which shows the way to optimize
the inner-cluster distance and inter-cluster distance.
Peng et al. [35] propose a novel deep learning based frame-
work in the field of subspace clustering, namely, deep subspace
clustering with sparsity prior (PARTY). PARTY enhances autoen-
coder by considering the relationship between different samples
(i.e., structure prior) and solves the limitation of traditional sub-
space clustering methods. As far as we know, PARTY is the first
deep learning based subspace clustering method, and it is the first
work to introduce the global structure prior into the neural network
for unsupervised learning. Different from PARTY, Ji et al. [38]
propose another deep subspace clustering network (DSC-Nets)
architecture to learn non-linear mappings and introduce a self-
expressive layer to directly learn the affinity matrix.
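To give a feel for the self-expressive layer, the sketch below (a simplified illustration, not the exact DSC-Nets architecture) learns a coefficient matrix C such that every embedded sample is approximately reconstructed from the other samples, Z ≈ CZ; the regularization weight and the initialization are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class SelfExpressiveLayer(nn.Module):
    """Learn a coefficient matrix C with Z ~= C @ Z and a zero diagonal."""
    def __init__(self, n_samples):
        super().__init__()
        self.c = nn.Parameter(1e-4 * torch.randn(n_samples, n_samples))

    def forward(self, z):
        c = self.c - torch.diag(torch.diag(self.c))   # forbid trivial self-reconstruction
        return c @ z, c

def self_expressive_loss(z, layer, lambda_reg=1.0):
    """Self-expression loss on embedded features z (n, d) plus a simple regularizer on C."""
    z_hat, c = layer(z)
    return ((z - z_hat) ** 2).sum() + lambda_reg * (c ** 2).sum()

# After training, a symmetric affinity such as W = |C| + |C|.T can be fed to spectral clustering.
```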
Density-based clustering [9], [93] is another popular kind of
clustering method. Ren et al. [50] propose deep density-based im-
age clustering (DDIC) that uses DAE to learn the low-dimensional
feature representations and then performs density-based clustering
on the learned features. In particular, DDIC does not need to know
the number of clusters in advance.
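The density-based route follows the same two-step recipe but swaps k-means for a density-based algorithm that does not require k. The sketch below uses scikit-learn's DBSCAN purely as a stand-in for the density clustering step; DDIC's own density estimation procedure differs, and the eps/min_samples values are illustrative.

```python
from sklearn.cluster import DBSCAN

def density_based_clustering(z, eps=0.5, min_samples=5):
    """Cluster learned embeddings z (an (n, d) array) without specifying k in advance.

    Points that DBSCAN treats as noise receive the label -1.
    """
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(z)
```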
“Joint analysis” aims at learning a representation that is more
suitable for clustering. It differs from “separate analysis” approaches,
in which deep learning and clustering are carried out separately
and the neural network has no clustering-oriented objective when
learning the features of the data.
TABLE 2: The summaries of DAE-based and DNN-based methods in deep single-view clustering. We summarize the DAE-based methods based on “Jointly or Separately” and “Characteristics”.

Net | Methods | Jointly or Separately | Characteristics
DAE | AEC (2013) [33] | Separately | Optimize the distance between z_i and its closest cluster centroid.
DAE | DEN (2014) [34] | Separately | Locality-preserving constraint, group sparsity constraint.
DAE | PARTY (2016) [35] | Separately | Subspace clustering.
DAE | DEC (2016) [36] | Jointly | Optimize the distribution of assignments.
DAE | IDEC (2017) [37] | Jointly | Improve DEC [36] with local structure preservation.
DAE | DSC-Nets (2017) [38] | Separately | Subspace clustering.
DAE | DEPICT (2017) [39] | Jointly | Convolutional autoencoder and relative entropy minimization.
DAE | DCN (2017) [40] | Jointly | Take the objective of k-means as the clustering loss.
DAE | DMC (2017) [41] | Jointly | Multi-manifold clustering.
DAE | DEC-DA (2018) [42] | Jointly | Improve DEC [36] with data augmentation.
DAE | DBC (2018) [43] | Jointly | Self-paced learning.
DAE | DCC (2018) [44] | Separately | Extend robust continuous clustering [45] with autoencoder. Not given k.
DAE | DDLSC (2018) [46] | Jointly | Pairwise loss function.
DAE | DDC (2019) [47] | Separately | Global and local constraints of relationships.
DAE | DSCDAE (2019) [48] | Jointly | Subspace clustering.
DAE | NCSC (2019) [49] | Jointly | Dual autoencoder network.
DAE | DDIC (2020) [50] | Separately | Density-based clustering. Not given k.
DAE | SC-EDAE (2020) [51] | Jointly | Spectral clustering.
DAE | ASPC-DA (2020) [52] | Jointly | Self-paced learning and data augmentation.
DAE | ALRDC (2020) [53] | Jointly | Adversarial learning.
DAE | N2D (2021) [54] | Separately | Manifold learning.
DAE | AGMDC (2021) [55] | Jointly | Gaussian Mixture Model. Improve the inter-cluster distance.

Net | Methods | Clustering-oriented loss | Characteristics
DNN | JULE (2016) [56] | Yes | Agglomerative clustering.
DNN | DDBC (2017) [57] | Yes | Information theoretic measures.
DNN | DAC (2017) [58] | No | Self-adaptation learning. Binary pairwise-classification.
DNN | DeepCluster (2018) [59] | No | Use traditional clustering methods to assign labels.
DNN | CCNN (2018) [60] | No | Mini-batch k-means. Feature drift compensation for large-scale image data.
DNN | ADC (2018) [61] | Yes | Centroid embeddings.
DNN | ST-DAC (2019) [62] | No | Spatial transformer layers. Binary pairwise-classification.
DNN | RTM (2019) [63] | No | Random triplet mining.
DNN | IIC (2019) [64] | No | Mutual information. Generated image pairs.
DNN | DCCM (2019) [65] | No | Triplet mutual information. Generated image pairs.
DNN | MMDC (2019) [66] | No | Multi-modal. Generated image pairs.
DNN | SCAN (2020) [67] | Yes | Decouple feature learning and clustering. Nearest neighbors mining.
DNN | DRC (2020) [68] | Yes | Contrastive learning.
DNN | PICA (2020) [69] | Yes | Maximize the “global” partition confidence.
TABLE 3: The summaries of VAE-, GAN-, and GNN-based methods in deep single-view clustering.

Net | Methods | Characteristics
VAE | VaDE (2016) [70] | Gaussian mixture variational autoencoder.
VAE | GMVAE (2016) [71] | Gaussian mixture variational autoencoder. Unbalanced clustering.
VAE | MFVDC (2017) [72] | Continuous Gumbel-Softmax distribution.
VAE | LTVAE (2018) [73] | Latent tree model.
VAE | VLAC (2019) [74] | Variational ladder autoencoders.
VAE | VAEIC (2020) [75] | No pre-training process.
VAE | S3VDC (2020) [76] | Improvement on four generic algorithmic aspects.
VAE | DSVAE (2021) [77] | Spherical latent embeddings.
VAE | DVAE (2022) [78] | Additional classifier to distinguish clusters.

Net | Methods | With DAE | Characteristics
GAN | CatGAN (2015) [79] | No | Can be applied to both unsupervised and semi-supervised tasks.
GAN | DAGC (2017) [80] | Yes | Build an encoder to make the data representations easier to cluster.
GAN | DASC (2018) [81] | Yes | Subspace clustering.
GAN | ClusterGAN-SPL (2019) [82] | No | No discrete latent variables; applies self-paced learning based on [83].
GAN | ClusterGAN (2019) [83] | No | Train a GAN with a clustering-specific loss.
GAN | ADEC (2020) [84] | Yes | Reconstruction loss and adversarial loss are optimized in turn.
GAN | IMDGC (2022) [85] | No | Integrates a hierarchical generative adversarial network and mutual information maximization.

Net | Methods | Characteristics
GNN | DAEGC (2019) [71] | Perform graph clustering and learn graph embedding in a unified framework.
GNN | AGC (2019) [86] | Attributed graph clustering.
GNN | AGAE (2019) [87] | Ensemble clustering.
GNN | AGCHK (2020) [88] | Utilize heat kernel in attributed graphs.
GNN | SDCN (2020) [89] | Integrate the structural information into deep clustering.
Most subsequent deep clustering studies combine clustering objectives with feature
learning, which enables the neural network to learn features
conducive to clustering from the latent distribution of the data. In
this survey, those methods are summarized as “joint analysis”.
Inspired by the idea of the non-parametric algorithm t-SNE [94],
Xie et al. [36] propose a joint framework to optimize the feature
learning and clustering objectives, which is named deep embedded
clustering (DEC).
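Its joint objective is commonly written as a soft assignment of embedded points to learnable cluster centroids with a Student's t-kernel, matched to a sharpened target distribution via a KL divergence. The sketch below is a compact rendering of that formulation; centroid initialization (e.g., by k-means) and the schedule for refreshing the target distribution are omitted.

```python
import torch

def soft_assignment(z, centroids, alpha=1.0):
    """Student's t-kernel similarity q_ij between embeddings z and cluster centroids."""
    dist_sq = torch.cdist(z, centroids).pow(2)
    q = (1.0 + dist_sq / alpha).pow(-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    """Sharpened targets p_ij that emphasize high-confidence assignments."""
    weight = q.pow(2) / q.sum(dim=0, keepdim=True)
    return weight / weight.sum(dim=1, keepdim=True)

def dec_clustering_loss(z, centroids):
    """KL(P || Q) between the target distribution and the soft assignments."""
    q = soft_assignment(z, centroids)
    p = target_distribution(q).detach()   # targets are treated as fixed within each update
    return torch.sum(p * torch.log(p / q))
```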