Deep Clustering: A Comprehensive Survey
Yazhou Ren, Member, IEEE, Jingyu Pu, Zhimeng Yang, Jie Xu, Guofeng Li, Xiaorong Pu,
Philip S. Yu, Fellow, IEEE, Lifang He, Member, IEEE
Abstract—Cluster analysis plays an indispensable role in machine learning and data mining. Learning a good data representation is
crucial for clustering algorithms. Recently, deep clustering, which can learn clustering-friendly representations using deep neural
networks, has been broadly applied in a wide range of clustering tasks. Existing surveys for deep clustering mainly focus on the
single-view fields and the network architectures, ignoring the complex application scenarios of clustering. To address this issue, in this
paper we provide a comprehensive survey of deep clustering from the perspective of data sources. With different data sources and initial
conditions, we systematically distinguish the clustering methods in terms of methodology, prior knowledge, and architecture.
Concretely, deep clustering methods are introduced according to four categories, i.e., traditional single-view deep clustering,
semi-supervised deep clustering, deep multi-view clustering, and deep transfer clustering. Finally, we discuss the open challenges and
potential future opportunities in different fields of deep clustering.
Index Terms—Deep clustering; semi-supervised clustering; multi-view clustering; transfer learning
1 INTRODUCTION
WITH the development of online media, abundant data with
high complexity can be gathered easily. Through careful
analysis of these data, we can extract valuable insights and apply these
conclusions in many fields, such as face recognition [1], [2],
sentiment analysis [3], [4], intelligent manufacturing [5], [6], etc.
A model that can classify data with different
labels is the basis of many applications. For labeled data, it is
taken for granted to use the labels as the most important information
to guide learning. For unlabeled data, finding a quantifiable objective as
the guide of the model-building process is the key question of
clustering. Over the past decades, a large number of clustering
methods with shallow models have been proposed, including
centroid-based clustering [7], [8], density-based clustering [9],
[10], [11], [12], [13], distribution-based clustering [14], hierar-
chical clustering [15], ensemble clustering [16], [17], multi-view
clustering [18], [19], [20], [21], [22], [23], etc. These shallow
models are effective only when the features are representative,
while their performance on complex data is usually limited
due to their weak feature learning capability.
In order to map the original complex data to a feature space
that is easy to cluster, many clustering methods focus on feature
extraction or feature transformation, such as PCA [24], kernel
method [25], spectral method [26], deep neural network [27], etc.
Among these methods, the deep neural network is a promising ap-
proach because of its excellent nonlinear mapping capability and
its flexibility in different scenarios. A well-designed deep learning
based clustering approach (referred to as deep clustering) aims at
effectively extracting more clustering-friendly features from data
and performing clustering with learned features simultaneously.
Yazhou Ren, Jingyu Pu, Zhimeng Yang, Jie Xu, Guofeng Li, and Xiaorong
Pu are with the University of Electronic Science and Technology of China,
Chengdu 611731, China. Yazhou Ren is the corresponding author. E-mail:
yazhou.ren@uestc.edu.cn.
Philip S. Yu is with the University of Illinois at Chicago, IL 60607, USA.
Lifang He is with Lehigh University, PA 18015, USA.
Manuscript received Oct. 2022.

Much research has been done in the field of deep clustering
and there are also some surveys about deep clustering methods
[28], [29], [30], [31]. Specifically, existing systematic reviews for
deep clustering mainly focus on the single-view clustering tasks
and the architectures of neural networks. For example, Aljalbout
et al. [28] focus only on deep single-view clustering methods
which are based on deep autoencoder (AE or DAE). Min et
al. [29] classify deep clustering methods from the perspective
of different deep networks. Nutakki et al. [30] divide deep
single-view clustering methods into three categories according
to their training strategies: multi-step sequential deep clustering,
joint deep clustering, and closed-loop multi-step deep clustering.
Zhou et al. [31] categorize deep single-view clustering methods
by the way the feature learning and clustering modules interact.
However, in the real world, the data to be clustered often come with
associated information, e.g., a person's taste in reading is correlated
with their taste in movies, and side-face and full-face images of the
same person should be assigned the same label. For such data, deep
clustering methods based on semi-supervised learning, multi-view
learning, and transfer learning have also made significant progress.
Unfortunately, existing reviews do not discuss them in depth.
Therefore, it is important to classify deep clustering from
the perspective of data sources and initial conditions. In this
survey, we summarize deep clustering from the perspective of the
initial settings of the data combined with deep learning methodology.
We introduce the latest progress of deep clustering from the
perspective of network and data structure as shown in Fig. 1.
Specifically, we organize the deep clustering methods into the
following four categories:
Deep single-view clustering
For conventional clustering tasks, it is often assumed that
the data are of the same form and structure, also known as single-
view or single-modal data. The extraction of representations for
these data by deep neural networks (DNNs) is a significant
characteristic of deep clustering. However, what is more note-
worthy is the variety of deep learning techniques applied, which
are highly correlated with the structure of the DNNs. To compare the
technical routes of specific DNNs, we divide those algorithms into
five categories: deep autoencoder (DAE) based deep clustering,
6LQJOHYLHZ
'HHS
FOXVWHULQJ
6HPL
VXSHUYLVHG
0XOWLYLHZ
7UDQVIHU
OHDUQLQJ
'(&EDVHG
6XEVSDFH
FOXVWHULQJEDVHG
*11EDVHG
'11EDVHG *$1EDVHG
'DWD VWUXFWXUH
9$(EDVHG
'11EDVHG
'$(EDVHG
*$1EDVHG
*11EDVHG
1HWZRUN
Fig. 1: The directory tree of this survey.
deep neural network (DNN) based deep clustering, variational
autoencoder (VAE) based deep clustering, generative adversarial
network (GAN) based deep clustering, and graph neural network
(GNN) based deep clustering.
Deep clustering based on semi-supervised learning
When the data to be processed contain a small amount of
prior constraint information, traditional clustering methods cannot effectively
utilize this prior information, and semi-supervised clustering is an
effective way to address this problem. At present, research on
deep semi-supervised clustering has not been well explored.
However, semi-supervised deep clustering is worth studying because it is
feasible to turn a clustering method into a semi-supervised one
by adding the additional information as a constraint loss to the
model.
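To make this idea concrete, the following sketch (our own illustrative example, not a method from the surveyed literature) shows one simple way to turn pairwise prior information into a constraint loss on embedded features: must-link pairs are pulled together and cannot-link pairs are pushed at least a margin apart. The margin, the weighting, and the tensor shapes are assumptions made for the sketch; the matrix a corresponds to the pairwise constraint matrix A used later in this survey.

```python
import torch

def pairwise_constraint_loss(z, a, margin=1.0):
    """Toy semi-supervised penalty on embedded features (illustrative only).

    z: (n, d) tensor of embedded representations.
    a: (n, n) pairwise constraint matrix, a[i, j] = 1 for must-link,
       -1 for cannot-link, 0 when no prior information is given.
    """
    dist = torch.cdist(z, z)                       # pairwise Euclidean distances
    must_link = (a == 1).float()
    cannot_link = (a == -1).float()
    # Pull must-link pairs together; push cannot-link pairs at least `margin` apart.
    loss = (must_link * dist.pow(2)).sum() \
         + (cannot_link * torch.clamp(margin - dist, min=0).pow(2)).sum()
    n_constraints = must_link.sum() + cannot_link.sum()
    return loss / torch.clamp(n_constraints, min=1.0)

# A semi-supervised variant of a deep clustering loss could then be, e.g.,
# total_loss = clustering_loss + lambda_constraint * pairwise_constraint_loss(z, a)
```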
Deep clustering based on multi-view learning
In the real world, data are often obtained from different
feature collectors or have different structures. We call those data
“multi-view data” or “multi-modal data”, where each sample has
multiple representations. The purpose of deep clustering based on
multi-view learning is to utilize the consistent and complementary
information contained in multi-view data to improve clustering
performance. In addition, the idea of multi-view learning may have
guiding significance for deep single-view clustering. In this survey,
we summarize deep multi-view clustering into three categories:
deep embedded clustering based, subspace clustering based, and
graph neural network based.
Deep clustering based on transfer learning
For a task that has a limited number of instances and high
dimensionality, sometimes we can find an assistant task to offer additional
information. For example, if task A is similar to another task B and
B has more information for clustering than A (B is labeled or B is
easier to cluster than A), it is useful to transfer the information
from B to A. Transfer learning for unsupervised domain adaptation
(UDA) has been boosted in recent years; it involves two domains: a
labeled source domain and an unlabeled target domain.
The goal of transfer learning is to apply the knowledge or patterns
learned from the source task to a different but related target
task. Deep clustering methods based on transfer learning aim to
improve the performance of current clustering tasks by utilizing
information from relevant tasks.
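As a rough illustration of the UDA setting (a generic sketch, not a specific method reviewed later), a shared feature extractor can be trained with a supervised loss on the labeled source domain plus a simple alignment term, here a linear-kernel MMD between source and target features. The encoder, classifier, and weighting coefficient below are placeholders.

```python
import torch
import torch.nn.functional as F

def linear_mmd(zs, zt):
    """Squared distance between mean source and target features (linear-kernel MMD)."""
    return (zs.mean(dim=0) - zt.mean(dim=0)).pow(2).sum()

def uda_step(encoder, classifier, xs, ys, xt, optimizer, lam=0.1):
    """One illustrative UDA training step: source supervision plus feature alignment."""
    zs, zt = encoder(xs), encoder(xt)           # shared feature extractor
    loss = F.cross_entropy(classifier(zs), ys)  # supervised loss on the labeled source domain
    loss = loss + lam * linear_mmd(zs, zt)      # align source and target feature statistics
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```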
TABLE 1: Notations and their descriptions in this paper.
Notation        Description
i               a counter variable
j               a counter variable
| . |           the length of a set
|| . ||         the 2-norm of a vector
X               the data for clustering
X^s             the data in the source domain (UDA methods)
Y^s             the labels of source domain instances (UDA methods)
X^t             the data in the target domain (UDA methods)
D^s             the source domain of UDA methods
D^t             the target domain of UDA methods
x_i             the vector of an original data sample
X^i             the i-th view of X in multi-view learning
Ŷ               the predicted labels of X
S               the soft data assignments of X
R               the adjusted assignments of S
A               the pairwise constraint matrix
a_ij            the constraint of sample i and sample j
z_i             the vector of the embedded representation of x_i
ε               the noise used in generative models
E               the expectation
L_n             the network loss
L_c             the clustering loss
L_ext           the extra task loss
L_rec           the reconstruction loss of the autoencoder network
L_gan           the loss of GAN
L_ELBO          the loss of the evidence lower bound
k               the number of clusters
n               the number of data samples
µ               the mean of the Gaussian distribution
θ               the variance of the Gaussian distribution
KL(. || .)      the Kullback-Leibler divergence
p(.)            the probability distribution
p(.|.)          the conditional probability distribution
p(., .)         the joint probability distribution
q(.)            the approximate probability distribution of p(.)
q(.|.)          the approximate probability distribution of p(.|.)
q(., .)         the approximate probability distribution of p(., .)
f(.)            the feature extractor
φ_e(.)          the encoder network of AE or VAE
φ_r(.)          the decoder network of AE or VAE
φ_g(.)          the generative network of GAN
φ_d(.)          the discriminative network of GAN
Q               the graph adjacency matrix
D               the degree matrix of Q
C               the feature matrix of a graph
H               the node hidden feature matrix
W               the learnable model parameters
It is necessary to pay attention to the different characteristics
and conditions of the clustering data before studying the corre-
sponding clustering methods. In this survey, existing deep cluster-
ing methods are systematically classified from data sources and
initial conditions. The advantages, disadvantages, and applicable
conditions of different clustering methods are analyzed. Finally,
we present some interesting research directions in the field of deep
clustering.
2 DEFINITIONS AND PRELIMINARIES
We introduce the notations in this section. Throughout this
paper, we use uppercase letters to denote matrices and lowercase
letters to denote vectors. Unless otherwise stated, the notations
used in this paper are summarized in Table 1.
This survey will introduce four kinds of deep clustering
problems based on different background conditions. Here, we
define these problems formally. Given a set of data samples X,
we aim at finding a mapping function F which can map X into k
clusters. The mapping result is denoted by Ŷ. So the tasks we
cope with are:
(1) Deep single-view clustering:
$$F(X) \rightarrow \hat{Y}. \qquad (1)$$
(2) Semi-supervised deep clustering:
$$F(X, A) \rightarrow \hat{Y}, \qquad (2)$$
where $A$ is a constraint matrix.
(3) Deep multi-view clustering:
$$F(X^1, \ldots, X^n) \rightarrow \hat{Y}, \qquad (3)$$
where $X^i$ is the $i$-th view of $X$.
(4) Deep clustering with domain adaptation:
$$F(X^s, Y^s, X^t) \rightarrow \hat{Y}, \qquad (4)$$
where $(X^s, Y^s)$ is the labeled source domain and $X^t$ is the
unlabeled target domain.
3 DEEP SINGLE-VIEW CLUSTERING
The theory of representation learning [32] shows the impor-
tance of feature learning (or representation learning) in machine
learning tasks. However, deep representation learning is mostly
supervised learning that requires a large amount of labeled data. As we men-
tioned before, the obstacle of the deep clustering problem is what
can be used to guide the training process, like the labels in supervised
problems. The most “supervised” information in deep clustering is
the data itself. So how can we train an effective feature extractor to
get good representation? According to the way the feature extrac-
tor is trained, we divide deep single-view clustering algorithms
into five categories: DAE-based, DNN-based, VAE-based, GAN-
based, and GNN-based. The difference among these methods mainly
lies in the loss components, where the loss terms are defined in
Table 1 and explained below:
DAE-based / GNN-based: $L = L_{rec} + L_c$,
DNN-based: $L = L_{ext} + L_c$,
VAE-based: $L = L_{ELBO} + L_c$,
GAN-based: $L = L_{gan} + L_c$.
In unsupervised learning, the issue we cope with is to train
a reliable feature extractor without labels. There are mainly two
ways in existing works: 1) A loss function that optimizes the
pseudo labels according to the principle: narrowing the inner-
cluster distance and widening the inter-cluster distance. 2) An ex-
tra task that can help train the feature extractor. For the clustering
methods with specialized feature extractors, such as autoencoder,
the reconstruction loss Lrec can be interpreted as the extra task.
In this paper, the clustering-oriented loss L_c indicates the loss
of the clustering objective. DAE-based/GNN-based methods use
an autoencoder/graph autoencoder as the feature extractor, so their
loss functions are always composed of a reconstruction loss L_rec
and a clustering-oriented loss L_c. By contrast, DNN-based
methods optimize the feature extractor with extra tasks or other
strategies (L_ext). VAE-based methods optimize the loss of the evidence
lower bound (L_ELBO). GAN-based methods are based on the
generative adversarial loss L_gan. Based on these five categories,
existing deep single-view clustering methods are summarized in
Table 2 and Table 3.
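The pattern shared by these categories can be sketched as a single training loop in which a feature-extractor loss and a clustering-oriented loss are minimized together. The PyTorch-style skeleton below uses our own naming; the two loss callables stand for whichever combination from the list above a concrete method uses, and the weight gamma is an illustrative assumption, not a value prescribed by any surveyed method.

```python
def train_deep_clustering(model, loader, feature_loss_fn, clustering_loss_fn,
                          optimizer, epochs=50, gamma=0.1):
    """Generic single-view training skeleton: minimize L = L_feature + gamma * L_c.

    feature_loss_fn plays the role of L_rec (DAE-/GNN-based), L_ext (DNN-based),
    L_ELBO (VAE-based) or L_gan (GAN-based); clustering_loss_fn is L_c.
    Both callables are supplied by the concrete method.
    """
    for _ in range(epochs):
        for x in loader:
            z = model(x)                                        # learned representation
            loss = feature_loss_fn(model, x) + gamma * clustering_loss_fn(z)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```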
3.1 DAE-based
The autoencoder network [90] was originally designed for
unsupervised representation learning of data and can learn a highly
non-linear mapping function. Using deep autoencoder (DAE) [91]
is a common way to develop deep clustering methods. DAE aims
to learn a low-dimensional embedding feature space by minimiz-
ing the reconstruction loss of the network, which is defined as:
$$L_{rec} = \min \frac{1}{n}\sum_{i=1}^{n} \left\| x_i - \phi_r\big(\phi_e(x_i)\big) \right\|^2, \qquad (5)$$
where $\phi_e(\cdot)$ and $\phi_r(\cdot)$ represent the encoder network and decoder
network of the autoencoder, respectively. Using the encoder as a
feature extractor, various clustering objective functions have been
proposed. We summarize these deep autoencoder based cluster-
ing methods as DAE-based deep clustering. In DAE-based deep
clustering methods, there are two main ways to get the labels.
The first way embeds the data into low-dimensional features and
then clusters the embedded features with traditional clustering
methods such as the k-means algorithm [92]. The second way
jointly optimizes the feature extractor and the clustering results.
We refer to these two approaches as “separate analysis” and “joint
analysis” respectively, and elaborate on them below.
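As a minimal sketch of the first route, the code below pretrains a plain fully connected autoencoder with the reconstruction loss of Eq. (5) and then runs k-means on the embedded features. The layer sizes, optimizer, and training budget are illustrative assumptions rather than the settings of any particular surveyed method.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class AutoEncoder(nn.Module):
    """A plain fully connected autoencoder; layer sizes are illustrative."""
    def __init__(self, in_dim, latent_dim=10):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(in_dim, 500), nn.ReLU(),
                                   nn.Linear(500, latent_dim))   # encoder phi_e
        self.phi_r = nn.Sequential(nn.Linear(latent_dim, 500), nn.ReLU(),
                                   nn.Linear(500, in_dim))       # decoder phi_r

    def forward(self, x):
        z = self.phi_e(x)
        return z, self.phi_r(z)

def separate_analysis(x, k, epochs=100, lr=1e-3):
    """Step 1: train the DAE with the reconstruction loss of Eq. (5); Step 2: k-means on z."""
    model = AutoEncoder(x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        z, x_rec = model(x)
        loss = ((x - x_rec) ** 2).sum(dim=1).mean()   # L_rec of Eq. (5)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        z, _ = model(x)
    return KMeans(n_clusters=k, n_init=10).fit_predict(z.numpy())
```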
“Separate analysis” means that learning features and clus-
tering data are performed separately. In order to solve the prob-
lem that representations learned by “separate analysis” are not
cluster-oriented due to its innate characteristics, Huang et al.
propose a deep embedding network for clustering (DEN) [34],
which imposes two constraints based on DAE objective: locality-
preserving constraint and group sparsity constraint. Locality-
preserving constraint urges the embedded features in the same
cluster to be similar. Group sparsity constraint aims to diagonalize
the affinity of representations. These two constraints improve the
clustering performance by reducing the inner-cluster distance and
expanding the inter-cluster distance. The objectives of most clustering
methods based on DAE work on these two kinds of
distances. So, in Table 2, we summarize these methods from the
perspective of “characteristics”, which shows the way to optimize
the inner-cluster distance and inter-cluster distance.
Peng et al. [35] propose a novel deep learning based frame-
work in the field of subspace clustering, namely, deep subspace
clustering with sparsity prior (PARTY). PARTY enhances autoen-
coder by considering the relationship between different samples
(i.e., structure prior) and solves the limitation of traditional sub-
space clustering methods. As far as we know, PARTY is the first
deep learning based subspace clustering method, and it is the first
work to introduce the global structure prior into the neural network
for unsupervised learning. Different from PARTY, Ji et al. [38]
propose another deep subspace clustering network (DSC-Nets)
architecture to learn non-linear mappings and introduce a self-
expressive layer to directly learn the affinity matrix.
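To give a feel for the self-expressive layer, the sketch below (a simplified illustration, not the exact DSC-Nets architecture) learns a coefficient matrix C such that every embedded sample is approximately reconstructed from the other samples, Z ≈ CZ; the regularization weight and the initialization are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class SelfExpressiveLayer(nn.Module):
    """Learn a coefficient matrix C with Z ~= C @ Z and a zero diagonal."""
    def __init__(self, n_samples):
        super().__init__()
        self.c = nn.Parameter(1e-4 * torch.randn(n_samples, n_samples))

    def forward(self, z):
        c = self.c - torch.diag(torch.diag(self.c))   # forbid trivial self-reconstruction
        return c @ z, c

def self_expressive_loss(z, layer, lambda_reg=1.0):
    """Self-expression loss on embedded features z (n, d) plus a simple regularizer on C."""
    z_hat, c = layer(z)
    return ((z - z_hat) ** 2).sum() + lambda_reg * (c ** 2).sum()

# After training, a symmetric affinity such as W = |C| + |C|.T can be fed to spectral clustering.
```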
Density-based clustering [9], [93] is another popular kind of
clustering method. Ren et al. [50] propose deep density-based im-
age clustering (DDIC) that uses DAE to learn the low-dimensional
feature representations and then performs density-based clustering
on the learned features. In particular, DDIC does not need to know
the number of clusters in advance.
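The density-based route follows the same two-step recipe but swaps k-means for a density-based algorithm that does not require k. The sketch below uses scikit-learn's DBSCAN purely as a stand-in for the density clustering step; DDIC's own density estimation procedure differs, and the eps/min_samples values are illustrative.

```python
from sklearn.cluster import DBSCAN

def density_based_clustering(z, eps=0.5, min_samples=5):
    """Cluster learned embeddings z (an (n, d) array) without specifying k in advance.

    Points that DBSCAN treats as noise receive the label -1.
    """
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(z)
```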
“Joint analysis” aims at learning a representation that is more
suitable for clustering. It differs from “separate analysis” approaches,
in which deep learning and clustering are carried out separately
and the neural network has no clustering-oriented objective when
learning the features of the data.
TABLE 2: The summaries of DAE-based and DNN-based methods in deep single-view clustering. We summarize the DAE-based methods based on “Jointly or Separately” and “Characteristics”.

Net | Methods | Jointly or Separately | Characteristics
DAE | AEC (2013) [33] | Separately | Optimize the distance between z_i and its closest cluster centroid.
DAE | DEN (2014) [34] | Separately | Locality-preserving constraint, group sparsity constraint.
DAE | PARTY (2016) [35] | Separately | Subspace clustering.
DAE | DEC (2016) [36] | Jointly | Optimize the distribution of assignments.
DAE | IDEC (2017) [37] | Jointly | Improve DEC [36] with local structure preservation.
DAE | DSC-Nets (2017) [38] | Separately | Subspace clustering.
DAE | DEPICT (2017) [39] | Jointly | Convolutional autoencoder and relative entropy minimization.
DAE | DCN (2017) [40] | Jointly | Take the objective of k-means as the clustering loss.
DAE | DMC (2017) [41] | Jointly | Multi-manifold clustering.
DAE | DEC-DA (2018) [42] | Jointly | Improve DEC [36] with data augmentation.
DAE | DBC (2018) [43] | Jointly | Self-paced learning.
DAE | DCC (2018) [44] | Separately | Extend robust continuous clustering [45] with autoencoder. Not given k.
DAE | DDLSC (2018) [46] | Jointly | Pairwise loss function.
DAE | DDC (2019) [47] | Separately | Global and local constraints of relationships.
DAE | DSCDAE (2019) [48] | Jointly | Subspace clustering.
DAE | NCSC (2019) [49] | Jointly | Dual autoencoder network.
DAE | DDIC (2020) [50] | Separately | Density-based clustering. Not given k.
DAE | SC-EDAE (2020) [51] | Jointly | Spectral clustering.
DAE | ASPC-DA (2020) [52] | Jointly | Self-paced learning and data augmentation.
DAE | ALRDC (2020) [53] | Jointly | Adversarial learning.
DAE | N2D (2021) [54] | Separately | Manifold learning.
DAE | AGMDC (2021) [55] | Jointly | Gaussian Mixture Model. Improve the inter-cluster distance.

Net | Methods | Clustering-oriented loss | Characteristics
DNN | JULE (2016) [56] | Yes | Agglomerative clustering.
DNN | DDBC (2017) [57] | Yes | Information theoretic measures.
DNN | DAC (2017) [58] | No | Self-adaptation learning. Binary pairwise-classification.
DNN | DeepCluster (2018) [59] | No | Use traditional clustering methods to assign labels.
DNN | CCNN (2018) [60] | No | Mini-batch k-means. Feature drift compensation for large-scale image data.
DNN | ADC (2018) [61] | Yes | Centroid embeddings.
DNN | ST-DAC (2019) [62] | No | Spatial transformer layers. Binary pairwise-classification.
DNN | RTM (2019) [63] | No | Random triplet mining.
DNN | IIC (2019) [64] | No | Mutual information. Generated image pairs.
DNN | DCCM (2019) [65] | No | Triplet mutual information. Generated image pairs.
DNN | MMDC (2019) [66] | No | Multi-modal. Generated image pairs.
DNN | SCAN (2020) [67] | Yes | Decouple feature learning and clustering. Nearest neighbors mining.
DNN | DRC (2020) [68] | Yes | Contrastive learning.
DNN | PICA (2020) [69] | Yes | Maximize the “global” partition confidence.
TABLE 3: The summaries of VAE-, GAN-, and GNN-based methods in deep single-view clustering.

Net | Methods | Characteristics
VAE | VaDE (2016) [70] | Gaussian mixture variational autoencoder.
VAE | GMVAE (2016) [71] | Gaussian mixture variational autoencoder. Unbalanced clustering.
VAE | MFVDC (2017) [72] | Continuous Gumbel-Softmax distribution.
VAE | LTVAE (2018) [73] | Latent tree model.
VAE | VLAC (2019) [74] | Variational ladder autoencoders.
VAE | VAEIC (2020) [75] | No pre-training process.
VAE | S3VDC (2020) [76] | Improvement on four generic algorithmic aspects.
VAE | DSVAE (2021) [77] | Spherical latent embeddings.
VAE | DVAE (2022) [78] | Additional classifier to distinguish clusters.

Net | Methods | With DAE | Characteristics
GAN | CatGAN (2015) [79] | No | Can be applied to both unsupervised and semi-supervised tasks.
GAN | DAGC (2017) [80] | Yes | Build an encoder to make the data representations easier to cluster.
GAN | DASC (2018) [81] | Yes | Subspace clustering.
GAN | ClusterGAN-SPL (2019) [82] | No | No discrete latent variables; applies self-paced learning based on [83].
GAN | ClusterGAN (2019) [83] | No | Train a GAN with a clustering-specific loss.
GAN | ADEC (2020) [84] | Yes | Reconstruction loss and adversarial loss are optimized in turn.
GAN | IMDGC (2022) [85] | No | Integrates a hierarchical generative adversarial network and mutual information maximization.

Net | Methods | Characteristics
GNN | DAEGC (2019) [71] | Perform graph clustering and learn graph embedding in a unified framework.
GNN | AGC (2019) [86] | Attributed graph clustering.
GNN | AGAE (2019) [87] | Ensemble clustering.
GNN | AGCHK (2020) [88] | Utilize heat kernel in attributed graphs.
GNN | SDCN (2020) [89] | Integrate the structural information into deep clustering.
Most subsequent deep clustering studies combine clustering objectives with feature
learning, which enables the neural network to learn features
conducive to clustering from the latent distribution of the data. In
this survey, those methods are summarized as “joint analysis”.
Inspired by the idea of the non-parametric algorithm t-SNE [94],
Xie et al. [36] propose a joint framework to optimize the feature
learning and clustering objectives, which is named deep embedded
clustering (DEC).
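Its joint objective is commonly written as a soft assignment of embedded points to learnable cluster centroids with a Student's t-kernel, matched to a sharpened target distribution via a KL divergence. The sketch below is a compact rendering of that formulation; centroid initialization (e.g., by k-means) and the schedule for refreshing the target distribution are omitted.

```python
import torch

def soft_assignment(z, centroids, alpha=1.0):
    """Student's t-kernel similarity q_ij between embeddings z and cluster centroids."""
    dist_sq = torch.cdist(z, centroids).pow(2)
    q = (1.0 + dist_sq / alpha).pow(-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    """Sharpened targets p_ij that emphasize high-confidence assignments."""
    weight = q.pow(2) / q.sum(dim=0, keepdim=True)
    return weight / weight.sum(dim=1, keepdim=True)

def dec_clustering_loss(z, centroids):
    """KL(P || Q) between the target distribution and the soft assignments."""
    q = soft_assignment(z, centroids)
    p = target_distribution(q).detach()   # targets are treated as fixed within each update
    return torch.sum(p * torch.log(p / q))
```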