Subspace-Contrastive Multi-View Clustering
Lele Fu, Lei Zhang, Jinghua Yang, Chuan Chen*, Chuanfu Zhang, and Zibin Zheng, Senior Member, IEEE
Abstract—Most multi-view clustering methods are limited by shallow models without sound nonlinear information perception capability, or fail to effectively exploit the complementary information hidden in different views. To tackle these issues, we propose a novel Subspace-Contrastive Multi-View Clustering (SCMC) approach. Specifically, SCMC utilizes view-specific auto-encoders to map the original multi-view data into compact features that perceive its nonlinear structures. Considering the large semantic gap between data from different modalities, we employ subspace learning to unify the multi-view data into a joint semantic space; namely, the embedded compact features are passed through multiple self-expression layers to learn the respective subspace representations. To enhance the discriminability and efficiently excavate the complementarity of the various subspace representations, we use a contrastive strategy to maximize the similarity between positive pairs while differentiating negative pairs. A weighted fusion scheme is then developed to initially learn a consistent affinity matrix. Furthermore, we employ graph regularization to encode the local geometric structure within the varying subspaces, further fine-tuning the appropriate affinities between instances. To demonstrate the effectiveness of the proposed model, we conduct a large number of comparative experiments on eight challenging datasets; the experimental results show that SCMC outperforms existing shallow and deep multi-view clustering methods.
Index Terms—Multi-view clustering, subspace clustering,
multi-view fusion, contrastive learning.
I. INTRODUCTION
With the growing ubiquity of data generation and feature extraction techniques, multi-view or multimedia data are available in large quantities. To be specific, multi-view data refer to various feature representations drawn from multiple aspects of the same objects. For instance, an image can be characterized by wavelet texture (WT), local binary pattern (LBP), histogram of oriented gradients (HOG), etc., and a document can be expressed in numerous languages. Researchers generally believe that multi-view data contain rich and useful heterogeneous information, so technologies related to multi-view analysis are receiving increasing attention. Multi-view clustering (MVC) [1], [2], [3] is one of the representative technologies; it aims to explore the complementary and consistent information embedded in multi-view data to boost clustering performance.
Currently, extensive multi-view clustering methods exist. For example, graph-based MVC [4], [5], [6] learns connectivity graph matrices to reveal the relationships between samples, and then designed fusion schemes merge these graph matrices into a global graph.

[Footnote: Lele Fu and Chuanfu Zhang are with the School of System Sciences and Engineering, Sun Yat-sen University, Guangzhou, China. Lei Zhang, Chuan Chen, and Zibin Zheng are with the School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China. Jinghua Yang is with the Faculty of Information Technology, Macau University of Science and Technology, Macau, China. (email: lawrencefzu@gmail.com, chenchuan@mail.sysu.edu.cn). * Corresponding author.]

Spectral embedding-based MVC [7], [8], [9] exploits a low-dimensional spectral embedding with an orthogonality constraint for each view, which captures the important components of the data; a consensus representation is further learned on the basis of these embeddings. The
goal of nonnegative matrix factorization-based MVC [10], [11], [12] is to factorize a nonnegative discrete cluster indicator matrix from the varying representations, after which the argmax(·) function is adopted to obtain the data labels. Among the multitude of MVC methods, multi-view subspace clustering is a research hotspot and is widely studied for its superior performance; it absorbs theory from conventional subspace clustering and develops it further. The works [13], [14] are classic multi-view subspace clustering approaches, which aim to explore a uniform underlying feature space from multiple subspace representations. [15], [16] performed tensor factorization on the representation tensor to capture the global correlations between views. These shallow models have yielded promising clustering results, but most real-world data are high-dimensional and nonlinear, and shallow models may not be equipped to capture such nonlinear structures.
Auto-Encoder (AE) is an effective unsupervised deep representation learning paradigm, which nonlinearly maps the original data features into a compact feature space via encoders, and then passes the compact representations through decoders to reconstruct the data. AE is frequently used to condense data information in clustering tasks. [17], [18] are two well-known deep embedding learning methods, which used a Kullback-Leibler (KL) divergence regularization to maximize the similarity between soft assignments and target distributions. Over the past few years, AE has also been introduced to multi-view subspace clustering. Sun et al. [19] used a self-supervised strategy to improve unified subspace representation learning. Zhu et al. [20] simultaneously learned a set of view-specific self-expression representations, which were then combined into a common self-expression representation. Wang et al. [21] learned a unified subspace representation from multi-view discriminative feature spaces. Cui et al. [22] proposed a spectral supervisor to guide the learning of a consensus subspace representation. The clustering performance of the above deep multi-view subspace clustering approaches is excellent, but their ability to exploit the associations between multiple subspace representations still needs to be improved. For instance, [19], [21] directly learn a consistent self-expression representation from the multi-view latent features refined by AEs, which cannot capture the characteristics of disparate views, thus failing to utilize the complementary information. [20] applies a Hilbert-Schmidt Independence Criterion (HSIC) regularization term to reinforce the diversity of different views, but this indiscriminate alienation of different views may make it difficult to obtain agreement between them.
arXiv:2210.06795v1 [cs.LG] 13 Oct 2022
As for [22], a weighted fusion layer is used to integrate all self-expression representations, which does not harness the view correlations in insightful ways. Contrastive learning [23] is an emerging self-supervised strategy that aims to maximize the similarity between positive pairs while minimizing the similarity between negative pairs. In multi-view clustering scenarios, there is a natural contrastive relationship between the varying views, which has given rise to several multi-view contrastive clustering methods [24], [25], [26]. An important objective reality of multi-view data is that there may be a large modality gap between the data under different views, which can drive the distance between instance pairs to be extremely large, rendering the contrast process difficult. Current multi-view contrastive clustering methods barely consider or address the problem of the modality gap; this is a vital motivation of this paper.
We are inspired by the idea of contrastive learning, and propose a Subspace-Contrastive Multi-View Clustering (SCMC) method. Specifically, in order to perceive the nonlinear structures in multi-view data, we employ view-specific AEs to encode the initial features into multiple compact spaces, wherein the respective subspace representations are further learned through self-expression layers, such that the semantic information of data belonging to disparate modalities can be unified into a common semantic space. We then consider the same sample under different views as a positive pair, and the remaining pairs as negative; Fig. 2 illustrates the manner of constructing positive and negative pairs. By pairwise contrasting the multiple subspace representations, we bring the positive pairs closer together and push the negative pairs further apart. This operation enhances the discriminability of each subspace representation and explores the complementary information within them, which differs from the discrimination-inducing regularization achieved through the indistinguishable mutual exclusion of various representations in the literature [21], [20]. To obtain a consistent affinity matrix, we use a weighted fusion scheme to merge the multiple subspace representations. Moreover, graph regularization is applied to encode the local structures inside the learned subspaces. Finally, abundant experiments on eight challenging datasets are implemented to verify the effectiveness of the proposed SCMC. The major contributions of this paper are summarized as follows:
• We adopt view-specific auto-encoders to map the multi-view data into compact feature spaces that perceive the nonlinear structures, wherein the respective subspace representations are explored via self-expression layers to align the semantic information of data under diverse modalities into a unified semantic space.
• We consider the different subspace representations as contrast targets and conduct pairwise contrast between them to exploit the complementarity between heterogeneous views and enhance the discrimination of each subspace representation.
• To demonstrate the validity of the proposed SCMC, we carry out comprehensive experiments on eight multi-view datasets, and the experimental results show that SCMC possesses advanced data clustering capability compared with the baseline and other multi-view clustering methods.
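The weighted fusion mentioned above, which merges the view-specific subspace representations into a single consistent affinity matrix before spectral clustering, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the uniform weights and the absolute-value symmetrization step are assumptions.

```python
import numpy as np

def fuse_affinity(subspace_reps, weights=None):
    """Merge view-specific subspace representations {Z^(v)} into one
    affinity matrix via a weighted sum, then symmetrize it so it can
    be fed to spectral clustering. Uniform default weights are an
    illustrative assumption; SCMC learns its weights."""
    V = len(subspace_reps)
    if weights is None:
        weights = np.ones(V) / V          # placeholder uniform weights
    A = sum(w * np.abs(Z) for w, Z in zip(weights, subspace_reps))
    return (A + A.T) / 2                  # symmetrize: affinities are undirected

# toy usage: three views, five samples
rng = np.random.default_rng(0)
Zs = [rng.random((5, 5)) for _ in range(3)]
A = fuse_affinity(Zs)
```

The symmetrized matrix A can then be handed to any off-the-shelf spectral clustering routine that accepts a precomputed affinity.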
The rest of this paper is structured as follows. Section II briefly reviews the works related to multi-view subspace clustering and contrastive learning. In Section III, we explicate the objective loss and the network architecture of SCMC. Experimental details are narrated in Section IV. Finally, the conclusion is summarized in Section V.
II. RELATED WORKS
A. Multi-view Subspace Clustering
Multi-view subspace clustering leverages the heterogeneous features of data to group samples into a union of diverse subspaces. Self-expression-based subspace learning technology has gained widespread attention due to its concise but sound feature characterization capability, and extensive self-expression-based multi-view subspace clustering approaches have been proposed. For instance, [14], [27], [28] aimed to explore a common subspace representation from multi-source features, on which certain designed regularization terms are further imposed to fit the clustering task. To capture the high-order correlations among views, [16], [29], [30] reorganized the multiple self-expression matrices into a third-order tensor, which is constrained with the tensor nuclear norm to recover a low-rank feature space. To decrease the high computational complexity caused by subspace learning, [31], [32] introduced anchor graph embedding to approximate the global affinity matrices. As the above methods are based on shallow models, their ability to match complex data distributions is limited. To bridge this gap, [20], [21] used deep network frameworks to learn the subspace representations; both adopted an exclusive regularization term to improve the independence of each view, thus further boosting the complementary information among the varying views. However, the exclusive regularization terms may also magnify the variability between positive pairs, making it difficult to obtain a uniform representation with high confidence.
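To make the self-expression idea above concrete: each sample is reconstructed as a linear combination of the other samples, and the coefficient matrix serves as the subspace representation. Below is a minimal least-squares sketch; the ridge penalty and its closed-form solution are illustrative assumptions standing in for the sparsity or low-rank regularizers used by the cited methods.

```python
import numpy as np

def self_expression(X, lam=0.1):
    """Solve min_Z ||X - X Z||_F^2 + lam ||Z||_F^2 for a data matrix
    X of shape (d, n) whose columns are samples. Setting the gradient
    to zero gives (X^T X + lam I) Z = X^T X, a closed form; lam is an
    illustrative hyperparameter."""
    G = X.T @ X                                   # (n, n) Gram matrix
    n = G.shape[0]
    Z = np.linalg.solve(G + lam * np.eye(n), G)   # coefficient matrix
    return Z

# toy usage: 4-dimensional features, 6 samples
rng = np.random.default_rng(1)
X = rng.random((4, 6))
Z = self_expression(X)
```

In the deep setting described later, the same role is played by a learnable self-expression layer trained jointly with the auto-encoders rather than by a closed-form solve.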
B. Contrastive Learning
Contrastive learning is one of the research hotspots of the self-supervised learning paradigm in recent years; its core idea is to bring the distance between positive pairs closer and push the distance between negative pairs farther apart in a projected feature space. In practice, [33], [34], [35] were proposed successively in the computer vision field, enhancing the discrimination of image representations by means of contrastive learning to better serve the downstream tasks. Owing to this favorable performance, contrastive learning has gained attention and been applied in the clustering field. Li et al. [36] simultaneously contrasted instance-level and cluster-level representations to strengthen the separability of samples belonging to different clusters. Zhong et al. [37] learned clustering-friendly features and compact clustering assignments via a designed contrastive strategy. Furthermore, researchers have extended single-view contrastive clustering
to multi-view cases. Pan et al. [38] proposed a multi-view contrastive graph node clustering scheme, wherein a consistent graph is learned via a graph contrastive loss. Xu et al. [25] conducted data reconstruction in a low-level feature space and enforced consistency objectives via a contrastive scheme in a high-level feature space. In light of view invariance and local structure, Trosten et al. [26] proposed a selective contrastive alignment method to address misaligned label distributions.

Fig. 1: The framework of the proposed SCMC. To effectively handle high-dimensional and nonlinear structures in data, we use V view-specific encoders to encode the initial multi-view features {X^(v)}_{v=1}^{V} as the compact embedding features {C^(v)}_{v=1}^{V}. Then, {C^(v)T}_{v=1}^{V} are passed through the multiple self-expression layers to obtain the features {C^(v)T Z^(v)}_{v=1}^{V}, which are fed into the V view-specific decoders to reconstruct the recovered data {X̂^(v)}_{v=1}^{V}. Notably, {Z^(v)}_{v=1}^{V} are essentially the coefficient matrices of the multiple self-expression layers, also called the subspace representations. We contrast these subspace representations in pairs to exploit the complementary information between them. Additionally, a weighted fusion of all subspace representations is performed to obtain a unified affinity matrix, while graph regularization is adopted to fine-tune the affinities. Finally, the spectral clustering algorithm is employed to acquire the clustering results.

Fig. 2: The diagram of constructing the positive and negative pairs across three example views. The first data point of view 2 and the same data point in the other two views form positive pairs (connected by solid lines); the first data point of view 2 and the remaining data points of the other two views and of its own view form negative pairs (connected by dashed lines).
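The pairwise contrast between subspace representations described above can be sketched as an NT-Xent-style loss between two views, where row i of each representation matrix plays the role of sample i's feature. This is a hedged illustration, not SCMC's exact loss: the temperature value is an assumption, and for brevity only cross-view negatives are used, whereas Fig. 2 also counts intra-view pairs as negatives.

```python
import numpy as np

def contrastive_loss(Za, Zb, tau=0.5):
    """NT-Xent-style loss between two views' subspace representations.
    Row i of Za and row i of Zb form a positive pair; all other
    cross-view rows act as negatives. tau is an illustrative
    temperature hyperparameter."""
    # cosine-normalize rows so similarity is scale-invariant
    Za = Za / np.linalg.norm(Za, axis=1, keepdims=True)
    Zb = Zb / np.linalg.norm(Zb, axis=1, keepdims=True)
    sim = Za @ Zb.T / tau                       # (n, n) cross-view similarities
    # softmax cross-entropy with the diagonal (positive pairs) as targets
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return (-np.diag(sim) + logsumexp).mean()

rng = np.random.default_rng(2)
Z1 = rng.random((8, 8))
loss_aligned = contrastive_loss(Z1, Z1)            # perfectly aligned views
loss_random = contrastive_loss(Z1, rng.random((8, 8)))
```

As expected, perfectly aligned views yield a lower loss than unrelated ones, which is the signal that pulls positive pairs together during training.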
III. THE PROPOSED METHOD
In this section, we first explain the motivations for proposing
the SCMC method. Second, we present the objective functions
of the primary modules in the proposed SCMC, revealing
the mathematical details and the effect behind them. Thus,
the specific network architectures are summarized, which are
also graphically illustrated in Fig. 1 for better comprehension.
Finally, the training process is explained.
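Before the details, the per-view pipeline of Fig. 1 (encoder, self-expression layer, decoder) can be sketched in a few lines. Everything here is an illustrative assumption: the layer sizes, the tanh activation, the random untrained weights, and the use of a plain coefficient matrix in place of a trained self-expression layer.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, h = 10, 20, 5                        # samples, input dim, embedding dim

X = rng.random((n, d))                     # one view's data, rows are samples
W_enc = rng.standard_normal((d, h)) * 0.1  # encoder weights (untrained sketch)
W_dec = rng.standard_normal((h, d)) * 0.1  # decoder weights (untrained sketch)

C = np.tanh(X @ W_enc)                     # compact embedding C^(v)
Z = rng.random((n, n))                     # self-expression coefficients Z^(v)
np.fill_diagonal(Z, 0)                     # forbid trivial self-reconstruction
CZ = Z @ C                                 # self-expressed embedding
X_hat = CZ @ W_dec                         # decoder reconstructs the view

recon_err = np.linalg.norm(X - X_hat)      # one term of the training objective
```

In SCMC itself, Z^(v) is a learnable parameter of the self-expression layer and the reconstruction error is minimized jointly with the contrastive and graph-regularization terms.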
A. Motivation
1) Traditional shallow multi-view subspace clustering methods [13], [27], [15] are limited in perceiving the nonlinear structures in multi-view data. Therefore, we utilize deep neural networks to handle this problem, allowing data with complex distributional properties to be well separated in the subspaces.
2) Existing multi-view subspace clustering methods [20], [21] enhance the discrimination of different representations through an exclusive regularization term, but this undifferentiated separation of all representations may make it difficult for the models to reach agreement across views. Inspired by contrastive learning, we treat samples from various views differentially, construct cross-view positive and negative pairs, and strengthen the discrimination of the subspace representations by bringing positive pairs closer and pushing negative pairs apart.
3) The semantic information gap between different modal-
ities in multi-view data [39] could be very large. For