Subspace-Contrastive Multi-View Clustering
Lele Fu, Lei Zhang, Jinghua Yang, Chuan Chen*, Chuanfu Zhang, and Zibin Zheng, Senior Member, IEEE
Abstract—Most multi-view clustering methods are limited by
shallow models that lack the capability to perceive nonlinear
information, or fail to effectively exploit the complementary
information hidden in different views. To tackle these issues, we propose
a novel Subspace-Contrastive Multi-View Clustering (SCMC)
approach. Specifically, SCMC utilizes view-specific auto-encoders
to map the original multi-view data into compact features that
capture its nonlinear structures. Considering the large semantic
gap between data from different modalities, we employ subspace
learning to unify the multi-view data in a joint semantic
space: the embedded compact features are passed through
multiple self-expression layers to learn the respective subspace
representations. In order to enhance the discriminability
and effectively excavate the complementarity of the various subspace
representations, we use a contrastive strategy to maximize
the similarity between positive pairs while separating negative
pairs. A weighted fusion scheme is then developed to initially
learn a consistent affinity matrix. Furthermore, we employ
graph regularization to encode the local geometric structure
within each subspace, further fine-tuning the
affinities between instances. To demonstrate the effectiveness of
the proposed model, we conduct extensive comparative
experiments on eight challenging datasets; the experimental results
show that SCMC outperforms existing shallow and deep multi-
view clustering methods.
Index Terms—Multi-view clustering, subspace clustering,
multi-view fusion, contrastive learning.
I. INTRODUCTION
With the growing popularity of data generation and feature
extraction, multi-view or multimedia data are available in large
quantities. To be specific, multi-view data refer to various
feature representations from multiple aspects of objects. For
instance, an image can be characterized by wavelet texture
(WT), local binary pattern (LBP), histogram of oriented gra-
dient (HOG), etc. A piece of document can be expressed in
numerous languages. Researchers generally believe that multi-
view data contain rich and useful heterogeneous informa-
tion, so technologies related to multi-view analysis are
receiving increasing attention. Multi-view clustering (MVC)
[1], [2], [3] is one of the representative technologies, which
aims to explore the complementary and consistent information
embedded in multi-view data to boost the clustering perfor-
mance.
Numerous multi-view clustering methods currently exist.
For example, graph-based MVC [4], [5], [6] learns
connectivity graph matrices to reveal the relationships among
samples; designed fusion schemes are then developed
Lele Fu and Chuanfu Zhang are with the School of System Sciences
and Engineering, Sun Yat-sen University, Guangzhou, China. Lei Zhang,
Chuan Chen, and Zibin Zheng are with the School of Computer Sci-
ence and Engineering, Sun Yat-sen University, Guangzhou, China. Jinghua
Yang is with Faculty of Information Technology, Macau University of
Science and Technology, Macau, China. (email: lawrencefzu@gmail.com,
chenchuan@mail.sysu.edu.cn). * Corresponding author.
to merge these graph matrices into a global graph. Spectral
embedding-based MVC [7], [8], [9] exploits a low-dimensional
spectral embedding with an orthogonality constraint for each view,
which portrays the important components of the data; a consensus
representation is then learned on the basis of these embeddings. The
goal of nonnegative matrix factorization-based MVC [10], [11], [12] is
to factorize a nonnegative discrete cluster indicator matrix
from the varying representations; the argmax(·) function
is then applied to obtain the data labels. Among the multitudinous
MVC methods, multi-view subspace clustering, which absorbs
and extends the theory of conventional subspace clustering, is a
research hotspot and is widely studied for its superior performance.
The works [13], [14] are classic multi-view subspace clustering
approaches, which aim to explore a uniform underlying feature
space from multiple subspace representations. [15], [16] perform
tensor factorization on the representation tensor to capture the
global correlations between views. These shallow models have
yielded promising clustering results, but most real-world data
are high-dimensional and nonlinear, and shallow models may not be
equipped to capture nonlinear structures.
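For intuition, subspace clustering rests on the self-expression property: each sample can be approximately written as a linear combination of other samples from the same subspace, and the resulting coefficient matrix induces an affinity for spectral clustering. The following is a minimal least-squares sketch of this principle (a generic LSR-style baseline with illustrative toy data, not the exact formulation of any cited work):

```python
import numpy as np

def self_expression_lsr(X, lam=0.1):
    """Least-squares self-expression: min_C ||X - XC||_F^2 + lam*||C||_F^2,
    with the closed-form solution C = (X^T X + lam*I)^{-1} X^T X.
    X has shape (d, n); each column is a sample."""
    n = X.shape[1]
    gram = X.T @ X
    C = np.linalg.solve(gram + lam * np.eye(n), gram)
    np.fill_diagonal(C, 0.0)  # common heuristic: forbid self-representation
    return C

def affinity(C):
    """Symmetric affinity matrix that would be fed to spectral clustering."""
    return 0.5 * (np.abs(C) + np.abs(C).T)

rng = np.random.default_rng(0)
u = rng.normal(size=5)
v = rng.normal(size=5)
v -= (v @ u) / (u @ u) * u            # make the two 1-D subspaces orthogonal
# 10 samples on each subspace, stacked as columns of a 5 x 20 matrix
X = np.hstack([np.outer(u, rng.normal(size=10)),
               np.outer(v, rng.normal(size=10))])
W = affinity(self_expression_lsr(X))
within = W[:10, :10].sum() + W[10:, 10:].sum()
across = W[:10, 10:].sum() + W[10:, :10].sum()
print(within > across)  # affinities concentrate within each subspace
```

Because the coefficients concentrate within each subspace, the affinity matrix is nearly block-diagonal, which is exactly the structure spectral clustering exploits.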
Auto-Encoder (AE) is an effective unsupervised deep rep-
resentation learning paradigm, which nonlinearly maps the
original data features into a compact feature space via the
encoder and then passes the compact representations through
the decoder to reconstruct the data. AE is frequently used
to condense data information in clustering tasks. [17], [18]
are two well-known deep embedding learning methods, which
use Kullback-Leibler (KL) divergence regularization to maximize
the similarity between the soft assignments and the target
distributions. During the past few years, AE has also been
introduced to multi-view subspace clustering. Sun et al. [19]
used a self-supervised strategy to improve the unified subspace
representation learning. Zhu et al. [20] simultaneously learned a set
of view-specific self-expression representations, which were then
combined into a common self-expression representation.
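The auto-encoder paradigm underlying these methods can be illustrated with a minimal linear sketch; the synthetic data, dimensions, and plain full-batch gradient-descent training below are illustrative assumptions, not the architecture or optimizer of any cited method:

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic data lying on an 8-dimensional subspace of R^20
X = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 20))
X /= np.sqrt((X ** 2).mean())           # normalise the overall scale

d, k = 20, 8                            # ambient / latent dimensions
W_enc = 0.1 * rng.normal(size=(d, k))   # encoder weights
W_dec = 0.1 * rng.normal(size=(k, d))   # decoder weights

def reconstruction_loss(Xb):
    Z = Xb @ W_enc    # encode: compact latent features
    R = Z @ W_dec     # decode: reconstruction of the input
    return ((Xb - R) ** 2).mean()

loss0 = reconstruction_loss(X)
lr = 0.1
for _ in range(500):                        # plain full-batch gradient descent
    Z = X @ W_enc
    G = 2.0 * (Z @ W_dec - X) / X.size      # d(loss)/d(reconstruction)
    W_enc -= lr * (X.T @ (G @ W_dec.T))     # chain rule through the decoder
    W_dec -= lr * (Z.T @ G)
loss1 = reconstruction_loss(X)
print(loss1 < loss0)  # reconstruction improves as training proceeds
```

Real methods replace the linear maps with deep nonlinear networks; the latent features Z here play the role of the compact representations that the self-expression layers operate on.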
Wang et al. [21] learned a unified subspace representation
from multi-view discriminative feature spaces. Cui et al. [22]
proposed a spectral supervisor to guide the learning of the
consensus subspace representation. The clustering performance
of the above deep multi-view subspace clustering approaches
is excellent, but their ability to exploit the associations
between multiple subspace representations still needs to be
improved. For instance, [19], [21] directly learn a consistent
self-expression representation from the multi-view latent features
refined by AEs, which cannot capture the characteristics of
disparate views and thus fails to utilize the complementary
information. [20] applies a Hilbert-Schmidt Independence
Criterion (HSIC) regularization term to reinforce the diversity
of different views; such indiscriminate separation of the views
may make it difficult to reach agreement among them.
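The contrastive strategy invoked above can be sketched with a generic InfoNCE-style loss between two views' representations; the temperature, dimensions, and toy representations below are assumptions for illustration, not SCMC's exact objective:

```python
import numpy as np

def contrastive_loss(Z1, Z2, tau=0.5):
    """InfoNCE-style contrastive loss between two views' representations.
    Row i of Z1 and row i of Z2 form a positive pair; all other rows
    serve as negatives."""
    Z1 = Z1 / np.linalg.norm(Z1, axis=1, keepdims=True)
    Z2 = Z2 / np.linalg.norm(Z2, axis=1, keepdims=True)
    S = (Z1 @ Z2.T) / tau                       # scaled cosine similarities
    log_prob = S - np.log(np.exp(S).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))          # cross-entropy on the diagonal

rng = np.random.default_rng(2)
Z = rng.normal(size=(16, 6))
# views whose rows correspond incur a low loss ...
aligned = contrastive_loss(Z, Z + 0.01 * rng.normal(size=(16, 6)))
# ... while misaligned views incur a high one
shuffled = contrastive_loss(Z, np.roll(Z, 1, axis=0))
print(aligned < shuffled)
```

Minimizing such a loss pulls the positive pairs together while pushing negative pairs apart, which is the mechanism SCMC uses to align multiple subspace representations without erasing their view-specific characteristics.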
arXiv:2210.06795v1 [cs.LG] 13 Oct 2022