
Figure 1: The forgetting curves of DeepAligned (blue). While discovering intents, DeepAligned constantly forgets the knowledge learned from the labeled data. The brown line represents the baseline obtained by the model after transferring prior knowledge. In contrast, our method (red) alleviates forgetting well. See the subsequent sections for further discussion.
The softmax loss formed by pseudo labels cannot explore the intrinsic structure of unlabeled data, so it cannot provide accurate supervision signals for discovering intents.
Different from previous methods, we start from the essential intuition that discovering new intents should not damage the identification of known intents; instead, the two processes should achieve a win-win situation. The knowledge contained in the labeled corpus can be used to guide the discovery of new intents, and the information learned from the unlabeled corpus (in the process of discovery) can improve the identification of known intents.
Based on this intuition, and by optimizing the identification of labeled data given the whole corpus, we propose a principled probabilistic framework for intent discovery in which intent assignments are treated as a latent variable. Expectation Maximization (EM) provides a principled template for learning such a latent-variable model. Specifically, in the E-step, we use the current model to discover intents and calculate a specified posterior probability of intent assignments, which explores the intrinsic structure of the data. In the M-step, we simultaneously maximize the probability of identifying the labeled data (which mitigates catastrophic forgetting) and the posterior probability of intent assignments (which helps learn features friendly to discovering new intents) to update the model parameters.
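To make the alternation concrete, below is a minimal PyTorch sketch of one EM step under simplifying assumptions: IntentModel, its linear encoder, the temperature tau, and the loss weight lam are illustrative placeholders (in practice the encoder would be a pretrained language model), not our exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntentModel(nn.Module):
    # Hypothetical model: a small encoder, a classifier over known intents,
    # and learnable cluster centroids for intent discovery.
    def __init__(self, in_dim, emb_dim, n_known, n_clusters):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim))
        self.head = nn.Linear(emb_dim, n_known)      # known-intent classifier
        self.centers = nn.Parameter(torch.randn(n_clusters, emb_dim))

    def soft_assign(self, z, tau=0.1):
        # Posterior over intent assignments: softmax over cosine similarity
        # between utterance embeddings and cluster centroids.
        z = F.normalize(z, dim=-1)
        c = F.normalize(self.centers, dim=-1)
        return F.softmax(z @ c.t() / tau, dim=-1)

def em_step(model, opt, x_lab, y_lab, x_unlab, lam=1.0):
    # E-step: use the current model to compute (and freeze) the posterior
    # of intent assignments on the unlabeled corpus.
    with torch.no_grad():
        q = model.soft_assign(model.encoder(x_unlab))
    # M-step: jointly maximize identification of the labeled data (which
    # mitigates catastrophic forgetting) and the expected log-posterior of
    # intent assignments (which shapes features for discovering new intents).
    opt.zero_grad()
    ce = F.cross_entropy(model.head(model.encoder(x_lab)), y_lab)
    p = model.soft_assign(model.encoder(x_unlab))
    nll = -(q * torch.log(p + 1e-8)).sum(dim=-1).mean()
    (ce + lam * nll).backward()
    opt.step()
```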
Extensive experiments conducted on three benchmark datasets demonstrate that our method achieves substantial improvements over strong baselines. We summarize our contributions as follows:
(Theory) We introduce a principled probabilistic framework for discovering intents and provide a learning algorithm based on Expectation Maximization. To the best of our knowledge, this is the first complete theoretical framework in this field, and we hope it inspires follow-up research.
(Methodology) We provide an efficient implementation based on the proposed probabilistic framework. After transferring prior knowledge, we use a simple and effective method to alleviate forgetting. Furthermore, we use the contrastive learning paradigm to explore the intrinsic structure of unlabeled data (see the sketch following this list), which not only avoids misleading the model through reliance on pseudo labels but also helps learn features friendly to intent discovery.
(Experiments and Analysis) We conduct extensive experiments on a suite of real-world datasets and establish substantial improvements.
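As a hedged illustration of that contrastive paradigm, the following NT-Xent-style loss supervises representations with two augmented views of the same utterances and no pseudo labels; the temperature tau and the two-view setup are standard assumptions rather than our exact objective.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.07):
    """z1, z2: (N, d) embeddings of two augmented views of N utterances."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)  # (2N, d)
    sim = z @ z.t() / tau                                # pairwise similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))           # exclude self-pairs
    # The positive for each view is the other view of the same utterance.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

Because the positives come from augmentation rather than pseudo labels, noisy cluster assignments cannot directly mislead the representation.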
Related Work
Our work is mainly related to two lines of research: unsupervised clustering and semi-supervised clustering.
Unsupervised Clustering Extracting meaningful information from unlabeled data has been studied for a long time. Traditional approaches such as K-means (MacQueen et al. 1967) and Agglomerative Clustering (AC) (Gowda and Krishna 1978) are seminal but hardly perform well in high-dimensional spaces. Recent efforts are devoted to using deep neural networks to obtain good clustering representations. Xie, Girshick, and Farhadi (2016) propose Deep Embedded Clustering (DEC) to learn and refine features iteratively by optimizing a clustering objective based on an auxiliary distribution (recalled after this paragraph). Unlike DEC, Yang et al. (2017) propose the Deep Clustering Network (DCN), which performs nonlinear dimensionality reduction and k-means clustering jointly to learn clustering-friendly representations. Chang et al. (2017) apply deep clustering to images with DAC, a binary-classification framework that uses adaptive learning for optimization. DeepCluster (Caron et al. 2018) then proposes an end-to-end training method that performs cluster assignment and representation learning alternately. However, the key drawback of unsupervised methods is their inability to take advantage of prior knowledge to guide the clustering.
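For concreteness, DEC's auxiliary distribution (notation from Xie, Girshick, and Farhadi 2016) sharpens the soft assignment $q_{ij}$ of embedding $z_i$ to centroid $\mu_j$:
$$q_{ij} = \frac{(1 + \lVert z_i - \mu_j \rVert^2)^{-1}}{\sum_{j'} (1 + \lVert z_i - \mu_{j'} \rVert^2)^{-1}}, \qquad p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}, \quad f_j = \sum_i q_{ij},$$
and training minimizes $\mathrm{KL}(P \,\|\, Q)$, pushing each point toward high-confidence assignments.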
Semi-supervised Clustering With the aid of a small amount of labeled data, semi-supervised clustering usually produces better results than its unsupervised counterparts. PCKMeans (Basu, Banerjee, and Mooney 2004) proposes that clustering can be supervised by pairwise constraints between samples in the dataset. KCL (Hsu, Lv, and Kira 2017) first transfers knowledge in the form of pairwise similarity predictions and then learns a clustering network via transfer learning. Along this line, MCL (Hsu et al. 2019) further formulates multi-class classification as meta classification that predicts pairwise similarity and generalizes the framework to various settings. DTC (Han, Vedaldi, and Zisserman 2019) extends the DEC algorithm and proposes a mechanism to estimate the number of new image categories using labeled data. In the field of text clustering, CDAC+ (Lin, Xu, and Zhang 2020) combines pairwise constraints and a target distribution to discover new intents, while DeepAligned (Zhang et al. 2021) introduces an alignment strategy to improve clustering consistency. Very recently, SCL (Shen et al. 2021) incorporates the strong backbone MPNet in a Siamese network structure with a contrastive loss (or relies on a large amount of additional external data (Zhang et al. 2022)) to learn better sentence representations. Although these methods take known intents into account, they may suffer from knowledge forgetting during the training process. More importantly, these methods are