Discovering New Intents Using Latent Variables
Yunhua Zhou, Peiju Liu, Yuxin Wang, Xipeng Qiu*
School of Computer Science, Fudan University
{zhouyh20,xpqiu}@fudan.edu.cn
{pjliu21, wangyuxin21}@m.fudan.edu.cn
Abstract
Discovering new intents is of great significance to establishing a Bootstrapped Task-Oriented Dialogue System. Most existing methods either lack the ability to transfer prior knowledge from the known intent data or fall into the dilemma of forgetting that prior knowledge later on. More importantly, these methods do not deeply explore the intrinsic structure of unlabeled data, so they cannot uncover the general characteristics that define an intent. In this paper, starting from the intuition that discovering new intents could benefit the identification of known intents, we propose a probabilistic framework for discovering intents in which intent assignments are treated as latent variables. We adopt the Expectation-Maximization framework for optimization. Specifically, in the E-step, we discover intents and explore the intrinsic structure of unlabeled data through the posterior of intent assignments. In the M-step, we alleviate the forgetting of prior knowledge transferred from known intents by optimizing the discrimination of labeled data. Extensive experiments conducted on three challenging real-world datasets demonstrate that our method achieves substantial improvements.
Introduction
Unknown intent detection (Zhou, Liu, and Qiu 2022) in Bootstrapped Task-Oriented Dialogue Systems (BTODS) has gradually attracted more and more attention from researchers. However, detecting unknown intents is only the first step. For a BTODS, discovering new intents is not only equally fundamental but also more crucial and challenging. Because the preset intent set of a BTODS cannot cover all intents, a BTODS should actively discover potential new intents while interacting with users. Specifically, a large number of valuable unlabeled utterances are generated during the interaction between users and the dialogue system. Considering the limited labeled corpus and the time-consuming annotation, which also requires prior domain knowledge, a BTODS should adaptively identify known intents and discover unknown intents from this unlabeled data with the aid of the limited labeled data.
Just as discovering new intents plays a crucial role in establishing a BTODS, it has attracted as much research interest as unknown intent detection. Unsupervised clustering is one popular way to tackle this problem.

*Corresponding author.

To discover new intents from a large amount of unlabeled data, many works (Hakkani-Tür et al. 2013, 2015; Shi et al. 2018; Padmasundari 2018) formalize this problem
as an unsupervised clustering process. However, these methods mainly focus on how to construct pseudo-supervised signals to assist in guiding the clustering process and do not fully utilize the prior knowledge contained in the existing labeled data.
In a more general real-world scenario, we often have a small amount of labeled data in advance (which nevertheless contains prior knowledge that can guide the discovery of new intents) and a large amount of unlabeled data (e.g., in the dialogue scene mentioned above, data generated in the interaction with the dialogue system), which contains both known intents and unknown intents to be discovered. Our purpose is to identify the known intents and discover the potential new intents contained in the unlabeled corpus with the help of the labeled data.
Recently, Lin, Xu, and Zhang (2020) propose that pairwise similarities can be used as pseudo supervision signals to guide the discovery of new intents. However, as the analysis of Zhang et al. (2021) shows, this method cannot perform effectively when there are more new intents to be discovered. Inspired by Caron et al. (2018), Zhang et al. (2021) (DeepAligned) propose an effective method for discovering new intents. DeepAligned first fine-tunes BERT (Devlin et al. 2018) on the labeled data to transfer prior knowledge and generalize that knowledge into the semantic features of unlabeled data. Further, to learn clustering-friendly representations, DeepAligned assigns a pseudo label to each unlabeled utterance and re-trains the model under the supervision of a softmax loss computed from those pseudo labels.
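For intuition, the pseudo-label assignment at the heart of this pipeline can be sketched as a plain k-means pass over frozen utterance embeddings. The 2-D toy points, the deterministic farthest-point initialization, and the choice of k below are illustrative assumptions, not DeepAligned's actual configuration:

```python
def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means with deterministic farthest-point init,
    returning one pseudo label per point."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Start from the first point, then repeatedly add the point
    # farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(d2(p, c) for c in centers)))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point takes the index of its nearest center.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: d2(p, centers[c]))
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

# Two well-separated blobs of toy "utterance embeddings".
points = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
pseudo_labels = kmeans(points, k=2)  # these labels would then drive a softmax loss
assert pseudo_labels == [0, 0, 1, 1]
```

As the surrounding text argues, labels produced this way can be noisy, and a softmax loss trained on them inherits that noise.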
Nevertheless, DeepAligned may suffer from two critical problems. First, when the model is re-trained with the pseudo supervision signal, it forgets the knowledge acquired in the transferring stage, as the forgetting curves on different datasets in Figure 1 show. During intent discovery in DeepAligned, we test the performance of the model on the validation set used in the prior-knowledge-transferring stage and find that, as clustering advances, the model constantly forgets the knowledge learned from labeled data. Furthermore, the model can be misled by inaccurate pseudo labels, particularly in a large intent space (Wang et al. 2021). More importantly,
the softmax loss formed by pseudo labels cannot explore the intrinsic structure of unlabeled data, so it cannot provide accurate clustering supervision signals for discovering intents.

arXiv:2210.11804v1 [cs.CL] 21 Oct 2022

Figure 1: The forgetting curves of DeepAligned (blue). During intent discovery in DeepAligned, the model constantly forgets the knowledge learned from labeled data. The brown line represents the baseline obtained by the model after transferring prior knowledge. In contrast, our method (red) alleviates forgetting well; see subsequent sections for further discussion.
Different from previous methods, we start from the essential intuition that the discovery of new intents should not harm the identification of known intents; the two processes should achieve a win-win situation. The knowledge contained in the labeled corpus can guide the discovery of new intents, and the information learned from the unlabeled corpus (in the process of discovery) can improve the identification of known intents.
Based on this intuition, and with the help of optimizing the identification of labeled data given the whole data corpus, we propose a principled probabilistic framework for intent discovery in which intent assignments are treated as latent variables. Expectation Maximization provides a principled template for learning this typical latent-variable model. Specifically, in the E-step, we use the current model to discover intents and calculate a specified posterior probability of intent assignments, which explores the intrinsic structure of the data. In the M-step, we simultaneously maximize the probability of identifying the labeled data (which mitigates catastrophic forgetting) and the posterior probability of intent assignments (which helps learn features friendly to discovering new intents) to optimize and update the model parameters. Extensive experiments conducted on three benchmark datasets demonstrate that our method achieves substantial improvements over strong baselines. We summarize our contributions as follows:
(Theory) We introduce a principled probabilistic framework for discovering intents and provide a learning algorithm based on Expectation Maximization. To the best of our knowledge, this is the first complete theoretical framework in this field, and we hope it can inspire follow-up research.
(Methodology) We provide an efficient implementation based on the proposed probabilistic framework. After transferring prior knowledge, we use a simple and effective method to alleviate forgetting. Furthermore, we use the contrastive learning paradigm to explore the intrinsic structure of unlabeled data, which not only avoids misleading the model through reliance on pseudo labels but also helps learn features friendly to intent discovery.
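As a rough illustration of the contrastive-learning paradigm mentioned in this contribution, the snippet below computes a single-anchor InfoNCE loss over cosine similarities. The toy vectors, temperature, and positive/negative selection are assumptions for illustration; they do not reproduce the paper's actual objective or pair construction:

```python
import math

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss for one anchor: cross-entropy that favors the
    positive over the negatives under cosine similarity."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    # Positive sits at index 0; all similarities are temperature-scaled.
    logits = [cos(anchor, positive) / tau] + [cos(anchor, n) / tau for n in negatives]
    m = max(logits)  # stabilized log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]

# Loss is near zero when the positive aligns with the anchor,
# and large when a negative is more similar than the positive.
low = info_nce((1.0, 0.0), (0.99, 0.01), [(-1.0, 0.0), (0.0, 1.0)])
high = info_nce((1.0, 0.0), (0.0, 1.0), [(0.99, 0.01), (-1.0, 0.0)])
assert low < high
```

Minimizing such a loss pulls together views of the same utterance without ever committing to a hard pseudo label, which is why it sidesteps the pseudo-label noise discussed earlier.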
(Experiments and Analysis) We conduct extensive experiments on a suite of real-world datasets and demonstrate substantial improvements.
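The E/M alternation summarized above can be made concrete on a deliberately tiny model: a 1-D two-component mixture with fixed variance and uniform priors, where the E-step computes posterior responsibilities (the latent "intent assignments") and the M-step re-estimates the component means. The deep encoder, the supervised term on labeled data, and the contrastive posterior of the actual method are all omitted in this sketch:

```python
import math

def em_soft_assign(xs, mus, iters=20, var=0.25):
    """Toy EM: E-step = posterior P(z=k | x) under Gaussian components,
    M-step = responsibility-weighted re-estimation of the means."""
    resp = []
    for _ in range(iters):
        # E-step: responsibilities from (unnormalized) Gaussian densities.
        resp = []
        for x in xs:
            w = [math.exp(-((x - m) ** 2) / (2 * var)) for m in mus]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: each mean maximizes the expected complete-data log-likelihood.
        mus = [
            sum(r[k] * x for r, x in zip(resp, xs)) / sum(r[k] for r in resp)
            for k in range(len(mus))
        ]
    return mus, resp

# Two clumps of 1-D "utterances"; the means drift to the clump centers.
mus, resp = em_soft_assign([0.0, 0.2, 4.0, 4.2], mus=[1.0, 3.0])
assert abs(mus[0] - 0.1) < 0.05 and abs(mus[1] - 4.1) < 0.05
```

The alternation itself, posterior over latent assignments, then a maximization that updates parameters, is the template the proposed framework instantiates with deep features.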
Related Work
Our work is mainly related to two lines of research: unsupervised and semi-supervised clustering.
Unsupervised Clustering Extracting meaningful information from unlabeled data has been studied for a long time. Traditional approaches like K-means (MacQueen et al. 1967) and Agglomerative Clustering (AC) (Gowda and Krishna 1978) are seminal but hardly perform well in high-dimensional spaces. Recent efforts are devoted to using deep neural networks to obtain good clustering representations. Xie, Girshick, and Farhadi (2016) propose Deep Embedded Clustering (DEC), which learns and refines features iteratively by optimizing a clustering objective based on an auxiliary distribution. Unlike DEC, Yang et al. (2017) propose the Deep Clustering Network (DCN), which performs nonlinear dimensionality reduction and k-means clustering jointly to learn friendly representations. Chang et al. (2017) (DAC) apply unsupervised clustering to image clustering and propose a binary-classification framework that uses adaptive learning for optimization. Then, DeepCluster (Caron et al. 2018) proposes an end-to-end training method that performs cluster assignment and representation learning alternately. However, the key drawback of unsupervised methods is their inability to take advantage of prior knowledge to guide clustering.
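The auxiliary distribution behind DEC, mentioned above, sharpens the model's soft cluster assignments q into a target p by squaring each assignment and normalizing by the cluster frequency f_j. The toy q values below are assumptions, and the formula follows the commonly cited DEC formulation:

```python
def dec_target(q):
    """DEC-style auxiliary target: p_ij ∝ q_ij^2 / f_j, with f_j = Σ_i q_ij."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        s = sum(w)
        p.append([wi / s for wi in w])
    return p

q = [[0.7, 0.3], [0.6, 0.4]]
p = dec_target(q)
# Row 0's dominant assignment is sharpened (0.7 -> ~0.75), while the
# frequency term f_j keeps large clusters from absorbing everything.
assert p[0][0] > q[0][0]
```

Training against p while recomputing q is what lets DEC "learn and refine the features iteratively."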
Semi-supervised Clustering With the aid of a small amount of labeled data, semi-supervised clustering usually produces better results than its unsupervised counterparts. PCKMeans (Basu, Banerjee, and Mooney 2004) proposes that clustering can be supervised by pairwise constraints between samples in the dataset. KCL (Hsu, Lv, and Kira 2017) first transfers knowledge in the form of pairwise similarity predictions and then learns a clustering network via transfer learning. Along this line, MCL (Hsu et al. 2019) further formulates multi-class classification as meta classification that predicts pairwise similarity, and generalizes the framework to various settings. DTC (Han, Vedaldi, and Zisserman 2019) extends the DEC algorithm and proposes a mechanism to estimate the number of new image categories using labeled data. In the field of text clustering, CDAC+ (Lin, Xu, and Zhang 2020) combines pairwise constraints and a target distribution to discover new intents, while DeepAligned (Zhang et al. 2021) introduces an alignment strategy to improve clustering consistency. Very recently, SCL (Shen et al. 2021) incorporates a strong backbone, MPNet, into a Siamese network structure with a contrastive loss (or relies on a large amount of additional external data (Zhang et al. 2022)) to learn better sentence representations. Although these methods take known intents into account, they may suffer from knowledge forgetting during training. More importantly, these methods are