
Figure 1: The forgetting curves of DeepAligned (blue). While discovering intents, DeepAligned constantly forgets the knowledge learned from the labeled data. The brown line represents the baseline obtained by the model after transferring prior knowledge. In contrast, our method (red) alleviates forgetting well. See the subsequent sections for further discussion.
The softmax loss formed by pseudo labels cannot explore the intrinsic structure of unlabeled data, so it cannot provide accurate supervision signals for discovering intents.
Different from previous methods, we start from the essential intuition that discovering new intents should not damage the identification of known intents; instead, the two processes should achieve a win-win situation. The knowledge contained in the labeled corpus can be used to guide the discovery of new intents, and the information learned from the unlabeled corpus (in the process of discovery) can improve the identification of known intents.
Based on this intuition, and by optimizing the identification of labeled data given the whole corpus, we propose a principled probabilistic framework for intent discovery in which intent assignments are treated as a latent variable. Expectation Maximization (EM) provides a principled template for learning such a latent-variable model. Specifically, in the E-step, we use the current model to discover intents and calculate a specified posterior probability of intent assignments, which explores the intrinsic structure of the data. In the M-step, we simultaneously maximize the probability of identifying the labeled data (which mitigates catastrophic forgetting) and the posterior probability of intent assignments (which helps learn features friendly to discovering new intents) to update the model parameters.
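To make the alternation concrete, below is a minimal PyTorch sketch of one EM step under simplifying assumptions: IntentModel, its linear encoder, the temperature tau, and the loss weight lam are illustrative placeholders (in practice the encoder would be a pretrained language model), not our exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntentModel(nn.Module):
    # Hypothetical model: a small encoder, a classifier over known intents,
    # and learnable cluster centroids for intent discovery.
    def __init__(self, in_dim, emb_dim, n_known, n_clusters):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim))
        self.head = nn.Linear(emb_dim, n_known)      # known-intent classifier
        self.centers = nn.Parameter(torch.randn(n_clusters, emb_dim))

    def soft_assign(self, z, tau=0.1):
        # Posterior over intent assignments: softmax over cosine similarity
        # between utterance embeddings and cluster centroids.
        z = F.normalize(z, dim=-1)
        c = F.normalize(self.centers, dim=-1)
        return F.softmax(z @ c.t() / tau, dim=-1)

def em_step(model, opt, x_lab, y_lab, x_unlab, lam=1.0):
    # E-step: use the current model to compute (and freeze) the posterior
    # of intent assignments on the unlabeled corpus.
    with torch.no_grad():
        q = model.soft_assign(model.encoder(x_unlab))
    # M-step: jointly maximize identification of the labeled data (which
    # mitigates catastrophic forgetting) and the expected log-posterior of
    # intent assignments (which shapes features for discovering new intents).
    opt.zero_grad()
    ce = F.cross_entropy(model.head(model.encoder(x_lab)), y_lab)
    p = model.soft_assign(model.encoder(x_unlab))
    nll = -(q * torch.log(p + 1e-8)).sum(dim=-1).mean()
    (ce + lam * nll).backward()
    opt.step()
```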
Extensive experiments conducted on three benchmark datasets demonstrate that our method achieves substantial improvements over strong baselines. We summarize our contributions as follows:
(Theory) We introduce a principled probabilistic framework for discovering intents and provide a learning algorithm based on Expectation Maximization. To the best of our knowledge, this is the first complete theoretical framework in this field, and we hope it inspires follow-up research.
(Methodology) We provide an efficient implementation based on the proposed probabilistic framework. After transferring prior knowledge, we use a simple and effective method to alleviate forgetting. Furthermore, we use the contrastive learning paradigm to explore the intrinsic structure of unlabeled data (see the sketch following this list), which not only avoids misleading the model through reliance on pseudo labels but also helps learn features friendly to intent discovery.
(Experiments and Analysis) We conduct extensive experiments on a suite of real-world datasets and establish substantial improvements.
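As a hedged illustration of that contrastive paradigm, the following NT-Xent-style loss supervises representations with two augmented views of the same utterances and no pseudo labels; the temperature tau and the two-view setup are standard assumptions rather than our exact objective.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.07):
    """z1, z2: (N, d) embeddings of two augmented views of N utterances."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)  # (2N, d)
    sim = z @ z.t() / tau                                # pairwise similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))           # exclude self-pairs
    # The positive for each view is the other view of the same utterance.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

Because the positives come from augmentation rather than pseudo labels, noisy cluster assignments cannot directly mislead the representation.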
Related Work
Our work is mainly related to two lines of research: unsupervised clustering and semi-supervised clustering.
Unsupervised Clustering Extracting meaningful information from unlabeled data has been studied for a long time. Traditional approaches such as K-means (MacQueen et al. 1967) and Agglomerative Clustering (AC) (Gowda and Krishna 1978) are seminal but hardly perform well in high-dimensional spaces. Recent efforts are devoted to using deep neural networks to obtain good clustering representations. Xie, Girshick, and Farhadi (2016) propose Deep Embedded Clustering (DEC) to learn and refine features iteratively by optimizing a clustering objective based on an auxiliary distribution (recalled after this paragraph). Unlike DEC, Yang et al. (2017) propose the Deep Clustering Network (DCN), which performs nonlinear dimensionality reduction and k-means clustering jointly to learn clustering-friendly representations. Chang et al. (2017) apply deep clustering to images with DAC, a binary-classification framework that uses adaptive learning for optimization. DeepCluster (Caron et al. 2018) then proposes an end-to-end training method that performs cluster assignment and representation learning alternately. However, the key drawback of unsupervised methods is their inability to take advantage of prior knowledge to guide the clustering.
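For concreteness, DEC's auxiliary distribution (notation from Xie, Girshick, and Farhadi 2016) sharpens the soft assignment $q_{ij}$ of embedding $z_i$ to centroid $\mu_j$:
$$q_{ij} = \frac{(1 + \lVert z_i - \mu_j \rVert^2)^{-1}}{\sum_{j'} (1 + \lVert z_i - \mu_{j'} \rVert^2)^{-1}}, \qquad p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}, \quad f_j = \sum_i q_{ij},$$
and training minimizes $\mathrm{KL}(P \,\|\, Q)$, pushing each point toward high-confidence assignments.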
Semi-supervised Clustering With the aid of a small amount of labeled data, semi-supervised clustering usually produces better results than its unsupervised counterparts. PCKMeans (Basu, Banerjee, and Mooney 2004) proposes that clustering can be supervised by pairwise constraints between samples in the dataset. KCL (Hsu, Lv, and Kira 2017) first transfers knowledge in the form of pairwise similarity predictions and then learns a clustering network via transfer learning. Along this line, MCL (Hsu et al. 2019) further formulates multi-class classification as meta classification that predicts pairwise similarity and generalizes the framework to various settings. DTC (Han, Vedaldi, and Zisserman 2019) extends the DEC algorithm and proposes a mechanism to estimate the number of new image categories using labeled data. In the field of text clustering, CDAC+ (Lin, Xu, and Zhang 2020) combines pairwise constraints and a target distribution to discover new intents, while DeepAligned (Zhang et al. 2021) introduces an alignment strategy to improve clustering consistency. Very recently, SCL (Shen et al. 2021) incorporates the strong backbone MPNet in a Siamese network structure with a contrastive loss (or relies on a large amount of additional external data (Zhang et al. 2022)) to learn better sentence representations. Although these methods take known intents into account, they may suffer from knowledge forgetting during the training process. More importantly, these methods are