
2 Dataset Creation
We collect a high-quality dataset via a four-step crowdsourcing procedure, as illustrated in Figure 2.
2.1 Initial Query Collection
To determine the topic of the multi-intent user query, we sample an initial query from two Chinese user query understanding datasets for task-oriented conversational agents, namely SMP-ECDT^2 (Zhang et al., 2017) and RiSAWOZ^3 (Quan et al., 2020). We then ask human annotators to simplify initial queries that are excessively long (longer than 15 characters) or semantically verbose or repetitive.^4 RiSAWOZ is a large-scale multi-domain Chinese Wizard-of-Oz NLU dataset with rich semantic annotations, covering 12 domains such as tourist attraction, railway, hotel, and restaurant. SMP-ECDT is released as the benchmark for the "domain and intent identification for user query" task in the evaluation track of the Chinese Social Media Processing (SMP) conference in 2017 and 2019. It covers diverse practical user queries from 30 domains, collected from the production chatbots of iFLYTEK. We use
these two source datasets as our query resources because they comprise a variety of common, naturally occurring daily-life user queries for task-oriented chatbots and cover diverse domains and topics.
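To make the 15-character criterion concrete, the following minimal sketch flags overly long sampled queries for manual simplification. The function name and example queries are our own illustration, not the authors' code, and the semantic verbosity check remains a human judgment.

```python
# Illustrative pre-filter (not the authors' released code): flag initial
# queries above the paper's 15-character threshold for manual simplification.
def needs_simplification(query: str, max_chars: int = 15) -> bool:
    return len(query) > max_chars

# Hypothetical sampled queries; annotators simplify only the flagged ones.
queries = ["帮我查一下明天从南京到上海的高铁票", "南京明天天气怎么样"]
to_simplify = [q for q in queries if needs_simplification(q)]
```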
2.2 Follow-up Query Creation
After specifying an initial query, we ask human annotators to put themselves in the position of a real end user and imagine eliciting multiple intents in a single complex query while interacting with a conversational agent. The annotators are instructed to write up to 3 subsequent queries about what they need or would like to know, given the designated initial query. Although most subsequent queries stick to the topic of the initial query, we allow annotators to switch to a different topic that is unrelated to the initial query.^5
For example, in Figure 1 the second sub-query asks about the weather in Nanjing, whereas the initial query is an inquiry about railway information. By manually checking 300 subsampled instances in the training set, we observe that 37.3% of the annotated multi-intent queries involve topic switching, which conforms to user behaviour in real-world multi-intent queries.

2 http://ir.hit.edu.cn/SMP2017-ECDT
3 https://github.com/terryqj0107/RiSAWOZ
4 The sentence simplification phase makes the annotated multi-intent queries sound more natural, as users are unlikely to utter lengthy queries. Since we append 2 or 3 sub-queries to each initial query, the initial queries should be simplified to keep the overall query length reasonable (Figure 2).
5 In fact, we neither encourage nor discourage topic switching in the annotation instructions.
2.3 Query Aggregation
In a pilot study, we asked human annotators to manually aggregate the sub-queries, but found that the derived queries somewhat lacked variation in the conjunctions between sub-queries, as annotators tended to always pick the most common Chinese conjunctions such as 'and', 'or', and 'then'. We even observed sloppy annotators trying to game the annotation job by using no conjunctions at all (most queries remain fluent even without conjunctions). In a nutshell, we found it challenging to screen annotators and ensure the diversity and naturalness of the derived queries with human-only annotation. We therefore resort to human-in-the-loop annotation, sampling from a rich conjunction set to connect sub-queries and post-checking the sentence fluency of aggregated queries with GPT-2. After each round of annotation (we run 6 rounds in total), we randomly pick 100 samples and check their quality, finding that over 95% are of high quality. Indeed, most sentences in Figure 9 (appendix) are fluent and natural (especially in Chinese), without cherry-picking.
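As a rough sketch of the GPT-2 fluency check (the paper does not release this code), one can score each aggregated query by language-model perplexity with a pretrained GPT-2; the Chinese checkpoint name below is an assumption on our part.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint: a 117M-parameter Chinese GPT-2; the paper only
# specifies "GPT-2 (117M)", not the exact pretrained model.
MODEL = "uer/gpt2-chinese-cluecorpussmall"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def perplexity(text: str) -> float:
    """Language-model perplexity of `text`; lower suggests higher fluency."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids yields the mean token-level cross-entropy;
        # its exponential is the perplexity.
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))
```

Aggregated queries whose perplexity is unusually high relative to the batch can then be flagged for re-annotation.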
More concretely, we propose a set of pre-defined templates that correspond to different text infilling strategies between consecutive queries. Specifically, with a 50% chance we concatenate two consecutive queries without any text filler. With the other 50% chance, we sample a piece of text from a set of pre-defined text fillers with different sampling weights, such as "首先" (first of all), "以及" (and), "我还想知道" (I also would like to know), "接下来" (then), and "最后" (finally), and use the sampled text filler as a conjunction while concatenating the consecutive queries.
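A minimal sketch of this aggregation scheme is given below. The paper lists the fillers but not their exact sampling weights, so the weights (and the helper name) here are hypothetical.

```python
import random

# Pre-defined text fillers; the sampling weights are hypothetical, as the
# paper does not report the exact values.
FILLERS = ["首先", "以及", "我还想知道", "接下来", "最后"]
WEIGHTS = [0.1, 0.3, 0.3, 0.2, 0.1]

def aggregate(sub_queries: list[str]) -> str:
    """Join sub-queries left to right, inserting a sampled filler between
    each consecutive pair with 50% probability."""
    out = sub_queries[0]
    for q in sub_queries[1:]:
        if random.random() < 0.5:
            out += q  # 50% chance: concatenate with no text filler
        else:
            out += random.choices(FILLERS, weights=WEIGHTS)[0] + q
    return out
```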
Although locally coherent, the derived multi-intent queries may still exhibit some global incoherence and syntactic issues, especially for longer texts. We thus post-process the derived queries with a ranking procedure as an additional screening step. For each annotated query set, we generate 10 candidate multi-intent queries with different sampled templates and rank them by language-model perplexity using a GPT-2 (117M) model. We only keep the