DialogUSR: Complex Dialogue Utterance Splitting and Reformulation
for Multiple Intent Detection

Haoran Meng1, Xin Zheng2,4, Tianyu Liu3†, Zizhen Wang3, He Feng3,
Binghuai Lin3, Xuemin Zhao3, Yunbo Cao3, Zhifang Sui1†

1 MOE Key Laboratory of Computational Linguistics, Peking University, China
2 Institute of Software, Chinese Academy of Sciences, China
3 Tencent Cloud Xiaowei   4 University of Chinese Academy of Sciences, China

haoran@stu.pku.edu.cn; zhengxin2020@iscas.ac.cn; {rogertyliu, zizhenwang,
mobisysfeng, binghuailin, xueminzhao, yunbocao}@tencent.com; szf@pku.edu.cn

* Equal contribution. † Corresponding authors.
Abstract

While interacting with chatbots, users may elicit multiple intents in a single dialogue utterance. Instead of training a dedicated multi-intent detection model, we propose DialogUSR, a dialogue utterance splitting and reformulation task that first splits a multi-intent user query into several single-intent sub-queries and then recovers all the coreferred and omitted information in the sub-queries. DialogUSR can serve as a plug-in and domain-agnostic module that empowers multi-intent detection for deployed chatbots with minimal effort. We collect a high-quality, naturally occurring dataset that covers 23 domains via a multi-step crowdsourcing procedure. To benchmark the proposed dataset, we propose multiple action-based generative models that involve end-to-end and two-stage training, and conduct in-depth analyses of the pros and cons of the proposed baselines.
1 Introduction

Thanks to the technological advances of natural language processing (NLP) in the last decade, modern personal virtual assistants like Apple Siri and Amazon Alexa have managed to interact with end users in a more natural and human-like way. Treating chatbots as human listeners, users may elicit multiple intents within a single query. For example, in Figure 1, a single user query triggers inquiries about both the high-speed train ticket price and the weather at the destination. To handle multi-intent user queries, a straightforward solution is to train a dedicated natural language understanding (NLU) system for multi-intent detection. Rychalska et al. (2018) first adopted hierarchical structures to identify multiple user intents. Gangadharaiah and Narayanaswamy (2019) explored the joint multi-intent and slot-filling task with a recurrent neural network. Qin et al. (2020) further proposed an adaptive graph attention network to model the joint intent-slot interaction. However, to integrate a multi-intent detection model into a production dialogue system, developers would need to make extra efforts in continuous deployment, i.e. technical support for both single-intent and multi-intent detection models, and in system modifications, i.e. changes to the APIs and implementations of the NLU and other related modules.

[Figure 1: The task illustration for DialogUSR. It serves as a plug-in module that empowers multi-intent detection capability for deployed single-intent NLU systems.]
To provide an alternative way of understanding multi-intent user queries, we propose the complex dialogue utterance splitting and reformulation (DialogUSR) task, together with a corresponding benchmark dataset, which first splits a multi-intent query into several single-intent sub-queries and then recovers the coreferred and omitted information in the sub-queries, as illustrated in Fig 1. With the proposed task and dataset, practitioners can train a multi-intent query rewriting model that serves as a plug-in module for an existing chatbot system with minimal effort. The trained transformation models are also domain-agnostic in the sense that the query splitting and rewriting skills learned in DialogUSR are generic for multi-intent complex user queries from diverse domains.
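To make the plug-in workflow concrete, here is a minimal sketch of how a trained DialogUSR model could sit in front of a deployed single-intent NLU module; the function names and interfaces are hypothetical, since the paper does not prescribe a particular API:

```python
from typing import Callable, Dict, List

def nlu_with_dialogusr(
    query: str,
    split_and_rewrite: Callable[[str], List[str]],  # trained DialogUSR model
    single_intent_nlu: Callable[[str], Dict],       # existing deployed NLU
) -> List[Dict]:
    """Handle a (possibly) multi-intent query via DialogUSR.

    The DialogUSR module splits the query into self-contained
    single-intent sub-queries; each sub-query is then processed by
    the unchanged single-intent NLU module.
    """
    sub_queries = split_and_rewrite(query)
    return [single_intent_nlu(q) for q in sub_queries]

# In the Figure 1 example, a railway-price inquiry plus a weather inquiry
# would yield two sub-queries, hence two independent intent/slot frames.
```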
We employ a multi-step crowdsourcing procedure to annotate the DialogUSR dataset, which covers 23 domains with 11.6k instances. Naturally occurring coreferences and omissions account for 62.5% of the human-written sub-queries, which reflects genuine user preferences. Specifically, we first collect initial queries from 2 Chinese task-oriented NLU datasets that cover real-world user-agent interactions; we then ask the annotators to write subsequent queries as if they were sending multiple intents to a chatbot; finally, we aggregate the human-written sub-queries and provide completed sub-queries whenever coreferences or omissions are involved. We also employ multiple screening and post-checking protocols throughout the data creation process to ensure the high quality of the proposed dataset.
For the baseline models, we carefully analyze the transformation from the input multi-intent queries to the corresponding single-intent sub-queries and summarize multiple rewriting actions, namely deletion, splitting, completion and causal completion, which are the local edits in the generation. Based on the summarized actions, we propose three types of generative baselines, end-to-end, two-stage and causal two-stage models, which are empowered by strong pretrained models, and conduct a series of empirical studies, including an exploration of the best action combination and of model performance at different training data scales and on existing multi-intent NLU datasets.
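As a rough illustration of the two training regimes, the sketch below shows one plausible way to serialize training examples; the [SP] separator and the context-plus-rewrite pattern follow Figure 2, but the exact target format and action encoding used by the actual baselines may differ:

```python
from typing import Dict, List, Tuple

SEP = " [SP] "  # separator token, as shown in Figure 2

def make_e2e_example(query: str, completed: List[str]) -> Dict:
    """End-to-end: raw multi-intent query -> all completed sub-queries."""
    return {"source": query, "target": SEP.join(completed)}

def make_two_stage_examples(
    query: str, raw_splits: List[str], completed: List[str]
) -> Tuple[Dict, List[Dict]]:
    """Two-stage: (1) split only, (2) rewrite each split into a
    self-contained sub-query, recovering coreference/omission."""
    split_example = {"source": query, "target": SEP.join(raw_splits)}
    rewrite_examples = [
        # Stage-2 input keeps the preceding splits as context so the
        # model can resolve coreference/omission (cf. Figure 2's
        # "context [SP] incomplete => completed" examples).
        {"source": SEP.join(raw_splits[: i + 1]), "target": completed[i]}
        for i in range(len(raw_splits))
    ]
    return split_example, rewrite_examples
```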
We summarize our contributions as follows1:

1) The biggest challenge of multi-intent detection (MID) in deployment is the heavy code refactoring required on a running dialogue system that already does a good job at single-intent detection. This motivates us to design DialogUSR, which serves as a plug-in module and eases the difficulties of incremental development.

2) Prior work on MID has a higher cost of data annotation and struggles in open-domain or domain-transfer scenarios. Only NLU experts can adequately annotate the intent/slot information for a MID user query, and the outputs of MID NLU models are inherently limited by the pre-defined intent/slot ontology. In contrast, DialogUSR datasets can be easily annotated by non-experts, and the derived models are domain-agnostic in the sense that the learned query splitting and coreference/omission recovery skills are generic across distinct domains.

1 Code and data are provided at https://github.com/MrZhengXin/multi_intent_2022.
[Figure 2: The overview of the data collection procedure of DialogUSR. First we sample initial queries from task-oriented NLU datasets (Sec. 2.1), then we hire crowdsource workers to write follow-up queries (Sec. 2.2). To aggregate the annotated queries, we propose text filler templates (Sec. 2.3) and a post-processing procedure. Finally we ask annotators to recover the missing information in the incomplete utterances (Sec. 2.4).]
3) Presumably, MID is more difficult than single-intent detection (SID) given the same intent/slot ontology. From the perspective of task (re)formulation, DialogUSR is the first to convert a MID task into multiple SID tasks (the philosophy of 'divide and conquer') with a relatively low error propagation rate, providing an alternative and effective way to handle the MID task.
2 Dataset Creation

We collect a high-quality dataset via a 4-step crowdsourcing procedure, as illustrated in Fig 2.
2.1 Initial Query Collection

To determine the topic of a multi-intent user query, we sample an initial query from two Chinese user query understanding datasets for task-oriented conversational agents, namely SMP-ECDT2 (Zhang et al., 2017) and RiSAWOZ3 (Quan et al., 2020). We then ask human annotators to simplify the initial queries that have excessive length (longer than 15 characters), or that are too verbose or semantically repetitive4. RiSAWOZ is a large-scale multi-domain Chinese Wizard-of-Oz NLU dataset with rich semantic annotations, covering 12 domains such as tourist attraction, railway, hotel and restaurant. SMP-ECDT was released as the benchmark for the "domain and intent identification for user query" task in the evaluation track of the Chinese Social Media Processing conference (SMP) in 2017 and 2019. It covers diverse practical user queries from 30 domains, collected from the production chatbots of iFLYTEK. We use these two source datasets as our query resources because they comprise a variety of common, naturally occurring daily-life user queries for task-oriented chatbots and cover diverse domains and topics.
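As a small illustration, the length criterion from this step can be checked automatically before routing a sampled query to annotators; the helper below is our own sketch, and only the 15-character threshold comes from the text, since verbosity and repetitiveness remain human judgments:

```python
import random

MAX_INITIAL_LEN = 15  # characters; the simplification threshold from Sec 2.1

def sample_initial_query(pool: list) -> tuple:
    """Sample a candidate initial query and flag it for simplification.

    Only the length check is automatic; whether a query is too verbose
    or semantically repetitive is left to the human annotators.
    """
    query = random.choice(pool)
    needs_simplification = len(query) > MAX_INITIAL_LEN
    return query, needs_simplification
```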
2.2 Follow-up Query Creation

After specifying an initial query, we ask human annotators to put themselves in the position of a real end user and imagine that they are eliciting multiple intents in a single complex user query while interacting with conversational agents. The annotators are instructed to write up to 3 subsequent queries on what they need or would like to know about, according to the designated initial query. Although most subsequent queries stick to the topic of the initial query, we allow the human annotators to switch to a different topic that is unrelated to the initial query5. For example, in Figure 1, the second sub-query asks about the weather in Nanjing, whereas the initial query is an inquiry about railway information. By manually checking 300 subsampled instances in the training set, we observe that 37.3% of the annotated multi-intent queries involve topic switching, which conforms to user behaviour in real-world multi-intent queries.

2 http://ir.hit.edu.cn/SMP2017-ECDT
3 https://github.com/terryqj0107/RiSAWOZ
4 The sentence simplification phase makes the annotated multi-intent queries sound more natural, as users are unlikely to elicit a lengthy query. Given that we add 2 or 3 follow-up sub-queries to the initial queries, the initial queries should be simplified to keep a proper overall query length (Fig 2).
5 In fact, we neither encourage nor discourage topic switching in the annotation instruction.
2.3 Query Aggregation

In a pilot study, we asked human annotators to manually aggregate the sub-queries, but found that the derived queries somewhat lacked variation in the conjunctions between sub-queries, as the annotators tended to always pick the most common Chinese conjunctions like 'and', 'or' and 'then'. We even observed sloppy annotators trying to hack the annotation job by not using any conjunctions at all (most queries are fluent even without conjunctions). In a nutshell, we found it challenging to screen the annotators and ensure the diversity and naturalness of the derived queries with human-only annotation. We therefore resorted to human-in-the-loop annotation: sampling from a rich conjunction set to connect sub-queries and post-checking the sentence fluency of the aggregated queries with GPT-2. After each round of annotation (we ran 6 rounds in total), we randomly picked 100 samples and checked their quality, finding that over 95% of the samples were of high quality. In fact, most sentences in Fig 9 (appendix) are fluent and natural (especially in Chinese) without cherry-picking.

More concretely, we propose a set of pre-defined templates that correspond to different text infilling strategies between consecutive queries. Specifically, with a 50% chance we concatenate two consecutive queries without using any text filler. With the other 50% chance, we sample a piece of text from a set of pre-defined text fillers with different sampling weights, such as the Chinese equivalents of "first of all", "and", "I also would like to know", "then" and "finally", and use the sampled text filler as a conjunction while concatenating the consecutive queries. Although locally coherent, the derived multi-intent query may still exhibit some global incoherence and syntactic issues, especially for longer text. We thus post-process the derived query with a ranking procedure as an additional screening step. For each annotated query set, we generate 10 candidate multi-intent queries with different sampled templates, rank them by language model perplexity using a GPT-2 (117M) model, and keep only the candidate with the lowest perplexity to ensure fluency and syntactic correctness. To avoid trivial hacks in complex query splitting, we remove all punctuation from the aggregated query, which conforms to the default setting of most production chatbots, i.e. no punctuation in the spoken language understanding phase after the automatic speech recognition module.

[Figure 3: The domain statistics of DialogUSR, which covers diverse domains in conversational agents.]
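A minimal sketch of the aggregation-plus-ranking loop described above follows. The filler strings and sampling weights are illustrative stand-ins for the Chinese fillers, the "gpt2" checkpoint name is a placeholder (a Chinese GPT-2 would be used for the actual queries), and the helper names are our own:

```python
import math
import random

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Illustrative fillers/weights; the real set is Chinese and richer,
# and the paper does not publish the exact sampling distribution.
FILLERS = ["first of all", "and", "I also would like to know", "then", "finally"]
WEIGHTS = [1, 3, 1, 3, 1]
NO_FILLER_PROB = 0.5  # 50% chance of direct concatenation (Sec 2.3)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def aggregate(sub_queries):
    """Join sub-queries, inserting a sampled filler half of the time."""
    out = sub_queries[0]
    for q in sub_queries[1:]:
        if random.random() < NO_FILLER_PROB:
            out += q  # no filler (Chinese text needs no space)
        else:
            out += random.choices(FILLERS, weights=WEIGHTS, k=1)[0] + q
    return out

@torch.no_grad()
def perplexity(text):
    """Mean-token perplexity under the screening language model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    return math.exp(model(ids, labels=ids).loss.item())

def best_aggregation(sub_queries, n_candidates=10):
    """Sample candidates with different templates, keep the most fluent."""
    candidates = {aggregate(sub_queries) for _ in range(n_candidates)}
    return min(candidates, key=perplexity)
```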
2.4 Query Completion

After assembling the multi-intent user queries, we observe that incomplete utterances, such as coreferences and omissions, occur frequently, accounting for 62.5% of the total human-written subsequent queries. Note that, in the annotation instruction, we do not explicitly ask the crowdsource workers to use coreferences or omissions while writing the subsequent queries in the follow-up query creation phase. The naturally occurring incomplete utterances reflect genuine user preferences when sending out multiple intents. To gather sufficient information while splitting multi-intent queries into independent single-intent queries, we ask another group of annotators6 to write completed utterances that recover the omitted and co-referred information in the incomplete queries.
2.5 Data Annotation Settings

To perform the human annotation, we hired crowdsource workers from an internal data annotation group. The workers were limited to those who have abundant hands-on experience in annotating conversational data with good track records (recognized as experts in the internal assessment, with a rejection rate of 1%). Additionally, all the workers were screened via a 10-case qualification test that covers the annotation tasks in Sec 2.1 to Sec 2.4 (correctly annotating 8 out of 10 cases). They were paid $0.6 per datapoint, which is more than the prevailing local minimum wage. We split the entire annotation procedure into multiple rounds and hired another group of human judges to post-check the quality of the annotated data and filter out unqualified instances after each round. In this way, we create a high-quality crowdsourced dataset.

6 The query completion phase starts after the follow-up query creation phase has finished. We hire another group of annotators, who did not participate in the follow-up query writing task, to screen the quality of the rewritten queries while doing query completion.
3 Dataset Analysis

Dataset Statistics
In total, after accumulating annotations over several rounds, we obtain 11,669 instances. We conduct 6 rounds of annotation, increasing the annotation scale with each round (ranging from about 100 instances/round to about 4,000 instances/round). On average, an aggregated multi-intent complex query from the proposed DialogUSR dataset comprises 36.7 Chinese characters and assembles 3.6 single-intent queries (including initial and follow-up queries). After recovering the missing information in the query completion phase (Sec 2.4), the average lengths of the completed initial query, first follow-up query, second follow-up query and third follow-up query are 11.9, 12.3, 12.4 and 10.8 characters respectively. We split the dataset into train, validation and test sets with sizes of 10,169, 500 and 1,000 respectively.
Domain Statistics
The domain statistics of DialogUSR are depicted in Fig 3. Thanks to the diverse