Controllable Dialogue Simulation with In-Context Learning
Zekun Li1, Wenhu Chen2, Shiyang Li1, Hong Wang1, Jing Qian1, Xifeng Yan1
1University of California, Santa Barbara
2University of Waterloo, Vector Institute
{zekunli, shiyangli, hongwang600, jing_qian, xyan}@cs.ucsb.edu
wenhuchen@uwaterloo.ca
Abstract
Building dialogue systems requires a large corpus of annotated dialogues. Such datasets are usually created via crowdsourcing, which is expensive and time-consuming. In this paper, we propose DIALOGIC¹, a novel dialogue simulation method based on large language model in-context learning to automate dataset creation. Seeded with a few annotated dialogues, DIALOGIC automatically selects in-context examples for demonstration and prompts GPT-3 to generate new dialogues and annotations in a controllable way. Our method can rapidly expand a small set of dialogue data with minimal or zero human involvement and no parameter updates, and is thus far more cost-efficient and time-saving than crowdsourcing. Experimental results on the MultiWOZ dataset demonstrate that, under challenging low-resource settings with as few as 85 seed dialogues, training a model on the simulated dialogues leads to even better performance than using the same amount of human-generated dialogues. When enough data is available, our method can still serve as an effective data augmentation method. Human evaluation results also show that our simulated dialogues have near-human fluency and annotation accuracy. The code and data are available at https://github.com/Leezekun/dialogic.

¹Dialogue simulation with in-context learning.
1 Introduction
Task-oriented dialogue (TOD) systems can assist
users in completing tasks such as booking a restau-
rant or making an appointment. Building such a
dialogue system requires a large corpus of annotated dialogues (Wu et al., 2020), which is costly to obtain in terms of both money and time.
One popular approach to collecting and annotat-
ing task-oriented dialogues is crowdsourcing via
a Wizard-of-Oz setup (Mrksic et al., 2017; Eric et al., 2017; Budzianowski et al., 2018), where
crowdworkers produce conversations. Significant
annotation efforts are further needed to label intent,
entities, etc. Prior work has sought to minimize the cost and effort of data collection by hiring crowdworkers or leveraging user simulators to interact with existing dialogue systems (Williams et al., 2013; Shah et al., 2018b,a; Papangelis et al., 2019; Zhao et al., 2019; Rastogi et al., 2020; Tseng et al., 2021). However, the dependency on existing dialogue systems leaves developers with a classic chicken-and-egg problem. In addition, developing such user simulators typically requires considerable handcrafting and human involvement.
In recent years, large language models (LLMs) (Brown et al., 2020; Lieber et al., 2021; Rae et al., 2021; Thoppilan et al., 2022; Smith et al., 2022) have demonstrated strong in-context learning capability. Provided with a few in-context examples, LLMs such as GPT-3 (Brown et al., 2020) can generate text with similar patterns without fine-tuning. This capability has been leveraged to synthesize training data for a few NLP tasks (Wang et al., 2021b; Liu et al., 2022). Although there are methods that generate training data for a single component of a TOD system (Li et al., 2022b), there has been no plausible solution for generating whole dialogues with annotations for end-to-end training, owing to the complex nature of the task: multi-turn interactions, multiple possible logic flows, and multiple types of annotations.
To address this challenge, we introduce DIALOGIC, a controllable dialogue simulation method for dialogue dataset creation. Seeded with a few annotated dialogues, DIALOGIC automatically selects in-context examples for demonstration and prompts LLMs such as GPT-3 to generate annotated dialogues in a controllable way. DIALOGIC can play
the roles of both user and system simulator. Figure 1 illustrates a partial example. For the user side, GPT-3 is first prompted to generate the turn-level user goal (belief state), conditioned on which the user utterance expressing that goal is generated. Likewise, we prompt GPT-3 to generate the dialog act for the system side and then the corresponding system response. We also propose automatic verification and revision methods to mitigate annotation errors.
User ([hotel] type is guest house, area is center): I need a place to stay. It should be a guest house in the center of town.
  Turn-level belief state: <hotel, type, guest house>, <hotel, area, center>
System ([hotel] [request] day people): how many days and people will be staying ?
  Dialogue act: <hotel, request, day>, <hotel, request, people>
User ([hotel] stay is 2, people is 3, day is Friday): we have 3 people, and we plan to stay 2 nights starting from Friday.
  Turn-level belief state: <hotel, book stay, 2>, <hotel, book people, 3>, <hotel, book day, Friday>
  Dialog-level belief state: <hotel, type, guest house>, <hotel, area, center>, <hotel, book stay, 2>, <hotel, book people, 3>, <hotel, book day, Friday>
System ([hotel] [offerbooked] reference [general] [reqmore]): the booking was successful. Your reference number is [value_reference]. Can I help you with anything else?
  Dialogue act: <hotel, offerbooked, reference>, <general, reqmore, none>
Figure 1: Illustration of part of an annotated dialogue generated by our method. Left: the conversation and annotations are generated simultaneously by GPT-3, where user utterances are in blue, system responses are in green, and annotations are in red. Right: the structured annotations obtained by parsing GPT-3's generation shown on the left. Best viewed in color. A complete generated dialogue is shown in Appendix C.2 as Table 9.
This paper has two key insights. First, leverag-
ing the in-context learning ability of LLMs, our
method can simulate both the user and system side
to generate annotated dialogues by learning from
a few examples. Beyond the minimal effort of collecting the small seed dataset and training an auxiliary model on it, the simulation process is free of human involvement and parameter updates, making our method much cheaper and faster than crowdsourcing for dataset creation. Specifically, a large-scale, high-quality dataset such as MultiWOZ (Budzianowski et al., 2018) can be created with our method within only several hours. Second, we design controllable dialogue generation strategies to compensate for GPT-3's lack of reliability and interpretability. We also investigate effective representations and selection strategies for in-context dialogue examples so that LLMs can better exercise their in-context learning capabilities.
We conduct experiments on the MultiWOZ 2.3 dataset (Han et al., 2021). Remarkably, in challenging low-resource settings where as few as 85 seed dialogues (1% of the whole training set) are given, the dialogues simulated by our method lead to even better model performance than the same amount of human-generated dialogues. DIALOGIC can also serve as an effective data augmentation method when the full training set is provided. Human evaluations indicate that our simulated dialogues have fluency and annotation accuracy comparable to human-generated dialogues, along with more diverse dialogue flows. Our results demonstrate the promise of leveraging large language models to automate complex dialogue dataset creation. We have released the code and simulated data to facilitate future studies.²

²https://github.com/Leezekun/dialogic
2 Related Work
2.1 Dialogue Collection and Simulation
Building end-to-end dialogue systems heavily re-
lies on annotated training data. Wizard-of-Oz (Kelley, 1984), a popular approach, can produce high-quality conversations but relies entirely on human effort (Mrksic et al., 2017; Eric et al., 2017; Asri et al., 2017; Budzianowski et al., 2018).
There are also dialogue corpora of interactions between humans and existing dialogue systems or APIs (Williams et al., 2013, 2014; Raux et al., 2005). To further reduce human effort, user simulators have been leveraged to interact with the system via reinforcement learning or self-play (Shah et al., 2018b,a; Papangelis et al., 2019; Zhao et al., 2019; Rastogi et al., 2020; Tseng et al., 2021). However, existing dialogue systems or APIs are still needed, which restricts these solutions to existing domains. To this end, Mohapatra et al. (2020) proposed a method that uses GPT-2 (Radford et al., 2019) to simulate both the user and system sides. However, this method still needs many dialogues to train the simulators and cannot guarantee simulation quality in low-resource settings.
2.2 Task-oriented Dialogue
[Figure 2 diagram: the ontology feeds a goal generator that produces a new user goal; a dialogue retriever selects in-context dialogue examples from the seed set; together they form the prompt that demonstrates the task to GPT-3, and the generated dialogue passes through automatic revision.]
Figure 2: Overview of the proposed method.
A task-oriented dialogue system usually consists of three components: natural language understanding (NLU) for dialogue state tracking, dialogue management (DM) for predicting the dialog act based on the dialogue state, and natural language generation (NLG) for mapping dialog acts to natural language responses. Annotated belief states, dialog acts, and system responses are needed to train these components, whether separately (Wu et al., 2019; Lee et al., 2019; Heck et al., 2020) or in an end-to-end fashion (Peng et al., 2021; Hosseini-Asl et al., 2020; Lin et al., 2020; Yang et al., 2021; Su et al., 2021). In this paper, we aim to generate dialogues and their complete set of annotations.
2.3 In-Context Learning
As an alternative to fine-tuning, in-context learning with LLMs such as GPT-3 (Brown et al., 2020) can perform a new task by learning from a few in-context examples without updating model parameters. Owing to its superior few-shot performance and scalability, in-context learning has been applied to a wide range of NLP tasks. For dialogue, it has been increasingly deployed in tasks such as intent classification (Yu et al., 2021), semantic parsing (Shin and Van Durme, 2021), and dialogue state tracking (Hu et al., 2022). Madotto et al. (2021) built an end-to-end dialogue system based solely on in-context learning. Despite its success, GPT-3 requires substantial resources to deploy, and its public API is billed by input length; worse, the limit on input length restricts the number of in-context examples and thus generation performance. Consequently, several methods have been proposed that use GPT-3 to synthesize data for training smaller models for inference (Wang et al., 2021a,b; Liu et al., 2022; Li et al., 2022a). Although this is especially desirable for dialogue tasks, whose input prompts are usually lengthy, there has been no plausible solution for generating annotated dialogues for developing TOD systems, owing to the complex nature of the task: multi-turn interactions and multiple types of annotations.
3 Method
In this paper, we introduce DIALOGIC, a novel method that simulates annotated dialogues for building task-oriented dialogue systems based on language model in-context learning. The only requirements are a small seed dataset D_s consisting of a few annotated dialogues and an ontology O that includes all slots and possible slot values for each domain. An auxiliary TOD model M, such as SimpleTOD (Hosseini-Asl et al., 2020) or PPTOD (Su et al., 2021), trained on D_s is used to verify and revise generated annotations. Our goal is to expand D_s by generating new dialogues. For each turn of a dialogue, we need to generate the user utterance U, belief state B, database (DB) query result Q, dialog act A, and system response S (we omit the turn index for brevity).
We elaborate on the design of our method using the well-studied MultiWOZ dataset (Budzianowski et al., 2018; Eric et al., 2020; Han et al., 2021), which covers 7 domains, such as hotel and restaurant, and 24 slots, such as hotel-area and restaurant-food (see Appendix A for more details). To simulate low-resource environments, we use 1%, 5%, and 10% of the training data as the seed dataset D_s.
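To make the generation targets concrete, one annotated turn could be represented as follows. This is a minimal sketch; the class and field names are illustrative and do not reflect the released data format.

from dataclasses import dataclass

# Illustrative container for one annotated turn (U, B, Q, A, S).
# Names are hypothetical; the released code may structure this differently.
@dataclass
class Turn:
    user_utterance: str                      # U
    belief_state: set[tuple[str, str, str]]  # B: (domain, slot, value) triplets
    db_result: str                           # Q: database query result
    dialog_act: set[tuple[str, str, str]]    # A: (domain, act, slot) triplets
    system_response: str                     # S

turn = Turn(
    user_utterance="i need a guest house in the center of town .",
    belief_state={("hotel", "type", "guest house"), ("hotel", "area", "center")},
    db_result="3 matches",
    dialog_act={("hotel", "request", "day"), ("hotel", "request", "people")},
    system_response="how many days and people will be staying ?",
)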
3.1 Overview
A partial example of a simulated dialogue is shown
in Figure 1. The pipeline of our method is illus-
trated in Figure 2. For a given domain, the goal generator takes the ontology O as input and generates a new user goal G_i. We then select a few seed dialogues with similar user goals from D_s as the in-context examples for GPT-3. Given the user goal G_i and the selected in-context examples, we leverage GPT-3 to generate a new dialogue C_i. As the generated data may fail to satisfy our requirements, we design methods for automatic verification and revision.
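This pipeline can be summarized in a few lines. The sketch below is structural only: each stage is passed in as a callable, and none of the names correspond to the released implementation.

def simulate_dialogue(goal_generator, example_selector, prompt_builder, llm, reviser):
    """Sketch of the Figure 2 pipeline; all five stages are caller-supplied callables."""
    goal = goal_generator()                  # new user goal derived from the ontology
    examples = example_selector(goal)        # seed dialogues sampled as in Section 3.2
    prompt = prompt_builder(examples, goal)  # demonstration format of Figure 3
    dialogue = llm(prompt)                   # GPT-3 completes the prompt
    return reviser(dialogue)                 # automatic verification and revision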
3.2 In-context Example
User Goal. A task-oriented dialogue is a conver-
sation where the dialogue system helps accomplish
the user's goal. For a new dialogue C_i, we first generate its user goal G_i based on the ontology. The user goal and belief state are each a set of domain-slot-value triplets: (domain, slot_name, slot_value). For example, when a user wants to book a 4-star hotel for 2 nights and a cheap restaurant that serves Chinese food, the user goal will be {(hotel, stars, 4), (hotel, book stay, 2), (restaurant, pricerange, cheap), (restaurant, food, chinese)}. We investigate several ways to generate the user goal, i.e., to determine the domains, slots, and slot values to be selected, as discussed below.
Example Selection. Given the target user goal G_i, we select a few seed dialogues as in-context examples, from which GPT-3 can learn to generate the target dialogue C_i. To achieve this, the selected dialogue examples should contain as much of the ontology information needed in the target dialogue (i.e., the mentioned slots) as possible, so that GPT-3 can mimic “in-domain” generation. To measure how much two dialogue goals G_i and G_j overlap, we calculate their similarity as

w_{ij} = \frac{|D(G_i) \cap D(G_j)|}{|D(G_i) \cup D(G_j)|} \cdot \frac{|S(G_i) \cap S(G_j)|}{|S(G_i) \cup S(G_j)|}, (1)
where D(G_i) and S(G_i) denote the sets of domains and slots in the user goal G_i, respectively. The first factor is the Jaccard similarity (Niwattanakul et al., 2013) of the domain sets, while the second is that of the slot sets. The probability of a dialogue C_j from the seed dataset D_s being sampled as an in-context example for the target dialogue C_i is

p_j = \frac{e^{w_{ij}/\tau}}{\sum_{C_k \in D_s} e^{w_{ik}/\tau}}, (2)

where \tau is the temperature. A higher temperature introduces more randomness and diversity into example selection.
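A minimal sketch of Equations (1) and (2) follows, assuming goals are sets of (domain, slot, value) triplets; the temperature and sample count are illustrative values, not the paper's settings.

import math
import random

def jaccard(a: set, b: set) -> float:
    # Jaccard similarity of two sets; defined as 0 when both are empty.
    return len(a & b) / len(a | b) if a | b else 0.0

def goal_similarity(g_i: set, g_j: set) -> float:
    # w_ij from Eq. (1): product of domain-set and slot-set Jaccard similarities.
    # Slots are kept domain-qualified here (e.g. hotel-area), a modeling choice.
    d_i, d_j = {d for d, _, _ in g_i}, {d for d, _, _ in g_j}
    s_i, s_j = {(d, s) for d, s, _ in g_i}, {(d, s) for d, s, _ in g_j}
    return jaccard(d_i, d_j) * jaccard(s_i, s_j)

def sample_examples(target_goal, seed_dialogues, k=2, tau=0.2):
    # Sample k in-context examples with the softmax of Eq. (2).
    # seed_dialogues is a list of (goal, dialogue) pairs; sampling is with
    # replacement here for brevity.
    weights = [math.exp(goal_similarity(target_goal, g) / tau)
               for g, _ in seed_dialogues]
    return random.choices(seed_dialogues, weights=weights, k=k)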
We investigate several ways to generate user goals and select in-context examples:

Random Sampling: we randomly select domains, slots, and slot values to form a user goal, and sample in-context examples as described in Equation 2. In this way, we can generate any unseen user goal and thus the corresponding dialogue. However, as the number of seed dialogues is limited, it is hard to guarantee that the sampled dialogue examples cover all the information required to generate the target dialogue.
Instruction: You are going to book a hotel, and your requirements for
the hotel are ([hotel] type is hotel , pricerange is cheap , parking is yes ,
stay is 2 , day is tuesday , people is 6). Make sure you get the booking
information once booked.
Conversation:
User ([hotel] type is hotel , pricerange is cheap): I am looking for a place
to to stay that has cheap price range it should be in a type of hotel .
Assistant ([hotel] [request] area): okay , do you have a specific area you
want to stay in ?
User ([hotel] parking is yes , pricerange is cheap): no , i just need to
make sure it is cheap . oh , and i need parking .
Assistant ([hotel] [inform] price choice parking type [offerbook]): i
found [value_choice] [value_price] [value_type] for you that include -s
parking . do you like me to book it ?
User ([hotel] stay is 3 , day is tuesday , people is 6): yes , please . 6
people 3 nights starting on tuesday .
Assistant ([hotel] [nobook] day [request] stay day): i am sorry but i was
n’t able to book that for you for [value_day] . is there another day you
would like to stay or perhaps a shorter stay ?
User ([hotel] stay is 2): how about only 2 nights .
Assistant ([hotel] [offerbooked] reference [general] [reqmore]): booking
was successful . reference number is : [value_reference] . anything else i
can do for you ?
User ([general]): no , that will be all . goodbye .
Assistant ([general] [bye]): thank you for using our services .
Figure 3: Illustration of an in-context example from the MultiWOZ dataset. The user goal, belief states, and dialog acts are in red. User utterances are in blue, while system responses are in green. Best viewed in color.
Value Substitution: we substitute only the slot values of a seed dialogue's user goal to form a new user goal. This method ensures that all the required slots are mentioned in the in-context examples. However, GPT-3 tends to replicate the in-context examples, and thus little diversity is introduced.
Combination: we first select a few dialogues from the seed dataset and then combine their user goals to create a new goal. As the new user goal might involve too many domains and slots, we randomly drop some slots. This method ensures that all the slots mentioned in the target user goal are covered by the examples and encourages GPT-3 to generate diverse data.
We experimentally found that the Combination method yields the best performance; a minimal sketch of it follows. More details, comparisons, and discussion of the different goal generation methods can be found in Appendix A.2.
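Under the same triplet representation as above, the Combination strategy could look like the following; the number of combined goals and the slot drop probability are assumptions for illustration.

import random

def combine_goals(seed_goals, n_combine=2, drop_prob=0.3):
    # Merge the user goals of a few seed dialogues, then randomly drop slots
    # so the new goal does not involve too many domains and slots.
    # n_combine and drop_prob are illustrative, not the paper's settings.
    picked = random.sample(seed_goals, n_combine)
    merged = set().union(*picked)  # union of (domain, slot, value) triplets
    return {t for t in merged if random.random() > drop_prob}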
Demonstration. To better demonstrate to GPT-3 the desired pattern of the generated data, we design the format of the example dialogues as shown in Figure 3. The user goal and belief state are converted from a sequence of triplets to natural language via a template. For example, the user goal {(hotel, stars, 4), (hotel, book stay, 2), (restaurant, pricerange, cheap), (restaurant, food, chinese)} will be converted to "[hotel] stars is 4 , book stay is 2 [restaurant] pricerange is cheap , food is chinese".