Controllable Dialogue Simulation with In-Context Learning
Zekun Li1, Wenhu Chen2, Shiyang Li1, Hong Wang1, Jing Qian1, Xifeng Yan1
1University of California, Santa Barbara
2University of Waterloo, Vector Institute
{zekunli, shiyangli, hongwang600, jing_qian, xyan}@cs.ucsb.edu
wenhuchen@uwaterloo.ca
Abstract
Building dialogue systems requires a large corpus of annotated dialogues. Such datasets are usually created via crowdsourcing, which is expensive and time-consuming. In this paper, we propose DIALOGIC¹, a novel dialogue simulation method based on large language model in-context learning to automate dataset creation. Seeded with a few annotated dialogues, DIALOGIC automatically selects in-context examples for demonstration and prompts GPT-3 to generate new dialogues and annotations in a controllable way. Our method can rapidly expand a small set of dialogue data with minimal or zero human involvement and no parameter updates, and is thus far more cost-efficient and time-saving than crowdsourcing. Experimental results on the MultiWOZ dataset demonstrate that, under challenging low-resource settings with as few as 85 seed dialogues, training a model on the simulated dialogues leads to even better performance than using the same amount of human-generated dialogues. When enough data is available, our method can still serve as an effective data augmentation method. Human evaluation results also show that our simulated dialogues have near-human fluency and annotation accuracy. The code and data are available at https://github.com/Leezekun/dialogic.

¹Dialogue simulation with in-context learning.
1 Introduction
Task-oriented dialogue (TOD) systems can assist
users in completing tasks such as booking a restau-
rant or making an appointment. Building such a
dialogue system requires a large corpus of annotated dialogues (Wu et al., 2020), which is costly to obtain in terms of both money and time.
One popular approach to collecting and annotat-
ing task-oriented dialogues is crowdsourcing via
a Wizard-of-Oz setup (Mrksic et al., 2017; Eric et al., 2017; Budzianowski et al., 2018), where
crowdworkers produce conversations. Significant
annotation efforts are further needed to label intent,
entities, etc. Prior work has sought to minimize the cost and effort of data collection by hiring crowdworkers or leveraging user simulators to interact with existing dialogue systems (Williams et al., 2013; Shah et al., 2018b,a; Papangelis et al., 2019; Zhao et al., 2019; Rastogi et al., 2020; Tseng et al., 2021). However, the dependency on existing dialogue systems leaves developers with a classic chicken-and-egg problem. In addition, developing such user simulators typically requires considerable handcrafting and human involvement.
In recent years, large language models (LLMs) (Brown et al., 2020; Lieber et al., 2021; Rae et al., 2021; Thoppilan et al., 2022; Smith et al., 2022) have demonstrated strong in-context learning capability. Provided with a few in-context examples, LLMs such as GPT-3 (Brown et al., 2020) can generate text with similar patterns without fine-tuning. This capability has been leveraged to synthesize training data for a few NLP tasks (Wang et al., 2021b; Liu et al., 2022). Although there are methods that generate training data for a single component of a TOD system (Li et al., 2022b), there has been no plausible solution for generating whole dialogues with annotations for end-to-end training, owing to the complex nature of the task: multi-turn interactions, multiple possible logic flows, and multiple types of annotations.
To address this challenge, we introduce DIALOGIC, a controllable dialogue simulation method for dialogue dataset creation. Seeded with a few annotated dialogues, DIALOGIC automatically selects in-context examples for demonstration and prompts LLMs such as GPT-3 to generate annotated dialogues in a controllable way. DIALOGIC can play
the roles of both user and system simulator. Figure 1 illustrates a partial example. For the user side, GPT-3 is first prompted to generate the turn-level user goal (belief state), conditioned on which the user utterance expressing that goal is generated. Likewise, we prompt GPT-3 to generate the dialog act for the system side and then the corresponding system response. We also propose automatic verification and revision methods to mitigate annotation errors.
User ([hotel] type is guest house, area is center): I need a place to stay. It should be a guest house in the center of town.
  Turn-level belief state: <hotel, type, guest house>, <hotel, area, center>
System ([hotel] [request] day people): how many days and people will be staying ?
  Dialogue act: <hotel, request, day>, <hotel, request, people>
User ([hotel] stay is 2, people is 3, day is Friday): we have 3 people, and we plan to stay 2 nights starting from Friday.
  Turn-level belief state: <hotel, book stay, 2>, <hotel, book people, 3>, <hotel, book day, Friday>
  Dialog-level belief state: <hotel, type, guest house>, <hotel, area, center>, <hotel, book stay, 2>, <hotel, book people, 3>, <hotel, book day, Friday>
System ([hotel] [offerbooked] reference [general] [reqmore]): the booking was successful. Your reference number is [value_reference]. Can I help you with anything else?
  Dialogue act: <hotel, offerbooked, reference>, <general, reqmore, none>
Figure 1: Illustration of part of an annotated dialogue generated by our method. Left: the conversation and annotations are generated simultaneously by GPT-3, where user utterances are in blue, system responses are in green, and annotations are in red. Right: the structured annotations obtained by parsing GPT-3's generation shown on the left. Best viewed in color. A complete generated dialogue is shown in Appendix C.2 as Table 9.
This paper has two key insights. First, leverag-
ing the in-context learning ability of LLMs, our
method can simulate both the user and system side
to generate annotated dialogues by learning from
a few examples. Beyond the minimal effort of collecting the small seed dataset and training an auxiliary model on it, the simulation process is free of human involvement and parameter updates, making our method much cheaper and faster than crowdsourcing for dataset creation. Specifically, a large-scale, high-quality dataset such as MultiWOZ (Budzianowski et al., 2018) can be created with our method within only several hours. Second, we design controllable dialogue generation strategies to compensate for GPT-3's lack of reliability and interpretability. We also investigate effective representations and selection strategies for in-context dialogue examples so that LLMs can better exercise their in-context learning capabilities.
We conduct experiments on the MultiWOZ 2.3 dataset (Han et al., 2021). Remarkably, in challenging low-resource settings where as few as 85 seed dialogues (1% of the whole training set) are given, the dialogues simulated by our method lead to even better model performance than the same amount of human-generated dialogues. DIALOGIC can also serve as an effective data augmentation method when the full training set is provided. Human evaluations indicate that our simulated dialogues have fluency and annotation accuracy comparable to human-generated dialogues, along with more diverse dialogue flows. Our results demonstrate the promise of leveraging large language models to automate complex dialogue dataset creation. We have released the code and simulated data to facilitate future studies.²

²https://github.com/Leezekun/dialogic
2 Related Work
2.1 Dialogue Collection and Simulation
Building end-to-end dialogue systems heavily re-
lies on annotated training data. Wizard-of-Oz (Kelley, 1984), a popular approach, can produce high-quality conversations but relies entirely on human effort (Mrksic et al., 2017; Eric et al., 2017; Asri et al., 2017; Budzianowski et al., 2018).
There are also dialogue corpora of interactions between humans and existing dialogue systems or APIs (Williams et al., 2013, 2014; Raux et al., 2005). To further reduce human effort, user simulators have been leveraged to interact with the system via reinforcement learning or self-play (Shah et al., 2018b,a; Papangelis et al., 2019; Zhao et al., 2019; Rastogi et al., 2020; Tseng et al., 2021). However, existing dialogue systems or APIs are still needed, which restricts these solutions to existing domains. To this end, Mohapatra et al. (2020) proposed a method that uses GPT-2 (Radford et al., 2019) to simulate both the user and system sides. However, this method still needs many dialogues to train the simulators and cannot guarantee simulation quality in low-resource settings.
2.2 Task-oriented Dialogue
[Figure 2 diagram: the ontology feeds a goal generator that produces a new user goal; a dialogue retriever selects in-context dialogue examples from the seed set; together they form the prompt that demonstrates the task to GPT-3, and the generated dialogue passes through automatic revision.]
Figure 2: Overview of the proposed method.
A task-oriented dialogue system usually consists of three components: natural language understanding (NLU) for dialogue state tracking, dialogue management (DM) for predicting the dialog act based on the dialogue state, and natural language generation (NLG) for mapping dialog acts to natural language responses. Annotated belief states, dialog acts, and system responses are needed to train these components, whether separately (Wu et al., 2019; Lee et al., 2019; Heck et al., 2020) or in an end-to-end fashion (Peng et al., 2021; Hosseini-Asl et al., 2020; Lin et al., 2020; Yang et al., 2021; Su et al., 2021). In this paper, we aim to generate dialogues and their complete set of annotations.
2.3 In-Context Learning
As an alternative to fine-tuning, in-context learning with LLMs such as GPT-3 (Brown et al., 2020) can perform a new task by learning from a few in-context examples without updating model parameters. Owing to its superior few-shot performance and scalability, in-context learning has been applied to a wide range of NLP tasks. For dialogue, it has been increasingly deployed in tasks such as intent classification (Yu et al., 2021), semantic parsing (Shin and Van Durme, 2021), and dialogue state tracking (Hu et al., 2022). Madotto et al. (2021) built an end-to-end dialogue system based solely on in-context learning. Despite its success, GPT-3 requires substantial resources to deploy, and its public API is billed by input length; worse, the limit on input length restricts the number of in-context examples and thus generation performance. Consequently, several methods have been proposed that use GPT-3 to synthesize data for training smaller models for inference (Wang et al., 2021a,b; Liu et al., 2022; Li et al., 2022a). Although this is especially desirable for dialogue tasks, whose input prompts are usually lengthy, there has been no plausible solution for generating annotated dialogues for developing TOD systems, owing to the complex nature of the task: multi-turn interactions and multiple types of annotations.
3 Method
In this paper, we introduce DIALOGIC, a novel method that simulates annotated dialogues for building task-oriented dialogue systems based on language model in-context learning. The only requirements are a small seed dataset D_s consisting of a few annotated dialogues and an ontology O that includes all slots and possible slot values for each domain. An auxiliary TOD model M, such as SimpleTOD (Hosseini-Asl et al., 2020) or PPTOD (Su et al., 2021), trained on D_s is used to verify and revise generated annotations. Our goal is to expand D_s by generating new dialogues. For each turn of a dialogue, we need to generate the user utterance U, belief state B, database (DB) query result Q, dialog act A, and system response S (we omit the turn index for brevity).
We elaborate on the design of our method using the well-studied MultiWOZ dataset (Budzianowski et al., 2018; Eric et al., 2020; Han et al., 2021), which covers 7 domains, such as hotel and restaurant, and 24 slots, such as hotel-area and restaurant-food (see Appendix A for more details). To simulate low-resource environments, we use 1%, 5%, and 10% of the training data as the seed dataset D_s.
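To make the generation targets concrete, one annotated turn could be represented as follows. This is a minimal sketch; the class and field names are illustrative and do not reflect the released data format.

from dataclasses import dataclass

# Illustrative container for one annotated turn (U, B, Q, A, S).
# Names are hypothetical; the released code may structure this differently.
@dataclass
class Turn:
    user_utterance: str                      # U
    belief_state: set[tuple[str, str, str]]  # B: (domain, slot, value) triplets
    db_result: str                           # Q: database query result
    dialog_act: set[tuple[str, str, str]]    # A: (domain, act, slot) triplets
    system_response: str                     # S

turn = Turn(
    user_utterance="i need a guest house in the center of town .",
    belief_state={("hotel", "type", "guest house"), ("hotel", "area", "center")},
    db_result="3 matches",
    dialog_act={("hotel", "request", "day"), ("hotel", "request", "people")},
    system_response="how many days and people will be staying ?",
)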
3.1 Overview
A partial example of a simulated dialogue is shown
in Figure 1. The pipeline of our method is illus-
trated in Figure 2. For a given domain, the goal generator takes the ontology O as input and generates a new user goal G_i. We then select a few seed dialogues with similar user goals from D_s as the in-context examples for GPT-3. Given the user goal G_i and the selected in-context examples, we leverage GPT-3 to generate a new dialogue C_i. As the generated data may fail to satisfy our requirements, we design methods for automatic verification and revision.
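This pipeline can be summarized in a few lines. The sketch below is structural only: each stage is passed in as a callable, and none of the names correspond to the released implementation.

def simulate_dialogue(goal_generator, example_selector, prompt_builder, llm, reviser):
    """Sketch of the Figure 2 pipeline; all five stages are caller-supplied callables."""
    goal = goal_generator()                  # new user goal derived from the ontology
    examples = example_selector(goal)        # seed dialogues sampled as in Section 3.2
    prompt = prompt_builder(examples, goal)  # demonstration format of Figure 3
    dialogue = llm(prompt)                   # GPT-3 completes the prompt
    return reviser(dialogue)                 # automatic verification and revision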
3.2 In-context Example
User Goal. A task-oriented dialogue is a conver-
sation where the dialogue system helps accomplish
the user's goal. For a new dialogue C_i, we first generate its user goal G_i based on the ontology. The user goal and belief state are each a set of domain-slot-value triplets: (domain, slot_name, slot_value). For example, when a user wants to book a 4-star hotel for 2 nights and a cheap restaurant that serves Chinese food, the user goal will be {(hotel, stars, 4), (hotel, book stay, 2), (restaurant, pricerange, cheap), (restaurant, food, chinese)}. We investigate several ways to generate the user goal, i.e., to determine the domains, slots, and slot values to be selected, as discussed below.
Example Selection. Given the target user goal G_i, we select a few seed dialogues as in-context examples, from which GPT-3 can learn to generate the target dialogue C_i. To achieve this, the selected dialogue examples should contain as much of the ontology information needed in the target dialogue (i.e., the mentioned slots) as possible, so that GPT-3 can mimic “in-domain” generation. To measure how much two dialogue goals G_i and G_j overlap, we calculate their similarity as

w_{ij} = \frac{|D(G_i) \cap D(G_j)|}{|D(G_i) \cup D(G_j)|} \cdot \frac{|S(G_i) \cap S(G_j)|}{|S(G_i) \cup S(G_j)|}, (1)
where D(G_i) and S(G_i) denote the sets of domains and slots in the user goal G_i, respectively. The first factor is the Jaccard similarity (Niwattanakul et al., 2013) of the domain sets, while the second is that of the slot sets. The probability of a dialogue C_j from the seed dataset D_s being sampled as an in-context example for the target dialogue C_i is

p_j = \frac{e^{w_{ij}/\tau}}{\sum_{C_k \in D_s} e^{w_{ik}/\tau}}, (2)

where \tau is the temperature. A higher temperature introduces more randomness and diversity into example selection.
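A minimal sketch of Equations (1) and (2) follows, assuming goals are sets of (domain, slot, value) triplets; the temperature and sample count are illustrative values, not the paper's settings.

import math
import random

def jaccard(a: set, b: set) -> float:
    # Jaccard similarity of two sets; defined as 0 when both are empty.
    return len(a & b) / len(a | b) if a | b else 0.0

def goal_similarity(g_i: set, g_j: set) -> float:
    # w_ij from Eq. (1): product of domain-set and slot-set Jaccard similarities.
    # Slots are kept domain-qualified here (e.g. hotel-area), a modeling choice.
    d_i, d_j = {d for d, _, _ in g_i}, {d for d, _, _ in g_j}
    s_i, s_j = {(d, s) for d, s, _ in g_i}, {(d, s) for d, s, _ in g_j}
    return jaccard(d_i, d_j) * jaccard(s_i, s_j)

def sample_examples(target_goal, seed_dialogues, k=2, tau=0.2):
    # Sample k in-context examples with the softmax of Eq. (2).
    # seed_dialogues is a list of (goal, dialogue) pairs; sampling is with
    # replacement here for brevity.
    weights = [math.exp(goal_similarity(target_goal, g) / tau)
               for g, _ in seed_dialogues]
    return random.choices(seed_dialogues, weights=weights, k=k)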
We investigate several ways to generate user goals and select in-context examples:

Random Sampling: we randomly select domains, slots, and slot values to form a user goal, and sample in-context examples as described in Equation 2. In this way, we can generate any unseen user goal and thus the corresponding dialogue. However, as the number of seed dialogues is limited, it is hard to guarantee that the sampled dialogue examples cover all the information required to generate the target dialogue.
Instruction: You are going to book a hotel, and your requirements for
the hotel are ([hotel] type is hotel , pricerange is cheap , parking is yes ,
stay is 2 , day is tuesday , people is 6). Make sure you get the booking
information once booked.
Conversation:
User ([hotel] type is hotel , pricerange is cheap): I am looking for a place
to to stay that has cheap price range it should be in a type of hotel .
Assistant ([hotel] [request] area): okay , do you have a specific area you
want to stay in ?
User ([hotel] parking is yes , pricerange is cheap): no , i just need to
make sure it is cheap . oh , and i need parking .
Assistant ([hotel] [inform] price choice parking type [offerbook]): i
found [value_choice] [value_price] [value_type] for you that include -s
parking . do you like me to book it ?
User ([hotel] stay is 3 , day is tuesday , people is 6): yes , please . 6
people 3 nights starting on tuesday .
Assistant ([hotel] [nobook] day [request] stay day): i am sorry but i was
n’t able to book that for you for [value_day] . is there another day you
would like to stay or perhaps a shorter stay ?
User ([hotel] stay is 2): how about only 2 nights .
Assistant ([hotel] [offerbooked] reference [general] [reqmore]): booking
was successful . reference number is : [value_reference] . anything else i
can do for you ?
User ([general]): no , that will be all . goodbye .
Assistant ([general] [bye]): thank you for using our services .
Figure 3: Illustration of an in-context example from the MultiWOZ dataset. The user goal, belief states, and dialog acts are in red. User utterances are in blue, while system responses are in green. Best viewed in color.
Value Substitution: we substitute only the slot values of a seed dialogue's user goal to form a new user goal. This method ensures that all the required slots are mentioned in the in-context examples. However, GPT-3 tends to replicate the in-context examples, and thus little diversity is introduced.
Combination: we first select a few dialogues from the seed dataset and then combine their user goals to create a new goal. As the new user goal might involve too many domains and slots, we randomly drop some slots. This method ensures that all the slots mentioned in the target user goal are covered by the examples and encourages GPT-3 to generate diverse data.
We experimentally found that the Combination method yields the best performance; a minimal sketch of it follows. More details, comparisons, and discussion of the different goal generation methods can be found in Appendix A.2.
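Under the same triplet representation as above, the Combination strategy could look like the following; the number of combined goals and the slot drop probability are assumptions for illustration.

import random

def combine_goals(seed_goals, n_combine=2, drop_prob=0.3):
    # Merge the user goals of a few seed dialogues, then randomly drop slots
    # so the new goal does not involve too many domains and slots.
    # n_combine and drop_prob are illustrative, not the paper's settings.
    picked = random.sample(seed_goals, n_combine)
    merged = set().union(*picked)  # union of (domain, slot, value) triplets
    return {t for t in merged if random.random() > drop_prob}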
Demonstration. To better demonstrate to GPT-3 the desired pattern of the generated data, we design the format of the example dialogues as shown in Figure 3. The user goal and belief state are converted from a sequence of triplets to natural language via a template. For example, the user goal {(hotel, stars, 4), (hotel, book stay, 2), (restaurant, pricerange, cheap), (restaurant, food, chinese)} will be converted to "[hotel] stars is 4 , book stay is 2 [restaurant] pricerange is cheap , food is chinese".