Graph Neural Network Policies and Imitation Learning
for Multi-Domain Task-Oriented Dialogues
Thibault Cordier*,1,2, Tanguy Urvoy2, Fabrice Lefèvre1, Lina M. Rojas-Barahona2
1LIA - Avignon University, Avignon, France
2Orange Labs, Lannion, France
thibault.cordier@alumni.univ-avignon.fr
fabrice.lefevre@univ-avignon.fr
{thibault.cordier, linamaria.rojasbarahona, tanguy.urvoy}@orange.com
Abstract
Task-oriented dialogue systems are designed to achieve specific goals while conversing with humans. In practice, they may have to handle several domains and tasks simultaneously. The dialogue manager must therefore be able to take domain changes into account and plan over different domains/tasks in order to deal with multi-domain dialogues. However, reinforcement learning in such a context becomes difficult because the state-action dimension is larger while the reward signal remains sparse. Our experimental results suggest that structured policies based on graph neural networks, combined with different degrees of imitation learning, can effectively handle multi-domain dialogues. The reported experiments underline the benefit of structured policies over standard policies.
Introduction
Task-oriented dialogue systems are designed to achieve specific goals while conversing with humans. They can help with various tasks in different domains, such as seeking and booking a restaurant or a hotel (Zhu et al., 2020). The conversation's goal is usually modelled as a slot-filling problem. The dialogue manager (DM) is the core component of these systems that chooses the dialogue actions according to the context. Reinforcement learning (RL) can be used to model the DM, in which case the policy is trained to maximize the probability of satisfying the goal (Gao et al., 2018).
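To make the RL framing concrete, the sketch below shows the episodic reward structure typically used for dialogue management: a small per-turn cost plus a large terminal bonus only when the goal is satisfied. The numeric values and function name are illustrative, not CONVLAB's exact reward.

```python
# Illustrative sketch (not CONVLAB's exact reward): a task-oriented dialogue
# as an episodic RL problem with a sparse terminal reward.
def dialogue_return(turns, success, turn_penalty=-1.0, success_reward=20.0):
    """Small per-turn cost, large bonus only if the user's goal is met."""
    return turn_penalty * turns + (success_reward if success else 0.0)
```

A policy that satisfies the goal in fewer turns obtains a higher return, which is what "maximize the probability of satisfying the goal" amounts to in practice.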
We focus here on the multi-domain multi-task dialogue problem. In practice, real applications like personal assistants or chatbots must deal with multiple tasks: the user may first want to find a hotel (first task), then book it (second task). Moreover, the tasks may cover several domains: the user may want to find a hotel (first task, first domain), book it (second task, first domain), and then find a restaurant nearby (first task, second domain).
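The hotel/restaurant example above can be pictured as a nested user goal. The structure below is a hypothetical illustration (the slot names are ours, not CONVLAB's exact schema), showing why the state grows with each added domain and task:

```python
# Hypothetical multi-domain, multi-task user goal; slot names are illustrative.
user_goal = {
    "hotel": {
        "find": {"area": "centre", "stars": "4"},       # first task, first domain
        "book": {"people": "2", "stay": "3"},           # second task, first domain
    },
    "restaurant": {
        "find": {"area": "centre", "food": "italian"},  # first task, second domain
    },
}

def pending_tasks(goal):
    """(domain, task) pairs the dialogue manager must plan over."""
    return [(d, t) for d, tasks in goal.items() for t in tasks]
```

Each extra domain multiplies the (domain, task, slot) combinations the policy must cover, while the success signal still arrives only at the end of the dialogue.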
One way of handling this complexity is to rely on a domain hierarchy which decomposes the decision-making process; another way is to switch easily from one domain to another by scaling up the policy. Although structured dialogue policies can adapt quickly from one domain to another (Chen et al., 2020b), covering multiple domains remains a hard task because it increases the dimensions of the state and action spaces while the reward signal remains sparse. A common technique to circumvent this reward scarcity is to guide the learning by injecting some knowledge through a teacher policy¹.
Our main contribution is to study how structured policies like graph neural networks (GNN), combined with some degree of imitation learning (IL), can effectively handle multi-domain dialogues. We provide large-scale experiments in a dedicated framework (Zhu et al., 2020) in which we analyze the performance of different types of policies, from multi-domain to generic, with different levels of imitation learning.
The remainder of this paper is structured as follows. We present the related work in Section 1. Section 2 presents our structured policies combined with imitation learning. The experiments and evaluation are described in Sections 3 and 4 respectively. Finally, we conclude in Section 5.
1 Related Work
Fundamental hierarchical reinforcement learning (Dayan and Hinton, 1993; Parr and Russell, 1998; Sutton et al., 1999; Dietterich, 2000) has inspired a previous string of works on dialogue management (Budzianowski et al., 2017; Casanueva et al., 2018a,b; Chen et al., 2020b). Recently, the use of structured hierarchy with GNN (Zhou et al., 2020; Wu et al., 2020) rather than a set of classical feed-forward networks (FNN) enables the learning of non-independent sub-policies (Chen et al., 2018, 2020a). These works adopted the Domain Independent Parametrisation (DIP), which standardizes the slot representation into a common feature space to eliminate the domain dependence. It allows policies to deal with different slots in the same way. It is therefore possible to build policies that handle a variable number of slots and that transfer to different domains on similar tasks (Wang et al., 2015).

¹For deployment the teacher is expected to be a human expert; however, for experimentation purposes we used the handcrafted policy as a proxy (Casanueva et al., 2017).

arXiv:2210.05252v1 [cs.CL] 11 Oct 2022

Figure 1: GNN policy for multi-domain dialogues with hierarchical decision making and weight sharing. (a) Domain-selection module. (b) Domain-specific decision module.
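The DIP idea can be sketched as a small feature map: every slot, regardless of its domain, is described by the same generic features, so one policy can score slots from any domain. The specific features below are illustrative stand-ins, not the exact DIP feature set:

```python
# Minimal sketch of Domain Independent Parametrisation (DIP). The concrete
# features chosen here are illustrative, not the published DIP feature list.
def dip_features(slot_state):
    """Map a raw slot state to a domain-independent feature vector."""
    return [
        float(slot_state["is_filled"]),            # user already gave a value
        float(slot_state["is_requested"]),         # user asked about this slot
        slot_state["value_distribution_entropy"],  # tracker uncertainty
        float(slot_state["in_db_query"]),          # slot constrains the DB query
    ]

hotel_stars = {"is_filled": True, "is_requested": False,
               "value_distribution_entropy": 0.1, "in_db_query": True}
restaurant_food = {"is_filled": False, "is_requested": True,
                   "value_distribution_entropy": 1.2, "in_db_query": True}
```

Both slots, from different domains, land in the same fixed-dimensional space, which is what makes a variable number of slots and cross-domain transfer tractable.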
Our contribution differs from Chen et al. (2020b) on three points: first, we perform our experiments on CONVLAB (Zhu et al., 2020), which is a dedicated multi-domain framework; second, the dialogue state tracker (DST) output is not discarded when activating the domain; third, we adapt the GNN structure to each domain by keeping the relevant nodes while sharing the edge weights.
The reward sparsity can be bypassed by guiding the learning through the injection of some knowledge via a teacher policy. This approach, called imitation learning (IL) (Hussein et al., 2017), spans a spectrum from pure behaviour cloning (BC), where the agent only learns to mimic its teacher, to pure reinforcement learning (RL), where no hint is provided (Shah et al., 2016; Hester et al., 2018; Gordon-Hall et al., 2020; Cordier et al., 2020).
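One common way to realize this spectrum is a single scalar that interpolates between the two loss terms; the sketch below is a generic illustration of that idea, not the specific training objective used in any of the cited works:

```python
# Sketch of the imitation-to-reinforcement spectrum: `imitation_weight`
# interpolates between pure behaviour cloning (1.0) and pure RL (0.0).
# The loss values are stand-ins, not a real agent's losses.
def combined_loss(bc_loss, rl_loss, imitation_weight):
    assert 0.0 <= imitation_weight <= 1.0
    return imitation_weight * bc_loss + (1.0 - imitation_weight) * rl_loss
```

Setting the weight to 1.0 recovers pure BC, 0.0 recovers pure RL, and intermediate values (or schedules that decay the weight over training) give the hybrid regimes studied in the works cited above.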
2 Extended GNN Policies with Imitation
We adopt the multi-task setting as presented in CONVLAB, in which a single dialogue can have the following tasks: (i) find, in which the system requests information in order to query a database and make an offer; (ii) book, in which the system requests information in order to book the item. A single dialogue can also contain multiple domains such as hotel, restaurant, attraction, train, etc.
Our method, illustrated in Figure 1, is designed to adapt: (i) at the domain level (i.e. be scalable to changes in the number of slots), and (ii) at the multi-domain level (i.e. be scalable to changes of domain). For each dialogue turn, it works as follows: first, the DST module chooses which domain to activate. Then, the multi-domain belief state (and action space) is projected into the active domain (i.e. only the DIP nodes corresponding to the active domain are kept), as shown in Figure 1a. Afterwards, we apply the GNN message passing as in Chen et al. (2020b), but only among the domain-specific DIP nodes in the decision-making module (Figure 1b).
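The per-turn control flow just described can be sketched as follows. The function names, the toy state, and the stand-in selector and decider are ours, purely for illustration of the select-then-project-then-decide order:

```python
# Illustrative per-turn control flow of the hierarchical policy:
# choose the active domain, project the multi-domain belief state onto it,
# then let the (shared) decision module act on the remaining DIP nodes.
def dialogue_turn(belief_state, select_domain, decide):
    domain = select_domain(belief_state)                  # cf. Figure 1a
    projected = {slot: feats for (d, slot), feats in belief_state.items()
                 if d == domain}                          # keep active-domain nodes
    return domain, decide(projected)                      # cf. Figure 1b

state = {("hotel", "area"): [1.0], ("hotel", "stars"): [0.0],
         ("restaurant", "food"): [1.0]}
domain, action = dialogue_turn(
    state,
    select_domain=lambda s: "hotel",          # stand-in for the DST/selector
    decide=lambda s: "request_" + min(s),     # stand-in decision module
)
```

Note that the decision module never sees the restaurant nodes: the projection is what lets the same module be reused for every domain.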
GNN Policies
The GNN structure we consider is a fully connected graph in which the nodes are extracted from the DIP. We distinguish two types of nodes: the slot nodes representing the parametrisation of each slot (denoted as S-NODE) and the general node representing the parametrisation of the domain (denoted as I-NODE, for slot-Independent node). This yields three types of edges: I2S (for I-NODE to S-NODE), S2I and S2S. This abstract structure is a way of modelling the relations between slots as well as exploiting symmetries based on weight sharing (Figure 1b).
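A toy message-passing round makes the weight sharing concrete: one scalar weight per edge type (I2S, S2I, S2S) is reused across all slots, so the update is indifferent to the number and identity of slots. Scalar node states and a single linear round are a deliberate simplification of the actual GNN layers:

```python
# Toy message-passing round over the DIP graph: one I-NODE plus one S-NODE
# per slot, fully connected, with a single shared weight per edge type.
def message_passing(i_node, s_nodes, w_i2s, w_s2i, w_s2s):
    new_i = i_node + w_s2i * sum(s_nodes.values())           # S2I messages
    new_s = {name: h + w_i2s * i_node                        # I2S message
                  + w_s2s * sum(v for n, v in s_nodes.items() if n != name)
             for name, h in s_nodes.items()}                 # S2S messages
    return new_i, new_s

i, s = message_passing(1.0, {"area": 0.5, "stars": 0.25}, 0.1, 0.2, 0.3)
```

Adding or removing a slot changes only the dictionary, never the number of parameters, which is the symmetry the paper exploits.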
Imitation Learning
In addition to the structured architecture, we use some level of IL to guide the agent's exploration. In our experiments, we used CONVLAB's handcrafted policy as a teacher (or oracle)¹, but other policies could be used as well. Behaviour cloning (BC) is a pure supervised learning approach in which the agent learns to reproduce the teacher's actions.
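Reduced to its core, BC is classification of the teacher's action from the state; the negative log-likelihood below is the standard supervised objective, shown here as a generic illustration rather than CONVLAB's training code:

```python
# Behaviour cloning at its core: supervised prediction of the teacher's
# action. Toy action names are illustrative.
import math

def bc_loss(predicted_probs, teacher_action):
    """Negative log-likelihood of the action the teacher chose."""
    return -math.log(predicted_probs[teacher_action])
```

Minimizing this loss over teacher demonstrations pushes the policy's action distribution toward the teacher's, with no reward signal involved.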