Graph Neural Network Policies and Imitation Learning
for Multi-Domain Task-Oriented Dialogues
Thibault Cordier*,1,2, Tanguy Urvoy2, Fabrice Lefèvre1, Lina M. Rojas-Barahona2
1LIA - Avignon University, Avignon, France
2Orange Labs, Lannion, France
thibault.cordier@alumni.univ-avignon.fr
fabrice.lefevre@univ-avignon.fr
{thibault.cordier, linamaria.rojasbarahona, tanguy.urvoy}@orange.com
Abstract
Task-oriented dialogue systems are designed to achieve specific goals while conversing with humans. In practice, they may have to handle several domains and tasks simultaneously. The dialogue manager must therefore be able to take domain changes into account and plan over different domains/tasks in order to deal with multi-domain dialogues. However, reinforcement learning in such a context becomes difficult because the state-action dimension is larger while the reward signal remains sparse. Our experimental results suggest that structured policies based on graph neural networks, combined with different degrees of imitation learning, can effectively handle multi-domain dialogues. The reported experiments underline the benefit of structured policies over standard policies.
Introduction
Task-oriented dialogue systems are designed to achieve specific goals while conversing with humans. They can help with various tasks in different domains, such as seeking and booking a restaurant or a hotel (Zhu et al., 2020). The conversation's goal is usually modelled as a slot-filling problem. The dialogue manager (DM) is the core component of these systems that chooses the dialogue actions according to the context. Reinforcement learning (RL) can be used to model the DM, in which case the policy is trained to maximize the probability of satisfying the goal (Gao et al., 2018).
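To make the RL framing concrete, the sketch below shows the episodic reward structure typically used for dialogue management: a small per-turn cost plus a large terminal bonus only when the goal is satisfied. The numeric values and function name are illustrative, not CONVLAB's exact reward.

```python
# Illustrative sketch (not CONVLAB's exact reward): a task-oriented dialogue
# as an episodic RL problem with a sparse terminal reward.
def dialogue_return(turns, success, turn_penalty=-1.0, success_reward=20.0):
    """Small per-turn cost, large bonus only if the user's goal is met."""
    return turn_penalty * turns + (success_reward if success else 0.0)
```

A policy that satisfies the goal in fewer turns obtains a higher return, which is what "maximize the probability of satisfying the goal" amounts to in practice.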
We focus here on the multi-domain multi-task dialogue problem. In practice, real applications like personal assistants or chatbots must deal with multiple tasks: the user may first want to find a hotel (first task), then book it (second task). Moreover, the tasks may cover several domains: the user may want to find a hotel (first task, first domain), book it (second task, first domain), and then find a restaurant nearby (first task, second domain).
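The hotel/restaurant example above can be pictured as a nested user goal. The structure below is a hypothetical illustration (the slot names are ours, not CONVLAB's exact schema), showing why the state grows with each added domain and task:

```python
# Hypothetical multi-domain, multi-task user goal; slot names are illustrative.
user_goal = {
    "hotel": {
        "find": {"area": "centre", "stars": "4"},       # first task, first domain
        "book": {"people": "2", "stay": "3"},           # second task, first domain
    },
    "restaurant": {
        "find": {"area": "centre", "food": "italian"},  # first task, second domain
    },
}

def pending_tasks(goal):
    """(domain, task) pairs the dialogue manager must plan over."""
    return [(d, t) for d, tasks in goal.items() for t in tasks]
```

Each extra domain multiplies the (domain, task, slot) combinations the policy must cover, while the success signal still arrives only at the end of the dialogue.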
One way of handling this complexity is to rely on a domain hierarchy which decomposes the decision-making process; another way is to switch easily from one domain to another by scaling up the policy. Although structured dialogue policies can adapt quickly from one domain to another (Chen et al., 2020b), covering multiple domains remains a hard task because it increases the dimensions of the state and action spaces while the reward signal remains sparse. A common technique to circumvent this reward scarcity is to guide the learning by injecting some knowledge through a teacher policy¹.
Our main contribution is to study how structured policies like graph neural networks (GNN), combined with some degree of imitation learning (IL), can effectively handle multi-domain dialogues. We provide large-scale experiments in a dedicated framework (Zhu et al., 2020) in which we analyze the performance of different types of policies, from multi-domain to generic, with different levels of imitation learning.
The remainder of this paper is structured as follows. We present the related work in Section 1. Section 2 presents our structured policies combined with imitation learning. The experiments and evaluation are described in Sections 3 and 4 respectively. Finally, we conclude in Section 5.
1 Related Work
Fundamental hierarchical reinforcement learning (Dayan and Hinton, 1993; Parr and Russell, 1998; Sutton et al., 1999; Dietterich, 2000) has inspired a previous string of works on dialogue management (Budzianowski et al., 2017; Casanueva et al., 2018a,b; Chen et al., 2020b). Recently, the use of structured hierarchy with GNN (Zhou et al., 2020; Wu et al., 2020) rather than a set of classical feed-forward networks (FNN) enables the learning of non-independent sub-policies (Chen et al., 2018, 2020a). These works adopted the Domain Independent Parametrisation (DIP), which standardizes the slot representation into a common feature space to eliminate the domain dependence. It allows policies to deal with different slots in the same way. It is therefore possible to build policies that handle a variable number of slots and that transfer to different domains on similar tasks (Wang et al., 2015).

¹For deployment the teacher is expected to be a human expert; however, for experimentation purposes we used the handcrafted policy as a proxy (Casanueva et al., 2017).

arXiv:2210.05252v1 [cs.CL] 11 Oct 2022

Figure 1: GNN policy for multi-domain dialogues with hierarchical decision making and weight sharing. (a) Domain-selection module. (b) Domain-specific decision module.
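The DIP idea can be sketched as a small feature map: every slot, regardless of its domain, is described by the same generic features, so one policy can score slots from any domain. The specific features below are illustrative stand-ins, not the exact DIP feature set:

```python
# Minimal sketch of Domain Independent Parametrisation (DIP). The concrete
# features chosen here are illustrative, not the published DIP feature list.
def dip_features(slot_state):
    """Map a raw slot state to a domain-independent feature vector."""
    return [
        float(slot_state["is_filled"]),            # user already gave a value
        float(slot_state["is_requested"]),         # user asked about this slot
        slot_state["value_distribution_entropy"],  # tracker uncertainty
        float(slot_state["in_db_query"]),          # slot constrains the DB query
    ]

hotel_stars = {"is_filled": True, "is_requested": False,
               "value_distribution_entropy": 0.1, "in_db_query": True}
restaurant_food = {"is_filled": False, "is_requested": True,
                   "value_distribution_entropy": 1.2, "in_db_query": True}
```

Both slots, from different domains, land in the same fixed-dimensional space, which is what makes a variable number of slots and cross-domain transfer tractable.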
Our contribution differs from Chen et al. (2020b) on three points: first, we perform our experiments on CONVLAB (Zhu et al., 2020), which is a dedicated multi-domain framework; second, the dialogue state tracker (DST) output is not discarded when activating the domain; third, we adapt the GNN structure to each domain by keeping the relevant nodes while sharing the edge weights.
The reward sparsity can be bypassed by guiding the learning through the injection of some knowledge via a teacher policy. This approach, called imitation learning (IL) (Hussein et al., 2017), spans a spectrum from pure behaviour cloning (BC), where the agent only learns to mimic its teacher, to pure reinforcement learning (RL), where no hint is provided (Shah et al., 2016; Hester et al., 2018; Gordon-Hall et al., 2020; Cordier et al., 2020).
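One common way to realize this spectrum is a single scalar that interpolates between the two loss terms; the sketch below is a generic illustration of that idea, not the specific training objective used in any of the cited works:

```python
# Sketch of the imitation-to-reinforcement spectrum: `imitation_weight`
# interpolates between pure behaviour cloning (1.0) and pure RL (0.0).
# The loss values are stand-ins, not a real agent's losses.
def combined_loss(bc_loss, rl_loss, imitation_weight):
    assert 0.0 <= imitation_weight <= 1.0
    return imitation_weight * bc_loss + (1.0 - imitation_weight) * rl_loss
```

Setting the weight to 1.0 recovers pure BC, 0.0 recovers pure RL, and intermediate values (or schedules that decay the weight over training) give the hybrid regimes studied in the works cited above.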
2 Extended GNN Policies with Imitation
We adopt the multi-task setting as presented in CONVLAB, in which a single dialogue can have the following tasks: (i) find, in which the system requests information in order to query a database and make an offer; (ii) book, in which the system requests information in order to book the item. A single dialogue can also contain multiple domains such as hotel, restaurant, attraction, train, etc.
Our method, illustrated in Figure 1, is designed to adapt: (i) at the domain level (i.e. be scalable to changes in the number of slots), and (ii) at the multi-domain level (i.e. be scalable to changes of domain). For each dialogue turn, it works as follows: first, the DST module chooses which domain to activate. Then, the multi-domain belief state (and action space) is projected into the active domain (i.e. only the DIP nodes corresponding to the active domain are kept), as shown in Figure 1a. Afterwards, we apply the GNN message passing as in Chen et al. (2020b), but only among the domain-specific DIP nodes in the decision-making module (Figure 1b).
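The per-turn control flow just described can be sketched as follows. The function names, the toy state, and the stand-in selector and decider are ours, purely for illustration of the select-then-project-then-decide order:

```python
# Illustrative per-turn control flow of the hierarchical policy:
# choose the active domain, project the multi-domain belief state onto it,
# then let the (shared) decision module act on the remaining DIP nodes.
def dialogue_turn(belief_state, select_domain, decide):
    domain = select_domain(belief_state)                  # cf. Figure 1a
    projected = {slot: feats for (d, slot), feats in belief_state.items()
                 if d == domain}                          # keep active-domain nodes
    return domain, decide(projected)                      # cf. Figure 1b

state = {("hotel", "area"): [1.0], ("hotel", "stars"): [0.0],
         ("restaurant", "food"): [1.0]}
domain, action = dialogue_turn(
    state,
    select_domain=lambda s: "hotel",          # stand-in for the DST/selector
    decide=lambda s: "request_" + min(s),     # stand-in decision module
)
```

Note that the decision module never sees the restaurant nodes: the projection is what lets the same module be reused for every domain.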
GNN Policies
The GNN structure we consider is a fully connected graph in which the nodes are extracted from the DIP. We distinguish two types of nodes: the slot nodes representing the parametrisation of each slot (denoted as S-NODE) and the general node representing the parametrisation of the domain (denoted as I-NODE, for slot-Independent node). This yields three types of edges: I2S (for I-NODE to S-NODE), S2I and S2S. This abstract structure is a way of modelling the relations between slots as well as exploiting symmetries based on weight sharing (Figure 1b).
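A toy message-passing round makes the weight sharing concrete: one scalar weight per edge type (I2S, S2I, S2S) is reused across all slots, so the update is indifferent to the number and identity of slots. Scalar node states and a single linear round are a deliberate simplification of the actual GNN layers:

```python
# Toy message-passing round over the DIP graph: one I-NODE plus one S-NODE
# per slot, fully connected, with a single shared weight per edge type.
def message_passing(i_node, s_nodes, w_i2s, w_s2i, w_s2s):
    new_i = i_node + w_s2i * sum(s_nodes.values())           # S2I messages
    new_s = {name: h + w_i2s * i_node                        # I2S message
                  + w_s2s * sum(v for n, v in s_nodes.items() if n != name)
             for name, h in s_nodes.items()}                 # S2S messages
    return new_i, new_s

i, s = message_passing(1.0, {"area": 0.5, "stars": 0.25}, 0.1, 0.2, 0.3)
```

Adding or removing a slot changes only the dictionary, never the number of parameters, which is the symmetry the paper exploits.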
Imitation Learning
In addition to the structured architecture, we use some level of IL to guide the agent's exploration. In our experiments, we used CONVLAB's handcrafted policy as a teacher (or oracle)¹, but other policies could be used as well. Behaviour cloning (BC) is a pure supervised learning approach in which the agent learns to reproduce the teacher's actions.
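Reduced to its core, BC is classification of the teacher's action from the state; the negative log-likelihood below is the standard supervised objective, shown here as a generic illustration rather than CONVLAB's training code:

```python
# Behaviour cloning at its core: supervised prediction of the teacher's
# action. Toy action names are illustrative.
import math

def bc_loss(predicted_probs, teacher_action):
    """Negative log-likelihood of the action the teacher chose."""
    return -math.log(predicted_probs[teacher_action])
```

Minimizing this loss over teacher demonstrations pushes the policy's action distribution toward the teacher's, with no reward signal involved.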