MetaASSIST: Robust Dialogue State Tracking with Meta Learning

Fanghua Ye†  Xi Wang†  Jie Huang‡  Shenghui Li§  Samuel Stern¶  Emine Yilmaz†
†University College London, UK
‡University of Illinois at Urbana-Champaign, USA
§Uppsala University, Sweden
¶Affiniti AI, London, UK
{fanghua.ye.19, xi-wang, emine.yilmaz}@ucl.ac.uk
jeffhj@illinois.edu, shenghui.li@it.uu.se, samuel.stern@affiniti.ai
Abstract

Existing dialogue datasets contain lots of noise in their state annotations. Such noise can hurt model training and ultimately lead to poor generalization performance. A general framework named ASSIST has recently been proposed to train robust dialogue state tracking (DST) models. It introduces an auxiliary model to generate pseudo labels for the noisy training set. These pseudo labels are combined with vanilla labels by a common fixed weighting parameter to train the primary DST model. Notwithstanding the improvements of ASSIST on DST, tuning the weighting parameter is challenging. Moreover, a single parameter shared by all slots and all instances may be suboptimal. To overcome these limitations, we propose a meta learning-based framework MetaASSIST to adaptively learn the weighting parameter. Specifically, we propose three schemes with varying degrees of flexibility, ranging from slot-wise to both slot-wise and instance-wise, to convert the weighting parameter into learnable functions. These functions are trained in a meta-learning manner by taking the validation set as meta data. Experimental results demonstrate that all three schemes can achieve competitive performance. Most impressively, we achieve a state-of-the-art joint goal accuracy of 80.10% on MultiWOZ 2.4.
1 Introduction
Task-oriented dialogue systems have recently become a hot research topic. They act as digital personal assistants, helping users with various tasks such as hotel bookings, restaurant reservations, and weather checks. Dialogue state tracking (DST) is recognized as a core task of the dialogue manager. Its goal is to keep track of users' intentions at each turn of the dialogue (Mrkšić et al., 2017; Rastogi et al., 2020). Tracking the dialogue state accurately is of significant importance, as the state information will be fed into the dialogue policy learning module to determine the next system action to perform (Manotumruksa et al., 2021).
[Figure 1: The structure of ASSIST and MetaASSIST. Both frameworks utilize soft labels obtained by linearly combining pseudo labels (one-hot) and vanilla labels (one-hot) using a weighting parameter α to enhance the training process, compared to standard training that relies only on vanilla noisy labels. ASSIST adopts a single α shared by all slots and all training samples, while MetaASSIST uses slot-wise (and instance-wise) αs. In the illustrated example, an auxiliary DST model trained on a small clean dataset produces the pseudo label "hotel-area: center" for the utterance "Hello, could you please find a hotel in the downtown for me?"; with α = 0.5, it is combined with the vanilla label "hotel-area: east" into the soft target [0.5, 0, 0, 0, 0.5].]
In general, the dialogue state is represented as a set of (slot, value) pairs (Henderson et al., 2014; Budzianowski et al., 2018). The slots for a particular task or domain are predefined (e.g., "hotel-name"). Their values are extracted from the dialogue context.
So far, a great variety of DST models have been proposed (Wu et al., 2019; Campagna et al., 2020; Balaraman et al., 2021; Lee et al., 2021; Guo et al., 2022; Shin et al., 2022; Wang et al., 2022). These models assume that all state labels provided in the dataset are correct, without considering the effect of label noise. However, dialogue state annotations are error-prone, especially considering that most dialogue datasets (e.g., MultiWOZ; Budzianowski et al., 2018) are collected through crowdsourcing. The presence of label noise may impair model training and lead to poor generalization performance of the trained model, as deep neural models can easily overfit noisy training data (Zhang et al., 2021).
In order to robustly train DST models from noisy labels, Ye et al. (2022) proposed a general framework dubbed ASSIST, which augments the standard model training procedure with a small clean dataset. As shown in Figure 1, ASSIST first trains an auxiliary model on the small clean dataset and applies this model to generate pseudo labels for each sample in the noisy training set. Then, it linearly combines the pseudo labels and vanilla labels to train the primary model. Both theoretically and empirically, ASSIST has been shown to be effective in reducing the impact of label noise.
However, ASSIST adopts a common weighting parameter to combine the pseudo labels and vanilla labels for all slots and all training samples, which is suboptimal. In reality, different slots tend to have different noise rates (Eric et al., 2020), indicating that the weighting parameter should be slot-wise. On the other hand, different training samples may also require different weighting parameters, since whether pseudo labels or vanilla labels should be preferred is highly dependent on specific training instances. Furthermore, the weighting parameter is considered a hyperparameter and thus needs to be carefully tuned on each dataset.
To address the aforementioned limitations of ASSIST, we propose MetaASSIST, a meta learning-based general framework that supports automatically learning slot-wise (and instance-wise) weighting parameters. Specifically, our contributions are:

• We propose three different schemes for transforming the weighting parameters into learnable functions. These schemes have varying degrees of flexibility, ranging from slot-wise to both slot-wise and instance-wise.

• We propose to train these learnable functions through a meta-learning paradigm that takes the validation set as meta data and adaptively adjusts the parameters of each learnable function (and, as a result, the weighting parameters) by reducing the validation loss.

• We conduct extensive experiments to test the effectiveness of the proposed three schemes. All of them achieve superior performance. For the first time, we achieve over 80% joint goal accuracy on MultiWOZ 2.4 (Ye et al., 2021a).
2 Preliminaries
In task-oriented dialogue systems, the DST module transforms users' goals or intentions expressed in unstructured natural language into structured state representations (e.g., a series of slot-value pairs). The state representations are continually updated in each round of the user-system interaction.
2.1 Problem Statement
More formally, we symbolize a dialogue of $T$ turns as $X = \{(R_1, U_1), \ldots, (R_T, U_T)\}$, where $R_t$ and $U_t$ denote the system response and user utterance at turn $t$ ($1 \leq t \leq T$), respectively. We adopt $X_t$ to represent the dialogue context from the first turn to the $t$-th turn, i.e., $X_t = \{(R_1, U_1), \ldots, (R_t, U_t)\}$. Further, let $\mathcal{S}$ denote the set of all the predefined slots and $\mathcal{B}_t = \{(s, v_t) \mid s \in \mathcal{S}\}$ the dialogue state at turn $t$. Here, $v_t$ is the corresponding value of slot $s$ at turn $t$. Then, the DST problem is defined as learning a dialogue state tracker $\mathcal{F}: X_t \rightarrow \mathcal{B}_t$.
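To make the notation concrete, the following minimal Python sketch (ours, not from the paper; the slot names are hypothetical MultiWOZ-style identifiers and the tracker body is a stub) shows a dialogue state $\mathcal{B}_t$ as a slot-to-value mapping and the tracker interface $\mathcal{F}: X_t \rightarrow \mathcal{B}_t$:

```python
# Illustrative sketch of the DST notation; the tracker body is a stub.
from typing import Dict, List, Tuple

SLOTS = ["hotel-area", "hotel-name", "restaurant-food"]  # predefined slot set S (hypothetical subset)

def track_state(context: List[Tuple[str, str]]) -> Dict[str, str]:
    """F: X_t -> B_t. `context` is [(R_1, U_1), ..., (R_t, U_t)]; the
    returned dict maps each slot s to its value v_t ("none" if unmentioned)."""
    state = {slot: "none" for slot in SLOTS}
    # A real tracker would run a neural model over `context` here.
    return state

X_t = [("Hello, how can I help?", "Find me a hotel in the centre, please.")]
B_t = track_state(X_t)  # a trained model would yield, e.g., {"hotel-area": "centre", ...}
```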
As discussed earlier, annotating dialogue states via crowdsourcing is prone to incorrect and inconsistent labels. These noisy annotations are likely to adversely affect model training. We denote the noisy state annotations as $\tilde{\mathcal{B}}_t = \{(s, \tilde{v}_t) \mid s \in \mathcal{S}\}$, where $\tilde{v}_t$ is the noisy label of slot $s$ at turn $t$. In this work, $\tilde{\mathcal{B}}_t$ refers to the labels provided in the dataset and $\mathcal{B}_t$ refers to the unknown true state annotations. As pointed out by Ye et al. (2022), existing DST approaches are only able to learn a suboptimal dialogue state tracker $\tilde{\mathcal{F}}: X_t \rightarrow \tilde{\mathcal{B}}_t$ rather than the optimal dialogue state tracker $\mathcal{F}: X_t \rightarrow \mathcal{B}_t$. Aiming at learning a strong dialogue state tracker $\mathcal{F}^{\ast}$ to better approximate $\mathcal{F}$, Ye et al. (2022) proposed a general framework ASSIST that supports training DST models robustly from noisy labels.
2.2 Overview of ASSIST
ASSIST assumes that a small clean dataset is available. Based on this assumption, it first trains an auxiliary model on the clean dataset. Then, it leverages the trained model to generate pseudo labels for each sample in the large noisy training set. The generated pseudo labels are expected to be a good complement to the vanilla noisy labels. Therefore, combining the two types of labels has the potential to reduce the influence of noisy labels when training the primary model.
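To ground these steps, here is a toy, self-contained sketch (ours, on synthetic binary data; not the paper's actual models) of training an auxiliary model on a clean set and generating pseudo labels for a noisy set:

```python
import numpy as np

# Toy sketch of the first two ASSIST steps on synthetic data (illustrative only).
rng = np.random.default_rng(0)

def make_data(n, noise_rate):
    """Synthetic binary 'slot value' data with optionally flipped labels."""
    y = rng.integers(0, 2, n)
    x = y[:, None] * 2.0 + rng.normal(size=(n, 2))
    flip = rng.random(n) < noise_rate
    return x, np.where(flip, 1 - y, y)

x_clean, y_clean = make_data(50, noise_rate=0.0)    # small clean dataset
x_noisy, y_noisy = make_data(500, noise_rate=0.3)   # large noisy training set

# Step 1: train an auxiliary model on the clean set (nearest centroid here).
centroids = np.stack([x_clean[y_clean == c].mean(0) for c in (0, 1)])

# Step 2: use it to generate pseudo labels for every noisy training sample.
y_pseudo = np.linalg.norm(x_noisy[:, None] - centroids, axis=2).argmin(1)
# Step 3 then combines y_pseudo with y_noisy via Eq. (1), shown below.
```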
Denote the generated pseudo state annotations as $\breve{\mathcal{B}}_t = \{(s, \breve{v}_t) \mid s \in \mathcal{S}\}$, where $\breve{v}_t$ represents the pseudo label of slot $s$ at turn $t$. Within the framework of ASSIST, the primary model is required to predict $\breve{\mathcal{B}}_t$ and $\tilde{\mathcal{B}}_t$ concurrently during the training process. In other words, the target of model training turns into learning a dialogue state tracker $\mathcal{F}: X_t \rightarrow C(\breve{\mathcal{B}}_t, \tilde{\mathcal{B}}_t)$, where $C(\breve{\mathcal{B}}_t, \tilde{\mathcal{B}}_t)$ denotes a combination of $\breve{\mathcal{B}}_t$ and $\tilde{\mathcal{B}}_t$. There can be different methods to combine the generated pseudo labels and vanilla noisy labels. The most straightforward way is to combine them linearly, which is also the strategy adopted in ASSIST. The linearly combined label of slot $s$ at turn $t$ is formulated as:

$$\mathbf{v}^c_t = \alpha \breve{\mathbf{v}}_t + (1 - \alpha) \tilde{\mathbf{v}}_t, \tag{1}$$

where $\breve{\mathbf{v}}_t$ and $\tilde{\mathbf{v}}_t$ are the one-hot vector representations of the pseudo label $\breve{v}_t$ and vanilla noisy label $\tilde{v}_t$, respectively. The parameter $\alpha$ ($0 \leq \alpha \leq 1$) is employed to control the weights of $\breve{\mathbf{v}}_t$ and $\tilde{\mathbf{v}}_t$.
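As a concrete instance of Eq. (1), the snippet below (ours) reproduces the soft target from Figure 1 for a hypothetical slot with five candidate values, where the pseudo label is "center" and the vanilla label is "east":

```python
import numpy as np

# Eq. (1) for one slot at one turn: combine one-hot pseudo and vanilla
# labels into a soft target. Candidate values are illustrative.
values = ["east", "west", "north", "south", "center"]
pseudo = np.eye(len(values))[values.index("center")]   # one-hot pseudo label
vanilla = np.eye(len(values))[values.index("east")]    # one-hot vanilla (noisy) label
alpha = 0.5
combined = alpha * pseudo + (1 - alpha) * vanilla
print(combined)  # [0.5 0.  0.  0.  0.5] -- matches the soft label in Figure 1
```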
Let $p(\breve{v}_t \mid X_t, s)$ denote the likelihood of $\breve{v}_t$ and $p(\tilde{v}_t \mid X_t, s)$ the likelihood of $\tilde{v}_t$. Then, the likelihood of the combined label $\mathbf{v}^c_t$ is calculated as:

$$p(\mathbf{v}^c_t \mid X_t, s) = p(\breve{v}_t \mid X_t, s)^{\alpha} \, p(\tilde{v}_t \mid X_t, s)^{(1-\alpha)}. \tag{2}$$

Based on this formula, the training objective of the primary model can be derived as follows:

$$\begin{aligned} \mathcal{L} &= -\frac{1}{|\mathcal{D}_n||\mathcal{S}|} \sum_{X_t \in \mathcal{D}_n} \sum_{s \in \mathcal{S}} \log p(\mathbf{v}^c_t \mid X_t, s) \\ &= -\frac{\alpha}{|\mathcal{D}_n||\mathcal{S}|} \sum_{X_t \in \mathcal{D}_n} \sum_{s \in \mathcal{S}} \log p(\breve{v}_t \mid X_t, s) \\ &\quad - \frac{(1-\alpha)}{|\mathcal{D}_n||\mathcal{S}|} \sum_{X_t \in \mathcal{D}_n} \sum_{s \in \mathcal{S}} \log p(\tilde{v}_t \mid X_t, s), \end{aligned} \tag{3}$$

where $\mathcal{D}_n$ represents the noisy training set.
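Since Eq. (3) decomposes into two standard cross-entropy terms, it can be implemented directly. Below is a minimal PyTorch sketch for a single slot (ours, assuming the model outputs per-value classification logits; not the paper's code):

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, pseudo_idx, vanilla_idx, alpha):
    """Eq. (3) for a batch of turns and a single slot.
    logits: [batch, num_values]; pseudo_idx / vanilla_idx: [batch] label indices."""
    loss_pseudo = F.cross_entropy(logits, pseudo_idx)     # -mean log p(pseudo label)
    loss_vanilla = F.cross_entropy(logits, vanilla_idx)   # -mean log p(vanilla label)
    return alpha * loss_pseudo + (1 - alpha) * loss_vanilla

logits = torch.randn(8, 5, requires_grad=True)            # dummy model outputs
loss = combined_loss(logits, torch.randint(5, (8,)),
                     torch.randint(5, (8,)), alpha=0.5)
loss.backward()                                           # gradients for model training
```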
3 MetaASSIST: A Meta Learning-Based Version of ASSIST
Equations (1) and (3) show that a single $\alpha$ is shared by all slots when combining the pseudo labels and vanilla labels. This is suboptimal, as the ratio of the noise rate of pseudo labels to that of vanilla labels tends to be different for different slots. When the vanilla labels have higher quality than the generated pseudo labels, $\alpha$ should be set to a small value; otherwise, a large $\alpha$ should be used. This implies that setting $\alpha$ to different values for different slots can help train the primary model more robustly. In the following, we first theoretically show that the combined labels obtained via slot-wise weighting parameters instead of a common one can better approximate the unknown true labels. Then, we elaborate on the proposed framework MetaASSIST.
3.1 Theoretical Justification
Following Ye et al. (2022), we employ the mean squared loss to define the mean approximation error of any corrupted labels $\ddot{\mathbf{v}}_t$ to their corresponding unknown true labels $\mathbf{v}_t$, as formalized below:

$$Y_{\ddot{\mathbf{v}}} = \frac{1}{|\mathcal{D}_n||\mathcal{S}|} \sum_{X_t \in \mathcal{D}_n} \sum_{s \in \mathcal{S}} \mathbb{E}_{\mathcal{D}_c}\big[\|\ddot{\mathbf{v}}_t - \mathbf{v}_t\|_2^2\big]. \tag{4}$$

Here, $\mathcal{D}_c$ refers to the small clean dataset. Both $\ddot{\mathbf{v}}_t$ and $\mathbf{v}_t$ are the vector representations of labels.

Let $\alpha_s$ be the slot-wise weighting parameter for slot $s$. We utilize $\mathbf{v}^s_t$ to denote the combined label obtained by replacing $\alpha$ with $\alpha_s$ in Eq. (1). Thus,

$$\mathbf{v}^s_t = \alpha_s \breve{\mathbf{v}}_t + (1 - \alpha_s) \tilde{\mathbf{v}}_t. \tag{5}$$

Like $\alpha$, $\alpha_s$ is also bounded between 0 and 1. Substituting the corrupted labels $\ddot{\mathbf{v}}_t$ in Eq. (4) with $\mathbf{v}^s_t$ and $\mathbf{v}^c_t$, we have the following theorem:
Theorem 1. The optimal mean approximation error with respect to the combined labels $\mathbf{v}^s_t$ derived from slot-wise weighting parameters $\alpha_s$ is smaller than or equal to that of the combined labels $\mathbf{v}^c_t$ derived from a shared weighting parameter $\alpha$, i.e.,

$$\min_{\alpha_s} Y_{\mathbf{v}^s} \leq \min_{\alpha} Y_{\mathbf{v}^c}.$$

Proof. The conclusion is immediate: any shared $\alpha$ is a feasible choice of the slot-wise parameters (set $\alpha_s = \alpha$ for every slot $s$), so the slot-wise minimum can never exceed the shared one, whereas the converse does not hold.
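A toy numerical check of Theorem 1 (our illustration with two scalar-valued slots, where the pseudo label is exact for one slot and the vanilla label is exact for the other):

```python
import numpy as np

# Two slots; pseudo label exact for slot1, vanilla label exact for slot2.
true    = {"slot1": 1.0, "slot2": 1.0}
pseudo  = {"slot1": 1.0, "slot2": 0.0}
vanilla = {"slot1": 0.0, "slot2": 1.0}

def approx_error(alphas):
    """Mean squared error of the combined labels (a toy version of Eq. (4))."""
    return np.mean([(alphas[s] * pseudo[s] + (1 - alphas[s]) * vanilla[s]
                     - true[s]) ** 2 for s in true])

grid = np.linspace(0, 1, 101)
best_shared = min(approx_error({"slot1": a, "slot2": a}) for a in grid)
best_slotwise = min(approx_error({"slot1": a, "slot2": b})
                    for a in grid for b in grid)
print(best_shared, best_slotwise)  # 0.25 0.0 -- slot-wise alphas do no worse
```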
3.2 Slot-Wise Weighting Parameters as Meta Learnable Functions
In the framework of ASSIST, $\alpha$ is treated as a hyperparameter. It needs to be meticulously tuned in the training phase so as to help the primary model achieve the best performance. Although it is feasible to tune a single parameter $\alpha$, it would become extremely painful to tune all the slot-wise parameters. This is because multi-domain dialogues can have dozens or even hundreds of slots (e.g., there are 37 slots in the MultiWOZ dataset; Eric et al., 2020). To circumvent the troublesome step of tuning each slot-wise parameter $\alpha_s$ of slot $s$, we propose to learn all these parameters automatically via meta learning (Hospedales et al., 2021).

Specifically, we propose three different schemes to cast the slot-wise weighting parameters as learnable functions, which are described in detail below:
Scheme One (S1): The first scheme assumes that the parameter $\alpha_s$ is fully independent of the dialogue context $X_t$. As a consequence of this assumption, all the training samples will share the same $\alpha_s$ for slot $s$. Given that the parameter $\alpha_s$ is restricted to fall in the range of 0 to 1, it is tricky to learn it by gradient-based optimizers. In our implementation, we introduce an unconstrained learnable parameter $w_s$ and regard $\alpha_s$ as a Sigmoid function of $w_s$:

$$\alpha_s = f_1(w_s) = \mathrm{Sigmoid}(w_s). \tag{6}$$

Thus, the parameter $w_s$ rather than $\alpha_s$ will be directly optimized during the training process.
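A minimal PyTorch sketch of Scheme One (ours): one unconstrained parameter $w_s$ per slot, squashed through a sigmoid as in Eq. (6). The single-step update on clean meta data shown here follows the generic learning-to-reweight pattern and is a simplified stand-in for the meta-learning training described in the contributions (validation set as meta data), not the paper's exact procedure; all dimensions and learning rates are illustrative.

```python
import torch
import torch.nn.functional as F

# Scheme One (S1): alpha_s = sigmoid(w_s), one w_s per slot (Eq. (6)).
torch.manual_seed(0)
num_slots, num_values, dim = 3, 5, 16
w = torch.zeros(num_slots, requires_grad=True)             # alpha_s = 0.5 at init
theta = torch.randn(dim, num_values, requires_grad=True)   # toy primary-model weights

def slot_loss(theta, x, pseudo, vanilla, alpha_s):
    """Eq. (3) restricted to one slot, with a slot-wise weight alpha_s."""
    logits = x @ theta
    return (alpha_s * F.cross_entropy(logits, pseudo)
            + (1 - alpha_s) * F.cross_entropy(logits, vanilla))

# Noisy training batch and clean meta (validation) batch for slot 0.
x_tr, x_val = torch.randn(8, dim), torch.randn(8, dim)
pseudo, vanilla = torch.randint(5, (8,)), torch.randint(5, (8,))
y_val = torch.randint(5, (8,))

alpha = torch.sigmoid(w)                                   # Eq. (6)
inner = slot_loss(theta, x_tr, pseudo, vanilla, alpha[0])
grad_theta, = torch.autograd.grad(inner, theta, create_graph=True)
theta_virtual = theta - 0.1 * grad_theta                   # virtual primary-model update
meta_loss = F.cross_entropy(x_val @ theta_virtual, y_val)  # loss on clean meta data
meta_loss.backward()                                       # gradient reaches w via alpha
with torch.no_grad():
    w -= 1e-2 * w.grad                                     # adjust the slot-wise weight
```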