an auxiliary model on the small clean dataset and applies this model to generate pseudo labels for each sample in the noisy training set. Then, it linearly combines the pseudo labels and vanilla labels to train the primary model. Both theoretically and empirically, ASSIST has been shown to be effective in reducing the impact of label noise.
However, ASSIST adopts a common weighting parameter to combine the pseudo labels and vanilla labels for all slots and all training samples, which is suboptimal. In reality, different slots tend to have different noise rates (Eric et al., 2020), indicating that the weighting parameter should be slot-wise. Moreover, different training samples may also require different weighting parameters, since whether pseudo labels or vanilla labels should be preferred depends heavily on the specific training instance. Furthermore, the weighting parameter is treated as a hyperparameter and thus needs to be carefully tuned on each dataset.
To address the aforementioned limitations of ASSIST, we propose MetaASSIST, a meta-learning-based general framework that supports automatically learning slot-wise (and instance-wise) weighting parameters. Specifically, our contributions are:
• We propose three different schemes for transforming the weighting parameters into learnable functions. These schemes have varying degrees of flexibility, ranging from slot-wise to both slot-wise and instance-wise.
• We propose to train these learnable functions through a meta-learning paradigm that takes the validation set as meta data and adaptively adjusts the parameters of each learnable function (and, as a result, the weighting parameters) by reducing the validation loss; a minimal sketch of this update follows the list.
• We conduct extensive experiments to test the effectiveness of the three proposed schemes. All of them achieve superior performance. For the first time, we achieve over 80% joint goal accuracy on MultiWOZ 2.4 (Ye et al., 2021a).
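To make the meta-learning recipe concrete, below is a minimal, self-contained sketch of one bilevel update on a toy linear model. It assumes an SGD inner step and sigmoid-squashed slot-wise weights; all names, shapes, and learning rates are illustrative assumptions rather than our actual implementation:

```python
import torch

# Toy bilevel meta-update: a linear "tracker" theta and a learnable
# slot-wise weight (sigmoid-squashed). Everything here is illustrative.
num_slots, dim, lr = 3, 8, 0.1
theta = torch.randn(dim, num_slots, requires_grad=True)    # model parameters
alpha_logits = torch.zeros(num_slots, requires_grad=True)  # slot-wise weights
opt_theta = torch.optim.SGD([theta], lr=lr)
opt_alpha = torch.optim.SGD([alpha_logits], lr=lr)

def weighted_loss(params, x, pseudo_y, noisy_y, alpha):
    # Per-slot linear combination of pseudo and vanilla labels.
    target = alpha * pseudo_y + (1 - alpha) * noisy_y
    return ((x @ params - target) ** 2).mean()

# Random stand-ins for one noisy training batch and one clean
# validation (meta) batch.
x = torch.randn(4, dim)
pseudo_y, noisy_y = torch.randn(4, num_slots), torch.randn(4, num_slots)
x_val, y_val = torch.randn(4, dim), torch.randn(4, num_slots)

# 1) Virtual inner step on the training batch (graph kept so that
#    gradients can flow back to the weighting parameters).
alpha = torch.sigmoid(alpha_logits)
train_loss = weighted_loss(theta, x, pseudo_y, noisy_y, alpha)
grad_theta = torch.autograd.grad(train_loss, theta, create_graph=True)[0]
theta_virtual = theta - lr * grad_theta

# 2) Meta step: the validation loss of the virtual model updates the
#    weighting parameters only.
val_loss = ((x_val @ theta_virtual - y_val) ** 2).mean()
opt_alpha.zero_grad()
val_loss.backward()
opt_alpha.step()

# 3) Real update of the model with the refreshed (detached) weights.
opt_theta.zero_grad()
alpha = torch.sigmoid(alpha_logits).detach()
weighted_loss(theta, x, pseudo_y, noisy_y, alpha).backward()
opt_theta.step()
```

The key design point is that the weighting parameters receive gradients only through the validation loss of the virtually updated model, so they are pushed to down-weight whichever label source hurts generalization.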
2 Preliminaries
In task-oriented dialogue systems, the DST module transforms users' goals or intentions expressed in unstructured natural language into structured state representations (e.g., a series of slot-value pairs). The state representations are continually updated in each round of the user-system interaction.
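For concreteness, the state at one turn can be pictured as a small mapping from slots to values; the slot names below are hypothetical examples in the MultiWOZ domain-slot style:

```python
# Illustrative dialogue state at one turn: a set of slot-value pairs.
# Slot names follow the "domain-slot" convention (hypothetical examples).
state_turn_t = {
    "hotel-area": "centre",
    "hotel-stars": "4",
    "restaurant-food": "italian",
}
```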
2.1 Problem Statement
More formally, we symbolize a dialogue of $T$ turns as $X = \{(R_1, U_1), \ldots, (R_T, U_T)\}$, where $R_t$ and $U_t$ denote the system response and user utterance at turn $t$ ($1 \le t \le T$), respectively. We adopt $X_t$ to represent the dialogue context from the first turn to the $t$-th turn, i.e., $X_t = \{(R_1, U_1), \ldots, (R_t, U_t)\}$.
Further, let $\mathcal{S}$ denote the set of all the predefined slots and $B_t = \{(s, v_t) \mid s \in \mathcal{S}\}$ the dialogue state at turn $t$. Here, $v_t$ is the corresponding value of slot $s$ at turn $t$. Then, the DST problem is defined as learning a dialogue state tracker $F: X_t \rightarrow B_t$.
As discussed earlier, annotating dialogue states via crowdsourcing is prone to incorrect and inconsistent labels. These noisy annotations are likely to adversely affect model training. We denote the noisy state annotations as $\tilde{B}_t = \{(s, \tilde{v}_t) \mid s \in \mathcal{S}\}$, where $\tilde{v}_t$ is the noisy label of slot $s$ at turn $t$. In this work, $\tilde{B}_t$ refers to the labels provided in the dataset, and $B_t$ refers to the unknown true state annotations.
As pointed out by Ye et al. (2022), existing DST approaches are only able to learn a suboptimal dialogue state tracker $\tilde{F}: X_t \rightarrow \tilde{B}_t$ rather than the optimal dialogue state tracker $F: X_t \rightarrow B_t$. Aiming at learning a strong dialogue state tracker $F^*$ to better approximate $F$, Ye et al. (2022) proposed a general framework, ASSIST, that supports training DST models robustly from noisy labels.
2.2 Overview of ASSIST
ASSIST assumes that a small clean dataset is available. Based on this assumption, it first trains an auxiliary model on the clean dataset. Then, it leverages the trained model to generate pseudo labels for each sample in the large noisy training set. The generated pseudo labels are expected to be a good complement to the vanilla noisy labels. Therefore, combining the two types of labels has the potential to reduce the influence of noisy labels when training the primary model.
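As a minimal illustration, the sketch below linearly combines per-slot pseudo-label distributions (from the auxiliary model) with the vanilla labels, following the linear combination described in the introduction; the function name, the shapes, and the fixed weight alpha are illustrative assumptions:

```python
import torch

def combine_labels(pseudo, vanilla, alpha=0.6):
    # Linear combination of per-slot label distributions; a single
    # shared weight alpha is used here, mirroring ASSIST's common
    # weighting parameter (tuned as a hyperparameter per dataset).
    return alpha * pseudo + (1.0 - alpha) * vanilla

# Illustrative usage for 2 slots with 4 candidate values each.
pseudo_y = torch.softmax(torch.randn(2, 4), dim=-1)   # auxiliary model output
vanilla_y = torch.nn.functional.one_hot(
    torch.tensor([1, 3]), num_classes=4).float()      # noisy dataset labels
target = combine_labels(pseudo_y, vanilla_y)          # combined training target
```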
Denote the generated pseudo state annotations as $\breve{B}_t = \{(s, \breve{v}_t) \mid s \in \mathcal{S}\}$, where $\breve{v}_t$ represents the pseudo label of slot $s$ at turn $t$. Within the framework of ASSIST, the primary model is required to predict $\breve{B}_t$ and $\tilde{B}_t$ concurrently during the training process. In other words, the target of model training turns into learning a dialogue state tracker $F^*: X_t \rightarrow C(\breve{B}_t, \tilde{B}_t)$, where $C(\breve{B}_t, \tilde{B}_t)$ denotes a combination of $\breve{B}_t$ and $\tilde{B}_t$. There can be different methods to combine the generated pseudo labels and vanilla noisy labels. The most straightforward