In light of the poor performance of humans, we postulate that the so-called golden knowledge is an oversimplification of KGC. Concretely, dialogue is one-to-many in nature with high entropy (Paranjape et al., 2022), so there may exist more than one piece of knowledge that is proper to ground on. Take a conversation from Reddit as an example (Figure 1): all the knowledge sentences are relevant, and the four responses grounded on them are all reasonable. In short, there is no single piece of golden knowledge in this case. The golden-knowledge hypothesis overlooks the one-to-many property of conversation, penalizing perfectly valid knowledge and thereby harming the diversity of generation.
We identify two limitations that prevent previous methods from going beyond the golden knowledge and learning one-to-many generalization. First, methods that tacitly assume the existence of golden knowledge already achieve acceptable performance, since most benchmarks (Zhou et al., 2018; Dinan et al., 2019) provide only one reference response, which coincidentally supports the golden-knowledge hypothesis at evaluation time. Moreover, a KGC model trained on these benchmarks is never exposed to more than one response. In short, existing benchmarks can neither train nor evaluate the one-to-many generalization of a model. Second, golden knowledge is flexible in granularity and not limited to a complete sentence (Figure 1), yet previous methods usually restrict the granularity of grounding to a complete sentence. Consequently, their decision space for knowledge selection is severely skewed and overfitted to the observed response. In this compressed decision space, they are also incapable of modeling the underlying relationship between multiple responses and their groundings.
In this work, we propose a new KGC framework with better one-to-many generalization ability on two counts: (1) to train and evaluate the one-to-many generalization ability of a KGC model, we establish the first multi-reference KGC dataset and a series of metrics; (2) to extend the hypothesis space of knowledge selection, instead of choosing a knowledge sentence from a candidate set, we design a variational span reading model that directly reads the knowledge text and samples a span as the grounding. We further propose a wake-sleep style learning algorithm to adapt the original evidence lower bound objective (ELBO) to the multi-reference scenario. We conduct extensive experiments, and both automatic and human evaluation suggest the efficacy of our method in multi-reference KGC.
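As a sketch of the variational formulation, the single-reference objective can be written as a standard conditional ELBO in which the span is the latent variable; the exact factorization and conditioning used in our model may differ:

\[
\log p_\theta(y \mid x, k) \;\ge\; \mathbb{E}_{z \sim q_\phi(z \mid x, k, y)}\big[\log p_\theta(y \mid x, k, z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x, k, y) \,\|\, p_\theta(z \mid x, k)\big),
\]

where \(x\) is the dialogue context, \(k\) the knowledge text, \(y\) a response, and \(z\) a span over \(k\); the posterior \(q_\phi\) may condition on the response in hindsight, while the prior \(p_\theta\) does not.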
Our contributions are summarized below:
• To the best of our knowledge, we are the first to explore the one-to-many problem in KGC, and we establish a multi-reference KGC dataset as well as a series of metrics.
• We propose a variational span reading model, which reads and comprehends knowledge at a finer granularity and samples a span as the knowledge to ground on (see the sketch after this list).
• We propose an adversarial activated multi-reference learning algorithm that ameliorates the original ELBO in the multi-reference scenario.
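For illustration only, the following is a minimal sketch of how probabilistic span sampling could look in code; the module name SpanReader, the two linear scoring heads, and the input knowledge_states are hypothetical and do not reproduce the exact implementation described in this paper.

```python
import torch

class SpanReader(torch.nn.Module):
    """Hypothetical sketch: treats span boundaries as latent variables."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Separate scoring heads for the start and end of the grounding span.
        self.start_head = torch.nn.Linear(hidden_size, 1)
        self.end_head = torch.nn.Linear(hidden_size, 1)

    def forward(self, knowledge_states: torch.Tensor):
        # knowledge_states: (batch, seq_len, hidden) token encodings of the
        # knowledge text, e.g., from a pretrained encoder.
        start_logits = self.start_head(knowledge_states).squeeze(-1)
        end_logits = self.end_head(knowledge_states).squeeze(-1)
        # Sample the boundaries instead of taking the argmax: span prediction
        # is viewed as a probabilistic process, not a hard selection.
        start = torch.distributions.Categorical(logits=start_logits).sample()
        end = torch.distributions.Categorical(logits=end_logits).sample()
        # Keep the span well-formed (end >= start).
        end = torch.maximum(start, end)
        return start, end
```

At training time, the sampled span would feed the response decoder, with gradients estimated by, e.g., score-function or Gumbel-softmax methods; those details are beyond this sketch.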
2 Related Work
Our work is in line with the research of knowledge-grounded conversation, whose goal is to generate informative responses with external knowledge (Dinan et al., 2019; Kim et al., 2020; Zhao et al., 2020b). Since existing benchmarks usually contain only one reference per conversation (Zhou et al., 2018; Dinan et al., 2019; Gopalakrishnan et al., 2019; Wu et al., 2019), most previous works adopt the golden-knowledge assumption (Zhao et al., 2020b; Dinan et al., 2019), and some of them use hindsight information from the response to detect the golden knowledge (Chen et al., 2020; Kim et al., 2020; Paranjape et al., 2022), ignoring all the other unobserved but plausible responses. Besides, the granularity of grounding is limited to a complete sentence or passage. Recently, some researchers have explored grounding dialogue with spans (Wu et al., 2021; Meng et al., 2020; Zhan et al., 2021), but their spans are deterministic, obtained through a hard selection process. Differently, we view span prediction as a probabilistic process and propose a variational method to capture the attention span.
The proposed model also relates to the one-to-many property in dialogue, referring to the phenomenon that multiple responses can be proper for a single dialogue context. How to train and evaluate the one-to-many generalization of a dialogue system is a widely studied topic in open-domain response generation (Gupta et al., 2019; Zhao et al., 2017; Chan et al., 2021). Inspired by the efficacy of the Variational Auto-Encoder (VAE), some previous works resort to latent variables to model the one-to-many property of dialogue. For