There Is No Standard Answer: Knowledge-Grounded Dialogue
Generation with Adversarial Activated Multi-Reference Learning
Xueliang Zhao1†, Tingchen Fu2†, Chongyang Tao3, Rui Yan2*
1Wangxuan Institute of Computer Technology, Peking University
2Gaoling School of Artificial Intelligence, Renmin University of China
3Microsoft Corporation
{zhaoxlpku,lucas.futingchen,chongyangtao}@gmail.com
ruiyan@ruc.edu.cn
Abstract

Knowledge-grounded conversation (KGC) shows excellent potential to deliver engaging and informative responses. However, existing approaches emphasize selecting one golden knowledge given a particular dialogue context, overlooking the one-to-many phenomenon in dialogue. As a result, the existing paradigm limits the diversity of knowledge selection and generation. To this end, we establish a multi-reference KGC dataset and propose a series of metrics to systematically assess the one-to-many efficacy of existing KGC models. Furthermore, to extend the hypothesis space of knowledge selection and enhance the mapping relationship between multiple knowledge passages and multiple responses, we devise a span-based variational model and optimize it in a wake-sleep style with an ameliorated evidence lower bound objective to learn one-to-many generalization. Both automatic and human evaluations demonstrate the efficacy of our approach.
1 Introduction

Maintaining appropriate human-computer dialogue is an important step toward advanced artificial intelligence, and external knowledge is a key ingredient of engaging and meaningful responses (Dinan et al., 2019). To this end, the research area of knowledge-grounded conversation (KGC) has been explored with great interest. In recent years, a number of methods (Lian et al., 2019; Kim et al., 2020; Zhao et al., 2020a,b) and benchmarks (Dinan et al., 2019; Zhou et al., 2018) have been proposed. These methods mainly follow the two-step paradigm proposed by Dinan et al. (2019): given a dialogue context and a candidate knowledge pool,
† The first two authors contributed equally. Xueliang Zhao was responsible for the design of the methodology and algorithm; Tingchen Fu was responsible for the implementation and experiments. The order was decided by a coin flip.
* Corresponding author: Rui Yan (ruiyan@ruc.edu.cn).
Context: There is no blue pigmentation in blue eyes. The eyes appear blue as a result of Rayleigh scattering, the same process that makes the sky appear blue.

Responses:
R1: Daddy, why is the sky blue? Same reason your eyes are, son.
R2: It sounds really stupid to me to say "the sky isn't actually blue, but dyed by Rayleigh scattering of sunlight" or something like that. I mean, the sky is blue. It's blue, guys.
R3: I even know someone whose two eyes are different colors; they call it heterochromia. Do they have no pigmentation in their eyes as well?
R4: Is it affected by genetics and inheritable from parents?

Knowledge:
K1: This would be analogous to the change in the color of the sky, from the blue given by the Rayleigh scattering of sunlight by small gas molecules when the sky is clear, to the gray caused by Mie scattering of large water droplets when the sky is cloudy.
K2: Heterochromia of the eye is called heterochromia iridum or heterochromia iridis.
K3: For example, the film actor Lee Van Cleef was born with one blue eye and one green eye, a trait that reportedly was common in his family, suggesting that it was a genetic trait.

Figure 1: A conversation from Reddit. Text highlighted in the same color denotes responses and their corresponding groundings in the knowledge pool.
they (1) first select one or more knowledge passages from the candidate pool, and then (2) generate a response based on the dialogue context and the selected knowledge.

A large body of work puts the emphasis on discovering the golden knowledge from the knowledge pool. To be more specific, although many knowledge passages in the candidate pool are relevant to the current conversation context (context-relevant), usually only one of them pertains to the observed response (label-relevant), which is often dubbed the golden knowledge. Although many techniques have been developed to discriminate the golden knowledge from the candidate pool, their precision is still far from satisfactory (Zhao et al., 2020b). Moreover, it seems that even humans are unable to accurately identify the so-called golden knowledge.1
1 According to experiments in Kim et al. (2020), humans could only achieve a precision of 17% on the Wizard of Wikipedia dataset.
arXiv:2210.12459v1 [cs.CL] 22 Oct 2022
In light of the poor performance of humans, we postulate that the so-called golden knowledge is an oversimplification of KGC. Concretely, dialogue is one-to-many in nature with high entropy (Paranjape et al., 2022), so there may exist more than one proper knowledge passage to ground on. Take a conversation from Reddit as an example (Figure 1): all the knowledge passages are relevant, and the four responses grounded on them are all reasonable. In a word, there is no such golden knowledge in this case. The hypothesis of golden knowledge overlooks the one-to-many property of conversation, penalizing perfectly valid knowledge, and is therefore harmful to the diversity of generation.
We identify two limitations that prevent previous methods from going beyond the golden knowledge and learning one-to-many generalization. First, previous methods that tacitly assume the existence of golden knowledge already produce acceptable performance, since most benchmarks (Zhou et al., 2018; Dinan et al., 2019) provide only one response per context, which coincidentally supports the golden knowledge hypothesis at evaluation time. Besides, a KGC model has no chance to be exposed to more than one response when training on these benchmarks. In a word, existing benchmarks are unable to train or evaluate the one-to-many generalization of a model. Second, the golden knowledge is flexible in granularity and not limited to a complete sentence (Figure 1), but previous methods usually limit the granularity of grounding to a complete sentence. Consequently, their decision space of knowledge selection is severely skewed and overfitted to the observed response. In this compressed decision space, they are also incapable of modeling the underlying relationship between the multiple responses and their groundings.
In this work, we propose a new KGC framework with better one-to-many generalization ability on two counts: (1) to train and evaluate the one-to-many generalization ability of a KGC model, we establish the first multi-reference KGC dataset and a series of metrics; (2) to extend the hypothesis space of knowledge selection, instead of choosing a knowledge sentence from the candidate set, we design a variational span reading model which directly reads the knowledge text and samples a span as the grounding. We further propose a wake-sleep style learning algorithm to adapt the original evidence lower bound objective (ELBO) to the multi-reference scenario. We conduct extensive experiments, and both automatic and human evaluation suggest the efficacy of our method in multi-reference KGC.
Our contributions are summarized below:
• To the best of our knowledge, we are the first to explore the one-to-many problem in KGC, and we establish a multi-reference KGC dataset as well as a series of metrics.
• We propose a variational span reading model, which reads and comprehends knowledge at a finer granularity and samples a span as the knowledge to ground on.
• We propose an adversarial activated multi-reference learning algorithm to ameliorate the original ELBO in the multi-reference scenario.
2 Related Work

Our work is in line with the research of knowledge-grounded conversation, whose goal is to generate informative responses with external knowledge (Dinan et al., 2019; Kim et al., 2020; Zhao et al., 2020b). Since existing benchmarks usually contain only one reference per conversation (Zhou et al., 2018; Dinan et al., 2019; Gopalakrishnan et al., 2019; Wu et al., 2019), most previous works adopt the assumption of golden knowledge (Zhao et al., 2020b; Dinan et al., 2019), and some of them use hindsight information from the response to detect the golden knowledge (Chen et al., 2020; Kim et al., 2020; Paranjape et al., 2022), omitting all the other unobserved but plausible responses. Besides, the granularity of grounding is limited to a complete sentence or passage. Recently, some researchers have attempted to explore the possibility of grounding dialogue with spans (Wu et al., 2021; Meng et al., 2020; Zhan et al., 2021). Their spans are deterministic, obtained from a hard selection process. Differently, we view span prediction as a probabilistic process and propose a variational method to capture the attention span.

The proposed model also relates to the one-to-many property of dialogue, referring to the phenomenon that multiple responses are proper for a single dialogue context. How to train and evaluate the one-to-many generalization of a dialogue system is a widely studied topic in open-domain response generation (Gupta et al., 2019; Zhao et al., 2017; Chan et al., 2021). Inspired by the efficacy of the Variational Auto-Encoder (VAE), some previous works resort to latent variables to model the one-to-many property of dialogue. For example, Zhao et al. (2017) model discourse-level diversity with a latent variable subject to a Gaussian distribution. Qiu et al. (2019) posit a two-stage method that represents the distinct features of multiple references with a continuous latent variable. However, their latent variables are poor in interpretability. Bao et al. (2020) and Bao et al. (2021) introduce discrete latent variables into the pre-training process, where each value of the latent variable corresponds to a particular latent speech act. As for the evaluation of dialogue systems, Gupta et al. (2019) show that multi-reference evaluation achieves better correlation with human judgments and release a test set for open-domain dialogue. However, to the best of our knowledge, although Moghe et al. (2018) construct a multi-reference test set for KGC, there is no standard benchmark for one-to-many training and evaluation in KGC.

Figure 2: The architecture of the proposed model.
3 Methodology
3.1 Problem Formulation and Overview
For a multi-reference KGC dataset, each case is a triplet $(C, K, R)$, where $C = [w_1, w_2, \cdots, w_{l_C}]$ is the context of a conversation composed of previous utterance tokens and $K = [k_1, k_2, \cdots, k_{l_K}]$ is the concatenated sequence of background knowledge and facts. We use $w_i$ and $k_j$ to denote the $i$-th token in the context and the $j$-th token in the knowledge, respectively. $R = \{R_i\}_{i=1}^{n}$ is a set of observed responses. Our goal is to predict various spans $(S_1, S_2, \cdots, S_n)$ in the knowledge, indicated by the start position $Z_s$ and the end position $Z_e$, and then generate multiple diverse responses $(R_1, R_2, \cdots, R_n)$ accordingly.
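To make the formulation concrete, here is a minimal Python sketch of one training case and of recovering a grounding span from $(Z_s, Z_e)$. The whitespace tokenization, field names, and helper are illustrative assumptions for this sketch, not part of the paper:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class KGCExample:
    """One multi-reference KGC case (C, K, R); field names are illustrative."""
    context: List[str]          # C = [w_1, ..., w_{l_C}]
    knowledge: List[str]        # K = [k_1, ..., k_{l_K}]
    responses: List[List[str]]  # R = {R_i}_{i=1}^{n}, n >= 1 references

def extract_span(knowledge: List[str], z_s: int, z_e: int) -> List[str]:
    """Grounding span S = K[Z_s .. Z_e], with inclusive end position."""
    if not (0 <= z_s <= z_e < len(knowledge)):
        raise ValueError("invalid span positions")
    return knowledge[z_s : z_e + 1]

example = KGCExample(
    context="daddy why is the sky blue".split(),
    knowledge="the sky appears blue due to rayleigh scattering of sunlight".split(),
    responses=["same reason your eyes are son".split()],
)
span = extract_span(example.knowledge, 6, 7)  # ['rayleigh', 'scattering']
```

Each reference response $R_i$ in the set may correspond to a different span, which is the point of predicting $n$ spans rather than one.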
The architecture of our approach is exhibited in Figure 2. It mainly consists of two parts: selective reading (Section 3.2) and multi-reference learning (Section 3.3). Concretely, for selective reading, we calculate the prior distributions of $Z_s$ and $Z_e$ from the dialogue context and the knowledge, which we refer to as $p_\theta(Z_s)$ and $p_\theta(Z_e)$. The two distributions are used to estimate the joint distribution $p_\theta(Z_s, Z_e)$. Meanwhile, we compute auxiliary posterior distributions $q_\phi(Z_s|R)$ and $q_\phi(Z_e|R)$, which are used to teach the prior through minimizing the KL-divergence. Note that the posterior is only involved in the training process.
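Since $Z_s$ and $Z_e$ range over discrete token positions, the prior-teaching signal is a standard KL divergence between two categorical distributions. A toy sketch, with purely illustrative probability values rather than model outputs:

```python
import math

def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) for two discrete distributions over span positions."""
    return sum(qi * math.log((qi + eps) / (pi + eps)) for qi, pi in zip(q, p))

# q_phi(Z_s | R): posterior over start positions, sharpened by the response.
# p_theta(Z_s): prior computed from context and knowledge alone.
q_start = [0.7, 0.2, 0.1]  # illustrative values
p_start = [0.4, 0.4, 0.2]
gap = kl_divergence(q_start, p_start)  # positive; shrinks as the prior learns
```

Minimizing this gap with respect to the prior's parameters pushes $p_\theta$ toward the response-informed posterior without letting the response leak into inference.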
For multi-reference learning, we devise a wake-sleep style learning algorithm. In the wake step, the posterior and the generator learn to maximize the evidence lower bound objective with respect to the augmented response set; in the sleep step, a discriminator is trained to distinguish the observed real responses from the augmented responses. The two steps are conducted iteratively to learn one-to-many generalization in dialogue.
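The alternation can be sketched as a plain training loop. The step functions below are stubs standing in for the real ELBO and discriminator updates, so only the control flow reflects the algorithm:

```python
from typing import Callable, List

def wake_sleep_train(num_rounds: int,
                     wake_step: Callable[[], str],
                     sleep_step: Callable[[], str]) -> List[str]:
    """Alternate wake (posterior + generator maximize the ELBO on the
    augmented response set) and sleep (discriminator learns to separate
    observed real responses from augmented ones)."""
    log = []
    for _ in range(num_rounds):
        log.append(wake_step())   # update posterior and generator
        log.append(sleep_step())  # update discriminator
    return log

# Stub steps; a real implementation would update model parameters here.
log = wake_sleep_train(2, lambda: "wake", lambda: "sleep")
# log == ["wake", "sleep", "wake", "sleep"]
```

In the full method, the discriminator trained in the sleep step supplies the signal that reweights augmented responses in the next wake step, which is what adapts the ELBO to the multi-reference setting.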
3.2 Variational Span Reading
Prior Reading. To compute the prior distribution of the span, we first concatenate the context and the knowledge into a single sequence:

$$I_{pri} = \{w_1, w_2, \cdots, w_{l_C}, k_1, k_2, \cdots, k_{l_K}\}, \quad (1)$$

before passing it through multiple BERT layers (Devlin et al., 2019):

$$H_{pri} = \mathrm{BERT}(I_{pri}) \in \mathbb{R}^{(l_C + l_K) \times d}. \quad (2)$$

Compared with independent encoding, this allows more sufficient interaction between the dialogue context and the knowledge. We obtain the context-aware knowledge representation $K_{pri} = H_{pri}[l_C : l_C + l_K]$ as the slice of the knowledge part in $H_{pri}$, and the knowledge-aware context representation as a mean pooling of the context part:

$$h_c = \frac{1}{l_C} \sum_{i=1}^{l_C} H_{pri,i}. \quad (3)$$
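A runnable sketch of Eqs. (1)-(3), with a toy deterministic `encode` standing in for BERT (an assumption for illustration only): concatenate context and knowledge, encode the joint sequence, slice out the knowledge part, and mean-pool the context part:

```python
from typing import List, Tuple

Vector = List[float]

def encode(tokens: List[str], d: int = 4) -> List[Vector]:
    """Toy stand-in for BERT(I_pri): embeds each token by its length."""
    return [[float(len(t))] * d for t in tokens]

def prior_reading(context: List[str], knowledge: List[str],
                  d: int = 4) -> Tuple[List[Vector], Vector]:
    i_pri = context + knowledge   # Eq. (1): one joint input sequence
    h_pri = encode(i_pri, d)      # Eq. (2): H_pri in R^{(l_C + l_K) x d}
    l_c = len(context)
    k_pri = h_pri[l_c:]           # K_pri: slice of the knowledge part
    # Eq. (3): h_c = mean over the l_C context rows of H_pri.
    h_c = [sum(col) / l_c for col in zip(*h_pri[:l_c])]
    return k_pri, h_c

k_pri, h_c = prior_reading(["ab", "abcd"], ["sky", "is", "blue"])
# h_c == [3.0, 3.0, 3.0, 3.0]; len(k_pri) == 3
```

The joint encoding is the design choice here: because context and knowledge attend to each other inside one sequence, the sliced knowledge states are already context-aware, and the pooled context vector is knowledge-aware.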