In light of the poor performance of humans, we postulate that the so-called golden knowledge is an oversimplification of KGC. Concretely, dialogue is one-to-many in nature with high entropy (Paranjape et al., 2022), so there may exist more than one piece of knowledge that is proper to ground on. Take a conversation from Reddit as an example (Figure 1): all the knowledge sentences are relevant, and the four responses grounded on them are all reasonable. In short, there is no single piece of golden knowledge in this case. The golden-knowledge hypothesis overlooks the one-to-many property of conversation, penalizing perfectly valid knowledge and thereby harming the diversity of generation.
We identify two limitations that prevent previous methods from going beyond the golden knowledge and learning one-to-many generalization. First, methods that tacitly assume the existence of golden knowledge already achieve acceptable performance, since most benchmarks (Zhou et al., 2018; Dinan et al., 2019) provide only one reference response, which coincidentally supports the golden-knowledge hypothesis at evaluation time. Moreover, a KGC model trained on these benchmarks is never exposed to more than one response. In short, existing benchmarks can neither train nor evaluate the one-to-many generalization of a model. Second, golden knowledge is flexible in granularity and not limited to a complete sentence (Figure 1), yet previous methods usually restrict the granularity of grounding to a complete sentence. Consequently, their decision space for knowledge selection is severely skewed and overfitted to the observed response. In this compressed decision space, they are also incapable of modeling the underlying relationship between multiple responses and their groundings.
In this work, we propose a new KGC framework with better one-to-many generalization ability on two counts: (1) to train and evaluate the one-to-many generalization ability of a KGC model, we establish the first multi-reference KGC dataset and a series of metrics; (2) to extend the hypothesis space of knowledge selection, instead of choosing a knowledge sentence from a candidate set, we design a variational span reading model that directly reads the knowledge text and samples a span as the grounding. We further propose a wake-sleep style learning algorithm to adapt the original evidence lower bound objective (ELBO) to the multi-reference scenario. We conduct extensive experiments, and both automatic and human evaluation suggest the efficacy of our method in multi-reference KGC.
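As a sketch of the variational formulation, the single-reference objective can be written as a standard conditional ELBO in which the span is the latent variable; the exact factorization and conditioning used in our model may differ:

\[
\log p_\theta(y \mid x, k) \;\ge\; \mathbb{E}_{z \sim q_\phi(z \mid x, k, y)}\big[\log p_\theta(y \mid x, k, z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x, k, y) \,\|\, p_\theta(z \mid x, k)\big),
\]

where \(x\) is the dialogue context, \(k\) the knowledge text, \(y\) a response, and \(z\) a span over \(k\); the posterior \(q_\phi\) may condition on the response in hindsight, while the prior \(p_\theta\) does not.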
Our contributions are summarized below:
• To the best of our knowledge, we are the first to explore the one-to-many problem in KGC, and we establish a multi-reference KGC dataset as well as a series of metrics.
• We propose a variational span reading model, which reads and comprehends knowledge at a finer granularity and samples a span as the knowledge to ground on (see the sketch after this list).
• We propose an adversarial activated multi-reference learning algorithm that ameliorates the original ELBO in the multi-reference scenario.
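For illustration only, the following is a minimal sketch of how probabilistic span sampling could look in code; the module name SpanReader, the two linear scoring heads, and the input knowledge_states are hypothetical and do not reproduce the exact implementation described in this paper.

```python
import torch

class SpanReader(torch.nn.Module):
    """Hypothetical sketch: treats span boundaries as latent variables."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Separate scoring heads for the start and end of the grounding span.
        self.start_head = torch.nn.Linear(hidden_size, 1)
        self.end_head = torch.nn.Linear(hidden_size, 1)

    def forward(self, knowledge_states: torch.Tensor):
        # knowledge_states: (batch, seq_len, hidden) token encodings of the
        # knowledge text, e.g., from a pretrained encoder.
        start_logits = self.start_head(knowledge_states).squeeze(-1)
        end_logits = self.end_head(knowledge_states).squeeze(-1)
        # Sample the boundaries instead of taking the argmax: span prediction
        # is viewed as a probabilistic process, not a hard selection.
        start = torch.distributions.Categorical(logits=start_logits).sample()
        end = torch.distributions.Categorical(logits=end_logits).sample()
        # Keep the span well-formed (end >= start).
        end = torch.maximum(start, end)
        return start, end
```

At training time, the sampled span would feed the response decoder, with gradients estimated by, e.g., score-function or Gumbel-softmax methods; those details are beyond this sketch.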
2 Related Work
Our work is in line with the research of knowledge-grounded conversation, whose goal is to generate informative responses with external knowledge (Dinan et al., 2019; Kim et al., 2020; Zhao et al., 2020b). Since existing benchmarks usually contain only one reference per conversation (Zhou et al., 2018; Dinan et al., 2019; Gopalakrishnan et al., 2019; Wu et al., 2019), most previous works adopt the golden-knowledge assumption (Zhao et al., 2020b; Dinan et al., 2019), and some of them use hindsight information from the response to detect the golden knowledge (Chen et al., 2020; Kim et al., 2020; Paranjape et al., 2022), ignoring all the other unobserved but plausible responses. Besides, the granularity of grounding is limited to a complete sentence or passage. Recently, some researchers have explored grounding dialogue with spans (Wu et al., 2021; Meng et al., 2020; Zhan et al., 2021), but their spans are deterministic, obtained through a hard selection process. Differently, we view span prediction as a probabilistic process and propose a variational method to capture the attention span.
The proposed model also relates to the one-to-many property in dialogue, referring to the phenomenon that multiple responses can be proper for a single dialogue context. How to train and evaluate the one-to-many generalization of a dialogue system is a widely studied topic in open-domain response generation (Gupta et al., 2019; Zhao et al., 2017; Chan et al., 2021). Inspired by the efficacy of the Variational Auto-Encoder (VAE), some previous works resort to latent variables to model the one-to-many property of dialogue. For