Transformer-Based Conditioned Variational Autoencoder for Dialogue
Generation
Huihui Yang
Zhejiang University
yanghh0@zju.edu.cn
Abstract
In human dialogue, any one query usually elicits numerous appropriate responses. A Transformer-based dialogue model, being a one-to-one mapping function, tends to produce sentences that occur frequently in the corpus. The conditional variational autoencoder (CVAE) is a technique for reducing such generic replies. In this paper, we build a dialogue model (CVAE-T) based on the Transformer with a CVAE structure. We use a pre-trained masked language model (MLM) to rewrite some key n-grams in responses to obtain a series of negative examples, and introduce a regularization term during training to explicitly guide the latent variable in learning the semantic differences between each pair of positive and negative examples. Experiments suggest that the method we design is capable of producing more informative replies.
1 Introduction
The training data used to train dialogue models contains a great deal of unknown background information, making dialogue a one-to-many problem in which different people can come up with different but reasonable answers to the same question. Generative diversity is therefore a crucial characteristic for building dialogue systems. Zhao et al. (2017) use the CVAE for dialogue modeling and demonstrate that the sentences produced by the CVAE model are more diverse than those produced by a conventional sequence-to-sequence model.
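For reference, the CVAE in this setting conditions on the dialogue context c and generates the response x through a latent variable z, and is trained by maximizing the usual evidence lower bound (written in standard CVAE notation rather than notation specific to this paper):

\mathcal{L}(\theta,\phi;x,c) = \mathbb{E}_{q_\phi(z\mid x,c)}\left[\log p_\theta(x\mid z,c)\right] - \mathrm{KL}\big(q_\phi(z\mid x,c)\,\|\,p_\theta(z\mid c)\big),

where q_\phi is the recognition (posterior) network and p_\theta(z\mid c) is the prior network.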
For the CVAE model, the approximate posterior carries little useful information at the beginning of training, and the decoder tends to fit the target distribution directly without relying on the latent variable, which is known as KL vanishing (Bowman et al., 2016). To alleviate this problem, some researchers introduce dialogue intent labels (Zhao et al., 2017) or sentence function labels (interrogative, declarative and imperative) (Ke et al., 2018) as additional information to supervise the posterior network learning. However, this method has several drawbacks: 1) It is expensive to annotate labels and challenging to expand to large-scale datasets. 2) It only focuses on the attributes of a certain aspect of sentences, and a limited number of tags can hardly cover all the attributes of that aspect. 3) The tags themselves do not carry semantic information, which is not conducive to model learning. We observe that some key words or phrases in a sentence can serve as representations of high-level sentence attributes, obviating the need for additional tags.
We locate the key n-grams in each response using a keyword extraction algorithm and replace each of them with a special token [MASK]. These masked positions are rewritten by a pre-trained MLM to generate a series of negative sentences semantically distinct from the original sentence. A regularization term is used to constrain the prior and posterior distributions during training, helping the latent variable to perceive the difference between positive and negative examples.
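As a concrete illustration of this negative-example construction, the sketch below masks one key n-gram and lets a pre-trained MLM rewrite it. The keyword extractor and the particular MLM are not fixed by this description, so the choice of bert-base-uncased and the example n-gram here are assumptions for illustration only; the regularization term itself is not shown.

# Minimal sketch of the negative-example construction described above.
# Assumptions (illustration only): the key n-gram has already been chosen by
# a keyword-extraction step, and bert-base-uncased serves as the MLM rewriter.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def make_negatives(response: str, key_ngram: str, top_k: int = 5):
    """Mask one key n-gram in the response and let the MLM rewrite it."""
    # For simplicity the whole n-gram is replaced by a single [MASK] token.
    masked = response.replace(key_ngram, fill_mask.tokenizer.mask_token, 1)
    candidates = fill_mask(masked, top_k=top_k)
    # Keep only rewrites whose filled token differs from the original n-gram,
    # so each negative example is semantically distinct from the positive one.
    return [c["sequence"] for c in candidates
            if c["token_str"].strip().lower() != key_ngram.lower()]

# One positive response paired with several negative rewrites.
negatives = make_negatives("i really enjoy playing the guitar", "guitar")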
Dialogue models should be able to handle long dependencies well, because conversation datasets usually contain multiple rounds of sentences and, as the conversation goes on, the dialogue history accumulates into a very long sequence. Transformer-based models (Zhang et al., 2020; Roller et al., 2020) have shown strong generative power when trained on large-scale conversational corpora. Thanks to its self-attention mechanism and excellent parallelism, the Transformer is well suited to processing long sequences. Its hierarchical structure also enables the decoder to incorporate the latent variable in a more flexible manner. We choose the Transformer as the encoder-decoder framework and explore how the CVAE structure can be better integrated with it for dialogue generation.
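One simple way to condition a Transformer decoder on a latent variable, shown below as a PyTorch sketch, is to project the variable to the model dimension and add it to every decoder input embedding; this is a common scheme offered only for illustration and is not necessarily the integration used in this paper.

import torch
import torch.nn as nn

# Hedged sketch (not necessarily this paper's scheme): condition the decoder
# on a latent variable z by projecting it to the model dimension and adding
# it to every decoder input embedding.
class LatentConditioner(nn.Module):
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.proj = nn.Linear(d_latent, d_model)

    def forward(self, token_embeddings: torch.Tensor, z: torch.Tensor):
        # token_embeddings: (batch, seq_len, d_model); z: (batch, d_latent)
        return token_embeddings + self.proj(z).unsqueeze(1)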
The contributions of this paper can be summarized as follows: