Personalized Dialogue Generation with Persona-Adaptive Attention
Qiushi Huang1,2, Yu Zhang2*, Tom Ko3, Xubo Liu1, Bo Wu4, Wenwu Wang1, H Tang1*
1University of Surrey
2Southern University of Science and Technology
3ByteDance AI Lab
4MIT-IBM Watson AI Lab
{qiushi.huang,xubo.liu,w.wang,h.tang}@surrey.ac.uk,{yu.zhang.ust,tomkocse}@gmail.com, bo.wu@ibm.com
Abstract
Persona-based dialogue systems aim to generate consistent responses based on historical context and predefined persona. Unlike conventional dialogue generation, persona-based dialogue needs to consider both the dialogue context and the persona, posing a challenge for coherent training. Specifically, this requires a delicate weight balance between context and persona. To achieve that, in this paper we propose an effective framework with Persona-Adaptive Attention (PAA), which adaptively integrates the weights from the persona and context information via our designed attention. In addition, a dynamic masking mechanism is applied to the PAA to not only drop redundant information in context and persona but also serve as a regularization mechanism to avoid overfitting. Experimental results demonstrate the superiority of the proposed PAA framework compared to strong baselines in both automatic and human evaluation. Moreover, the proposed PAA approach performs well in a low-resource regime, achieving results comparable to larger models trained in the full-data setting while using only 20% to 30% of the training data. To fully exploit the effectiveness of our design, we designed several variants that handle the weighted information in different ways, showing the necessity and sufficiency of our weighting and masking designs.
Introduction
Persona is essential for building a trustful and confident conversational system. Recently, there has been an increasing interest in incorporating explicit persona into dialogue generation models (Wolf et al. 2019; Liu et al. 2020; Song et al. 2021) since the release of the publicly available datasets (Zhang et al. 2018; Dinan et al. 2019). Typically, persona information consists of several sentences describing the facts or background of the interlocutor. An example taken from the ConvAI2 dataset (Dinan et al. 2019) is shown in Figure 1. In this example, the system should consider the information in the persona sentences and generate consistent responses based on both persona and dialogue history.
[Figure 1: An example from the ConvAI2 dataset. The system's persona ("I'm a stunt double as my second job.", "I only eat Kosher.", "I was raised in a single parent household.", "My favorite drink is Cuba Libre.") is shown alongside a short dialogue in which the system's replies ("I am good, I just got off work and tired, I have two jobs."; "Those are really yummy too, but not my favorite.") draw on that persona.]

*Corresponding authors. Code is available at: https://github.com/hqsiswiliam/persona-adaptive-attention

One challenge in persona-based dialogue generation is that the related datasets are usually small. As collecting dialogues for persona-based dialogue datasets requires crowdworkers to chat with each other based on provided persona profiles, building such quality datasets is expensive and time-consuming, which in turn restricts their size. For example, the ConvAI2 dataset (Dinan et al. 2019) contains only 131k utterances with fewer than 5k unique personas, much smaller than open-domain dialogue datasets such as Pushshift.io Reddit (Baumgartner et al. 2020) with roughly 1.2B utterances.
Another challenge is to choose the weights between the persona and the context. Unlike open-domain dialogue models that generate responses by considering the dialogue context alone, persona-based dialogue generation systems need to additionally take personalized background descriptions into account along with the dialogue context. The weights between context and persona should be dynamically adjusted by the dialogue system under different situations. For example, given a user utterance "How are you?", the context-preferred answer is likely to be "I am fine.", which is safe but bland. Meanwhile, a persona-preferred answer would fuse persona information into the response, such as "I am spending time with my four sisters". Under such circumstances, the persona-preferred answer would be more informative and meaningful. On the other hand, sometimes the system needs to focus on the context to make the conversation interactive and engaging. For instance, if the user says "I have two greyhounds. Their names are Tom and Jerry.", then the system would focus on the context and answer "That's cute! How old are they?", which encourages the user to chat with the dialogue system. From the above two scenarios, it can be seen that the weights between context and persona should be adjusted accordingly, which is important for a dialogue model to build long-term relationships with users.
Most existing works on persona-based dialogue generation have primarily addressed the data scarcity challenge by utilizing external data or sophisticated training processes. For instance, Song et al. use the MNLI dataset (Williams, Nangia, and Bowman 2018) as an auxiliary task, Cao et al. augment the data through text manipulation, Roller et al. add other dialogue datasets in pretext tasks, and Liu et al. adopt multi-stage training with reinforcement learning. Those works obtained decent performance, but few of them considered the second challenge.
To address the aforementioned second challenge, in this paper we design a Persona-Adaptive Attention (PAA) mechanism to dynamically learn the weights of the persona and context information in the proposed framework. To enhance the persona information in the PAA, we prepend the persona to the decoder input as a prompt so that the weights can capture more persona-related information. To balance the context and persona information, the PAA takes the two cross-attentions and the self-attention from the persona-prompted decoder to compute the weights for combining the latent representations from the context and persona. Moreover, inspired by findings (Welleck et al. 2019; Cao et al. 2022) that not all context and persona information is useful for generating the response, we apply two dynamic masks to the weighted latent representations to not only remove redundant information but also act as a regularizer in the PAA.
As a byproduct, extensive experiments on the ConvAI2 dataset show that the proposed framework achieves comparable or even better performance than existing works without the use of external datasets or sophisticated training procedures. One reason is that our framework explicitly considers learning the weights between context and persona in the architecture design, which allows it to perform well under a low-data regime. This observation indicates that the proposed framework could also alleviate the first challenge, making it kill two birds with one stone and demonstrating the effectiveness of our design.
Our contributions can be summarized as follows.
• We propose the PAA in an encoder-decoder framework. This framework models the persona and context information by two separate transformer encoders, which are then fused in the persona-prompted decoder by the proposed PAA mechanism.
• Extensive experiments on the ConvAI2 dataset show that the proposed model performs comparably to or even better than strong baseline methods, with about a 30% improvement in terms of the perplexity metric.
• We demonstrate that our framework is a data-efficient architecture that can achieve comparable performance with 20% to 30% of the training data compared with a larger model such as GPT2 (Radford et al. 2019) trained on the full dataset.
Related Work
Persona-based Dialogue Generation
There is a growing interest in persona-based dialogue generation tasks, especially work on the PersonaChat/ConvAI2 dataset. The release of the PersonaChat dataset (Zhang et al. 2018) has provoked vibrant research in integrating explicit persona into dialogue response generation. The ConvAI2 dataset (Dinan et al. 2019) is a further split of the PersonaChat dataset that serves as the dataset for the conversational competition (http://convai.io/2018/). Most works on persona-based dialogue generation are conducted on the ConvAI2 dataset, so we use the ConvAI2 dataset as our primary training and evaluation dataset. Zhang et al. utilized an LSTM (Hochreiter and Schmidhuber 1997) to generate a response from persona and context. Later, TransferTransfo (Wolf et al. 2019) leveraged a pre-trained language model by fine-tuning the GPT2 model on the dataset with the concatenated input. Meanwhile, BERT over BERT (BoB) (Song et al. 2021) is composed of three BERTs (Devlin et al. 2019) and is trained with both negative log-likelihood and unlikelihood losses. BoB utilizes the MNLI dataset (Williams, Nangia, and Bowman 2018) as an auxiliary dataset to help the model recognize positive and negative samples given an anchor sample. P2 Bot (Liu et al. 2020) addressed the persona-based dialogue task by introducing a transmitter and receiver model, which is further tuned with reinforcement learning on manually designed rewards. A recent work (Cao et al. 2022) tackled the problem in a model-agnostic fashion, providing strategies for data augmentation and curriculum learning. Distinct from these previous works, we propose an effective approach without the aid of external datasets or complicated training setups.
Attention Mechanisms for Conditional Dialogue Generation
Several studies have introduced explicitly designed cross-attention to address dialogue generation. Those works are tailored either to condition sentences (Zheng et al. 2020) or to categorical labels (Zeng and Nie 2021). Zheng et al. proposed an attention routing structure that controls the weight of persona information when generating the response. The attention routing structure adds the cross-attention/self-attention results from the persona-response, context-response, and response-response pairs together to obtain a fused cross-attention that balances the weights among the different sources of input. Those cross-attentions/self-attentions are also calculated in our approach. However, instead of calculating the weights from an external predictor, our approach computes them within the framework and then applies masking on the weighted cross-attention results to alleviate the training difficulties.
In addition, Zeng and Nie introduced a condition-aware transformer block into their model to determine the amount of condition information used as a bias in the word generation probability at each position (Zeng and Nie 2021). In the condition-aware block, the keys and values from a condition (e.g., a topic label) and the context are concatenated. The block then calculates cross-attention over the concatenated content to obtain a bias term, which is added to the self-attention. Unlike the condition-aware block approach, our model generates two masks with weights to balance the information from persona and context rather than relying on a bias term. In addition, our framework takes persona and context text as input, while the condition-aware transformer (Zeng and Nie 2021) uses a categorical label and context text as input.
Methodology
Task Formulation
Suppose that we have a persona-based conversation session $C = \{P, U\}$, where each persona $P = \{p_1, \ldots, p_e\}$ is composed of $e$ profile sentences that describe the background of an interlocutor, and the dialogue context $U = \{u_{h,1}, u_{m,1}, \ldots, u_{h,n}\}$ includes the utterances spoken interactively by the first interlocutor (e.g., human) $h$ and the second interlocutor (e.g., machine) $m$. In the persona-based dialogue generation task, $P$ represents the persona for $m$, and the conversation session always starts with $h$. Therefore, the objective of this task is to generate the response $r = u_{m,n}$ given the persona $P$ and the dialogue context $U$.
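To make the notation concrete, the sketch below (plain Python, with hypothetical field names and utterances borrowed from Figure 1) shows what one persona-based session $C = \{P, U\}$ and its target response $r$ could look like as a data structure:

```python
# One persona-based conversation session C = {P, U} with target response r.
# Field names are illustrative, not the dataset's actual schema.
session = {
    "persona": [                    # P = {p_1, ..., p_e}: profile sentences for speaker m
        "I'm a stunt double as my second job.",
        "I only eat Kosher.",
        "My favorite drink is Cuba Libre.",
    ],
    "context": [                    # U = {u_{h,1}, u_{m,1}, ..., u_{h,n}}, starting with h
        ("human",   "Hello what are doing today?"),
        ("machine", "I am good, I just got off work and tired, I have two jobs."),
        ("human",   "I prefer Mojitos or Watermelon."),
    ],
    "response": "Those are really yummy too, but not my favorite.",  # r = u_{m,n}
}
```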
Overall Framework
As depicted in Figure 2, our framework consists of two encoders and one decoder with PAA to perform the decoding process. The encoding layer uses a transformer encoder architecture to encode the persona $P$ and the dialogue context $U$, respectively, into latent representations. The encoder layers are randomly initialized, while the decoder layers are initialized with the pre-trained GPT2. The persona information is fed to the persona encoder as well as to the decoder as a prompt, offering strong guidance for GPT2 to decode the target response. PAA handles the cross-attentions from the persona and context information to balance and regularize the two parts by weighting and masking.
Inputs for Persona-Adaptive Attention
Before presenting the proposed PAA, in this section, we
introduce the decoder’s self-attention and encoder-decoder
cross-attention as the inputs for the PAA.
Firstly, the persona $P$ and context $U$ are processed separately by two encoders. Let $I^P = \{t^P_1, \ldots, t^P_l\}$ denote the concatenation of all sentences in $P$, where $t^P_i$ is the $i$-th token in the persona $P$ with $l$ tokens in total. Meanwhile, $I^U = \{t^U_1, \ldots, t^U_k\}$ represents the token sequence for the concatenated context content $U$. Then, we use bi-directional transformer encoders to encode the text spans. Generally, we obtain the encoder results from $I^P$ and $I^U$ as
$$h^P = \mathrm{Encoder}_P(I^P), \qquad h^U = \mathrm{Encoder}_U(I^U), \qquad (1)$$
where $\mathrm{Encoder}_P$ and $\mathrm{Encoder}_U$ denote the bi-directional transformer encoders for persona and context. $h^P \in \mathbb{R}^{l \times d}$ and $h^U \in \mathbb{R}^{k \times d}$ are the hidden states before the last pooling layer of the encoders, where $d$ is the output dimension of the encoders.
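As a rough PyTorch sketch of Eq. (1) (layer sizes, vocabulary, and sequence lengths are placeholders rather than the paper's configuration), the two encoders can be instantiated as independent bi-directional transformer stacks over the embedded token sequences $I^P$ and $I^U$:

```python
import torch
import torch.nn as nn

d_model, n_heads, n_layers, vocab_size = 768, 12, 6, 50257  # illustrative sizes
embed = nn.Embedding(vocab_size, d_model)

def make_encoder() -> nn.TransformerEncoder:
    # A bi-directional (unmasked) transformer encoder stack.
    layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
    return nn.TransformerEncoder(layer, n_layers)

persona_encoder = make_encoder()   # Encoder_P
context_encoder = make_encoder()   # Encoder_U

# I^P: l persona tokens, I^U: k context tokens (batch size 1 for illustration).
I_P = torch.randint(0, vocab_size, (1, 32))   # l = 32
I_U = torch.randint(0, vocab_size, (1, 64))   # k = 64

h_P = persona_encoder(embed(I_P))  # h^P in R^{l x d}
h_U = context_encoder(embed(I_U))  # h^U in R^{k x d}
print(h_P.shape, h_U.shape)        # torch.Size([1, 32, 768]) torch.Size([1, 64, 768])
```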
Since our framework adopts the encoder-decoder structure, we process the persona-prompted response in the decoder. Specifically, to model $t^y_{r+1}$, the $(r+1)$-th token in the response, we calculate the self-attention on $I^R = \{I^P, [\mathrm{BOS}], t^y_1, \ldots, t^y_r\}$, where $[\mathrm{BOS}]$ is a special token indicating the beginning of the sentence and $t^y_i$ is the $i$-th decoded response token. Formally, the self-attention result from $I^R$ can be expressed as
$$h^R = \mathrm{Self\text{-}Attention}(I^R) + M^R, \qquad \hat{h}^R = \mathrm{AddNorm}(h^R), \qquad (2)$$
where $h^R, \hat{h}^R \in \mathbb{R}^{(l+r) \times d}$, and $M^R$ is the decoder's mask that makes the self-attention calculation uni-directional.
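A minimal PyTorch sketch of Eq. (2), assuming the persona prompt is already embedded together with the response tokens and that AddNorm is the usual residual-plus-LayerNorm; the causal mask $M^R$ keeps the self-attention uni-directional:

```python
import torch
import torch.nn as nn

d_model, n_heads = 768, 12
l, r = 32, 10                          # persona-prompt length and response length so far

self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
layer_norm = nn.LayerNorm(d_model)

# Embeddings of I^R = {I^P, [BOS], t^y_1, ..., t^y_r} (random stand-ins here).
I_R = torch.randn(1, l + r, d_model)

# M^R: boolean causal mask, True = position may NOT be attended to.
M_R = torch.triu(torch.ones(l + r, l + r, dtype=torch.bool), diagonal=1)

h_R, _ = self_attn(I_R, I_R, I_R, attn_mask=M_R)  # masked (uni-directional) self-attention
h_R_hat = layer_norm(I_R + h_R)                   # AddNorm: residual connection + LayerNorm
print(h_R_hat.shape)                              # torch.Size([1, 42, 768])
```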
After obtaining the encoders' hidden states $h^P$ and $h^U$, as well as the decoder's self-attention output $h^R$, we then calculate the cross-attention based on the $(h^P, h^R)$ and $(h^U, h^R)$ pairs. The cross-attention is calculated in a similar way to the self-attention, where $K$ and $V$ are provided by the encoder and $Q$ comes from the decoder. In detail, we can formulate the cross-attention as
$$o^P = \mathrm{Softmax}\!\left(\frac{Q_r K_p^{\top}}{\sqrt{d}}\right) V_p, \qquad o^U = \mathrm{Softmax}\!\left(\frac{Q_r K_u^{\top}}{\sqrt{d}}\right) V_u, \qquad (3)$$
where $Q_r \in \mathbb{R}^{(l+r) \times d}$ denotes a linear transformation of $\hat{h}^R$, $K_p, V_p \in \mathbb{R}^{l \times d}$ denote linear transformations of $h^P$, $K_u, V_u \in \mathbb{R}^{k \times d}$ come from linear transformations of $h^U$, and $d$ is the dimension of the attention head. By calculating the cross-attentions, we obtain the correlation results between the encoders and the decoder, which serve as parts of the input for PAA.
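The scaled dot-product cross-attention in Eq. (3) can be sketched directly (single attention head and illustrative dimensions; the linear maps standing in for $Q_r$, $K_p$, $V_p$, $K_u$, $V_u$ are placeholders for the model's projection layers):

```python
import math
import torch
import torch.nn as nn

d = 768                     # attention dimension (one head, for clarity)
l, k, r = 32, 64, 10

W_q = nn.Linear(d, d, bias=False)                                      # Q_r from hat{h}^R
W_kp, W_vp = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)  # K_p, V_p from h^P
W_ku, W_vu = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)  # K_u, V_u from h^U

h_R_hat = torch.randn(1, l + r, d)   # decoder self-attention output (Eq. 2)
h_P = torch.randn(1, l, d)           # persona encoder output (Eq. 1)
h_U = torch.randn(1, k, d)           # context encoder output (Eq. 1)

def cross_attention(Q, K, V):
    # Eq. (3): Softmax(Q K^T / sqrt(d)) V
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)
    return torch.softmax(scores, dim=-1) @ V

Q_r = W_q(h_R_hat)
o_P = cross_attention(Q_r, W_kp(h_P), W_vp(h_P))   # persona-response cross-attention
o_U = cross_attention(Q_r, W_ku(h_U), W_vu(h_U))   # context-response cross-attention
print(o_P.shape, o_U.shape)   # both torch.Size([1, 42, 768])
```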
Persona-Adaptive Attention
To fuse the cross-attention results, the proposed PAA will use the weighting and masking mechanisms to utilize the persona information.
Specifically, we take the self-attention result $h^R$ and the cross-attention result $o^P$ as input to generate the initial weights $w_{\mathrm{persona}}$ for the persona information. The motivation behind this operation is to enable the model to consider the relationship between the persona and the response in both self-attention and cross-attention fashions. Formally, this operation can be presented as
$$m_p = FC([h^R; o^P]), \qquad w_{\mathrm{persona}} = \mathrm{Sigmoid}(m_p). \qquad (4)$$
In Eq. (4), $[\,;\,]$ denotes the concatenation operation, and $h^R, o^P$ are first mapped into $m_p \in \mathbb{R}^{(l+r) \times d}$ using a linear layer $FC$ followed by a $\mathrm{Sigmoid}(\cdot)$ to obtain the initial weight for the persona cross-attention. The weight is then applied to the persona-response and context-response cross-attention results to form a complementary relationship, leading to the weighted cross-attentions $\tilde{o}^P$ and $\tilde{o}^U$ as
$$\tilde{o}^P = w_{\mathrm{persona}} \cdot o^P, \qquad \tilde{o}^U = (1 - w_{\mathrm{persona}}) \cdot o^U. \qquad (5)$$
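A small PyTorch sketch of the weighting step in Eqs. (4)-(5); the element-wise product is an assumption (the extracted text drops the operator), and the FC layer size simply follows from the feature-wise concatenation $[h^R; o^P]$:

```python
import torch
import torch.nn as nn

d = 768
l, r = 32, 10

h_R = torch.randn(1, l + r, d)   # decoder self-attention result (Eq. 2)
o_P = torch.randn(1, l + r, d)   # persona-response cross-attention (Eq. 3)
o_U = torch.randn(1, l + r, d)   # context-response cross-attention (Eq. 3)

fc = nn.Linear(2 * d, d)         # FC over the feature-wise concatenation [h^R; o^P]

m_p = fc(torch.cat([h_R, o_P], dim=-1))   # Eq. (4)
w_persona = torch.sigmoid(m_p)            # initial weight for the persona cross-attention

# Eq. (5): complementary weighting of the two cross-attention streams
# (element-wise multiplication assumed).
o_P_tilde = w_persona * o_P
o_U_tilde = (1.0 - w_persona) * o_U
print(o_P_tilde.shape, o_U_tilde.shape)   # both torch.Size([1, 42, 768])
```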