Personalized Dialogue Generation with Persona-Adaptive Attention
Qiushi Huang1,2, Yu Zhang2*, Tom Ko3, Xubo Liu1, Bo Wu4, Wenwu Wang1, H Tang1*
1University of Surrey
2Southern University of Science and Technology
3ByteDance AI Lab
4MIT-IBM Watson AI Lab
{qiushi.huang,xubo.liu,w.wang,h.tang}@surrey.ac.uk,{yu.zhang.ust,tomkocse}@gmail.com, bo.wu@ibm.com
Abstract
Persona-based dialogue systems aim to generate consistent responses based on historical context and predefined persona. Unlike conventional dialogue generation, persona-based dialogue needs to consider both the dialogue context and the persona, posing a challenge for coherent training. Specifically, this requires a delicate weight balance between context and persona. To achieve that, in this paper we propose an effective framework with Persona-Adaptive Attention (PAA), which adaptively integrates the weights from the persona and context information via our designed attention. In addition, a dynamic masking mechanism is applied to the PAA to not only drop redundant information in context and persona but also serve as a regularization mechanism to avoid overfitting. Experimental results demonstrate the superiority of the proposed PAA framework compared to strong baselines in both automatic and human evaluation. Moreover, the proposed PAA approach performs well in a low-resource regime, achieving results comparable to larger models trained in the full-data setting while using only 20% to 30% of the training data. To fully exploit the effectiveness of our design, we designed several variants that handle the weighted information in different ways, showing the necessity and sufficiency of our weighting and masking designs.
Introduction
Persona is essential for building a trustful and confident conversational system. Recently, there has been an increasing interest in incorporating explicit persona into dialogue generation models (Wolf et al. 2019; Liu et al. 2020; Song et al. 2021) since the release of the publicly available datasets (Zhang et al. 2018; Dinan et al. 2019). Typically, persona information consists of several sentences describing the facts or background of the interlocutor. An example taken from the ConvAI2 dataset (Dinan et al. 2019) is shown in Figure 1. In this example, the system should consider the information in the persona sentences and generate consistent responses based on both persona and dialogue history.
[Figure 1: An example from the ConvAI2 dataset. The system's persona ("I'm a stunt double as my second job.", "I only eat Kosher.", "I was raised in a single parent household.", "My favorite drink is Cuba Libre.") is shown alongside a short dialogue in which the system's replies ("I am good, I just got off work and tired, I have two jobs."; "Those are really yummy too, but not my favorite.") draw on that persona.]

*Corresponding authors. Code is available at: https://github.com/hqsiswiliam/persona-adaptive-attention

One challenge in persona-based dialogue generation is that the related datasets are usually small. As collecting dialogues for persona-based dialogue datasets requires crowdworkers to chat with each other based on provided persona profiles, building such quality datasets is expensive and time-consuming, which in turn restricts their size. For example, the ConvAI2 dataset (Dinan et al. 2019) contains only 131k utterances with fewer than 5k unique personas, much smaller than open-domain dialogue datasets such as Pushshift.io Reddit (Baumgartner et al. 2020) with roughly 1.2B utterances.
Another challenge is to choose the weights between the persona and the context. Unlike open-domain dialogue models that generate responses by considering the dialogue context alone, persona-based dialogue generation systems need to additionally take personalized background descriptions into account along with the dialogue context. The weights between context and persona should be dynamically adjusted by the dialogue system under different situations. For example, given a user utterance "How are you?", the context-preferred answer is likely to be "I am fine.", which is safe but bland. Meanwhile, a persona-preferred answer would fuse persona information into the response, such as "I am spending time with my four sisters". Under such circumstances, the persona-preferred answer would be more informative and meaningful. On the other hand, sometimes the system needs to focus on the context to make the conversation interactive and engaging. For instance, if the user says "I have two greyhounds. Their names are Tom and Jerry.", then the system would focus on the context and answer "That's cute! How old are they?", which encourages the user to chat with the dialogue system. From the above two scenarios, it can be seen that the weights between context and persona should be adjusted accordingly, which is important for a dialogue model to build long-term relationships with users.
Most existing works on persona-based dialogue generation have primarily addressed the data scarcity challenge by utilizing external data or sophisticated training processes. For instance, Song et al. use the MNLI dataset (Williams, Nangia, and Bowman 2018) as an auxiliary task, Cao et al. augment the data through text manipulation, Roller et al. add other dialogue datasets in pretext tasks, and Liu et al. adopt multi-stage training with reinforcement learning. Those works obtained decent performance, but few of them considered the second challenge.
To address the aforementioned second challenge, in this paper we design a Persona-Adaptive Attention (PAA) mechanism to dynamically learn the weights of the persona and context information in the proposed framework. To enhance the persona information in the PAA, we prepend the persona to the decoder input as a prompt so that the weights can capture more persona-related information. To balance the context and persona information, the PAA takes the two cross-attentions and the self-attention from the persona-prompted decoder to compute the weights for combining the latent representations from the context and persona. Moreover, inspired by findings (Welleck et al. 2019; Cao et al. 2022) that not all context and persona information is useful for generating the response, we apply two dynamic masks to the weighted latent representations to not only remove redundant information but also act as a regularizer in the PAA.
As a byproduct, extensive experiments on the ConvAI2 dataset show that the proposed framework achieves comparable or even better performance than existing works without the use of external datasets or sophisticated training procedures. One reason is that our framework explicitly considers learning the weights between context and persona in the architecture design, which allows it to perform well under a low-data regime. This observation indicates that the proposed framework could also alleviate the first challenge, making it kill two birds with one stone and demonstrating the effectiveness of our design.
Our contributions can be summarized as follows.
• We propose the PAA in an encoder-decoder framework. This framework models the persona and context information by two separate transformer encoders, which are then fused in the persona-prompted decoder by the proposed PAA mechanism.
• Extensive experiments on the ConvAI2 dataset show that the proposed model performs comparably to or even better than strong baseline methods, with about a 30% improvement in terms of the perplexity metric.
• We demonstrate that our framework is a data-efficient architecture that can achieve comparable performance with 20% to 30% of the training data compared with a larger model such as GPT2 (Radford et al. 2019) trained on the full dataset.
Related Work
Persona-based Dialogue Generation
There is a growing interest in persona-based dialogue generation tasks, especially work on the PersonaChat/ConvAI2 dataset. The release of the PersonaChat dataset (Zhang et al. 2018) has provoked vibrant research in integrating explicit persona into dialogue response generation. The ConvAI2 dataset (Dinan et al. 2019) is a further split of the PersonaChat dataset that serves as the dataset for the conversational competition (http://convai.io/2018/). Most works on persona-based dialogue generation are conducted on the ConvAI2 dataset, so we use the ConvAI2 dataset as our primary training and evaluation dataset. Zhang et al. utilized an LSTM (Hochreiter and Schmidhuber 1997) to generate a response from persona and context. Later, TransferTransfo (Wolf et al. 2019) leveraged a pre-trained language model by fine-tuning the GPT2 model on the dataset with the concatenated input. Meanwhile, BERT over BERT (BoB) (Song et al. 2021) is composed of three BERTs (Devlin et al. 2019) and is trained with both negative log-likelihood and unlikelihood losses. BoB utilizes the MNLI dataset (Williams, Nangia, and Bowman 2018) as an auxiliary dataset to help the model recognize positive and negative samples given an anchor sample. P2 Bot (Liu et al. 2020) addressed the persona-based dialogue task by introducing a transmitter and receiver model, which is further tuned with reinforcement learning on manually designed rewards. A recent work (Cao et al. 2022) tackled the problem in a model-agnostic fashion, providing strategies for data augmentation and curriculum learning. Distinct from these previous works, we propose an effective approach without the aid of external datasets or complicated training setups.
Attention Mechanisms for Conditional Dialogue Generation
Several studies have introduced explicitly designed cross-attention to address dialogue generation. Those works are tailored either to condition sentences (Zheng et al. 2020) or to categorical labels (Zeng and Nie 2021). Zheng et al. proposed an attention routing structure that controls the weight of persona information when generating the response. The attention routing structure adds the cross-attention/self-attention results from the persona-response, context-response, and response-response pairs together to obtain a fused cross-attention that balances the weights among the different sources of input. Those cross-attentions/self-attentions are also calculated in our approach. However, instead of calculating the weights from an external predictor, our approach computes them within the framework and then applies masking on the weighted cross-attention results to alleviate the training difficulties.
In addition, Zeng and Nie introduced a condition-aware transformer block into their model to determine the amount of condition information used as a bias in the word generation probability at each position (Zeng and Nie 2021). In the condition-aware block, the keys and values from a condition (e.g., a topic label) and the context are concatenated. The block then calculates cross-attention over the concatenated content to obtain a bias term, which is added to the self-attention. Unlike the condition-aware block approach, our model generates two masks with weights to balance the information from persona and context rather than relying on a bias term. In addition, our framework takes persona and context text as input, while the condition-aware transformer (Zeng and Nie 2021) uses a categorical label and context text as input.
Methodology
Task Formulation
Suppose that we have a persona-based conversation session $C = \{P, U\}$, where each persona $P = \{p_1, \ldots, p_e\}$ is composed of $e$ profile sentences that describe the background of an interlocutor, and the dialogue context $U = \{u_{h,1}, u_{m,1}, \ldots, u_{h,n}\}$ includes the utterances spoken interactively by the first interlocutor (e.g., human) $h$ and the second interlocutor (e.g., machine) $m$. In the persona-based dialogue generation task, $P$ represents the persona for $m$, and the conversation session always starts with $h$. Therefore, the objective of this task is to generate the response $r = u_{m,n}$ given the persona $P$ and the dialogue context $U$.
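To make the notation concrete, the sketch below (plain Python, with hypothetical field names and utterances borrowed from Figure 1) shows what one persona-based session $C = \{P, U\}$ and its target response $r$ could look like as a data structure:

```python
# One persona-based conversation session C = {P, U} with target response r.
# Field names are illustrative, not the dataset's actual schema.
session = {
    "persona": [                    # P = {p_1, ..., p_e}: profile sentences for speaker m
        "I'm a stunt double as my second job.",
        "I only eat Kosher.",
        "My favorite drink is Cuba Libre.",
    ],
    "context": [                    # U = {u_{h,1}, u_{m,1}, ..., u_{h,n}}, starting with h
        ("human",   "Hello what are doing today?"),
        ("machine", "I am good, I just got off work and tired, I have two jobs."),
        ("human",   "I prefer Mojitos or Watermelon."),
    ],
    "response": "Those are really yummy too, but not my favorite.",  # r = u_{m,n}
}
```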
Overall Framework
As depicted in Figure 2, our framework consists of two encoders and one decoder with PAA to perform the decoding process. The encoding layer uses a transformer encoder architecture to encode the persona $P$ and the dialogue context $U$, respectively, into latent representations. The encoder layers are randomly initialized, while the decoder layers are initialized with the pre-trained GPT2. The persona information is fed to the persona encoder as well as to the decoder as a prompt, offering strong guidance for GPT2 to decode the target response. PAA handles the cross-attentions from the persona and context information to balance and regularize the two parts by weighting and masking.
Inputs for Persona-Adaptive Attention
Before presenting the proposed PAA, in this section, we
introduce the decoder’s self-attention and encoder-decoder
cross-attention as the inputs for the PAA.
Firstly, the persona $P$ and context $U$ are processed separately by two encoders. Let $I^P = \{t^P_1, \ldots, t^P_l\}$ denote the concatenation of all sentences in $P$, where $t^P_i$ is the $i$-th token in the persona $P$ with $l$ tokens in total. Meanwhile, $I^U = \{t^U_1, \ldots, t^U_k\}$ represents the token sequence for the concatenated context content $U$. Then, we use bi-directional transformer encoders to encode the text spans. Generally, we obtain the encoder results from $I^P$ and $I^U$ as
$$h^P = \mathrm{Encoder}_P(I^P), \qquad h^U = \mathrm{Encoder}_U(I^U), \qquad (1)$$
where $\mathrm{Encoder}_P$ and $\mathrm{Encoder}_U$ denote the bi-directional transformer encoders for persona and context. $h^P \in \mathbb{R}^{l \times d}$ and $h^U \in \mathbb{R}^{k \times d}$ are the hidden states before the last pooling layer of the encoders, where $d$ is the output dimension of the encoders.
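As a rough PyTorch sketch of Eq. (1) (layer sizes, vocabulary, and sequence lengths are placeholders rather than the paper's configuration), the two encoders can be instantiated as independent bi-directional transformer stacks over the embedded token sequences $I^P$ and $I^U$:

```python
import torch
import torch.nn as nn

d_model, n_heads, n_layers, vocab_size = 768, 12, 6, 50257  # illustrative sizes
embed = nn.Embedding(vocab_size, d_model)

def make_encoder() -> nn.TransformerEncoder:
    # A bi-directional (unmasked) transformer encoder stack.
    layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
    return nn.TransformerEncoder(layer, n_layers)

persona_encoder = make_encoder()   # Encoder_P
context_encoder = make_encoder()   # Encoder_U

# I^P: l persona tokens, I^U: k context tokens (batch size 1 for illustration).
I_P = torch.randint(0, vocab_size, (1, 32))   # l = 32
I_U = torch.randint(0, vocab_size, (1, 64))   # k = 64

h_P = persona_encoder(embed(I_P))  # h^P in R^{l x d}
h_U = context_encoder(embed(I_U))  # h^U in R^{k x d}
print(h_P.shape, h_U.shape)        # torch.Size([1, 32, 768]) torch.Size([1, 64, 768])
```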
Since our framework adopts the encoder-decoder structure, we process the persona-prompted response in the decoder. Specifically, to model $t^y_{r+1}$, the $(r+1)$-th token in the response, we calculate the self-attention on $I^R = \{I^P, [\mathrm{BOS}], t^y_1, \ldots, t^y_r\}$, where $[\mathrm{BOS}]$ is a special token indicating the beginning of the sentence and $t^y_i$ is the $i$-th decoded response token. Formally, the self-attention result from $I^R$ can be expressed as
$$h^R = \mathrm{Self\text{-}Attention}(I^R) + M^R, \qquad \hat{h}^R = \mathrm{AddNorm}(h^R), \qquad (2)$$
where $h^R, \hat{h}^R \in \mathbb{R}^{(l+r) \times d}$, and $M^R$ is the decoder's mask that makes the self-attention calculation uni-directional.
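A minimal PyTorch sketch of Eq. (2), assuming the persona prompt is already embedded together with the response tokens and that AddNorm is the usual residual-plus-LayerNorm; the causal mask $M^R$ keeps the self-attention uni-directional:

```python
import torch
import torch.nn as nn

d_model, n_heads = 768, 12
l, r = 32, 10                          # persona-prompt length and response length so far

self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
layer_norm = nn.LayerNorm(d_model)

# Embeddings of I^R = {I^P, [BOS], t^y_1, ..., t^y_r} (random stand-ins here).
I_R = torch.randn(1, l + r, d_model)

# M^R: boolean causal mask, True = position may NOT be attended to.
M_R = torch.triu(torch.ones(l + r, l + r, dtype=torch.bool), diagonal=1)

h_R, _ = self_attn(I_R, I_R, I_R, attn_mask=M_R)  # masked (uni-directional) self-attention
h_R_hat = layer_norm(I_R + h_R)                   # AddNorm: residual connection + LayerNorm
print(h_R_hat.shape)                              # torch.Size([1, 42, 768])
```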
After obtaining the encoders' hidden states $h^P$ and $h^U$, as well as the decoder's self-attention output $h^R$, we then calculate the cross-attention based on the $(h^P, h^R)$ and $(h^U, h^R)$ pairs. The cross-attention is calculated in a similar way to the self-attention, where $K$ and $V$ are provided by the encoder and $Q$ comes from the decoder. In detail, we can formulate the cross-attention as
$$o^P = \mathrm{Softmax}\!\left(\frac{Q_r K_p^{\top}}{\sqrt{d}}\right) V_p, \qquad o^U = \mathrm{Softmax}\!\left(\frac{Q_r K_u^{\top}}{\sqrt{d}}\right) V_u, \qquad (3)$$
where $Q_r \in \mathbb{R}^{(l+r) \times d}$ denotes a linear transformation of $\hat{h}^R$, $K_p, V_p \in \mathbb{R}^{l \times d}$ denote linear transformations of $h^P$, $K_u, V_u \in \mathbb{R}^{k \times d}$ come from linear transformations of $h^U$, and $d$ is the dimension of the attention head. By calculating the cross-attentions, we obtain the correlation results between the encoders and the decoder, which serve as parts of the input for PAA.
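The scaled dot-product cross-attention in Eq. (3) can be sketched directly (single attention head and illustrative dimensions; the linear maps standing in for $Q_r$, $K_p$, $V_p$, $K_u$, $V_u$ are placeholders for the model's projection layers):

```python
import math
import torch
import torch.nn as nn

d = 768                     # attention dimension (one head, for clarity)
l, k, r = 32, 64, 10

W_q = nn.Linear(d, d, bias=False)                                      # Q_r from hat{h}^R
W_kp, W_vp = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)  # K_p, V_p from h^P
W_ku, W_vu = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)  # K_u, V_u from h^U

h_R_hat = torch.randn(1, l + r, d)   # decoder self-attention output (Eq. 2)
h_P = torch.randn(1, l, d)           # persona encoder output (Eq. 1)
h_U = torch.randn(1, k, d)           # context encoder output (Eq. 1)

def cross_attention(Q, K, V):
    # Eq. (3): Softmax(Q K^T / sqrt(d)) V
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)
    return torch.softmax(scores, dim=-1) @ V

Q_r = W_q(h_R_hat)
o_P = cross_attention(Q_r, W_kp(h_P), W_vp(h_P))   # persona-response cross-attention
o_U = cross_attention(Q_r, W_ku(h_U), W_vu(h_U))   # context-response cross-attention
print(o_P.shape, o_U.shape)   # both torch.Size([1, 42, 768])
```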
Persona-Adaptive Attention
To fuse the cross-attention results, the proposed PAA will use the weighting and masking mechanisms to utilize the persona information.
Specifically, we take the self-attention result $h^R$ and the cross-attention result $o^P$ as input to generate the initial weights $w_{\mathrm{persona}}$ for the persona information. The motivation behind this operation is to enable the model to consider the relationship between the persona and the response in both self-attention and cross-attention fashions. Formally, this operation can be presented as
$$m_p = FC([h^R; o^P]), \qquad w_{\mathrm{persona}} = \mathrm{Sigmoid}(m_p). \qquad (4)$$
In Eq. (4), $[\,;\,]$ denotes the concatenation operation, and $h^R, o^P$ are first mapped into $m_p \in \mathbb{R}^{(l+r) \times d}$ using a linear layer $FC$ followed by a $\mathrm{Sigmoid}(\cdot)$ to obtain the initial weight for the persona cross-attention. The weight is then applied to the persona-response and context-response cross-attention results to form a complementary relationship, leading to the weighted cross-attentions $\tilde{o}^P$ and $\tilde{o}^U$ as
$$\tilde{o}^P = w_{\mathrm{persona}} \cdot o^P, \qquad \tilde{o}^U = (1 - w_{\mathrm{persona}}) \cdot o^U. \qquad (5)$$
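A small PyTorch sketch of the weighting step in Eqs. (4)-(5); the element-wise product is an assumption (the extracted text drops the operator), and the FC layer size simply follows from the feature-wise concatenation $[h^R; o^P]$:

```python
import torch
import torch.nn as nn

d = 768
l, r = 32, 10

h_R = torch.randn(1, l + r, d)   # decoder self-attention result (Eq. 2)
o_P = torch.randn(1, l + r, d)   # persona-response cross-attention (Eq. 3)
o_U = torch.randn(1, l + r, d)   # context-response cross-attention (Eq. 3)

fc = nn.Linear(2 * d, d)         # FC over the feature-wise concatenation [h^R; o^P]

m_p = fc(torch.cat([h_R, o_P], dim=-1))   # Eq. (4)
w_persona = torch.sigmoid(m_p)            # initial weight for the persona cross-attention

# Eq. (5): complementary weighting of the two cross-attention streams
# (element-wise multiplication assumed).
o_P_tilde = w_persona * o_P
o_U_tilde = (1.0 - w_persona) * o_U
print(o_P_tilde.shape, o_U_tilde.shape)   # both torch.Size([1, 42, 768])
```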