
dialogue system. From the above two scenarios, it can be
seen that the weights between context and persona should
be adjusted accordingly, which is important for a dialogue
model to build long-term relationships with users.
Most existing works on persona-based dialogue generation have primarily addressed the data scarcity challenge by utilizing external data or sophisticated training processes. For instance, Song et al. use the MNLI dataset (Williams, Nangia, and Bowman 2018) for auxiliary tasks, Cao et al. augment the data through text manipulation, Roller et al. incorporate other dialogue datasets in pre-training, and Liu et al. adopt multi-stage training with reinforcement learning. These works obtain decent performance, but few of them consider the second challenge.
To address the aforementioned second challenge, in this paper we design a Persona-Adaptive Attention (PAA) mechanism to dynamically learn the weights of the persona and context information in the proposed framework. To enhance the persona information in the PAA, we prepend the persona to the decoder input as a prompt so that the weights can capture more persona-related information. To balance the context and persona information, the PAA takes the two cross-attentions and the self-attention from the persona-prompted decoder to compute the weights for combining the latent representations from the context and the persona. Moreover, inspired by findings (Welleck et al. 2019; Cao et al. 2022) that not all context and persona information is useful for generating the response, we apply two dynamic masks to the weighted latent representations, which not only removes redundant information but also acts as a regularizer in the PAA.
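To make the fusion concrete, the following PyTorch sketch illustrates the balancing idea. The scalar gate, the threshold tau, and the thresholding rule used for the two dynamic masks are illustrative assumptions of ours, not the exact formulation in this paper.

```python
import torch
import torch.nn as nn

class PersonaAdaptiveAttention(nn.Module):
    """Minimal sketch of the PAA fusion idea (illustrative, not the
    exact formulation). A gate derived from the persona-prompted
    decoder's self-attention balances the persona and context
    cross-attention results."""

    def __init__(self, d_model: int, n_heads: int, tau: float = 0.1):
        super().__init__()
        self.persona_xattn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.context_xattn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(d_model, 1)  # produces the balancing weight
        self.tau = tau                     # assumed masking threshold

    def forward(self, dec_h, persona_h, context_h):
        # Cross-attend the persona-prompted decoder states over the
        # persona and context encoder outputs.
        p, _ = self.persona_xattn(dec_h, persona_h, persona_h)
        c, _ = self.context_xattn(dec_h, context_h, context_h)
        # Per-position weight computed inside the framework (no
        # external predictor), from the decoder's own representation.
        w = torch.sigmoid(self.gate(dec_h))          # (B, T, 1)
        fused = w * p + (1.0 - w) * c
        # Two dynamic masks: when one source's weight is negligible,
        # drop that source entirely, removing redundant information and
        # acting as a regularizer (thresholding rule assumed here).
        fused = torch.where(w < self.tau, c, fused)          # mask persona
        fused = torch.where(w > 1.0 - self.tau, p, fused)    # mask context
        return fused
```

The property the sketch preserves is that the balancing weight is computed from the decoder's own representation within the framework, rather than supplied by an external predictor.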
Extensive experiments on the ConvAI2 dataset show that the proposed framework achieves comparable or even better performance than existing works without using external datasets or sophisticated training procedures. One reason is that the framework explicitly learns the weights between context and persona in its architecture design, which allows it to perform well in low-data regimes. As a byproduct, this observation indicates that the proposed framework could also alleviate the first challenge, killing two birds with one stone and further demonstrating its effectiveness.
Our contributions can be summarized as follows.
• We propose the PAA in an encoder-decoder framework. This framework models the persona and context information with two separate transformer encoders, which are then fused in the persona-prompted decoder by the proposed PAA mechanism (a minimal wiring sketch follows this list).
• Extensive experiments on the ConvAI2 dataset show that the proposed model performs comparably to or even better than strong baseline methods, improving the perplexity metric by about 30%.
• We demonstrate that our framework is a data-efficient architecture that achieves performance comparable to a larger model such as GPT2 (Radford et al. 2019) trained on the full dataset while using only 20% to 30% of the training data.
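As referenced in the first contribution, below is a minimal wiring sketch of the overall encoder-decoder framework, reusing the PersonaAdaptiveAttention sketch above. The hyperparameters, the single-block decoder, and the input handling are simplifying assumptions rather than the paper's actual configuration.

```python
import torch
import torch.nn as nn

class PAAFramework(nn.Module):
    """Illustrative wiring only: two separate transformer encoders for
    persona and context, fused in a persona-prompted decoder block by
    the PersonaAdaptiveAttention sketch above."""

    def __init__(self, vocab_size: int, d_model: int = 512,
                 n_heads: int = 8, n_layers: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.persona_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        self.context_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.paa = PersonaAdaptiveAttention(d_model, n_heads)  # defined above
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, persona_ids, context_ids, response_ids):
        persona_h = self.persona_encoder(self.embed(persona_ids))
        context_h = self.context_encoder(self.embed(context_ids))
        # Persona-prompted decoder input: persona tokens are prepended
        # to the (shifted) response tokens.
        dec_in = self.embed(torch.cat([persona_ids, response_ids], dim=1))
        T = dec_in.size(1)
        causal = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=dec_in.device), 1)
        dec_h, _ = self.self_attn(dec_in, dec_in, dec_in, attn_mask=causal)
        fused = self.paa(dec_h, persona_h, context_h)   # PAA fusion
        return self.lm_head(fused)                      # next-token logits
```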
Related Work
Persona-based Dialogue Generation
There is a growing interest in persona-based dialogue gen-
eration tasks, especially the work on the PersonaChat/Con-
vAI2 dataset. The release of the PersonaChat dataset (Zhang
et al. 2018) has provoked vibrant research in integrating ex-
plicit persona into dialogue response generation. The ConvAI2 dataset (Dinan et al. 2019) is a further split of the PersonaChat dataset, serving as the dataset for the ConvAI2 conversational competition (http://convai.io/2018/). Most works on persona-based dialogue generation are conducted on the ConvAI2 dataset, so we use it as our primary training and evaluation dataset. Zhang et al. utilized an LSTM (Hochreiter and
Schmidhuber 1997) to generate a response from persona and
context. Later, TransferTransfo (Wolf et al. 2019) leveraged a pre-trained language model by fine-tuning the GPT2 model on the dataset with concatenated inputs. Meanwhile,
BERT over BERT (BoB) (Song et al. 2021) is composed of three BERT models (Devlin et al. 2019) and is trained with both negative log-likelihood and unlikelihood losses. BoB
utilizes the MNLI dataset (Williams, Nangia, and Bowman
2018) as an auxiliary dataset to help the model recognize the
positive and negative samples given an anchor sample. P2
bot (Liu et al. 2020) addressed the persona-based dialogue
task by introducing a transmitter and receiver model, which
is further tuned with reinforcement learning on manually de-
signed rewards. A recent work (Cao et al. 2022) tackled the
problem in a model-agnostic fashion, providing strategies
for data augmentation and curriculum learning. In contrast to these previous works, we propose an effective approach that requires neither external datasets nor complicated training setups.
Attention Mechanisms for Conditional Dialogue
Generation
Several studies introduced explicitly designed cross-attention mechanisms for dialogue generation, tailored either to condition sentences (Zheng et al. 2020) or to categorical labels (Zeng and Nie 2021). Zheng et al. proposed an attention routing structure that weights the contribution of persona information when generating the response.
The attention routing structure adds the cross-attention/self-
attention results from persona-response, context-response,
and response-response pairs together to obtain a fused cross-
attention to balance the weights among different sources of
input. These cross-attentions and self-attentions are also calculated in our approach. However, instead of obtaining the weights from an external predictor, our approach computes them within the framework and then applies masking to the weighted cross-attention results to alleviate training difficulties.
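For contrast with the PAA, here is a hedged sketch of the routing-style fusion just described. The exact merge in Zheng et al. (2020) may differ, and the name alpha for the externally predicted persona weight is our own.

```python
import torch

def attention_routing(persona_attn: torch.Tensor,
                      context_attn: torch.Tensor,
                      self_attn: torch.Tensor,
                      alpha: float) -> torch.Tensor:
    """Sketch of routing-style fusion: merge the attention results over
    the persona-response, context-response, and response-response pairs,
    with alpha (a persona weight from an external predictor) scaling the
    persona contribution. Unlike PAA, the weight is not computed inside
    the model."""
    # All attention inputs: (batch, tgt_len, d_model)
    return alpha * persona_attn + (1.0 - alpha) * context_attn + self_attn
```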
In addition, Zeng and Nie introduced a condition-aware transformer block into their model to determine the amount of condition information injected as a bias into the word generation probability at each position (Zeng and Nie 2021). In the condition-aware block, the keys and values from a condition (e.g., topic