Improving Multi-turn Emotional Support Dialogue Generation with
Lookahead Strategy Planning
Yi Cheng1, Wenge Liu2, Wenjie Li1, Jiashuo Wang1, Ruihui Zhao3, Bang Liu4, Xiaodan Liang5, Yefeng Zheng3
1Hong Kong Polytechnic University 2Baidu Inc., Beijing, China 3Tencent Jarvis Lab
4RALI & Mila, Université de Montréal 5Sun Yat-sen University
{csycheng,cswjli,jessiejs.wang}@comp.polyu.edu.hk,
{kzllwg,xdliang328}@gmail.com, bang.liu@umontreal.ca,
{zacharyzhao,yefengzheng}@tencent.com
Abstract
Providing Emotional Support (ES) to soothe people in emotional distress is an essential capability in social interactions. Most existing research on building ES conversation systems only considers single-turn interactions with users, which is over-simplified. In comparison, multi-turn ES conversation systems can provide ES more effectively, but face several new technical challenges, including: i) how to conduct support strategy planning that could lead to the best supporting effects; ii) how to dynamically model the user's state. In this paper, we propose a novel system named MultiESC to address these issues. For strategy planning, drawing inspiration from the A* search algorithm, we propose lookahead heuristics to estimate the future user feedback after using particular strategies, which helps to select strategies that can lead to the best long-term effects. For user state modeling, MultiESC focuses on capturing users' subtle emotional expressions and understanding their emotion causes. Extensive experiments show that MultiESC significantly outperforms competitive baselines in both strategy planning and dialogue generation. Our codes are available at https://github.com/lwgkzl/MultiESC.
1 Introduction
Almost every human has experienced emotional distress, even if not suffering from any mental disorders. Frequently, people deal with the distress by seeking Emotional Support (ES) from social interactions (Langford et al., 1997; Greene, 2003). Nevertheless, ES from family and friends is not always available (Webber and Mascari, 2018). With the potential of providing more people with in-time support, developing Emotional Support Conversation (ESC) systems has attracted much attention. However, since early ES datasets are constructed
Equal contribution.
Corresponding author.
Figure 1: An example of an emotional support conversation between the support-seeker and the supporter. The support strategies adopted by the supporter are presented in italics before the utterances.
Supporter (Greeting): Hello. How can I be of service tonight?
Seeker: I'm just feeling depressed over the breakup. Hoping for some inspiration.
Supporter (Questioning): Tell me more please. I am all ears. When did this happen? How long ago?
Seeker: Just last week. I came home from work early ...
Supporter (Reflection of Feelings): The fact that he cheated on you and you broke up with him must be hard.
Seeker: Yeah, we're over. And it is hard. I don't think I could ever get past it.
Supporter (Providing Suggestions): Maybe you should look ahead. Focus on your future…
by crawling post-response pairs from online forums, they only contain single-turn conversations (Medeiros and Bosse, 2018; Sharma et al., 2020). Thus, most of the existing research on ESC also only considers single-turn interactions with the user (Medeiros and Bosse, 2018; Sharma et al., 2020, 2021), which is over-simplified and has limited support effects. It was not until recently that Liu et al. (2021) released the first large-scale multi-turn ES dataset, ESConv. They also designed an ESC framework, suggesting the conversation procedures and support strategies for multi-turn ESC.
Compared to the single-turn scenario, developing multi-turn ESC systems faces several new challenges. One significant challenge is support strategy planning. As pointed out in the psychological literature, particular procedures and strategies are indispensable for effective emotional support (Greene, 2003; Hill, 2009). As in Fig. 1, the supporter strategically soothes the support-seeker by first caringly inquiring about the situation, then resonating with the seeker's feelings, and finally providing suggestions to evoke positive emotions.
arXiv:2210.04242v1 [cs.CL] 9 Oct 2022
Notably, strategy planning in ESC should be conducted on a long planning horizon. That is, instead of merely considering the dialogue history or foreseeing the immediate effect after using the strategy, the system should further look ahead, to consider how much the adopted strategy would contribute to reducing the user's emotional distress in the long run. Though some strategies may not directly provide comfort, they are still essential for reaching the long-term dialogue goal, such as greetings at the beginning of the conversation and inquiring about the user's experiences.
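The contrast between myopic and long-horizon strategy selection can be sketched with a toy example. This is purely illustrative, not the authors' implementation; the strategy names and effect numbers below are made up:

```python
# Toy comparison of greedy vs. lookahead strategy selection.
# Each strategy maps to hypothetical (immediate effect, best follow-up effect).
EFFECTS = {
    "question":              (0.1, 0.6),  # little immediate comfort, enables good follow-ups
    "reflection":            (0.4, 0.2),
    "providing_suggestions": (0.5, 0.1),
}

def greedy_pick(effects):
    # Consider only the immediate effect of each strategy.
    return max(effects, key=lambda s: effects[s][0])

def lookahead_pick(effects):
    # Also credit the best continuation the strategy enables.
    return max(effects, key=lambda s: sum(effects[s]))

print(greedy_pick(EFFECTS))     # providing_suggestions
print(lookahead_pick(EFFECTS))  # question
```

Under these made-up numbers, the greedy criterion jumps straight to giving suggestions, while the long-horizon criterion prefers first asking a question, mirroring the procedures described above.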
Another challenge for multi-turn ESC is how to dynamically model the user's state during the conversation. Prior works on emotion-related dialogue tasks mainly detect the user's coarse-grained emotion type to enhance dialogue generation (Lin et al., 2019; Majumder et al., 2020; Li et al., 2020a). However, such practice is not completely appropriate for ESC. The reason is that the user's emotion in ESC usually stays within the same type, such as being sad, throughout the conversation; instead, it often changes subtly in terms of emotion intensity. Besides, effective ES requires more than only identifying the user's emotion: a thorough understanding of the user's situation is also essential.
In this paper, we propose a multi-turn ESC system, MultiESC, to address the above issues. For strategy planning, we draw inspiration from the A* search algorithm (Hart et al., 1968; Pearl, 1985) and its recent application in constrained text generation (Lu et al., 2021), which addressed the challenge of planning ahead by incorporating heuristic estimation of future cost. In MultiESC, we develop lookahead heuristics to estimate the expectation of the future user feedback, which helps to select the strategy that can lead to the best long-term effect. Concretely, we implement a strategy sequence generator to produce the probability of the future strategy sequences, and a user feedback predictor to predict the feedback after applying the sequence of strategies. For user state modeling, MultiESC captures the user's subtle emotion expressed in the context by incorporating external knowledge from the NRC VAD lexicon (Mohammad, 2018). Moreover, it identifies the user's emotion causes (i.e., the experiences that caused the depressed emotion) to more thoroughly understand the user's situation.
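The interplay of the two components just described (a generator scoring future strategy sequences and a predictor scoring their expected feedback) can be sketched as follows. This is a minimal illustration of the scoring scheme, with uniform sequence probabilities and a hand-written feedback function standing in for the trained models:

```python
import itertools

STRATEGIES = ["question", "reflection", "suggestion"]

def sequence_prob(seq):
    # Stand-in for the strategy sequence generator: here, a uniform
    # distribution over all strategy sequences of the given length.
    return 1.0 / len(STRATEGIES) ** len(seq)

def predicted_feedback(seq):
    # Stand-in for the user feedback predictor: reward sequences
    # that ask a question before offering a suggestion.
    return 2.0 if seq == ("question", "suggestion") else 1.0

def lookahead_score(first_strategy, horizon=2):
    # Expected future feedback of choosing `first_strategy` now,
    # marginalized over all continuations up to `horizon` steps.
    score = 0.0
    for tail in itertools.product(STRATEGIES, repeat=horizon - 1):
        seq = (first_strategy,) + tail
        score += sequence_prob(seq) * predicted_feedback(seq)
    return score

best = max(STRATEGIES, key=lookahead_score)
print(best)  # question
```

With these stand-ins, "question" wins because one of its continuations earns extra predicted feedback; the actual system learns both components from data.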
In summary, our contributions are as follows:
• We propose a multi-turn ESC system, MultiESC, which conducts support strategy planning with foresight of the user feedback and dynamically tracks the user's state by capturing the subtle emotional expressions and the emotion causes.
• It is a pioneering work that adopts A*-like lookahead heuristics to achieve dialogue strategy selection on a long planning horizon.
• Experiments show that MultiESC significantly outperforms a set of state-of-the-art models in generation quality and strategy planning, demonstrating the effectiveness of our proposed method.
2 Related Work
Emotional Support Conversation Systems.
Since early ES datasets were mainly composed of
single-turn conversations (Medeiros and Bosse,
2018;Sharma et al.,2020), most research on
developing ESC systems only considered the
simplified scenario of single-turn interactions
with the user (Sharma et al.,2021;Hosseini and
Caragea,2021). The few works that developed
multi-turn ES chatbots rely on predefined tem-
plates and handcrafted rules (Zwaan et al.,2012),
which suffer from limited generality. It was not
until last year that Liu et al. (2021) released the
first multi-turn ESC dataset ESCONV. Following
Liu et al. (2021), Peng et al. (2022) and Tu et al.
(2022) recently explored data-driven multi-turn
ESC systems. Peng et al. (2022) proposed
a hierarchical graph network to capture both
the global context and the local user intention.
They did not consider strategy planning, which
is critical in multi-turn ESC. Tu et al. (2022)
proposed to enhance context encoding with
commonsense knowledge and use the predicted
strategy distribution to guide response generation.
Nevertheless, their method of strategy prediction,
directly implemented with a vanilla Transformer
encoder, was relatively preliminary and did not
consider any user-feedback-oriented planning as
we do.
Empathetic Response Generation. Empathetic Response Generation (ERG) (Rashkin et al., 2019) is a research area closely related to ESC, as being empathetic is a crucial ability for providing emotional support (Greene, 2003; Pérez-Rosas et al., 2017). However, ERG does not have the explicit goal of proactively soothing the user's negative emotion. Instead, it only reactively generates responses that are consistent with the user's emotion
Figure 2: The overall framework of MultiESC. The dialogue encoder and the user state modeling module encode the dialogue history Ht into the dialogue history embeddings and the user state embeddings Ut, based on which the strategy planning module selects a strategy st and the utterance decoder generates the utterance xt. Details about the user state modeling and the strategy planning modules are shown in Fig. 3 and Fig. 4, respectively.
(Lin et al., 2019; Majumder et al., 2020; Li et al., 2020a; Zheng et al., 2021; Wang et al., 2021).
3 Preliminaries
ESConv. Our research is conducted on ESConv. It is a long conversation dataset, with an average of 29.8 utterances in each dialogue. It also includes rich annotations, such as the strategies adopted by the supporter and the user feedback scores. There are overall eight types of strategies (e.g., question, reflection of feelings, and self-disclosure). The user feedback score indicates how much the user's emotional distress is reduced during the conversation. These scores are marked by the support-seekers on a five-level Likert scale after every two turns. More data statistics are provided in the appendix.
NRC VAD Lexicon. The NRC VAD lexicon includes the Valence-Arousal-Dominance (VAD) scores of 20,000 English words. The VAD score of a word measures its underlying emotion in three dimensions: valence (pleased-displeased), arousal (excited-calm), and dominance (dominant-submissive). For example, the VAD scores of "loneliness" and "abandon" are (0.15, 0.18, 0.22) and (0.05, 0.52, 0.25), respectively. The VAD model captures a wide range of emotions and allows different emotions to be comparable.
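As a small illustration of why VAD scores make different emotions comparable, the two example entries above can be placed on the same valence axis. This is only a sketch using the scores quoted in the text; the lexicon itself is far larger:

```python
# VAD = (valence, arousal, dominance), each dimension in [0, 1].
# The two entries below are the examples quoted in the text.
VAD = {
    "loneliness": (0.15, 0.18, 0.22),
    "abandon":    (0.05, 0.52, 0.25),
}

def more_negative(word_a, word_b):
    # Lower valence means more displeased; because both words live
    # on the same scale, their emotional negativity is comparable
    # even though they name different emotions.
    return word_a if VAD[word_a][0] < VAD[word_b][0] else word_b

print(more_negative("loneliness", "abandon"))  # abandon
```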
Problem Formulation of ESC. Denote the utterances from the system and the user at the $i$-th round of the conversation as $(x_i, y_i)$,¹ and the user's state as $u_i$ ($i = 1, 2, \ldots, n_R$). Suppose the set of all support strategies is $\mathcal{S}$. At the $t$-th turn, given the dialogue history $H_t = \{(x_i, y_i)\}_{i=1}^{t-1}$, the system tracks the user states $U_t = \{u_1, u_2, \ldots, u_{t-1}\}$ from $H_t$ and generates the next utterance $x_t$, using an appropriate support strategy $\hat{s}_t \in \mathcal{S}$.
¹We suppose that ESCs are always initiated by the system (or the supporter).
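Under this formulation, the quantities maintained at each turn can be organized as simple data structures. The field and class names below are ours, chosen for illustration; only the correspondence to $H_t$ and $U_t$ comes from the formulation:

```python
from dataclasses import dataclass, field

@dataclass
class Round:
    x: str  # system utterance x_i (ESCs are system-initiated)
    y: str  # user utterance y_i

@dataclass
class DialogueState:
    history: list = field(default_factory=list)      # H_t = {(x_i, y_i)}
    user_states: list = field(default_factory=list)  # U_t = {u_1, ..., u_{t-1}}

    def step(self, system_utt, user_utt, user_state):
        # After each completed round, extend both H_t and U_t.
        self.history.append(Round(system_utt, user_utt))
        self.user_states.append(user_state)

state = DialogueState()
state.step("Hello. How can I help?", "I feel depressed.", "sad / high intensity")
print(len(state.history), len(state.user_states))  # 1 1
```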
Figure 3: The architecture of the user state modeling module in MultiESC. For each round, the input sequence "[CLS] x1 [SEP] y1 [EC] c1" (e.g., x1: "Is there anything I can do to help?"; y1: "I feel so lonely these days. I have not seen my friends for a long time.") is enriched with emotion embeddings from an emotion knowledge base and the detected emotion cause, and fed into a Transformer encoder to produce the user state representations u1, u2, ..., ut-1 in Ut.
4 Methodology
As shown in Fig. 2, our proposed system MultiESC consists of four modules. The dialogue encoder first converts the dialogue history $H_t$ into the embeddings $\mathbf{H}_t$. At the same time, the user state modeling module extracts the user state information, producing the embeddings $\mathbf{U}_t$. Then, given $\mathbf{H}_t$ and $\mathbf{U}_t$, the strategy planning module selects the strategy $s_t$. Finally, the utterance decoder generates the utterance $x_t$, adopting the strategy $s_t$.
4.1 Dialogue Encoder
The dialogue encoder module is implemented with a Transformer encoder (Vaswani et al., 2017). We concatenate the utterances in $H_t$ and keep the last $N$ tokens of the concatenation as its input sequence. Given the input, it produces the dialogue history embeddings $\mathbf{H}_t \in \mathbb{R}^{N \times d_{emb}}$.
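The concatenate-and-truncate preprocessing can be sketched as follows. For simplicity this uses whitespace tokens and a "[SEP]" separator; the actual system operates on the subword tokens of its Transformer encoder:

```python
def encoder_input(history, n_max=512, sep=" [SEP] "):
    # Concatenate all utterances in H_t, then keep only the last
    # n_max tokens so recent context is preserved when truncating.
    tokens = sep.join(history).split()
    return tokens[-n_max:]

hist = ["Hello, how can I help?", "I feel lonely these days."]
print(encoder_input(hist, n_max=5))  # ['I', 'feel', 'lonely', 'these', 'days.']
```

Truncating from the left rather than the right is the design point here: when the dialogue is longer than the budget, the most recent turns are kept.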
4.2 User State Modeling
Fig. 3 illustrates the workflow of user state modeling. To identify the user's state at the $i$-th round of the conversation, we first extract the emotion cause mentioned at this round, denoted as $c_i$, with an off-the-shelf detector² trained on a large-scale emotion cause detection dataset (Poria et al., 2021). For example, in Fig. 3, $c_1$ = "I have not seen my friends for a long time". Then, we concatenate the dialogue content $x_i$, $y_i$ and the emotion cause $c_i$ with special separator tokens to form the input of a Transformer encoder. Here, the system's utterance $x_i$ is also considered because it often provides necessary context for understanding the user's state. Each position of the input sequence is represented as the sum of its emotion embedding, word embedding, and positional embedding.
The emotion embeddings are used to fuse the emotion information. They are obtained
²https://github.com/declare-lab/RECCON
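The three-way embedding sum for the encoder input can be sketched as below. The table sizes and random values are placeholders; only the element-wise summation of word, positional, and emotion embeddings reflects the description above:

```python
import numpy as np

rng = np.random.default_rng(0)
d_emb, vocab_size, max_len, n_emotions = 8, 100, 32, 4

# Three lookup tables; random values stand in for learned parameters.
word_emb = rng.normal(size=(vocab_size, d_emb))
pos_emb = rng.normal(size=(max_len, d_emb))
emo_emb = rng.normal(size=(n_emotions, d_emb))

def embed(token_ids, emotion_ids):
    # Each input position is the element-wise sum of its word,
    # positional, and emotion embeddings.
    positions = np.arange(len(token_ids))
    return word_emb[token_ids] + pos_emb[positions] + emo_emb[emotion_ids]

x = embed([5, 17, 42], [0, 0, 2])
print(x.shape)  # (3, 8)
```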