

Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning

Yi Cheng1∗, Wenge Liu2∗, Wenjie Li1†, Jiashuo Wang1, Ruihui Zhao3, Bang Liu4, Xiaodan Liang5, Yefeng Zheng3

1Hong Kong Polytechnic University 2Baidu Inc., Beijing, China 3Tencent Jarvis Lab 4RALI & Mila, Université de Montréal 5Sun Yat-sen University

{csycheng,cswjli,jessiejs.wang}@comp.polyu.edu.hk, {kzllwg,xdliang328}@gmail.com, bang.liu@umontreal.ca, {zacharyzhao,yefengzheng}@tencent.com

Abstract

Providing Emotional Support (ES) to soothe people in emotional distress is an essential capability in social interactions. Most existing research on building ES conversation systems only considers single-turn interactions with users, which is oversimplified. In comparison, multi-turn ES conversation systems can provide ES more effectively, but face several new technical challenges, including: i) how to conduct support strategy planning that leads to the best supporting effects; ii) how to dynamically model the user’s state. In this paper, we propose a novel system named MultiESC to address these issues. For strategy planning, drawing inspiration from the A* search algorithm, we propose lookahead heuristics to estimate the future user feedback after using particular strategies, which helps to select strategies that can lead to the best long-term effects. For user state modeling, MultiESC focuses on capturing users’ subtle emotional expressions and understanding their emotion causes. Extensive experiments show that MultiESC significantly outperforms competitive baselines in both strategy planning and dialogue generation. Our code is available at https://github.com/lwgkzl/MultiESC.

1 Introduction

Almost every human has experienced emotional distress, even if not suffering from any mental disorders. Frequently, people deal with the distress by seeking Emotional Support (ES) from social interactions (Langford et al., 1997; Greene, 2003). Nevertheless, ES from family and friends is not always available (Webber and Mascari, 2018). With the potential of providing more people with timely support, developing Emotional Support Conversation (ESC) systems has attracted much attention. However, since early ES datasets were constructed by crawling post-response pairs from online forums, they only contain single-turn conversations (Medeiros and Bosse, 2018; Sharma et al., 2020). Thus, most of the existing research on ESC also only considers single-turn interactions with the user (Medeiros and Bosse, 2018; Sharma et al., 2020, 2021), which is oversimplified and has limited support effects. It was not until recently that Liu et al. (2021) released the first large-scale multi-turn ES dataset, ESCONV. They also designed an ESC framework, suggesting the conversation procedures and support strategies for multi-turn ESC.

∗Equal contribution.
†Corresponding author.

Supporter (Greeting): Hello. How can I be of service tonight?
Seeker: I'm just feeling depressed over the breakup. Hoping for some inspiration.
Supporter (Questioning): Tell me more please. I am all ears. When did this happen? How long ago?
Seeker: Just last week. I came home from work early ...
Supporter (Reflection of Feelings): The fact that he cheated on you and you broke up with him must be hard.
Seeker: Yeah, we're over. And it is hard. I don't think I could ever get past it.
…
Supporter (Providing Suggestions): Maybe you should look ahead. Focus on your future…

Figure 1: An example of an emotional support conversation between the support-seeker (left) and the supporter (right). The support strategies adopted by the supporter are presented in red italics before the utterances.

Compared to the single-turn scenario, developing multi-turn ESC systems faces several new challenges. One significant challenge is support strategy planning. As pointed out in the psychological literature, particular procedures and strategies are indispensable for effective emotional support (Greene, 2003; Hill, 2009). As in Fig. 1, the supporter strategically soothes the support-seeker by first caringly inquiring about the situation, then resonating with the seeker’s feelings, and finally providing suggestions to evoke positive emotions.

arXiv:2210.04242v1 [cs.CL] 9 Oct 2022

Notably, strategy planning in ESC should be conducted over a long planning horizon. That is, instead of merely considering the dialogue history or foreseeing the immediate effect of using a strategy, the system should look further ahead and consider how much the adopted strategy would contribute to reducing the user’s emotional distress in the long run. Though some strategies may not directly provide comfort, they are still essential for reaching the long-term dialogue goal, such as greetings at the beginning of the conversation and inquiring about the user’s experiences.

Another challenge for multi-turn ESC is how to dynamically model the user’s state during the conversation. Prior works on emotion-related dialogue tasks mainly detect the user’s coarse-grained emotion type to enhance dialogue generation (Lin et al., 2019; Majumder et al., 2020; Li et al., 2020a). However, such practice is not entirely appropriate for ESC, because the user’s emotion in ESC mostly stays of the same type, such as sadness, throughout the conversation. Instead, it often changes subtly in terms of emotion intensity. Besides, effective ES requires more than identifying the user’s emotion. A thorough understanding of the user’s situation is also essential.

In this paper, we propose a multi-turn ESC system, MultiESC, to address the above issues. For strategy planning, we draw inspiration from the A* search algorithm (Hart et al., 1968; Pearl, 1985) and its recent application in constrained text generation (Lu et al., 2021), which addressed the challenge of planning ahead by incorporating a heuristic estimation of future cost. In MultiESC, we develop lookahead heuristics to estimate the expectation of the future user feedback, which helps select the strategy that can lead to the best long-term effect. Concretely, we implement a strategy sequence generator to produce the probability of the future strategy sequences, and a user feedback predictor to predict the feedback after applying the sequence of strategies. For user state modeling, MultiESC captures the user’s subtle emotion expressed in the context by incorporating external knowledge from the NRC VAD lexicon (Mohammad, 2018). Moreover, it identifies the user’s emotion causes (i.e., the experiences that caused the depressed emotion) to more thoroughly understand the user’s situation.
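The lookahead idea described above can be illustrated with a small sketch. It is not the authors' implementation: the scoring rule (history-based log-probability plus a weight `lam` times the expected future feedback), the three-strategy set, and both stand-in functions `next_strategy_prob` and `predicted_feedback` are assumptions made only for illustration. In the paper these roles are played by a learned strategy sequence generator and a learned user feedback predictor.

```python
import math
from itertools import product

# Illustrative subset of strategies (assumption; ESConv defines eight types).
STRATEGIES = ["question", "reflection of feelings", "providing suggestions"]


def next_strategy_prob(prefix):
    """Hypothetical stand-in for the strategy sequence generator.

    Returns P(next strategy | strategies used so far). The toy rule simply
    makes repeating the previous strategy half as likely as the others.
    """
    weights = [0.5 if prefix and s == prefix[-1] else 1.0 for s in STRATEGIES]
    total = sum(weights)
    return {s: w / total for s, w in zip(STRATEGIES, weights)}


def predicted_feedback(sequence):
    """Hypothetical stand-in for the user feedback predictor (higher is better)."""
    score = 3.0
    if "question" in sequence and "providing suggestions" in sequence:
        # Toy rule: asking before suggesting is rewarded.
        if sequence.index("question") < sequence.index("providing suggestions"):
            score += 1.0
    if sequence[0] == "providing suggestions":
        # Toy rule: opening with advice is penalized.
        score -= 1.0
    return score


def expected_future_feedback(first, horizon):
    """Expected feedback over all strategy sequences of length `horizon`
    that start with `first`, weighted by their generator probability."""
    total = 0.0
    for tail in product(STRATEGIES, repeat=horizon - 1):
        seq = (first,) + tail
        prob = 1.0
        for k in range(1, len(seq)):
            prob *= next_strategy_prob(seq[:k])[seq[k]]
        total += prob * predicted_feedback(seq)
    return total


def select_strategy(history_prob, horizon=3, lam=0.5):
    """Pick the strategy maximizing log P(s | history) + lam * E[feedback]."""
    scores = {
        s: math.log(history_prob[s]) + lam * expected_future_feedback(s, horizon)
        for s in STRATEGIES
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical history-based distribution over the next strategy.
    history_prob = {
        "question": 0.3,
        "reflection of feelings": 0.3,
        "providing suggestions": 0.4,
    }
    print(select_strategy(history_prob, lam=0.0)[0])  # history only
    print(select_strategy(history_prob, lam=0.5)[0])  # with lookahead
```

With `lam=0.0` the choice depends only on the history-based probability, while a positive `lam` lets a strategy with weaker immediate support win when the sequences it tends to start are predicted to earn better feedback.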

In summary, our contributions are as follows:

• We propose a multi-turn ESC system, MultiESC, which conducts support strategy planning with foresight of the user feedback and dynamically tracks the user’s state by capturing the subtle emotional expressions and the emotion causes.

• It is a pioneering work that adopts A*-like lookahead heuristics to achieve dialogue strategy selection on a long planning horizon.

• Experiments show that MultiESC significantly outperforms a set of state-of-the-art models in generation quality and strategy planning, demonstrating the effectiveness of our proposed method.

2 Related Work

Emotional Support Conversation Systems. Since early ES datasets were mainly composed of single-turn conversations (Medeiros and Bosse, 2018; Sharma et al., 2020), most research on developing ESC systems only considered the simplified scenario of single-turn interactions with the user (Sharma et al., 2021; Hosseini and Caragea, 2021). The few works that developed multi-turn ES chatbots rely on predefined templates and handcrafted rules (Zwaan et al., 2012), which suffer from limited generality. It was not until last year that Liu et al. (2021) released the first multi-turn ESC dataset, ESCONV. Following Liu et al. (2021), Peng et al. (2022) and Tu et al. (2022) recently explored data-driven multi-turn ESC systems. Peng et al. (2022) proposed a hierarchical graph network to capture both the global context and the local user intention. They did not consider strategy planning, which is critical in multi-turn ESC. Tu et al. (2022) proposed to enhance context encoding with commonsense knowledge and use the predicted strategy distribution to guide response generation. Nevertheless, their method of strategy prediction, directly implemented with a vanilla Transformer encoder, was relatively preliminary and did not consider any user-feedback-oriented planning as we do.

Empathetic Response Generation. Empathetic Response Generation (ERG) (Rashkin et al., 2019) is a research area closely related to ESC, as being empathetic is a crucial ability for providing emotional support (Greene, 2003; Pérez-Rosas et al., 2017). However, ERG does not have the explicit goal of proactively soothing the user’s negative emotion. Instead, it only reactively generates responses that are consistent with the user’s emotion (Lin et al., 2019; Majumder et al., 2020; Li et al., 2020a; Zheng et al., 2021; Wang et al., 2021).

[Figure 2 diagram labels: Dialogue History, Dialogue Encoder, Dialogue History Embeddings, User State Modeling, User State Embeddings, Strategy Planning, Strategy $s_t$, Utterance Decoder, Utterance $x_t$.]

Figure 2: The overall framework of MultiESC. Details about the user state modeling and the strategy planning modules are shown in Fig. 3 and Fig. 4, respectively.

3 Preliminaries

ESConv. Our research is conducted on ESCONV. It is a long conversation dataset, with an average of 29.8 utterances in each dialogue. It also includes rich annotations, such as the strategies adopted by the supporter and the user feedback scores. There are eight types of strategies in total (e.g., question, reflection of feelings, and self-disclosure). The user feedback score indicates how much the user’s emotional distress is reduced during the conversation. The scores are marked by the support-seekers on a five-level Likert scale after every two turns. More data statistics are provided in the appendix.

NRC VAD Lexicon. The NRC VAD lexicon includes the Valence-Arousal-Dominance (VAD) scores of 20,000 English words. The VAD score of a word measures its underlying emotion in three dimensions: valence (pleased-displeased), arousal (excited-calm), and dominance (dominant-submissive). For example, the VAD scores of “loneliness” and “abandon” are (0.15, 0.18, 0.22) and (0.05, 0.52, 0.25), respectively. The VAD model captures a wide range of emotions and allows different emotions to be comparable.
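To make the comparability of VAD scores concrete, here is a minimal sketch that looks words up in a tiny lexicon and compares them. Only the two entries quoted above come from the text. The dictionary format, the helper names `mean_vad` and `vad_distance`, and the use of averaging and Euclidean distance are illustrative assumptions, not the way MultiESC derives its emotion embeddings.

```python
import math

# Toy excerpt of the lexicon: (valence, arousal, dominance) per word.
# These two entries are the examples quoted in the text.
VAD_LEXICON = {
    "loneliness": (0.15, 0.18, 0.22),
    "abandon": (0.05, 0.52, 0.25),
}


def mean_vad(tokens, lexicon=VAD_LEXICON):
    """Average the VAD scores of the tokens found in the lexicon.

    Returns None when no token is covered (assumption: callers handle that).
    """
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None
    return tuple(sum(dim) / len(hits) for dim in zip(*hits))


def vad_distance(word_a, word_b, lexicon=VAD_LEXICON):
    """Euclidean distance between two words in VAD space."""
    return math.dist(lexicon[word_a], lexicon[word_b])


if __name__ == "__main__":
    print(mean_vad("i feel loneliness and abandon".split()))
    print(vad_distance("loneliness", "abandon"))
```

Both words have low valence, and most of the distance between them comes from the arousal dimension, which is the kind of fine-grained difference a single coarse emotion label would hide.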

Problem Formulation of ESC. Denote the utterances from the system and the user at the $i$-th round of the conversation as $(x_i, y_i)$, and the user’s state at that round as $u_i$ ($i = 1, 2, \ldots$). Suppose the set of all support strategies is $\mathcal{S}$. At the $t$-th turn, given the dialogue history $\{(x_i, y_i)\}_{i=1}^{t-1}$, the system tracks the user states $\{u_1, u_2, \ldots, u_{t-1}\}$ from the history and generates the next utterance $x_t$, using an appropriate support strategy $\hat{s}_t \in \mathcal{S}$.

We suppose that ESCs are always initiated by the system (or the supporter).

[Figure 3 diagram labels: input sequence [CLS] $x_1$ [SEP] $y_1$ [EC]; Emotion Cause Detection; Emotion Embeddings; Transformer Encoder; Round 1, Round 2, …, Round $t-1$; output $u_{t-1}$. Example round: $x_1$: “Is there anything I can do to help?” $y_1$: “I feel so lonely these days. I have not seen my friends for a long time.”]

Figure 3: The architecture of the user state modeling module in MultiESC.

4 Methodology

As shown in Fig. 2, our proposed system MultiESC consists of four modules. The dialogue encoder first converts the dialogue history into the dialogue history embeddings $H_t$. At the same time, the user state modeling module extracts the user state information, producing the user state embeddings. Then, given the dialogue history embeddings and the user state embeddings, the strategy planning module selects the strategy $\hat{s}_t$. Finally, the utterance decoder generates the utterance $x_t$, adopting the strategy $\hat{s}_t$.
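The data flow between the four modules can be sketched as a thin wrapper around four callables. This is only a structural outline under stated assumptions: the class name `SupportPipeline`, the field names, and the choice to pass both embeddings to the decoder are hypothetical, and the lambdas in the usage example are trivial placeholders rather than the learned modules.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# One round of dialogue: (system utterance x_i, user utterance y_i).
Round = Tuple[str, str]


@dataclass
class SupportPipeline:
    """Hypothetical wiring of the four modules named in the text."""

    encode_dialogue: Callable[[List[Round]], Any]   # history -> history embeddings
    model_user_state: Callable[[List[Round]], Any]  # history -> user state embeddings
    plan_strategy: Callable[[Any, Any], str]        # both embeddings -> strategy
    decode_utterance: Callable[[Any, Any, str], str]  # embeddings + strategy -> utterance

    def respond(self, history: List[Round]) -> Tuple[str, str]:
        """Run one system turn and return (strategy, utterance)."""
        history_emb = self.encode_dialogue(history)
        state_emb = self.model_user_state(history)
        strategy = self.plan_strategy(history_emb, state_emb)
        utterance = self.decode_utterance(history_emb, state_emb, strategy)
        return strategy, utterance


if __name__ == "__main__":
    # Placeholder modules for demonstration only.
    pipeline = SupportPipeline(
        encode_dialogue=lambda h: [len(h)],
        model_user_state=lambda h: [y for _, y in h],
        plan_strategy=lambda H, U: "question" if len(U) < 2 else "providing suggestions",
        decode_utterance=lambda H, U, s: f"({s}) ...",
    )
    print(pipeline.respond([("Hello.", "I feel low.")]))
```

Keeping the modules as separate callables mirrors the description above, where the encoder and the user state model run on the same history and the planner consumes both outputs before decoding.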

4.1 Dialogue Encoder

The dialogue encoder module is implemented with a Transformer encoder (Vaswani et al., 2017). We concatenate the utterances in the dialogue history and keep the last $N$ tokens of the concatenation as its input sequence. Given the input, it produces the dialogue history embeddings $H_t \in \mathbb{R}^{N \times d_{emb}}$.
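The input preparation step (concatenate, then keep only the most recent tokens) can be sketched as follows. Whitespace tokenization and the function name `build_encoder_input` are assumptions for illustration; a real system would use the tokenizer of its Transformer encoder.

```python
from typing import List, Tuple


def build_encoder_input(history: List[Tuple[str, str]], max_tokens: int) -> List[str]:
    """Concatenate all utterances in order and keep the last `max_tokens` tokens.

    `history` holds (system utterance, user utterance) pairs. Tokens are split
    on whitespace here, which is a simplification.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens: List[str] = []
    for system_utt, user_utt in history:
        tokens.extend(system_utt.split())
        tokens.extend(user_utt.split())
    # Negative slicing keeps the most recent context when the history is long.
    return tokens[-max_tokens:]


if __name__ == "__main__":
    history = [("Hello there", "I feel sad"), ("Tell me more", "I lost my job")]
    print(build_encoder_input(history, max_tokens=5))
```

Truncating from the left keeps the latest rounds, which matter most for the next response, at the cost of dropping early context in long conversations.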

4.2 User State Modeling

Fig. 3 illustrates the workflow of user state modeling. To identify the user’s state at the $i$-th round of the conversation, we first extract the emotion cause mentioned at this round with an off-the-shelf detector² trained on a large-scale emotion cause detection dataset (Poria et al., 2021). For example, in Fig. 3, the emotion cause is “I have not seen my friends for a long time”. Then, we concatenate the dialogue content $(x_i, y_i)$ and the emotion cause with special separator tokens to form the input of a Transformer encoder. Here, the system’s utterance is also considered because it often provides necessary context for understanding the user’s state. The input sequence is represented as the position-wise sum of emotion embeddings, word embeddings, and positional embeddings.
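A minimal sketch of this input construction is given below. The token layout `[CLS] x [SEP] y [EC] cause` follows the labels visible in Fig. 3, but the exact ordering, the whitespace tokenization, and the helper names `build_state_input` and `sum_embeddings` are assumptions. The embedding vectors in the usage example are made-up numbers.

```python
from typing import List


def build_state_input(system_utt: str, user_utt: str, emotion_cause: str) -> List[str]:
    """Join one round and its emotion cause with special separator tokens."""
    return [
        "[CLS]", *system_utt.split(),
        "[SEP]", *user_utt.split(),
        "[EC]", *emotion_cause.split(),
    ]


def sum_embeddings(
    word_emb: List[List[float]],
    emotion_emb: List[List[float]],
    position_emb: List[List[float]],
) -> List[List[float]]:
    """Add the three embedding sequences element by element at each position."""
    if not (len(word_emb) == len(emotion_emb) == len(position_emb)):
        raise ValueError("all embedding sequences must have the same length")
    return [
        [w + e + p for w, e, p in zip(w_vec, e_vec, p_vec)]
        for w_vec, e_vec, p_vec in zip(word_emb, emotion_emb, position_emb)
    ]


if __name__ == "__main__":
    tokens = build_state_input(
        "Is there anything I can do to help?",
        "I feel so lonely these days. I have not seen my friends for a long time.",
        "I have not seen my friends for a long time",
    )
    print(tokens)
    # One position with 2-dimensional toy embeddings.
    print(sum_embeddings([[1.0, 2.0]], [[0.5, 0.5]], [[0.0, 1.0]]))
```

Because the three embeddings are summed rather than concatenated, the encoder input keeps the same dimensionality as the word embeddings while still carrying emotion and position information.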

The emotion embeddings are used to fuse

the emotion information. They are obtained

2 https://github.com/declare-lab/RECCON
