providing suggestions to evoke positive emotions.
Notably, strategy planning in ESC should be
conducted on a long planning horizon. That is,
instead of merely considering the dialogue history
or foreseeing the immediate effect after using the
strategy, the system should further look ahead, to
consider how much the adopted strategy would
contribute to reducing the user’s emotional distress
in the long run. Though some strategies may not
directly provide comfort, they are still essential
for reaching the long-term dialogue goal, such as
greetings at the beginning of the conversation and
inquiring about the user’s experiences.
Another challenge for multi-turn ESC is how
to dynamically model the user’s state during the
conversation. Prior works on emotion-related dia-
logue tasks mainly detect the user’s coarse-grained
emotion type to enhance dialogue generation (Lin
et al., 2019; Majumder et al., 2020; Li et al., 2020a).
However, this practice is not entirely appropriate
for ESC, because the user’s emotion type in ESC
usually stays the same (e.g., sadness) throughout
the conversation. What changes, often subtly, is
the emotion intensity. Besides, effective ES
requires more than identifying the user’s emotion:
a thorough understanding of the user’s situation
is also essential.
In this paper, we propose a multi-turn ESC sys-
tem MultiESC to address the above issues. For
strategy planning, we draw inspiration from the
A* search algorithm (Hart et al., 1968; Pearl, 1985) and
its recent application in constrained text generation
(Lu et al., 2021), which addressed the challenge
of planning ahead by incorporating heuristic esti-
mation of future cost. In MultiESC, we develop
lookahead heuristics to estimate the expectation of
the future user feedback to help select the strategy
that can lead to the best long-term effect. Con-
cretely, we implement a strategy sequence genera-
tor to produce the probability of the future strategy
sequences, and a user feedback predictor to pre-
dict the feedback after applying the sequence of
strategies. For user state modeling, MultiESC cap-
tures the user’s subtle emotion expressed in the
context by incorporating external knowledge from
the NRC VAD lexicon (Mohammad, 2018). Moreover,
it identifies the user’s emotion causes (i.e.,
the experiences that caused the depressed emotion)
to more thoroughly understand the user’s situation.
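To make the lookahead idea concrete, the following is a minimal sketch under stated assumptions, not the actual MultiESC implementation. The names `expected_feedback`, `select_strategy`, `next_prob`, `feedback`, and the weight `lam` are hypothetical: `next_prob` stands in for the strategy sequence generator, `feedback` for the user feedback predictor, and the additive combination of a history-based log-probability with a weighted expected feedback is one plausible A*-style scoring rule assumed here for illustration. The probabilities and feedback values are toy numbers.

```python
import math
from typing import Callable, Dict, Tuple

Seq = Tuple[str, ...]


def expected_feedback(first: str, horizon: int,
                      next_prob: Callable[[Seq], Dict[str, float]],
                      feedback: Callable[[Seq], float]) -> float:
    """Expected predicted feedback over all strategy sequences of length
    `horizon` that start with `first` (exhaustive enumeration)."""
    frontier = [((first,), 1.0)]  # (sequence so far, its probability)
    for _ in range(horizon - 1):
        frontier = [(prefix + (s,), p * q)
                    for prefix, p in frontier
                    for s, q in next_prob(prefix).items()]
    return sum(p * feedback(seq) for seq, p in frontier)


def select_strategy(history_prob: Dict[str, float], horizon: int,
                    next_prob: Callable[[Seq], Dict[str, float]],
                    feedback: Callable[[Seq], float], lam: float):
    """Pick the strategy maximizing the history-based log-probability
    plus `lam` times the lookahead estimate (assumed scoring rule)."""
    scores = {s: math.log(p) + lam * expected_feedback(s, horizon, next_prob, feedback)
              for s, p in history_prob.items()}
    return max(scores, key=scores.get), scores


# Toy stand-ins (hypothetical): two strategies, uniform continuations.
def toy_next_prob(prefix: Seq) -> Dict[str, float]:
    return {"Question": 0.5, "Suggestion": 0.5}


def toy_feedback(seq: Seq) -> float:
    # Feedback is high only if the system asks first and suggests later.
    return 1.0 if seq[0] == "Question" and "Suggestion" in seq[1:] else 0.0


history_prob = {"Question": 0.4, "Suggestion": 0.6}
greedy, _ = select_strategy(history_prob, 2, toy_next_prob, toy_feedback, lam=0.0)
planned, _ = select_strategy(history_prob, 2, toy_next_prob, toy_feedback, lam=2.0)
print(greedy, planned)  # Suggestion Question
```

In this toy setting, selection based on the dialogue history alone picks "Suggestion", whereas adding the lookahead term picks "Question", a strategy that gives no immediate comfort but leads to better expected feedback later. The exhaustive enumeration grows exponentially with the horizon, so a practical system would likely need an approximation such as beam search.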
In summary, our contributions are as follows:
• We propose a multi-turn ESC system, MultiESC,
which conducts support strategy planning with
foresight of the user feedback and dynamically
tracks the user’s state by capturing the subtle
emotional expressions and the emotion causes.
• It is a pioneering work that adopts A*-like
lookahead heuristics to achieve dialogue strategy
selection on a long planning horizon.
• Experiments show that MultiESC significantly
outperforms a set of state-of-the-art models
in generation quality and strategy planning,
demonstrating the effectiveness of our proposed
method.
2 Related Work
Emotional Support Conversation Systems.
Since early ES datasets were mainly composed of
single-turn conversations (Medeiros and Bosse,
2018; Sharma et al., 2020), most research on
developing ESC systems only considered the
simplified scenario of single-turn interactions
with the user (Sharma et al., 2021; Hosseini and
Caragea, 2021). The few works that developed
multi-turn ES chatbots rely on predefined templates
and handcrafted rules (Zwaan et al., 2012),
which suffer from limited generality. It was not
until last year that Liu et al. (2021) released the
first multi-turn ESC dataset, ESConv. Following
Liu et al. (2021), Peng et al. (2022) and Tu et al.
(2022) recently explored data-driven multi-turn
ESC systems. Peng et al. (2022) proposed
a hierarchical graph network to capture both
the global context and the local user intention.
They did not consider strategy planning, which
is critical in multi-turn ESC. Tu et al. (2022)
proposed to enhance context encoding with
commonsense knowledge and use the predicted
strategy distribution to guide response generation.
Nevertheless, their method of strategy prediction,
directly implemented with a vanilla Transformer
encoder, was relatively preliminary and did not
consider any user-feedback-oriented planning as
we do.
Empathetic Response Generation.
Empathetic Response Generation (ERG) (Rashkin et al., 2019)
is a research area closely related to ESC, as being
empathetic is a crucial ability for providing emotional
support (Greene, 2003; Pérez-Rosas et al.,
2017). However, ERG does not have the explicit
goal of proactively soothing the user’s negative
emotion. Instead, it only reactively generates re-
sponses that are consistent with the user’s emotion