
them in successive dialogues. We formulate a new
task of memory management in long-term conversations
and construct its corresponding dataset¹,
by extending an existing Korean open-domain dialogue
dataset (Bae et al., 2022) to multiple sessions
with changing user information. In each session of
our dataset, while the user and the bot have a con-
versation, information about the user is identified
from the dialogue. Then, in successive sessions,
the bot keeps in memory only the information valid
at that point and utilizes the resulting memory in
dialogue.
In addition, we propose a long-term dialogue
system including a novel memory management
mechanism. In this system, information about the
interlocutors revealed in the previous conversation
is abstractively summarized and stored in memory.
Specifically, the memory management mechanism
decides which information to keep in memory. For
this purpose, we define four pairwise operations
(PASS, REPLACE, APPEND, and DELETE) to
find and eliminate the information that can cause
confusion or redundancy in later conversations.
For example, if the previous memory sentence is
“Haven’t got COVID tested yet” and the new in-
coming summary is “Just got positive results from
COVID test”, the two sentences are contradictory,
so the former is replaced in memory by the latter.
Through this process, only valid information remains
in the new memory. Then, in subsequent sessions,
relevant information from this memory is retrieved
and provided as an additional condition for
generating chatbot responses.
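To make the mechanism concrete, below is a minimal Python (3.9+) sketch of the update loop, assuming one operation is predicted per (old memory sentence, new summary sentence) pair. Here classify_operation is a hypothetical stand-in for the fine-tuned pairwise classifier, and the resolution rule is one plausible reading of the four operations, not the paper's exact implementation.

```python
from enum import Enum

class Op(Enum):
    PASS = "pass"        # new sentence is redundant; keep memory unchanged
    REPLACE = "replace"  # new sentence contradicts or updates the old one
    APPEND = "append"    # sentences are unrelated; both may be kept
    DELETE = "delete"    # old sentence is invalidated and should be dropped

def classify_operation(old: str, new: str) -> Op:
    """Hypothetical stand-in for the fine-tuned pairwise classifier."""
    raise NotImplementedError

def update_memory(memory: list[str], summaries: list[str]) -> list[str]:
    """Keep only the information that is still valid after new summaries."""
    updated = list(memory)
    for new_sent in summaries:
        keep_new = True  # append new_sent unless it replaces or duplicates
        survivors = []
        for old_sent in updated:
            op = classify_operation(old_sent, new_sent)
            if op is Op.REPLACE:
                # the new fact supersedes the old one, e.g. "Just got
                # positive results" replacing "Haven't got COVID tested yet"
                survivors.append(new_sent)
                keep_new = False
            elif op is Op.DELETE:
                continue  # drop the invalidated old fact
            elif op is Op.PASS:
                survivors.append(old_sent)  # redundant: keep old, skip new
                keep_new = False
            else:  # Op.APPEND: unrelated facts, keep the old one
                survivors.append(old_sent)
        if keep_new:
            survivors.append(new_sent)  # genuinely new information
        updated = survivors
    return updated
```

In subsequent sessions, a retriever over this sentence-level memory would then select the entries relevant to the current context and prepend them as conditions for response generation.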
With extensive experiments and ablations, we
show that the proposed memory management mech-
anism becomes more advantageous in terms of
memorability as the sessions proceed, leading
to better engagingness and humanness in multi-
session dialogues.
Our contributions are as follows:
1. We take a step towards long-term conversations
with dynamic memory that must be kept
up-to-date.

2. We propose a novel memory management
mechanism in the form of unstructured text
that achieves better results in automatic and
human evaluation over baselines.

3. We release the first Korean long-term dialogue
dataset for further research on memory management
in dialogues.

¹The dataset is available at https://github.com/naver-ai/carecall-memory
2 Related Work
Personalized Dialogue System
Building human-like open-domain chatbots is one of the
seminal research topics in the field of natural
language processing. Zhang et al. (2020) have
provided a strong backbone generator model
for dialogue systems, while Adiwardana et al.
(2020), Roller et al. (2021) and Thoppilan et al.
(2022) have paved the way for the development
of more human-like, natural-sounding chatbots.
The applications of open-domain chatbots have
also widely expanded, including role-specified
(Bae et al., 2022) and personalized (Zhang et al.,
2018) dialogue systems. In particular, personalized
dialogue systems have typically been studied
either by utilizing a predefined, explicitly stated
user profile (Zhang et al., 2018) or by directly
extracting the user profile from dialogue history
(Xu et al., 2022a,b). While the latter approach is
preferred in recent research (Zhong et al., 2022),
long-term management of the obtained information
is yet to be studied.
Long-term Memory in Conversation
Because
it is inefficient to use the entire dialogue history as
long-term memory, techniques for obtaining and
managing information from dialogue history have
been studied. Representing latent features as neural
memory (Weston et al., 2015; Tran et al., 2016;
Munkhdalai et al., 2019) was a traditional approach.
The slot-value format in dialogue state tracking
(Heck et al., 2020; Hosseini-Asl et al., 2020;
Kim et al., 2020) and the graph format of Hsiao et al.
(2020) have been the two major approaches to handling
the memorized information in a structured
way. Kim et al. (2020) suggested update operations
on fixed-sized slot-value pairs for dialogue states.
Wu et al. (2020) extracted user attributes from dia-
logues in triples. However, such approaches have
not been demonstrated in a multi-session setting.
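For contrast with the unstructured-text memory studied in this work, the sketch below illustrates the structured alternatives mentioned above; all slot names, relations, and values are invented for illustration and are not drawn from the cited works.

```python
# Slot-value memory, as in dialogue state tracking (Kim et al., 2020):
# a fixed schema whose values are overwritten as the dialogue proceeds.
slot_value_memory = {
    "user-health-status": "awaiting COVID test result",  # hypothetical slot
    "user-occupation": "nurse",
}

# (subject, relation, object) triples, as in Wu et al. (2020):
triple_memory = [
    ("user", "has_symptom", "cough"),  # hypothetical relation
    ("user", "works_as", "nurse"),
]

# Unstructured text memory, as studied in this paper: free-form sentences
# that are summarized, updated, and retrieved directly.
text_memory = [
    "Haven't got COVID tested yet",
    "Works as a nurse",
]
```

Structured formats require a schema fixed in advance, whereas free-form sentences can hold information the schema designer did not anticipate, which motivates the unstructured-text direction discussed next.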
Leveraging the advancement of pre-trained language
models (Devlin et al., 2019; Raffel et al.,
2020; Brown et al., 2020; Kim et al., 2021), recent
studies attempt to use the unstructured form
of text as memory, which is expected to be advantageous
in terms of generalizability and interpretability.
Ma et al. (2021) and Xu et al. (2022b)
selectively stored dialogue history with relevant