
SUMBot: Summarizing Context in Open-Domain Dialogue Systems
Rui Ribeiro, Luísa Coheur
INESC-ID, Lisboa, Portugal
Instituto Superior Técnico, Universidade de Lisboa, Portugal
rui.m.ribeiro@tecnico.ulisboa.pt, luisa.coheur@tecnico.ulisboa.pt
Abstract
In this paper, we investigate the problem of including relevant information as context in open-domain dialogue systems. Most models struggle to identify and incorporate important knowledge from dialogues and simply use the entire turns as context, which inflates the input fed to the model with unnecessary information. Additionally, because large pre-trained models are limited to inputs of a few hundred tokens, parts of the history are left out and informative portions of the dialogue may be omitted. To overcome this problem, we introduce a simple method that substitutes a summary for part of the context instead of using the whole history, which improves the ability of models to keep track of all the previous relevant information. We show that the inclusion of a summary may improve the answer generation task, and we discuss some examples to further understand the system’s weaknesses.
Index Terms: dialogue systems, summarization, dealing with
context, open-domain
1. Introduction
Chit-chat systems have become increasingly prominent with the emergence of large pre-trained models and the growing availability of public libraries [1, 2, 3] that make it easy to train and deploy these models. In particular, recent advances have shown promising progress in the dialogue generation task, as these systems have become more competent at providing human-like answers. However, these deep-learning systems tend to generate generic responses that are repetitive or incoherent with the context, particularly when conversations span many interactions and contain long turns.
Recent approaches have studied the ability of deep generative models to capture relevant information from the dialogue context [4, 5]. They found that these models do not make efficient use of all parts of the dialogue history and tend to ignore relevant turn information. Other approaches [6, 7, 8, 9] have attempted to represent the context and leverage the resulting representations for various dialogue tasks. However, none of these approaches has studied substituting a summary for the context.
Figure 1: Example of a dialogue between two speakers and the respective summary from the SAMSum dataset.
In this paper, we investigate the importance of encapsulating complete dialogue utterances into a summary and reducing the context size in the open-domain dialogue task. We attempt to answer the following question: can a summary of the previous context retain all the important information while also decreasing the input size fed to a model? To answer this question, we propose a simple yet effective method that incorporates summaries of the previous turns that are not included as input. More specifically, apart from the user request, we only include a few complete speaker turns, and the remaining turns are compiled into a summary that succinctly describes the omitted utterances. We train different versions of the model in which we vary the number of complete utterances provided between 0 and 10. This procedure allows us to analyze whether the inclusion of summaries is an effective strategy and whether the summaries are a valuable substitute for the complete turns.
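To make the procedure concrete, the following Python sketch shows one way to build such an input. The function name, interface, and default value of k are our own illustrative choices; the paper does not prescribe this exact implementation.

def build_context(turns, summarize, k=3):
    """Replace all but the last k turns (plus the user request) with
    a generated summary.

    turns:      utterance strings, oldest first, ending with the
                current user request.
    summarize:  callable mapping a list of turns to a short summary.
    k:          number of complete turns kept; varied between 0 and
                10 in the experiments.
    """
    if k >= len(turns) - 1:
        return turns  # nothing to omit, keep the full history
    omitted, kept = turns[:-(k + 1)], turns[-(k + 1):]
    # The summary stands in for the omitted utterances, shrinking
    # the input while (ideally) preserving the relevant information.
    return [summarize(omitted)] + kept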
The training is divided into two independent stages: first, we fine-tune BART [10] on the SAMSum corpus [11] and use it to generate summaries for the dialogue context. Figure 1 shows an example of a dialogue from this dataset. Then, we fine-tune the DialoGPT decoder [12] on inputs that combine the summaries from the previous stage with the dialogue between the two speakers.
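As a minimal sketch of this two-stage pipeline, the snippet below wires a BART summarizer into DialoGPT generation using the Hugging Face transformers library. The checkpoints, separator choice, and generation settings are our assumptions for illustration; in particular, the paper fine-tunes its own BART on SAMSum, whereas here we load the base checkpoint, and the DialoGPT model size is not specified in this section.

from transformers import (AutoModelForCausalLM, AutoModelForSeq2SeqLM,
                          AutoTokenizer)

# Stage 1: BART summarizer (base checkpoint here; the paper first
# fine-tunes it on the SAMSum corpus).
sum_tok = AutoTokenizer.from_pretrained("facebook/bart-large")
sum_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")

def summarize(turns):
    """Generate a short summary for a list of omitted turns."""
    ids = sum_tok("\n".join(turns), return_tensors="pt",
                  truncation=True).input_ids
    out = sum_model.generate(ids, max_length=60, num_beams=4)
    return sum_tok.decode(out[0], skip_special_tokens=True)

# Stage 2: DialoGPT consumes the summary followed by the kept turns.
gen_tok = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
gen_model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

dialogue_turns = [
    "Hi! Do you like hiking?",
    "I love it, I go almost every weekend.",
    "Nice! Where do you usually go?",
]
# Keep one complete turn besides the request, for demonstration.
context = build_context(dialogue_turns, summarize, k=1)  # sketch above
# DialoGPT separates turns with its end-of-sequence token.
prompt = gen_tok.eos_token.join(context) + gen_tok.eos_token
ids = gen_tok(prompt, return_tensors="pt").input_ids
reply_ids = gen_model.generate(ids, max_new_tokens=40,
                               pad_token_id=gen_tok.eos_token_id)
print(gen_tok.decode(reply_ids[0, ids.shape[-1]:],
                     skip_special_tokens=True))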
We evaluate our model on the open-domain Persona-Chat dataset [13] and observe that the inclusion of the summaries may improve the overall results. We also analyze whether the summaries are proper substitutes for the dialogue history and discuss possible flaws that can decrease the performance of the generation model.
2. Related Work
Since the introduction of encoder-decoder models [14, 15], chit-chat dialogue systems have been in constant evolution and have become more capable of generating fluent and human-like sentences. In these systems, the encoder extracts important features from the utterances and passes that information to a decoder that generates a response.
Considering that our approach attempts to provide a proper substitute for the dialogue history, the most relevant related work focuses on studying and representing the context in the dialogue task. [4] study the aptitude of RNN- and Transformer-based encoder-decoder models to interpret and understand the dialogue context. The authors introduce