
Figure 1: An example of an input, consisting of a source document and highlights (left), and the generated passage
covering the highlighted content while preserving coherence (right). Such highlights in realistic use cases may be
produced either by a human user or by a salience detection model.
focusing on personal needs, possibly interactively
(Hirsch et al., 2021; Shapira et al., 2021). Then, an
available controlled text reduction module would
transform the pre-selected fragments into a concise
summary. Separating the content selection and
generation stages can also lead to more data-efficient
systems, one modeling salient content and the other
generating the text. It would further allow
characterizing and studying each step separately,
without the need for probing, which is the prevailing
approach in end-to-end models (Conneau et al., 2018;
Tenney et al., 2019a,b; Slobodkin et al., 2021;
Pandit and Hou, 2021).
To promote research on the advocated text reduction
task, we first develop a suitable controlled
crowdsourcing methodology, following Roit et al.
(2020), and apply it to produce high-quality dev
and test datasets (§4). Next, we automatically
generate a larger training dataset by aligning
propositional units of information (Ernst et al., 2021b),
extracted with OpenIE (Stanovsky et al., 2018),
between source documents and their summaries (§5).
We use this data to train an abstractive supervised
model and evaluate its performance on our test set,
comparing it to an extractive reference baseline,
which simply concatenates the highlights.
We also perform analyses in which we manipulate the
highlights, showing that adding highlights to a
supervised model helps steer the model toward the
pre-selected content, in addition to improving
overall faithfulness and fluency (§8).
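The extractive reference baseline above can be sketched in a few lines. The following is a minimal illustration, not the authors' actual implementation; the function name and the representation of highlights as character-offset spans are assumptions made for clarity.

```python
# Illustrative sketch of the extractive reference baseline: concatenate the
# highlighted spans of the source document, in document order. The input
# format (character-offset spans) is an assumption, not the paper's actual code.

def concatenate_highlights(document: str, highlights: list[tuple[int, int]]) -> str:
    """Return the highlighted spans, sorted by position and joined with spaces."""
    ordered = sorted(highlights)
    return " ".join(document[start:end].strip() for start, end in ordered)

doc = "The committee met on Monday. It approved the budget. Members then adjourned."
spans = [(0, 28), (29, 53)]
print(concatenate_highlights(doc, spans))
# -> The committee met on Monday. It approved the budget.
```

Such a baseline preserves the selected content exactly but, unlike the abstractive model, makes no attempt to restore coherence across the concatenated fragments.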
Hence, the contributions of this paper are manifold:
1. Proposing the "Controlled Text Reduction" task as a standalone module in automated or semi-automated use cases.
2. Defining an intuitive and easy-to-reproduce crowdsourcing method for the task.
3. Constructing the first data suite for the task, including crowdsourced dev and test sets and an automatically generated train set.
4. Developing a supervised baseline model for future work.
2 Background
In this section, we briefly review related work and
discuss the limitations of its framing.
As mentioned above, much of the related previous
work focused primarily on end-to-end summarization
(Carbonell and Goldstein, 1998; Haghighi and
Vanderwende, 2009; Nallapati et al., 2016b,a;
Paulus et al., 2017; Gehrmann et al., 2018b), with
the vast majority of related datasets likewise
designed for the end-to-end setting (Fabbri et al., 2019;
Kim et al., 2019; Ghalandari et al., 2020), taking
only a source document as input. On the other hand,
research on leveraging control through the injection
of pre-chosen (rather than learned) signals in the
sequence-to-sequence scenario has focused mostly on
semantic and syntactic signals, and almost exclusively
targeted Machine Translation models (Bugliarello
and Okazaki, 2020; Akoury et al., 2019; Sundararaman
et al., 2019; Choshen and Abend, 2021; Slobodkin
et al., 2022).
Attempts to exert some control over the generation
step in summarization have received attention in
recent years, in the form of query-focused
summarization (Baumel et al., 2018; Xu and Lapata,
2020, 2021; Wei and Zhizhuo, 2017) and keyword-focused
summarization (Keskar et al., 2019; He et al., 2020),
with a few recently published corresponding datasets
(Pasunuru et al., 2021; Kulkarni et al., 2020;
Baumel et al., 2016). A related line of work
leveraged control through the addition of a planning
step (Zhao et al., 2020; Narayan et al., 2021).
Although these lines of research allowed for some
control over salience, this control was limited and
mostly focused on biasing the summary's topic,
style, or structure.
The prevailing way to treat summarization in earlier
works was to separate the salience detection phase
from the text generation phase (Barzilay and
McKeown, 2005; Oya et al., 2014; Banerjee et al., 2016;
Vilca and Cabezudo, 2017), yet the evaluation was
performed on the whole pipeline.