
[Figure 1: An example of multi-task decomposition for headline generation; the bold parts are salient tokens. Source document: "The European Commission announced on Friday that it was providing 11 million euros (about 11.1 million U.S. dollars) for the United Nations High Commissioner for Refugees (UNHCR) to support programs in the fields of protection, registration and staff security in refugee-hosting countries, especially in Africa." The document is decomposed into a key information prediction task and a headline generation task, whose output is "EU donates 11 million dollars to UNHCR".]
not explicitly model the key information well and hence reduce the informative correctness of generated headlines. Secondly, multi-task fine-tuning methods are expected to improve the model by sharing the encoder and attaching two classifiers, one for the key information prediction task and one for the headline generation task. In practice, due to the limited dataset scale, the shared encoder cannot be trained well enough to distinguish the two tasks or to let them enhance each other. As a result, vanilla multi-task methods bring little benefit to generation tasks (Nan et al., 2021a; Magooda et al., 2021). Our empirical experiments later also confirm this point. Therefore, existing single-task and multi-task fine-tuning methods cannot perform well under less-data constrained situations.
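For concreteness, the vanilla multi-task baseline criticized above can be sketched as follows. This is a minimal PyTorch-style illustration under our own assumptions about naming and head shapes, not the exact baseline implementation:

```python
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    """Vanilla multi-task baseline: one shared encoder feeding two task heads."""

    def __init__(self, encoder, hidden_size, vocab_size):
        super().__init__()
        self.encoder = encoder                        # shared pre-trained encoder
        self.key_head = nn.Linear(hidden_size, 2)     # token-level salience classifier
        self.gen_head = nn.Linear(hidden_size, vocab_size)  # headline token predictor

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        key_logits = self.key_head(hidden)
        gen_logits = self.gen_head(hidden)
        return key_logits, gen_logits

# Both task losses back-propagate into the same encoder, so with a small
# dataset the encoder receives mixed gradients and learns neither task well.
```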
In this paper, we set out to address the above issues from two aspects. On the one hand, to explicitly model the key information, we still adopt the multi-task paradigm, but each task uses its own model. We then argue that the two tasks have probabilistic connections and present them in dual forms. In this way, the key information is explicitly highlighted, and training two separate models to obey duality constraints not only makes the models more capable of distinguishing the tasks but also captures the relation between them. On the other hand, to capture more data knowledge from the limited dataset, besides the source document, headlines and key tokens are additionally used as input data for the key information prediction task and the headline generation task, respectively. We call this method duality fine-tuning, as it obeys the definition of dual learning (He et al., 2016; Xia et al., 2018). Moreover, we develop the duality fine-tuning method to be compatible with both autoregressive and encoder-decoder language models (LMs).
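To make the duality explicit, one way to write the constraint, following the standard dual supervised learning formulation cited above (the exact conditioning is our reading of the task setup, with $d$ the document, $k$ the key tokens, and $h$ the headline), is:

\[
P(k \mid d)\, P(h \mid d, k;\, \theta_{\mathrm{gen}})
\;=\;
P(h \mid d)\, P(k \mid d, h;\, \theta_{\mathrm{key}})
\;=\;
P(h, k \mid d),
\]

and training can penalize violations of this equality with a squared regularizer on the log-probabilities, e.g.

\[
\ell_{\mathrm{dual}}
= \bigl(\log \hat{P}(k \mid d) + \log P(h \mid d, k;\, \theta_{\mathrm{gen}})
- \log \hat{P}(h \mid d) - \log P(k \mid d, h;\, \theta_{\mathrm{key}})\bigr)^{2},
\]

where $\hat{P}(\cdot)$ denotes empirical marginal estimates.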
To evaluate our method, we collect two datasets, in two languages (English and Chinese), whose key information is defined as overlapping salient tokens; we expect our method to be orthogonal to the specific definition of key information. We leverage various representative pre-trained models (BERT (Devlin et al., 2019), UniLM (Dong et al., 2019), and BART (Lewis et al., 2020)). Extensive experiments demonstrate the effectiveness of the proposed method, which produces more readable (on the ROUGE metric) and more informative (on the key information correctness metric) headlines than counterpart methods, indicating that our method is consistently useful across various pre-trained models and generative regimes.
In summary, the main contributions include:
• We study a new problem: how to improve the performance of headline generation under less-data constrained situations. We highlight the importance of modeling the key information and propose a novel duality fine-tuning method. To the best of our knowledge, this is the first work to integrate dual learning with the fine-tuning paradigm for headline generation.
• Duality fine-tuning, which trains multiple task models to obey probabilistic duality constraints, is a new option well suited to less-data constrained multi-task generation: it captures more data knowledge, learns more powerful models that simultaneously distinguish and connect the tasks, and is compatible with both autoregressive and encoder-decoder generative pre-trained models (a training-step sketch is given after this list).
• We collect two small-scale public datasets in two languages. Extensive experiments demonstrate that our method improves both readability and informativeness, as measured by the ROUGE metric and the key information accuracy metric.
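As referenced in the second contribution above, the following is a minimal sketch of one duality fine-tuning step. It assumes HuggingFace-style seq2seq models whose outputs expose a mean-NLL `.loss`, and externally estimated marginals; the function name, batch keys, and the weight `lam` are hypothetical, not the authors' released code:

```python
def duality_finetuning_step(gen_model, pred_model, batch,
                            log_p_head, log_p_key, lam=0.1):
    """One duality fine-tuning step for two separate models.

    gen_model  : headline generator, models P(headline | document, key tokens)
    pred_model : key-info predictor, models P(key tokens | document, headline)
    log_p_head : estimated log P(headline | document)   (scalar tensor, assumed given)
    log_p_key  : estimated log P(key tokens | document) (scalar tensor, assumed given)
    lam        : weight of the duality regularizer (hyperparameter)
    """
    gen_out = gen_model(input_ids=batch["gen_input_ids"],
                        labels=batch["headline_labels"])
    pred_out = pred_model(input_ids=batch["pred_input_ids"],
                          labels=batch["key_labels"])

    # The losses are mean NLL per non-ignored token; rescale to sequence log-probs.
    log_p_h_given_dk = -gen_out.loss * batch["headline_labels"].ne(-100).sum()
    log_p_k_given_dh = -pred_out.loss * batch["key_labels"].ne(-100).sum()

    # Probabilistic duality: log P(k|d) + log P(h|d,k) = log P(h|d) + log P(k|d,h).
    gap = (log_p_key + log_p_h_given_dk) - (log_p_head + log_p_k_given_dh)

    loss = gen_out.loss + pred_out.loss + lam * gap.pow(2)
    loss.backward()
    return loss.detach()
```

Note that each model keeps its own parameters; only the squared duality gap couples their gradients, in contrast to the shared-encoder baseline sketched earlier.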
2 Related Work
Usually, headline generation is regarded as a special case of general abstractive text summarization, and the majority of existing studies can be easily adapted to headline generation by training on headline-related datasets (Matsumaru et al., 2020; Yamada et al., 2021). For example, sequence-to-sequence models have been investigated for text summarization with an emphasis on generating fluent and natural summaries (Sutskever et al., 2014; Nallapati et al., 2016; Gehring et al., 2017; See et al., 2017). In recent years, the large-scale transformer-based models (Devlin et al., 2019; Dong et al., 2019; Lewis et al., 2020) and the two-stage (pre-training