Leveraging Key Information Modeling to Improve Less-Data Constrained
News Headline Generation via Duality Fine-Tuning
Zhuoxuan Jiang†, Lingfeng Qiao†, Di Yin†, Shanshan Feng‡, Bo Ren§
†Tencent Youtu Lab, Shanghai, China
‡Harbin Institute of Technology, Shenzhen, China
§Tencent Youtu Lab, Hefei, China
jzhx@pku.edu.cn,{leafqiao,endymecyyin,timren}@tencent.com,victor_fengss@foxmail.com
Abstract
Recent generative language models are mostly trained on large-scale datasets, while in some real scenarios the training data are expensive to obtain and therefore small-scale. In this paper we investigate the challenging task of less-data constrained generation, especially when the generated news headlines are short yet expected by readers to be readable and informative at the same time. We highlight the key information modeling task and propose a novel duality fine-tuning method by formally defining the probabilistic duality constraints between the key information prediction and headline generation tasks. The proposed method can capture more information from limited data, build connections between the separate tasks, and is suitable for less-data constrained generation tasks. Furthermore, the method can leverage various pre-trained generative regimes, e.g., autoregressive and encoder-decoder models. We conduct extensive experiments to demonstrate that our method is effective and efficient, achieving improved performance in terms of a language modeling metric and an informativeness correctness metric on two public datasets.
1 Introduction
In an age of information explosion, headline generation has become a fundamental application in the field of natural language processing (NLP) (Tan et al., 2017; Li et al., 2021). Currently, headline generation is usually regarded as a special case of general text summarization. Therefore, many cutting-edge techniques based on pre-trained models and fine-tuning methods can be directly adapted by feeding headline generation datasets (Zhang et al., 2020b; Gu et al., 2020). Compared with general textual summaries, however, headline generation aims at producing only one sentence or a short piece of text given a long document (e.g., a news article). It is challenging to guarantee that the generated headline is readable and informative at the same time, which is important to attract or inform readers, especially in the news domain (Matsumaru et al., 2020).
Recently, some works have found that neglecting the key information degrades the performance of generative models that only consider capturing natural language (Nan et al., 2021b). Many subsequent works model different kinds of key information to enhance the informational correctness of generated summaries. For example, overlapping salient words between the source document and the target summary (Li et al., 2020), keywords (Li et al., 2018), key phrases (Mao et al., 2020) and named entities (Nan et al., 2021a) have been incorporated into the design of generative models. However, those works are mostly either trained on large-scale datasets or target long summaries (Ao et al., 2021). In many real applications, it is expensive to obtain massive labeled data. It thus becomes a much more challenging task to generate short headlines that are both readable and informative under less-data constrained situations.
To model the key information, existing works often follow the assumption that a generated summary essentially consists of two-fold elements: the natural language part and the key information part. The former focuses on language fluency and readability, while the latter is responsible for information correctness. For this reason, an additional task of key information prediction is leveraged and the multi-task learning method is employed (Li et al., 2020; Nan et al., 2021a). Figure 1 illustrates this intuitive idea; the bold parts can be treated as the key information (overlapping salient tokens), which should be modeled well to convey correct and sufficient information to readers.
To realize the above motivation, applying existing fine-tuning and multi-task learning methods to headline generation is a natural technical choice. However, these methods have drawbacks.
[Figure 1: An example of multi-task decomposition for headline generation. The bold parts are salient tokens. The source document reports that the European Commission announced on Friday that it was providing 11 million euros (about 11.1 million U.S. dollars) for the United Nations High Commissioner for Refugees (UNHCR) to support programs in the fields of protection, registration and staff security in refugee-hosting countries, especially in Africa. The headline generation task outputs "EU donates 11 million dollars to UNHCR"; the key information prediction task outputs the salient tokens "11", "million", "UNHCR".]
First, single-task fine-tuning methods cannot explicitly model the key information well and hence reduce the informational correctness of generated headlines. Second, multi-task fine-tuning methods are expected to improve the model by sharing the encoder and attaching two task-specific heads, one for the key information prediction task and one for the headline generation task. In practice, due to the limited dataset scale, the shared encoder cannot be trained well enough to distinguish the two tasks or make them enhance each other. As a result, vanilla multi-task methods bring little benefit for generation tasks (Nan et al., 2021a; Magooda et al., 2021); our empirical experiments later also confirm this point. Therefore, existing single-task and multi-task fine-tuning methods do not perform well under less-data constrained situations.
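For illustration only, the vanilla shared-encoder multi-task baseline discussed above can be sketched roughly as follows; the module names, head designs and sizes are illustrative assumptions rather than our exact baseline implementation.

```python
# Illustrative sketch of the shared-encoder multi-task baseline: one shared
# pre-trained encoder feeds two task-specific heads. Head designs and sizes
# are assumptions made for this sketch, not the exact baseline setup.
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    """Two task heads on top of one shared pre-trained encoder (not shown here)."""
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.key_head = nn.Linear(hidden_size, 2)           # per-token salient / not-salient
        self.gen_head = nn.Linear(hidden_size, vocab_size)  # token logits for headline decoding

    def forward(self, shared_states: torch.Tensor):
        # shared_states: (batch, seq_len, hidden_size) produced by the shared encoder
        return self.key_head(shared_states), self.gen_head(shared_states)
```

With limited training data, this single shared representation has to serve both tasks at once, which is exactly the weakness discussed above.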
In this paper, we set out to address the above issues from two aspects. On the one hand, to explicitly model the key information, we still adopt the multi-task paradigm, but the two tasks use their own models. We argue that the two tasks have probabilistic connections and present them in dual forms. In this way, the key information is explicitly highlighted, and requiring the two separate models to obey duality constraints not only makes the models better able to distinguish the tasks but also captures the relation between them. On the other hand, to capture more knowledge from the limited dataset, besides the source document, headlines and key tokens are additionally used as input data for the key information prediction task and the headline generation task, respectively. We call this method duality fine-tuning, which obeys the definition of dual learning (He et al., 2016; Xia et al., 2018). Moreover, we develop the duality fine-tuning method to be compatible with both autoregressive and encoder-decoder language models (LMs).
To evaluate our method, we collect two datasets with the key information of overlapping salient tokens¹ in two languages (English and Chinese), and leverage various representative pre-trained models (BERT (Devlin et al., 2019), UniLM (Dong et al., 2019) and BART (Lewis et al., 2020)). Extensive experiments demonstrate the effectiveness of our proposed method in producing more readable (on the Rouge metric) and more informative (on the key information correctness metric) headlines than counterpart methods, which indicates that our method is consistently useful across various pre-trained models and generative regimes.

¹ We expect our method to be orthogonal to the specific key information definition.
In summary, the main contributions include:

• We study a new task: how to improve the performance of headline generation under less-data constrained situations. We highlight modeling the key information and propose a novel duality fine-tuning method. To the best of our knowledge, this is the first work to integrate dual learning with the fine-tuning paradigm for the task of headline generation.

• The duality fine-tuning method, which models multiple tasks to obey probabilistic duality constraints, is a new choice suitable for less-data constrained multi-task generation, in terms of capturing more data knowledge, learning more powerful models that simultaneously distinguish and build connections between multiple tasks, and being compatible with both autoregressive and encoder-decoder generative pre-trained models.

• We collect two small-scale public datasets in two languages. Extensive experiments prove the effectiveness of our method in improving readability and informativeness on the Rouge metric and the key information accuracy metric.
2 Related Work
Usually, headline generation is regarded as a special task of general abstractive text summarization, and the majority of existing studies can be easily adapted to headline generation by feeding headline-related datasets (Matsumaru et al., 2020; Yamada et al., 2021). For example, sequence-to-sequence models have been investigated for text summarization, with an emphasis on generating fluent and natural summaries (Sutskever et al., 2014; Nallapati et al., 2016; Gehring et al., 2017; See et al., 2017). In recent years, large-scale transformer-based models (Devlin et al., 2019; Dong et al., 2019; Lewis et al., 2020) and the two-stage (pre-training and fine-tuning) learning paradigm (Zhang et al., 2019; Gehrmann et al., 2019; Rothe et al., 2020) have greatly advanced the performance of most NLP tasks, and headline generation also benefits from these works.
Since headlines are often short and almost 'every word is precious', modeling the key information deserves more attention than in general text summarization (Li et al., 2020; Mao et al., 2020; Zhu et al., 2021b; Nan et al., 2021a; Zhu et al., 2021a). However, to our knowledge, little work focuses on this problem for headline generation, especially under less-data constrained situations; most existing studies focus on low-resource long text summarization (Parida and Motlicek, 2019; Bajaj et al., 2021; Yu et al., 2021).
Recent years have witnessed the rapid development of transformer-based pre-trained models (Wolf et al., 2020), and two regimes of natural language generation (NLG) are prevalent (Li and Liang, 2021). One is based on autoregressive language models, which use a shared transformer encoder structure for both encoding and decoding (Devlin et al., 2019; Dong et al., 2019; Zhuang et al., 2021), while the other is based on the standard transformer framework, which has separate encoder and decoder structures (Lewis et al., 2020; Zhang et al., 2020a). Fine-tuning and multi-task learning on these models to reuse the ability of pre-trained models have been widely studied for various tasks (Liu and Lapata, 2019; Rothe et al., 2020; Gururangan et al., 2020). Our work aligns with this research line, and we propose a new multi-task fine-tuning method.
We leverage the core idea of dual learning, which can fully mine information from limited data and model multiple tasks well by designing duality constraints (He et al., 2016; Xia et al., 2018). This learning paradigm has been successfully applied to many fields, such as image-to-image translation (Yi et al., 2017), recommender systems (Sun et al., 2020), and supervised and unsupervised NLU and NLG (Su et al., 2019, 2020). Those works have demonstrated that duality modeling is suitable for small-scale training situations.
3 Problem Definition
In this section, we formally present our problem.
The training set is denoted as $\mathcal{X} = (\mathcal{D}, \mathcal{H}, \mathcal{K})$, where $\mathcal{D}$ and $\mathcal{H}$ are the sets of source documents and target headlines, and $\mathcal{K}$ is the set of key information, which indicates the overlapping salient tokens (stopwords excluded) in each pair of document and headline. A training sample is denoted as a tuple $(d, h, k)$, where $d = \{x^{(d)}_1, x^{(d)}_2, \ldots, x^{(d)}_n\}$, $h = \{x^{(h)}_1, x^{(h)}_2, \ldots, x^{(h)}_m\}$, $k = \{x^{(k)}_1, x^{(k)}_2, \ldots, x^{(k)}_l\}$, $x^{(\cdot)}_i$ is a token of document, headline or key information, and $n$, $m$, $l$ are the lengths of the respective token sequences.
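For concreteness, the key information $k$ for one $(d, h)$ pair can be derived as in the following minimal sketch, which keeps headline tokens that also appear in the document and are not stopwords. The whitespace tokenization and the tiny stopword list are illustrative simplifications, not our exact preprocessing.

```python
# Minimal sketch: derive the key information k for one (document, headline) pair
# as the overlapping salient tokens, i.e., headline tokens that also occur in the
# document, with stopwords excluded. Tokenizer and stopword list are simplified.

STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "for", "and", "was", "it", "that"}

def extract_key_tokens(document: str, headline: str) -> list:
    """Return overlapping salient tokens, in headline order, without duplicates."""
    doc_tokens = set(document.lower().split())
    key = []
    for token in headline.lower().split():
        if token in doc_tokens and token not in STOPWORDS and token not in key:
            key.append(token)
    return key

# Example adapted from Figure 1:
doc = ("The European Commission announced on Friday that it was providing "
       "11 million euros for the UNHCR to support programs in refugee-hosting countries.")
head = "EU donates 11 million dollars to UNHCR"
print(extract_key_tokens(doc, head))  # -> ['11', 'million', 'unhcr']
```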
3.1 Definition of Dual Tasks
Given the input data $x = (d, h, k)$, we define our problem in a dual form, which contains two tasks. Formally, the key information prediction task aims at finding a function $f: (d, h) \rightarrow k$ which maximizes the conditional probability $p(k|d, h; \theta)$ of the real key information $k$. Correspondingly, the headline generation task targets learning a function $g: (d, k) \rightarrow h$ which maximizes the conditional probability $p(h|d, k; \phi)$ of the real headline $h$. The two tasks can be defined as follows:
$$f(d, h; \theta) \triangleq \arg\max \prod_{x \in \mathcal{X}} p(k|d, h; \theta),$$
$$g(d, k; \phi) \triangleq \arg\max \prod_{x \in \mathcal{X}} p(h|d, k; \phi).$$
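As a rough illustration (not our exact input format), the two dual tasks can be serialized as text-to-text pairs, e.g., for an encoder-decoder model. The "<sep>" separator token and the field order below are assumptions made for this sketch.

```python
# Sketch: serialize the two dual tasks as text-to-text training pairs.
# The "<sep>" separator and the field order are illustrative assumptions.

def build_key_prediction_pair(d: str, h: str, k: list) -> tuple:
    """f: (d, h) -> k, the key information prediction task."""
    source = d + " <sep> " + h
    target = " ".join(k)
    return source, target

def build_headline_generation_pair(d: str, k: list, h: str) -> tuple:
    """g: (d, k) -> h, the headline generation task."""
    source = d + " <sep> " + " ".join(k)
    target = h
    return source, target

# Usage: each training tuple (d, h, k) yields one example for each dual task.
src_f, tgt_f = build_key_prediction_pair("document text", "headline text", ["11", "million", "UNHCR"])
src_g, tgt_g = build_headline_generation_pair("document text", ["11", "million", "UNHCR"], "headline text")
```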
3.2 Probabilistic Duality Constraints
Based on the principle of the dual learning paradigm (He et al., 2016), we treat the key information prediction task as the primary task and the headline generation task as the secondary task. Ideally, if the primary model and the secondary model are both trained optimally, the probabilistic duality between the two tasks should satisfy the following equation:
$$p(\mathcal{X}) = \prod_{x \in \mathcal{X}} p(d, k, h) = \prod_{x \in \mathcal{X}} p(d)\, p(h|d; \hat{\phi})\, p(k|d, h; \theta) = \prod_{x \in \mathcal{X}} p(d)\, p(k|d; \hat{\theta})\, p(h|d, k; \phi).$$
$p(k|d, h; \theta)$ and $p(h|d, k; \phi)$ are the target models to learn, while $p(k|d; \hat{\theta})$ and $p(h|d; \hat{\phi})$ denote the marginal distribution models. By integrating the above probabilistic duality equation and dividing out the common term $p(d)$, our problem can be formally defined as optimizing the following objectives:
$$\text{Objective 1}: \ \min_{\theta} \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} \ell_1\big(f(d, h; \theta), k\big),$$
$$\text{Objective 2}: \ \min_{\phi} \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} \ell_2\big(g(d, k; \phi), h\big),$$
$$\text{s.t.} \quad \prod_{x \in \mathcal{X}} p(h|d; \hat{\phi})\, p(k|d, h; \theta) = \prod_{x \in \mathcal{X}} p(k|d; \hat{\theta})\, p(h|d, k; \phi), \tag{1}$$
where $\ell_1$ is the loss function for key information prediction and $\ell_2$ is that for headline generation.
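As a sketch of how the constraint in Eq. (1) can be enforced in practice, the equality can be relaxed into a soft regularizer over sentence-level log-probabilities and added to the two task losses. The squared-gap form and the weight below follow common dual-learning practice (He et al., 2016) and are illustrative choices, not our exact training objective.

```python
# Sketch: relax the probabilistic duality constraint in Eq. (1) into a soft
# regularizer that penalizes the gap between the two factorizations of
# log p(k, h | d). Squared-gap form and lambda_dual are illustrative choices.
import torch

def duality_regularizer(log_p_h_d: torch.Tensor,    # log p(h | d; phi_hat), marginal model
                        log_p_k_dh: torch.Tensor,   # log p(k | d, h; theta), primary model
                        log_p_k_d: torch.Tensor,    # log p(k | d; theta_hat), marginal model
                        log_p_h_dk: torch.Tensor    # log p(h | d, k; phi), secondary model
                        ) -> torch.Tensor:
    """Squared gap between the two factorizations, averaged over the batch."""
    gap = (log_p_h_d + log_p_k_dh) - (log_p_k_d + log_p_h_dk)
    return gap.pow(2).mean()

def total_loss(l1: torch.Tensor, l2: torch.Tensor,
               reg: torch.Tensor, lambda_dual: float = 0.1) -> torch.Tensor:
    """Combine the two task losses (Objectives 1 and 2) with the duality term."""
    return l1 + l2 + lambda_dual * reg
```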