Leveraging Key Information Modeling to Improve Less-Data Constrained
News Headline Generation via Duality Fine-Tuning
Zhuoxuan Jiang†, Lingfeng Qiao†, Di Yin†, Shanshan Feng‡, Bo Ren§
†Tencent Youtu Lab, Shanghai, China
‡Harbin Institute of Technology, Shenzhen, China
§Tencent Youtu Lab, Hefei, China
jzhx@pku.edu.cn,{leafqiao,endymecyyin,timren}@tencent.com,victor_fengss@foxmail.com
Abstract
Recent generative language models are mostly trained on large-scale datasets, while in some real scenarios the training data are expensive to obtain and therefore small-scale. In this paper we investigate the challenging task of less-data constrained generation, especially when the generated news headlines are short yet expected by readers to be readable and informative at the same time. We highlight the key information modeling task and propose a novel duality fine-tuning method by formally defining the probabilistic duality constraints between the key information prediction and headline generation tasks. The proposed method can capture more information from limited data, build connections between the separate tasks, and is suitable for less-data constrained generation tasks. Furthermore, the method can leverage various pre-trained generative regimes, e.g., autoregressive and encoder-decoder models. We conduct extensive experiments to demonstrate that our method is effective and efficient, achieving improved performance in terms of a language modeling metric and an informativeness correctness metric on two public datasets.
1 Introduction
In an age of information explosion, headline generation has become a fundamental application in the field of natural language processing (NLP) (Tan et al., 2017; Li et al., 2021). Currently, headline generation is usually regarded as a special case of general text summarization. Therefore, many cutting-edge techniques based on pre-trained models and fine-tuning methods can be directly adapted by feeding headline generation datasets (Zhang et al., 2020b; Gu et al., 2020). Compared with general textual summaries, however, headline generation aims at producing only one sentence or a short piece of text given a long document (e.g., a news article). It is challenging to guarantee that the generated headline is readable and informative at the same time, which is important to attract or inform readers, especially in the news domain (Matsumaru et al., 2020).
Recently, some works have found that neglecting the key information degrades the performance of generative models that only consider capturing natural language (Nan et al., 2021b). Many subsequent works model different kinds of key information to enhance the informational correctness of generated summaries. For example, overlapping salient words between the source document and the target summary (Li et al., 2020), keywords (Li et al., 2018), key phrases (Mao et al., 2020) and named entities (Nan et al., 2021a) have been incorporated into the design of generative models. However, those works are mostly either trained on large-scale datasets or target long summaries (Ao et al., 2021). In many real applications, it is expensive to obtain massive labeled data. It thus becomes a much more challenging task to generate short headlines that are both readable and informative under less-data constrained situations.
To model the key information, existing works often follow the assumption that a generated summary essentially consists of two-fold elements: the natural language part and the key information part. The former focuses on language fluency and readability, while the latter is responsible for information correctness. For this reason, an additional task of key information prediction is leveraged and the multi-task learning method is employed (Li et al., 2020; Nan et al., 2021a). Figure 1 illustrates this intuitive idea; the bold parts can be treated as the key information (overlapping salient tokens), which should be modeled well to convey correct and sufficient information to readers.
To realize the above motivation, applying existing fine-tuning and multi-task learning methods to headline generation is a natural technical choice. However, these methods have drawbacks.
[Figure 1: An example of multi-task decomposition for headline generation. The bold parts are salient tokens. The source document reports that the European Commission announced on Friday that it was providing 11 million euros (about 11.1 million U.S. dollars) for the United Nations High Commissioner for Refugees (UNHCR) to support programs in the fields of protection, registration and staff security in refugee-hosting countries, especially in Africa. The headline generation task outputs "EU donates 11 million dollars to UNHCR"; the key information prediction task outputs the salient tokens "11", "million", "UNHCR".]
First, single-task fine-tuning methods cannot explicitly model the key information well and hence reduce the informational correctness of generated headlines. Second, multi-task fine-tuning methods are expected to improve the model by sharing the encoder and attaching two task-specific heads, one for the key information prediction task and one for the headline generation task. In practice, due to the limited dataset scale, the shared encoder cannot be trained well enough to distinguish the two tasks or make them enhance each other. As a result, vanilla multi-task methods bring little benefit for generation tasks (Nan et al., 2021a; Magooda et al., 2021); our empirical experiments later also confirm this point. Therefore, existing single-task and multi-task fine-tuning methods do not perform well under less-data constrained situations.
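For illustration only, the vanilla shared-encoder multi-task baseline discussed above can be sketched roughly as follows; the module names, head designs and sizes are illustrative assumptions rather than our exact baseline implementation.

```python
# Illustrative sketch of the shared-encoder multi-task baseline: one shared
# pre-trained encoder feeds two task-specific heads. Head designs and sizes
# are assumptions made for this sketch, not the exact baseline setup.
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    """Two task heads on top of one shared pre-trained encoder (not shown here)."""
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.key_head = nn.Linear(hidden_size, 2)           # per-token salient / not-salient
        self.gen_head = nn.Linear(hidden_size, vocab_size)  # token logits for headline decoding

    def forward(self, shared_states: torch.Tensor):
        # shared_states: (batch, seq_len, hidden_size) produced by the shared encoder
        return self.key_head(shared_states), self.gen_head(shared_states)
```

With limited training data, this single shared representation has to serve both tasks at once, which is exactly the weakness discussed above.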
In this paper, we set out to address the above issues from two aspects. On the one hand, to explicitly model the key information, we still adopt the multi-task paradigm, but the two tasks use their own models. We argue that the two tasks have probabilistic connections and present them in dual forms. In this way, the key information is explicitly highlighted, and requiring the two separate models to obey duality constraints not only makes the models better able to distinguish the tasks but also captures the relation between them. On the other hand, to capture more knowledge from the limited dataset, besides the source document, headlines and key tokens are additionally used as input data for the key information prediction task and the headline generation task, respectively. We call this method duality fine-tuning, which obeys the definition of dual learning (He et al., 2016; Xia et al., 2018). Moreover, we develop the duality fine-tuning method to be compatible with both autoregressive and encoder-decoder language models (LMs).
To evaluate our method, we collect two datasets with the key information of overlapping salient tokens¹ in two languages (English and Chinese), and leverage various representative pre-trained models (BERT (Devlin et al., 2019), UniLM (Dong et al., 2019) and BART (Lewis et al., 2020)). Extensive experiments demonstrate the effectiveness of our proposed method in producing more readable (on the Rouge metric) and more informative (on the key information correctness metric) headlines than counterpart methods, which indicates that our method is consistently useful across various pre-trained models and generative regimes.

¹ We expect our method to be orthogonal to the specific key information definition.
In summary, the main contributions include:

• We study a new task: how to improve the performance of headline generation under less-data constrained situations. We highlight modeling the key information and propose a novel duality fine-tuning method. To the best of our knowledge, this is the first work to integrate dual learning with the fine-tuning paradigm for the task of headline generation.

• The duality fine-tuning method, which models multiple tasks to obey probabilistic duality constraints, is a new choice suitable for less-data constrained multi-task generation, in terms of capturing more data knowledge, learning more powerful models that simultaneously distinguish and build connections between multiple tasks, and being compatible with both autoregressive and encoder-decoder generative pre-trained models.

• We collect two small-scale public datasets in two languages. Extensive experiments prove the effectiveness of our method in improving readability and informativeness on the Rouge metric and the key information accuracy metric.
2 Related Work
Usually, headline generation is regarded as a special task of general abstractive text summarization, and the majority of existing studies can be easily adapted to headline generation by feeding headline-related datasets (Matsumaru et al., 2020; Yamada et al., 2021). For example, sequence-to-sequence models have been investigated for text summarization, with an emphasis on generating fluent and natural summaries (Sutskever et al., 2014; Nallapati et al., 2016; Gehring et al., 2017; See et al., 2017). In recent years, large-scale transformer-based models (Devlin et al., 2019; Dong et al., 2019; Lewis et al., 2020) and the two-stage (pre-training and fine-tuning) learning paradigm (Zhang et al., 2019; Gehrmann et al., 2019; Rothe et al., 2020) have greatly advanced the performance of most NLP tasks, and headline generation also benefits from these works.
Since headlines are often short and almost 'every word is precious', modeling the key information deserves more attention than in general text summarization (Li et al., 2020; Mao et al., 2020; Zhu et al., 2021b; Nan et al., 2021a; Zhu et al., 2021a). However, to our knowledge, little work focuses on this problem for headline generation, especially under less-data constrained situations; most existing studies focus on low-resource long text summarization (Parida and Motlicek, 2019; Bajaj et al., 2021; Yu et al., 2021).
Recent years have witnessed the rapid development of transformer-based pre-trained models (Wolf et al., 2020), and two regimes of natural language generation (NLG) are prevalent (Li and Liang, 2021). One is based on autoregressive language models, which use a shared transformer encoder structure for both encoding and decoding (Devlin et al., 2019; Dong et al., 2019; Zhuang et al., 2021), while the other is based on the standard transformer framework, which has separate encoder and decoder structures (Lewis et al., 2020; Zhang et al., 2020a). Fine-tuning and multi-task learning on these models to reuse the ability of pre-trained models have been widely studied for various tasks (Liu and Lapata, 2019; Rothe et al., 2020; Gururangan et al., 2020). Our work aligns with this research line, and we propose a new multi-task fine-tuning method.
We leverage the core idea of dual learning, which can fully mine information from limited data and model multiple tasks well by designing duality constraints (He et al., 2016; Xia et al., 2018). This learning paradigm has been successfully applied to many fields, such as image-to-image translation (Yi et al., 2017), recommender systems (Sun et al., 2020), and supervised and unsupervised NLU and NLG (Su et al., 2019, 2020). Those works have demonstrated that duality modeling is suitable for small-scale training situations.
3 Problem Definition
In this section, we formally present our problem.
The training set is denoted as $\mathcal{X} = (\mathcal{D}, \mathcal{H}, \mathcal{K})$, where $\mathcal{D}$ and $\mathcal{H}$ are the sets of source documents and target headlines, and $\mathcal{K}$ is the set of key information, which indicates the overlapping salient tokens (stopwords excluded) in each pair of document and headline. A training sample is denoted as a tuple $(d, h, k)$, where $d = \{x^{(d)}_1, x^{(d)}_2, \ldots, x^{(d)}_n\}$, $h = \{x^{(h)}_1, x^{(h)}_2, \ldots, x^{(h)}_m\}$, $k = \{x^{(k)}_1, x^{(k)}_2, \ldots, x^{(k)}_l\}$, $x^{(\cdot)}_i$ is a token of document, headline or key information, and $n$, $m$, $l$ are the lengths of the respective token sequences.
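For concreteness, the key information $k$ for one $(d, h)$ pair can be derived as in the following minimal sketch, which keeps headline tokens that also appear in the document and are not stopwords. The whitespace tokenization and the tiny stopword list are illustrative simplifications, not our exact preprocessing.

```python
# Minimal sketch: derive the key information k for one (document, headline) pair
# as the overlapping salient tokens, i.e., headline tokens that also occur in the
# document, with stopwords excluded. Tokenizer and stopword list are simplified.

STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "for", "and", "was", "it", "that"}

def extract_key_tokens(document: str, headline: str) -> list:
    """Return overlapping salient tokens, in headline order, without duplicates."""
    doc_tokens = set(document.lower().split())
    key = []
    for token in headline.lower().split():
        if token in doc_tokens and token not in STOPWORDS and token not in key:
            key.append(token)
    return key

# Example adapted from Figure 1:
doc = ("The European Commission announced on Friday that it was providing "
       "11 million euros for the UNHCR to support programs in refugee-hosting countries.")
head = "EU donates 11 million dollars to UNHCR"
print(extract_key_tokens(doc, head))  # -> ['11', 'million', 'unhcr']
```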
3.1 Definition of Dual Tasks
Given the input data $x = (d, h, k)$, we define our problem in a dual form, which contains two tasks. Formally, the key information prediction task aims at finding a function $f: (d, h) \rightarrow k$ which maximizes the conditional probability $p(k|d, h; \theta)$ of the real key information $k$. Correspondingly, the headline generation task targets learning a function $g: (d, k) \rightarrow h$ which maximizes the conditional probability $p(h|d, k; \phi)$ of the real headline $h$. The two tasks can be defined as follows:
$$f(d, h; \theta) \triangleq \arg\max \prod_{x \in \mathcal{X}} p(k|d, h; \theta),$$
$$g(d, k; \phi) \triangleq \arg\max \prod_{x \in \mathcal{X}} p(h|d, k; \phi).$$
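As a rough illustration (not our exact input format), the two dual tasks can be serialized as text-to-text pairs, e.g., for an encoder-decoder model. The "<sep>" separator token and the field order below are assumptions made for this sketch.

```python
# Sketch: serialize the two dual tasks as text-to-text training pairs.
# The "<sep>" separator and the field order are illustrative assumptions.

def build_key_prediction_pair(d: str, h: str, k: list) -> tuple:
    """f: (d, h) -> k, the key information prediction task."""
    source = d + " <sep> " + h
    target = " ".join(k)
    return source, target

def build_headline_generation_pair(d: str, k: list, h: str) -> tuple:
    """g: (d, k) -> h, the headline generation task."""
    source = d + " <sep> " + " ".join(k)
    target = h
    return source, target

# Usage: each training tuple (d, h, k) yields one example for each dual task.
src_f, tgt_f = build_key_prediction_pair("document text", "headline text", ["11", "million", "UNHCR"])
src_g, tgt_g = build_headline_generation_pair("document text", ["11", "million", "UNHCR"], "headline text")
```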
3.2 Probabilistic Duality Constraints
Based on the principle of the dual learning paradigm (He et al., 2016), we treat the key information prediction task as the primary task and the headline generation task as the secondary task. Ideally, if the primary model and the secondary model are both trained optimally, the probabilistic duality between the two tasks should satisfy the following equation:
$$p(\mathcal{X}) = \prod_{x \in \mathcal{X}} p(d, k, h) = \prod_{x \in \mathcal{X}} p(d)\, p(h|d; \hat{\phi})\, p(k|d, h; \theta) = \prod_{x \in \mathcal{X}} p(d)\, p(k|d; \hat{\theta})\, p(h|d, k; \phi).$$
$p(k|d, h; \theta)$ and $p(h|d, k; \phi)$ are the target models to learn, while $p(k|d; \hat{\theta})$ and $p(h|d; \hat{\phi})$ denote the marginal distribution models. By integrating the above probabilistic duality equation and dividing out the common term $p(d)$, our problem can be formally defined as optimizing the following objectives:
$$\text{Objective 1}: \ \min_{\theta} \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} \ell_1\big(f(d, h; \theta), k\big),$$
$$\text{Objective 2}: \ \min_{\phi} \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} \ell_2\big(g(d, k; \phi), h\big),$$
$$\text{s.t.} \quad \prod_{x \in \mathcal{X}} p(h|d; \hat{\phi})\, p(k|d, h; \theta) = \prod_{x \in \mathcal{X}} p(k|d; \hat{\theta})\, p(h|d, k; \phi), \tag{1}$$
where $\ell_1$ is the loss function for key information prediction and $\ell_2$ is that for headline generation.
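As a sketch of how the constraint in Eq. (1) can be enforced in practice, the equality can be relaxed into a soft regularizer over sentence-level log-probabilities and added to the two task losses. The squared-gap form and the weight below follow common dual-learning practice (He et al., 2016) and are illustrative choices, not our exact training objective.

```python
# Sketch: relax the probabilistic duality constraint in Eq. (1) into a soft
# regularizer that penalizes the gap between the two factorizations of
# log p(k, h | d). Squared-gap form and lambda_dual are illustrative choices.
import torch

def duality_regularizer(log_p_h_d: torch.Tensor,    # log p(h | d; phi_hat), marginal model
                        log_p_k_dh: torch.Tensor,   # log p(k | d, h; theta), primary model
                        log_p_k_d: torch.Tensor,    # log p(k | d; theta_hat), marginal model
                        log_p_h_dk: torch.Tensor    # log p(h | d, k; phi), secondary model
                        ) -> torch.Tensor:
    """Squared gap between the two factorizations, averaged over the batch."""
    gap = (log_p_h_d + log_p_k_dh) - (log_p_k_d + log_p_h_dk)
    return gap.pow(2).mean()

def total_loss(l1: torch.Tensor, l2: torch.Tensor,
               reg: torch.Tensor, lambda_dual: float = 0.1) -> torch.Tensor:
    """Combine the two task losses (Objectives 1 and 2) with the duality term."""
    return l1 + l2 + lambda_dual * reg
```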