
prepended to the title, and the model is trained to
generate the text plan, as shown in Figure 1.
Surface Realization.
The surface realization task teaches the model to properly reflect the text plan in the final target. We concatenate the task prompt (e.g., “Conduct surface realization”), the title, and the corresponding plan as the input sequence, which is consumed by the model to generate the final target.
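As an illustration, the following is a minimal sketch of how such an input sequence could be assembled; the separator token and the function name are our assumptions and are not specified in the paper.

```python
def build_surface_realization_input(title: str, plan: str) -> str:
    # Concatenate the task prompt, the title, and the text plan into one
    # source sequence for the encoder; "<sep>" is an assumed separator.
    return " <sep> ".join(["Conduct surface realization", title, plan])
```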
2.3 Reviewing Task
We propose two reviewing (Review.) tasks that leverage negative samples to help the model better distinguish coherent outputs from distractors and learn to revise flawed outputs.
Revise Task.
The revise task aims to empower the model to edit flawed outputs (Wang et al., 2018). For each sample, we construct two flawed negatives: (1) randomly shuffling the target sentences, to encourage the model to learn correct sentence ordering, and (2) replacing the keyphrases in the target with random keyphrases, to promote better content organization. The model takes as input the task prompt (“Revising the Output”), the title, and the flawed output, and recovers the original target.
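For concreteness, a rough sketch of how the two flawed negatives could be constructed is shown below; the sentence splitting, the keyphrase pool, and the helper name are our assumptions.

```python
import random

def make_flawed_negatives(target_sents, target_keyphrases, keyphrase_pool):
    # (1) Shuffle the target sentences to corrupt the original ordering.
    shuffled = target_sents[:]
    random.shuffle(shuffled)
    order_negative = " ".join(shuffled)

    # (2) Replace keyphrases in the target with randomly sampled ones
    #     to corrupt the content organization.
    content_negative = " ".join(target_sents)
    for kp in target_keyphrases:
        content_negative = content_negative.replace(kp, random.choice(keyphrase_pool))

    return order_negative, content_negative
```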
Distinguishing Task.
This task requires the model to distinguish the original output from distracted ones given an input. The distracted targets are constructed with the same strategies as in the revise task. Similar to Zhou et al. (2020), the input sequence is the concatenation of the task prompt (e.g., “Which Option is Better”), the title, and an output, which is the original target with 50% probability and a distracted one otherwise. The model is trained to predict whether the output is correct by generating “positive” or “negative”. By doing so, we expect the model to prefer coherent targets and learn to generate better outputs.
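A minimal sketch of how a distinguishing sample could be built is given below, assuming a simple string separator and textual labels; these details are our assumptions.

```python
import random

def make_distinguishing_sample(title, target, distractor):
    # With 50% probability the output slot holds the original target
    # ("positive"); otherwise it holds a distracted one ("negative").
    if random.random() < 0.5:
        output, label = target, "positive"
    else:
        output, label = distractor, "negative"
    source = " <sep> ".join(["Which Option is Better", title, output])
    return source, label  # the label is generated as text by the model
```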
2.4 Joint Training with Multi-tasks
We jointly train the aforementioned objectives with shared parameters to reinforce the writing ability. Specifically, given a source-target pair (x, y), we first construct two decomposed generation samples for the text planning and surface realization tasks, respectively. We then construct two flawed samples for the revise task. Finally, for the distinguishing task, we choose the output to be the positive target with 50% probability or a distracted negative target otherwise. All objectives are converted into text-to-text transfer tasks and jointly trained to maximize the likelihood:
$\mathcal{L} = \mathcal{L}_{\text{Gen.}} + \mathcal{L}_{\text{Decomp.}} + \mathcal{L}_{\text{Review.}}$.
            Reddit/CMV   Wikiplots   NYTimes
 # Train        42,462      95,571   103,579
 # Dev           6,480       5,328     5,000
 # Test          7,562       5,404     5,000
 # Words         116.3       425.4     218.2
 # Sent.           5.5        18.0       9.1

Table 1: Statistics of the datasets. # Words denotes the average number of words in the target, and # Sent. represents the average number of sentences.
During inference, we use the end-to-end generation
task to produce final outputs.
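To make the joint objective concrete, the following is a minimal sketch of one training step with a shared T5-base model using the Hugging Face Transformers API; the batching of task samples and the simple unweighted sum of losses are our assumptions.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def task_loss(source: str, target: str) -> torch.Tensor:
    # Every objective is cast as a text-to-text task, so each sample
    # contributes a standard sequence-to-sequence cross-entropy loss.
    enc = tokenizer(source, return_tensors="pt", truncation=True)
    labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids
    return model(**enc, labels=labels).loss

def joint_step(samples):
    # samples: (source, target) pairs covering the end-to-end generation,
    # planning, realization, revise, and distinguishing tasks.
    loss = sum(task_loss(src, tgt) for src, tgt in samples)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```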
3 Experimental Setting
3.1 Datasets
We evaluate our model on three datasets from distinct domains: (1) Reddit/ChangeMyView (Reddit/CMV) for argument generation (Hua and Wang, 2020), (2) Wikiplots for story generation, and (3) New York Times for news article writing (Sandhaus, 2008). Following previous work (Rashkin et al., 2020), we further include topical keyphrases as the guidance outline: noun and verb phrases that contain at least one topic signature word (Lin and Hovy, 2000) are extracted from the targets. The title and keyphrases are concatenated as the input x. The statistics are in Table 1, and more details are in Appendix A.1.
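As a rough illustration of the keyphrase extraction step, the sketch below keeps noun chunks and verbs from the target that contain a topic signature word; the use of spaCy and the reduction of verb phrases to single verbs are our simplifying assumptions.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_keyphrases(target: str, topic_signatures: set) -> list:
    # Keep noun and verb phrases that contain at least one topic signature
    # word; the signature set itself is computed separately, e.g., with the
    # log-likelihood ratio test of Lin and Hovy (2000).
    doc = nlp(target)
    candidates = [chunk.text for chunk in doc.noun_chunks]
    candidates += [token.text for token in doc if token.pos_ == "VERB"]
    return [c for c in candidates
            if any(w in topic_signatures for w in c.lower().split())]
```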
3.2 Model Details
We use T5-base (Raffel et al., 2019) in all experiments. During training, we optimize our model with AdamW (Loshchilov and Hutter, 2017) and a learning rate of 5e-5. For decoding, we apply nucleus sampling (Holtzman et al., 2019) with k set to 10 and p set to 0.9. The maximum number of generation steps is 200 for argument generation, 512 for story generation, and 350 for NYT article generation.
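For reference, the decoding setup above can be reproduced with the Hugging Face generate API roughly as follows; the helper function and its defaults are our assumptions.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def generate(source: str, max_steps: int = 200) -> str:
    # Nucleus sampling with top-k = 10 and top-p = 0.9; max_steps is
    # 200 / 512 / 350 depending on the dataset.
    input_ids = tokenizer(source, return_tensors="pt").input_ids
    output_ids = model.generate(
        input_ids,
        do_sample=True,
        top_k=10,
        top_p=0.9,
        max_length=max_steps,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```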
Baselines.
We first consider generation models without multitask training, including GPT2 (Brown et al., 2020) and T5 (Raffel et al., 2019). We also include strong planning-based methods: (1) CONTENTPLAN, a two-step generation model (Goldfarb-Tarrant et al., 2020; Hua and Wang, 2020), where a planner first produces ordered keyphrase plans and a generator consumes the plans to generate the final outputs; (2) BOWPLAN (Kang and Hovy, 2020), which predicts keywords as the global plan to guide generation. All models are implemented with T5-base except for GPT2. More details are in Appendix A.