MOCHA: A Multi-Task Training Approach for Coherent Text Generation
from Cognitive Perspective
Zhe Hu
Baidu Inc
huzhe01@baidu.com
Hou Pong Chan
University of Macau
hpchan@um.edu.mo
Lifu Huang
Virginia Tech
lifuh@vt.edu
Abstract
Teaching neural models to generate narratively coherent texts is a critical problem. Recent pre-trained language models have achieved promising results, but there is still a gap between human-written texts and machine-generated outputs. In this work, we propose a novel multi-task training strategy for coherent text generation grounded on the cognitive theory of writing, which empowers the model to learn essential subskills needed for writing, including planning and reviewing, besides end-to-end generation. We extensively evaluate our model on three open-ended generation tasks: story generation, news article writing, and argument generation. Experiments show that our model achieves better results than strong baselines in both few-shot and fully-supervised settings, and human evaluations confirm that our model can generate more coherent outputs.
1 Introduction
With the recent development of pretraining techniques, large neural language models have achieved impressive results on various text generation tasks and can generate fluent outputs. However, when generating long-form texts (i.e., paragraphs with multiple sentences), there is still a large gap between machine-generated outputs and human-written texts: the generated outputs usually suffer from incoherence and fail to maintain overall narrative coherence (See et al., 2019).
One possible reason for the above defects is the lack of effective text planning as global guidance to control the generation process. Compared with traditional generation systems, which often decompose the generation task into text planning and surface realization (Reiter and Dale, 1997; Carenini and Moore, 2006), current autoregressive neural language models are typically trained to produce texts in a left-to-right, token-level manner, which lacks an anchored goal to constrain the generation process (Fan et al., 2019). Recent studies incorporate text planning into neural models by leveraging structured representations (Goldfarb-Tarrant et al., 2020; Hua and Wang, 2020) or latent variables (Wang et al., 2022; Hu et al., 2022) as high-level plans, but they need manually designed domain-specific plans or complicated supervision signals to train the model.
Another reason is the ineffective use of negative samples as contrasts to teach the model to better distinguish between correct and incorrect targets. Negative samples are useful for enhancing the model's ability to generate better outputs (He and Glass, 2020). Recent work explores techniques such as contrastive learning (Lee et al., 2020; Su et al., 2022; An et al., 2022) and unlikelihood training (Welleck et al., 2020; Li et al., 2020) to leverage negative samples for model training.
We draw our motivation from the cognitive process theory of writing (Flower and Hayes, 1981): "Writing is best understood as a set of distinct thinking processes which writers orchestrate or organize during the act of composing". In particular, the basic mental process of writing includes planning, translating (surface realization), and reviewing, where the reviewing process further involves the evaluating and revising subskills. Current language models are typically trained to maximize the token-level log-likelihood and thus learn to acquire all the writing skills at once. However, as stated by Bruce (1978), learning the whole set of task components for writing at once makes the learning process very hard, and they suggest that intermediate tasks are beneficial for acquiring and exercising different writing subskills.
In this work, we propose MOCHA, a Multi-task training apprOach for CoHerent text generAtion, which enriches the token-level generation objective with additional tasks specifically designed for different writing subskills grounded on the cognitive perspective. Specifically, we introduce two additional tasks needed for generating coherent outputs: (1) decomposed generation tasks that divide end-to-end generation into text planning and surface realization, and (2) reviewing tasks which leverage negative samples to enforce the model to distinguish correct from incorrect outputs and further revise flawed texts.

Figure 1: Overview of our framework. We train our model with different tasks grounded on the cognitive theory of writing: (1) the end-to-end token-level generation task; (2) decomposed generation tasks including text planning and surface generation; (3) reviewing tasks with revising flawed targets and distinguishing between correct and incorrect options.
Our work is closely related to recent multi-task training approaches (Sanh et al., 2021) that convert different tasks into text-to-text transfer with corresponding prompts. Recent work (Raffel et al., 2019) has shown that multi-task learning (MTL) with shared parameters across different tasks can effectively improve model performance on text understanding (Aribandi et al., 2021), dialogue generation (Li et al., 2021; Su et al., 2021), and structured knowledge grounding (Xie et al., 2022). Different from previous work, we study coherent long text generation with MTL to tackle the different subskills needed for writing. Experimental results show that our method outperforms strong baselines and achieves better few-shot performance than vanilla T5 on story generation, counter-argument generation, and news article writing. Human evaluation further confirms that our method can generate more coherent outputs. Data and code are available at: https://github.com/Derekkk/Mocha-EMNLP22
2 Method
Text generation is typically formulated as a sequence-to-sequence (seq2seq) transformation: $p(y|x) = \prod_{t=1}^{n} p(y_t \mid y_{1:t-1}, x)$, where $(x, y)$ is a source-target pair. We adopt the state-of-the-art model T5 (Raffel et al., 2019) as the backbone, which is an encoder-decoder Transformer. For each sample, we introduce additional training objectives to jointly improve the writing ability. Our training objectives include end-to-end generation, decomposed generation, and reviewing tasks. All tasks are converted to text-to-text transfer with a task prompt prepended to the source input. Notably, training samples of the augmented tasks can be constructed automatically, without further data labeling efforts. The overall framework is illustrated in Figure 1.
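To make the conversion concrete, the following is a minimal sketch (not the authors' released code) of how each training objective can be rendered as a prompt-prefixed text-to-text pair. The prompt strings follow Figure 1, while the helper name, argument names, and field separators are our own illustrative assumptions.

# Illustrative sketch: converts one sample into prompt-prefixed text-to-text pairs,
# mirroring the tasks in Figure 1. Names are assumptions, not the released interface.
def build_examples(title, target, plan, flawed_output, candidate, label):
    examples = []
    # End-to-end generation
    examples.append((f"Generate a coherent output. [Title] {title}", target))
    # Decomposed generation: text planning and surface realization
    examples.append((f"Produce a plan. [Title] {title}", plan))
    examples.append((f"Conduct surface realization. [Title]: {title} [Plan]: {plan}", target))
    # Reviewing: revise a flawed output, and distinguish correct from incorrect outputs
    examples.append((f"Revise the Output. [Title]: {title} [Output]: {flawed_output}", target))
    examples.append((f"Is the output positive or negative? [Title]: {title} [Output]: {candidate}", label))
    return examples

Every pair produced this way is fed to the same T5 encoder-decoder, so all tasks share parameters.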
2.1 End-to-end Generation Task
The end-to-end generation (Gen.) task is the same as the typical training objective for text generation. We prepend the source input with a task prompt (e.g., "Generate a coherent output"), and the model is trained to generate the target. However, training with only this task makes it hard to generate coherent outputs, as it couples the whole set of writing processes at once and makes training difficult. Therefore, we introduce the additional subtasks below.
2.2 Decomposed Generation Task
Generating narratively coherent outputs requires the model to conduct effective text planning to decide high-level plots, and to properly reflect the plans in the surface outputs. Thus, we propose two decomposed generation tasks (Decomp.).
Text Planning. This task requires the model to produce structured plots as high-level plans. We follow Hua and Wang (2020) and adopt ordered keyphrase chains to represent the plots. Concretely, we extract salient noun and verb phrases from the target as keyphrases, and then concatenate the keyphrases in the same order as they appear in the target to form the plan (more details are in Appendix A.2). The task prompt "Produce a plan" is prepended to the title, and the model is trained to generate the text plan, as shown in Figure 1.
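As a rough illustration of how such a plan can be assembled automatically (the paper's exact keyphrase extractor is described in Appendix A.2; the spaCy-based extraction and the <s*> separators below are assumptions on our part, loosely following Figure 1):

# Sketch of ordered keyphrase-chain construction; assumes spaCy for phrase extraction.
import spacy

nlp = spacy.load("en_core_web_sm")

def build_plan(target_text):
    doc = nlp(target_text)
    plan_segments = []
    for i, sent in enumerate(doc.sents, start=1):
        # Candidate keyphrases: noun chunks and verbs, kept in order of appearance.
        candidates = [(chunk.start, chunk.text) for chunk in sent.noun_chunks]
        candidates += [(tok.i, tok.text) for tok in sent if tok.pos_ == "VERB"]
        phrases = [text for _, text in sorted(candidates)]
        if phrases:
            plan_segments.append(f"<s{i}> " + "; ".join(phrases))
    return " ".join(plan_segments)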
Surface Realization. The surface realization task teaches the model to properly reflect the text plan in the final target. We concatenate the task prompt (e.g., "Conduct surface realization"), the title, and the corresponding plan as the input sequence, which is consumed by the model to generate the final target.
2.3 Reviewing Task
We propose two reviewing (Review.) tasks which leverage negative samples to help the model better distinguish coherent outputs from distractors, and learn to revise flawed outputs.
Revise Task. The revise task aims to empower the model to edit flawed outputs (Wang et al., 2018). For each sample, we construct two flawed negatives: (1) we randomly shuffle the target sentences to encourage the model to learn correct sentence ordering, and (2) we replace the keyphrases in the target with random keyphrases to enhance content organization. The model takes as input the task prompt ("Revise the Output"), the title, and the flawed output, and recovers the original target.
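A possible way to construct the two kinds of flawed negatives automatically is sketched below; the paper does not prescribe an implementation beyond the two corruption strategies, so the function names and replacement pool are our assumptions.

import random

def shuffle_sentences(sentences):
    # Negative type 1: permute the target sentences to break the original ordering.
    shuffled = sentences[:]
    random.shuffle(shuffled)
    return " ".join(shuffled)

def corrupt_keyphrases(target, keyphrases, random_keyphrase_pool):
    # Negative type 2: swap each gold keyphrase for a random one drawn from other samples.
    corrupted = target
    for phrase in keyphrases:
        corrupted = corrupted.replace(phrase, random.choice(random_keyphrase_pool), 1)
    return corrupted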
Distinguishing Task. This task requires the model to distinguish the original output from distracted ones given an input. The distracted targets are constructed with the same strategies as in the Revise Task. Similar to Zhou et al. (2020), the input sequence is the concatenation of the task prompt (e.g., "Which Option is Better"), the title, and an output which with 50% probability is the original target and otherwise a distracted one. The model is trained to predict whether the output is correct by generating "positive" or "negative". By doing so, we expect the model to develop a preference for coherent targets and learn to generate better outputs.
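For instance, the 50/50 sampling of positive and distracted targets can be implemented along the following lines (an illustrative sketch; the prompt string follows Figure 1 and the function name is ours):

import random

def build_distinguish_example(title, target, distracted_target):
    # With 50% probability show the gold target (label "positive"),
    # otherwise show a distracted one (label "negative").
    if random.random() < 0.5:
        candidate, label = target, "positive"
    else:
        candidate, label = distracted_target, "negative"
    source = f"Is the output positive or negative? [Title]: {title} [Output]: {candidate}"
    return source, label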
2.4 Joint Training with Multi-tasks
We jointly train the aforementioned objectives with shared parameters to reinforce the writing ability. Specifically, given a source-target pair $(x, y)$, we first construct two decomposed generation samples for the text planning and surface realization tasks, respectively. Then we construct two flawed samples for the revise task. Finally, for the distinguishing task, we choose the output to be the positive target with 50% probability and a distracted negative target otherwise. All objectives are converted to text-to-text transfer tasks and jointly trained to maximize the likelihood: $\mathcal{L} = \mathcal{L}_{\text{Gen.}} + \mathcal{L}_{\text{Decomp.}} + \mathcal{L}_{\text{Review.}}$. During inference, we use the end-to-end generation task to produce the final outputs.

Table 1: Statistics of the datasets. # Words denotes the average number of words in the target, and # Sent. represents the average number of sentences.

            Reddit/CMV   Wikiplots   NYTimes
# Train         42,462      95,571   103,579
# Dev            6,480       5,328     5,000
# Test           7,562       5,404     5,000
# Words          116.3       425.4     218.2
# Sent.            5.5        18.0       9.1
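The joint objective can be realized by mixing all prompt-prefixed examples into one training stream and fine-tuning a single T5 model on them. The sketch below uses Hugging Face Transformers and is our own minimal illustration, not the released training script.

# Minimal multi-task fine-tuning sketch with a shared T5 model (illustrative only).
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def training_step(batch_pairs):
    # batch_pairs: list of (source, target) strings drawn from all five tasks.
    sources = [s for s, _ in batch_pairs]
    targets = [t for _, t in batch_pairs]
    enc = tokenizer(sources, padding=True, truncation=True, return_tensors="pt")
    labels = tokenizer(targets, padding=True, truncation=True, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    loss = model(**enc, labels=labels).loss  # token-level cross-entropy shared across tasks
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()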
3 Experimental Setting
3.1 Datasets
We evaluate our model on three datasets from distinct domains: (1) Reddit/ChangeMyView (Reddit/CMV) for argument generation (Hua and Wang, 2020), (2) Wikiplots for story generation, and (3) New York Times for news article writing (Sandhaus, 2008). We follow previous work (Rashkin et al., 2020) to further include topical keyphrases as a guidance outline, where noun and verb phrases that contain at least one topic signature word (Lin and Hovy, 2000) are extracted from the targets. The title and keyphrases are concatenated as the input $x$. The statistics are in Table 1, and more details are in Appendix A.1.
3.2 Model Details
We use T5-base (Raffel et al., 2019) in all experiments. During training, we optimize our model with AdamW (Loshchilov and Hutter, 2017) and a learning rate of 5e-5. For decoding, we apply nucleus sampling (Holtzman et al., 2019) with k = 10 and p = 0.9. The maximum number of generation steps is 200 for argument generation, 512 for story generation, and 350 for NYT article generation.
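Concretely, decoding with these settings might look as follows in Hugging Face Transformers (a sketch; the generation prompt follows Figure 1, and the use of a fine-tuned t5-base checkpoint is assumed):

# Illustrative decoding with top-k / top-p (nucleus) sampling, as described above.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")  # fine-tuned checkpoint in practice

prompt = 'Generate a coherent output. [Title] "Objectivism" is the most optimal way to go through life...'
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    do_sample=True,
    top_k=10,
    top_p=0.9,
    max_new_tokens=200,  # 200 for argument generation; 512 / 350 for the other datasets
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))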
Baselines. We first consider generation models without multi-task training, including GPT2 (Brown et al., 2020) and T5 (Raffel et al., 2019). We also include strong planning-based methods: (1) CONTENTPLAN is a two-step generation model (Goldfarb-Tarrant et al., 2020; Hua and Wang, 2020), where a planner first produces ordered keyphrase plans and a generator consumes the plans to generate the final outputs (see the sketch below); (2) BOWPLAN (Kang and Hovy, 2020) predicts keywords as the global plan to guide the generation. All models are implemented with T5-base except for GPT2. More details are in Appendix A.
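To clarify the two-step setup of CONTENTPLAN, the following sketch chains a planner model and a generator model; it is our own illustration of the general plan-then-generate pipeline, not the baseline authors' code, and both models are assumed to be separately fine-tuned T5-base checkpoints.

# Illustrative two-step plan-then-generate pipeline for the CONTENTPLAN baseline.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
planner = T5ForConditionalGeneration.from_pretrained("t5-base")    # plan: title -> keyphrase chain
generator = T5ForConditionalGeneration.from_pretrained("t5-base")  # realize: title + plan -> text

def two_step_generate(title):
    # Step 1: produce an ordered keyphrase plan from the title.
    plan_ids = planner.generate(**tokenizer(title, return_tensors="pt"), max_new_tokens=128)
    plan = tokenizer.decode(plan_ids[0], skip_special_tokens=True)
    # Step 2: condition the generator on both the title and the predicted plan.
    realization_input = f"{title} [Plan]: {plan}"
    out_ids = generator.generate(**tokenizer(realization_input, return_tensors="pt"),
                                 do_sample=True, top_p=0.9, max_new_tokens=200)
    return tokenizer.decode(out_ids[0], skip_special_tokens=True)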