
Rather, he or she may (a) create a detailed plan,
then (b) draft each next passage of the document
according to that plan. He or she may then revise
by (c) rewriting passages entirely, and/or (d) post-
editing for finer details.
Motivated by this observation, we propose the Recursive Reprompting and Revision framework (Re³, Figure 1) to generate longer stories. While based on the human writing process, Re³ is a fully automatic system with no human intervention, unlike prior approaches which model the human writing process with a human in the loop (Goldfarb-Tarrant et al., 2019; Coenen et al., 2021; Lee et al., 2022). First, (a) Re³’s Plan module generates a plan by prompting GPT3 (Brown et al., 2020) to augment a given premise with a setting, characters, and outline. (b) Re³’s Draft module then generates each next story continuation by recursively reprompting GPT3 using a strategically crafted prompt, in a procedure which can be viewed as a generalization of chain-of-thought prompting (Kojima et al., 2022). Specifically, our prompt is dynamically reconstructed at each step by selectively manifesting contextually relevant information from the initial plan (itself generated by prompting) and the story thus far. We then divide the revision process into (c) a Rewrite module, which emulates a full rewrite by reranking alternate continuations, and (d) an Edit module, which makes smaller local edits to improve factual consistency with previous passages.
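To illustrate how the four modules fit together, the following is a minimal sketch of the overall loop. It is not the system's actual implementation: the GPT3 call and the selection, reranking, and editing helpers (gpt3_complete, select_relevant, score_coherence, score_relevance, edit_for_consistency) are hypothetical placeholders.

def gpt3_complete(prompt: str) -> str:
    # Hypothetical placeholder for a GPT3 completion call.
    return f"[model continuation for a {len(prompt)}-character prompt]"

def select_relevant(plan_text: str, story: list) -> str:
    # Placeholder: select the plan entries most relevant to the story so far.
    return plan_text

def score_coherence(story: list, candidate: str) -> float:
    # Placeholder coherence reranker.
    return 0.0

def score_relevance(premise: str, candidate: str) -> float:
    # Placeholder premise-relevance reranker.
    return 0.0

def edit_for_consistency(story: list, passage: str) -> str:
    # Placeholder: detect and patch factual inconsistencies with prior passages.
    return passage

def generate_story(premise: str, n_passages: int = 10, n_candidates: int = 4) -> str:
    # (a) Plan: prompt GPT3 to augment the premise with setting, characters, outline.
    plan = {
        "setting": gpt3_complete(f"Premise: {premise}\nDescribe the setting:"),
        "characters": gpt3_complete(f"Premise: {premise}\nList the main characters:"),
        "outline": gpt3_complete(f"Premise: {premise}\nWrite a brief plot outline:"),
    }
    story = []
    for _ in range(n_passages):
        # (b) Draft: rebuild the prompt at every step from contextually relevant
        # parts of the plan plus recent story text (recursive reprompting).
        prompt = "\n".join([
            f"Premise: {premise}",
            f"Setting: {plan['setting']}",
            f"Characters: {select_relevant(plan['characters'], story)}",
            f"Outline point: {select_relevant(plan['outline'], story)}",
            "Story so far: " + " ".join(story[-3:]),
            "Continue the story:",
        ])
        candidates = [gpt3_complete(prompt) for _ in range(n_candidates)]
        # (c) Rewrite: emulate a full rewrite by reranking alternate continuations
        # on coherence and premise relevance.
        best = max(candidates,
                   key=lambda c: score_coherence(story, c) + score_relevance(premise, c))
        # (d) Edit: make small local edits for factual consistency, then append.
        story.append(edit_for_consistency(story, best))
    return " ".join(story)

The key point of the sketch is that the draft prompt is rebuilt from the plan and the recent story at every step, rather than relying on a single fixed prompt or a rolling window over prior text.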
As an additional contribution, our Plan and Draft
modules are fully zero-shot rather than trained on
existing story datasets. Thus not only does Re³ generate stories an order of magnitude longer than
those of prior work, but it is not limited to any
particular training domain.
To evaluate Re³ for longer story generation, we compare its generated stories to similar-length stories from two GPT3-based “rolling-window” baselines (Section 4). In pairwise comparisons, human evaluators rated stories from Re³ as significantly and substantially more coherent in overarching plot (up to 14% absolute increase in the fraction deemed coherent), as well as relevant to the initial premise (up to 20%). In fact, evaluators predicted up to 83% of stories written by Re³ to be written by humans. The results indicate that Re³ can be highly effective at improving long-range coherence and premise relevance in longer story generation.²

² All code and data available at https://github.com/yangkevin2/emnlp22-re3-story-generation.
2 Related Work
Automatic Story Generation. Several previous works have modeled parts of our proposed writing process, usually one part at a time.
Most similar to our Plan module are approaches using an outline or structured schema to maintain plot coherence (Li et al., 2013; Fan et al., 2018; Yao et al., 2019; Goldfarb-Tarrant et al., 2020; Rashkin et al., 2020; Tian and Peng, 2022). Other methods for high-level planning include latent variables (Miao and Blunsom, 2016; Wang and Wan, 2019; Wang et al., 2022), coarse-to-fine slot-filling (Fan et al., 2019), and keywords and/or control codes (Peng et al., 2018; Ippolito et al., 2019; Xu et al., 2020; Lin and Riedl, 2021).
Meanwhile, our Rewrite module uses rerankers
similar to Guan et al. (2020) and Wang et al. (2020),
although we model both coherence and premise
relevance. Yu et al. (2020) iteratively edit and improve the output like our Edit module, but we additionally detect when edits are required.
We emphasize again the length of stories we aim to generate. In prior studies, out-of-the-box language models struggled to generate even very short stories (Holtzman et al., 2019; See et al., 2019). Although there exist datasets of relatively longer stories, such as WritingPrompts (Fan et al., 2018) and STORIUM (Akoury et al., 2020), many works still only focus on stories of about five sentences (Wang and Wan, 2019; Yao et al., 2019; Qin et al., 2019; Wang et al., 2022), even when using language models with hundreds of billions of parameters (Xu et al., 2020). Some challenges of generating longer stories are apparent in Wang et al. (2022): their method generates high-quality few-sentence stories, but their forced long text generations, while judged better than baselines’, remain confusing and repetitive. Moreover, maintaining long-range plot coherence, premise relevance, and factual consistency is substantially harder over multiple-thousand-word horizons.
Human-In-The-Loop Story Generation. In contrast to fully automatic approaches like Re³, several recent works have proposed human-interactive methods to maintain quality in longer stories (Coenen et al., 2021; Lee et al., 2022; Chung et al., 2022). Such works commonly combine both planning and revision systems (Goldfarb-Tarrant et al., 2019; Coenen et al., 2021). In principle, Re³ is also highly controllable via human interaction, as both our planning and revision systems operate nearly