
sional poets, we decided to include 3 major types of instructions: 1) Continuation-based instructions that suggest content when writers are blocked or unsure how to proceed; 2) Instructions on lexical constraints that enable greater control of poetic form such as rhyme, sound, and meter. These instructions force language models to obey specific choices, such as generating a line that contains a specific topic, start word, or end word, or a sentence with a particular rhyme; 3) Instructions on rhetorical devices, which are mostly used for introducing embellishments and imagery in a poem, such as metaphors, similes, and onomatopoeia.
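For concreteness, the snippet below gives one paraphrased example of each instruction type; the exact templates used for training are the ones listed in Table 1.

# Illustrative (paraphrased) examples of the three instruction types; the
# exact training templates are those in Table 1 of the paper.
EXAMPLE_INSTRUCTIONS = {
    "continuation": "Write the next line of the poem given the previous lines",
    "lexical constraint": "Write a sentence that ends with the word night",
    "rhetorical device": "Write a metaphorical sentence about love",
}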
Table 1 shows the primary instructions used to train our models. These instructions are crafted by the authors of the paper, who convert every poem line to an <instruction, poem_line> pair using rules.
Each instruction consists of a template (unique
to the instruction type) and one or more arguments,
as can be seen in Table 1. Given a poem line in
the corpus, we reverse-engineer the instruction by
picking a template and extracting the arguments
from the poem line. For continuation instructions,
we use the previous context as the argument. For
instructions on lexical constraints, we extract noun
phrases and start/end words as arguments using
NLTK for tokenization. To construct instructions
on rhymes, we use the CMU dictionary to find rhyming words.² We provide more details on how we create instructions for each particular type in Appendix A.
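As an illustration of this reverse-engineering step, the sketch below builds lexical-constraint and rhyme instructions from a single poem line. It assumes NLTK for tokenization and POS tagging and the pronouncing package (footnote 2) as the interface to the CMU dictionary; the instruction wordings are paraphrases rather than the exact Table 1 templates, and the code is not the authors' released pipeline.

# Illustrative sketch of reverse-engineering an <instruction, poem_line> pair.
# Requires the NLTK 'punkt' tokenizer and POS-tagger data, plus `pronouncing`.
import random
import nltk
import pronouncing

def lexical_instruction(poem_line: str):
    tokens = nltk.word_tokenize(poem_line)
    words = [t for t in tokens if t.isalpha()]
    if not words:
        return None
    # Candidate arguments: start word, end word, and a noun as a simple topic.
    nouns = [w for w, tag in nltk.pos_tag(tokens) if tag.startswith("NN")]
    candidates = [
        f"Write a sentence that starts with the word {words[0]}",
        f"Write a sentence that ends with the word {words[-1]}",
    ]
    if nouns:
        candidates.append(f"Write a sentence about {random.choice(nouns)}")
    return random.choice(candidates), poem_line

def rhyme_instruction(poem_line: str):
    end_word = nltk.word_tokenize(poem_line)[-1].lower()
    rhymes = pronouncing.rhymes(end_word)  # CMU-dictionary rhymes of the end word
    if not rhymes:
        return None
    return (f"Write a sentence that ends in a word which rhymes with "
            f"{random.choice(rhymes)}", poem_line)

print(lexical_instruction("The stars keep vigil through the silent night"))
print(rhyme_instruction("The stars keep vigil through the silent night"))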
To allow models to adapt to linguistic variations of the instruction templates, we also include paraphrases of the instruction templates, e.g., instead of “Write” we also use “Generate”, or instead of “Write a sentence about” we use “Write a sentence that contains the word” or “Write a sentence that includes the word”. In total, our dataset consists of 873,574 <instruction, poem_line> pairs, which we randomly split into 808,180 train and 65,394 held-out validation examples.³ We evaluate performance on three test sets of hand-crafted instructions of varying difficulty (Section 3.2).
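A minimal sketch of these two steps (paraphrasing the templates and splitting the pairs) is given below; the paraphrase inventory shown is only the pairs mentioned above, and the data-loading placeholder is hypothetical.

# Minimal sketch of template paraphrasing and the random train/validation split.
import random

PARAPHRASES = {
    "Write a sentence about": ["Write a sentence that contains the word",
                               "Write a sentence that includes the word"],
    "Write": ["Generate"],
}

def paraphrase(instruction: str) -> str:
    # Check longer template prefixes first so the more specific one wins.
    for prefix in sorted(PARAPHRASES, key=len, reverse=True):
        if instruction.startswith(prefix):
            variant = random.choice([prefix] + PARAPHRASES[prefix])
            return variant + instruction[len(prefix):]
    return instruction

pairs = [...]  # placeholder for the 873,574 <instruction, poem_line> pairs
random.shuffle(pairs)
train, valid = pairs[:808_180], pairs[808_180:]  # 808,180 train / 65,394 validation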
² https://pypi.org/project/pronouncing/
³ Our dataset is publicly available at https://github.com/vishakhpk/creative-instructions.
3 How Well Do LLMs Follow Instructions?
In this section, we first describe our models and
baselines, followed by the evaluation results using
both automatic metrics (Section 3.3) and human
evaluation (Section 3.4).
3.1 Experiment Setup
Model Details
We finetune the pretrained T5 (Raffel et al., 2020) and T0 (Sanh et al., 2021) models from HuggingFace (Wolf et al., 2019) on the collected data (Section 2) to produce the output given the instruction, using cross-entropy loss. We report results on finetuned T5-3B, T5-11B, and T0-3B models, which are henceforth referred to as T5-3B-poem, T5-11B-poem, and T0-3B-poem. We select hyperparameters based on the validation loss: for T5-11B-poem, we use the Adam optimizer with a learning rate of 1e-4; for T5-3B-poem and T0-3B-poem, we use the Adafactor optimizer with a learning rate of 1e-3. Each model is trained for 3 epochs with early stopping based on validation loss. We finetune all models on an A100 GPU and use DeepSpeed (Rasley et al., 2020) integration for the 11B model. During finetuning, we restrict the maximum sequence length of both the source and the target to 64 tokens (via truncation).⁴
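A rough sketch of this finetuning setup using the HuggingFace Trainer API is shown below. The script, toy data, and output path are our own illustrative assumptions, while the hyperparameters (64-token truncation, 3 epochs, early stopping on validation loss, Adafactor with learning rate 1e-3 for the 3B models) follow the description above.

# Hypothetical finetuning sketch, not the authors' released training code.
# The 11B model instead uses Adam with lr 1e-4 plus DeepSpeed integration.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, EarlyStoppingCallback,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

MODEL_NAME = "t5-3b"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def preprocess(example):
    # Source = instruction, target = poem line; both truncated to 64 tokens.
    inputs = tokenizer(example["instruction"], max_length=64, truncation=True)
    labels = tokenizer(example["poem_line"], max_length=64, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

# Toy stand-in for the 808,180-train / 65,394-validation splits.
toy = Dataset.from_dict({
    "instruction": ["Write a sentence that ends with the word night"],
    "poem_line": ["The stars keep vigil through the silent night"],
})
train_ds = toy.map(preprocess, remove_columns=toy.column_names)
valid_ds = train_ds

args = Seq2SeqTrainingArguments(
    output_dir="t5-3b-poem",
    num_train_epochs=3,
    learning_rate=1e-3,
    optim="adafactor",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=valid_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=1)],
)
trainer.train()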
At inference time, we generate output sequences using top-k sampling with k = 5 and a temperature of 0.7, per recommendations from earlier work in open-ended creative text generation (Fan et al., 2018; Holtzman et al., 2020; Padmakumar and He, 2022).
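Concretely, this decoding setup corresponds to a generate call along the following lines; the checkpoint path and the instruction string are illustrative placeholders.

# Decoding sketch: top-k sampling with k=5 and temperature 0.7.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-3b-poem")   # illustrative path
model = AutoModelForSeq2SeqLM.from_pretrained("t5-3b-poem")

instruction = "Write a sentence that rhymes with the word bright"
inputs = tokenizer(instruction, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, do_sample=True, top_k=5,
                             temperature=0.7, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))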
Baselines
We compare our finetuned models with two other models: (i) the T0pp model (Sanh et al., 2021), trained on instruction-based prompts from 49 datasets;⁵ and (ii) the 175B davinci variant of InstructGPT (Ouyang et al., 2022), which is trained on human-written instructions on diverse tasks in a human-in-the-loop fashion. Given an instruction, we generate text directly (i.e., zero-shot) from T0pp using top-k sampling (Fan et al., 2018).
For InstructGPT, we evaluate in both zero-shot and few-shot settings. For zero-shot, the prompt consists of only the instruction. For few-shot, the prompt consists of 26 <instruction,
⁴ The length limit is chosen to avoid memory explosion. It has minimal impact on model performance since most verses are shorter.
⁵ These include question-answering, summarization, structure-to-text generation, and sentiment and topic classification tasks, but no explicit creative writing tasks.