
pw/aw          L   Context-Situated Pun for (hunts, deer)
hedges/edges   1   Why is the hunter so good at hunting deer? Because he hunts life on the hedges.
husky/husk     0   -
catch/catch    1   He hunts deer but the catch is that they rarely show up.
pine/pine      1   Hunting deer in the forest always makes him pine for the loss.
boar/bore      1   He is so mundane about hunting deer, but it is hardly a boar.
jerky/jerky    1   What do you call an erratic deer that is being hunted? Jerky.

Table 2: Example annotations from the CUP dataset. Labels L indicate whether the annotator was able to write a pun given the context and pun pair.
3 CUP Dataset
Motivation. The largest and most commonly used dataset in the pun generation community is the SemEval 2017 Task 7 dataset (Miller et al., 2017).[2] Under our setting of context-situated pun generation, we can utilize keywords from the puns themselves as context. However, the majority of pun pairs occur only once in the SemEval dataset, while one given context could have been compatible with many other pun pairs. For example, given the context (beauty school, class), the original pun in the SemEval dataset uses the homographic pun pair (makeup, makeup) and says: "If you miss a class at beauty school you'll need a makeup session." At the same time, a creative human can use the heterographic pun pair (dyed, die) to instead generate "I inhaled so much ash from the eye shadow palette at the beauty school class – I might have dyed a little inside." Because of this limitation of the SemEval dataset, we need a dataset that has a diverse set of pun pairs combined with given contexts. Furthermore, the dataset should be annotated to indicate whether a combination of context words and pun pair is suitable for making context-situated puns.
Data Preparation. We sample puns that contain both sense annotations and pun word annotations from SemEval 2017 Task 7. We show two examples of heterographic and homographic puns and their annotations from the SemEval dataset in Table 1. From this set, we sample from the 500 most frequent (pw, aw) pairs and randomly sample 100 unique context words C.[3] Combining the sampled pun pairs and context words, we construct 4,552 (C, pw, aw) instances for annotation.
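As a rough sketch of this construction step, the following Python snippet pairs the sampled pun pairs with sampled context words; all names here (e.g., build_candidate_instances, pun_pairs, context_words) are our own illustration, not the authors' released code.

```python
import random
from itertools import product

def build_candidate_instances(pun_pairs, context_words,
                              n_pairs=500, n_contexts=100, seed=0):
    """Pair sampled (pw, aw) pun pairs with sampled context words C.

    `pun_pairs` is assumed to be a list of (pw, aw) tuples ordered by
    frequency in SemEval 2017 Task 7; `context_words` is a pool of
    candidate context words. Both are illustrative inputs.
    """
    rng = random.Random(seed)
    top_pairs = pun_pairs[:n_pairs]                    # 500 most frequent pun pairs
    contexts = rng.sample(context_words, n_contexts)   # 100 random context words
    # The full cross product is much larger than the 4,552 instances the
    # paper reports, so some additional sampling/filtering (not specified
    # in this section) is applied on top of a pairing like this one.
    return [(c, pw, aw) for c, (pw, aw) in product(contexts, top_pairs)]
```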
Annotation. For our annotation task, we asked annotators to indicate whether they could come up with a pun, using the pun pair (pw, aw), that is situated in a given context C and supports both senses S_pw and S_aw. If an annotator indicated that they could create such a pun, we then asked them to write down the pun they came up with. We also asked annotators to rate how difficult it was to come up with the pun on a scale of 1 to 5, where 1 means very easy and 5 means very hard.[4] To aid in writing puns, we also provided four T5-generated puns as references.[5]

We deployed our annotation task on Amazon Mechanical Turk using a pool of 250 annotators with whom we have collaborated in the past and who had previously been identified as good annotators. Each HIT contained three (C, pw, aw) tuples, and we paid one US dollar per HIT.[6] To ensure dataset quality, we manually checked the annotations and accepted HITs from annotators who tended not to skip all the annotations (i.e., did not mark everything as "cannot come up with a pun"). After iterative communication and manual examination, we narrowed the pool down to three annotators whom we marked as highly creative to work on the annotation. To check inter-annotator agreement, we collected multiple annotations for 150 instances and measured agreement using Fleiss' kappa (Fleiss and Cohen, 1973), obtaining κ = 0.43, which suggests moderate agreement.
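For reference, agreement on such binary compatibility labels can be computed with statsmodels; the ratings array below is a made-up toy example, not the 150 doubly-annotated CUP instances.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Toy example: compatibility labels (0 = incompatible, 1 = compatible)
# from three annotators on five (C, pw, aw) instances.
# Shape: (n_instances, n_raters).
ratings = np.array([
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
])

# Convert per-rater labels into per-category counts, the input format
# expected by fleiss_kappa, then compute the agreement score.
counts, _ = aggregate_raters(ratings)
print(f"Fleiss' kappa: {fleiss_kappa(counts, method='fleiss'):.2f}")
```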
Statistics. After annotation, we ended up with 2,753 (C, pw, aw) tuples annotated as compatible and 1,798 annotated as incompatible. For the 2,753 compatible tuples, we additionally collected human-written puns from annotators. The number of puns we collected exceeds the number of puns in SemEval 2017 Task 7 that have annotated pun word and alternative word sense annotations (2,396 puns). The binary compatibility labels and human-written puns comprise our resulting dataset, CUP (Context SitUated Puns). Table 2 shows examples of annotations in CUP.
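To illustrate the shape of the resulting data, one CUP instance can be represented roughly as follows; the field names are our own and may differ from the released files.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CupInstance:
    """One annotated (C, pw, aw) tuple from CUP (illustrative schema)."""
    context: str                # context words C, e.g. "hunts, deer"
    pun_word: str               # pw
    alternative_word: str       # aw
    compatible: bool            # label L: could the annotator write a pun?
    pun: Optional[str] = None   # human-written pun, present when compatible

# Example mirroring the boar/bore row of Table 2.
example = CupInstance(
    context="hunts, deer",
    pun_word="boar",
    alternative_word="bore",
    compatible=True,
    pun="He is so mundane about hunting deer, but it is hardly a boar.",
)
```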
[2] https://alt.qcri.org/semeval2017/task7/. The data is released under the CC BY-NC 4.0 license (https://creativecommons.org/licenses/by-nc/4.0/legalcode).
[3] We sample a limited number of context words to keep the scale of data annotation feasible.
[4] Full annotation guidelines are in Appendix D.
[5] Annotators find it extremely hard to come up with puns from scratch; providing generated texts greatly eases the task.
[6] This translates to well over $15/hr.