Context-Situated Pun Generation
Jiao Sun1, Anjali Narayan-Chen2, Shereen Oraby2, Shuyang Gao2,
Tagyoung Chung2, Jing Huang2, Yang Liu2, Nanyun Peng2,3
1University of Southern California
2Amazon Alexa AI
3University of California, Los Angeles
jiaosun@usc.edu
{naraanja,orabys,shuyag,tagyoung,jhuangz,yangliud}@amazon.com
violetpeng@cs.ucla.edu
Abstract
Previous work on pun generation commonly begins with a given pun word (a pair of homophones for heterographic pun generation and a polyseme for homographic pun generation) and seeks to generate an appropriate pun. While this may enable efficient pun generation, we believe that a pun is most entertaining if it fits appropriately within a given context, e.g., a given situation or dialogue. In this work, we propose a new task, context-situated pun generation, where a specific context represented by a set of keywords is provided, and the task is to first identify suitable pun words that are appropriate for the context, then generate puns based on the context keywords and the identified pun words. We collect CUP (Context-sitUated Pun), containing 4.5k tuples of context words and pun pairs. Based on the new data and setup, we propose a pipeline system for context-situated pun generation, including a pun word retrieval module that identifies suitable pun words for a given context, and a generation module that generates puns from context keywords and pun words. Human evaluation shows that 69% of our top retrieved pun words can be used to generate context-situated puns, and our generation module yields successful puns 31% of the time given a plausible tuple of context words and pun pair, almost tripling the yield of a state-of-the-art pun generation model. With an end-to-end evaluation, our pipeline system with the top-1 retrieved pun pair for a given context can generate successful puns 40% of the time, better than all other modeling variations but 32% lower than the human success rate. This highlights the difficulty of the task and encourages more research in this direction.
1 Introduction
Figure 1: Context-situated pun generation aims to find relevant pun words to generate puns within a given context. We propose a unified framework to generate both homographic and heterographic puns; examples shown here are human-written puns from our corpus.

Pun generation is a challenging creative generation task that has attracted some recent attention in the research community (He et al., 2019; Yu et al., 2018, 2020; Mittal et al., 2022; Horri, 2011). As one of the most important ways to communicate humor (Abbas and Dhiaa, 2016), puns can help relieve anxiety, avoid painful feelings and facilitate learning (Buxman, 2008). At the same time, spontaneity is the twin concept of creativity (Moreno, 1955), which means the context matters greatly for making an appropriate and funny pun.

* Work done during Jiao's internship at Amazon.
† Work done while Shuyang was at Amazon.
Existing work on pun generation mainly focuses on generating puns given a pair of pun-alternative words or senses (we call it a pun pair). Specifically, in heterographic pun generation, systems generate puns using a pair of homophones involving a pun word and an alternative word (He et al., 2019; Yu et al., 2020; Mittal et al., 2022). Alternatively, in homographic pun generation, systems generate puns that must support both given senses of a single polysemous word (Yu et al., 2018; Luo et al., 2019; Tian et al., 2022). Despite the great progress that has been made under such experimental settings, real-world applications for pun generation (e.g., in dialogue systems or creative slogan generation) rarely have these pun pairs provided.

arXiv:2210.13522v1 [cs.CL] 24 Oct 2022

| Type | Pun | pw/aw | Context C | Spw | Saw |
|------|-----|-------|-----------|-----|-----|
| het. | Two construction workers had a staring contest. | stair/stare | construction, workers | support consisting of a place to rest the foot while ascending or descending a stairway | look at with fixed eyes |
| het. | "I've stuck a pin through my nose", said Tom punctually. | punctually/puncture | pin, nose | at the expected or proper time | a small hole made by a sharp object |
| hom. | A new type of broom came out, it is sweeping the country. | sweep/sweep | broom, nation | sweep with a broom or as if with a broom | win an overwhelming victory in or on |
| hom. | If you sight a whale, it could be a fluke. | fluke/fluke | whale | a stroke of luck | either of the two lobes of the tail of a cetacean |

Table 1: Two examples each of heterographic puns and homographic puns in the SemEval 2017 Task 7 dataset. We construct context C by extracting keywords from the pun and excluding the pun word pw. Word sense information Spw and Saw is retrieved from WordNet based on SemEval annotated senses.
Instead, puns need to be generated given a more naturally-occurring conversational or creative context, requiring the identification of a pun pair that is relevant and appropriate for that context. For example, given a conversation turn "How was the magic show?", a context-situated pun response might be, "The magician got so mad he pulled his hare out." Motivated by real-world applications and the theory that the funniness of a pun heavily relies on the context, we formally define and introduce a new setting for pun generation, which we call context-situated pun generation: given a context represented by a set of keywords, the task is to generate puns that fit the given context (Figure 1).
Our contributions are as follows:
- We introduce a new setting of context-situated pun generation.

- To facilitate research in this direction, we collect a large-scale corpus called CUP (Context-sitUated Pun), which contains 4,551 tuples of context keywords and an associated pun pair, each labelled with whether they are compatible for composing a pun. If a tuple is compatible, we additionally collect a human-written pun that incorporates both the context keywords and the pun word.¹

- We build a pipeline system with a retrieval module to predict proper pun words given the current context, and a generation module to incorporate both the context keywords and the pun word to generate puns. Our system serves as a strong baseline for context-situated pun generation.

¹ Resources will be available at: https://github.com/amazon-research/context-situated-pun-generation
2 Task Formulation
Preliminaries. Ambiguity is the key to pun generation (Ritchie, 2005). First, we define the term pun pair as used in our work. For heterographic pun generation, there exists a pair of homophones, which we call the pun word (pw) and the alternative word (aw). While only pw appears in the pun, the meanings of both pw and aw are supported in the pun sentence. Therefore, the input of heterographic pun generation can be written as (pw, Spw, aw, Saw), where Spw and Saw are the senses of the pun word and alternative word, respectively. We refer to these as pun pairs, and use the shorthand (pw, aw) for simplicity. For homographic pun generation, the pun word is a polyseme that has two meanings; here, we can use the same representation, with pw = aw for homographic puns.
Formulation. Given the unified representation for heterographic and homographic puns, we define the task of context-situated pun generation as follows: given a context C, which can be a sentence or a list of keywords, find a pun pair (pw, Spw, aw, Saw) that is suitable for generating a pun, then generate a pun using the chosen pun pair situated in the given context. In this work, we assume we are given a fixed set of pun pair candidates (Pw, Aw) from which (pw, aw) are retrieved. The unified format between heterographic and homographic puns makes it possible for us to propose a unified framework for pun generation.
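The unified (pw, Spw, aw, Saw) representation above can be sketched as a small data structure. This is an illustrative sketch with hypothetical class and field names, not the authors' code; the example senses are taken from Table 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PunPair:
    """Unified pun-pair representation covering both pun types."""
    pw: str    # pun word (the word that appears in the pun)
    s_pw: str  # sense gloss S_pw of the pun word
    aw: str    # alternative word (a homophone, or pw itself)
    s_aw: str  # sense gloss S_aw of the alternative word

    @property
    def is_homographic(self) -> bool:
        # Homographic puns reuse a single polyseme, so pw == aw.
        return self.pw == self.aw

# Examples from Table 1: one heterographic and one homographic pair.
stair = PunPair("stair", "support to rest the foot on a stairway",
                "stare", "look at with fixed eyes")
sweep = PunPair("sweep", "sweep with a broom or as if with a broom",
                "sweep", "win an overwhelming victory in or on")
assert not stair.is_homographic
assert sweep.is_homographic
```

A context-situated instance is then simply a set of context keywords C paired with a PunPair, matching the (C, pw, aw) tuples annotated in Section 3.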
| pw/aw | L | Context-Situated Pun for {hunts, deer} |
|-------|---|----------------------------------------|
| hedges/edges | 1 | Why is the hunter so good at hunting deer? Because he hunts life on the hedges |
| husky/husk | 0 | - |
| catch/catch | 1 | He hunts deer but the catch is that they rarely show up. |
| pine/pine | 1 | Hunting deer in the forest always makes him pine for the loss. |
| boar/bore | 1 | He is so mundane about hunting deer, but it is hardly a boar. |
| jerky/jerky | 1 | What do you call an erratic deer that is being hunted? Jerky |

Table 2: Example annotations from the CUP dataset. Labels L indicate whether the annotator was able to write a pun given the context and pun pair.
3 CUP Dataset
Motivation. The largest and most commonly-used dataset in the pun generation community is the SemEval 2017 Task 7 dataset (Miller et al., 2017).² Under our setting of context-situated pun generation, we can utilize keywords from the puns themselves as context. However, the majority of pun pairs only occur once in the SemEval dataset, while one given context could have been compatible with many other pun pairs. For example, given the context beauty school, class, the original pun in the SemEval dataset uses the homographic pun pair (makeup, makeup) and says: "If you miss a class at beauty school you'll need a makeup session." At the same time, a creative human can use the heterographic pun pair (dyed, die) to instead generate "I inhaled so much ash from the eye shadow palette at the beauty school class – I might have dyed a little inside." Because of this limitation of the SemEval dataset, we need a dataset that has a diverse set of pun pairs combined with given contexts. Furthermore, the dataset should be annotated to indicate whether the context words and pun pair combination is suitable for making context-situated puns.
Data Preparation. We sample puns that contain both sense annotations and pun word annotations from SemEval Task 7. We show two examples each of heterographic and homographic puns and their annotations from the SemEval dataset in Table 1. From this set, we sample from the 500 most frequent (pw, aw) pairs and randomly sample 100 unique context words C.³ Combining the sampled pun pairs and context words, we construct 4,552 (C, pw, aw) instances for annotation.

² https://alt.qcri.org/semeval2017/task7/. The data is released under the CC BY-NC 4.0 license (https://creativecommons.org/licenses/by-nc/4.0/legalcode).
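As a toy illustration of this construction step, crossing sampled pun pairs with sampled context keyword sets yields the (C, pw, aw) instances. The lists below reuse examples from Table 1; they are not the actual sampled data, and the cross-product shown here stands in for the paper's sampling over 500 pairs and 100 contexts.

```python
from itertools import product

# Toy pun pairs and context keyword sets drawn from Table 1.
pun_pairs = [("stair", "stare"), ("punctually", "puncture"),
             ("sweep", "sweep"), ("fluke", "fluke")]
contexts = [("construction", "workers"), ("pin", "nose"),
            ("broom", "nation"), ("whale",)]

# Cross every context with every pun pair to form annotation
# instances (C, pw, aw).
instances = [(C, pw, aw) for C, (pw, aw) in product(contexts, pun_pairs)]
print(len(instances))  # 4 contexts x 4 pairs = 16
```

Each resulting tuple is then sent to annotators, who judge whether the pair can compose a pun situated in that context.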
Annotation. For our annotation task, we asked annotators to indicate whether they could come up with a pun, using pun pair (pw, aw), that is situated in a given context C and supports both senses Spw and Saw. If an annotator indicated that they could create such a pun, we then asked the annotator to write down the pun they came up with. Meanwhile, we asked annotators how difficult it was for them to come up with the pun on a scale of 1 to 5, where 1 means very easy and 5 means very hard.⁴ To aid in writing puns, we also provided four T5-generated puns as references.⁵

We deployed our annotation task on Amazon Mechanical Turk using a pool of 250 annotators with whom we have collaborated in the past and who had previously been identified as good annotators. Each HIT contained three (C, pw, aw) tuples and we paid one US dollar per HIT.⁶ To ensure dataset quality, we manually checked the annotations and accepted HITs from annotators who tended not to skip all the annotations (i.e., did not mark everything as "cannot come up with a pun"). After iterative communication and manual examination, we narrowed down and selected three annotators that we marked as highly creative to work on the annotation. To check inter-annotator agreement, we collected multiple annotations for 150 instances and measured agreement using Fleiss' kappa (Fleiss and Cohen, 1973) (κ = 0.43), suggesting moderate agreement.
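For reference, Fleiss' kappa can be computed from per-item category counts as sketched below. This is a generic textbook implementation applied to toy ratings, not the paper's evaluation code or data.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for ratings: a list of per-item rows, where each
    row holds the count of raters assigning each category, and every
    row sums to the same number of raters n."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    n_total = n_items * n_raters
    # Overall proportion of assignments to each category.
    p_j = [sum(row[j] for row in ratings) / n_total for j in range(n_cats)]
    # Per-item observed agreement among rater pairs.
    P_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    P_bar = sum(P_i) / n_items          # mean observed agreement
    P_e = sum(p * p for p in p_j)       # chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Toy example: three raters label five (C, pw, aw) tuples as
# incompatible (0) or compatible (1); rows are counts [label 0, label 1].
toy = [[0, 3], [1, 2], [3, 0], [2, 1], [0, 3]]
print(round(fleiss_kappa(toy), 2))  # → 0.44
```

Values around 0.4–0.6 are conventionally read as moderate agreement, which is the interpretation used above for κ = 0.43.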
Statistics. After annotation, we ended up with 2,753 (C, pw, aw) tuples annotated as compatible and 1,798 annotated as incompatible. For the 2,753 compatible tuples, we additionally collected human-written puns from annotators. The number of puns we collected exceeds the number of puns in SemEval 2017 Task 7 that carry pun word and alternative word sense annotations (2,396 puns). The binary compatibility labels and human-written puns comprise our resulting dataset, CUP (Context-sitUated Pun). Table 2 shows examples of annotations in CUP.
³ We sample a limited number of context words to keep the scale of data annotation feasible.
⁴ Full annotation guidelines are in Appendix D.
⁵ Annotators find it extremely hard to come up with puns from scratch; generated texts greatly ease the pain.
⁶ This translates to well over $15/hr.