homophonic pun data to learn the structure of a
pun – the type of each word in the sentence, which
could be one of ‘A’ – ambiguous, ‘D1’ – distinct to
the pun word, or ‘D2’ – distinct to the alternative
word. One challenge, however, is that there are
no ground-truth labels. To this end, we collect a
small set of human annotations and boost from
weak, unsupervised models to stronger, supervised
models. At inference time, a label predictor is used
to guide a base GPT-2 model to generate puns. At
each generation step, we re-score the tokens generated by the base language
model according to the predicted type, unless the label predictor's
confidence falls below a set threshold. Our
model outperforms existing baselines for both pun
types.
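To make this decoding step concrete, below is a minimal sketch of the label-guided re-scoring, written against a base GPT-2 from Hugging Face; the helper type_scores, the mixing weight alpha, and the threshold value are illustrative assumptions rather than the exact implementation.

    # Minimal sketch of label-guided decoding; type_scores, alpha, and
    # threshold are illustrative assumptions, not the exact implementation.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    def rescore_step(input_ids, predicted_label, label_confidence,
                     type_scores, threshold=0.5, alpha=2.0):
        """One generation step: re-score GPT-2's next-token logits by how
        compatible each vocabulary token is with the predicted word type
        ('A', 'D1', or 'D2'), unless the label predictor is not confident."""
        with torch.no_grad():
            logits = model(input_ids).logits[0, -1]  # next-token logits
        if label_confidence >= threshold:
            # type_scores(label) is a hypothetical [vocab_size] tensor of
            # compatibility scores in (0, 1]; add its log to the logits.
            logits = logits + alpha * torch.log(type_scores(predicted_label))
        return torch.argmax(logits).view(1, 1)  # greedy pick for illustration

Greedy selection is used here only for brevity; any sampling scheme could equally be applied to the re-scored logits.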
2 Related Works
Linguistic traits of puns. Kao et al. (2016) decompose puns into two
dimensions — ambiguity of meaning and distinctiveness of viewpoints — and
show that ambiguity is useful in distinguishing puns from non-puns, while
distinctiveness is useful in separating good, funny puns from bad or boring
ones. To the best of our knowledge,
we are the first to formally incorporate the famous
ambiguity-distinctiveness principle to guide pun
generation. In addition, He et al. (2019) propose the local-global surprisal
principle to measure the humorous effect aroused when a word appears
unexpectedly in the local context but makes sense given the global context;
building on this principle, we improve the way surprise is introduced in
generation.
Pun generation. Existing works on pun generation often rely on naive
intuitions of semantic ambivalence. For example, Yu et al. (2018) and Luo
et al. (2019) promote the ambivalence of the pun word via a constrained
language model and reinforcement learning; others find related words to
support semantic ambiguity (Yu et al., 2020; Mittal et al., 2022). However,
these systems lack serious theoretical backbones, and therefore none could
evaluate their generated results with regard to the proposed intuitions.
Moreover, relying on 'ambivalence' alone leads to generic, boring word
choices and short outputs. By incorporating distinctiveness and surprise, we
ensure that the generated puns are informative and interesting.
One reason that previous works leverage such simple intuitions to generate
puns (He et al., 2019; Yu and Wan, 2019; Yu et al., 2020) is that the small
size of existing pun corpora (Miller et al., 2017; Sun et al., 2022a) makes
it impractical to train generation models end-to-end on human-written puns.
We hence propose to learn the structure of puns instead of the actual texts,
which requires far less training data. Finally, all previous works (except a
concurrent one, Sun et al., 2022b) can only generate either homographic puns
or homophonic puns. Leveraging the shared structure of puns regardless of
pun type, our model can generate both pun types.
3 Methodology
The input to our system is a pun word-alternative word pair (pw-aw, e.g.,
soled-sold), and the target output is a high-quality pun sentence that
contains pw, e.g., 'The leather boots he was wearing were heavily abraded,
and were soled at the store at half price.' In this section, we first
describe the three components used to generate homophonic puns, as shown in
Figure 1: a context word and phrase selector, a label predictor and the
procedure for curating its training signals, and the generation module
(Sections 3.1 to 3.3). We then migrate the whole system to homographic puns
in Section 3.4.
3.1 Obtaining Context Words and Phrases
We retrieve and select two things: a context word
that supports the meaning of the pun word, and a
phrase that is both characteristic of the alternative
word and compatible with the pun word.
Inspired by He et al. (2019), given a pun-alternative word pair (pw-aw), we
look for an ideal phrase that contains aw and replace it with pw to arouse
surprise. To this end, we first extract multiple (N1 = 20) phrases that
contain aw from a large non-pun corpus consisting of 20,000 sentences from
Wikipedia and the Gutenberg BookCorpus (Lebert, 2009), and rank the phrases
by how well they exhibit the semantics of the pun pair. Specifically, we
first replace aw with a '<mask>' token and run RoBERTa-Large (Liu et al.,
2019) to obtain the probability of aw in the masked position. We remove the
less probable half, filtering out phrases that are less characteristic of
aw, as shown in Table 1. Next, we conduct a similar mask-infilling procedure
for pw, and select the middle-ranking phrase to avoid it being either too
general (e.g., 'a new taxi was created') or too incompatible (e.g., 'an
export taxi on agricultural products'). These two rankings ensure that the
final selected phrase arouses surprisal when people see pw instead of aw,
while still reading as reasonable.
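As an illustration, the sketch below shows one way this two-stage ranking could be realized with off-the-shelf mask filling; the function names and the single-subword assumption are ours, not the paper's released code.

    # A sketch of the two-stage phrase ranking with RoBERTa mask filling.
    # Function names and the single-token assumption are illustrative.
    import torch
    from transformers import RobertaForMaskedLM, RobertaTokenizer

    tok = RobertaTokenizer.from_pretrained("roberta-large")
    mlm = RobertaForMaskedLM.from_pretrained("roberta-large")

    def mask_fill_prob(phrase, target, word):
        """Probability RoBERTa assigns to `word` when `target` in `phrase`
        is replaced by <mask> (assumes `word` maps to a single subword)."""
        masked = phrase.replace(target, tok.mask_token, 1)
        inputs = tok(masked, return_tensors="pt")
        mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
        with torch.no_grad():
            logits = mlm(**inputs).logits[0, mask_pos]
        word_id = tok(" " + word, add_special_tokens=False).input_ids[0]
        return torch.softmax(logits, dim=-1)[word_id].item()

    def select_phrase(phrases, aw, pw):
        # Keep the half of the phrases most characteristic of the
        # alternative word ...
        kept = sorted(phrases, key=lambda p: mask_fill_prob(p, aw, aw),
                      reverse=True)[: len(phrases) // 2]
        # ... then rank them under the pun word and take the middle one,
        # avoiding both over-generic and incompatible contexts.
        kept.sort(key=lambda p: mask_fill_prob(p, aw, pw), reverse=True)
        return kept[len(kept) // 2]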
For obtaining the context words, our idea is similar to that proposed by
Mittal et al. (2022). We