
Passage: Peel and cut apples into wedges. Press apple wedges partly into batter. Combine sugar and
cinnamon. Sprinkle over apple. Bake at 425 degF for 25 to 30 minutes.
Dense Paraphrased (DP’ed) Passage:
Using peeler, peel apples, resulting in peeled apples; and using knife on cutting board, cut peeled
apples into peeled wedges.
Using hands, press peeled apple wedges partly into batter in the cake pan.
Combine sugar and cinnamon in a bowl, resulting in cinnamon sugar.
Sprinkle cinnamon sugar over peeled apple wedges in batter in cake pan, resulting in appelkoek.
In oven, bake appelkoek at 425 degF for 25 to 30 minutes, resulting in baked appelkoek.
Table 1: Example recipe passage. Color-coded text spans represent locations of cooking events in the
input text where Dense Paraphrases (DPs) are generated to enrich local context. Underlined text shows a
chain of coreferential entities for the ingredient “apple”.
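To make the structure of this enrichment concrete, a minimal sketch of how the Table 1 example could be held in memory is given below; the classes and field names are purely illustrative and are not the annotation format used in this work.

# Illustrative only: a possible in-memory view of the Table 1 example.
# Field names are hypothetical, not this paper's annotation schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CookingEvent:
    source_span: str          # surface text of the event in the recipe
    dense_paraphrase: str     # DP with recovered arguments and results
    coref_chain: List[str] = field(default_factory=list)

# Coreference chain for the ingredient "apple" (the underlined spans).
apple_chain = ["apples", "peeled apples", "peeled apple wedges",
               "peeled apple wedges in batter in cake pan"]

events = [
    CookingEvent("Peel and cut apples into wedges.",
                 "Using peeler, peel apples, resulting in peeled apples; and "
                 "using knife on cutting board, cut peeled apples into wedges.",
                 apple_chain),
    CookingEvent("Bake at 425 degF for 25 to 30 minutes.",
                 "In oven, bake appelkoek at 425 degF for 25 to 30 minutes, "
                 "resulting in baked appelkoek."),
]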
seen as undifferentiated choices over surface constructional forms of an expression, the resulting theory can be called a paraphrase grammar (Hiż, 1964; Smaby, 1971; Culicover, 1968). Formally, a paraphrase is a relation between two lexical, phrasal, or sentential expressions, E_i and E_j, where meaning is preserved (Smaby, 1971).
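One way to make this relation explicit, assuming an interpretation function [[.]] that maps an expression to its meaning (our notation, not taken from the cited works), is:

\mathrm{Paraphrase}(E_i, E_j) \iff [\![ E_i ]\!] = [\![ E_j ]\!],
\qquad E_i, E_j \ \text{lexical, phrasal, or sentential expressions.}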
For NLP uses, paraphrasing has been a major part of machine translation and summarization system performance (Culicover, 1968; Goldman, 1977; Muraki, 1982; Boyer and Lapalme, 1985; McKeown, 1983; Barzilay and Elhadad, 1999; Bhagat and Hovy, 2013). In fact, statistical and neural paraphrasing is a robust and richly evaluated component of many benchmarked tasks, notably MT and summarization (Weston et al., 2021), as well as Question Answering (Fader et al., 2013) and semantic parsing (Berant and Liang, 2014). To this end, significant efforts have gone towards the collection and compilation of paraphrase datasets for training and evaluation (Dolan and Brockett, 2005; Ganitkevitch et al., 2013; Ganitkevitch and Callison-Burch, 2014; Pavlick et al., 2015; Williams et al., 2017).
In addition to the above meaning-preserving paraphrase strategies, several current lines of work use strategies of “decontextualization” or “enrichment” of a textual sequence, whereby missing, elliptical, or underspecified material is re-inserted into the expression. The original and target sentences are then compared, with the task evaluated as text generation or completion (Choi et al., 2021; Elazar et al., 2021).
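As a rough sketch of this setup (the sentence pair, function, and evaluation below are hypothetical illustrations of ours, not examples or metrics from Choi et al. (2021) or Elazar et al. (2021)):

# Decontextualization as a text-to-text task: elided or underspecified
# material is re-inserted so that the sentence stands on its own.
context  = "Jane Doe joined Acme Corp. as an engineer."    # preceding discourse
original = "She left the company three years later."       # underspecified sentence
target   = "Jane Doe left Acme Corp. three years later."   # enriched rewrite

def exact_match(prediction: str, gold: str) -> bool:
    # The model's generated rewrite is compared against the target,
    # scored as in standard text generation or completion.
    return prediction.strip() == gold.strip()

print(exact_match("Jane Doe left Acme Corp. three years later.", target))  # True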
Enrichment of VerbNet predicates can be seen as an early attempt to provide a kind of Dense Paraphrasing for the verb's meaning. In Im and Pustejovsky (2009, 2010), the basic logic of Generative Lexicon's subevent structure was applied to VerbNet classes to enrich the event representation for inference. The VerbNet classes were associated with event frames within an Event Structure Lexicon (ESL) (Im and Pustejovsky, 2010), encoding the subevent structure of the predicate. If the textual form for the verb is replaced with the subeventual description itself, classes such as change_of_location and change_of_possession can help encode and describe event dynamics in the text, as shown in (Brown et al., 2018; Dhole and Manning, 2021; Brown et al., 2022). For example, the VerbNet entry drive is enriched with the ESL subevent structure below:
(4) drive in John drove to Boston
    se1: pre-state: not_located_in(john, boston)
    se2: process: driving(john)
    se3: post-state: located_in(john, boston)
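A minimal code sketch of how such a subevent frame could be represented and verbalized is given below; the dictionary fields and the verbalize helper are our own illustration, not the ESL or VerbNet schema.

# Hypothetical representation of the frame in (4); field names are ours.
drive_frame = {
    "predicate": "drive",
    "example": "John drove to Boston",
    "subevents": [
        {"id": "se1", "type": "pre-state",  "relation": "not_located_in", "args": ["john", "boston"]},
        {"id": "se2", "type": "process",    "relation": "driving",        "args": ["john"]},
        {"id": "se3", "type": "post-state", "relation": "located_in",     "args": ["john", "boston"]},
    ],
}

def verbalize(frame):
    # Crude illustration of the enrichment step: replace the surface verb
    # with its subeventual description.
    return "; ".join(f"{s['type']}: {s['relation']}({', '.join(s['args'])})"
                     for s in frame["subevents"])

print(verbalize(drive_frame))
# pre-state: not_located_in(john, boston); process: driving(john); post-state: located_in(john, boston)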
In the remainder of the paper, such techniques will
be utilized as part of our Dense Paraphrasing strat-
egy to enrich the surface text available for language
modeling algorithms.
3 Method: Dense Paraphrasing
In this section, we detail the procedure involved in creating DPs for a text. DP is similar to decontextualization, but it is a much broader method for creating sets of semantically equivalent or “enriched consistent” expressions that can be exploited for either human or machine consumption.
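As a deliberately naive illustration of what such enrichment can look like on the Table 1 example (the function below is a toy sketch of ours, not the system described in this paper):

def dense_paraphrase(sentence: str, substitutions: dict) -> str:
    # Toy enrichment: replace underspecified mentions with fuller descriptions
    # drawn from a coreference / recovered-argument table.
    for short, full in substitutions.items():
        sentence = sentence.replace(short, full)
    return sentence

print(dense_paraphrase(
    "Sprinkle over apple.",
    {"Sprinkle": "Sprinkle cinnamon sugar",
     "apple": "peeled apple wedges in batter in cake pan"}))
# Sprinkle cinnamon sugar over peeled apple wedges in batter in cake pan.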
Unlike traditional paraphrases, which are evaluated in terms of how faithful and complete they are while preserving the literal interpretation of the source, the goal of our task is to generate dense