
with pre-defined verb-argument templates (Zhang et al., 2020b, 2022). Such structured events might harm coherence because only head words are retained after extraction. Consider the first sub-event in Figure 1. After parsing, we lose the indispensable modifier "dry", and the sub-event becomes (mix, ingredients) (the matched pre-defined template is (verb, object)), which then also covers the wet ingredients (e.g., "milk") in the second sub-event. Thus, the logical relation between the two adjacent sub-events, i.e., coherence (Van Dijk, 1980), is defective.
On the other hand, narrative cloze tasks (Chambers and Jurafsky, 2008; Granroth-Wilding and Clark, 2016; Chambers, 2017; Mostafazadeh et al., 2016) evaluate whether a model can predict the missing (usually the last) event in a narrative. These tasks essentially evaluate the semantic similarity and relatedness between the target event and the context. However, they do not emphasize how all events in the context are unified as a whole process in an ordered and coherent way.
To evaluate complex process understanding, we propose a new generation-based task that directly generates sub-event sequences in free-text form, as shown in Figure 1. In this task, better generation of a process means better understanding of the coherence among action verbs as well as their operational objects. In fact, we find that generating free-text events is a non-trivial task, even with strong existing pre-trained models like T5 (Raffel et al., 2020) and BART (Lewis et al., 2020). First, generating an overlong piece of text containing several temporally ordered sub-events at once is challenging for current pre-trained models (Zhou et al., 2022; Lin et al., 2021; Brown et al., 2020). Second, sub-events are generated without considering the coherence of actions and their objects, which may give rise to irrelevant or redundant results.
To solve the task, we propose SubeventWriter to generate sub-events iteratively in temporal order. In each iteration, SubeventWriter generates only the next sub-event, given the process and the previously generated sub-events. Decomposing the sub-event sequence in this way eases the generation difficulty. Moreover, sub-events should be coherently organized to complete a process. To consider coherence in each iteration, we obtain a few sub-event candidates from beam search and select the most coherent one, as shown in Figure 1. In SubeventWriter, we introduce a coherence controller to score whether a candidate is coherent with the process and the previously generated sub-events. As a result, SubeventWriter can construct more reliable and meaningful sub-event sequences.
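To make the decoding loop concrete, the sketch below shows one way to implement iterative sub-event generation with beam-search candidates and coherence re-ranking using a HuggingFace seq2seq model. The model choice, the prompt format, the stopping condition, and the `score_coherence` stub are illustrative assumptions, not the paper's exact implementation.

```python
# A minimal sketch of SubeventWriter-style iterative decoding with
# coherence re-ranking. Model, prompt format, and coherence scorer
# are assumptions for illustration, not the authors' exact setup.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

def score_coherence(process, prior_events, candidate):
    """Placeholder coherence controller: should return a higher score
    when `candidate` fits the process and prior sub-events (e.g., via
    a fine-tuned sequence classifier)."""
    return 0.0  # stub so the sketch runs end to end

def generate_subevents(process, max_steps=10, num_candidates=5):
    events = []
    for _ in range(max_steps):
        # Condition on the process and all previously generated sub-events.
        prompt = f"Process: {process} Steps so far: {' '.join(events)}"
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(
            **inputs,
            num_beams=num_candidates,
            num_return_sequences=num_candidates,
            max_new_tokens=32,
        )
        candidates = [tokenizer.decode(o, skip_special_tokens=True)
                      for o in outputs]
        # Re-rank beam-search candidates with the coherence controller
        # and keep only the most coherent next sub-event.
        best = max(candidates,
                   key=lambda c: score_coherence(process, events, c))
        if not best:  # an empty candidate signals the end of the sequence
            break
        events.append(best)
    return events
```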
To evaluate our framework, we extract a large-scale general-domain process dataset from WikiHow (wikihow.com), containing over 80k examples. We conduct extensive experiments with multiple pre-trained models, and automatic and human evaluations show that SubeventWriter produces more meaningful sub-event sequences than existing models by a large margin. Moreover, we conduct few-shot experiments to demonstrate that our framework handles few-shot cases well. Last but not least, we evaluate the generalization ability of SubeventWriter on two out-of-domain datasets: SMILE (Regneri et al., 2010) and DeScript (Wanzare et al., 2016). The results show that our framework generalizes well.
2 Textual Sub-event Sequence Generation
We formally define the sub-event sequence generation task as follows. Given a process $S$, we ask the model to generate a sub-event sequence $E$, whose sub-events are the steps to complete the process. This task is essentially a conditional language modeling problem. Specifically, given a process $S$ consisting of $n$ tokens $x_1, x_2, \ldots, x_n$ and a sequence $E$ consisting of $m$ sub-events $e_1, e_2, \ldots, e_m$ (each sub-event is a sentence containing $t_i$ tokens $y_{i,1}, y_{i,2}, \ldots, y_{i,t_i}$), models aim to learn the conditional probability distribution by maximizing the following conditional probabilities in Eq. (1):
$$P_\theta(E \mid S) = \prod_{i=1}^{m} P_\theta(e_i \mid e_{<i}, S), \qquad P_\theta(e_i \mid e_{<i}, S) = \prod_{j=1}^{t_i} P_\theta(y_{i,j} \mid y_{i,<j}, e_{<i}, S). \tag{1}$$
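As a concrete reading of Eq. (1), the snippet below scores a candidate sub-event sequence under a seq2seq model by summing token log-probabilities of each sub-event conditioned on the process and the prior sub-events. The model choice and the way sub-events are concatenated into the source are illustrative assumptions.

```python
# Sketch: scoring a sub-event sequence E under Eq. (1) with a seq2seq LM.
# The model and the input concatenation are assumptions for illustration;
# the paper's exact input format may differ.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

def sequence_log_prob(process, subevents):
    """Return log P(E|S) = sum_i log P(e_i | e_{<i}, S)."""
    total = 0.0
    for i, event in enumerate(subevents):
        # Condition on the process S and the prior sub-events e_{<i}.
        source = f"{process} {' '.join(subevents[:i])}"
        src = tokenizer(source, return_tensors="pt")
        tgt = tokenizer(event, return_tensors="pt").input_ids
        with torch.no_grad():
            # With `labels` set, logits at step j predict target token j.
            logits = model(**src, labels=tgt).logits
        log_probs = logits.log_softmax(dim=-1)
        # Sum log P(y_{i,j} | y_{i,<j}, e_{<i}, S) over the target tokens.
        token_scores = log_probs.gather(-1, tgt.unsqueeze(-1)).squeeze(-1)
        total += token_scores.sum().item()
    return total
```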
3 The SubeventWriter Framework
Figure 2 illustrates the details of the proposed SubeventWriter framework. For a given process, the framework decomposes the generation into multiple iterations. In each iteration, the sequence-to-sequence (seq2seq) language model generates a few candidates for the next sub-event. We then leverage a coherence controller to re-rank the generated candidates by considering whether they