
Prior Work | Language Template
DEGREE (Hsu et al., 2022) | somebody was moved to somewhere from some place by some way. somebody or some organization was responsible for the movement. something was sent to somewhere from some place. somebody or some organization was responsible for the transport.
BART-Gen (Li et al., 2021) | <arg1> transported <arg2> in <arg3> vehicle from <arg4> place to <arg5> place
Text2Event (Lu et al., 2021) | ((Transport returned (Agent <arg>) (Artifact <arg>) (Destination <arg>) (Origin <arg>) (Vehicle <arg>)))

Table 2: Examples of language templates for Event Argument Extraction used by Hsu et al. (2022); Li et al. (2021); Lu et al. (2021).
mark the ground truth trigger words for the input text by surrounding them with **. We choose ** because it sets text to bold in Markdown (a markup language for creating formatted text), which is commonly found in the code bases and web data on which our LLM is trained. The incomplete code prompt assigns a partial instantiation of an event class to a variable to trigger the model for completion, for example, transport_event = Transport(.
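The prompt construction described above can be sketched as follows; the function name and exact string layout are our own illustration, not the paper's verbatim template:

```python
# Illustrative sketch of the task prompt: mark the trigger word with **
# and append an incomplete event instantiation for the model to complete.
def build_task_prompt(text: str, trigger: str, event_class: str, var_name: str) -> str:
    """Mark the first occurrence of the trigger with ** and append
    a partial instantiation such as `transport_event = Transport(`."""
    marked = text.replace(trigger, f"**{trigger}**", 1)
    return f'"""{marked}"""\n{var_name} = {event_class}('

prompt = build_task_prompt(
    "Kelly flew to Beijing.", "flew", "Transport", "transport_event"
)
```

The resulting prompt ends with the open parenthesis, so the model's natural continuation is the argument list of the event instantiation.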
We observed that the LLM tends to generate additional sentences paired with extracted arguments if no stopping constraint is applied. To keep the model focused on the given EAE task, we stop code generation whenever the model generates any of the following patterns: """, class, print, or #.
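A minimal sketch of this stopping constraint, applied as post-hoc truncation of the generated completion (the function name is ours; in practice the same patterns could be passed as stop sequences to the generation API):

```python
# Truncate a generated completion at the earliest stopping pattern,
# matching the constraint described above.
STOP_PATTERNS = ['"""', "class", "print", "#"]

def truncate_at_stop(generated: str) -> str:
    """Cut the completion at the first occurrence of any stop pattern."""
    cut = len(generated)
    for pat in STOP_PATTERNS:
        idx = generated.find(pat)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut]
```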
2.3 In-context Learning
Optionally, we can include in-context learning examples, which are task prompts (§2.2) paired with completed event instantiations using ground-truth arguments (see Figure 2 for a specific example). For k-shot learning, we concatenate k such examples together. Given a task prompt, we deterministically gather k learning examples by collecting training instances with the same event type, following their order of occurrence in the training set.
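The deterministic example selection can be sketched as below; the dictionary-based data structure is an assumption for illustration:

```python
# Gather up to k in-context examples for a query event type,
# deterministically, in training-set order.
def gather_examples(train, event_type, k):
    """Return the first k training instances whose event type matches,
    preserving their order of occurrence in the training set."""
    picked = []
    for inst in train:
        if inst["event_type"] == event_type:
            picked.append(inst)
            if len(picked) == k:
                break
    return picked
```

Because selection depends only on the training-set order, the same query event type always yields the same k examples.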
3 Why Represent Event Structure in PL?
A wide range of NLP tasks have benefited from LLMs (Brown et al., 2020; Hoffmann et al., 2022; Chowdhery et al., 2022) trained on web-scale language corpora. To effectively use an LLM trained on natural language for EAE, one of the biggest challenges is specifying the desired output, namely event structures in our case, in natural language.
There is a tradeoff between the effort put into defining the output format or designing the prompt (e.g., Text2Event in Table 2) and the benefit gained from pretraining on natural language (e.g., DEGREE and BART-Gen in Table 2). Text2Event (Lu et al., 2021) sits at one end of the spectrum with a concise but unnatural output format. As a result, this formulation under-utilizes the pretraining power of the model and does not work in low-resource settings, as shown in Table 4. Towards the other end, Hsu et al. (2022) and Li et al. (2021) design manual templates for the model to fill in. For comparison, we also design two natural language prompt variants, shown in Figures A.5 and A.6, mimicking our code prompt and the BART-Gen style prompt. Note that these natural language prompts are much more verbose and, as shown in §4.2, usually result in sub-optimal performance even with sufficient in-context examples.
Essentially, this tradeoff is a result of the mismatch between the pretraining corpora and task output formats. Instead of using LLMs trained on only unstructured text, we turn to LLMs trained on a mixture of text and code, where the text is often semantically aligned with the accompanying code. Such Code-LLMs can convert text into corresponding code, as demonstrated by Chen et al. (2021) and Nijkamp et al. (2022). We can then map the desired output event structure into code in a straightforward manner and leverage the full pretraining power of these models. PLs like Python offer features (e.g., classes, docstrings, type annotations, inheritance) that have a significant presence in the pretraining corpus of Code-LLMs due to frequent usage. CODE4STRUCT leverages these features to succinctly describe event structures, which makes it better aligned with Code-LLMs. By leveraging the LLM's knowledge learned from diverse pretraining domains, CODE4STRUCT works well in the open domain, achieving non-trivial zero-shot performance on unseen event types (§4.5). CODE4STRUCT is also data-efficient: it reaches performance comparable to fully-supervised methods with far fewer annotated examples (20 per event type) (§4.5).
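To make the role of these PL features concrete, the sketch below renders the Transport example from Table 2 as a Python class hierarchy with a docstring, type annotations, and inheritance; the class and role names follow Table 2, but the exact rendering is our own illustration, not the paper's verbatim prompt:

```python
from typing import List

class Event:
    """Base class for all event types."""

class Movement(Event):
    """Superclass for events that move people or artifacts."""

class Transport(Movement):
    """self.agent transported self.artifact in self.vehicle from
    self.origin place to self.destination place."""

    def __init__(
        self,
        agent: List[str] = (),
        artifact: List[str] = (),
        vehicle: List[str] = (),
        origin: List[str] = (),
        destination: List[str] = (),
    ):
        # Each argument role holds a list of entity mention strings;
        # copying into fresh lists avoids shared mutable defaults.
        self.agent = list(agent)
        self.artifact = list(artifact)
        self.vehicle = list(vehicle)
        self.origin = list(origin)
        self.destination = list(destination)

# A completed instantiation, as the model would generate it:
transport_event = Transport(
    agent=["Kelly"],
    destination=["Beijing"],
)
```

The docstring plays the role of the natural language template, while the typed constructor signature declares the argument roles the model must fill, so no separate output-format specification is needed.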