CODE4STRUCT: Code Generation for Few-Shot Event Structure Prediction
Xingyao Wang and Sha Li and Heng Ji
University of Illinois Urbana-Champaign, IL, USA
{xingyao6, shal2, hengji}@illinois.edu
Abstract

Large Language Models (LLMs) trained on a mixture of text and code have demonstrated impressive capability in translating natural language (NL) into structured code. We observe that semantic structures can be conveniently translated into code and propose CODE4STRUCT to leverage such text-to-structure translation capability to tackle structured prediction tasks. As a case study, we formulate Event Argument Extraction (EAE) as converting text into event-argument structures that can be represented as a class object using code. This alignment between structures and code enables us to take advantage of Programming Language (PL) features such as inheritance [1] and type annotation [2] to introduce external knowledge or add constraints. We show that, with sufficient in-context examples, formulating EAE as a code generation problem is advantageous over using variants of text-based prompts. Despite only using 20 training event instances for each event type, CODE4STRUCT is comparable to supervised models trained on 4,202 instances and outperforms the current state-of-the-art (SOTA) trained on 20-shot data by 29.5% absolute F1. By leveraging the inheritance feature of PL, CODE4STRUCT can use 10-shot training data from a sibling event type to predict arguments for zero-resource event types, outperforming the zero-shot baseline by 12% absolute F1. [3]
1 Introduction

Large Language Models (LLMs) trained on massive corpora of code mixed with natural language (NL) comments and docstrings [4] (e.g., Chen et al. 2021,

[1] Inheritance is a way to create a hierarchy of classes in PL. A child class can base upon another class, retaining similar implementation.
[2] Developers use type annotations to indicate the data types of variables and input/outputs of functions.
[3] All code and resources are publicly available at https://github.com/xingyaoww/code4struct.
[4] Text used to document a specific segment of code.
[Figure 1 shows (1) the Transport event ontology, whose argument roles and entity-type constraints are agent: GPE, ORG, or PER; artifact: FAC, ORG, PER, VEH, or WEA; destination and origin: FAC, GPE, or LOC; vehicle: VEH; (2) the ontology converted to a Python class definition, class Transport(Movement); and (3) the event instantiation produced by prompting the LLM:]

    """
    Translate the following sentence into an instance of Transport.
    The trigger word(s) of the event is marked with **trigger word**.
    "Kelly , the US assistant secretary for East Asia and Pacific Affairs , **arrived** in Seoul from Beijing Friday to brief Yoon , the foreign minister ."
    """
    transport_event = Transport(
        artifact=[
            PER("Kelly"),
        ],
        destination=[
            GPE("Seoul"),
        ],
        origin=[
            GPE("Beijing"),
        ],
    )

Figure 1: Event Argument Extraction using code generation. We convert the existing event type ontology to PYTHON class definitions. Conditioned on these definitions, we put the input sentence for event argument extraction into a docstring as the prompt for code generation. The generated code (colored in green) can be mapped to an instance graph of the Transport event.
Nijkamp et al. 2022) have demonstrated the ability to translate natural language instructions into structured code. We ask if this conversion between language and code can serve as a bridge to build a connection between language and semantic structure, which is the goal of many structured prediction tasks (e.g., semantic parsing, information extraction) in Natural Language Processing (NLP). In particular, the target structure (e.g., the event-argument graph in Figure 1) can be mapped to code more straightforwardly than to natural language, which often requires careful prompt engineering (Hsu et al. 2022; Li et al. 2021; Table 2). In addition, code written in programming languages has an inherent advantage in representing complex and

arXiv:2210.12810v2 [cs.CL] 25 May 2023
Event Argument Extraction | Programming Language (Python)

Event / Entity Type (Transport, VEH) | Class definition: class Transport, class VEH

Hierarchical Event Ontology (Movement:Transport) | Inheritance: class Transport(Movement). Inheritance is a way to create a hierarchy of classes in PL; a child class can base upon another class, retaining similar implementation.

Event Arguments (vehicle) | Function arguments: def function(vehicle=...)

Argument Constraint (each argument can have a list of multiple entities; argument vehicle should be entities of type VEH) | Type annotation and argument default value: def function(vehicle: List[VEH] = []). Type annotations are used by developers to indicate the data types of variables and input/outputs of functions. If a function is called without the argument, the argument gets its default value (an empty list in this case).

Weakly-supervised Information (the Transport event describes someone transporting something in a vehicle from one place to another place) | Docstring or comments: class Transport(Movement): """self.agent transported self.artifact in self.vehicle vehicle from self.origin place to self.destination place."""

Table 1: Mapping between Event Argument Extraction requirements and features of the Python programming language.
interdependent structures (Miller, 1981; Sebrechts and Gross, 1985) with features such as inheritance and type annotation.
As a case study, we showcase our proposed CODE4STRUCT on the Event Argument Extraction (EAE) task, which aims to extract event structures from unstructured text. EAE is an ideal testbed for our method due to the close alignment between EAE and PL, as shown in Table 1. In CODE4STRUCT (Figure 1), we first translate the entity and event type ontology into Python class definitions. Conditioned on the relevant class definitions and the input sentence, we prompt an LLM to generate an instantiation of the event class, from which we can extract the predicted arguments.
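Concretely, the predicted arguments can be read off the generated instantiation by parsing it as Python source. The following is a minimal sketch (not the authors' released code) using the standard-library ast module; the helper name extract_arguments is our own:

```python
import ast

def extract_arguments(generated_code: str) -> dict:
    """Map each argument role to its (entity_type, text) fillers from
    generated code like: transport_event = Transport(artifact=[PER("Kelly")])."""
    tree = ast.parse(generated_code)
    # Locate the first assignment whose right-hand side is a constructor call.
    call = next(
        node.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)
    )
    roles = {}
    for kw in call.keywords:       # keyword arguments are the argument roles
        fillers = []
        for elt in kw.value.elts:  # each role is annotated as a list of entities
            if isinstance(elt, ast.Call) and elt.args:
                # e.g. PER("Kelly") -> ("PER", "Kelly")
                fillers.append((elt.func.id, elt.args[0].value))
        roles[kw.arg] = fillers
    return roles

generated = (
    'transport_event = Transport(\n'
    '    artifact=[PER("Kelly")],\n'
    '    destination=[GPE("Seoul")],\n'
    '    origin=[GPE("Beijing")],\n'
    ')'
)
print(extract_arguments(generated))
```

Because the output is syntactically valid Python, no task-specific decoding grammar is needed; the language's own parser recovers the structure.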
By leveraging the alignment between PL and NLP problems, CODE4STRUCT enjoys various advantages, as shown in Table 1. Using PL features like type annotations and argument default values, we can naturally enforce argument constraints on output structures. This allows CODE4STRUCT to handle multiple or zero argument fillers for the same argument role by annotating the expected type (i.e., expecting a list of entities) and setting a default value for each argument (i.e., an empty list without any entity by default). Furthermore, we can naturally utilize the event hierarchy by leveraging inheritance. Inheritance allows a child event class (e.g., Transport) to reuse most components of its parent class (e.g., Movement) while preserving its unique properties. We demonstrate that hierarchical event types allow zero-resource event types to use annotated training examples from their high-resource sibling types (§4.6).
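The inheritance mechanism above can be illustrated with a simplified sketch of the two event classes; the entity-type unions are trimmed relative to the paper's actual ontology, and the mutable default arguments mirror the paper's own class definitions:

```python
from typing import List

class Entity:
    def __init__(self, name: str):
        self.name = name

class PER(Entity): pass
class GPE(Entity): pass

class Event: pass

class Movement(Event):
    """Parent event type: roles shared by all Movement events live here."""
    def __init__(self, agent: List[Entity] = [], origin: List[Entity] = [],
                 destination: List[Entity] = []):
        self.agent = agent
        self.origin = origin
        self.destination = destination

class Transport(Movement):
    """Child event type: reuses Movement's roles and adds its own."""
    def __init__(self, vehicle: List[Entity] = [], **shared_roles):
        super().__init__(**shared_roles)
        self.vehicle = vehicle

# A Transport instance accepts roles defined on its parent class.
event = Transport(agent=[PER("Kelly")], destination=[GPE("Seoul")])
```

Because Transport shares Movement's constructor signature for the inherited roles, in-context examples written for one sibling of Movement remain syntactically valid demonstrations for another.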
We outline our contributions as follows:

- We propose CODE4STRUCT to tackle structured prediction problems in NLP using code generation. As a case study, we use CODE4STRUCT for Event Argument Extraction (EAE).
- We perform extensive experiments contrasting the performance of the code-based prompt and two variants of text prompts on different LLMs, and show that the code prompt is generally advantageous over text prompts when sufficient in-context examples are provided (§4.2).
- We demonstrate that 20-shot CODE4STRUCT rivals fully-supervised methods trained on 4,202 instances. CODE4STRUCT outperforms a SOTA approach by 29.5% absolute F1 when 20-shot data are given to both. 0-shot CODE4STRUCT can even outperform the SOTA under both 20-shot and 50-shot settings (§4.5).
- We show that integrating the event ontology hierarchy via class inheritance can improve prediction. Compared to the zero-shot baseline, we see a 12% F1 gain for zero-resource event types when using 10-shot examples from their sibling event types (§4.6).
2 Code Generation Prompt Construction

In the Event Argument Extraction (EAE) task, a model is provided with an event ontology and the target text to extract from. Similarly, we prompt an
    from typing import List

    class Entity:
        def __init__(self, name: str):
            self.name = name

    class Event:
        def __init__(self, name: str):
            self.name = name

    class ORG(Entity):
        """Corporations, agencies, and other groups of people
        defined by an established organizational structure..."""
        def __init__(self, name: str):
            super().__init__(name=name)

    class GPE(Entity):
        """Geopolitical entities such as countries, provinces,
        states, cities, towns, etc. GPEs are composite entities,
        consisting of ..."""
        def __init__(self, name: str):
            super().__init__(name=name)

    class Movement(Event):  # Inherit from `Event` class
        ...  # omitted for space

    class Transport(Movement):
        """
        self.agent transported self.artifact in self.vehicle vehicle from
        self.origin place to self.destination place.
        """
        def __init__(
            self,
            agent: List[GPE | ORG | PER] = [],
            artifact: List[FAC | ORG | PER | VEH | WEA] = [],
            destination: List[FAC | GPE | LOC] = [],
            origin: List[FAC | GPE | LOC] = [],
            vehicle: List[VEH] = [],
        ):
            self.agent = agent
            self.artifact = artifact
            self.destination = destination
            self.origin = origin
            self.vehicle = vehicle

    # (optional) k in-context examples:
    """
    Translate the following sentence into an instance of Transport. The trigger
    word(s) of the event is marked with **trigger word**.
    "Kelly , who declined to talks to reporters here , **travels** to Tokyo Sunday
    for talks with Japanese officials ."
    """
    transport_event = Transport(
        artifact=[PER("Kelly"),],
        destination=[GPE("Tokyo"),],
    )

    """
    Translate the following sentence into an instance of Transport. The trigger
    word(s) of the event is marked with **trigger word**.
    "Renowned Hollywood madam Heidi Fleiss has been **flown** to Melbourne as guest
    of honour at Thursday's market debut and , according to Harris , has already
    played a key role in attracting worldwide media attention to the event ."
    """
    transport_event = Transport(
        artifact=[PER("Heidi Fleiss"),],
        destination=[GPE("Melbourne"),],
    )

    # Task prompt:
    """
    Translate the following sentence into an instance of Transport. The trigger
    word(s) of the event is marked with **trigger word**.
    "Kelly , the US assistant secretary for East Asia and Pacific Affairs ,
    **arrived** in Seoul from Beijing Friday to brief Yoon , the foreign minister ."
    """
    transport_event = Transport(

Figure 2: Prompt components. (1) The ontology code representation contains definitions of entity and event classes, colored in yellow and blue (§2.1). (2) k-shot examples for in-context learning, colored in orange (§2.3). (3) The task prompt, appended at the end with a partial class instantiation for LLM completion, colored in green (§2.2).
LLM with the ontology, which consists of definitions of event types and argument roles, and input sentences, to generate code that instantiates the given event type. We break down the input prompt into three components: (1) the ontology code representation, which consists of Python class definitions for entity types and an event type (§2.1); (2) optional k-shot in-context learning examples for the event type defined in (1) (§2.3); (3) the task prompt for completion (§2.2). We show a breakdown of the full prompt in Figure 2.
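The assembly of these three components reduces to string concatenation. The sketch below is illustrative, not the released implementation; the helper name and exact instruction wording are taken from the examples in Figure 2:

```python
def build_prompt(ontology_code, examples, sentence, event_type="Transport"):
    """Concatenate (1) ontology code, (2) k completed in-context examples,
    and (3) the task prompt ending in a partial class instantiation."""
    task_prompt = (
        '"""\n'
        f"Translate the following sentence into an instance of {event_type}. "
        "The trigger word(s) of the event is marked with **trigger word**.\n"
        f'"{sentence}"\n'
        '"""\n'
        f"{event_type.lower()}_event = {event_type}("
    )
    return "\n\n".join([ontology_code, *examples, task_prompt])

prompt = build_prompt(
    ontology_code="class Transport(Movement): ...",
    examples=[],  # zero-shot: no in-context examples
    sentence="Kelly **arrived** in Seoul from Beijing .",
)
```

Ending the prompt with the open parenthesis of the constructor call is what steers the LLM toward completing an event instantiation rather than free-form text.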
2.1 Ontology Code Representation

To represent the event ontology as code, we concatenate the base class definitions, entity class definitions, and event class definitions.

Base Class Definition We define base types Entity and Event to be inherited by other classes.

Entity Class Definition We use entity type definitions from the Automatic Content Extraction (ACE) program [5]. We construct Python classes that inherit from Entity and use the entity type as the class name (e.g., class GPE(Entity)). We add a natural language description as a docstring of the defined class for each entity type.
2.1.1 Event Class Definition

We define the event class using the name of the event type (e.g., class Transport). As ACE defines its event types in a hierarchical ontology, mimicking class definitions in object-oriented PL, we inherit the event class definition from its parent (e.g., class Transport(Movement)), or from the root event type if the event class does not have a parent (e.g., class Movement(Event)). An example of a hierarchical event definition can be found in Figure A.9.

[5] https://www.ldc.upenn.edu/collaborations/past-projects/ace
We define the argument roles (e.g., the destination of Transport) as input arguments of the constructor __init__ [6]. We specify the type of each argument role using Python type annotation, a commonly used PL feature: for example, agent: List[GPE | ORG | PER] means that the agent argument accepts a list of entities, each of which can be of type GPE (Geo-Political Entity), ORG (Organization), or PER (Person). We assign each input argument (e.g., agent) to a class member variable of the same name.

We include event description templates in the docstring of the class definition. The event description templates are modified from Li et al. (2021) by replacing each role with its corresponding member variable (e.g., self.agent).
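This template modification amounts to a placeholder substitution over the BART-Gen template shown in Table 2. A sketch, with a hypothetical helper name:

```python
def template_to_docstring(template, roles):
    """Replace each <argN> placeholder with the matching member variable."""
    for i, role in enumerate(roles, start=1):
        template = template.replace(f"<arg{i}>", f"self.{role}")
    return template

docstring = template_to_docstring(
    "<arg1> transported <arg2> in <arg3> vehicle from <arg4> place to <arg5> place",
    ["agent", "artifact", "vehicle", "origin", "destination"],
)
# -> "self.agent transported self.artifact in self.vehicle vehicle
#     from self.origin place to self.destination place"
```

The result is exactly the docstring of the Transport class in Figure 2.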
2.2 Task Prompt

The task prompt consists of a docstring describing the task and incomplete event instantiation code for completion. An example of a task prompt can be found in Figure 2. The text-based docstring contains a task instruction and an input sentence. We

[6] A constructor is a special function that initializes an instance of a class.
Prior Work | Language Template

DEGREE (Hsu et al., 2022) | somebody was moved to somewhere from some place by some way. somebody or some organization was responsible for the movement. something was sent to somewhere from some place. somebody or some organization was responsible for the transport.

BART-Gen (Li et al., 2021) | <arg1> transported <arg2> in <arg3> vehicle from <arg4> place to <arg5> place

Text2Event (Lu et al., 2021) | ( (Transport returned (Agent <arg>) (Artifact <arg>) (Destination <arg>) (Origin <arg>) (Vehicle <arg>) )

Table 2: Examples of language templates for Event Argument Extraction used by Hsu et al. (2022); Li et al. (2021); Lu et al. (2021).
mark the ground-truth trigger words in the input text by surrounding them with **. We choose ** because it sets text to bold in Markdown (a markup language for creating formatted text), which is commonly found in the code bases and web data on which our LLM is trained. The incomplete code prompt assigns a partial instantiation of an event class to a variable to trigger the model for completion, for example, transport_event = Transport(.

We observed that the LLM tends to generate additional sentences paired with extracted arguments if no stopping constraint is applied. To focus on the given EAE task, we stop the code generation whenever any of the following patterns is generated by the model: """, class, print, or #.
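In practice this is done with the API's stop sequences, but the rule can equivalently be sketched as client-side post-processing that truncates the completion at the earliest stop pattern:

```python
STOP_PATTERNS = ('"""', "class", "print", "#")

def truncate_completion(completion: str) -> str:
    """Keep only the text before the earliest occurrence of any stop pattern."""
    cut = len(completion)
    for pattern in STOP_PATTERNS:
        idx = completion.find(pattern)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]

raw = 'artifact=[PER("Kelly")],\n)\nprint(transport_event)'
clean = truncate_completion(raw)  # drops everything from "print" onward
```

Stopping at these patterns keeps only the constructor call of interest, discarding any extra code or commentary the model would otherwise append.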
2.3 In-context Learning

Optionally, we can include in-context learning examples, which are task prompts (§2.2) paired with completed event instantiations using ground-truth arguments (see Figure 2 for a specific example). For k-shot learning, we concatenate k such examples together. Given a task prompt, we deterministically gather k learning examples by collecting training instances with the same event type, following the order of occurrences in the training set.
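This deterministic gathering procedure can be sketched as follows; the function name and instance field names are illustrative assumptions, not the paper's data schema:

```python
def gather_in_context_examples(train_instances, event_type, k):
    """Return the first k training instances of the given event type,
    preserving their order of occurrence in the training set."""
    selected = []
    for instance in train_instances:
        if instance["event_type"] == event_type:
            selected.append(instance)
            if len(selected) == k:
                break
    return selected

train = [
    {"event_type": "Movement:Transport", "id": 0},
    {"event_type": "Conflict:Attack", "id": 1},
    {"event_type": "Movement:Transport", "id": 2},
    {"event_type": "Movement:Transport", "id": 3},
]
shots = gather_in_context_examples(train, "Movement:Transport", k=2)
```

Because the selection is deterministic, every test instance of the same event type sees the same k demonstrations, which keeps comparisons across prompt variants fair.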
3 Why Represent Event Structure in PL?

A wide range of NLP tasks have benefited from LLMs (Brown et al., 2020; Hoffmann et al., 2022; Chowdhery et al., 2022) trained on web-scale language corpora. To effectively use an LLM trained on language for EAE, one of the biggest challenges is to specify the desired output, namely event structures in our case, using natural language.

There is a tradeoff between the effort put into defining the output or designing the prompt (e.g., Text2Event in Table 2) and the benefit from pretraining in natural language (e.g., DEGREE and BART-Gen in Table 2). Text2Event (Lu et al., 2021) resides at one end of the spectrum with a concise but unnatural output format. As a result, this formulation under-utilizes the pretraining power of the model and does not work in low-resource settings, as shown in Table 4. Towards the other end, Hsu et al. (2022); Li et al. (2021) design manual templates for the model to fill in. We also design two variants of language prompt, shown in Figures A.5 and A.6, mimicking our code prompt and the BART-Gen style prompt for comparison. Note that these natural language prompts are much more verbose and, as shown in §4.2, usually result in sub-optimal performance with sufficient in-context examples.
Essentially, this tradeoff is a result of the mismatch between the pretraining corpora and task output formats. Instead of using an LLM trained only on unstructured text, we turn to LLMs trained on a mixture of text and code, where the text is often aligned in semantics with the accompanying code. Such Code-LLMs have the ability to convert text into corresponding code, as demonstrated by Chen et al. (2021); Nijkamp et al. (2022). We can then map the desired output event structure into code in a straightforward manner and leverage the full pretraining power of these models. PLs like Python offer features (e.g., classes, docstrings, type annotations, inheritance) that have a significant presence in the pre-training corpus of Code-LLMs due to frequent usage. CODE4STRUCT leverages these features to succinctly describe event structures, which makes it better aligned with Code-LLMs. By leveraging the LLM's knowledge learned from diverse pre-training domains, CODE4STRUCT can work well in the open domain, achieving non-trivial zero-shot performance on unseen event types (§4.5). CODE4STRUCT is also data-efficient, reaching performance comparable to fully-supervised methods with far fewer annotated examples (20 per event type) (§4.5).
4 Experiments

4.1 Experiment Setup

LLM We use CODEX code-davinci-002 (Chen et al., 2021), a GPT-3 (Brown et al., 2020) model finetuned on code, which supports up to 8k input tokens. We compare its performance with InstructGPT (Ouyang et al., 2022) text-davinci-002 and its improved version text-davinci-003, both of which support up to 4k input tokens. We access these LLMs through the OpenAI API [7].

Hyperparameters We prompt the LLM to generate code that instantiates an event using sampling temperature t = 0 (i.e., greedy decoding). We set the maximum number of new tokens for each generation to 128, which fits all code outputs for the test set.
Evaluation Tasks We use the ground-truth event type and gold-standard trigger words to perform Event Argument Extraction.

Dataset We evaluate EAE performance on the English subset of the Automatic Content Extraction 2005 dataset (ACE05-E) [8] (Doddington et al., 2004). We follow Wadden et al. (2019); Lin et al. (2020) for dataset processing. ACE05-E has hierarchical event types with 8 parent types and 33 child types. Among the child types, roughly half (14 out of 33) have fewer than 50 event instances in the training set. We show statistics for each event type in Table A.4.

Evaluation Metrics We use argument F1-score following prior work (Ji and Grishman, 2008; Li et al., 2021; Hsu et al., 2022): we consider an argument to be correctly identified when the head word span of the predicted text [9] matches that of the human-annotated text (denoted as Arg-I); we consider an argument to be correctly classified if the role (e.g., agent) of a correctly identified argument matches that of the human annotation (denoted as Arg-C).
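As a simplified sketch of this scoring (the actual evaluation matches head-word spans extracted with spacy; exact string tuples stand in here), both metrics are micro F1 over sets of argument tuples:

```python
def argument_f1(predicted, gold):
    """Micro F1 over argument tuples: (head_span,) for Arg-I,
    (head_span, role) for Arg-C."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # tuples matching the human annotation
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pred = [("Kelly", "artifact"), ("Seoul", "destination"), ("Beijing", "destination")]
gold = [("Kelly", "artifact"), ("Seoul", "destination"), ("Beijing", "origin")]
score = argument_f1(pred, gold)  # "Beijing" is identified but misclassified
```

In this example the Beijing argument would count toward Arg-I (its span matches) but not Arg-C (its role does not), so the Arg-C F1 is 2/3.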
[7] https://openai.com/api/
[8] https://www.ldc.upenn.edu/collaborations/past-projects/ace
[9] We find the span of predicted text in the given sentence, then use the spacy library to find its head word.

4.2 Comparison with Text Prompt

To compare our code-based prompt with text-based prompts, we design two variants of text prompt: T(1), mimicking our code prompt (i.e., code imitation, Figure A.5), and T(2), following the BART-Gen style prompt (Li et al., 2021) (Figure A.6), which resembles natural language more than T(1). Both text prompts have components similar to our code-based prompt in Figure 2. Text prompts rely on natural language to define the requirement and format of the desired output, while the code prompt utilizes PL syntax. We compare the F1 score difference between the code prompt (§2) and the two variants of text prompts (i.e., Δ(i)CT = F1code − F1(i)text, i ∈ {1, 2}) on different LLMs in Table 3. We include the exact performance numbers of the text prompts in Table A.3. We summarize our findings as follows:
- The code prompt outperforms both text prompts on Arg-C F1 (i.e., Δ(i)CT > 0) for both text prompt variants and all LLMs except text-davinci-003 when sufficient in-context examples are given (i.e., k ≥ 5).
- For the *-davinci-002 LLMs, the performance gains from using a code prompt become more significant (i.e., Δ(i)CT increases for all i) as the number of in-context examples k grows (for k ≥ 5).
- There is no clear trend on Arg-I F1 differentiating code and text prompts, except for text-davinci-003, which exhibits similar behavior in that the code prompt performs better with larger k.
- Text prompt T(2) (BART-Gen style), which resembles natural language more, performs poorly in the low-shot setting (k ≤ 1), primarily because the LLM is unable to produce the desired structured output described using language in T(2), causing the low-shot code-text performance gap Δ(2)CT to be larger than that of T(1). These low-shot performance differences between T(1) and T(2) further signify the need for prompt engineering for language-based prompts to work well in a low-shot setting.
4.3 Comparison with Different LLMs

We measure the performance of the same CODE4STRUCT code prompt across the different foundational LLMs described in §4.1. The LLM performance comparison can be found in Figure 3. text-davinci-002 is an InstructGPT (Ouyang et al., 2022) model finetuned with human demonstrations based on code-davinci-002,