
Prior Work | Language Template
DEGREE (Hsu et al., 2022) | somebody was moved to somewhere from some place by some way. somebody or some organization was responsible for the movement. something was sent to somewhere from some place. somebody or some organization was responsible for the transport.
BART-Gen (Li et al., 2021) | <arg1> transported <arg2> in <arg3> vehicle from <arg4> place to <arg5> place
Text2Event (Lu et al., 2021) | ((Transport returned (Agent <arg>) (Artifact <arg>) (Destination <arg>) (Origin <arg>) (Vehicle <arg>)))

Table 2: Examples of language templates for Event Argument Extraction used by Hsu et al. (2022); Li et al. (2021); Lu et al. (2021).
mark the ground truth trigger words for the input text by surrounding them with **. We choose ** because it sets text to bold in Markdown (a markup language for creating formatted text), which is commonly found in the code bases and web data on which our LLM is trained. The incomplete code prompt assigns a partial instantiation of an event class to a variable to trigger the model for completion, for example, transport_event = Transport(.
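The prompt construction described above can be sketched as follows; the function name and exact string layout are our own illustration, not the paper's verbatim template:

```python
# Illustrative sketch of the task prompt: mark the trigger word with **
# and append an incomplete event instantiation for the model to complete.
def build_task_prompt(text: str, trigger: str, event_class: str, var_name: str) -> str:
    """Mark the first occurrence of the trigger with ** and append
    a partial instantiation such as `transport_event = Transport(`."""
    marked = text.replace(trigger, f"**{trigger}**", 1)
    return f'"""{marked}"""\n{var_name} = {event_class}('

prompt = build_task_prompt(
    "Kelly flew to Beijing.", "flew", "Transport", "transport_event"
)
```

The resulting prompt ends with the open parenthesis, so the model's natural continuation is the argument list of the event instantiation.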
We observed that the LLM tends to generate additional sentences paired with extracted arguments if no stopping constraint is applied. To keep the model focused on the given EAE task, we stop code generation whenever the model generates any of the following patterns: """, class, print, or #.
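A minimal sketch of this stopping constraint, applied as post-hoc truncation of the generated completion (the function name is ours; in practice the same patterns could be passed as stop sequences to the generation API):

```python
# Truncate a generated completion at the earliest stopping pattern,
# matching the constraint described above.
STOP_PATTERNS = ['"""', "class", "print", "#"]

def truncate_at_stop(generated: str) -> str:
    """Cut the completion at the first occurrence of any stop pattern."""
    cut = len(generated)
    for pat in STOP_PATTERNS:
        idx = generated.find(pat)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut]
```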
2.3 In-context Learning
Optionally, we can include in-context learning examples, which are task prompts (§2.2) paired with completed event instantiations using ground-truth arguments (see Figure 2 for a specific example). For k-shot learning, we concatenate k such examples together. Given a task prompt, we deterministically gather k learning examples by collecting training instances with the same event type, following their order of occurrence in the training set.
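The deterministic example selection can be sketched as below; the dictionary-based data structure is an assumption for illustration:

```python
# Gather up to k in-context examples for a query event type,
# deterministically, in training-set order.
def gather_examples(train, event_type, k):
    """Return the first k training instances whose event type matches,
    preserving their order of occurrence in the training set."""
    picked = []
    for inst in train:
        if inst["event_type"] == event_type:
            picked.append(inst)
            if len(picked) == k:
                break
    return picked
```

Because selection depends only on the training-set order, the same query event type always yields the same k examples.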
3 Why Represent Event Structure in PL?
A wide range of NLP tasks have benefited from LLMs (Brown et al., 2020; Hoffmann et al., 2022; Chowdhery et al., 2022) trained on web-scale language corpora. To effectively use an LLM trained on natural language for EAE, one of the biggest challenges is specifying the desired output, namely event structures in our case, in natural language.
There is a tradeoff between the effort put into defining the output format or designing the prompt (e.g., Text2Event in Table 2) and the benefit gained from pretraining on natural language (e.g., DEGREE and BART-Gen in Table 2). Text2Event (Lu et al., 2021) sits at one end of the spectrum with a concise but unnatural output format. As a result, this formulation under-utilizes the pretraining power of the model and does not work in low-resource settings, as shown in Table 4. Towards the other end, Hsu et al. (2022) and Li et al. (2021) design manual templates for the model to fill in. For comparison, we also design two natural language prompt variants, shown in Figures A.5 and A.6, mimicking our code prompt and the BART-Gen style prompt. Note that these natural language prompts are much more verbose and, as shown in §4.2, usually result in sub-optimal performance even with sufficient in-context examples.
Essentially, this tradeoff is a result of the mismatch between the pretraining corpora and task output formats. Instead of using LLMs trained on only unstructured text, we turn to LLMs trained on a mixture of text and code, where the text is often semantically aligned with the accompanying code. Such Code-LLMs can convert text into corresponding code, as demonstrated by Chen et al. (2021) and Nijkamp et al. (2022). We can then map the desired output event structure into code in a straightforward manner and leverage the full pretraining power of these models. PLs like Python offer features (e.g., classes, docstrings, type annotations, inheritance) that have a significant presence in the pretraining corpus of Code-LLMs due to frequent usage. CODE4STRUCT leverages these features to succinctly describe event structures, which makes it better aligned with Code-LLMs. By leveraging the LLM's knowledge learned from diverse pretraining domains, CODE4STRUCT works well in the open domain, achieving non-trivial zero-shot performance on unseen event types (§4.5). CODE4STRUCT is also data-efficient: it reaches performance comparable to fully-supervised methods with far fewer annotated examples (20 per event type) (§4.5).
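To make the role of these PL features concrete, the sketch below renders the Transport example from Table 2 as a Python class hierarchy with a docstring, type annotations, and inheritance; the class and role names follow Table 2, but the exact rendering is our own illustration, not the paper's verbatim prompt:

```python
from typing import List

class Event:
    """Base class for all event types."""

class Movement(Event):
    """Superclass for events that move people or artifacts."""

class Transport(Movement):
    """self.agent transported self.artifact in self.vehicle from
    self.origin place to self.destination place."""

    def __init__(
        self,
        agent: List[str] = (),
        artifact: List[str] = (),
        vehicle: List[str] = (),
        origin: List[str] = (),
        destination: List[str] = (),
    ):
        # Each argument role holds a list of entity mention strings;
        # copying into fresh lists avoids shared mutable defaults.
        self.agent = list(agent)
        self.artifact = list(artifact)
        self.vehicle = list(vehicle)
        self.origin = list(origin)
        self.destination = list(destination)

# A completed instantiation, as the model would generate it:
transport_event = Transport(
    agent=["Kelly"],
    destination=["Beijing"],
)
```

The docstring plays the role of the natural language template, while the typed constructor signature declares the argument roles the model must fill, so no separate output-format specification is needed.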