Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction

Yunzhi Yao
Zhejiang University
AZFT Joint Lab for Knowledge Engine
Zhejiang, China
yyztodd@zju.edu.cn
Shengyu Mao
Zhejiang University
AZFT Joint Lab for Knowledge Engine
Zhejiang, China
shengyu@zju.edu.cn
Ningyu Zhang
Zhejiang University
AZFT Joint Lab for Knowledge Engine
Zhejiang, China
zhangningyu@zju.edu.cn
Xiang Chen
Zhejiang University
AZFT Joint Lab for Knowledge Engine
Zhejiang, China
xiang_chen@zju.edu.cn
Shumin Deng
National University of Singapore
NUS-NCS Joint Lab
Singapore
shumin@nus.edu.sg
Xi Chen
Tencent
Guangdong, China
jasonxchen@tencent.com
Huajun Chen
Zhejiang University
AZFT Joint Lab for Knowledge Engine
Donghai Laboratory
Zhejiang, China
huajunsir@zju.edu.cn
ABSTRACT
With the development of pre-trained language models, many prompt-based approaches to data-efficient knowledge graph construction have achieved impressive performance. However, existing prompt-based learning methods for knowledge graph construction are still susceptible to several potential limitations: (i) semantic gap between natural language and output structured knowledge with pre-defined schema, which means the model cannot fully exploit semantic knowledge with the constrained templates; (ii) representation learning with locally individual instances limits the performance given the insufficient features, which are unable to unleash the potential analogical capability of pre-trained language models. Motivated by these observations, we propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP), for data-efficient knowledge graph construction. It can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample, which is model-agnostic and can be plugged into widespread existing approaches. Experimental results demonstrate that previous methods integrated with RAP can achieve impressive performance gains in
low-resource settings on five datasets of relational triple extraction and event extraction for knowledge graph construction.¹

∗Equal contribution and shared co-first authorship.
†Corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SIGIR '23, July 23–27, 2023, Taipei, Taiwan
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9408-6/23/07 . . . $15.00
https://doi.org/10.1145/3539618.3591763
CCS CONCEPTS
• Information systems → Information retrieval; Language models.
KEYWORDS
Triple Extraction, Event Extraction, Prompt-based Learning

ACM Reference Format:
Yunzhi Yao, Shengyu Mao, Ningyu Zhang, Xiang Chen, Shumin Deng, Xi Chen, and Huajun Chen. 2023. Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23), July 23–27, 2023, Taipei, Taiwan. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3539618.3591763
1 INTRODUCTION
Knowledge Graphs (KGs) as a form of structured knowledge can provide back-end support for various practical applications, including information retrieval [51], question answering [14], and recommender systems [5, 48]. Knowledge graph construction aims to automatically retrieve specific relational triples and events from texts [22]. Most prior works on knowledge graph extraction rely on a large amount of labeled data for training [57]; however, high-quality annotations are expensive to obtain. Thus, many data-efficient approaches have been proposed [10], in which prompt-based learning with Pre-trained Language Models (PLMs) yields promising performance. For example, [12] designs a structured prompt template for generating synthetic relation samples for data-efficient

¹Code is available at https://github.com/zjunlp/RAP.
arXiv:2210.10709v5 [cs.CL] 18 Sep 2023
Figure 1: Schema-aware reference as prompt. We construct a schema-instance hybrid reference store from which we retrieve related knowledge as a prompt for data-efficient learning with PLMs (e.g., BART [24]).
relational triple extraction. [18] formulates event extraction as a conditional generation problem with a manually designed prompt, which achieves high performance with only a few training data.
Existing methods have notable limitations. Unlike general NLP tasks, knowledge graph construction requires structured prediction that adheres to a pre-defined schema. Raw text data for PLMs may not have sufficient task-specific patterns, leading to a semantic gap between the input sequence and schema. Constrained prompt templates struggle to fully utilize semantic knowledge and generate schema-conforming outputs. Moreover, prior prompt-based learning relies on the parametric-based paradigm, which is unable to unleash the potential analogical capability of pre-trained language models [4]. Notably, these methods may fail to generalize well for complex examples and perform unstably with limited training data, since scarce or complex examples are not easy to learn in parametric space during optimization. For example, texts mentioning the same event type can vary significantly in structure and expression. "A man was hacked to death by the criminal" and "The aircraft received fire from an enemy machine gun" both describe an Attack event, although they are almost literally different. With only few-shot training samples, the model may struggle to discriminate such complex patterns and extract correct information.
To overcome the aforementioned limitations, we try to fully leverage the schema and global information in training data as references. Note that humans can use associative learning to recall relevant skills in memory to conquer complex tasks with little practice. Similarly, given the insufficient features of a single sentence in the low-resource setting, it is beneficial to leverage schema knowledge and similar annotated examples to enrich the semantics of individual instances and provide reference [49]. Motivated by this, as shown in Figure 1, we propose a novel approach of schema-aware Reference As Prompt (RAP), which dynamically leverages symbolic schema and knowledge inherited from examples as prompts to enhance PLMs for knowledge graph construction.
However, two problems remain: (1) Collecting reference knowledge: since rich schema and training instances are complementary to each other, it is necessary to combine and map these data accordingly to construct the reference store. (2) Leveraging reference knowledge: integrating such reference knowledge into existing KG construction models in a plug-and-play fashion is also challenging, since there are various types of models (e.g., generation-based and classification-based methods).
To address the problem of collecting reference knowledge, we propose a schema-aware reference store that enriches schema with text instances. Specifically, we align instances from human-annotated and weak-supervised text with the structured schema; thus, symbolic knowledge and textual corpora are in the same space for representation learning. Then we construct a unified reference store containing the knowledge derived from both the symbolic schema and the training instances. To address the problem of leveraging reference knowledge, we propose retrieval-based reference integration to select informative knowledge as prompts [54]. Since not all external knowledge is advantageous, we utilize a retrieval-based method to dynamically select, from the schema-aware reference store, the knowledge most relevant to the input sequence as prompts. In this way, each sample can obtain diverse and suitable knowledgeable prompts that provide rich symbolic guidance in low-resource settings.
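The retrieval step can be sketched as ranking stored references by similarity to the input and keeping the top-k. This is a minimal illustration, not the authors' implementation: RAP would use dense PLM representations, whereas the bag-of-words encoder and cosine similarity here are stand-in assumptions.

```python
# Sketch of retrieval-based reference selection: rank reference
# instances by cosine similarity to the query and return the top-k.
# The bag-of-words encoder is a toy stand-in for a PLM encoder.
from collections import Counter
import math

def encode(text: str) -> Counter:
    """Toy encoder: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_references(query: str, store: list[str], k: int = 2) -> list[str]:
    """Return the top-k reference instances most similar to the query."""
    q = encode(query)
    ranked = sorted(store, key=lambda r: cosine(q, encode(r)), reverse=True)
    return ranked[:k]

store = [
    "The criminal was convicted of murder .",
    "He commanded several ships to transport convicted felons .",
    "The delegates met with the head of state .",
]
refs = retrieve_references("Ships were used to transport the felons .", store, k=1)
```

The retrieved references would then be concatenated to the input sequence as the prompt; only the similarity ranking is shown here.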
To demonstrate the eectiveness of our proposed RAP, we ap-
ply it to knowledge graph construction tasks of relational triple
extraction and event extraction tasks. Note that our approach is
model-agnostic and readily pluggable into any previous approaches.
We evaluate the model on two relation triple extraction datasets:
NYT and WebNLG, and two event extraction datasets: ACE05-E
and CASIE. Experimental results show that the RAP model can
perform better in low-resource settings.
2 PRELIMINARIES
In this paper, we apply our approach, RAP, to two representative
tasks of knowledge graph construction, namely: relation triple
extraction and event extraction.
2.1 Task Denition
Event Extraction. Event extraction is the process of automatically
extracting events from unstructured natural language texts, guided
by an event schema. To clarify the process, the following terms
are used: a trigger word is a word or phrase that most accurately
describes the event, and an event argument is an entity or attribute
involved in the event, such as the time or tool used. For example,
the sentence “A man was hacked to death by the criminal” describes
an Attack event triggered by the word ‘hacked’. This event includes
two argument roles: the Attacker (criminal) and the Victim (a man).
The model should be able to identify event triggers, their types,
arguments, and their corresponding roles.
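As an illustration only (the field names are our assumptions, not the paper's schema), the event record for the example sentence can be represented as a small data structure:

```python
# Illustrative data structure for an extracted event record; the field
# names are assumptions for illustration, not the paper's actual schema.
from dataclasses import dataclass, field

@dataclass
class EventRecord:
    event_type: str
    trigger: str
    arguments: dict[str, str] = field(default_factory=dict)  # role -> entity

# Event record for "A man was hacked to death by the criminal".
attack = EventRecord(
    event_type="Attack",
    trigger="hacked",
    arguments={"Attacker": "the criminal", "Victim": "a man"},
)
```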
Relation Triple Extraction. Joint extraction of entity mentions and their relations, in the form of a triple (subject, relation, object), from unstructured texts is an important task in knowledge graph construction. Given the input sentences, the desired outputs are relational triples $(e_{head}, r, e_{tail})$, where $e_{head}$ is the head entity, $r$ is the relation, and $e_{tail}$ is the tail entity. For instance, given
Figure 2: The architecture of schema-aware Reference As Prompt (RAP), which is model-agnostic and readily pluggable into many existing KGC approaches: TEXT2EVENT [34], DEGREE [18], PRGC [57], RelationPrompt [12], and so on.
the sentence "His 35-year career at Mobil Oil included a four-year assignment in Tokyo, Japan as head of Mobil Far East.", the model should identify the two entities Tokyo and Japan and their relation capital-of, described as the triple (Tokyo, capital-of, Japan).
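The triple form $(e_{head}, r, e_{tail})$ can be captured as a tiny typed structure; the type below is ours for illustration and not part of the paper's code:

```python
# A relational triple (e_head, r, e_tail) as a NamedTuple; the type is
# introduced here purely for illustration.
from typing import NamedTuple

class Triple(NamedTuple):
    head: str      # e_head, the head entity
    relation: str  # r, the relation
    tail: str      # e_tail, the tail entity

# Expected extraction for the Mobil Oil example sentence.
triple = Triple(head="Tokyo", relation="capital-of", tail="Japan")
```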
2.2 Problem Formulation
Given an original text
𝑋
, the purpose of the information extraction
task is to obtain target information
Y={Y1, ..., Y𝑡}
, where
Y𝑖, 𝑖
𝑡
represents the information to extract for the j-th type, and
𝑡
refer
to the number of types. For the relation triple extraction task,
Y𝑖
is in the form of triples
Y𝑖=(𝑒ℎ𝑒𝑎𝑑, 𝑟, 𝑒𝑡𝑎𝑖𝑙 )
, including the head
entity, tail entity, and their relation. For the event extraction,
Y𝑖
contains the corresponding event record in the sentence, which can
be represented as
Y𝑖={𝑒𝑣𝑒𝑛𝑡 𝑡𝑦𝑝𝑒, 𝑡𝑟𝑖𝑔𝑔𝑒𝑟, 𝑎𝑟𝑔𝑢𝑚𝑒𝑛𝑡 𝑟𝑜𝑙𝑒}
. In
the following part, we will introduce the prompt construction and
application details.
3 METHODOLOGY
Figure 2 illustrates the framework of RAP. We collect knowledge from different sources and construct a schema-aware reference store (Section 3.1). Then, we dynamically retrieve related references for each query as the prompt to inject into the model (Section 3.2).
3.1 Schema-aware Reference Store Construction

3.1.1 Base Reference Store. The base reference store contains the text instances $\mathcal{I}$, which hold a wealth of information that may share semantic similarities with the query $X$. A well-sized retrieval source is crucial for the text instances: too large a textbase can introduce noise and increase the search space, while too small a textbase is ineffective. Previous research [46] indicates that using training data as the datastore can improve downstream tasks; therefore, we use the training data to construct the base reference store.
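A minimal sketch of building the base reference store from the labeled training split follows; the record layout (text paired with its gold label) is an assumption for illustration, and RAP's actual store additionally attaches schema information (Section 3.1.2).

```python
# Sketch: build the base reference store from the labeled training data.
# The record layout is an assumed, simplified format for illustration.

def build_base_store(train_data: list[dict]) -> list[dict]:
    """Keep each text instance together with its annotation so that a
    retrieved reference carries label information as well."""
    return [{"text": ex["text"], "label": ex["label"]} for ex in train_data]

train_data = [
    {"text": "He was convicted of fraud.", "label": "Justice:Convict"},
    {"text": "Ships transported felons to Maryland.", "label": "Movement:Transport"},
]
store = build_base_store(train_data)
```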
3.1.2 Schema-instance Hybrid Reference Store. Since the base reference store does not contain any structured schema knowledge, we employ schema information to augment the references. A task schema is a symbolic graph $\mathcal{G}$ describing the configuration of each target type. As demonstrated in Figure 2, these nodes (knowledge types) are connected through their intrinsic relationships. Taking the event extraction task as an example, 'meet' is linked with 'Meet' since 'meet' is a trigger word for the Meet event. For the event extraction task, the schema graph includes three types of nodes: the event type $\mathcal{E}$, trigger word $\mathcal{T}$, and argument role $\mathcal{A}$. We follow previous work [21, 28, 31] and leverage the event schema² provided by the dataset. For the relational triple extraction task, the schema graph contains both the relation type $\mathcal{R}$ and the entity information $\mathcal{S}$, and we build the schema graph based on the original dataset such as WebNLG or NYT. The base reference store contains the labeled training data, and we link each text instance to the schema graph $\mathcal{G}$ based on its label.
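The linking described above can be sketched as an undirected graph whose nodes are event types, trigger words, argument roles, and text instances; the node-naming convention below is our own assumption.

```python
# Sketch of the schema graph for event extraction: event-type nodes are
# connected to trigger-word and argument-role nodes, and each annotated
# instance is linked to the graph via its gold label. The "type:",
# "trigger:", "role:", "instance:" prefixes are our naming convention.
from collections import defaultdict

class SchemaGraph:
    def __init__(self):
        self.edges = defaultdict(set)  # node -> set of connected nodes

    def add_edge(self, a: str, b: str):
        self.edges[a].add(b)
        self.edges[b].add(a)

g = SchemaGraph()
# Event type "Movement:Transport" with a trigger word and an argument role.
g.add_edge("type:Movement:Transport", "trigger:transport")
g.add_edge("type:Movement:Transport", "role:vehicle")
# Link an annotated training instance to its event-type node.
g.add_edge("instance:He commanded several ships to transport convicted felons.",
           "type:Movement:Transport")
```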
Note that the size of the schema-aware reference store depends on the amount of annotated training data; however, high-quality data is usually scarce due to the expensive cost of annotation in low-resource scenarios. Since previous work [37] has demonstrated that randomly replacing labels in the demonstrations barely hurts

²www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-events-guidelines-v5.4.3.pdf