Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction

Yunzhi Yao
Zhejiang University
AZFT Joint Lab for Knowledge Engine
Zhejiang, China
yyztodd@zju.edu.cn
Shengyu Mao
Zhejiang University
AZFT Joint Lab for Knowledge Engine
Zhejiang, China
shengyu@zju.edu.cn
Ningyu Zhang
Zhejiang University
AZFT Joint Lab for Knowledge Engine
Zhejiang, China
zhangningyu@zju.edu.cn
Xiang Chen
Zhejiang University
AZFT Joint Lab for Knowledge Engine
Zhejiang, China
xiang_chen@zju.edu.cn
Shumin Deng
National University of Singapore
NUS-NCS Joint Lab
Singapore
shumin@nus.edu.sg
Xi Chen
Tencent
Guangdong, China
jasonxchen@tencent.com
Huajun Chen
Zhejiang University
AZFT Joint Lab for Knowledge Engine
Donghai Laboratory
Zhejiang, China
huajunsir@zju.edu.cn
ABSTRACT
With the development of pre-trained language models, many prompt-based approaches to data-efficient knowledge graph construction have achieved impressive performance. However, existing prompt-based learning methods for knowledge graph construction are still susceptible to several potential limitations: (i) semantic gap between natural language and output structured knowledge with pre-defined schema, which means the model cannot fully exploit semantic knowledge with the constrained templates; (ii) representation learning with locally individual instances limits the performance given the insufficient features, which are unable to unleash the potential analogical capability of pre-trained language models. Motivated by these observations, we propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP), for data-efficient knowledge graph construction. It can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample, which is model-agnostic and can be plugged into widespread existing approaches. Experimental results demonstrate that previous methods integrated with RAP can achieve impressive performance gains in
low-resource settings on five datasets of relational triple extraction and event extraction for knowledge graph construction.¹

∗Equal contribution and shared co-first authorship.
†Corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SIGIR '23, July 23–27, 2023, Taipei, Taiwan
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9408-6/23/07 . . . $15.00
https://doi.org/10.1145/3539618.3591763
CCS CONCEPTS
• Information systems → Information retrieval; Language models.
KEYWORDS
Triple Extraction, Event Extraction, Prompt-based Learning

ACM Reference Format:
Yunzhi Yao, Shengyu Mao, Ningyu Zhang, Xiang Chen, Shumin Deng, Xi Chen, and Huajun Chen. 2023. Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23), July 23–27, 2023, Taipei, Taiwan. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3539618.3591763
1 INTRODUCTION
Knowledge Graphs (KGs) as a form of structured knowledge can provide back-end support for various practical applications, including information retrieval [51], question answering [14], and recommender systems [5, 48]. Knowledge graph construction aims to automatically retrieve specific relational triples and events from texts [22]. Most prior works on knowledge graph extraction rely on a large amount of labeled data for training [57]; however, high-quality annotations are expensive to obtain. Thus, many data-efficient approaches have been proposed [10], in which prompt-based learning with Pre-trained Language Models (PLMs) yields promising performance. For example, [12] designs a structured prompt template for generating synthetic relation samples for data-efficient

¹Code is available at https://github.com/zjunlp/RAP.
arXiv:2210.10709v5 [cs.CL] 18 Sep 2023
Figure 1: Schema-aware reference as prompt. We construct a schema-instance hybrid reference store from which we retrieve related knowledge as a prompt for data-efficient learning with PLMs (e.g., BART [24]).
relational triple extraction. [18] formulates event extraction as a conditional generation problem with a manually designed prompt, which achieves high performance with only a few training data.
Existing methods have notable limitations. Unlike general NLP tasks, knowledge graph construction requires structured prediction that adheres to a pre-defined schema. Raw text data for PLMs may not have sufficient task-specific patterns, leading to a semantic gap between the input sequence and schema. Constrained prompt templates struggle to fully utilize semantic knowledge and generate schema-conforming outputs. Moreover, prior prompt-based learning relies on the parametric-based paradigm, which is unable to unleash the potential analogical capability of pre-trained language models [4]. Notably, these methods may fail to generalize well for complex examples and perform unstably with limited training data, since scarce or complex examples are not easy to learn in parametric space during optimization. For example, texts mentioning the same event type can vary significantly in structure and expression. "A man was hacked to death by the criminal" and "The aircraft received fire from an enemy machine gun" both describe an Attack event, although they are almost literally different. With only few-shot training samples, the model may struggle to discriminate such complex patterns and extract correct information.
To overcome the aforementioned limitations, we try to fully leverage the schema and global information in training data as references. Note that humans can use associative learning to recall relevant skills in memory to conquer complex tasks with little practice. Similarly, given the insufficient features of a single sentence in the low-resource setting, it is beneficial to leverage schema knowledge and similar annotated examples to enrich the semantics of individual instances and provide reference [49]. Motivated by this, as shown in Figure 1, we propose a novel approach of schema-aware Reference As Prompt (RAP), which dynamically leverages symbolic schema and knowledge inherited from examples as prompts to enhance PLMs for knowledge graph construction.
However, two problems remain: (1) Collecting reference knowledge: since rich schema and training instances are complementary to each other, it is necessary to combine and map these data accordingly to construct the reference store. (2) Leveraging reference knowledge: integrating such reference knowledge into existing KG construction models in a plug-and-play fashion is also challenging, since there are various types of models (e.g., generation-based and classification-based methods).
To address the problem of collecting reference knowledge, we propose a schema-aware reference store that enriches schema with text instances. Specifically, we align instances from human-annotated and weak-supervised text with the structured schema; thus, symbolic knowledge and textual corpora are in the same space for representation learning. Then we construct a unified reference store containing the knowledge derived from both the symbolic schema and the training instances. To address the problem of leveraging reference knowledge, we propose retrieval-based reference integration to select informative knowledge as prompts [54]. Since not all external knowledge is advantageous, we utilize a retrieval-based method to dynamically select, from the schema-aware reference store, the knowledge most relevant to the input sequence as prompts. In this way, each sample can obtain diverse and suitable knowledgeable prompts that provide rich symbolic guidance in low-resource settings.
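The retrieval step can be sketched as ranking stored references by similarity to the input and keeping the top-k. This is a minimal illustration, not the authors' implementation: RAP would use dense PLM representations, whereas the bag-of-words encoder and cosine similarity here are stand-in assumptions.

```python
# Sketch of retrieval-based reference selection: rank reference
# instances by cosine similarity to the query and return the top-k.
# The bag-of-words encoder is a toy stand-in for a PLM encoder.
from collections import Counter
import math

def encode(text: str) -> Counter:
    """Toy encoder: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_references(query: str, store: list[str], k: int = 2) -> list[str]:
    """Return the top-k reference instances most similar to the query."""
    q = encode(query)
    ranked = sorted(store, key=lambda r: cosine(q, encode(r)), reverse=True)
    return ranked[:k]

store = [
    "The criminal was convicted of murder .",
    "He commanded several ships to transport convicted felons .",
    "The delegates met with the head of state .",
]
refs = retrieve_references("Ships were used to transport the felons .", store, k=1)
```

The retrieved references would then be concatenated to the input sequence as the prompt; only the similarity ranking is shown here.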
To demonstrate the eectiveness of our proposed RAP, we ap-
ply it to knowledge graph construction tasks of relational triple
extraction and event extraction tasks. Note that our approach is
model-agnostic and readily pluggable into any previous approaches.
We evaluate the model on two relation triple extraction datasets:
NYT and WebNLG, and two event extraction datasets: ACE05-E
and CASIE. Experimental results show that the RAP model can
perform better in low-resource settings.
2 PRELIMINARIES
In this paper, we apply our approach, RAP, to two representative
tasks of knowledge graph construction, namely: relation triple
extraction and event extraction.
2.1 Task Denition
Event Extraction. Event extraction is the process of automatically
extracting events from unstructured natural language texts, guided
by an event schema. To clarify the process, the following terms
are used: a trigger word is a word or phrase that most accurately
describes the event, and an event argument is an entity or attribute
involved in the event, such as the time or tool used. For example,
the sentence “A man was hacked to death by the criminal” describes
an Attack event triggered by the word ‘hacked’. This event includes
two argument roles: the Attacker (criminal) and the Victim (a man).
The model should be able to identify event triggers, their types,
arguments, and their corresponding roles.
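As an illustration only (the field names are our assumptions, not the paper's schema), the event record for the example sentence can be represented as a small data structure:

```python
# Illustrative data structure for an extracted event record; the field
# names are assumptions for illustration, not the paper's actual schema.
from dataclasses import dataclass, field

@dataclass
class EventRecord:
    event_type: str
    trigger: str
    arguments: dict[str, str] = field(default_factory=dict)  # role -> entity

# Event record for "A man was hacked to death by the criminal".
attack = EventRecord(
    event_type="Attack",
    trigger="hacked",
    arguments={"Attacker": "the criminal", "Victim": "a man"},
)
```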
Relation Triple Extraction. Joint extraction of entity mentions and their relations, in the form of a triple (subject, relation, object), from unstructured texts is an important task in knowledge graph construction. Given the input sentences, the desired outputs are relational triples $(e_{head}, r, e_{tail})$, where $e_{head}$ is the head entity, $r$ is the relation, and $e_{tail}$ is the tail entity. For instance, given
Figure 2: The architecture of schema-aware Reference As Prompt (RAP), which is model-agnostic and readily pluggable into many existing KGC approaches: TEXT2EVENT [34], DEGREE [18], PRGC [57], RelationPrompt [12], and so on.
the sentence "His 35-year career at Mobil Oil included a four-year assignment in Tokyo, Japan as head of Mobil Far East.", the model should identify the two entities Tokyo and Japan and their relation capital-of, described as the triple (Tokyo, capital-of, Japan).
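The triple form $(e_{head}, r, e_{tail})$ can be captured as a tiny typed structure; the type below is ours for illustration and not part of the paper's code:

```python
# A relational triple (e_head, r, e_tail) as a NamedTuple; the type is
# introduced here purely for illustration.
from typing import NamedTuple

class Triple(NamedTuple):
    head: str      # e_head, the head entity
    relation: str  # r, the relation
    tail: str      # e_tail, the tail entity

# Expected extraction for the Mobil Oil example sentence.
triple = Triple(head="Tokyo", relation="capital-of", tail="Japan")
```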
2.2 Problem Formulation
Given an original text
𝑋
, the purpose of the information extraction
task is to obtain target information
Y={Y1, ..., Y𝑡}
, where
Y𝑖, 𝑖
𝑡
represents the information to extract for the j-th type, and
𝑡
refer
to the number of types. For the relation triple extraction task,
Y𝑖
is in the form of triples
Y𝑖=(𝑒ℎ𝑒𝑎𝑑, 𝑟, 𝑒𝑡𝑎𝑖𝑙 )
, including the head
entity, tail entity, and their relation. For the event extraction,
Y𝑖
contains the corresponding event record in the sentence, which can
be represented as
Y𝑖={𝑒𝑣𝑒𝑛𝑡 𝑡𝑦𝑝𝑒, 𝑡𝑟𝑖𝑔𝑔𝑒𝑟, 𝑎𝑟𝑔𝑢𝑚𝑒𝑛𝑡 𝑟𝑜𝑙𝑒}
. In
the following part, we will introduce the prompt construction and
application details.
3 METHODOLOGY
Figure 2 illustrates the framework of RAP. We collect knowledge from different sources and construct a schema-aware reference store (Section 3.1). Then, we dynamically retrieve related references for each query as the prompt to inject into the model (Section 3.2).
3.1 Schema-aware Reference Store Construction

3.1.1 Base Reference Store. The base reference store contains the text instances $\mathcal{I}$, which hold a wealth of information that may share semantic similarities with the query $X$. A well-sized retrieval source is crucial for the text instances: too large a textbase can introduce noise and increase the search space, while too small a textbase is ineffective. Previous research [46] indicates that using training data as the datastore can improve downstream tasks; therefore, we use the training data to construct the base reference store.
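A minimal sketch of building the base reference store from the labeled training split follows; the record layout (text paired with its gold label) is an assumption for illustration, and RAP's actual store additionally attaches schema information (Section 3.1.2).

```python
# Sketch: build the base reference store from the labeled training data.
# The record layout is an assumed, simplified format for illustration.

def build_base_store(train_data: list[dict]) -> list[dict]:
    """Keep each text instance together with its annotation so that a
    retrieved reference carries label information as well."""
    return [{"text": ex["text"], "label": ex["label"]} for ex in train_data]

train_data = [
    {"text": "He was convicted of fraud.", "label": "Justice:Convict"},
    {"text": "Ships transported felons to Maryland.", "label": "Movement:Transport"},
]
store = build_base_store(train_data)
```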
3.1.2 Schema-instance Hybrid Reference Store. Since the base reference store does not contain any structured schema knowledge, we employ schema information to augment the references. A task schema is a symbolic graph $\mathcal{G}$ describing the configuration of each target type. As demonstrated in Figure 2, these nodes (knowledge types) are connected through their intrinsic relationships. Taking the event extraction task as an example, 'meet' is linked with 'Meet' since 'meet' is a trigger word for the Meet event. For the event extraction task, the schema graph includes three types of nodes: the event type $\mathcal{E}$, trigger word $\mathcal{T}$, and argument role $\mathcal{A}$. We follow previous work [21, 28, 31] and leverage the event schema² provided by the dataset. For the relational triple extraction task, the schema graph contains both the relation type $\mathcal{R}$ and the entity information $\mathcal{S}$, and we build the schema graph based on the original dataset such as WebNLG or NYT. The base reference store contains the labeled training data, and we link each text instance to the schema graph $\mathcal{G}$ based on its label.
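The linking described above can be sketched as an undirected graph whose nodes are event types, trigger words, argument roles, and text instances; the node-naming convention below is our own assumption.

```python
# Sketch of the schema graph for event extraction: event-type nodes are
# connected to trigger-word and argument-role nodes, and each annotated
# instance is linked to the graph via its gold label. The "type:",
# "trigger:", "role:", "instance:" prefixes are our naming convention.
from collections import defaultdict

class SchemaGraph:
    def __init__(self):
        self.edges = defaultdict(set)  # node -> set of connected nodes

    def add_edge(self, a: str, b: str):
        self.edges[a].add(b)
        self.edges[b].add(a)

g = SchemaGraph()
# Event type "Movement:Transport" with a trigger word and an argument role.
g.add_edge("type:Movement:Transport", "trigger:transport")
g.add_edge("type:Movement:Transport", "role:vehicle")
# Link an annotated training instance to its event-type node.
g.add_edge("instance:He commanded several ships to transport convicted felons.",
           "type:Movement:Transport")
```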
Note that the size of the schema-aware reference store depends on the amount of annotated training data; however, high-quality data is usually scarce due to the expensive cost of annotation in low-resource scenarios. Since previous work [37] has demonstrated that randomly replacing labels in the demonstrations barely hurts

²www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-events-guidelines-v5.4.3.pdf