Knowledge Prompts: Injecting World Knowledge into
Language Models through Soft Prompts
Cicero Nogueira dos Santos, Zhe Dong, Daniel Cer, John Nham,
Siamak Shakeri, Jianmo Ni, Yun-hsuan Sung
Google Research
{cicerons, zhedong, cer, jnham, siamaks, jianmon, yhsung}@google.com
Abstract

Soft prompts have recently been proposed as a tool for adapting large frozen language models (LMs) to new tasks. In this work, we repurpose soft prompts for the task of injecting world knowledge into LMs. We introduce a method to train soft prompts via self-supervised learning on data from knowledge bases. The resulting soft knowledge prompts (KPs) are task independent and work as an external memory of the LMs. We perform qualitative and quantitative experiments and demonstrate that: (1) KPs can effectively model the structure of the training data; (2) KPs can be used to improve the performance of LMs on different knowledge intensive tasks.
1 Introduction

Very large neural language models (LMs) are known to perform well on knowledge intensive natural language understanding (NLU) tasks because they memorize a significant amount of world knowledge from their training data. The larger the LM, the more facts it can memorize at training time, and the better its results at inference time (Roberts et al., 2020). Despite their success, these models also have important drawbacks: their parametric memory has a fixed size and cannot grow (or shrink) over time without fully retraining the model; there is no control over which part of the memory stores which facts; facts that do not co-occur frequently in the training data are not well represented in the model; very large models are needed to memorize enough data to perform well on knowledge intensive tasks such as generative question answering; and, last but not least, the memorized knowledge becomes obsolete over time, requiring the model to be retrained to stay current.
In this work, we employ soft prompts to overcome some of these issues. Soft prompts (Lester et al., 2021; Li and Liang, 2021; Hambardzumyan et al., 2021) have recently been proposed as a tool for adapting large frozen LMs to new tasks. Here, we repurpose soft prompts for the task of injecting world knowledge into LMs. The goal is to train an external memory that is composed of a large set of soft prompts encoding world knowledge. We introduce a method to train knowledge driven soft prompts via self-supervised learning on data from knowledge bases. The resulting soft prompts, which we call knowledge prompts (KPs), function as an auxiliary memory of the LM that is activated when solving knowledge intensive tasks. Unlike regular applications of soft prompts, which concatenate a small fixed set of embeddings to every input, our approach learns a very large set of KPs that are sparsely activated depending on the input.
We focus on entity-centric KPs, which means that each prompt primarily encodes information about one entity from a knowledge base. We use Wikidata (Vrandečić and Krötzsch, 2014) triples as our training data and train KPs for the top 1.1M entities, ranked by number of triples. We present a qualitative analysis of KPs using t-SNE plots and k-nearest neighbor retrieval. For quantitative analysis, we show experimental results on three knowledge intensive tasks: question answering, fact checking and relation classification. For all datasets, the use of KPs improves the performance of the T5 baseline. Our experimental results demonstrate that KPs are an effective way to expand the memory of frozen LMs.
The main contributions of this work are the following:

• We propose a self-supervised approach to train knowledge driven soft prompts that can be used to inject world knowledge into LMs.

• We demonstrate that knowledge prompts can effectively model the structure of the training data and can also improve the performance of LMs on knowledge intensive tasks.

• This work sheds light on the usability of soft prompts for storing data rather than for storing instructions on how to solve specific tasks.

Figure 1: Training of knowledge prompts: given a serialized KB triple where one of the entities has been masked out, the frozen LM has to predict the masked entity given the input and the knowledge prompt of the non-masked entity (Michelle Obama in this example). The cross-entropy loss is computed and the error is back-propagated through the frozen LM in order to update the KP.
2 Methods
2.1 Soft Prompts
Different approaches have recently been proposed to train soft prompts (Lester et al., 2021; Li and Liang, 2021; Hambardzumyan et al., 2021). One of the most popular methods, and probably the simplest one, consists of the following steps (Lester et al., 2021):

(1) for a given task, prepend a fixed number of embeddings (the soft prompt) to the word embeddings of every input;

(2) during finetuning, update only the soft prompt while keeping all other parameters of the LM frozen.

Despite its simplicity, this approach has proven to be very effective when used with large language models.
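To make the recipe above concrete, the following is a minimal PyTorch sketch of prompt tuning (an illustration, not the exact setup used in this paper): frozen_lm is a placeholder module assumed to accept a sequence of input embeddings, and the prompt length, embedding dimension and initialization scale are arbitrary choices.

```python
import torch
import torch.nn as nn


class PromptTunedLM(nn.Module):
    """Wrap a frozen LM so that only the soft prompt is trainable (sketch)."""

    def __init__(self, frozen_lm: nn.Module, embed_dim: int, prompt_len: int = 20):
        super().__init__()
        self.frozen_lm = frozen_lm
        for p in self.frozen_lm.parameters():
            p.requires_grad = False  # step (2): keep all LM parameters frozen
        # learnable soft prompt: prompt_len embeddings of size embed_dim
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, embed_dim) word embeddings of the input
        batch_size = token_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch_size, -1, -1)
        # step (1): prepend the soft prompt to the word embeddings of every input
        return self.frozen_lm(torch.cat([prompt, token_embeds], dim=1))
```

During finetuning, the optimizer is built only from the parameters that still require gradients, i.e. the soft prompt, so the LM itself is never updated.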
2.2 Soft Knowledge Prompts
We are interested in training soft knowledge prompts (KPs) that encode world knowledge and can work as an external memory for LMs. In this work, we focus on training entity-centric KPs, each of which stores the knowledge related to a specific entity from a knowledge base (KB). In other words, the KP of an entity encodes information from the KB triples that mention the entity either as a subject or as an object. We adopt KB triples from Wikidata (Vrandečić and Krötzsch, 2014), a simple and trustworthy source of world knowledge.
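Conceptually, this external memory can be pictured as a large embedding table with one row per entity, retrieved by a simple lookup. The sketch below makes simplifying assumptions of our own (a single vector per entity, a placeholder prompt dimension, and a hypothetical entity-to-index mapping) and is not the actual implementation.

```python
import torch
import torch.nn as nn

# Scale taken from the paper (~1.1M Wikidata entities); the prompt
# dimension is a placeholder and depends on the underlying LM.
NUM_ENTITIES = 1_100_000
KP_DIM = 512

# External memory: one soft knowledge prompt per entity, randomly initialized.
kp_memory = nn.Embedding(NUM_ENTITIES, KP_DIM)

# Hypothetical mapping from entity names (or Wikidata IDs) to table rows.
entity_to_row = {"Michelle Obama": 0, "Germany": 1, "Berlin": 2}


def lookup_kp(entity: str) -> torch.Tensor:
    """Retrieve the KP of an entity: a plain embedding lookup."""
    idx = torch.tensor([entity_to_row[entity]])
    return kp_memory(idx)  # shape: (1, KP_DIM)
```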
2.2.1 KP Training
We train KPs with a masked language modeling (MLM) objective (Devlin et al., 2019; Taylor, 1953), where the goal is to generate the object entity of a KB triple given the subject entity and the relation, and vice versa. As an example, the input/target pair "Germany capital <MASK>" / "Berlin" will be used to update the KP for Germany, while the pair "<MASK> capital Berlin" / "Germany" will be used to update the KP for Berlin.
The KPs are randomly initialized and are updated only when the corresponding entities appear (not masked) in the input. This makes the training of KPs sparse and parallelizable.
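The sketch below illustrates how such input/target pairs could be produced from a serialized triple and routed to the KP that should be updated; the function name and tuple layout are illustrative assumptions rather than the paper's data pipeline.

```python
from typing import List, Tuple


def triple_to_examples(subj: str, rel: str, obj: str) -> List[Tuple[str, str, str]]:
    """Turn one serialized KB triple into two MLM examples.

    Each returned tuple is (kp_entity, input_text, target_text); only the
    KP of kp_entity is updated when training on that example.
    """
    return [
        (subj, f"{subj} {rel} <MASK>", obj),  # mask the object, update the subject's KP
        (obj, f"<MASK> {rel} {obj}", subj),   # mask the subject, update the object's KP
    ]


# The example from the text: the triple (Germany, capital, Berlin).
for kp_entity, model_input, target in triple_to_examples("Germany", "capital", "Berlin"):
    print(f"update KP[{kp_entity}]: {model_input!r} -> {target!r}")
# update KP[Germany]: 'Germany capital <MASK>' -> 'Berlin'
# update KP[Berlin]: '<MASK> capital Berlin' -> 'Germany'
```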
Given an input triple with the object entity being masked, a training iteration has the following steps:

(1) retrieve the KP of the subject entity, which is a simple lookup operation;