Can Language Models Be Specific? How?
Jie Huang1Kevin Chen-Chuan Chang1Jinjun Xiong2Wen-mei Hwu1,3
1University of Illinois at Urbana-Champaign, USA
2University at Buffalo, USA
3NVIDIA, USA
{jeffhj, kcchang, w-hwu}@illinois.edu
jinjun@buffalo.edu
Abstract
“He is a person”, “Paris is located on the earth”. Both statements are correct but meaningless due to their lack of specificity. In this paper, we propose to measure how specific the language of pre-trained language models (PLMs) is. To achieve this, we introduce a novel approach to build a benchmark for specificity testing by forming masked token prediction tasks with prompts. For instance, given “Toronto is located in [MASK].”, we want to test whether a more specific answer will be better filled in by PLMs, e.g., Ontario instead of Canada. From our evaluations, we show that existing PLMs have only a slight preference for more specific answers. We identify underlying factors affecting specificity and design two prompt-based methods to improve it. Results show that the specificity of the models can be improved by the proposed methods without additional training. We hope this work can bring awareness to the notion of specificity of language models and encourage the research community to further explore this important but understudied problem.1
1 Introduction
Pre-trained language models (PLMs) such as BERT (Devlin et al., 2019) and GPT-2/3 (Radford et al., 2019; Brown et al., 2020) have achieved quite impressive results in various natural language processing tasks. Recent works show that the parameters of these models contain significant amounts of knowledge (Petroni et al., 2019; Roberts et al., 2020; Jiang et al., 2020a,b; Wang et al., 2020), and knowledge stored in PLMs can be extracted by predicting the masked token(s) using prompts. For instance, given the prompt “J. K. Rowling was born in [MASK].”, PLMs can predict the birthplace of Rowling based on their knowledge.
1 Code and data are available at https://github.com/jeffhj/S-TEST.
Figure 1: Examples of language modeling that lack specificity (“Toronto is located on the earth.”, “Dante is a person.”, “Cat is a subclass of animal.”). More specific descriptions could be: in Ontario, poet, and feline, respectively.
However, there may exist multiple correct answers for a query, and not all of them are equally specific. In many situations, we desire a specific answer. For the example above, the masked token can be replaced by Yate (a town), Gloucestershire (a county), or England (a country). To acquire the maximum knowledge (in this example, the town, the county, and the country where Rowling was born), we may prefer the model to fill in Yate, since Gloucestershire and England can be further predicted using prompts, e.g., “Yate is located in [MASK].” This means that if the prediction is more specific, we can retrieve more fine-grained information from language models and, in turn, acquire more knowledge overall. Besides, sometimes the less specific answer is simply not useful. For instance, since it is well known that Chicago is located in the USA, users gain no additional information if the model only predicts that Chicago is located in the USA rather than in Illinois. More examples are shown in Figure 1. To make an analogy: a good speaker not only needs to be correct, but also needs to be specific when desired. The same is true for language models.
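To make this kind of prompt-based probing concrete, the following minimal sketch (a hypothetical illustration, not the authors' released code) queries an off-the-shelf masked language model with the Rowling prompt and compares the scores it assigns to answers of different specificity; the model checkpoint and the single-token handling of candidates are simplifying assumptions.

```python
# Minimal sketch of prompt-based probing with a masked LM, using the
# Hugging Face `transformers` library (illustrative, not the paper's code).
from transformers import pipeline

# Any masked LM checkpoint can be used; bert-base-cased is just an example.
fill_mask = pipeline("fill-mask", model="bert-base-cased")

prompt = "J. K. Rowling was born in [MASK]."
candidates = ["Yate", "Gloucestershire", "England"]  # more -> less specific

# `targets` restricts scoring to the candidates, so we can see whether the
# model prefers the more specific answer; candidates that split into several
# subword tokens would need extra handling in a real evaluation.
for result in fill_mask(prompt, targets=candidates):
    print(f"{result['token_str']:>15}  score = {result['score']:.4f}")
```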
Although there are works on measuring how much knowledge is stored in PLMs or improving the correctness of their predictions (Petroni et al., 2019; Roberts et al., 2020; Jiang et al., 2020b), few have attempted to measure or improve the specificity of predictions made by PLMs. Noteworthy exceptions include the work of Adiwardana et al. (2020) and Thoppilan et al. (2022), who evaluated the specificity of conversational language models. In
their research, specificity was defined and measured within a conversational context; for instance, the response “Me too. I love Eurovision songs” is deemed more specific than simply “Me too” as a reply to the statement “I love Eurovision”. Understanding how specific the language of PLMs is can help us better understand the behavior of language models and facilitate downstream applications such as question answering, text generation, and information extraction (Liu et al., 2021a; Khashabi et al., 2020; Brown et al., 2020; Wang et al., 2020), e.g., making the generated answers/sentences or extracted information more specific or fine-grained.
Therefore, we propose to build a benchmark to measure the specificity of the language of PLMs. To reduce human effort and make it easy to further expand the dataset (e.g., to specific domains), we introduce a novel way to construct test data automatically based on transitive relations in Wikidata (Vrandečić and Krötzsch, 2014). Specifically, we extract reasoning paths from Wikidata, e.g., (J. K. Rowling, birthplace, Yate, location, Gloucestershire, location, England). Based on the average distance of each object to the subject and the property of transitive relations, we form masked-token-prediction based probing tasks to measure specificity, e.g., whether the masked token in “J. K. Rowling was born in [MASK].” is better filled with Yate than England by PLMs. The resulting benchmark dataset contains more than 20,000 probes covering queries from 5 different categories. The quality of the benchmark is high: its judgments of which answer is more specific are 97% consistent with those of humans.
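As a rough illustration of this construction (the chain structure, template, and field names below are hypothetical, not the benchmark's actual data format), each transitive chain can be turned into probe pairs in which the object closer to the subject is the more specific answer:

```python
# Schematic sketch of deriving probes from a transitive Wikidata chain
# (assumed format, not the released benchmark code).
from itertools import combinations

chain = {
    "subject": "J. K. Rowling",
    "relation": "birthplace",
    "objects": ["Yate", "Gloucestershire", "England"],  # nearest object first
    "template": "{subject} was born in [MASK].",
}

def make_probes(chain):
    """Yield (prompt, more_specific, less_specific) triples.

    By transitivity, every object in the chain is a correct filler for the
    prompt; objects closer to the subject are the more specific answers.
    """
    prompt = chain["template"].format(subject=chain["subject"])
    for fine, coarse in combinations(chain["objects"], 2):
        yield prompt, fine, coarse

for probe in make_probes(chain):
    print(probe)
# ('J. K. Rowling was born in [MASK].', 'Yate', 'Gloucestershire'), ...
```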
With our benchmark, we provide in-depth analyses of model specificity and study two factors that affect it. As shown by our evaluations in Section 4, existing PLMs, e.g., BERT and GPT-2, all have only a slight preference for more specific answers (the more specific answer is preferred in only about 60% of cases). We also show that, in general, PLMs prefer less specific answers when no subject is given, and that they have only a weak ability to differentiate coarse-grained from fine-grained objects by measuring their similarities to subjects. These results indicate that specificity has been neglected by existing research on language models; how to improve and control it is an interesting and valuable problem.
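One simple way to quantify such a preference (an illustrative formulation, not necessarily the exact metric used in Section 4) is the fraction of probe pairs in which the model scores the more specific answer above the less specific one:

```python
# Illustrative preference metric: the share of probe pairs where the model's
# score for the fine-grained answer exceeds that for the coarse-grained one.
def preference_rate(probes, score):
    """`probes` yields (prompt, fine, coarse) triples; `score(prompt, answer)`
    returns the model's probability of `answer` filling the masked position."""
    probes = list(probes)
    wins = sum(score(p, fine) > score(p, coarse) for p, fine, coarse in probes)
    return wins / len(probes)
```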
Based on our observations and analyses, we propose two techniques to improve the specificity of the predictions by modifying the prompts, without additional training: Few-shot Prompting, where demonstrations with more specific answers are provided to guide the models to produce more specific answers; and Cascade Prompting, where “which” clauses are added as suffixes to bias the predictions to be more specific. Results show that Few-shot Prompting improves specificity well for unidirectional language models such as GPT-2, while Cascade Prompting works well for bidirectional language models such as BERT.
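The two prompt modifications can be sketched as simple string transformations; the demonstration format and the wording of the appended clauses below are illustrative assumptions rather than the paper's exact prompts:

```python
# Hypothetical sketches of the two prompt-based methods (formats assumed).

def few_shot_prompt(query, demonstrations):
    """Prepend demonstrations whose answers are specific, nudging a
    unidirectional model such as GPT-2 toward specific completions."""
    return "\n".join(demonstrations + [query])

def cascade_prompt(query, relation_phrase="is located in", depth=2):
    """Append 'which ... [MASK]' clauses so that, for a bidirectional model
    such as BERT, coarser answers are drawn to the later masks and the first
    mask is biased toward the most specific answer."""
    clauses = ", ".join(f"which {relation_phrase} [MASK]" for _ in range(depth))
    return query.rstrip(".") + ", " + clauses + "."

print(few_shot_prompt(
    "J. K. Rowling was born in",
    ["Albert Einstein was born in Ulm.", "Barack Obama was born in Honolulu."],
))
print(cascade_prompt("J. K. Rowling was born in [MASK]."))
# -> J. K. Rowling was born in [MASK], which is located in [MASK], which is located in [MASK].
```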
The main contributions of our work are summarized as follows:
• We propose a novel automatic approach to build a benchmark for specificity testing based on the property of transitive relations.
• We analyze the specificity of several existing PLMs and study two factors that affect it.
• We propose two methods to improve specificity by modifying the prompts without additional training.
• We provide in-depth analyses and discussions, suggesting directions for future work to further explore and improve specificity.
2 Background and Related Work
Pre-Trained Language Models: Pre-trained language models (PLMs) are language models pre-trained on large corpora. In this paper, we cover two types of pre-trained language models: unidirectional language models, such as GPT-2 (Radford et al., 2019), where the prediction of the current token is based only on previous tokens; and bidirectional language models, such as BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019), where both the left and right contexts are utilized to predict the current token.
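The practical difference between the two model types, for the kind of probing used in this paper, lies in which context is available when a token is scored; the sketch below (checkpoints chosen only for illustration) contrasts next-token scoring in GPT-2 with masked-token scoring in BERT.

```python
# Contrast of unidirectional vs. bidirectional token scoring (illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoTokenizer

# Unidirectional (GPT-2): the next token is predicted from the left context only.
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
ids = gpt2_tok("Toronto is located in", return_tensors="pt").input_ids
next_token_probs = torch.softmax(gpt2(ids).logits[0, -1], dim=-1)

# Bidirectional (BERT): the [MASK] token is predicted from both sides.
bert_tok = AutoTokenizer.from_pretrained("bert-base-cased")
bert = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
enc = bert_tok("Toronto is located in [MASK].", return_tensors="pt")
mask_pos = (enc.input_ids[0] == bert_tok.mask_token_id).nonzero().item()
mask_probs = torch.softmax(bert(**enc).logits[0, mask_pos], dim=-1)
```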
Knowledge Retrieval from LMs and Prompting: Previous works have focused on extracting factual knowledge from PLMs without incorporating external knowledge, which is usually achieved by creating prompts and letting PLMs predict the masked token(s) (Petroni et al., 2019; Bouraoui et al., 2020; Jiang et al., 2020a,b; Wang et al., 2020). They demonstrated that PLMs contain a significant amount of knowledge. By creating appropriate prompts with some additional training, such methods can even achieve performance comparable to SOTA for some specific tasks (Shin et al., 2020; Liu et al., 2021b). Our work is inspired by these works.