Can Language Models Be Specific? How?
Jie Huang1Kevin Chen-Chuan Chang1Jinjun Xiong2Wen-mei Hwu1,3
1University of Illinois at Urbana-Champaign, USA
2University at Buffalo, USA
3NVIDIA, USA
{jeffhj, kcchang, w-hwu}@illinois.edu
jinjun@buffalo.edu
Abstract
“He is a person”, “Paris is located on the earth”. Both statements are correct but meaningless due to their lack of specificity. In this paper, we propose to measure how specific the language of pre-trained language models (PLMs) is. To achieve this, we introduce a novel approach to build a benchmark for specificity testing by forming masked token prediction tasks with prompts. For instance, given “Toronto is located in [MASK].”, we want to test whether a more specific answer will be better filled in by PLMs, e.g., Ontario instead of Canada. From our evaluations, we show that existing PLMs have only a slight preference for more specific answers. We identify underlying factors affecting specificity and design two prompt-based methods to improve it. Results show that the specificity of the models can be improved by the proposed methods without additional training. We hope this work can bring awareness to the notion of specificity of language models and encourage the research community to further explore this important but understudied problem.1
1 Introduction
Pre-trained language models (PLMs) such as BERT (Devlin et al., 2019) and GPT-2/3 (Radford et al., 2019; Brown et al., 2020) have achieved quite impressive results in various natural language processing tasks. Recent works show that the parameters of these models contain significant amounts of knowledge (Petroni et al., 2019; Roberts et al., 2020; Jiang et al., 2020a,b; Wang et al., 2020), and knowledge stored in PLMs can be extracted by predicting the masked token(s) using prompts. For instance, given the prompt “J. K. Rowling was born in [MASK].”, PLMs can predict the birthplace of Rowling based on their knowledge.
1 Code and data are available at https://github.com/jeffhj/S-TEST.
Figure 1: Examples of language modeling that lack specificity (“Toronto is located on the earth.”, “Dante is a person.”, “Cat is a subclass of animal.”). More specific descriptions could be: in Ontario, poet, and feline, respectively.
However, there may exist multiple correct answers for a query, and not all of them are equally specific. In many situations, we desire a specific answer. For the example above, the masked token can be replaced by Yate (a town), Gloucestershire (a county), or England (a country). To acquire the maximum knowledge (in this example, the town, the county, and the country where Rowling was born), we may prefer the model to fill in Yate, since Gloucestershire and England can be further predicted using prompts, e.g., “Yate is located in [MASK].” This means that if the prediction is more specific, we can retrieve more fine-grained information from language models and, in turn, acquire more knowledge overall. Besides, sometimes the less specific answer is simply not useful. For instance, since it is well known that Chicago is located in the USA, users gain no additional information if the model only predicts that Chicago is located in the USA rather than in Illinois. More examples are shown in Figure 1. To make an analogy: a good speaker not only needs to be correct, but also needs to be specific when desired. The same is true for language models.
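To make this kind of prompt-based probing concrete, the following minimal sketch (a hypothetical illustration, not the authors' released code) queries an off-the-shelf masked language model with the Rowling prompt and compares the scores it assigns to answers of different specificity; the model checkpoint and the single-token handling of candidates are simplifying assumptions.

```python
# Minimal sketch of prompt-based probing with a masked LM, using the
# Hugging Face `transformers` library (illustrative, not the paper's code).
from transformers import pipeline

# Any masked LM checkpoint can be used; bert-base-cased is just an example.
fill_mask = pipeline("fill-mask", model="bert-base-cased")

prompt = "J. K. Rowling was born in [MASK]."
candidates = ["Yate", "Gloucestershire", "England"]  # more -> less specific

# `targets` restricts scoring to the candidates, so we can see whether the
# model prefers the more specific answer; candidates that split into several
# subword tokens would need extra handling in a real evaluation.
for result in fill_mask(prompt, targets=candidates):
    print(f"{result['token_str']:>15}  score = {result['score']:.4f}")
```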
Although there are works on measuring how much knowledge is stored in PLMs or improving the correctness of their predictions (Petroni et al., 2019; Roberts et al., 2020; Jiang et al., 2020b), few have attempted to measure or improve the specificity of predictions made by PLMs. Noteworthy exceptions include the work of Adiwardana et al. (2020) and Thoppilan et al. (2022), who evaluated the specificity of conversational language models. In
their research, specificity was defined and measured within a conversational context; for instance, the response “Me too. I love Eurovision songs” is deemed more specific than simply “Me too” as a reply to the statement “I love Eurovision”. Understanding how specific the language of PLMs is can help us better understand the behavior of language models and facilitate downstream applications such as question answering, text generation, and information extraction (Liu et al., 2021a; Khashabi et al., 2020; Brown et al., 2020; Wang et al., 2020), e.g., making the generated answers/sentences or extracted information more specific or fine-grained.
Therefore, we propose to build a benchmark to measure the specificity of the language of PLMs. To reduce human effort and make it easy to further expand the dataset (e.g., to specific domains), we introduce a novel way to construct test data automatically based on transitive relations in Wikidata (Vrandečić and Krötzsch, 2014). Specifically, we extract reasoning paths from Wikidata, e.g., (J. K. Rowling, birthplace, Yate, location, Gloucestershire, location, England). Based on the average distance of each object to the subject and the property of transitive relations, we form masked-token-prediction based probing tasks to measure specificity, e.g., whether the masked token in “J. K. Rowling was born in [MASK].” is better filled with Yate than England by PLMs. The resulting benchmark dataset contains more than 20,000 probes covering queries from 5 different categories. The quality of the benchmark is high: its judgments of which answer is more specific are 97% consistent with those of humans.
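As a rough illustration of this construction (the chain structure, template, and field names below are hypothetical, not the benchmark's actual data format), each transitive chain can be turned into probe pairs in which the object closer to the subject is the more specific answer:

```python
# Schematic sketch of deriving probes from a transitive Wikidata chain
# (assumed format, not the released benchmark code).
from itertools import combinations

chain = {
    "subject": "J. K. Rowling",
    "relation": "birthplace",
    "objects": ["Yate", "Gloucestershire", "England"],  # nearest object first
    "template": "{subject} was born in [MASK].",
}

def make_probes(chain):
    """Yield (prompt, more_specific, less_specific) triples.

    By transitivity, every object in the chain is a correct filler for the
    prompt; objects closer to the subject are the more specific answers.
    """
    prompt = chain["template"].format(subject=chain["subject"])
    for fine, coarse in combinations(chain["objects"], 2):
        yield prompt, fine, coarse

for probe in make_probes(chain):
    print(probe)
# ('J. K. Rowling was born in [MASK].', 'Yate', 'Gloucestershire'), ...
```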
With our benchmark, we provide in-depth analyses of model specificity and study two factors that affect it. As shown by our evaluations in Section 4, existing PLMs, e.g., BERT and GPT-2, all have only a slight preference for more specific answers (the more specific answer is preferred in only about 60% of cases). We also show that, in general, PLMs prefer less specific answers when no subject is given, and that they have only a weak ability to differentiate coarse-grained from fine-grained objects by measuring their similarities to subjects. These results indicate that specificity has been neglected by existing research on language models; how to improve and control it is an interesting and valuable problem.
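One simple way to quantify such a preference (an illustrative formulation, not necessarily the exact metric used in Section 4) is the fraction of probe pairs in which the model scores the more specific answer above the less specific one:

```python
# Illustrative preference metric: the share of probe pairs where the model's
# score for the fine-grained answer exceeds that for the coarse-grained one.
def preference_rate(probes, score):
    """`probes` yields (prompt, fine, coarse) triples; `score(prompt, answer)`
    returns the model's probability of `answer` filling the masked position."""
    probes = list(probes)
    wins = sum(score(p, fine) > score(p, coarse) for p, fine, coarse in probes)
    return wins / len(probes)
```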
Based on our observations and analyses, we propose two techniques to improve the specificity of the predictions by modifying the prompts, without additional training: Few-shot Prompting, where demonstrations with more specific answers are provided to guide the models to produce more specific answers; and Cascade Prompting, where “which” clauses are added as suffixes to bias the predictions to be more specific. Results show that Few-shot Prompting improves specificity well for unidirectional language models such as GPT-2, while Cascade Prompting works well for bidirectional language models such as BERT.
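The two prompt modifications can be sketched as simple string transformations; the demonstration format and the wording of the appended clauses below are illustrative assumptions rather than the paper's exact prompts:

```python
# Hypothetical sketches of the two prompt-based methods (formats assumed).

def few_shot_prompt(query, demonstrations):
    """Prepend demonstrations whose answers are specific, nudging a
    unidirectional model such as GPT-2 toward specific completions."""
    return "\n".join(demonstrations + [query])

def cascade_prompt(query, relation_phrase="is located in", depth=2):
    """Append 'which ... [MASK]' clauses so that, for a bidirectional model
    such as BERT, coarser answers are drawn to the later masks and the first
    mask is biased toward the most specific answer."""
    clauses = ", ".join(f"which {relation_phrase} [MASK]" for _ in range(depth))
    return query.rstrip(".") + ", " + clauses + "."

print(few_shot_prompt(
    "J. K. Rowling was born in",
    ["Albert Einstein was born in Ulm.", "Barack Obama was born in Honolulu."],
))
print(cascade_prompt("J. K. Rowling was born in [MASK]."))
# -> J. K. Rowling was born in [MASK], which is located in [MASK], which is located in [MASK].
```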
The main contributions of our work are summarized as follows:
• We propose a novel automatic approach to build a benchmark for specificity testing based on the property of transitive relations.
• We analyze the specificity of several existing PLMs and study two factors that affect it.
• We propose two methods to improve specificity by modifying the prompts without additional training.
• We provide in-depth analyses and discussions, suggesting directions for future work to further explore and improve specificity.
2 Background and Related Work
Pre-Trained Language Models: Pre-trained language models (PLMs) are language models pre-trained on large corpora. In this paper, we cover two types of pre-trained language models: unidirectional language models, such as GPT-2 (Radford et al., 2019), where the prediction of the current token is based only on previous tokens; and bidirectional language models, such as BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019), where both the left and right contexts are utilized to predict the current token.
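The practical difference between the two model types, for the kind of probing used in this paper, lies in which context is available when a token is scored; the sketch below (checkpoints chosen only for illustration) contrasts next-token scoring in GPT-2 with masked-token scoring in BERT.

```python
# Contrast of unidirectional vs. bidirectional token scoring (illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoTokenizer

# Unidirectional (GPT-2): the next token is predicted from the left context only.
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
ids = gpt2_tok("Toronto is located in", return_tensors="pt").input_ids
next_token_probs = torch.softmax(gpt2(ids).logits[0, -1], dim=-1)

# Bidirectional (BERT): the [MASK] token is predicted from both sides.
bert_tok = AutoTokenizer.from_pretrained("bert-base-cased")
bert = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
enc = bert_tok("Toronto is located in [MASK].", return_tensors="pt")
mask_pos = (enc.input_ids[0] == bert_tok.mask_token_id).nonzero().item()
mask_probs = torch.softmax(bert(**enc).logits[0, mask_pos], dim=-1)
```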
Knowledge Retrieval from LMs and Prompting: Previous works have focused on extracting factual knowledge from PLMs without incorporating external knowledge, which is usually achieved by creating prompts and letting PLMs predict the masked token(s) (Petroni et al., 2019; Bouraoui et al., 2020; Jiang et al., 2020a,b; Wang et al., 2020). They demonstrated that PLMs contain a significant amount of knowledge. By creating appropriate prompts with some additional training, such methods can even achieve performance comparable to SOTA for some specific tasks (Shin et al., 2020; Liu et al., 2021b). Our work is inspired by these works.