
Addressing both disambiguated representations
and sparsity resulting from free-text redundancy,
WebChild (Tandon et al., 2014) proposes a CKG,
grounded on WordNet senses, assembled from la-
bel propagation and pattern matching on Web cor-
pora. WebChild features a large CKG (over 4M
triples), but it predates large contextual LMs and
the ensuing progress in WSD, making this resource
unreliable by current standards. Recent works on
CKGs also focus on other aspects besides size and
accuracy, such as salience (Chalier et al., 2020) or alternatives to triples (Nguyen et al., 2021).
Our work is most related to LAMA (Petroni et al., 2019), which compiles masked assertions based on triples from ConceptNet and other resources, and measures how many triples can be accurately recovered when masking the object term. However, LAMA was designed for single-token masked prediction based on the intersection of the subword or byte-level token vocabularies used by the particular set of LMs considered in that work.3 Consequently, LAMA is limited by design to a total of 21k prediction candidates.
LAMA is an important early result of LM prob-
ing, but besides the previously mentioned technical
limitations, its findings have also been challenged
in later works. Kassner and Schütze (2020) demon-
strated that LMs are susceptible to mispriming and
often unable to handle negation. Poerner et al.
(2020) further showed that LMs could be biased by
the surface form of entity names. Moreover, Dufter
et al. (2021) found that static embeddings used with a nearest neighbors (k-NN) approach can outperform LMs on the LAMA benchmark, casting doubt on the presumed advantages of large LMs for the task. Still, LAMA inspired others to use knowledge graphs (KGs) generated by LMs for intrinsic evaluation. Swamy et al. (2021) propose extracting KGs from LMs to support interpretability and direct comparison between different LMs or training stages. Aspillaga et al. (2021) follow a similar direction but propose evaluating extracted KGs by concept relatedness, using hypernymy relations from WordNet and sense-tagged glosses.
Our approach overcomes the vocabulary limitations of LAMA while outperforming a comparable k-NN baseline. We also explore using extracted CKGs to evaluate LMs, alongside the generation of novel CKGs.
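For intuition on what such a baseline looks like, below is a minimal k-NN sketch over toy static embeddings (hypothetical vectors and vocabulary; not the exact setup of Dufter et al., 2021, nor our own baseline implementation):

```python
import numpy as np

# Toy static embeddings (random for illustration); in practice these
# would be pretrained vectors such as fastText.
vocab = ["france", "germany", "spain", "river"]
emb = {w: np.random.randn(300) for w in vocab}

def knn_predict(query_vec, k=1):
    # Rank the candidate vocabulary by cosine similarity to the query
    # and return the k nearest words as predictions.
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(vocab, key=lambda w: cos(query_vec, emb[w]), reverse=True)
    return ranked[:k]

# A cloze query is reduced to a vector (e.g., the subject word's embedding)
# and answered by its nearest neighbors in embedding space.
print(knn_predict(emb["france"], k=2))
```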
3 This limitation stems from the fact that each word may be split into several tokens; predictions are thus restricted to words whose token count matches the masked positions, and this splitting is specific to each LM's tokenizer.
3 SenseLAMA
We begin by describing our probing task to evaluate the commonsense knowledge learned during LM pre-training. SenseLAMA features verbalized relations4 between word senses, from triples sourced from WordNet, WikiData, and ConceptNet. In the following, we describe how we compiled SenseLAMA using these resources, including mapping triples to specific WordNet senses (i.e., synsets).
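As a simple illustration, a triple can be verbalized by instantiating a relation-specific template and masking the object term; the template wording below is hypothetical, while the actual templates are listed in Appendix A:

```python
# Hypothetical relation templates; the actual templates appear in Appendix A.
templates = {
    "hypernymy": "{subj} is a kind of [MASK].",
    "part_of":   "{subj} is a part of [MASK].",
}

def verbalize(subj_lemma, relation, obj_lemma):
    # The object term is replaced by [MASK] and kept as the gold answer.
    return templates[relation].format(subj=subj_lemma), obj_lemma

print(verbalize("dog", "hypernymy", "canine"))
# -> ('dog is a kind of [MASK].', 'canine')
```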
Unlike other works (e.g., Feng et al., 2020), we
do not merge similar relations. Since our approach
is unsupervised, we do not benefit from additional
examples per relation. Thus, we prefer preserving
performance metrics specific to each source.
We use the core WordNet synsets, initially defined by Boyd-Graber et al. (2005), to create an easier subset of SenseLAMA. While the full WordNet covers over 117k synsets, core synsets are restricted to the 5k most frequently occurring word senses,5 dramatically reducing the number of prediction candidates. Thus, our ‘Core’ subset is derived from the ‘Full’ SenseLAMA, including only instances where both arguments of the triple belong to the set of core WordNet synsets. If this filter leaves a relation with fewer than ten instances, that relation is discarded from the ‘Core’ subset.
Table 1 reports counts for each source and relation
in SenseLAMA.
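A minimal sketch of this filtering step, assuming the full set of triples and the core synset identifiers are already loaded (names are illustrative):

```python
from collections import defaultdict

def build_core_subset(triples, core_synsets, min_instances=10):
    # `triples` holds (subject_synset, relation, object_synset) entries;
    # `core_synsets` is the set of core WordNet synset identifiers.
    by_relation = defaultdict(list)
    for subj, rel, obj in triples:
        # Keep a triple only if both arguments are core synsets.
        if subj in core_synsets and obj in core_synsets:
            by_relation[rel].append((subj, rel, obj))
    # Discard relations left with fewer than ten instances.
    return {rel: items for rel, items in by_relation.items()
            if len(items) >= min_instances}
```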
WordNet
Our base ontology already contains
several relations which arguably fall under the
scope of commonsense knowledge, such as hy-
pernymy, meronymy, or antonymy. Since these
relations already target synsets within WordNet, no
additional mapping or disambiguation is required.
Very frequent relations are capped at 10k samples.
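For illustration, such relations can be read directly from WordNet, e.g. with NLTK's interface; the sketch below covers only hypernymy and meronymy and applies the 10k cap (details are illustrative, not our exact extraction code):

```python
import random
from collections import defaultdict
from nltk.corpus import wordnet as wn  # WordNet 3.0 via NLTK

def wordnet_triples(cap=10_000, seed=0):
    # Collect a subset of WordNet relations as (synset, relation, synset)
    # triples; only hypernymy and meronymy are shown here.
    by_relation = defaultdict(list)
    for syn in wn.all_synsets():
        for hyper in syn.hypernyms():
            by_relation["hypernymy"].append((syn.name(), "hypernymy", hyper.name()))
        for part in syn.part_meronyms():
            by_relation["meronymy"].append((syn.name(), "meronymy", part.name()))
    # Very frequent relations are capped at 10k samples.
    random.seed(seed)
    return {rel: random.sample(items, min(cap, len(items)))
            for rel, items in by_relation.items()}
```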
WikiData
This vast resource contains millions
of triples for thousands of relations. We only con-
sider a few select relations most associated with
commonsense knowledge. Furthermore, we only
admit triples for which the head and tail can be
mapped to WordNet v3.0, either via the direct link
available in WikiData’s item properties or through
linking to BabelNet, which we map to WordNet us-
ing the mapping from Navigli and Ponzetto (2012).
Alternatively, we map some triples via hapax linking (McCrae and Cillessen, 2021) when the triple's arguments correspond to unambiguous words.
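This mapping can be viewed as a simple priority cascade; the sketch below assumes hypothetical lookup tables for the direct WordNet links, the BabelNet-to-WordNet mapping, and the hapax links:

```python
def map_to_wordnet(item_id, direct_wn, wd_to_bn, bn_to_wn, hapax_wn):
    # All arguments after `item_id` are hypothetical lookup tables.
    # 1) Direct WordNet 3.0 link stored in the WikiData item's properties.
    if item_id in direct_wn:
        return direct_wn[item_id]
    # 2) Link to BabelNet, then the BabelNet-to-WordNet mapping
    #    (Navigli and Ponzetto, 2012).
    bn_id = wd_to_bn.get(item_id)
    if bn_id is not None and bn_id in bn_to_wn:
        return bn_to_wn[bn_id]
    # 3) Hapax linking (McCrae and Cillessen, 2021): the item's label
    #    corresponds to an unambiguous word with a single synset.
    return hapax_wn.get(item_id)

def map_triple(head_id, tail_id, *tables):
    # A triple is admitted only if both arguments map to a synset.
    head, tail = map_to_wordnet(head_id, *tables), map_to_wordnet(tail_id, *tables)
    return (head, tail) if head and tail else None
```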
4 Appendix A shows handcrafted templates used for WordNet and WikiData triples, following Petroni et al. (2019).
5 Only 4,960 synsets can be mapped to WordNet v3.0.