ComFact: A Benchmark for Linking
Contextual Commonsense Knowledge
Silin Gao1, Jena D. Hwang2, Saya Kanno3, Hiromi Wakaki3,
Yuki Mitsufuji3, Antoine Bosselut1†
1NLP Lab, IC, EPFL, Switzerland, 2Allen Institute for AI, WA, USA
3Sony Group Corporation, Tokyo, Japan
1{silin.gao,antoine.bosselut}@epfl.ch, 2jenah@allenai.org,
3{saya.kanno,hiromi.wakaki,yuhki.mitsufuji}@sony.com
Abstract
Understanding rich narratives, such as dialogues and stories, often requires natural language processing systems to access relevant knowledge from commonsense knowledge graphs. However, these systems typically retrieve facts from KGs using simple heuristics that disregard the complex challenges of identifying situationally-relevant commonsense knowledge (e.g., contextualization, implicitness, ambiguity).
In this work, we propose the new task of commonsense fact linking, where models are given contexts and trained to identify situationally-relevant commonsense knowledge from KGs. Our novel benchmark, ComFact, contains 293k in-context relevance annotations for commonsense triplets across four stylistically diverse dialogue and storytelling datasets. Experimental results confirm that heuristic fact linking approaches are imprecise knowledge extractors. Learned fact linking models demonstrate across-the-board performance improvements (34.6% F1) over these heuristics. Furthermore, improved knowledge retrieval yielded average downstream improvements of 9.8% for a dialogue response generation task. However, fact linking models still significantly underperform humans, suggesting our benchmark is a promising testbed for research in commonsense augmentation of NLP systems.1
1 Introduction
In conversations, stories, and other varieties of narratives, language users systematically elide information that readers (or listeners) reliably fill in with world knowledge. For example, in Figure 1, the speaker of utterance t (i.e., pink) infers that their counterpart (cyan) wants to be a doctor because they are studying medicine, even though the cyan speaker does not explicitly mention their career goals.
†Corresponding author.
1 We release our data and code to the community at https://github.com/Silin159/ComFact
[Figure 1 content]
Utterance t-1: I continue to write while studying medicine.
Utterance t: Good luck, being a doctor is hard. Maybe you will write medical books.
Utterance t+1: Not a chance! I love making up stories. Medicine is too real sometimes.
Linked facts: (book, used for, learning about medicine); (X wants to be a doctor, but before, X needs, go to college); (X writes books, because X wants, to tell stories); (good, used for, destroying evil)
Figure 1: Commonsense fact linking in a conversation. Triples in bubbles represent linked facts. Words and phrases in green, blue, purple and orange illustrate four different linking relationships for facts (§3.4).
To reflect this ability, language understanding systems are often augmented with knowledge bases (KBs, e.g., Speer et al., 2017) that allow them to access relevant background knowledge.
Considerable research has examined how to construct large databases of world knowledge for this purpose (Lenat, 1995; Suchanek et al., 2007; Speer et al., 2017; Sap et al., 2019a), as well as how to design models that can reason over relevant subsets of this knowledge to form a richer understanding of language (e.g., Lin et al., 2019). However, less work examines how to retrieve these inferences (or facts) from the KB in the first place. Current methods typically rely on pattern-based heuristics (Mihaylov and Frank, 2018; Feng et al., 2020), unsupervised scoring using corpus statistics (Weissenborn et al., 2018) or neural re-rankers (Yasunaga et al., 2021), or combinations of these methods (Bauer et al., 2018).
These simple methods produce computationally
tractable knowledge representations, but frequently
retrieve noisy information that is irrelevant to the
narrative they are constructed to represent. Recent work demonstrates that models trained with heuristically-retrieved commonsense knowledge learn simplified reasoning patterns (Wang et al., 2021) and provide false notions of interpretability (Raman et al., 2021). We posit that inadequate retrieval from large-scale knowledge resources is a key contributor to the spurious reasoning abilities learned by these systems.
Acknowledging the importance of retrieving relevant commonsense knowledge to augment models, we identify a set of challenges that commonsense knowledge retrievers must address. First, retrieved commonsense knowledge must be contextually-relevant, rather than generically related to the entities mentioned in the context. Second, relevant commonsense knowledge can often be implicit, e.g., in Figure 1, writing may be a leisure hobby for the cyan speaker, explaining why they “love making up stories”. Finally, knowledge may be ambiguously relevant to a context. The cyan speaker in Figure 1 may write as a relaxing hobby, or be thinking of quitting medical school to pursue a career as a writer. Without knowing the rest of the conversation, both inferences are potentially valid.
To more adequately address these challenges, we introduce the new task of commonsense fact linking,2 where models are given contexts and trained to identify situationally-relevant commonsense knowledge from KGs. For this task, we construct a Commonsense Fact linking dataset (ComFact) to benchmark the next generation of models designed to improve commonsense fact retrieval. ComFact contains 293k contextual relevance annotations for four diverse dialogue and storytelling corpora. Our empirical analysis shows that heuristic methods over-retrieve many unrelated facts, yielding poor performance on the benchmark. Meanwhile, models trained on our resource are much more precise extractors, with an average 34.6% absolute F1 boost (though they still fall short of human performance). The knowledge retriever developed on our resource also brings an average 9.8% relative improvement on a downstream dialogue response generation task. These results demonstrate that ComFact is a promising testbed for developing improved fact linkers that benefit downstream NLP applications.
2 We follow prior naming conventions for entity linking (Ling et al., 2015) and multilingual fact linking (Kolluru et al., 2021), though the task can also be viewed as information retrieval (IR) from a commonsense knowledge base.
2 Related Work
Commonsense Knowledge Graphs
Commonsense knowledge graphs (KGs) are standard tools for providing background knowledge to models for various NLP tasks such as question answering (Talmor et al., 2019; Sap et al., 2019b) and text generation (Lin et al., 2020). ConceptNet (Liu and Singh, 2004; Speer et al., 2017), a commonly used commonsense KG, contains high-precision facts collected from crowdsourcing (Singh et al., 2002) and web ontologies (Miller, 1995; Lehmann et al., 2015), but is generally limited to taxonomic, lexical and physical relationships (Davis and Marcus, 2015; Sap et al., 2019a). ATOMIC (Sap et al., 2019a) and ANION (Jiang et al., 2021) are fully crowdsourced, and focus on representing knowledge about social interactions and events. ATOMIC 2020 (Hwang et al., 2021) expands on ATOMIC by annotating additional event-centered relations and integrating the facts from ConceptNet that are not easily represented by language models, yielding a rich resource of complex entities. In this work, we construct our ComFact dataset based on the most advanced ATOMIC 2020 KG.
Commonsense Fact Linking
Knowledge-intensive NLP tasks are often tackled using commonsense KGs to augment the input contexts provided by the dataset (Wang et al., 2019; Ye et al., 2019; Gajbhiye et al., 2021; Yin et al., 2022). Models for various NLP applications benefit from this fact linking, including question answering (Feng et al., 2020; Yasunaga et al., 2021; Zhang et al., 2022), dialogue modeling (Zhou et al., 2018; Wu et al., 2020) and story generation (Guan et al., 2019; Ji et al., 2020). All of the above works typically conduct fact linking using heuristic solutions.
Recent research explores unsupervised learning approaches for improving on the shortcomings of heuristic commonsense fact linking. Huang et al. (2021) and Zhou et al. (2022) use soft matching based on embedding similarity to link commonsense facts with implicit semantic relatedness. Guan et al. (2020) use knowledge-enhanced pre-training to implicitly incorporate commonsense facts into narrative systems, but their approach reduces the controllability and interpretability of knowledge integration. Finally, several works (Arabshahi et al., 2021; Bosselut et al., 2021; Peng et al., 2021a,b; Tu et al., 2022) use knowledge models (Bosselut et al., 2019; Da et al., 2021; West et al., 2022) to generate commonsense facts instead of linking them from knowledge graphs. However, the contextual quality of facts generated by knowledge models is also under-explored in these application scenarios. In this paper, we conduct a more rigorous study of commonsense fact linking.
3 ComFact Construction
In this section, we give an overview of commonsense fact linking and its associated challenges, and describe our approach for building the ComFact dataset centered around these challenges.
3.1 Overview
Notation
We are given narrative samples S (e.g., a dialogue or story snippet) containing multiple statements (or utterances for dialogues) [U1, U2, ..., UT]. For the t-th statement Ut, the collections of statements that comprise its past and future context are defined as U<t = [Ut-k, ..., Ut-1] and U>t = [Ut+1, ..., Ut+l], respectively.
A commonsense knowledge graph G is made up of a set of interconnected commonsense facts, each represented as a triple containing a head entity, a tail entity, and a relation connecting them, as depicted in Figure 1. The task in this work is to identify the subset of commonsense facts from G that may be relevant for understanding the situation described in the context Ct = [U<t, Ut, U>t].
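To make the notation concrete, the following minimal Python sketch shows one possible way to represent contexts, facts, and the task signature; the class and function names are illustrative assumptions, not identifiers from the ComFact release.

```python
# A minimal sketch of the task's inputs and outputs (illustrative names,
# not from the ComFact codebase).
from dataclasses import dataclass
from typing import List

@dataclass
class Fact:
    head: str      # e.g., "X feels homesick"
    relation: str  # e.g., "as a result, X will"
    tail: str      # e.g., "leave for home"

@dataclass
class Context:
    past: List[str]    # U_<t = [U_{t-k}, ..., U_{t-1}]
    present: str       # U_t
    future: List[str]  # U_>t = [U_{t+1}, ..., U_{t+l}]

def link_facts(graph: List[Fact], context: Context) -> List[Fact]:
    """Return the subset of facts relevant to the given context."""
    raise NotImplementedError  # placeholder: this is what fact linking models learn
```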
Challenges
The task of commonsense fact linking poses several challenges:
Contextualization: many facts linked using simple heuristic methods, such as string-matching, are not actually relevant to the situation described in a context. For example, in Figure 1, the facts in bubbles are all pattern-matched to the dialogue, but (good, used for, destroying evil) turns out not to be relevant to the situation when someone says good luck. Our study shows that only 25% of facts linked through string matching end up being fully relevant to the context.
Implicitness: some facts are linked to the context in implicit ways. For example, in Figure 1, the fact with go to college is implicitly linked to the phrase studying medicine, which makes it relevant to the context even though no direct reference to college is made in the dialogue, precluding it from being linked using string-matching.
Ambiguity: different observers can disagree on whether a fact is relevant for reasoning about a situation, particularly if the future context of a narrative is unknown. For example, (X writes books, because X wants, to tell stories) in Figure 1 is relevant to the final produced utterance, but would not be if the final utterance had been about wanting to write scientific research papers instead (n.b., the best use of writing skill).
While many methods have been proposed for linking facts in G to Ct, these methods typically rely on rule-based heuristics or unsupervised scoring methods, which do not adequately address the unique challenges of this task. In the following sections, we present our approach for building the ComFact dataset that addresses the above challenges.
3.2 Fact Candidate Linking
Given Ct = [U<t, Ut, U>t] from a natural language sample S, we link an initial set of potentially relevant fact candidates from G using two approaches, one designed to extract explicitly relevant facts and one designed for implicitly relevant facts.
Extracting Fact Candidates
Similar to prior works (e.g., Feng et al., 2020), we use surface-form pattern matching to retrieve head entities in G that are explicitly linked to Ut, and collect facts that contain the retrieved head entities as candidates. In particular, we lemmatize and part-of-speech (POS) tag Ut and every head entity in G. Then, we match patterns between these sources that are words with informative parts of speech (e.g., nouns, verbs, adjectives, adverbs) or that correspond to n-grams in a master list of English idioms from Wiktionary.3 We retrieve head entities whose informative patterns all appear in the set of patterns from Ut.
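As a rough illustration, this pattern-matching step can be approximated with off-the-shelf NLP tooling. The sketch below uses spaCy for lemmatization and POS tagging and keeps a head entity when all of its informative lemmas appear in the utterance; the POS tag set is assumed and the Wiktionary idiom n-gram check is omitted, so this is a simplification rather than the released implementation.

```python
# A simplified sketch of surface-form pattern matching (assumed POS set;
# the Wiktionary idiom n-gram check is omitted).
import spacy

nlp = spacy.load("en_core_web_sm")
INFORMATIVE_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}

def informative_lemmas(text: str) -> set:
    """Lemmas of tokens whose parts of speech carry content."""
    return {tok.lemma_.lower() for tok in nlp(text) if tok.pos_ in INFORMATIVE_POS}

def pattern_match_heads(utterance: str, head_entities: list) -> list:
    """Keep head entities whose informative lemmas all appear in the utterance."""
    utterance_lemmas = informative_lemmas(utterance)
    matched = []
    for head in head_entities:
        head_lemmas = informative_lemmas(head)
        if head_lemmas and head_lemmas <= utterance_lemmas:  # subset test
            matched.append(head)
    return matched
```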
However, pattern matching only extracts a set of fact candidates whose head entities can be explicitly recovered from the context Ut. To retrieve facts that may be semantically related to the context, but cannot be explicitly linked through patterns (e.g., paraphrased facts), we use embedding similarity matching (Zhou et al., 2022). In particular, we use Sentence-BERT (Reimers and Gurevych, 2019) to encode Ut along with every head entity in G as embedding vectors, and select the top-5 head entities whose embeddings have the highest cosine similarity with the embedding of Ut. Using this approach, we extend the sets of available candidates often retrieved by pattern matching methods and include implicit inferences in our candidate set.
3 https://en.wiktionary.org/w/index.php?title=Category:English_idioms
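The embedding-based retrieval can likewise be sketched with the sentence-transformers library; the specific encoder checkpoint ("all-MiniLM-L6-v2") is an assumed choice for illustration, since only Sentence-BERT is specified above.

```python
# A sketch of embedding similarity matching over head entities
# (encoder checkpoint is an assumed choice, not specified in the text).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def embed_match_heads(utterance: str, head_entities: list, top_k: int = 5) -> list:
    """Return the top-k head entities most similar to the utterance."""
    utt_emb = model.encode(utterance, convert_to_tensor=True)
    head_embs = model.encode(head_entities, convert_to_tensor=True)
    scores = util.cos_sim(utt_emb, head_embs)[0]  # cosine similarity per head entity
    top = scores.topk(k=min(top_k, len(head_entities)))
    return [head_entities[i] for i in top.indices.tolist()]
```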
[Figure 2 content]
Example story: Ut-2: Jamie was sleeping at a friend's house. Ut-1: It was her first time away at a friend's house. Ut: Jamie was scared and missed her home and family. Ut+1: She called her mom to pick her up. Ut+2: Jamie went home to sleep in her own bed.
Fact: X feels homesick, as a result, X will, leave for home.
Round 1 (Present) shows workers only Ut; Round 2 (Present + Past) shows [Ut-2, Ut-1, Ut]; Round 3 (Present + Past + Future) shows the full story. In each round, workers answer "Is the fact relevant to the story?" with options Always, Sometimes, or Not Relevant.
Figure 2: Illustration of our three-round fact candidate validation.
[Figure 3]
Figure 3: Summary of rules in fact candidate validation rounds. (a) Mapping from worker annotations to relevance labels: always relevant (AR), sometimes relevant (SR), at odds (AO) and irrelevant (IRR). (b) Mapping from worker annotations to the action of the round: evaluate in the next round (Next) or end validation (Stop).
Filtering Fact Candidates
Head entities linked via pattern and embedding matching may connect to tail entities whose semantics are far different from that of Ct (e.g., destroying evil in Figure 1). Consequently, we perform a first round of automatic filtering by pruning the tail entities of each head entity according to their similarity to Ct. Using Sentence-BERT, we encode each tail entity and Ct as embedding vectors. For each head entity, we keep its top-5 tail entities that have the highest embedding cosine similarity with that of Ct.
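This tail-entity pruning step admits a similar sketch: for each retained head entity, only the tails most similar to the context Ct are kept. As above, the encoder checkpoint and the flattening of Ct into a single string are assumptions made for illustration.

```python
# A sketch of tail-entity filtering against the full context C_t
# (same assumed encoder as above; C_t is flattened to one string).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def filter_tails(context_text: str, tails_by_head: dict, top_k: int = 5) -> dict:
    """Keep the top-k tail entities per head by cosine similarity with the context."""
    ctx_emb = model.encode(context_text, convert_to_tensor=True)
    pruned = {}
    for head, tails in tails_by_head.items():
        tail_embs = model.encode(tails, convert_to_tensor=True)
        scores = util.cos_sim(ctx_emb, tail_embs)[0]
        top = scores.topk(k=min(top_k, len(tails)))
        pruned[head] = [tails[i] for i in top.indices.tolist()]
    return pruned
```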
3.3 Crowdsourcing Relevance Judgements
We use the prior heuristics to over-sample a large initial set of knowledge (46 facts per example context). We then devise a two-step procedure for evaluating the contextual relevance of these linked fact candidates using crowdworkers from Amazon Mechanical Turk, which we describe below.
Validating Head Entities
First, we task workers with validating the relevance of head entities with respect to the context. For each head entity, we show two workers Ct and a head candidate associated with Ut, and independently ask them to judge whether the head candidate is relevant to Ut. Head candidates are labeled as: a) relevant with full confidence if both workers identify the head entity as relevant, b) relevant with half confidence if only one of the workers chooses relevant, or c) irrelevant if neither of the workers chooses relevant.
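The aggregation of the two head-entity judgments reduces to a simple vote count, sketched below with illustrative label strings (not the dataset's released field names).

```python
# A minimal sketch of head-entity label aggregation (illustrative label strings).
def head_label(worker1_relevant: bool, worker2_relevant: bool) -> str:
    votes = int(worker1_relevant) + int(worker2_relevant)
    return {2: "relevant_full_confidence",
            1: "relevant_half_confidence",
            0: "irrelevant"}[votes]
```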
Validating Fact Candidates
After curating a set of relevant head entities, workers then validate the relevance of the fact candidates associated with those head entities.4 To evaluate the contextual relevance of facts in a fine-grained manner, we define a three-round task for workers, as shown in Figure 2. In the first round, we show two workers Ut and the set of fact candidates, and independently ask them to judge whether each fact candidate is always relevant, sometimes relevant, or irrelevant to Ut.5 In the second round, we repeat this task, but show the past context along with Ut, namely [U<t, Ut]. In the third round, we repeat the task again, but show the full context Ct = [U<t, Ut, U>t].
After each round, we assign or update the relevance label of a fact candidate as: a) always relevant if both workers label it always relevant, b) sometimes relevant if one or both of the workers label it sometimes instead of always relevant, c) at odds if one worker chooses always or sometimes relevant and the other chooses not relevant, d) irrelevant if both workers select not relevant (as shown in Figure 3a). In practice, we find that including more context (i.e., U<t or U>t) rarely changes the validation of an initially always relevant or irrelevant fact. So after each round, if a fact candidate is labeled as always relevant or irrelevant, we do not evaluate it in the next round. Otherwise, there is relevance ambiguity over a fact, and we validate it again in the next round with additional context (as shown in Figure 3b). In the second and third rounds, if a worker annotates a fact candidate as always or sometimes relevant, we ask them to justify
4 If a head entity is deemed irrelevant, we assume that all fact candidates associated with it are irrelevant as well.
5 From feedback, we observe that crowdworkers prefer our fine-grained annotation scheme as it allows them to express uncertainty in the judgment compared to a binary choice.
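For reference, the per-round label aggregation and early-stopping rules summarized in Figure 3 can be sketched as follows; the label strings are illustrative, not the dataset's released field names.

```python
# A sketch of the aggregation rules in Figure 3 (illustrative label strings).
def fact_label(w1: str, w2: str) -> str:
    """Map two annotations in {'AR', 'SR', 'IRR'} to a round-level relevance label."""
    votes = {w1, w2}
    if votes == {"AR"}:
        return "always_relevant"
    if votes == {"IRR"}:
        return "irrelevant"
    if "IRR" in votes:              # one relevant vote against one not-relevant vote
        return "at_odds"
    return "sometimes_relevant"     # at least one SR vote and no IRR vote

def continue_to_next_round(label: str) -> bool:
    """Stop early once a fact is clearly always relevant or irrelevant (Figure 3b)."""
    return label in {"sometimes_relevant", "at_odds"}
```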