ComFact: A Benchmark for Linking
Contextual Commonsense Knowledge
Silin Gao1, Jena D. Hwang2, Saya Kanno3, Hiromi Wakaki3,
Yuki Mitsufuji3, Antoine Bosselut1†
1NLP Lab, IC, EPFL, Switzerland, 2Allen Institute for AI, WA, USA
3Sony Group Corporation, Tokyo, Japan
1{silin.gao,antoine.bosselut}@epfl.ch, 2jenah@allenai.org,
3{saya.kanno,hiromi.wakaki,yuhki.mitsufuji}@sony.com
Abstract
Understanding rich narratives, such as dialogues and stories, often requires natural language processing systems to access relevant knowledge from commonsense knowledge graphs. However, these systems typically retrieve facts from KGs using simple heuristics that disregard the complex challenges of identifying situationally-relevant commonsense knowledge (e.g., contextualization, implicitness, ambiguity).
In this work, we propose the new task of commonsense fact linking, where models are given contexts and trained to identify situationally-relevant commonsense knowledge from KGs. Our novel benchmark, ComFact, contains 293k in-context relevance annotations for commonsense triplets across four stylistically diverse dialogue and storytelling datasets. Experimental results confirm that heuristic fact linking approaches are imprecise knowledge extractors. Learned fact linking models demonstrate across-the-board performance improvements (34.6% F1) over these heuristics. Furthermore, improved knowledge retrieval yielded average downstream improvements of 9.8% for a dialogue response generation task. However, fact linking models still significantly underperform humans, suggesting our benchmark is a promising testbed for research in commonsense augmentation of NLP systems.1
1 Introduction
In conversations, stories, and other varieties of narratives, language users systematically elide information that readers (or listeners) reliably fill in with world knowledge. For example, in Figure 1, the speaker of utterance t (i.e., pink) infers that their counterpart (cyan) wants to be a doctor because they are studying medicine, even though the cyan speaker does not explicitly mention their career goals.
†Corresponding author.
1 We release our data and code to the community at https://github.com/Silin159/ComFact
[Figure 1 content]
Utterance t-1: I continue to write while studying medicine.
Utterance t: Good luck, being a doctor is hard. Maybe you will write medical books.
Utterance t+1: Not a chance! I love making up stories. Medicine is too real sometimes.
Linked facts: (book, used for, learning about medicine); (X wants to be a doctor, but before, X needs, go to college); (X writes books, because X wants, to tell stories); (good, used for, destroying evil)
Figure 1: Commonsense fact linking in a conversation. Triples in bubbles represent linked facts. Words and phrases in green, blue, purple and orange illustrate four different linking relationships for facts (§3.4).
To reflect this ability, language understanding systems are often augmented with knowledge bases (KBs, e.g., Speer et al., 2017) that allow them to access relevant background knowledge.
Considerable research has examined how to construct large databases of world knowledge for this purpose (Lenat, 1995; Suchanek et al., 2007; Speer et al., 2017; Sap et al., 2019a), as well as how to design models that can reason over relevant subsets of this knowledge to form a richer understanding of language (e.g., Lin et al., 2019). However, less work examines how to retrieve these inferences (or facts) from the KB in the first place. Current methods typically rely on pattern-based heuristics (Mihaylov and Frank, 2018; Feng et al., 2020), unsupervised scoring using corpus statistics (Weissenborn et al., 2018) or neural re-rankers (Yasunaga et al., 2021), or combinations of these methods (Bauer et al., 2018).
These simple methods produce computationally
tractable knowledge representations, but frequently
retrieve noisy information that is irrelevant to the
narrative they are constructed to represent. Recent work demonstrates that models trained with heuristically-retrieved commonsense knowledge learn simplified reasoning patterns (Wang et al., 2021) and provide false notions of interpretability (Raman et al., 2021). We posit that inadequate retrieval from large-scale knowledge resources is a key contributor to the spurious reasoning abilities learned by these systems.
Acknowledging the importance of retrieving relevant commonsense knowledge to augment models, we identify a set of challenges that commonsense knowledge retrievers must address. First, retrieved commonsense knowledge must be contextually-relevant, rather than generically related to the entities mentioned in the context. Second, relevant commonsense knowledge can often be implicit, e.g., in Figure 1, writing may be a leisure hobby for the cyan speaker, explaining why they “love making up stories”. Finally, knowledge may be ambiguously relevant to a context. The cyan speaker in Figure 1 may write as a relaxing hobby, or be thinking of quitting medical school to pursue a career as a writer. Without knowing the rest of the conversation, both inferences are potentially valid.
To more adequately address these challenges, we introduce the new task of commonsense fact linking,2 where models are given contexts and trained to identify situationally-relevant commonsense knowledge from KGs. For this task, we construct a Commonsense Fact linking dataset (ComFact) to benchmark the next generation of models designed to improve commonsense fact retrieval. ComFact contains 293k contextual relevance annotations for four diverse dialogue and storytelling corpora. Our empirical analysis shows that heuristic methods over-retrieve many unrelated facts, yielding poor performance on the benchmark. Meanwhile, models trained on our resource are much more precise extractors, with an average 34.6% absolute F1 boost (though they still fall short of human performance). The knowledge retriever developed on our resource also brings an average 9.8% relative improvement on a downstream dialogue response generation task. These results demonstrate that ComFact is a promising testbed for developing improved fact linkers that benefit downstream NLP applications.
2 We follow prior naming conventions for entity linking (Ling et al., 2015) and multilingual fact linking (Kolluru et al., 2021), though the task can also be viewed as information retrieval (IR) from a commonsense knowledge base.
2 Related Work
Commonsense Knowledge Graphs
Commonsense knowledge graphs (KGs) are standard tools for providing background knowledge to models for various NLP tasks such as question answering (Talmor et al., 2019; Sap et al., 2019b) and text generation (Lin et al., 2020). ConceptNet (Liu and Singh, 2004; Speer et al., 2017), a commonly used commonsense KG, contains high-precision facts collected from crowdsourcing (Singh et al., 2002) and web ontologies (Miller, 1995; Lehmann et al., 2015), but is generally limited to taxonomic, lexical and physical relationships (Davis and Marcus, 2015; Sap et al., 2019a). ATOMIC (Sap et al., 2019a) and ANION (Jiang et al., 2021) are fully crowdsourced, and focus on representing knowledge about social interactions and events. ATOMIC 2020 (Hwang et al., 2021) expands on ATOMIC by annotating additional event-centered relations and integrating the facts from ConceptNet that are not easily represented by language models, yielding a rich resource of complex entities. In this work, we construct our ComFact dataset based on the most advanced ATOMIC 2020 KG.
Commonsense Fact Linking
Knowledge-intensive NLP tasks are often tackled using commonsense KGs to augment the input contexts provided by the dataset (Wang et al., 2019; Ye et al., 2019; Gajbhiye et al., 2021; Yin et al., 2022). Models for various NLP applications benefit from this fact linking, including question answering (Feng et al., 2020; Yasunaga et al., 2021; Zhang et al., 2022), dialogue modeling (Zhou et al., 2018; Wu et al., 2020) and story generation (Guan et al., 2019; Ji et al., 2020). All of the above works typically conduct fact linking using heuristic solutions.
Recent research explores unsupervised learning approaches for improving on the shortcomings of heuristic commonsense fact linking. Huang et al. (2021) and Zhou et al. (2022) use soft matching based on embedding similarity to link commonsense facts with implicit semantic relatedness. Guan et al. (2020) use knowledge-enhanced pre-training to implicitly incorporate commonsense facts into narrative systems, but their approach reduces the controllability and interpretability of knowledge integration. Finally, several works (Arabshahi et al., 2021; Bosselut et al., 2021; Peng et al., 2021a,b; Tu et al., 2022) use knowledge models (Bosselut et al., 2019; Da et al., 2021; West et al., 2022) to generate commonsense facts instead of linking them from knowledge graphs. However, the contextual quality of facts generated by knowledge models is also under-explored in these application scenarios. In this paper, we conduct a more rigorous study of commonsense fact linking.
3 ComFact Construction
In this section, we give an overview of commonsense fact linking and its associated challenges, and describe our approach for building the ComFact dataset centered around these challenges.
3.1 Overview
Notation
We are given narrative samples S (e.g., a dialogue or story snippet) containing multiple statements (or utterances for dialogues) [U1, U2, ..., UT]. For the t-th statement Ut, the collections of statements that comprise its past and future context are defined as U<t = [Ut-k, ..., Ut-1] and U>t = [Ut+1, ..., Ut+l], respectively.
A commonsense knowledge graph G is made up of a set of interconnected commonsense facts, each represented as a triple containing a head entity, a tail entity, and a relation connecting them, as depicted in Figure 1. The task in this work is to identify the subset of commonsense facts from G that may be relevant for understanding the situation described in the context Ct = [U<t, Ut, U>t].
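To make the notation concrete, the following minimal Python sketch shows one possible way to represent contexts, facts, and the task signature; the class and function names are illustrative assumptions, not identifiers from the ComFact release.

```python
# A minimal sketch of the task's inputs and outputs (illustrative names,
# not from the ComFact codebase).
from dataclasses import dataclass
from typing import List

@dataclass
class Fact:
    head: str      # e.g., "X feels homesick"
    relation: str  # e.g., "as a result, X will"
    tail: str      # e.g., "leave for home"

@dataclass
class Context:
    past: List[str]    # U_<t = [U_{t-k}, ..., U_{t-1}]
    present: str       # U_t
    future: List[str]  # U_>t = [U_{t+1}, ..., U_{t+l}]

def link_facts(graph: List[Fact], context: Context) -> List[Fact]:
    """Return the subset of facts relevant to the given context."""
    raise NotImplementedError  # placeholder: this is what fact linking models learn
```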
Challenges
The task of commonsense fact linking poses several challenges:
Contextualization: many facts linked using simple heuristic methods, such as string-matching, are not actually relevant to the situation described in a context. For example, in Figure 1, the facts in bubbles are all pattern-matched to the dialogue, but (good, used for, destroying evil) turns out not to be relevant to the situation when someone says good luck. Our study shows that only 25% of facts linked through string matching end up being fully relevant to the context.
Implicitness: some facts are linked to the context in implicit ways. For example, in Figure 1, the fact with go to college is implicitly linked to the phrase studying medicine, which makes it relevant to the context even though no direct reference to college is made in the dialogue, precluding it from being linked using string-matching.
Ambiguity: different observers can disagree on whether a fact is relevant for reasoning about a situation, particularly if the future context of a narrative is unknown. For example, (X writes books, because X wants, to tell stories) in Figure 1 is relevant to the final produced utterance, but would not be if the final utterance had been about wanting to write scientific research papers instead (n.b., the best use of writing skill).
While many methods have been proposed for linking facts in G to Ct, these methods typically rely on rule-based heuristics or unsupervised scoring methods, which do not adequately address the unique challenges of this task. In the following sections, we present our approach for building the ComFact dataset that addresses the above challenges.
3.2 Fact Candidate Linking
Given Ct = [U<t, Ut, U>t] from a natural language sample S, we link an initial set of potentially relevant fact candidates from G using two approaches, one designed to extract explicitly relevant facts and one designed for implicitly relevant facts.
Extracting Fact Candidates
Similar to prior works (e.g., Feng et al., 2020), we use surface-form pattern matching to retrieve head entities in G that are explicitly linked to Ut, and collect facts that contain the retrieved head entities as candidates. In particular, we lemmatize and part-of-speech (POS) tag Ut and every head entity in G. Then, we match patterns between these sources that are words with informative parts of speech (e.g., nouns, verbs, adjectives, adverbs) or that correspond to n-grams in a master list of English idioms from Wiktionary.3 We retrieve head entities whose informative patterns all appear in the set of patterns from Ut.
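As a rough illustration, this pattern-matching step can be approximated with off-the-shelf NLP tooling. The sketch below uses spaCy for lemmatization and POS tagging and keeps a head entity when all of its informative lemmas appear in the utterance; the POS tag set is assumed and the Wiktionary idiom n-gram check is omitted, so this is a simplification rather than the released implementation.

```python
# A simplified sketch of surface-form pattern matching (assumed POS set;
# the Wiktionary idiom n-gram check is omitted).
import spacy

nlp = spacy.load("en_core_web_sm")
INFORMATIVE_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}

def informative_lemmas(text: str) -> set:
    """Lemmas of tokens whose parts of speech carry content."""
    return {tok.lemma_.lower() for tok in nlp(text) if tok.pos_ in INFORMATIVE_POS}

def pattern_match_heads(utterance: str, head_entities: list) -> list:
    """Keep head entities whose informative lemmas all appear in the utterance."""
    utterance_lemmas = informative_lemmas(utterance)
    matched = []
    for head in head_entities:
        head_lemmas = informative_lemmas(head)
        if head_lemmas and head_lemmas <= utterance_lemmas:  # subset test
            matched.append(head)
    return matched
```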
However, pattern matching only extracts a set of fact candidates whose head entities can be explicitly recovered from the context Ut. To retrieve facts that may be semantically related to the context, but cannot be explicitly linked through patterns (e.g., paraphrased facts), we use embedding similarity matching (Zhou et al., 2022). In particular, we use Sentence-BERT (Reimers and Gurevych, 2019) to encode Ut along with every head entity in G as embedding vectors, and select the top-5 head entities whose embeddings have the highest cosine similarity with the embedding of Ut. Using this approach, we extend the sets of available candidates often retrieved by pattern matching methods and include implicit inferences in our candidate set.
3 https://en.wiktionary.org/w/index.php?title=Category:English_idioms
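The embedding-based retrieval can likewise be sketched with the sentence-transformers library; the specific encoder checkpoint ("all-MiniLM-L6-v2") is an assumed choice for illustration, since only Sentence-BERT is specified above.

```python
# A sketch of embedding similarity matching over head entities
# (encoder checkpoint is an assumed choice, not specified in the text).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def embed_match_heads(utterance: str, head_entities: list, top_k: int = 5) -> list:
    """Return the top-k head entities most similar to the utterance."""
    utt_emb = model.encode(utterance, convert_to_tensor=True)
    head_embs = model.encode(head_entities, convert_to_tensor=True)
    scores = util.cos_sim(utt_emb, head_embs)[0]  # cosine similarity per head entity
    top = scores.topk(k=min(top_k, len(head_entities)))
    return [head_entities[i] for i in top.indices.tolist()]
```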
[Figure 2 content]
Example story: Ut-2: Jamie was sleeping at a friend's house. Ut-1: It was her first time away at a friend's house. Ut: Jamie was scared and missed her home and family. Ut+1: She called her mom to pick her up. Ut+2: Jamie went home to sleep in her own bed.
Fact: X feels homesick, as a result, X will, leave for home.
Round 1 (Present) shows workers only Ut; Round 2 (Present + Past) shows [Ut-2, Ut-1, Ut]; Round 3 (Present + Past + Future) shows the full story. In each round, workers answer "Is the fact relevant to the story?" with options Always, Sometimes, or Not Relevant.
Figure 2: Illustration of our three-round fact candidate validation.
[Figure 3]
Figure 3: Summary of rules in fact candidate validation rounds. (a) Mapping from worker annotations to relevance labels: always relevant (AR), sometimes relevant (SR), at odds (AO) and irrelevant (IRR). (b) Mapping from worker annotations to the action of the round: evaluate in the next round (Next) or end validation (Stop).
Filtering Fact Candidates
Head entities linked via pattern and embedding matching may connect to tail entities whose semantics are far different from that of Ct (e.g., destroying evil in Figure 1). Consequently, we perform a first round of automatic filtering by pruning the tail entities of each head entity according to their similarity to Ct. Using Sentence-BERT, we encode each tail entity and Ct as embedding vectors. For each head entity, we keep its top-5 tail entities that have the highest embedding cosine similarity with that of Ct.
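This tail-entity pruning step admits a similar sketch: for each retained head entity, only the tails most similar to the context Ct are kept. As above, the encoder checkpoint and the flattening of Ct into a single string are assumptions made for illustration.

```python
# A sketch of tail-entity filtering against the full context C_t
# (same assumed encoder as above; C_t is flattened to one string).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def filter_tails(context_text: str, tails_by_head: dict, top_k: int = 5) -> dict:
    """Keep the top-k tail entities per head by cosine similarity with the context."""
    ctx_emb = model.encode(context_text, convert_to_tensor=True)
    pruned = {}
    for head, tails in tails_by_head.items():
        tail_embs = model.encode(tails, convert_to_tensor=True)
        scores = util.cos_sim(ctx_emb, tail_embs)[0]
        top = scores.topk(k=min(top_k, len(tails)))
        pruned[head] = [tails[i] for i in top.indices.tolist()]
    return pruned
```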
3.3 Crowdsourcing Relevance Judgements
We use the prior heuristics to over-sample a large initial set of knowledge (46 facts per example context). We then devise a two-step procedure for evaluating the contextual relevance of these linked fact candidates using crowdworkers from Amazon Mechanical Turk, which we describe below.
Validating Head Entities
First, we task workers with validating the relevance of head entities with respect to the context. For each head entity, we show two workers Ct and a head candidate associated with Ut, and independently ask them to judge whether the head candidate is relevant to Ut. Head candidates are labeled as: a) relevant with full confidence if both workers identify the head entity as relevant, b) relevant with half confidence if only one of the workers chooses relevant, or c) irrelevant if neither of the workers chooses relevant.
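The aggregation of the two head-entity judgments reduces to a simple vote count, sketched below with illustrative label strings (not the dataset's released field names).

```python
# A minimal sketch of head-entity label aggregation (illustrative label strings).
def head_label(worker1_relevant: bool, worker2_relevant: bool) -> str:
    votes = int(worker1_relevant) + int(worker2_relevant)
    return {2: "relevant_full_confidence",
            1: "relevant_half_confidence",
            0: "irrelevant"}[votes]
```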
Validating Fact Candidates
After curating a set of relevant head entities, workers then validate the relevance of the fact candidates associated with those head entities.4 To evaluate the contextual relevance of facts in a fine-grained manner, we define a three-round task for workers, as shown in Figure 2. In the first round, we show two workers Ut and the set of fact candidates, and independently ask them to judge whether each fact candidate is always relevant, sometimes relevant, or irrelevant to Ut.5 In the second round, we repeat this task, but show the past context along with Ut, namely [U<t, Ut]. In the third round, we repeat the task again, but show the full context Ct = [U<t, Ut, U>t].
After each round, we assign or update the relevance label of a fact candidate as: a) always relevant if both workers label it always relevant, b) sometimes relevant if one or both of the workers label it sometimes instead of always relevant, c) at odds if one worker chooses always or sometimes relevant and the other chooses not relevant, d) irrelevant if both workers select not relevant (as shown in Figure 3a). In practice, we find that including more context (i.e., U<t or U>t) rarely changes the validation of an initially always relevant or irrelevant fact. So after each round, if a fact candidate is labeled as always relevant or irrelevant, we do not evaluate it in the next round. Otherwise, there is relevance ambiguity over a fact, and we validate it again in the next round with additional context (as shown in Figure 3b). In the second and third rounds, if a worker annotates a fact candidate as always or sometimes relevant, we ask them to justify
4 If a head entity is deemed irrelevant, we assume that all fact candidates associated with it are irrelevant as well.
5 From feedback, we observe that crowdworkers prefer our fine-grained annotation scheme as it allows them to express uncertainty in the judgment compared to a binary choice.
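For reference, the per-round label aggregation and early-stopping rules summarized in Figure 3 can be sketched as follows; the label strings are illustrative, not the dataset's released field names.

```python
# A sketch of the aggregation rules in Figure 3 (illustrative label strings).
def fact_label(w1: str, w2: str) -> str:
    """Map two annotations in {'AR', 'SR', 'IRR'} to a round-level relevance label."""
    votes = {w1, w2}
    if votes == {"AR"}:
        return "always_relevant"
    if votes == {"IRR"}:
        return "irrelevant"
    if "IRR" in votes:              # one relevant vote against one not-relevant vote
        return "at_odds"
    return "sometimes_relevant"     # at least one SR vote and no IRR vote

def continue_to_next_round(label: str) -> bool:
    """Stop early once a fact is clearly always relevant or irrelevant (Figure 3b)."""
    return label in {"sometimes_relevant", "at_odds"}
```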