Varifocal Question Generation for Fact-checking
Nedjma Ousidhoum, Zhangdie Yuan, Andreas Vlachos
Department of Computer Science and Technology
University of Cambridge
ndo24,zy317,av308@cam.ac.uk
*Equal contribution.
Abstract
Fact-checking requires retrieving evidence related to a claim under investigation. The task can be formulated as question generation based on a claim, followed by question answering. However, recent question generation approaches assume that the answer is known and typically contained in a passage given as input, whereas such passages are what is being sought when verifying a claim. In this paper, we present Varifocal, a method that generates questions based on different focal points within a given claim, i.e. different spans of the claim and its metadata, such as its source and date. Our method outperforms previous work on a fact-checking question generation dataset on a wide range of automatic evaluation metrics. These results are corroborated by our manual evaluation, which indicates that our method generates more relevant and informative questions. We further demonstrate the potential of focal points in generating sets of clarification questions for product descriptions.
1 Introduction
The growing amount of information online and its impact have increased the need for fact-checking, i.e. judging whether a claim is true or false. To determine the truthfulness of a claim, fact-checkers need to answer questions related to the claim, world knowledge within its time frame, local politics, etc. (Graves, 2017). Using questions and answers has also been shown to be an effective way of conveying fact-checks. For instance, Altay et al. (2021) found that presenting information related to COVID-19 as answers to questions improved attitudes towards vaccination more than merely presenting the relevant facts.
As professional fact-checkers can spend a day verifying a single claim depending on its complexity (Adair et al., 2017; Hassan et al., 2017), there has been a growing focus on how to accelerate the fact-checking process via automation (Cohen et al., 2011; Graves and Anderson, 2020). Fan et al. (2020) showed that generating questions and answering them reduces the time spent on verification by approximately 20%. Fact verification questions tackle information that is missing from the claim, which renders the generation task challenging yet useful for assisting professionals.
Previous work on question generation assumes that the answer is known, typically contained in a passage given in the input (Rajpurkar et al., 2016; Duan et al., 2017; Wang et al., 2017). Such passages, though, are what is being sought when fact-checking a claim. The only exception is recent work on clarification questions (Rao and Daumé, 2019; Majumder et al., 2021). However, work in this area examines a specific narrow domain where a limited number of questions can be asked, e.g. questions related to product descriptions in the Amazon dataset of McAuley and Yang (2016), or dialogues in the Ubuntu dataset (Lowe et al., 2015), which is not the case in fact-checking. Fact-checking questions are more diverse and may rely on the experience and intuition of the fact-checker, as they aim to scrutinize every piece of information within the claim. They can be as generic as questions about general definitions or as specific as those tackling details about a one-time event.
In this paper, we propose an approach that generates questions for claim verification, which we name Varifocal. It uses focal points from different spans of the claim as well as its metadata, i.e. its source and its date. Each focal point guides the generator to question a different part of the claim; e.g. in Figure 1, when “Miss Universe Guyana 2017” is used as the focal point, the question generated is about who she is.
Figure 1: The architecture of Varifocal. We use a dependency parser to extract the different focal points, i.e. spans, then generate questions based on them. We rank the generated questions using a re-ranker and return the top n questions. The example in the figure was generated by our system. We show three highlighted focal points along with the (output) questions they led to.

We evaluate our approach on the QABriefs dataset introduced by Fan et al. (2020) using a wide range of automatic metrics, and show that Varifocal performs the best among the different systems considered. In addition, we conduct a human evaluation on questions generated by four different systems and gold standard questions based on four criteria: a) intelligibility, b) clarity, c) relevance, and d) informativeness. The results show that Varifocal generates more intelligible, clear, relevant, and informative questions than the other systems, corroborating the results of the automatic evaluation (our code is available at https://github.com/nedjmaou/Varifocal_Fact_Checking_QG).
Finally, we apply Varifocal to generating sets of clarification questions on Amazon product descriptions (McAuley and Yang, 2016), where it shows competitive performance against methods that generate single questions while having other ones in the set as part of the input (Majumder et al., 2021).
2 Related Work
The main piece of previous work on question generation for fact-checking is by Fan et al. (2020). They proposed the QABriefs dataset, which consists of claims with manually annotated question-answer pairs containing additional information about the claims (e.g. the exact definition of a term, the content of a bill, details about a political statement or a vote). The QABriefs dataset contains questions asked by crowd-workers, who had to read both the claim and its fact-checking article. Fan et al. (2020) presented the QABriefer model, which generates a set of questions conditioned on the claim, searches the web for evidence, and retrieves answers. However, they evaluated the generated questions only with BLEU scores, without conducting a human evaluation. More recently, Yang et al. (2021) addressed the problem of explainability in fact-checking through question answering using the Fool Me Twice corpus (Eisenschlos et al., 2021). They generated questions from the claim, retrieved answers from the evidence, and compared them to the generated ones. Yet, they did not evaluate their question generation process, and assumed that the evidence is given as input to generate the questions, which is unrealistic since the questions are typically used for evidence retrieval.
Other related work includes Saeidi et al. (2018), who introduced a dataset containing 32k instances of real-world policies, crowd-sourced fictional life scenarios, and dialogues aimed at reaching a final yes/no answer. The policies were given as input and explicitly stated what information needed to be asked for, and the questions had to have a yes or no answer. Neither of these holds in fact-checking, where questions are not usually answered by yes or no, and the information to be searched for is not known in advance. More recently, Majumder et al. (2021) presented a method to generate clarification questions. They built a two-stage framework that identifies missing information using the notions of global and local schemas. The global schema was built using filtered key phrases extracted from contexts belonging to the same class of the data, e.g. a class of similar products in the Amazon data (McAuley and Yang, 2016) or similar dialogues in the Ubuntu dataset (Lowe et al., 2015), whereas the local schema was built from a single given context; the missing information was then defined as the difference between the global and the local schema. The extraction of comparable schemas across different contexts was possible due to the repetitive nature of the datasets considered; e.g. the descriptions of products of the same type, such as laptops, allow the prediction of potentially missing properties which need clarification, in contrast to fact-checking claims, which are less repetitive.
The standard sequence-to-sequence architecture (Sutskever et al., 2014) is typically used in question generation approaches (Du et al., 2017; Zhou et al., 2017). Although answer-aware approaches allow for the generation of multiple questions conditioned on the same passage (Sun et al., 2018), providing the answer during inference is not possible in fact-checking, since one would typically ask questions about what is missing from the claim. Other work includes question generation for question answering (Duan et al., 2017), question generation for educational purposes (Heilman and Smith, 2010), and poll question generation from social media posts (Lu et al., 2021). Furthermore, Hosking and Riedel (2019) evaluated rewards in question generation, showed that they did not correlate with human judgments, and explained why rewards did not help when using reinforcement learning.
Commonly used evaluation metrics such as BLEU (Papineni et al., 2002) and ROUGE (Lin, 2004) fall short of correlating with human judgments when evaluating the quality of automatically generated questions (Liu et al., 2016; Sultan et al., 2020; Nema and Khapra, 2018). Majumder et al. (2021) carried out a human evaluation based on fluency, relevance, whether the question dealt with missing information, and usefulness. In addition, Cheng et al. (2021) proposed to assess the quality of automatically generated questions based on whether they were well-formed, concise, answerable, and answer-matching. Similarly, we conduct a human evaluation of the generated questions adapted to fact-checking.
3 Varifocal Question Generation
In this section, we describe Varifocal, an approach that generates multiple questions per claim based on its different aspects, which correspond to textual spans that we call focal points.

Varifocal consists of three components: (1) a focal point extractor, (2) a question generator that generates a question for each focal point, and (3) a re-ranker that ranks the generated questions, removes duplicates, and promotes questions that are more likely to match the gold standard ones.
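To make the data flow concrete, a minimal sketch of how the three components could be composed is shown below; `extract_focal_points`, `generate_question`, and `score_question` are hypothetical placeholders for the three components, not the released implementation.

```python
from typing import Callable, Dict, List


def varifocal_pipeline(
    claim: str,
    metadata: Dict[str, str],
    extract_focal_points: Callable[[str, Dict[str, str]], List[str]],  # component (1)
    generate_question: Callable[[str, str], str],                      # component (2)
    score_question: Callable[[str], float],                            # component (3)
    top_n: int = 5,
) -> List[str]:
    """Compose the three components: extract focal points, generate one
    question per focal point, then deduplicate and keep the top n by score."""
    focal_points = extract_focal_points(claim, metadata)
    questions = [generate_question(claim, fp) for fp in focal_points]
    questions = list(dict.fromkeys(questions))  # drop exact duplicates, keep order
    return sorted(questions, key=score_question, reverse=True)[:top_n]
```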
3.1 Focal Point Extraction
We consider two types of focal points: contiguous spans from the claim and metadata elements. For the former, we consider all the subtrees of its syntactic parse tree, thus obtaining more coherent phrases than if we extracted randomly selected n-grams. In addition, the metadata, which includes (1) the source of the claim or the name of the speaker, and (2) the date when the claim was made, can be useful in question generation for fact-checking. As shown in Figure 1, having access to the date of the claim helped the model generate a precise question, i.e. “Where was Miss Universe Guyana arrested in 2017?”. As the metadata is not part of the claim, we incorporate it using a template. For instance, we combined the claim and metadata of the example shown in Figure 1 as follows: “statenews.com reported on 11/15/17 that Miss Universe Guyana 2017 was arrested at London Heathrow airport with 2 kilograms of cocaine.”
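As a rough illustration of this step (not the authors' released code), the sketch below uses spaCy to treat every multi-token subtree of the claim's dependency parse as a candidate focal point and adds the source and date as separate focal points; the function name and the choice of the `en_core_web_sm` model are assumptions made for the example.

```python
# A minimal sketch of focal point extraction, assuming spaCy's dependency
# parser; this illustrates the idea rather than reproducing the paper's setup.
import spacy

nlp = spacy.load("en_core_web_sm")  # requires the small English model to be installed

def extract_focal_points(claim: str, source: str, date: str) -> list[str]:
    doc = nlp(claim)
    focal_points = set()
    for token in doc:
        # The subtree rooted at each token is a contiguous span of the claim,
        # typically a more coherent phrase than a randomly selected n-gram.
        span = doc[token.left_edge.i : token.right_edge.i + 1]
        if 1 < len(span) < len(doc):  # skip single tokens and the whole claim
            focal_points.add(span.text)
    # The metadata elements (source and date) are focal points in their own right.
    focal_points.update({source, date})
    return sorted(focal_points)
```

On the claim in Figure 1, such a procedure would typically return noun-phrase spans such as “Miss Universe Guyana 2017” alongside the source and date.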
3.2 Question Generation
This component takes a claim and its focal points as input and generates a set of questions. Given a claim $c$, the set of all focal points is denoted as $F$, where each focal point $f_i \in F$ is a span in the claim $c$ and its metadata, such as $f_i = [w_s, \ldots, w_e]$, where $s$ and $e$ mark the start and the end of the span, respectively. Then, for each focal point $f_i$, the model generates autoregressively a question $\hat{q}_i$ of $n$ words, as follows:

$$p(\hat{q}_i \mid c, f_i) = \prod_{k=1}^{n} p\big(\hat{q}_i[k] \mid \hat{q}_i[0:k-1], [c; \tilde{f}_i]\big) \quad (1)$$

where $[c; \tilde{f}_i]$ is the transformer-based encoding of the claim $c$ concatenated with $f_i$. The question generation component in Varifocal is similar to the answer-aware sequence-to-sequence model (Sun et al., 2018).
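For illustration, a sketch of focal-point-conditioned decoding with a pretrained encoder-decoder follows; `facebook/bart-base` is only a stand-in and would need to be fine-tuned on (claim, focal point) → question pairs under the objective in Equation 1 before it produces questions like the ones above.

```python
# A minimal sketch of question generation conditioned on a focal point, using
# Hugging Face transformers; the checkpoint is a placeholder that assumes
# fine-tuning on (claim, focal point) -> question pairs.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

def generate_question(claim: str, focal_point: str, max_len: int = 64) -> str:
    # Encode the claim and the focal point as a text pair, mirroring the
    # concatenated encoding [c; f~_i] in Equation 1.
    inputs = tokenizer(claim, focal_point, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=max_len, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

claim = ("statenews.com reported on 11/15/17 that Miss Universe Guyana 2017 "
         "was arrested at London Heathrow airport with 2 kilograms of cocaine.")
print(generate_question(claim, "Miss Universe Guyana 2017"))
```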