of global and local schemas. The global schema was built using filtered key phrases extracted from contexts belonging to the same class of data, e.g. a class of similar products in the Amazon data (McAuley and Yang, 2016) or similar dialogues in the Ubuntu dataset (Lowe et al., 2015), whereas the local schema was built from a single given context; they defined the missing information as the difference between the global and the local schema. Extracting comparable schemas across different contexts was possible due to the repetitive nature of the datasets considered: descriptions of products of the same type, such as laptops, allow the prediction of potentially missing properties that need clarification, in contrast to fact-checking claims, which are less repetitive.
The standard sequence-to-sequence architecture (Sutskever et al., 2014) is typically used in question generation approaches (Du et al., 2017; Zhou et al., 2017). Although answer-aware approaches allow generating multiple questions conditioned on the same passage (Sun et al., 2018), providing the answer at inference time is not possible in fact-checking, since one would typically ask questions about what is missing from the claim. Other work includes question generation for question answering (Duan et al., 2017), question generation for educational purposes (Heilman and Smith, 2010), and poll question generation from social media posts (Lu et al., 2021). Furthermore, Hosking and Riedel (2019) evaluated rewards in question generation, showed that they did not correlate with human judgments, and explained why rewards did not help when training with reinforcement learning.
Commonly used evaluation metrics such as BLEU (Papineni et al., 2002) and ROUGE (Lin, 2004) fall short of correlating with human judgments when evaluating the quality of automatically generated questions (Liu et al., 2016; Sultan et al., 2020; Nema and Khapra, 2018). Majumder et al. (2021) carried out a human evaluation based on fluency, relevance, whether the question dealt with missing information, and usefulness. In addition, Cheng et al. (2021) proposed assessing the quality of automatically generated questions based on whether they were well-formed, concise, answerable, and answer-matching. Similarly, we conduct a human evaluation of the generated questions adapted to fact-checking.
3 Varifocal Question Generation
In this section, we describe Varifocal, an approach
that generates multiple questions per claim based
on its different aspects, which correspond to textual
spans that we call focal points.
Varifocal consists of three components: (1) a focal point extractor, (2) a question generator that generates a question for each focal point, and (3) a re-ranker that ranks the generated questions, removing duplicates and promoting questions that are more likely to match the gold-standard ones.
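To make the flow between these components concrete, the following is a minimal structural sketch of the pipeline; the function names and the trivial placeholder bodies are illustrative assumptions rather than the authors' implementation, and the subsections below describe the actual components.

```python
from typing import Dict, List

def extract_focal_points(claim: str, metadata: Dict[str, str]) -> List[str]:
    # Placeholder: Varifocal uses subtrees of the claim's parse tree plus metadata (Section 3.1).
    return claim.split() + list(metadata.values())

def generate_question(claim: str, focal_point: str) -> str:
    # Placeholder: Varifocal conditions a seq2seq model on the claim and the focal point (Section 3.2).
    return f"What is known about {focal_point}?"

def rerank(questions: List[str]) -> List[str]:
    # Placeholder: the re-ranker removes duplicates and promotes likely questions.
    return list(dict.fromkeys(questions))

def varifocal(claim: str, metadata: Dict[str, str]) -> List[str]:
    focal_points = extract_focal_points(claim, metadata)
    questions = [generate_question(claim, fp) for fp in focal_points]
    return rerank(questions)
```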
3.1 Focal Point Extraction
We consider two types of focal points: contiguous spans from the claim and metadata elements. For the former, we consider all the subtrees of the claim's syntactic parse tree, thus obtaining more coherent phrases than randomly selected n-grams. In addition, the metadata, which includes (1) the source of the claim or the name of the speaker, and (2) the date when the claim was made, can be useful in question generation for fact-checking. As shown in Figure 1, having access to the date of the claim helped the model generate a precise question, i.e. Where was Miss Universe Guyana arrested in 2017? As the metadata is not part of the claim, we incorporate it using a template. For instance, we combined the claim and metadata of the example shown in Figure 1 as follows: state-news.com reported on 11/15/17 that Miss Universe Guyana 2017 was arrested at London Heathrow airport with 2 kilograms of cocaine.
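As an illustration, the contiguous-span focal points can be approximated with the subtrees produced by an off-the-shelf dependency parser; the sketch below uses spaCy, and both the choice of parser and the exact template wording are assumptions rather than the authors' exact setup.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed parser; any syntactic parser could be substituted

def extract_focal_points(claim: str, source: str, date: str) -> list:
    """Candidate focal points: subtree spans of the claim's parse plus metadata elements."""
    doc = nlp(claim)
    spans = set()
    for token in doc:
        # The contiguous span covered by this token's syntactic subtree.
        subtree = doc[token.left_edge.i : token.right_edge.i + 1]
        spans.add(subtree.text)
    spans.update({source, date})  # metadata elements are also treated as focal points
    return sorted(spans)

def claim_with_metadata(claim: str, source: str, date: str) -> str:
    """Combine the claim and its metadata with a simple template (wording assumed)."""
    return f"{source} reported on {date} that {claim}"
```

For the example above, claim_with_metadata("Miss Universe Guyana 2017 was arrested at London Heathrow airport with 2 kilograms of cocaine", "state-news.com", "11/15/17") reproduces the templated input shown in Figure 1.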
3.2 Question Generation
This component takes a claim and its focal points
as input and generates a set of questions. Given a claim $c$, the set of all focal points is denoted as $F$, where each focal point $f_i \in F$ is a span in the claim $c$ and its metadata, such that $f_i = [w_s, \ldots, w_e]$, where $s$ and $e$ mark the start and the end of the span, respectively. Then, for each focal point $f_i$, the model autoregressively generates a question $\hat{q}_i$ of $n$ words, as follows:

$$p(\hat{q}_i \mid c, f_i) = \prod_{k=1}^{n} p\big(\hat{q}_i[k] \mid \hat{q}_i[0:k-1], [\tilde{c}; \tilde{f}_i]\big) \quad (1)$$

where $[\tilde{c}; \tilde{f}_i]$ is the transformer-based encoding of $c$ concatenated with $f_i$. The question generation component in Varifocal is similar to the answer-aware sequence-to-sequence model (Sun et al., 2018).
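For illustration, the conditioning in Equation 1 can be realised with a pretrained encoder-decoder from the transformers library; the choice of BART, the separator convention, and the decoding settings below are assumptions made for this sketch, and the model would still need fine-tuning on claim-question pairs before producing useful questions.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def generate_question(claim_with_metadata: str, focal_point: str) -> str:
    """Generate one question conditioned on the claim (with metadata) and a single focal point."""
    # Encoder input: the claim concatenated with the focal point,
    # i.e. the jointly encoded pair that Eq. (1) conditions on.
    inputs = tokenizer(claim_with_metadata + " </s> " + focal_point,
                       return_tensors="pt", truncation=True)
    # Beam-search decoding realises the autoregressive factorisation of Eq. (1).
    output_ids = model.generate(**inputs, max_length=64, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```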