Missing Counter-Evidence Renders NLP Fact-Checking Unrealistic for Misinformation

Max Glockner1, Yufang Hou2, Iryna Gurevych1
1Ubiquitous Knowledge Processing Lab (UKP Lab), Department of Computer Science and Hessian Center for AI (hessian.AI), Technical University of Darmstadt
2IBM Research Europe, Ireland
www.ukp.tu-darmstadt.de, yhou@ie.ibm.com

arXiv:2210.13865v1 [cs.CL] 25 Oct 2022
Abstract
Misinformation emerges in times of uncertainty when credible information is limited. This is challenging for NLP-based fact-checking as it relies on counter-evidence, which may not yet be available. Despite increasing interest in automatic fact-checking, it is still unclear if automated approaches can realistically refute harmful real-world misinformation. Here, we contrast and compare NLP fact-checking with how professional fact-checkers combat misinformation in the absence of counter-evidence. In our analysis, we show that, by design, existing NLP task definitions for fact-checking cannot refute misinformation as professional fact-checkers do for the majority of claims. We then define two requirements that the evidence in datasets must fulfill for realistic fact-checking: it must be (1) sufficient to refute the claim and (2) not leaked from existing fact-checking articles. We survey existing fact-checking datasets and find that all of them fail to satisfy both criteria. Finally, we perform experiments to demonstrate that models trained on a large-scale fact-checking dataset rely on leaked evidence, which makes them unsuitable in real-world scenarios. Taken together, we show that current NLP fact-checking cannot realistically combat real-world misinformation because it depends on unrealistic assumptions about counter-evidence in the data.1

1 Code provided at https://github.com/UKPLab/emnlp2022-missing-counter-evidence
1 Introduction
[Figure 1: A false claim from PolitiFact. Counter-evidence is unlikely to be found; fact-checkers refute the claim by disproving the reasoning behind it.]

According to van der Linden (2022), misinformation is "false or misleading information masquerading as legitimate news, regardless of intent". Misinformation is dangerous as it can directly impact human behavior and have harmful real-world consequences such as the Pizzagate shooting (Fisher et al., 2016), interference in the 2016 democratic US election (Bovet and Makse, 2019), or the promotion of false COVID-19 cures (Aghababaeian et al., 2020). Surging misinformation during the COVID-19 pandemic, coined an "infodemic" by the WHO (Zarocostas, 2020), exemplifies the danger coming from misinformation. To combat misinformation, journalists from fact-checking organizations (e.g., PolitiFact or Snopes) conduct a laborious manual effort to verify claims based on possible harms and their prominence (Arnold, 2020). However, manual fact-checking cannot keep pace with the rate at which misinformation is posted and circulated. Automatic fact-checking has gained significant attention within the NLP community in recent years, with the goal of developing tools to assist fact-checkers in combating misinformation. Over the past few years, NLP researchers have created a wide range of fact-checking datasets with claims from fact-checking organization websites (Vlachos and Riedel, 2014; Wang, 2017; Augenstein et al., 2019; Hanselowski et al., 2019; Ostrowski et al., 2021; Gupta and Srikumar, 2021; Khan et al., 2022). The fundamental goal of fact-checking is, given a claim made by a claimant, to find a collection of evidence and provide a verdict about the claim's veracity based on that evidence. The underlying technique used by fact-checkers, and journalists in general, to assess the veracity of a claim is called verification (Silverman, 2016). In a comprehensive survey, Guo et al. (2022) proposed an NLP fact-checking framework (FCNLP) that aggregates existing (sub)tasks and approaches of automated fact-checking. FCNLP reflects current research trends on automatic fact-checking in NLP and divides the aforementioned process into evidence retrieval, verdict prediction, and justification production.
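To make the division into these three subtasks concrete, the following minimal sketch wires them together as plain Python functions. This is our own illustration rather than code from the paper's repository; the function names, the toy lexical retriever, and the stand-in stance rule are assumptions, and a real system would plug in a proper retriever and an NLI/stance model.

```python
# Minimal sketch of the FCNLP pipeline: evidence retrieval -> verdict
# prediction -> justification production. All names and heuristics are
# illustrative assumptions, not the interface of the released code.
from dataclasses import dataclass
from typing import List


@dataclass
class Verdict:
    label: str            # e.g., "SUPPORTED", "REFUTED", "NOT ENOUGH INFO"
    evidence: List[str]   # retrieved passages the verdict is based on
    justification: str    # human-readable explanation


def retrieve_evidence(claim: str, corpus: List[str], k: int = 3) -> List[str]:
    """Rank trusted-corpus passages by naive lexical overlap with the claim
    (a placeholder for BM25 or dense retrieval)."""
    claim_tokens = set(claim.lower().split())
    ranked = sorted(corpus, key=lambda p: len(claim_tokens & set(p.lower().split())), reverse=True)
    return ranked[:k]


def predict_verdict(claim: str, evidence: List[str]) -> str:
    """Toy stand-in for a stance/NLI classifier over (evidence, claim) pairs."""
    if not evidence:
        return "NOT ENOUGH INFO"
    text = " ".join(evidence).lower()
    return "REFUTED" if " not " in f" {text} " else "SUPPORTED"


def produce_justification(claim: str, evidence: List[str], label: str) -> str:
    """Toy justification: a real system would summarize the decisive evidence."""
    return f"Claim judged {label} based on {len(evidence)} retrieved passage(s)."


def fact_check(claim: str, corpus: List[str]) -> Verdict:
    evidence = retrieve_evidence(claim, corpus)
    label = predict_verdict(claim, evidence)
    return Verdict(label, evidence, produce_justification(claim, evidence, label))
```

Every step here operates on the claim text alone; the argument developed below is that, for real-world misinformation, a trusted corpus often contains no passage that could turn such a verdict into REFUTED before the claim has been fact-checked.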
In this paper, we focus on harmful misinformation claims that satisfied the professional fact-checkers' selection criteria and refer to them as real-world misinformation. Our goal is to answer the following research question: Can evidence-based NLP fact-checking approaches in FCNLP refute novel real-world misinformation?

FCNLP assumes a system has access to counter-evidence (e.g., through information retrieval) to refute a claim. Consider the false claim "Telemundo is an English-language television network" from FEVER (Thorne et al., 2018): a system following FCNLP must find counter-evidence contradicting the claim (i.e., that Telemundo is a Spanish-language network) to refute it, which may require more complex reasoning over multiple documents. We contrast this example with the real-world false claim that "Half a million sharks could be killed to make the COVID-19 vaccine" (Figure 1). If the claim were true, credible sources would likely report the incident, providing supporting evidence. Since it is not, there is, before the claim is fact-checked, no refuting evidence stating that COVID-19 vaccine production will not kill sharks. Only after establishing that the claim relies on the false premise that COVID-19 vaccines use squalene (harvested from sharks) can it be refuted. After the claim's verification, fact-checkers publish reports explaining the verdict and thereby produce counter-evidence. Relying on counter-evidence leaked from such reports is unrealistic if a system is to be applied to new claims.
In this work, we identify gaps between current research on FCNLP and the verification process of professional fact-checkers. Via analysis from different perspectives, we argue that the assumption of the existence of counter-evidence in FCNLP is unrealistic and does not reflect real-world requirements. We hope our analysis sheds light on future research directions in automatic fact-checking. In summary, our major contributions are:
- We identify two criteria from the journalistic verification process which allow overcoming the reliance on counter-evidence (Section 2).
- We show that FCNLP is incapable of satisfying these criteria, preventing the successful verification of most misinformation claims from the journalistic perspective (Section 3).
- We identify two evidence criteria (sufficient & unleaked) for realistic fact-checking. We find that all existing datasets in FCNLP containing real-world misinformation violate at least one criterion (Section 4) and are hence unrealistic.
- We semi-automatically analyze MULTIFC, a large-scale fact-checking dataset, to support our findings, and show that models trained on claims from PolitiFact and Snopes (via MULTIFC) rely on leaked evidence (a simple leakage heuristic is sketched below).
2 How Humans Fact-check

To motivate our distinct focus on misinformation, we investigate which claims professional fact-checkers verify. We crawl 20,274 fact-checked claims from PolitiFact,2 ranging from 2007 to 2021. Figure 2 shows the ratio of the different verdicts3 per year. After 2016, fact-checkers increasingly selected false claims as important for fact-checking; in 2021, less than 10% of the selected claims were correct.

[Figure 2: Ratio of verdicts per year (PolitiFact).]

2 https://www.politifact.com/
3 We conservatively group the verdicts "pants on fire" and "false" into False, "mostly false" and "half true" into Mixed, and "mostly true" and "true" into True.
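As a rough illustration of how such a per-year breakdown can be computed, the snippet below applies the conservative verdict grouping from footnote 3 to crawled claims and derives yearly ratios. The column names and the pandas-based aggregation are our assumptions, not the paper's released analysis code.

```python
# Sketch: aggregate crawled PolitiFact verdicts into per-year ratios,
# using the conservative grouping from footnote 3. Assumes a DataFrame
# with columns "year" and "verdict"; not the authors' analysis script.
import pandas as pd

VERDICT_GROUPS = {
    "pants on fire": "False",
    "false": "False",
    "mostly false": "Mixed",
    "half true": "Mixed",
    "mostly true": "True",
    "true": "True",
}


def verdict_ratios_per_year(claims: pd.DataFrame) -> pd.DataFrame:
    """Return the fraction of False/Mixed/True verdicts for each year."""
    groups = claims["verdict"].str.lower().map(VERDICT_GROUPS)
    return pd.crosstab(claims["year"], groups, normalize="index")


if __name__ == "__main__":
    toy = pd.DataFrame({
        "year": [2020, 2020, 2021, 2021],
        "verdict": ["false", "half true", "pants on fire", "true"],
    })
    print(verdict_ratios_per_year(toy).round(2))
```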
Claim                                                                                          Based Upon
(1) If you were forced to use a Sharpie to fill out your ballot, that is voter fraud.          false assumption
(2) The Biden administration will begin "spying" on bank and cash app accounts starting 2022.  tax legislation
(3) Barcelona terrorist is cousins with former President Barack Obama.                         satire article
(4) The Democratic health care plan is a government takeover of our health programs.           health care plan
(5) People in Holland protests against of COVID-19 measures.                                   protests event
Table 1: Example misinformation claims for the source guarantee.

Some claims can be refuted via counter-evidence (as required by FCNLP). For example, official
statistics can contradict the false claim about the U.S. that "In the 1980s, the lowest income people had the biggest gains". If the evidence makes it impossible for the claim to be true (e.g., because of mutually exclusive statistics), we refer to the evidence as global counter-evidence. Global counter-evidence attacks the textual claim itself without relying on the reasoning and sources behind it. In contrast, to refute the claim that "COVID-19 vaccines may kill sharks" (Figure 1), fact-checkers did not rely on global counter-evidence specifically proving that sharks will not be killed to produce COVID-19 vaccines; nor is it plausible that such counter-evidence exists. Here, the counter-evidence is bound to the claim's underlying (false) reasoning. The claim is refuted only because it follows from a false assumption, not because it was directly disproved. The absence of global counter-evidence is not an exceptional problem for this specific claim but is common among misinformation: misinformation surges when the high demand for information cannot be met with a sufficient supply of credible answers (Silverman, 2014; FullFact, 2020). Non-credible and possibly false and harmful information fills these deficits of credible information (Golebiewski and Boyd, 2019; Shane and Noel, 2020). The very existence of misinformation often builds on the absence of credible counter-evidence, which, in turn, is essential for FCNLP.
Professional fact-checkers refute misinformation even if no global counter-evidence exists, e.g., by rebutting underlying assumptions (Figure 1). Table 1 shows a few false claims built on top of various resources: (1) relies on the false assumption that Sharpies invalidate election ballots, (2 & 4) misinterpret official documents or laws, (3) is based on non-credible sources, and (5) changes the topic of a specific event from "gas extraction" to "COVID-19 measures". Fact-checkers use the reasoning behind the claim to consider evidence that is, or refers to, the claimant's source: the original tax legislation (2), or alternate (correct) descriptions of the protests against gas extraction (5). Here, the content of the evidence alone is often insufficient. The assertion that the claimant's source and the used counter-evidence are identical, or refer to the same event, is crucial to refute the claim: claim (2) is refuted because the tax legislation it relies upon does not support the "spying" claim. However, the document does not specifically refute the claim, and without knowing that the claimant relied on it, it becomes useless as counter-evidence. Similarly, the correct narrative of protests against gas extraction is only mutually exclusive with the false claim (5) of protests against COVID-19 measures once it is assured that both refer to the same incident. For similar reasons, the co-reference assumption is critical to the task definition of SNLI (Bowman et al., 2015). After this assertion, mutual exclusiveness is not required to refute the claim: it is sufficient if the claim is not entailed (i.e., it is incorrectly derived or relies on unverifiable speculation) or if it is based on invalid sources (such as satire). Based on these observations, we identify two criteria for refuting claims when no global counter-evidence exists. We validate their relevance in Section 3:
- Source Guarantee: The guarantee that identified evidence either constitutes or refers to the claimant's reason for the claim.
- Context Availability: We broadly consider context as the claim's original environment, which allows us to unambiguously comprehend the claim and to trace the claim and its sources across multiple platforms if required. It is a logical precondition for the source guarantee (see the sketch after this list).
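To make concrete what these two criteria ask of a dataset, the following sketch models a claim record with the contextual fields a system would need before it could even attempt the source guarantee. The field names are our own assumptions and do not correspond to the schema of any existing dataset.

```python
# Illustrative data model for a claim with enough context to attempt the
# source guarantee. Field names are hypothetical, not an existing schema.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClaimRecord:
    text: str                          # the claim as circulated
    claimant: Optional[str] = None     # who made the claim
    origin_url: Optional[str] = None   # where the claim first appeared (context availability)
    platform: Optional[str] = None     # e.g., a Facebook post or a campaign speech
    date: Optional[str] = None         # when the claim was made
    cited_sources: List[str] = field(default_factory=list)  # what the claimant points to


def can_attempt_source_guarantee(claim: ClaimRecord) -> bool:
    """A system can only try to establish the source guarantee if it can reach
    the claimant's own sources, either cited directly or traceable via the
    claim's original context."""
    return bool(claim.cited_sources) or claim.origin_url is not None
```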
Both criteria are challenging for computers but naturally satisfied by human fact-checkers. Buttry (2014) defines the question "How do you know that?" to be at the heart of verification. After selecting a claim, finding provenance and sourcing are the first steps in journalistic verification. Provenance provides crucial information about context and motivation (Urbani, 2020). Journalists must then identify solid sources to compare the claim with (Silverman, 2014; Borel, 2016). Ideally, the claimant provides sources, which must be included and assessed in the verification process. During verification, journalists rely, if possible, on relevant primary sources, such as uninterpreted and original legislation documents (for claim 2 in Table 1). Fact-checking organizations see sourcing as one of the most important parts of their work (Arnold, 2020).
3 Can FCNLP Help Human Verification?
In this section, we first analyze human verification strategies based on 100 misinformation claims. We then contrast these strategies with FCNLP.
3.1 Human Verification Strategies
We manually analyze 100 misinformation claims4 from two well-known fact-checking organizations: PolitiFact and Snopes. From each website, we randomly choose 50 misinformation claims: 25 claims from MULTIFC (a large NLP fact-checking dataset with real-world claims from before 2019) and 25 claims from 2020/2021. We extract the URL for each claim and analyze its verification strategy based on the entire fact-checking article. Claims that require the identification of scam webpages, imposter messages, or multi-modal reasoning5, such as detecting misrepresented, miscaptioned, or manipulated images (Zlatkova et al., 2019), were marked as not applicable to FCNLP by nature. In the first round of analysis, we assess whether humans relied on the source guarantee to refute the claim. Each claim (and its verification) is unique and can be refuted using different strategies. In the second round of analysis, we identify the primary strategy used to refute the claim and verify whether it is based on the source guarantee. This led us to identify four primary human verification strategies:

1. Global counter-evidence (GCE): Counter-evidence via arbitrarily complex reasoning but without the source guarantee.
2. Local counter-evidence (LCE): Evidence requires the source guarantee to refute (the reasoning behind) the claim.
3. Non-credible source (NCS): Evidence requires the source guarantee to refute the claim based on non-credible sources (e.g., satire).
4. No evidence assertion (NEA): The claim is refuted as no (trusted) evidence supports it.

4 Claims are from the following categories: "pants on fire", "false", and "mostly false".
5 If a claim can be expressed in text and verified without multi-modal reasoning, we consider the verbalized variant of the claim and do not discard it.

Src.  Strategy  MULTIFC  20/21  All  %
yes   LCE       19       16     35   46.7
yes   NCS       9        5      14   18.7
no    GCE       10       10     20   26.7
no    NEA       1        4      5    6.7
no    other     0        1      1    1.3
yes   all       28       21     49   65.3
no    all       11       15     26   34.7
all   all       39       36     75   100.0
Table 2: Strategies used to refute 75 of 100 misinformation claims, with and without the source guarantee (Src.).
We discard 25 non-applicable claims and show the results of the remaining 75 claims in Table 2. Please refer to Appendix A for more analysis details and examples. In some cases, the selection of one strategy is ambiguous when multiple strategies are applied. In a pilot study to analyze human verification strategies, two co-authors agreed on 9 of 10 applicable misinformation claims. In general, about two-thirds of the claims were refuted by relying on the source guarantee. In 20 cases, fact-checkers refuted the claim by finding global counter-evidence. In one case (other), fact-checkers relied entirely on expert statements. In general, experts supported the fact-checkers in identifying and discussing evidence, or strengthened their arguments via statements, but did not affect the underlying verification strategy.
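For reference, the percentages in Table 2 follow directly from the raw counts. The snippet below, with the counts transcribed from the table, reproduces the percentage column and the share of claims that are refutable without the source guarantee; it is our own illustration, not the authors' analysis code.

```python
# Reproduce the percentages in Table 2 from the raw strategy counts
# (counts transcribed from the table; not the authors' analysis code).
COUNTS = {  # strategy -> (needs source guarantee, claims out of 75)
    "LCE": (True, 35),
    "NCS": (True, 14),
    "GCE": (False, 20),
    "NEA": (False, 5),
    "other": (False, 1),
}

TOTAL = sum(n for _, n in COUNTS.values())  # 75 applicable claims

for strategy, (needs_src, n) in COUNTS.items():
    print(f"{strategy:>5}: {n:3d} claims ({100 * n / TOTAL:.1f}%), source guarantee: {needs_src}")

with_src = sum(n for needs, n in COUNTS.values() if needs)
print(f"With source guarantee: {with_src}/{TOTAL} = {100 * with_src / TOTAL:.1f}%")  # 49/75 = 65.3%
print(f"Global counter-evidence only: {100 * COUNTS['GCE'][1] / TOTAL:.1f}%")         # 26.7%
```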
3.2 NLP Fact Verification
Focusing on evidence-based approaches. Approaches in FCNLP estimate the claim's veracity based on surface cues within the claim (Rashkin et al., 2017; Patwa et al., 2021), assisted with metadata (Wang, 2017; Cui and Lee, 2020; Li et al., 2020; Dadgar and Ghatee, 2021), or using evidence documents. Here, the system uses the stance of the evidence towards the claim to predict the verdict. Verdict labels are often non-binary and include a neutral stance (Thorne et al., 2018), or fine-grained veracity labels from fact-checking organizations (Augenstein et al., 2019). Evidence-based approaches either rely on unverified documents or user comments (Ferreira and Vlachos, 2016; Zubiaga et al., 2016; Pomerleau and Rao, 2017), or assume access to a presumed trusted knowledge base such as Wikipedia (Thorne et al., 2018), scientific publications (Wadden et al., 2020), or search engine results (Augenstein et al., 2019). In this paper, we focus on trusted evidence-based verification approaches, which can deal with the truth changing over time (Schuster et al., 2019). More importantly, they are the most representative of professional fact verification. Effectively debunking misinformation requires stating the corrected fact and explaining the myth's fallacy (Lewandowsky et al., 2020), both of which require trusted evidence.
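One common instantiation of such stance-based verdict prediction (not the specific systems evaluated in this paper) classifies the stance of each evidence passage towards the claim with an off-the-shelf NLI model and aggregates the passage-level labels into a verdict. The model choice and the aggregation rule below are assumptions for illustration.

```python
# Sketch: stance-based verdict prediction with an off-the-shelf NLI model.
# Model choice and aggregation rule are illustrative assumptions.
from typing import List
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

LABEL_TO_STANCE = {
    "ENTAILMENT": "SUPPORTS",
    "CONTRADICTION": "REFUTES",
    "NEUTRAL": "NOT ENOUGH INFO",
}


def predict_verdict(claim: str, evidence_passages: List[str]) -> str:
    """Classify each (evidence, claim) pair and aggregate: any refuting passage
    yields REFUTES, else any supporting passage yields SUPPORTS, else NEI."""
    stances = []
    for passage in evidence_passages:
        # premise = evidence passage, hypothesis = claim
        result = nli([{"text": passage, "text_pair": claim}])[0]
        stances.append(LABEL_TO_STANCE.get(result["label"].upper(), "NOT ENOUGH INFO"))
    if "REFUTES" in stances:
        return "REFUTES"
    if "SUPPORTS" in stances:
        return "SUPPORTS"
    return "NOT ENOUGH INFO"


print(predict_verdict(
    "Telemundo is an English-language television network",
    ["Telemundo is an American Spanish-language terrestrial television network."],
))
```

The aggregation rule (any contradicting passage yields REFUTES) is one of several plausible choices; fine-grained verdict schemes would require a different mapping.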
Global counter-evidence assumption in FCNLP. In FCNLP, evidence retrieval-based approaches assume that the semantic content of a claim is sufficient to find relevant (counter-)evidence in a trusted knowledge base (Thorne et al., 2018; Jiang et al., 2020; Wadden et al., 2020; Aly et al., 2021). This becomes problematic for misinformation that requires the source guarantee to refute the claim. By nature, in this case, the claim and evidence content are distinct and not entailing. Content alone cannot assert that two different narratives describe the same protests (e.g., claim 5 in Table 1), or that a non-entailing fact (squalene is harvested from sharks) serves as the basis for the false claim (e.g., Figure 1). The consequence is a circular reasoning problem: knowing that a claim is false is a precondition to establishing the source guarantee, which in turn is needed to refute the claim. To escape this cycle, one must (a) provide the source guarantee by means other than content (e.g., context), or (b) find evidence that refutes the claim without the source guarantee (global counter-evidence). By relying only on the content of the claim, FCNLP cannot provide the source guarantee and is limited to global counter-evidence, which only accounts for 20% of the misinformation claims analyzed in the previous section.
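The content-only retrieval step implied by this assumption can be illustrated with a standard lexical retriever. The sketch below uses the rank_bm25 package over a toy trusted corpus; it is our illustration, not a component of the systems cited above.

```python
# Sketch: retrieving evidence from a trusted corpus using only the claim's
# content (BM25). Toy corpus and claim are illustrative; note that nothing in
# the retrieved text can establish the source guarantee.
from rank_bm25 import BM25Okapi

trusted_corpus = [
    "Squalene, an oil sometimes harvested from shark livers, is used as a vaccine adjuvant.",
    "Dutch farmers protested against restrictions on gas extraction in Groningen.",
    "Telemundo is an American Spanish-language terrestrial television network.",
]

tokenized = [doc.lower().split() for doc in trusted_corpus]
bm25 = BM25Okapi(tokenized)

claim = "Half a million sharks could be killed to make the COVID-19 vaccine"
top_evidence = bm25.get_top_n(claim.lower().split(), trusted_corpus, n=2)
print(top_evidence)  # lexically related passages, but none of them refutes the claim
```

The retriever surfaces the squalene passage because of lexical overlap, but without the source guarantee, i.e., without knowing that the claim was built on that fact, the passage neither supports nor refutes the claim.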
Current FCNLP fails to provide source guarantees. We note that providing the source guarantee goes beyond entity disambiguation, as required in FEVER (Thorne et al., 2018). The self-contained context within FEVER claims is typically sufficient to disambiguate named entities if required.6 After disambiguation, the retrieved evidence serves as global counter-evidence.

6 In the FEVER claim "Poseidon grossed $181,674,817 at the worldwide box office on a budget of $160 million", it is clear that "Poseidon" refers to the film, not the ancient god.
Recent approaches further add context snippets from Wikipedia (Sathe et al., 2020) or dialogues (Gupta et al., 2022) to resolve ambiguities, but cannot provide the source guarantee needed to break the circular reasoning problem. These snippets differ from the context used by professional fact-checkers, who often need to trace claims and their sources across different platforms. Recently, Thorne et al. (2021) annotated more realistic claims w.r.t. multiple evidence passages. They found supporting and refuting passages for the same claim, which prevents the prediction of an overall verdict. Some works collect evidence for the respective claims by identifying scenarios where the claimant's source is naturally provided, such as a strictly moderated forum (Saakyan et al., 2021), scientific publications (Wadden et al., 2020), or Wikipedia references (Sathe et al., 2020). However, such source evidence is only collected for true claims. Adhering to the global counter-evidence assumption of previous work, false claims in these works are generated artificially and do not reflect real-world misinformation.
3.3 Human and NLP Comparison
Our analysis (Table 2) finds that fact-checkers refuted only 26% of false claims with global counter-evidence. In all other cases, fact-checkers relied on source guarantees (LCE, NCS) or asserted that no supporting evidence exists (NEA). The verification strategy is not evident given the claim alone but depends on the existing evidence. The claim that "President Barack Obama's policies have forced many parts of the country to experience rolling blackouts" is refuted via global counter-evidence (that rolling blackouts had natural causes). The claim that "90% of rural women and 55% of all women are illiterate in Morocco" seems verifiable via official statistics. Yet, no comparable statistics exist, and the claim is refuted because it relies on a decade-old USAID request report.

We further analyze claims refuted via global counter-evidence, which FCNLP, in theory, can refute. Some claims only require shallow reasoning, as directly contradicting evidence naturally exists: a transcript of an interview in which Ron DeSantis was asked about the coronavirus can easily refute the claim "Ron DeSantis was never asked about coronavirus". Another case is when information about the claim's veracity already exists, e.g., because those affected by the myth already corrected