
Input    P@5      P@3      nDCG@5   nDCG@3
Tweets   0.3922   0.2745   0.2733   0.2280
Spans    0.4407   0.3390   0.3038   0.2521

Table 1: nDCG@k and P@k scores for tweets and spans using the BM25 retrieval system and the CORD-19 dataset.
normalized Discounted Cumulative Gain (nDCG)
scores and report them in Table 1. For comparison,
we consider two different top-k settings (k=3 and
k=5). We begin by examining the retrieval
performance using P@k, which measures the fraction
of relevant documents retrieved in the top-k set.
Span-based document retrieval consistently improves
precision scores compared to tweet-based retrieval.
For nDCG@5, we find that span-based retrieval
outperforms tweet-based retrieval by more than 3%.
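For reference, a sketch of the standard definitions behind these two metrics (our formulation, not reproduced from the paper); here rel_i denotes the relevance of the document at rank i, and IDCG@k is the DCG of the ideal ranking:
\[
\mathrm{P}@k = \frac{1}{k}\sum_{i=1}^{k} \mathbb{1}\!\left[\mathrm{rel}_i > 0\right],
\qquad
\mathrm{nDCG}@k = \frac{\mathrm{DCG}@k}{\mathrm{IDCG}@k},
\quad \text{where} \quad
\mathrm{DCG}@k = \sum_{i=1}^{k} \frac{\mathrm{rel}_i}{\log_2(i+1)}.
\]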
When we limit the retrieval depth to 3, we observe a
similar pattern. This demonstrates that entire posts
contain considerable extraneous information, which
frequently impedes the performance of evidence
retrieval systems, a prerequisite for both automated
and manual fact-checking. In summary, our hypothesis
holds: span-based document retrieval yields better
precision as well as nDCG scores. This attests to the
feasibility and importance of the claim span
identification task.
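To make the setup concrete, the following is a minimal sketch of the retrieval-and-evaluation loop discussed above, using the open-source rank_bm25 package. The toy corpus, queries, and relevance labels are illustrative assumptions, not the actual CORD-19 data or the authors' pipeline.

```python
# Minimal sketch of comparing tweet-based vs. span-based BM25 retrieval.
# Assumes the rank_bm25 package (pip install rank-bm25); corpus, queries,
# and relevance labels below are illustrative placeholders only.
import math
from rank_bm25 import BM25Okapi

corpus = [
    "masks reduce transmission of respiratory viruses",
    "vitamin c does not cure the common cold",
    "vaccines underwent large randomized controlled trials",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])

def precision_at_k(ranked_rels, k):
    """Fraction of relevant documents among the top-k results."""
    return sum(ranked_rels[:k]) / k

def ndcg_at_k(ranked_rels, k):
    """nDCG@k with the standard log2 position discount."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_rels[:k]))
    ideal = sorted(ranked_rels, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# Query once with the full tweet and once with only its claim span.
tweet = "lol so apparently masks totally reduce virus transmission??"
span = "masks reduce virus transmission"
relevance = {0: 1, 1: 0, 2: 0}  # doc index -> binary relevance (assumed)

for name, query in [("tweet", tweet), ("span", span)]:
    scores = bm25.get_scores(query.split())
    ranking = sorted(range(len(corpus)), key=lambda i: -scores[i])
    rels = [relevance[i] for i in ranking]
    print(name, "P@3 =", precision_at_k(rels, 3),
          "nDCG@3 =", round(ndcg_at_k(rels, 3), 4))
```

The intuition the sketch captures is the same as in the experiment: the noisy tokens in the full tweet dilute the BM25 term-matching signal, whereas the extracted span concentrates it on the claim-bearing words.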
3 Related Work
Claims on Social Media.
The prevailing research on claims can be divided
into three categories: claim detection (Levy et al.,
2014; Chakrabarty et al., 2019; Gupta et al., 2021),
claim check-worthiness (Jaradat et al., 2018; Wright
and Augenstein, 2020), and claim verification (Zhi
et al., 2017; Hanselowski et al., 2018; Soleimani
et al., 2020). Bender et al. (2011) pioneered the
efforts in claim detection by introducing the AAWD
corpus. Subsequent studies largely relied on
linguistically motivated features such as sentiment,
syntax, context-free grammars, and parse trees
(Rosenthal and McKeown, 2012; Levy et al., 2014;
Lippi and Torroni, 2015).
Recent work in claim detection has embraced large
language models (LMs). Chakrabarty et al. (2019)
reinforced the power of fine-tuning: their ULMFiT
LM, fine-tuned on a large Reddit corpus of about 5M
opinionated claims, showed notable improvements on
claim detection benchmarks. Gupta et al. (2021)
proposed a generalized claim detection model for
detecting claims independent of their source. They
handled structured and unstructured data in
conjunction by training a blend of linguistic
encoders (POS and dependency trees) and a contextual
encoder (BERT) to exploit the input text's semantics
and syntax. As LMs incur significant computational
overhead, Sundriyal et al. (2021) addressed this
issue and proposed a lighter framework that
constructs discernible feature spaces. The
CheckThat! Lab's CLEF-2020 shared task
(Barrón-Cedeño et al., 2020) has garnered the
attention of several researchers. Williams et al.
(2020) won the task by fine-tuning RoBERTa (Liu
et al., 2019), augmented with mean pooling and
dropout. Nikolov et al. (2020) ranked second with
their out-of-the-box RoBERTa vectors supplemented
with Twitter metadata.
Span Identification.
Zaidan et al. (2007) introduced the concept of
rationales: highlighted text segments that support a
label's judgment. Trautmann et al. (2020) released
the AURC-8 dataset with token-level span annotations
for the argumentative components of stance, along
with their corresponding labels. Mathew et al.
(2021) proposed a quality corpus for explainable
hate speech identification with token-level
annotations. The SemEval community has initiated
fine-grained span identification in other domains of
argument mining, such as toxic comments (Pavlopoulos
et al., 2021) and propaganda techniques (Da San
Martino et al., 2020). These shared tasks amassed
many solutions built on transformers (Chhablani
et al., 2021), convolutional neural networks (Coope
et al., 2020), data augmentation techniques (Rusert,
2021; Pluciński and Klimczak, 2021), and ensemble
frameworks (Zhu et al., 2021a; Nguyen et al., 2021).
Wührl and Klinger (2021) is the closest study to
ours; they compiled a corpus of around 1.2k
biomedical tweets with claim phrases. In summary,
the existing literature on claims concentrates
entirely on sentence-level claim identification and
does not investigate eliciting fine-grained claim
spans. In this work, we endeavor to move from
coarse-grained claim detection to fine-grained claim
span identification. We consolidate a large manually
annotated Twitter dataset for the claim span
identification task and benchmark it with various
baselines and a dedicated description-based model.
4 Dataset
Over the past few years, several claim detection
datasets have been released (Rosenthal and McK-