Empowering the Fact-checkers! Automatic Identification of Claim Spans on Twitter
Megha Sundriyal1, Atharva Kulkarni1, Vaibhav Pulastya1, Md Shad Akhtar1, Tanmoy Chakraborty2
1IIIT Delhi, India, 2IIT Delhi, India
{meghas, atharvak, vaibhav17271, shad.akhtar}@iiitd.ac.in, tanchak@ee.iitd.ac.in
Abstract
The widespread diffusion of medical and political claims in the wake of COVID-19 has led to a voluminous rise in misinformation and fake news. The current vogue is to employ manual fact-checkers to efficiently classify and verify such data to combat this avalanche of claim-ridden misinformation. However, the rate of information dissemination is such that it vastly outpaces the fact-checkers' strength. Therefore, to aid manual fact-checkers in eliminating superfluous content, it becomes imperative to automatically identify and extract the snippets of claim-worthy (mis)information present in a post. In this work, we introduce the novel task of Claim Span Identification (CSI). We propose CURT, a large-scale Twitter corpus with token-level claim spans on more than 7.5k tweets. Furthermore, along with the standard token classification baselines, we benchmark our dataset with DABERTa, an adapter-based variation of RoBERTa. The experimental results attest that DABERTa outperforms the baseline systems across several evaluation metrics, improving by about 1.5 points. We also report a detailed error analysis to validate the model's performance, along with ablation studies. Lastly, we release our comprehensive span annotation guidelines for public use.
1 Introduction
The swift acceleration of Online Social Media (OSM) platforms has led to tremendous democratized content creation and information exchange. Consequently, these platforms serve as ideal breeding grounds for malicious rumormongers and talebearers, abetting a colossal upsurge of misinformation. Such misinformation manifests in many ways, including bogus claims, fabricated information, and rumors. The massive COVID-19 'Infodemic' (Naeem and Bhatti, 2020) is one such malignant byproduct that led to the rampant spread of political and social calumny (Ferrara, 2020; Margolin, 2020; Ziems et al., 2020), accompanied by counterfeit pharmaceutical claims (O'Connor and Murphy, 2020). Therefore, finding such claim-ridden posts on OSM platforms, investigating their plausibility, and differentiating the credible claims from the apocryphal ones has risen to be a pertinent research problem in Argument Mining (AM).

Figure 1: Examples of claim tweets and their ground-truth claim spans (highlighted in boldface blue in the original figure).
  - RT @PirateAtLaw: No no no. Corona beer is the cure not the disease.
  - We don't have evidence but we are positive our wine keeps you from getting #COVID19 if you drink enough of it. Better alternative to #DisinfectantInjection don't you think? #winecures.
  - RT @angeliicamdc: Mexicans are immune to the coronavirus because we have sana sana colita de rana
  - @adamseconomics Vaccine is probably made from Chinese ingredients sourced in Wuhan.
'Claim', as coined by Toulmin (2003), is 'an assertion that deserves our attention'. It is the key component of any argument (Daxenberger et al., 2017). Consider the second tweet, 'We don't have evidence...', given in Figure 1. For the task of claim identification at the coarse level, the entire tweet will be marked as a claim. However, on closer inspection, we find that the text fragments 'our wine keeps you from getting #COVID19' and 'Better alternative to #DisinfectantInjection' represent the finer argumentative units of claim and form the set of evidence based on which this tweet is considered a claim. Segregating such argumentative units of misinformed claims from their benign counterparts fosters many benefits. To begin with, it partitions the otherwise independent claims in a single post, enabling us to retrieve a larger number of claims. Secondly, it acts as a precursor to the downstream tasks of claim check-worthiness and claim verification. Thirdly, it brings the angle of explainability into coarse-grained claim identification. Finally, it serves manual fact-checkers and hoax-debunkers [1,2] in conveniently straining out the unnecessary shreds of text from further processing. We further elaborate on the necessity of claim span identification and exemplify it in Section 2.

[1] https://www.snopes.com/
[2] https://www.politifact.com/
Though the recent literature reflects extensive work on claim detection (Daxenberger et al., 2017; Chakrabarty et al., 2019; Gupta et al., 2021), limited forays have been made into claim span identification, i.e., recognizing the argumentative components of a claim (Wührl and Klinger, 2021). In the recent past, commendable work has been done on span-level argument unit recognition for other computational counterparts under the umbrella of AM, such as hate speech (Mathew et al., 2021) and toxic language (Pavlopoulos et al., 2021). Such studies, however, have eluded the realm of claims, owing to the lack of quality annotated datasets. This calls for the creation of a specialized corpus for claim span identification.
To this end, we propose CURT (Claim Unit Recognition in Tweets), a large-scale, claim span annotated Twitter corpus. We also present several baseline models that solve claim span identification as a token classification task and evaluate them on CURT. Furthermore, we introduce claim descriptions, generic prompts that assist the model in focusing on the most significant regions of the input text through explicit instructions on what to designate as a 'claim'. They are elucidated later in detail. Finally, we benchmark our dataset with DABERTa (Description-Aware RoBERTa), a plug-and-play adapter-based variant of RoBERTa (Liu et al., 2019), designed to infuse the Pre-trained Language Model (PLM) with the description information. Empirical results attest that DABERTa consistently outperforms the conventional baselines and generic PLMs on our task across various metrics.
Contributions. Through this work, we make the following tangible contributions:

1. Formulation of a novel problem statement: We propose the novel task of Claim Span Identification, which aims to identify the argument units of claims in a given text.

2. Claim span identification dataset and extensive annotation guidelines: We present a large-scale Twitter dataset, the first of its kind, with 7.5k claim span annotated tweets, to remedy the absence of an annotated dataset for claim span identification. Additionally, we develop comprehensive annotation guidelines for the task.

3. Claim span identification system: We propose a robust claim span identification framework based on Compositional De-Attention (CoDA) and an Interactive Gating Mechanism (IGM).

4. Extensive evaluation and analysis: We evaluate our model against different baselines and confirm sizable improvements over them. We also report thorough qualitative and quantitative analysis along with ablation studies.
Reproducibility. We release our dataset (CURT) and the source code for DABERTa publicly at https://github.com/LCS2-IIITD/DABERTA-EMNLP-2022.
2 Why Claim Span Identification?
As stated in Section 1, we hypothesize that claim span identification would aid fact-checkers in quickly segregating claim-ridden content from the rest of a post. Moreover, we expect it to be a propitious precursor for claim verification and fact-checking, facilitating better retrieval of relevant evidence. We back our hypothesis with a small experiment on evidence-based document retrieval. We collect 50 random samples from CURT, along with their corresponding ground-truth claim spans. For both the tweets and the claim spans, we extract the top-k relevant articles from a knowledge base using the traditional retrieval system BM25 (Robertson et al., 1995). We use the recently released, publicly available CORD-19 corpus (Wang et al., 2020) to retrieve factual documents. Finally, we present the retrieved documents to three evaluators and ask them to mark whether or not the retrieved shreds of evidence are relevant to the given input tweet/span from our dataset. All three annotators label each text-evidence pair independently. To obtain the final relevancy score, majority voting is employed. We obtain high inter-annotator agreement (Fleiss' kappa) of 0.63 and 0.67 for tweets and spans, respectively.
We compare the performance of tweet-based and span-based retrievals in terms of precision (P) and normalized Discounted Cumulative Gain (nDCG) scores and report them in Table 1. For comparison, we consider two top-k settings (k = 3 and k = 5).

Table 1: P@k and nDCG@k scores for tweets and spans using the BM25 retrieval system and the CORD-19 dataset.

Input   P@5     P@3     nDCG@5  nDCG@3
Tweets  0.3922  0.2745  0.2733  0.2280
Spans   0.4407  0.3390  0.3038  0.2521

We begin by examining the retrieval performance using P@k, which measures the fraction of relevant documents in the top-k set. Span-based document retrieval consistently improves precision scores compared to tweet-based retrieval. For nDCG@5, span-based retrieval outperforms tweet-based retrieval by more than 3%; when we limit the retrieval depth to 3, we see a similar pattern. This demonstrates that entire posts contain much extraneous information, frequently impeding the performance of evidence retrieval systems, which are a prerequisite for both automated and manual fact-checking. In summary, our hypothesis stands: span-based document retrieval yields better precision as well as nDCG scores. This attests to the task's feasibility and importance in the realm of claims.
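To make the setup concrete, the following sketch mirrors this experiment in Python, assuming the third-party rank_bm25 package. The corpus, queries, and printed comparison are toy placeholders, not the actual CORD-19 documents or the evaluators' relevance judgments.

```python
# Sketch of the Section 2 retrieval comparison (pip install rank-bm25).
# The documents and queries below are illustrative stand-ins.
import math
from rank_bm25 import BM25Okapi

def precision_at_k(rels, k):
    # rels: binary relevance of the ranked list, in rank order
    return sum(rels[:k]) / k

def ndcg_at_k(rels, k):
    # Binary-relevance nDCG: DCG over the ranking / DCG of the ideal ranking.
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    idcg = sum(r / math.log2(i + 2)
               for i, r in enumerate(sorted(rels, reverse=True)[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# Toy knowledge base standing in for CORD-19 abstracts.
docs = [
    "wine and beer have no protective effect against covid 19",
    "disinfectant injection is dangerous and not a covid 19 treatment",
    "covid 19 vaccines are manufactured in many countries",
]
bm25 = BM25Okapi([d.split() for d in docs])

def top_k(query, k=3):
    scores = bm25.get_scores(query.split())
    return sorted(range(len(docs)), key=scores.__getitem__, reverse=True)[:k]

tweet = ("we don't have evidence but we are positive our wine keeps you "
         "from getting covid19 if you drink enough of it")
span = "our wine keeps you from getting covid19"
print(top_k(tweet), top_k(span))  # compare full-tweet vs. span-only retrieval
```

In the paper's experiment, the relevance lists fed to precision_at_k and ndcg_at_k come from the majority-voted human judgments described above.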
3 Related Work
Claims on Social Media. The prevailing research on claims can be cleft into three categories – claim detection (Levy et al., 2014; Chakrabarty et al., 2019; Gupta et al., 2021), claim check-worthiness (Jaradat et al., 2018; Wright and Augenstein, 2020), and claim verification (Zhi et al., 2017; Hanselowski et al., 2018; Soleimani et al., 2020). Bender et al. (2011) pioneered the efforts in claim detection by introducing the AAWD corpus. Subsequent studies largely relied on linguistically motivated features such as sentiment, syntax, context-free grammars, and parse trees (Rosenthal and McKeown, 2012; Levy et al., 2014; Lippi and Torroni, 2015).

Recent works in claim detection have engendered the use of large language models (LMs). Chakrabarty et al. (2019) reinforced the power of fine-tuning, as their ULMFiT LM, fine-tuned on a large Reddit corpus of about 5M opinionated claims, showed notable improvements on the claim detection benchmark. Gupta et al. (2021) proposed a generalized claim detection model for detecting claims independent of their source. They handled structured and unstructured data in conjunction by training a blend of linguistic encoders (POS and dependency trees) and a contextual encoder (BERT) to exploit the input text's semantics and syntax. As LMs incur significant computational overheads, Sundriyal et al. (2021) addressed this quandary and proposed a lighter framework that attempted to fabricate discernible feature spaces. The CheckThat! Lab's CLEF-2020 shared task (Barrón-Cedeno et al., 2020) has garnered the attention of several researchers. Williams et al. (2020) won the task by fine-tuning RoBERTa (Liu et al., 2019), accentuated with mean pooling and dropout. Nikolov et al. (2020) ranked second with their out-of-the-box RoBERTa vectors supplemented with Twitter meta-data.
Span Identification. Zaidan et al. (2007) introduced the concept of rationales, highlighted text segments that support a label judgment. Trautmann et al. (2020) released the AURC-8 dataset with token-level span annotations for the argumentative components of stance, along with their corresponding labels. Mathew et al. (2021) proposed a quality corpus for explainable hate identification with token-level annotations. The SemEval community has initiated fine-grained span identification in other domains of argument mining, such as toxic comments (Pavlopoulos et al., 2021) and propaganda techniques (Da San Martino et al., 2020). These shared tasks amassed many solutions constituting transformers (Chhablani et al., 2021), convolutional neural networks (Coope et al., 2020), data augmentation techniques (Rusert, 2021; Pluciński and Klimczak, 2021), and ensemble frameworks (Zhu et al., 2021a; Nguyen et al., 2021). The study closest to ours is Wührl and Klinger (2021), who compiled a corpus of around 1.2k biomedical tweets with claim phrases.

In summary, the existing literature on claims concentrates entirely on sentence-level claim identification and does not investigate eliciting fine-grained claim spans. In this work, we endeavor to move from coarse-grained claim detection to fine-grained claim span identification. We consolidate a large manually annotated Twitter dataset for the claim span identification task and benchmark it with various baselines and a dedicated description-based model.
4 Dataset
Over the past few years, several claim detection datasets have been released (Rosenthal and McKeown, 2012; Chakrabarty et al., 2019). However, none of these corpora come with claim-based rationales that qualify a post as a claim. To bridge this gap, we propose CURT (Claim Unit Recognition in Tweets), a large-scale Twitter corpus with token-level claim span annotations.

Table 2: Dataset statistics. All lengths are in tokens.

Dataset                      Train  Test   Validation
Total no. of claims          6044   755    756
Avg. length of tweets        27.40  26.93  27.29
Avg. length of spans         10.90  10.97  10.71
No. of spans per tweet       1.25   1.20   1.27
No. of single-span tweets    4817   629    593
No. of multiple-span tweets  1201   121    161
Data Selection. We annotate the claim detection Twitter dataset released by Gupta et al. (2021) for our task. However, the guidelines they presented have certain reservations: they do not explicitly account for benedictions, proverbs, warnings, advice, predictions, and indirect questions. As a result, tweets such as 'Dear God, Please put an end to the Coronavirus. Amen' and '@FLOTUS Melania, do you approve of ingesting bleach and shining a bright light in the rectal area as a quick cure for #COVID19? #BeBest' have been mislabeled as claims. This prompted us to extend the existing guidelines and introduce a more exclusive and nuanced set of definitions geared toward claim span identification. We present the extended annotation guidelines and the guideline development procedure in Appendix A.1. In total, we annotated 7555 tweets from the Twitter corpus of Gupta et al. (2021) that met our guidelines.
Dataset Statistics and Analysis. We segment CURT into three partitions – training, validation, and test sets – in a split of 80:10:10. Dataset-related statistics are given in Table 2. One important point to note is that while a claim tweet is typically 27 tokens long, a claim span is only around 10 tokens long. This implies that claim-ridden tweets carry a lot of extraneous information. Arguments can also comprise several claims that may or may not be related to each other. Around 19% of the claim tweets in our dataset contain multiple claim spans. As a result, we obtain 9458 claim spans in total from 7555 tweets. We observe that the majority of tweets contain single claims: out of 7555 tweets, 6039 include a single claim, demonstrating that most tweets contemplate one assertion at a time.
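As a hedged illustration, statistics of this kind can be recomputed from BIO-tagged data roughly as follows; the field names (tokens, tags) are assumptions about the data layout, not the released file format.

```python
# Sketch: derive Table 2-style statistics from BIO-tagged tweets.
def span_count(tags):
    """Number of contiguous claim spans: each span starts with one B tag."""
    return sum(1 for t in tags if t == "B")

def stats(examples):
    n = len(examples)
    spans = [span_count(ex["tags"]) for ex in examples]
    claim_tokens = sum(t in ("B", "I") for ex in examples for t in ex["tags"])
    return {
        "avg_tweet_len": sum(len(ex["tokens"]) for ex in examples) / n,
        "avg_span_len": claim_tokens / sum(spans),   # assumes >= 1 span
        "spans_per_tweet": sum(spans) / n,
        "single_span_tweets": sum(s == 1 for s in spans),
        "multi_span_tweets": sum(s > 1 for s in spans),
    }

example = {"tokens": ["Corona", "beer", "is", "the", "cure"],
           "tags":   ["B",      "I",    "I",  "I",   "I"]}
print(stats([example]))
```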
5 Proposed Methodology
In this section, we outline DABERTa and its intricacies. The main aim is to seamlessly coalesce critical domain-specific information into a Pre-trained Language Model (PLM). To this end, we introduce the Description Infuser Network (DescNet), a plug-and-play adapter module that conditions the LM representations with respect to handcrafted descriptions. The underlying principle behind this formalization is to link a claim span to a claim description that explicitly guides the model on what to focus on. As shown in Figure 2, DescNet houses two sub-components, namely the Compositional De-Attention block (CoDA) and the Interactive Gating Mechanism (IGM). The particulars of each component are delineated in the following sections.
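For intuition, below is a rough PyTorch sketch of a CoDA-style attention block in the spirit of Compositional De-Attention (Tay et al., 2019): a tanh affinity in [-1, 1] lets attention subtract as well as add value vectors, gated by a sigmoid over negative L1 distance. The scaling constants and the exact way DABERTa wires this block into DescNet are assumptions here, not the paper's verbatim formulation.

```python
# Sketch of a CoDA-style (Compositional De-Attention) block in PyTorch.
import torch

def coda_attention(q, k, v, alpha=None, beta=None):
    """q: (batch, n, d); k, v: (batch, m, d). Returns (batch, n, d)."""
    d = q.size(-1)
    alpha = alpha or d ** 0.5
    beta = beta or d ** 0.5
    # Pairwise affinity in [-1, 1]: tanh of the scaled dot product.
    affinity = torch.tanh(q @ k.transpose(-2, -1) / alpha)
    # Gate in (0, 1) from negative L1 distance: nearby pairs gate open.
    gate = torch.sigmoid(-torch.cdist(q, k, p=1) / beta)
    return (affinity * gate) @ v

# Toy usage: tweet tokens attend over claim-description tokens.
tweet_repr = torch.randn(2, 40, 768)  # RoBERTa token states for the tweet
desc_repr = torch.randn(2, 12, 768)   # token states for a claim description
fused = coda_attention(tweet_repr, desc_repr, desc_repr)
```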
Claim Descriptions. Before delving into CoDA and IGM, we first examine claim descriptions, the cornerstone of the proposed model. Claim descriptions are handcrafted templates that guide the model where to concentrate its focus. The inclusion of a claim description encourages the model to focus on the most essential phrases in the input tweet, which may be thought of as guided attention that leads to increased performance. We judiciously curated our claim descriptions in accordance with the annotation guidelines for claims and non-claims offered by Gupta et al. (2021). In Table 3, we list some of the claim descriptions along with the claims they most align with. It is noteworthy that a claim can align with more than one claim description.
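For illustration only, the snippet below shows what such description prompts and their pairing with an input tweet might look like; the wording of these descriptions is hypothetical and need not match Table 3 or the released code.

```python
# Hypothetical claim-description prompts in the spirit of the
# Gupta et al. (2021) guidelines; the actual templates may differ.
CLAIM_DESCRIPTIONS = [
    "The text asserts that a product, substance, or action cures or "
    "prevents a disease.",
    "The text makes a verifiable statement about the cause or origin "
    "of an event.",
    "The text states a statistic or figure that can be fact-checked.",
]

def pair_with_descriptions(tweet):
    """Each (tweet, description) pair is encoded; DescNet conditions the
    tweet representation on the description."""
    return [(tweet, d) for d in CLAIM_DESCRIPTIONS]
```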
Overview of PLMs for Token Classification. Before detailing the proposed framework, DABERTa, we present the working of PLMs for the token classification task. PLMs such as BERT (Devlin et al., 2019), DistilBERT (Sanh et al., 2019), and RoBERTa (Liu et al., 2019) are widely used for various downstream NLP tasks owing to their strong contextual language representation capabilities and ease of fine-tuning. As input to these PLMs, the $i$-th input text is first tokenized into a sequence of sub-word embeddings $X_i \in \mathbb{R}^{N \times d}$, where $N$ is the maximum sequence length and $d$ is the feature dimension. A positional embedding vector $PE_{pos} \in \mathbb{R}^{N \times d}$ is then added to the token embeddings in a pointwise fashion to retain positional information (Vaswani et al., 2017).
The vector $Z_i \in \mathbb{R}^{N \times d}$, hence obtained, is fed
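A minimal sketch of this generic token-classification recipe, using HuggingFace Transformers with a BIO tagging scheme over claim spans, is given below. This is the standard baseline setup, not the full DABERTa model, and the freshly initialized classification head would still need fine-tuning on CURT before its predictions are meaningful.

```python
# Sketch of a PLM token-classification baseline for claim span tagging.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-CLAIM", "I-CLAIM"]
tokenizer = AutoTokenizer.from_pretrained("roberta-base",
                                          add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(
    "roberta-base", num_labels=len(LABELS))

tweet = ["Corona", "beer", "is", "the", "cure", "not", "the", "disease"]
enc = tokenizer(tweet, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits          # (1, seq_len, num_labels)
pred = logits.argmax(-1).squeeze(0).tolist()

# Map sub-word predictions back to words via word_ids(); special tokens
# map to None and are skipped.
for tok_idx, wid in enumerate(enc.word_ids()):
    if wid is not None:
        print(tweet[wid], LABELS[pred[tok_idx]])
```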