
Our contributions are as follows:
• we mine a large-scale collection of 330,000 tweets paired with fact-checking articles;
• we propose two distant supervision strategies to label the CrowdChecked dataset;
• we propose a novel method to learn from this data using modified self-adaptive training;
• we demonstrate sizable improvements over the state of the art on a standard test set.
2 Our Dataset: CrowdChecked
2.1 Dataset Collection
We use Snopes as our target fact-checking website, due to its popularity among both Internet users and researchers (Popat et al., 2016; Hanselowski et al., 2019; Augenstein et al., 2019; Tchechmedjiev et al., 2019). We further use Twitter as the source for collecting user messages, which could contain claims and fact-checks of these claims.
Our data collection setup is similar to that of Vo and Lee (2019). First, we form a query to select tweets that contain a link to a fact-check from Snopes (url:snopes.com/fact-check/), which are either replies or quote tweets, but not retweets.
An example result from the query is shown in Fig-
ure 1, where the tweet from the crowd fact-checker
contains a link to a fact-checking article. We then
assess its relevance to the claim (if any) made in
the first tweet (the root of the conversation) and the
last reply in order to obtain tweet–verified article
pairs. We analyze in more detail the conversational
structure of these threads in Section 2.2.
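The paper does not give the exact query string; a plausible sketch of the query described above, assuming the Twitter API v2 full-archive search syntax, could look as follows:

```python
# Sketch of the collection query, assuming Twitter API v2
# full-archive search operators (the exact query string used
# in the paper is not given; this is an illustration).

def build_query() -> str:
    """Match tweets linking to a Snopes fact-check that are
    replies or quote tweets, excluding plain retweets."""
    return (
        'url:"snopes.com/fact-check/" '  # link to a Snopes fact-check
        "(is:reply OR is:quote) "        # a reply or a quote tweet...
        "-is:retweet"                    # ...but not a retweet
    )
```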
We collected all tweets matching our query from October 2017 till October 2021, obtaining a total of 482,736 unique hits. We further collected 148,503 reply tweets and 204,250 conversation (root) tweets.¹ Finally, we filter out malformed pairs, i.e., tweets linking to themselves, empty tweets, non-English ones, those with no resolved URLs in the Twitter object (‘entities’), those with broken links to the fact-checking website, and all tweets in the CheckThat ’21 dataset. We ended up with 332,660 unique tweet–article pairs (shown in the first row of Table 5), 316,564 unique tweets, and 10,340 fact-checking articles from Snopes that they point to.

¹ The sum of the unique replies and of the conversation tweets is not equal to the total number of fact-checking tweets, as more than one tweet might reply to the same comment.
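The filtering step above can be sketched as a single predicate over each tweet–article pair. The field names below (a tweet dict with "id", "text", "lang", "entities", "target_url", "self_url") are a hypothetical simplification, not the paper's actual schema:

```python
# Sketch of the pair-filtering criteria described above.
# All field names are illustrative assumptions.

def keep_pair(tweet: dict, checkthat21_ids: set) -> bool:
    """Return True if the tweet-article pair survives filtering."""
    if not tweet.get("text", "").strip():      # empty tweet
        return False
    if tweet.get("lang") != "en":              # non-English tweet
        return False
    urls = tweet.get("entities", {}).get("urls", [])
    if not any(u.get("expanded_url") for u in urls):
        return False                           # no resolved URLs
    target = tweet.get("target_url", "")
    if "snopes.com/fact-check/" not in target: # broken fact-check link
        return False
    if target == tweet.get("self_url"):        # tweet links to itself
        return False
    if tweet.get("id") in checkthat21_ids:     # overlaps CheckThat '21
        return False
    return True
```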
User Post w/ Claim: Sen. Mitch McConnell: “As recently as October, now-President Biden said you can’t legislate by executive action unless you are a dictator. Well, in one week, he signed more than 30 unilateral actions.” [URL] — Forbes (@Forbes) January 28, 2021

Verified Claims and their Corresponding Articles
(1) When he was still a candidate for the presidency in October 2020, U.S. President Joe Biden said, “You can’t legislate by executive order unless you’re a dictator.” http://snopes.com/fact-check/biden-executive-order-dictator/ ✓
(2) U.S. Sen. Mitch McConnell said he would not participate in 2020 election debates that include female moderators. http://snopes.com/fact-check/mitch-mcconnell-debate-female/ ✗

Table 1: Illustrative examples for the task of detecting previously fact-checked claims. The post contains a claim (related to legislation and dictatorship); the Verified Claims are part of a search collection of previous fact-checks. In row (1), the fact-check is a correct match for the claim made in the tweet (✓), whereas in (2), the claim still discusses Sen. Mitch McConnell, but it is a different claim (✗), and thus this is an incorrect pair.
More details about the process of collecting the fact-checking articles as well as detailed statistics are given in Appendix B.1 and in Figure 2.
2.2 Tweet Collection
(Conversation Structure) It is important to note that the ‘fact-checking’ tweet can be part of a multi-turn conversational thread; therefore, the post that it replies to (the previous turn) does not always express the claim that the current tweet targets.
In order to better understand this, we performed a manual analysis of some conversational threads. Conversational threads on Twitter are organized as shown in Figure 1: the root is the first comment, then there can be a long discussion, followed by a fact-checking comment (i.e., one with a link to a fact-checking article on Snopes). In our analysis, we
identify four patterns: (i) the current tweet verifies a
claim in the tweet it replies to, (ii) the tweet verifies
the root of the conversation, (iii) the tweet does not
verify any claim in the chain (a common scenario),
and (iv) the fact-check targets a claim that was not expressed in the root or in the closest tweet (this happened in very few cases). This analysis suggests that
for the task of detecting previously fact-checked
claims, it is sufficient to collect the triplet of the
fact-checking tweet, the root of the conversation
(conversation), and the tweet that the target tweet
is replying to (reply).
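The triplet extraction described above can be sketched by walking parent links from the fact-checking tweet. The thread representation below (a mapping from tweet id to a dict with an "in_reply_to" parent pointer) is a hypothetical simplification of the Twitter conversation data:

```python
# Sketch of extracting the (fact-checking tweet, conversation
# root, direct reply target) triplet from a thread; the thread
# structure is an illustrative assumption.

def extract_triplet(fc_id: int, thread: dict):
    """Return (fact-checking tweet, root, reply) ids by walking
    parent links upward from the fact-checking tweet."""
    reply_to = thread[fc_id].get("in_reply_to")  # previous turn (reply)
    node = fc_id
    while thread[node].get("in_reply_to") is not None:
        node = thread[node]["in_reply_to"]       # climb to the root
    return fc_id, node, reply_to
```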