fying content from both sources which describe the
same
scientific finding. In other words, to answer
relevant questions about and analyze changes in
scientific information at scale, one must first be
able to point to which original information is being
communicated in a new way.
To enable automated analysis of science communication, this work offers the following contributions (marked by C). First, we present the SCIENTIFIC PARAPHRASE AND INFORMATION CHANGE DATASET (SPICED), a manually annotated dataset of paired scientific findings from news articles, tweets, and scientific papers (C1, §3). SPICED has the following merits: (1) exist-
ing datasets focus purely on semantic similarity,
while SPICED focuses on differences in the infor-
mation communicated in scientific findings; (2) sci-
entific text datasets tend to focus solely on titles or
paper abstracts, while SPICED includes sentences
extracted from the full-text of papers and news arti-
cles; (3) SPICED is large and multi-domain, covering
the 4 broad scientific fields that get the most media
attention (namely: medicine, biology, computer
science, and psychology) and includes data from
the whole science communication pipeline, from
research articles to science news and social media
discussions.
In addition to extensively benchmarking the performance of current models on SPICED (C2, §4), we demonstrate that the dataset enables multiple downstream applications. In particular, we demonstrate how models trained on SPICED improve zero-shot performance on the task of sentence-level evidence retrieval for verifying real-world claims about scientific topics (C3, §5), and perform an
applied analysis on unlabelled tweets and news ar-
ticles where we show (1) media tend to exaggerate
findings in the limitations sections of papers; (2)
press releases and SciTech tend to have less infor-
mational change than general news outlets; and (3)
organizations’ Twitter accounts tend to discuss sci-
ence more faithfully than verified users on Twitter
and users with more followers (C4, §6).
2 Related Work
The analysis of scientific communication directly
relates to fact checking, scientific language anal-
ysis, and semantic textual similarity. We briefly
highlight our connections to these.
Fact Checking
Automatic fact checking is con-
cerned with verifying whether or not a given claim
is true, and has been studied extensively in multiple domains (Thorne et al., 2018; Augenstein et al., 2019), including science (Wadden et al., 2020; Boissonnet et al., 2022; Wright et al., 2022). Fact
checking focuses on a specific type of information
change, namely veracity. Additionally, the task gen-
erally assumes access to pre-existing knowledge re-
sources, such as Wikipedia or PubMed, from which
evidence can be retrieved that either supports or re-
futes a given claim. Our task is concerned with a more general type of information change beyond categorical falsehood, and must be completed before any kind of fact check can be performed.
Scientific Language Analysis
Automating tasks beneficial for understanding changes in scientific information between the published literature and the media is a growing area of research (Wright and Augenstein, 2021b; Pei and Jurgens, 2021; Boissonnet et al., 2022; Dai et al., 2020; August et al., 2020b; Tan and Lee, 2014; Vadapalli et al., 2018; August et al., 2020a; Ginev and Miller, 2020). The three tasks most related to our work are understanding writing strategies for science communication (August et al., 2020b), detecting changes in certainty (Pei and Jurgens, 2021), and detecting changes in causal claim strength, i.e., exaggeration (Wright and Augenstein, 2021b). However, studying these phenomena requires access to paired scientific findings, and doing so at scale requires the ability to pair such findings automatically.
Semantic Similarity
The topic of semantic similarity is well studied in NLP. Several datasets exist with explicit similarity labels, many of which come from SemEval STS shared tasks (e.g., Cer et al., 2017) and paraphrasing datasets (Ganitkevitch et al., 2013). It is also possible to build unlabelled semantic similarity datasets automatically, which has been the main approach used for scientific texts (Cohan et al., 2020; Lo et al., 2020). However, such datasets fail to capture more subtle aspects of similarity, particularly when the focus is solely on the scientific findings conveyed by a sentence (see Appendix A). As we will show, approaches based on these datasets are insufficient for the task we are concerned with in this work, motivating the need for a new resource.
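To make concrete the kind of similarity-based approach referred to above, the sketch below pairs a reported sentence with candidate paper sentences using cosine similarity over sentence embeddings. It is a minimal illustration only, assuming the sentence-transformers library and a general-purpose STS-trained encoder; the model name and example sentences are placeholders, not part of our method. As argued above, purely semantic matching of this kind does not capture the finer-grained information changes that SPICED targets.

```python
# Minimal sketch of a standard semantic-similarity baseline:
# rank candidate paper sentences against a media sentence by
# cosine similarity of sentence embeddings.
# Assumes the sentence-transformers library; the model name and
# example sentences are illustrative placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # generic STS-trained encoder

media_sentence = "Drinking coffee daily cures heart disease, scientists say."
paper_sentences = [
    "We observe a modest association between coffee intake and cardiovascular outcomes.",
    "Participants were recruited from three university hospitals.",
    "Our study is limited by its observational design.",
]

# Encode and score all candidates against the media sentence.
media_emb = model.encode(media_sentence, convert_to_tensor=True)
paper_embs = model.encode(paper_sentences, convert_to_tensor=True)
scores = util.cos_sim(media_emb, paper_embs)[0]

# The top-ranked sentence is the baseline's "matched" finding, even though
# similarity alone says nothing about how the information has changed.
best_idx = int(scores.argmax())
print(paper_sentences[best_idx], float(scores[best_idx]))
```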
3 SPICED
We introduce SPICED, a new large-scale dataset of
scientific findings paired with how they are commu-