(Richardson et al., 1995). Such annotations may
permit useful downstream processing: For exam-
ple, in this work we use them to facilitate retrieval
of evidence relevant to a claim.
Specifically, we develop and evaluate a pipeline
to automatically identify and contextualize health-
related claims on social media, as we anticipate that
such a tool might be useful for moderators keen to
keep their communities free of potentially harmful
misinformation. With this use-case in mind, we
propose methods for automatically retrieving trust-
worthy published scientific evidence relevant to a
given claim made on social media, which may in
aggregate support or debunk a particular claim.
The contributions of this work are summarized
as follows. First, we introduce RedHOT: a new
dataset comprising 22,000 health-related Reddit
posts across 24 medical conditions annotated
for claims, questions, and personal experiences.
Claims are additionally annotated with PIO ele-
ments. Second, we introduce the task of identifying
health-related claims on social media, extracting
the associated PIO elements, and then retrieving rel-
evant and trustworthy evidence to support or refute
such claims. Third, we propose RedHOT-DER, a
Dense Evidence Retriever trained with heuristically
derived supervision to retrieve medical literature
relevant to health-related claims made on social
media. We evaluate baseline models for the first
two steps on the RedHOT dataset and assess the
retrieval step with relevance judgments collected
from domain experts (medical doctors).
The Reddit posts we have collected are public
and typically made under anonymous pseudonyms,
but nonetheless these are health-related comments
and so inherently sensitive. To respect this, we
(a) notified all users in the dataset of their
(potential) inclusion in this corpus and provided
an opportunity to opt out, and (b) do not release
the data directly, but rather a script to download
annotated comments, so that individuals may choose
to remove their comments in the future. Further-
more, we consulted with our Institutional Review
Board (IRB) and confirmed that the initial collec-
tion and annotation of such data does not constitute
human subjects research. However, EACL review-
ers rightly pointed out that certain uses of this data
may be sensitive. Therefore, to access the collected
dataset we require researchers to self-attest that
they have obtained prior approval from their own
IRB regarding their intended use of the corpus.
2 The RedHOT Dataset
We have collected and manually annotated
health-related posts from Reddit to support development
of language technologies which might, e.g., flag po-
tentially problematic claims for moderation. Reddit
is a social media platform that allows users to cre-
ate their own communities (subreddits) focused on
specific topics. Subreddits are often about niche
topics, and this permits in-depth discussion cater-
ing to a long tail of interests and experiences. No-
tably, subreddits exist for most common (and many
rare) medical conditions; we can therefore sample
posts from such communities for annotation.
2.1 Data Annotation
We decomposed data annotation into two stages,
performed in sequence. In the first, workers are
asked to demarcate spans of text corresponding to a
Claim, Personal Experience, or Question. We
characterize these classes as follows (we provide
detailed annotation instructions in Appendix A):
Claim suggests (explicitly or implicitly) a causal
relationship between an Intervention and an Outcome
(e.g., “I completely cured my O”). Operationally,
we are interested in identifying statements
that might reasonably be interpreted by the reader
as implying a causal link between an intervention
and outcome, as this may in turn influence their
perception regarding the efficacy of an intervention
for a particular condition and/or outcome (i.e.,
relationship between an I and O).
Question poses a direct question, e.g., “Is this
normal?”; “Should I increase my dosage?”.
Personal Experience describes an individual’s
experience, for instance the trajectory of their con-
dition, or experiences with specific interventions.
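The three span categories can be pictured as a small data structure. The following is only an illustrative sketch, not the annotation format actually used for RedHOT: the names SpanLabel and AnnotatedSpan, the use of character offsets, and the sample post are all assumptions made for the example. Labels are stored as a set so that a single span can carry more than one category.

```python
from dataclasses import dataclass, field
from enum import Enum


class SpanLabel(Enum):
    """The three span categories (hypothetical encoding)."""
    CLAIM = "claim"
    QUESTION = "question"
    PERSONAL_EXPERIENCE = "personal_experience"


@dataclass
class AnnotatedSpan:
    """A labeled span of a post, given as character offsets (an assumption)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    labels: set = field(default_factory=set)  # one or more SpanLabel values

    def text(self, post: str) -> str:
        """Return the substring of the post covered by this span."""
        return post[self.start:self.end]


# Hypothetical post built from the placeholder-style examples in the text,
# where I, P, and O stand in for an intervention, population, and outcome.
post = (
    "My doctor put me on I for my P, and I am no longer experiencing O. "
    "Is this normal?"
)

question_start = post.index("Is this normal?")

# The first sentence reports a treatment history and implies a causal link,
# so it carries two labels at once.
first = AnnotatedSpan(
    start=0,
    end=question_start - 1,  # stop before the space that precedes the question
    labels={SpanLabel.CLAIM, SpanLabel.PERSONAL_EXPERIENCE},
)

# The second sentence is a direct question.
second = AnnotatedSpan(
    start=question_start,
    end=len(post),
    labels={SpanLabel.QUESTION},
)

for span in (first, second):
    names = sorted(label.value for label in span.labels)
    print(f"{span.text(post)!r} -> {names}")
```

Storing labels as a set rather than a single field keeps overlapping categories on one span, instead of duplicating the span once per label.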
This is a multi-label scheme: Spans can (and
often do) belong to more than one of the above
categories. For example, personal experiences can
often be read as implying a causal relationship.
Consider this example: “My doctor put me on I for
my P, and I am no longer experiencing O”. This
describes an individual treatment history, but could
also be read as implying that I is a viable treatment
for P (and specifically for the outcome O).
Therefore, we would mark this as both a Claim and
a Personal Experience. By contrast, a general
statement asserting a causal relationship outside of
any personal context like “I can cure O” is what