
RedHOT: A Corpus of Annotated Medical Questions, Experiences, and Claims on Social Media

Somin Wadhwa†, Vivek Khetan‡, Silvio Amir†, Byron C. Wallace†

Northeastern University†, Accenture AI Labs‡

{wadhwa.s,s.amir,b.wallace}@northeastern.edu

vivek.a.khetan@accenture.com

Abstract

We present Reddit Health Online Talk

(RedHOT), a corpus of 22,000 richly anno-

tated social media posts from Reddit spanning

24 health conditions. Annotations include de-

marcations of spans corresponding to medical

claims, personal experiences, and questions.

We collect additional granular annotations

on identiﬁed claims. Speciﬁcally, we mark

snippets that describe patient Populations,

Interventions, and Outcomes (PIO elements)

within these. Using this corpus, we introduce

the task of retrieving trustworthy evidence rel-

evant to a given claim made on social media.

We propose a new method to automatically

derive (noisy) supervision for this task which

we use to train a dense retrieval model; this

outperforms baseline models. Manual evaluation of retrieval results performed by medical doctors indicates that while our system performance is promising, there is considerable

room for improvement. We release all anno-

tations collected (and scripts to assemble the

dataset), and all code necessary to reproduce

the results in this paper at: https://sominw.com/redhot.

1 Introduction

Social media platforms such as Reddit provide in-

dividuals places to discuss (potentially rare) med-

ical conditions that affect them. This allows peo-

ple to communicate with others who share in their

condition, exchanging information about symptom

trajectories, personal experiences, and treatment

options. Such communities can provide support

(Biyani et al.,2014) and access to information

about rare conditions which may otherwise be dif-

ﬁcult to ﬁnd (Glenn,2015).

However, the largely unvetted nature of social media platforms makes them vulnerable to mis- and disinformation (Swire-Thompson and Lazer, 2019). An illustrative and timely example is the idea that consuming bleach might be a viable treatment for COVID-19,¹ which quickly gained traction on social media. All misinformation can be dangerous, but medical misinformation poses unique risks to public health, especially as individuals increasingly turn to social media to inform personal health decisions (Nobles et al., 2018; Barua et al., 2020).

r/ibs: I just ordered Metamucil bc I read psyllium may be better for IBS-D. Or maybe the fiber is what is making me go more? Definitely produces more gas.

r/Psychosis: Surprising I'm seeing research articles that ketamine doesn't increase psychosis risk or induce psychosis past the duration of the drug. I only took a brief look into it. Has anyone here had ketamine induced psychosis? What is r/psychosis experience with ketamine?

r/Costochondritis: Ive had costo for a while, usually comes and goes. Done all the heart / lung checks all clear. Ive just recovered covid and what I'm left with is chest pain / pressure. I mean it could be a costo flare up which makes sense, but also been reading about myocarditis after covid and I’m worried.

Figure 1: Examples of health-related Reddit posts annotated for populations, interventions, and outcomes.

In this paper, we introduce RedHOT: an annotated dataset of health-related claims, questions, and personal experiences posted to Reddit. This dataset can support development of a wide range of models for processing health-related posts from social media. Unlike existing health-related social media corpora, RedHOT: (a) Covers a broad range of health topics (e.g., not just COVID-19), and, (b) Comprises “natural” claims collected from real health-related fora (along with annotated questions and personal experiences). Furthermore, we have collected granular annotations on claims, demarcating descriptions of the Population (e.g., diabetics), Interventions, and Outcomes, i.e., the PIO elements (Richardson et al., 1995). Such annotations may permit useful downstream processing: For example, in this work we use them to facilitate retrieval of evidence relevant to a claim.

¹ https://www.theguardian.com/world/2020/sep/19/bleach-miracle-cure-amazon-covid

arXiv:2210.06331v3 [cs.CL] 7 Feb 2023

Speciﬁcally, we develop and evaluate a pipeline

to automatically identify and contextualize health-

related claims on social media, as we anticipate that

such a tool might be useful for moderators keen to

keep their communities free of potentially harmful

misinformation. With this use-case in mind, we

propose methods for automatically retrieving trust-

worthy published scientiﬁc evidence relevant to a

given claim made on social media, which may in

aggregate support or debunk a particular claim.

The contributions of this work are summarized as follows. First, we introduce RedHOT: A new dataset comprising 22,000 health-related Reddit posts across 24 medical conditions annotated for claims, questions, and personal experiences. Claims are additionally annotated with PIO elements. Second, we introduce the task of identifying health-related claims on social media, extracting the associated PIO elements, and then retrieving relevant and trustworthy evidence to support or refute such claims. Third, we propose RedHOT-DER, a Dense Evidence Retriever trained with heuristically derived supervision to retrieve medical literature relevant to health-related claims made on social media. We evaluate baseline models for the first two steps on the RedHOT dataset and assess the retrieval step with relevance judgments collected from domain experts (medical doctors).
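The retrieval step of the pipeline described above can be illustrated with a small sketch. This is not the RedHOT-DER model: the `embed` function below is a hypothetical bag-of-words stand-in for a trained dense encoder, `retrieve` is an illustrative helper name, and the second corpus entry is an invented title used only for contrast. The sketch only shows the general shape of the idea, namely building a query from a claim and its PIO elements and ranking candidate evidence by vector similarity.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a trained dense encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def retrieve(claim, pio, corpus, k=1):
    """Rank evidence documents by similarity to the claim text plus its PIO elements."""
    query = embed(" ".join([claim, *pio.values()]))
    ranked = sorted(corpus, key=lambda doc: cosine(query, embed(doc)), reverse=True)
    return ranked[:k]


claim = "ketamine doesn't increase psychosis risk"
pio = {"P": "psychosis", "I": "ketamine", "O": "psychosis"}
corpus = [
    "Ketamine and psychosis history: antidepressant efficacy and psychotomimetic effects postinfusion",
    "Psyllium fiber supplementation in irritable bowel syndrome",  # invented title, for contrast only
]
top = retrieve(claim, pio, corpus)
```

In a real system the count vectors would be replaced by learned dense representations and the linear scan by an approximate nearest-neighbor index; the ranking logic stays the same.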

The Reddit posts we have collected are public

and typically made under anonymous pseudonyms,

but nonetheless these are health-related comments

and so inherently sensitive. To respect this, we

(a) notiﬁed all users in the dataset of their (poten-

tial) inclusion in this corpus, and provided oppor-

tunity to opt-out, and, (b) we do not release the

data directly, but rather a script to download an-

notated comments, so that individuals may choose

to remove their comments in the future. Further-

more, we consulted with our Institutional Review

Board (IRB) and conﬁrmed that the initial collec-

tion and annotation of such data does not constitute

human subjects research. However, EACL review-

ers rightly pointed out that certain uses of this data

may be sensitive. Therefore, to access the collected

dataset we require researchers to self-attest that

they have obtained prior approval from their own

IRB regarding their intended use of the corpus.

2 The RedHOT Dataset

We have collected and manually annotated health-related posts from Reddit to support development

of language technologies which might, e.g., ﬂag po-

tentially problematic claims for moderation. Reddit

is a social media platform that allows users to cre-

ate their own communities (subreddits) focused on

speciﬁc topics. Subreddits are often about niche

topics, and this permits in-depth discussion cater-

ing to a long tail of interests and experiences. No-

tably, subreddits exist for most common (and many

rare) medical conditions; we can therefore sample

posts from such communities for annotation.

2.1 Data Annotation

We decomposed data annotation into two stages,

performed in sequence. In the ﬁrst, workers are

asked to demarcate spans of text corresponding to a Claim, Personal Experience, or Question. We

characterize these classes as follows (we provide

detailed annotation instructions in Appendix A):

Claim suggests (explicitly or implicitly) a causal relationship between an Intervention and an Outcome (e.g., “I completely cured my O”). Operationally, we are interested in identifying statements that might reasonably be interpreted by the reader as implying a causal link between an intervention and outcome, as this may in turn influence their perception regarding the efficacy of an intervention for a particular condition and/or outcome (i.e., relationship between an I and O).

Question poses a direct question, e.g., “Is this normal?”; “Should I increase my dosage?”.

Personal Experience describes an individual’s experience, for instance the trajectory of their condition, or experiences with specific interventions.

Table 1: Example annotations, which include extracted spans (phase 1) and spans describing Populations, Interventions, and Outcomes (PIO elements) within them (phase 2). We collect the latter only for claims.

Example 1
Reddit post: I’ve seen a bunch of posts on here from people who say that glycopyrrolate suddenly isn’t working anymore for hyperhidrosis. I’m one of those person who has been facing this for a while now. Just wondering if anyone fixed it? Can’t really ask my GP about it since he didn’t even know the meds existed. He just prescribed them for me when I asked for it
Span labels:
  Claim: I’ve seen a bunch of posts on here from people who say that glycopyrrolate suddenly isn’t working anymore for Hyperhidrosis
  Question: Just wondering if anyone fixed it?
PIO elements from claims: P: hyperhidrosis; I: glycopyrrolate

Example 2
Reddit post: so i recently read that adderall can trigger a psychotic break i was prescribed adderall years ago for my adhd but now i just have constant hallucination episodes. anyone else experience adderall induced psychosis?
Span labels:
  Claim: so i recently read that adderall can trigger a psychotic break
  Personal Experience: i was prescribed adderall years ago for my adhd but now i just have constant hallucination episodes
  Question: anyone else experience adderall induced psychosis?
PIO elements from claims: P: adhd; I: adderall; O: hallucinations

Example 3
Reddit post: I’ve had costochondritis for a while, usually comes and goes. Done all the heart/lung checks all clear. I’ve just recovered covid and what I’m left with is chest pain/pressure. I mean it could be a costo flare up which makes sense, but also been reading about myocarditis after covid and I’m worried, how can I tell which is which?
Span labels:
  Claim: been reading about myocarditis after covid
  Personal Experience: I’m left with is chest pain/pressure
  Question: how can I tell which is which?
PIO elements from claims: P: costochondritis; I: covid; O: myocarditis, chest pain

This is a multi-label scheme: Spans can (and often do) belong to more than one of the above categories. For example, personal experiences can often be read as implying a causal relationship. Consider this example: “My doctor put me on I for my P, and I am no longer experiencing O”. This describes an individual treatment history, but could also be read as implying that I is a viable treatment for P (and specifically for the outcome O). Therefore, we would mark this as both a Claim and Personal Experience. By contrast, a general statement asserting a causal relationship outside of any personal context like “I can cure O” is what we will refer to as a “pure claim”, meaning it exclusively belongs to the Claim category.

In the second stage, workers are asked to further annotate “pure claim” instances by marking spans within them that correspond to the Populations, Interventions/Comparators, and Outcomes (the PIO elements) associated with the claim.
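The multi-label scheme and the notion of a “pure claim” described above can be made concrete with a small data structure. This is a minimal sketch under our own assumptions: the `Span` class, its field names, and `is_pure_claim` are hypothetical and do not reflect the format of the released annotations.

```python
from dataclasses import dataclass, field

# The three span categories from stage 1.
LABELS = {"Claim", "Question", "Personal Experience"}


@dataclass
class Span:
    text: str
    labels: set  # one or more of LABELS (multi-label)
    pio: dict = field(default_factory=dict)  # filled in during stage 2 only

    def is_pure_claim(self):
        """A pure claim belongs to the Claim category and to no other."""
        return self.labels == {"Claim"}


spans = [
    Span("My doctor put me on I for my P, and I am no longer experiencing O",
         {"Claim", "Personal Experience"}),
    Span("I can cure O", {"Claim"}),
    Span("Should I increase my dosage?", {"Question"}),
]

# Only pure claims are passed on to the second (PIO) annotation stage.
stage2_queue = [s for s in spans if s.is_pure_claim()]
```

Storing labels as a set keeps the multi-label property explicit, and the stage-2 filter is then a single equality check.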

2.2 Crowdsourcing Annotations

We hired crowdworkers to perform the above annotation tasks on Amazon Mechanical Turk (AMT). To estimate required annotation time and determine fair pay rates, we ran an internal pilot with two PhD students (both broadly familiar with this research area) on 100 samples. To gauge quality and recruit workers from AMT, we ran two pilot experiments in which we collected sentence-level annotations on posts sampled from three medical populations (i.e., subreddits), comprising ∼6,000 posts in all. We required all workers have an overall job approval rate of ≥ . Based on an initial set of AMT annotations we re-hired only workers who reliably followed annotation instructions (details in Appendix A), and we actively recruited the top workers to continue on with increased pay. We obtained annotations from at least three workers for each post, allowing for robust inference of reference labels. Recruited workers were also paid periodic bonuses (equivalent to two hours of pay) based on the quality of their annotated samples.

This is the standard PICO framework, but we collapse Interventions and Comparators into the Intervention category, as the distinction is arbitrary.

We consulted with an Institutional Review Board (IRB) to confirm that this annotation work did not constitute human subjects research.

Based on the estimate from our pilot experiments, the pay rate for AMT workers was fixed to US $9 per hour for stage-1 annotations and US $11 per hour for stage-2 annotations, irrespective of geographic location.

              Fleiss κ   P      R      F1
Questions     0.86       0.85   0.82   0.84
Claims        0.69       0.63   0.53   0.58
Experiences   0.71       0.78   0.69   0.73
POP           0.92       0.94   0.91   0.92
INT           0.74       0.76   0.70   0.73
OUT           0.78       0.73   0.68   0.70

Table 2: Token-wise label agreement among experts measured by Fleiss κ on a subset of data. We further compute precision, recall, and F1 scores for “aggregated” labels by evaluating them against unioned “in-house” expert labels.

2.3 Quality Validation

[Figure 2 panels. Input, an r/Psychosis post: “Surprising I'm seeing research articles that ketamine doesn't increase psychosis risk or induce psychosis past the duration of the drug. I only took a brief look into it. Has anyone here had ketamine induced psychosis? What is r/psychosis experience with ketamine?” (A) Extract questions, experiences, and claims. Questions: “Has anyone here had ketamine induced psychosis? What is r/psychosis experience with ketamine?”; Personal experiences: None; Claims: “I’m seeing research articles that ketamine doesn’t increase psychosis risk or induce psychosis.” (B) Extract PICO elements. Population: psychosis; Interventions: ketamine; Outcomes: psychosis. (C) Retrieved evidence: “Ketamine and Psychosis History: Antidepressant Efficacy and Psychotomimetic Effects Postinfusion. Abstract: Because of a theoretical risk of exacerbating psychosis in predisposed patients, subjects with current psychotic symptoms or a past history of psychosis are typically excluded from ketamine trials.”]

Figure 2: Examples portraying potential use cases of our corpus. We showcase three distinct tasks, to be performed in sequence. The first (A) entails extracting spans corresponding to claims (highlighted in bold) from a given Reddit post. The second step (B) is to identify the PICO elements associated with each claim. In the final step (C), we use the outputs of the first two models with the original post to obtain a dense representation, enabling us to retrieve relevant evidence from a large dataset of trusted medical evidence (e.g., PubMed).

To evaluate annotation quality we calculate token-wise label agreement between annotators, and amongst ourselves. We emphasize here that token-level κ for sequences is quite strict and disagreements often reflect where annotators decide to mark span boundaries. Despite this, for the first stage agreement (Fleiss κ) on labeled questions, experiences, and claims was 0.62, and for the second stage 0.55. We consider this moderately strong agreement, in line with agreement reported for related annotation tasks in the literature (Nye et al., 2018; Deléger et al., 2012). To quantify this and further gauge the quality of collected annotations, we run a few additional analyses.
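For readers unfamiliar with the statistic, the sketch below computes Fleiss' κ over token-level labels using the standard textbook formula. It assumes each token receives exactly one label from each of a fixed number of annotators; how multi-label spans were handled in the reported figures is not specified here, so treat this as an illustration of the statistic rather than a reproduction of the reported numbers. The function name and the toy ratings are our own.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per annotator).

    Every item must be rated by the same number of annotators (at least two).
    Undefined (division by zero) when all annotators use a single label throughout.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]

    # Per-item agreement: fraction of annotator pairs that agree on the item.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall label distribution.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    return (p_bar - p_e) / (1 - p_e)


# Four tokens, three annotators each; "O" marks a token outside any claim span.
token_ratings = [
    ["Claim", "Claim", "Claim"],
    ["O", "O", "O"],
    ["Claim", "Claim", "O"],
    ["O", "O", "O"],
]
kappa = fleiss_kappa(token_ratings)
```

A single boundary disagreement on one of four tokens already lowers κ noticeably in this toy case, which illustrates why token-level agreement is a strict measure for span annotation.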

As previously stated, prior to collecting annotations on Amazon MTurk, we (the authors) annotated a subset of data (100 samples/stage) internally to assess task difficulty and to estimate the time required for annotation. As an additional quality check, we use these annotations to calculate token-wise label agreement. Table 2 reports the results; while there remains some discrepancy owing to the inherent complexity of the task, there is higher agreement between us than between workers.

Each of these samples was also annotated by three workers. We aggregate these labels using majority vote and compute token-wise precision and recall of these aggregated labels against the reference “in-house” labels (Table 2). We report the same metrics per annotator evaluated against aggregated MTurk labels in Table 9 (Appendix B). Despite moderate agreement between annotators, aggregated labels agree comparatively well with the “expert” consensus, indicating that while individual worker annotations are somewhat noisy, aggregated annotations are reasonably robust.
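The aggregation and evaluation procedure described above can be sketched as follows. The example assumes binary token labels for a single class (1 if the token lies inside a span of that class, 0 otherwise); the function names and the toy data are illustrative and are not taken from the released scripts.

```python
def majority_vote(annotations):
    """Aggregate per-worker 0/1 token labels: a token is 1 if more than half the workers say so."""
    n_workers = len(annotations)
    return [int(sum(votes) * 2 > n_workers) for votes in zip(*annotations)]


def precision_recall_f1(predicted, reference):
    """Token-wise precision, recall, and F1 of predicted labels against reference labels."""
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if r and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Three workers labeling the same five tokens for one class.
workers = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
reference = [1, 1, 1, 0, 1]  # toy "in-house" reference labels

aggregated = majority_vote(workers)
scores = precision_recall_f1(aggregated, reference)
```

In this toy case no single worker matches the reference, yet the aggregated labels recover three of the four reference tokens with no false positives, which mirrors the observation that aggregation smooths out individual noise.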

2.4 Dataset Details

Table 1 provides illustrative samples from RedHOT and Table 8 provides some descriptive statistics along with examples of included health populations. We broadly characterize populations (conditions) as Very Common, Common or Rare, and sought a mix of these. This was not the only attribute that informed which conditions we selected for inclusion in our dataset, however. For example, we wanted a mix of populations with respect to volume of online activity (e.g., the Diabetes subreddit has over 60k active visitors; Lupus has ). We also wanted to include both chronic and treatable conditions (e.g., Narcolepsy is a rare and chronic condition, while Gout is common and treatable), and mental and physical disorders (e.g., ADHD, Rheumatoid Arthritis). Another consideration was whether a condition can be self-diagnosed or requires professional assessment (e.g., Bulimia is usually self-diagnosable but can potentially be life-threatening; Gastroparesis is chronic but requires a professional medical diagnosis).

The number of claims across different categories

of health populations are far outnumbered by ques-
