medical case reports, making it challenging to train
NLP models for pharmacovigilance event extraction,
since its annotations are at the document level
and the actual annotated events are sparse.
In this work, we introduce a new annotated corpus,
PHEE, for adverse and potential therapeutic effect
event extraction in pharmacovigilance studies.
The dataset consists of nearly 5,000 sentences ex-
tracted from MEDLINE case reports, and each sen-
tence features two levels of annotations. With re-
spect to coarse-grained annotations, each sentence
is annotated with the event trigger word/phrase,
event type and text spans indicating the event’s
associated subject, treatment, and effect. In a
fine-grained annotation pass, further details are
marked, such as patient demographic information,
contextual information about the treatments
(including drug dosage levels, administration routes,
and frequency), and attributes relating to events. An
example annotation is shown in Figure 1(c).
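The two annotation levels described above can be pictured as a nested record. The following is only a minimal sketch under our own assumptions: the class names, field names, role labels, the helper functions and the example sentence are all hypothetical illustrations, and do not reflect the released PHEE file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Character offsets into the sentence, end exclusive (an assumed convention).
Span = Tuple[int, int]


@dataclass
class Argument:
    """A coarse-grained argument span with optional fine-grained sub-arguments."""
    role: str  # "subject", "treatment" or "effect"
    span: Span
    sub_arguments: Dict[str, Span] = field(default_factory=dict)


@dataclass
class Event:
    """One annotated event: trigger, type, arguments and event-level attributes."""
    event_type: str  # hypothetical labels: "adverse_event", "potential_therapeutic_effect"
    trigger: Span
    arguments: List[Argument] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)


def span_of(sentence: str, phrase: str) -> Span:
    """Return the span of the first occurrence of `phrase` in `sentence`."""
    start = sentence.index(phrase)
    return (start, start + len(phrase))


def text_of(sentence: str, span: Span) -> str:
    """Recover the surface text covered by a span."""
    return sentence[span[0]:span[1]]


# Invented sentence, used purely for illustration (not taken from the corpus).
sentence = ("A 65-year-old woman developed a rash after receiving "
            "oral amoxicillin 500 mg daily.")

event = Event(
    event_type="adverse_event",
    trigger=span_of(sentence, "developed"),
    arguments=[
        Argument("subject", span_of(sentence, "A 65-year-old woman"),
                 {"age": span_of(sentence, "65-year-old"),
                  "gender": span_of(sentence, "woman")}),
        Argument("treatment", span_of(sentence, "oral amoxicillin 500 mg daily"),
                 {"route": span_of(sentence, "oral"),
                  "drug": span_of(sentence, "amoxicillin"),
                  "dosage": span_of(sentence, "500 mg"),
                  "frequency": span_of(sentence, "daily")}),
        Argument("effect", span_of(sentence, "a rash")),
    ],
)

# Print every main argument followed by its sub-arguments.
for arg in event.arguments:
    print(arg.role, "->", text_of(sentence, arg.span))
    for name, sub in arg.sub_arguments.items():
        print("   ", name, "->", text_of(sentence, sub))
```

Storing sub-arguments inside their parent argument mirrors the hierarchical reading of the annotation scheme: the fine-grained spans only refine what the coarse-grained pass has already marked.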
Using PHEE as the benchmark, we conduct
thorough experiments to assess the state-of-the-
art NLP technologies for the pharmacovigilance-
related event extraction task. We use sequence
labelling and (both extractive and generative) QA-
based methods as baselines and evaluate event trigger
extraction and argument extraction. The extractive QA
method performs best for trigger extraction, with an
exact match F1 score of 70.09%, while the generative
QA method achieves the best exact match F1 scores of
68.60% and 76.16% for main argument and sub-argument
extraction, respectively. Further analysis shows that current
models perform well on average cases but often
fail on more complex examples.
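For readers unfamiliar with the metric, exact match F1 counts a predicted span as correct only when its boundaries coincide exactly with a gold span. The sketch below shows one common way to compute it over sets of spans; the function name, the `(sentence_id, start, end)` span encoding and the toy spans are our own assumptions, and the scoring script behind the figures reported above may differ in detail.

```python
from typing import Set, Tuple

# Hypothetical span encoding: (sentence_id, start_offset, end_offset).
Span = Tuple[int, int, int]


def exact_match_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """F1 over spans, where a prediction counts only if it equals a gold span."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the second prediction is off by one character, so it is a miss,
# and the fourth prediction has no gold counterpart.
gold = {(0, 5, 14), (1, 3, 9), (2, 0, 4)}
pred = {(0, 5, 14), (1, 3, 8), (2, 0, 4), (3, 1, 2)}

print(round(exact_match_f1(gold, pred), 4))  # 0.5714
```

Here precision is 2/4 and recall is 2/3, giving F1 = 4/7. The off-by-one prediction earns no partial credit, which is what makes exact match a strict criterion.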
Our contributions can be summarised as follows:
1) We introduce PHEE, a new pharmacovigilance dataset
containing over 5,000 finely annotated events from public
medical case reports. To the best of our knowledge, this is
the largest and most comprehensively annotated dataset of
this type to date.
2) We collect hierarchical annotations to provide granular
information about patients and conditions in addition to
coarse-grained event information.
3) We conduct thorough experiments to compare current
state-of-the-art approaches for biomedical event extraction,
demonstrating the strengths and weaknesses of current
technologies and using these findings to highlight challenges
for future research in this area.
2 Related Work
Pharmacovigilance Related Corpora
Prior pharmacovigilance-related corpora have mainly
focused on annotation of entities (e.g., drugs,
diseases, medications) and binary relations
between them, namely, drug-ADE relations
(Gurulingappa et al., 2012a; Patki et al., 2014;
Ginn et al., 2014), disorder-treatment relations
(Rosario and Hearst, 2004; Roberts et al., 2009;
Uzuner et al., 2011; Van Mulligen et al., 2012),
and drug-drug interactions (Segura-Bedmar et al.,
2011; Boyce et al., 2012; Rubrichi and Quaglini,
2012; Herrero-Zazo et al., 2013). More recent
open challenges, including the 2018 n2c2 shared
task (Henry et al., 2020) and the MADE 1.0 challenge
(Jagannatha et al., 2019), have considered annotating
additional relation types, such as drug-attribute
and drug-reason relations, but these are still binary
relationships.
Thompson et al. (2018) introduced the PHAE-
DRA corpus, extending the drug-ADE annotations
to pharmacovigilance events. Compared to corpora
that only annotate simple drug-ADE relations
(referred to as AE events in PHAEDRA), they further
annotate three additional event types: the
Potential Therapeutic Effect (PTE) event, which
refers to the potential beneficial effects of drugs,
and the Combination and Drug-Drug Interaction
events, which indicate multiple drug use and
interactions between administered drugs, respectively.
In addition, PHAEDRA includes the subject as a
type of named entity (NE) and annotates three
types of event attributes, i.e., negated, speculated and
manner. However, some key informative details are
still missing in PHAEDRA. As an NE annotation
in PHAEDRA is usually a single noun or a short
noun phrase, detailed information about the subject
(such as age and gender) and the medication
(e.g., dosage and frequency) is not captured.
We set out to annotate a larger corpus with more
detailed information to facilitate training of phar-
macovigilance event extraction models. We build
on existing corpora (PHAEDRA and ADE). The
ADE corpus comprises ∼3,000 MEDLINE case
reports and annotations on ∼4,000 sentences
indicating adverse effects, but its annotations only
involve drugs, dosages and adverse effects, and
lack sufficient event details of interest. The
PHAEDRA corpus reuses 227 abstracts from ADE and
integrates an additional 370 abstracts (from other
corpora and some novel entries). However, the