seem to be two different applications, they share
similarities. The role of hyperlinks is to help a
reader understand a Wiki article. Thus, concepts
that are "difficult to understand" in a Wiki article
may be more likely to have hyperlinks. Therefore,
we hypothesize that large-scale hyperlink span in-
formation from Wiki can be advantageous for our
models of medical jargon extraction. Our results
show that models trained on WikiHyperlink span
datasets indeed substantially improved the perfor-
mance of MedJEx. Moreover, we also found that
such auxiliary learning improved six out of the
eight benchmark datasets of biomedical NER tasks.
To detect outlier homonymous terms such as
“shock”, we deployed an approach inspired by
masking probing (Petroni et al.,2019), a method
for evaluating linguistic knowledge of large-scale
pre-trained language models (PLMs). Meister et al.
(2022) suggest that PLMs are useful for predicting
reading time, where longer reading time indicates
greater difficulty in understanding.
In our work, we propose a contextualized masked
language model (MLM) score feature to tackle the
homonym challenge. Note that models will recog-
nize the sense of a word or phrase using contextual
information. Since PLMs calculate the probability
of masked words in consideration of context, we
hypothesize that PLMs trained on open-domain
corpora would predict masked medical jargon
poorly if its senses are distributed differently
between open-domain and clinical-domain corpora.
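As a sketch, such a contextualized MLM score can be computed by masking each subword token of a candidate term in turn and averaging the log-probabilities the PLM assigns to the original tokens. The code below assumes the Hugging Face `transformers` library and an open-domain checkpoint (`bert-base-uncased`); the exact scoring scheme is illustrative, not the paper's formulation.

```python
# Illustrative MLM score: mask each subword of a term and average the
# log-probabilities of the original tokens under an open-domain PLM.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM


def find_subspan(seq, sub):
    """Return the start index of the first occurrence of `sub` in `seq`."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return None


def mlm_score(sentence, term, model_name="bert-base-uncased"):
    """Average log-probability of `term`'s subword tokens, masked one
    at a time. A low score suggests the term is contextually unexpected
    to the PLM, e.g., a medical jargon term in its clinical sense."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    term_ids = tokenizer(term, add_special_tokens=False)["input_ids"]
    start = find_subspan(input_ids.tolist(), term_ids)
    if start is None:
        raise ValueError("term not found in sentence")

    log_probs = []
    for j, tok_id in enumerate(term_ids):
        masked = input_ids.clone()
        masked[start + j] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, start + j]
        log_probs.append(torch.log_softmax(logits, -1)[tok_id].item())
    return sum(log_probs) / len(log_probs)
```

Under this hypothesis, a homonym such as "shock" in a clinical sentence would receive a lower score from an open-domain PLM than from a clinically pre-trained one, flagging it as a jargon candidate.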
We conducted experiments on four state-of-the-
art PLMs, namely BERT (Devlin et al.,2019),
RoBERTa (Liu et al.,2019), BioClinicalBERT
(Alsentzer et al.,2019b) and BioBERT (Lee et al.,
2020). Experimental results show that when both
of the methods are combined, the medical jargon
extraction performance is improved by 2.44%p in
BERT, 2.42%p in RoBERTa, 1.56%p in BioClini-
calBERT, and 1.19%p in BioBERT.
Our contributions are as follows:
• We propose a novel NLP task for identifying
medical jargon terms potentially difficult for
patients to comprehend from EHR notes.
• We will release MedJ, an expert-curated
18K+ sentence dataset for the MedJEx task.
• We introduce MedJEx, a medical jargon ex-
traction model. MedJEx is first trained on the
auxiliary WikiHyperlink span dataset and then
fine-tuned on the MedJ dataset; it uses an MLM
score feature for homonym resolution.
• The experimental results show that training
on the Wiki’s hyperlink span datasets consis-
tently improved the performance of not only
MedJ but also six out of eight BioNER bench-
marks. In addition, our qualitative analyses
show that the MLM score can complement
the TF score for detecting the outlier jargon
terms.
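The two-stage training scheme in the contributions above can be sketched as follows. This toy example uses plain PyTorch with random stand-in data; the model, dimensions, and datasets are illustrative assumptions, not the paper's architecture.

```python
# Toy sketch of two-stage training: a span tagger is first trained on
# auxiliary hyperlink-span labels, and its weights then initialize the
# jargon-span fine-tuning stage. All data here is a random stand-in.
import torch
import torch.nn as nn

VOCAB, DIM, TAGS = 100, 32, 3  # tag set: B / I / O


class SpanTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, TAGS)

    def forward(self, tokens):
        return self.head(self.embed(tokens))  # (batch, seq, TAGS)


def train(model, tokens, tags, steps=50):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(tokens).view(-1, TAGS), tags.view(-1))
        loss.backward()
        opt.step()
    return loss.item()


torch.manual_seed(0)
# Stage 1: auxiliary Wikipedia hyperlink-span data (random stand-in).
wiki_tokens = torch.randint(0, VOCAB, (8, 16))
wiki_tags = torch.randint(0, TAGS, (8, 16))
stage1 = SpanTagger()
train(stage1, wiki_tokens, wiki_tags)

# Stage 2: initialize from stage 1, then fine-tune on MedJ-style data.
stage2 = SpanTagger()
stage2.load_state_dict(stage1.state_dict())
medj_tokens = torch.randint(0, VOCAB, (8, 16))
medj_tags = torch.randint(0, TAGS, (8, 16))
final_loss = train(stage2, medj_tokens, medj_tags)
```

In practice the tagger would be a pre-trained transformer encoder rather than a bag of embeddings, but the weight hand-off between the auxiliary task and the target task is the same.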
2 Related Work
In principle, MedJEx is related to text simplifica-
tion (Kandula et al.,2010). None of the previ-
ous work (Abrahamsson et al.,2014;Qenam et al.,
2017; Nassar et al., 2019) identified terms that are
important for comprehension.
On the other hand, MedJEx is relevant to
BioNER, a task for identifying biomedical named
entities such as disease, drug, and symptom
from medical text. There are several benchmark
corpora, including i2b2 2010 (Patrick and Li,
2010), ShARe/CLEF 2013 (Zuccon et al.,2013),
and MADE (Jagannatha et al.,2019), all of which
were developed solely based on clinical importance.
In contrast, MedJ is patient-centered, taking
patients' comprehension into consideration.
Identifying biomedical named entities from
medical documents has been
an active area of research. Earlier work, such as
MetaMap (Aronson, 2001), used linguistic patterns,
either manually constructed or learned
semi-automatically, to map free text to external
knowledge resources such as UMLS (Lindberg et al.,
1993). The benchmark corpora have promoted
supervised machine learning approaches, including
conditional random fields and deep learning
(Jagannatha et al., 2019).
Key phrase extraction in the medical domain is
another related task. It identifies important phrases
or clauses that represent topics (Hulth,2003). In
previous studies, key phrases were extracted using
features such as TF, word stickiness, and word cen-
trality (Saputra et al.,2018). Chen and Yu (2017)
proposed an unsupervised learning based method
to elicit important medical terms from EHR notes
using MetaMap (Demner-Fushman et al.,2017)
and various weighting features such as TextRank
(Mihalcea and Tarau,2004) and term familiarity
score (Zeng-Treitler et al.,2007). In another work,
Chen et al. (2017) proposed an adaptive distant su-