
Figure 2: Concept mapping sometimes requires consid-
ering the entire sentence, rather than mentions.
Figure 3: In SapBERT’s latent space, none of the near-
est neighbors of "apyrexial" (i.e. fever-free) happen to
share the word’s meaning. Instead, the alpha-privative
was over-indexed by the model, among other biases.
2.3 Challenges with existing models
BIOCOM and KRISSBERT propose to disam-
biguate mentions of biomedical concepts using con-
textual information. Clinical notes indeed contain
ambiguous notations that require context to disam-
biguate. However, using these contextual models
for inference is only possible after identifying the
text spans that denote such concepts in the input
text. This requires introducing a mention detec-
tion model, which comes with its own challenges
and errors. Worse, reducing mentions to text spans
is not always possible, as concepts are sometimes
alluded to in a diffuse way (see Fig. 2).
However, models which do not use in-context
mentions usually learn representations of lower
quality than in-context models. Because they pair
synonyms that often share a significant word or
token overlap, these models learn early in training
to isolate concepts containing rare words or tokens,
in a way that is rarely semantic (see Fig. 3). Indeed,
the training loss of contrastive models only requires
placing all mentions of a particular concept close
to each other; it does not provide strong guarantees
about the relative location of different but similar
concepts in the latent space.
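This limitation can be sketched with a standard InfoNCE-style in-batch contrastive loss (a minimal illustration; the embeddings, batch layout, and temperature are hypothetical, not those of the cited models):

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """Standard in-batch contrastive loss: the i-th anchor is pulled
    toward the i-th positive; all other positives act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature  # (batch, batch) cosine similarities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Only the diagonal (same-concept) terms are rewarded: nothing
    # constrains where two *related* concepts land relative to each other.
    return -np.mean(np.diag(log_probs))
```

The loss is minimized as soon as each mention pair is more similar than the in-batch negatives, regardless of whether semantically related concepts end up near each other.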
While hierarchical relationships from medical
ontologies have sometimes been used to produce
more meaningful concept embeddings (Zhang
et al., 2021), this is not sufficient to overcome
the issues stated above, because relatedness cannot
always be encoded hierarchically.
3 Pre-training methodology
To produce representations of biomedical concepts
that overcome the limitations described above, we
modified the way the positive pairs are constructed.
Like the prior works cited in §2.2, we start by es-
tablishing a list of names for each UMLS concept.
However, unlike previous works, we do not use
these names directly to form positive pairs. Instead,
we construct pairs consisting of, on one side, a
randomly selected name of a given concept and,
on the other side, a definition or description of that
concept (see Fig. 1).
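Concretely, the pairing can be sketched as follows (the concept entry and field names are illustrative, not the actual UMLS data layout):

```python
import random

# Hypothetical layout: each concept identifier maps to its known names
# and its available definitions/descriptions (field names illustrative).
concepts = {
    "C0015967": {  # a UMLS-style CUI, here standing for "fever"
        "names": ["fever", "pyrexia", "febrile response"],
        "definitions": ["An abnormal elevation of body temperature."],
    },
}

def make_positive_pairs(concepts, rng=random):
    """Pair a randomly chosen name with a definition of the same concept,
    instead of pairing two of its names with each other."""
    pairs = []
    for cui, entry in concepts.items():
        for definition in entry["definitions"]:
            name = rng.choice(entry["names"])
            pairs.append((name, definition))
    return pairs
```

Each resulting pair anchors a (possibly opaque) concept name to a self-explanatory piece of text rather than to another name.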
We hypothesize that a definition or description
of a given concept provides a more robust semantic
anchor for this concept than another of its names.
As mentioned before, names in the medical domain
can be quite opaque, and do not always offer use-
ful insights into what exactly is being referred to.
By inducing representational similarity between a
concept name and its known definitions, we aim
to distill their respective knowledge into the repre-
sentations of the concept names themselves. This
key idea influenced some design choices for our
experimental setup, including the choice of the data
curation process, model initialization, and training
procedure (as described in this section).
3.1 Curating definitions and descriptions
Around 5% of the concepts found in UMLS are
clarified by one or more definitions. These defi-
nitions aim to provide the most relevant pieces of
information about a given concept to the practi-
tioners reading them, and we can therefore include
them directly in our training set (see Fig. 1).
This is however insufficient, since most concepts
have no matching definition in UMLS. Addition-
ally, definitions might not always cover all the rele-
vant aspects of a given concept, and the particular
aspects they cover vary from one concept to an-
other. Consequently, pairing concept names with
their definitions alone cannot be expected to pro-
duce satisfactory results for all UMLS concepts.
We therefore supplement the definitions already
available in UMLS with automatically generated
textual descriptions, based on the structured in-
formation contained in the ontology and its 90M
concept-to-concept relationships.
These concept descriptions are constructed using
the following template: “[more-generic-concept]
which [has-relationship-with] [related-concept]”
(e.g. “drug which may treat headache”).
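The template can be sketched as a small rendering function over (more-generic-concept, relationship, related-concept) triples (a minimal illustration; the function name and inputs are hypothetical):

```python
def describe(generic_concept, relationship, related_concept):
    """Render the description template:
    '[more-generic-concept] which [has-relationship-with] [related-concept]'."""
    return f"{generic_concept} which {relationship} {related_concept}"

# The example from the text:
print(describe("drug", "may treat", "headache"))  # drug which may treat headache
```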