also generalizing across different health systems.
In addition, the exact definition of the pathologies
and their severity can change depending on
the clinical use case. This makes fully supervised
approaches that rely on large labeled datasets ex-
pensive. Having few-shot capabilities allows us to
annotate a handful of cases and rapidly expand the
list of pathologies we can detect and classify. In ad-
dition, we can use our approach to generate pseudo
labels for rare pathologies and enrich our validation
and test sets for annotation by an in-house clinical
team. Lastly, our approach can be extended to support
patient search and to define custom cohorts of
patients.
Our contributions in this work are the following:
(1) We develop a novel loss function that extends
vanilla prototypical networks by introducing a
regularization term that encourages tight clustering
of examples near the class prototypes.
(2) We meta-train our models on a large labeled
dataset of shoulder MRI reports (single domain) and
show good performance on 4 diverse downstream
classification tasks on radiology reports for the
knee, cervical spine, and chest. In addition to our
internal datasets, we show superior performance of
our method on 13 public benchmarks over well-known
methods like Leopard. Our model is very simple to
train, is easier to deploy than gradient-based
methods, and requires just a few additional lines of
code on top of a vanilla prototypical network trainer.
(3) We deploy our system and use the dataset
statistics to inform the detection of
out-of-distribution (OOD) cases.
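To make contribution (1) concrete, the following is a minimal NumPy sketch of a one-episode loss for a vanilla prototypical network with an added tightness penalty. It is an illustration under stated assumptions, not the loss defined in this paper: the squared Euclidean distance, the form of the penalty (mean squared distance of support embeddings to their class prototype), and the names `episode_loss` and `reg_weight` are all hypothetical.

```python
import numpy as np

def episode_loss(support, support_labels, query, query_labels, reg_weight=0.1):
    """Prototypical-network loss plus a hypothetical tightness penalty.

    support: (n_support, dim) embeddings; query: (n_query, dim) embeddings.
    Labels are integer class ids; every query class must appear in support.
    """
    classes = np.unique(support_labels)  # sorted unique class ids
    # Class prototype = mean of the support embeddings of that class.
    prototypes = np.stack(
        [support[support_labels == c].mean(axis=0) for c in classes]
    )

    # Squared Euclidean distance from each query to each prototype.
    dists = ((query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)

    # Numerically stable log-softmax over negative distances.
    logits = -dists
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy on the true class of each query example.
    target_idx = np.searchsorted(classes, query_labels)
    ce = -log_probs[np.arange(len(query)), target_idx].mean()

    # Assumed regularizer: mean squared distance of each support
    # embedding to its own class prototype (smaller = tighter clusters).
    support_idx = np.searchsorted(classes, support_labels)
    tightness = ((support - prototypes[support_idx]) ** 2).sum(axis=-1).mean()

    return ce + reg_weight * tightness
```

Setting `reg_weight=0.0` recovers the plain prototypical loss in this sketch, which shows why such a penalty needs only a few extra lines in a trainer.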
2 Related Work
There are three common approaches to meta-
learning: metric-based, model-based, and
optimization-based. Model-agnostic meta-learning
(MAML) (Finn et al., 2017) is an optimization-based
approach to meta-learning that is agnostic
to the model architecture and task specification.
Over the years, several variants of the method have
shown that it is an ideal candidate for learning
to learn from diverse tasks (Nichol et al., 2018;
Raghu et al., 2019; Bansal et al., 2020b). However,
to solve a new task, MAML-type methods
require training a new classification layer for the
task. In contrast, metric-based approaches, such
as prototypical networks (Vinyals et al., 2016;
Snell et al., 2017), are non-parametric in nature,
can handle a varying number of classes, and thus
can be easily deployed. Given the simplicity
of prototypical networks, much work has been
done to improve them (Allen et al., 2019; Zhang
et al., 2019; Ding et al., 2022; Wang et al., 2021).
Prototypical networks usually construct a class
prototype (mean) from the support vectors to
describe the class and, given a query example,
assign the class whose prototype is closest to
the query vector. Allen et al. (2019) use a
mixture of Gaussians to describe the
class-conditional distribution, and Zhang et al. (2019)
try to model an unknown general class
distribution. Ding et al. (2022)
use spherical Gaussians and a KL-divergence-type
function between the Gaussians to compute the
function d in equation 2. However, the function
used by the above authors is not a true metric, i.e.,
it does not satisfy the triangle inequality. The triangle
inequality is implicitly important because we optimize
this function as a form of distance, so
it makes sense to use a true metric. In this work,
we replace it with the Wasserstein distance, which is
a true metric, and add a regularization term that
encourages the L2 norm of the covariance matrices
to be small, so that the class examples
cluster close to the centroid. One of our main
reasons for working with Gaussians is the closed-form
formula for the Wasserstein distance between them.
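The closed form referred to above can be made concrete. For two Gaussians N(m1, S1) and N(m2, S2), the squared 2-Wasserstein distance is ||m1 - m2||^2 + Tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2}). When both covariances are diagonal, this reduces to the squared Euclidean distance between the means plus the squared Euclidean distance between the standard-deviation vectors. The sketch below illustrates that diagonal case in NumPy. The diagonal-covariance assumption and the function names are ours for illustration, and this is not necessarily the exact form of d used in equation 2.

```python
import numpy as np

def w2_diag_gaussians(mean_a, std_a, mean_b, std_b):
    """2-Wasserstein distance between Gaussians with diagonal covariance.

    Each Gaussian is N(mean, diag(std ** 2)). Inputs broadcast over
    leading axes, so one query can be compared to many classes at once.
    """
    mean_term = np.sum((mean_a - mean_b) ** 2, axis=-1)
    # For diagonal covariances the trace term collapses to the squared
    # difference between the standard-deviation vectors.
    cov_term = np.sum((std_a - std_b) ** 2, axis=-1)
    return np.sqrt(mean_term + cov_term)

def nearest_class(query_mean, query_std, proto_means, proto_stds):
    """Index of the class Gaussian closest to the query Gaussian."""
    dists = w2_diag_gaussians(query_mean, query_std, proto_means, proto_stds)
    return int(np.argmin(dists))
```

In this diagonal case the distance is simply the Euclidean distance between the concatenated (mean, standard deviation) vectors, which is why it satisfies the triangle inequality, unlike a KL-divergence-type function.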
Few-shot learning (FSL) in the medical domain
has mostly focused on computer vision (Singh
et al., 2021). Only a few works have
applied FSL to medical NLP (Ge et al., 2022), and
most of those have focused only on different
tasks on MIMIC-III (Johnson et al., 2016), which
is a single-domain dataset (ICU patients from
one hospital system). To the best of our knowledge,
ours is the first study to successfully apply FSL to
a diverse set of medical datasets (diverse in terms
of tasks and patient populations).
3 Datasets
All our internal datasets are MRI radiology reports
detailing various pathologies in different body
parts. Our models are meta-trained on a dataset of
shoulder pathologies collected from 74
unique, de-identified institutions in the United
States. We choose 60 labels for training and 20
novel labels for validation. The number
of training labels is similar to that of some well-known
image datasets (Lake et al., 2015b; Vinyals et al.,
2016; Wah et al., 2011). This diverse dataset has
a rich label space detailing multiple structures in