
Few-Shot Anaphora Resolution in Scientific Protocols via
Mixtures of In-Context Experts
Nghia T. Le, Fan Bai, Alan Ritter
School of Interactive Computing
Georgia Institute of Technology
{nle18,fan.bai,alan.ritter}@cc.gatech.edu
Abstract
Anaphora resolution is an important task, which traditionally has required costly supervised training datasets for each new language, text genre, and domain. Meanwhile, prompting large language models with a few in-context examples has emerged as a promising approach to reduce labeling costs; however, applying in-context learning to resolve anaphora raises a number of challenges. In this paper, we present MICE (Mixtures of In-Context Experts), which we demonstrate is effective for few-shot anaphora resolution in the domain of scientific protocols (Tamari et al., 2021). Given only a handful of training examples, MICE combines the predictions of hundreds of in-context experts, yielding a 30% increase in F1 score over a competitive prompt retrieval baseline. Furthermore, we show MICE can be used to train compact student models without sacrificing performance. As far as we are aware, this is the first work to present experimental results demonstrating the effectiveness of in-context learning on the task of few-shot anaphora resolution in scientific protocols.[1]
1 Introduction
Prompting large language models (LMs) with in-context demonstrations has enabled surprisingly effective few-shot learning (Brown et al., 2020). However, more complex linguistic annotations over paragraph-length inputs, such as anaphora and coreference, have proven challenging (Yang et al., 2022). Prompting language models with demonstrations of anaphora and their corresponding antecedents requires encoding long sequences of tokens, limiting the number of demonstrations that can be used within a single prompt. Furthermore, the performance of in-context learning has been shown to be sensitive to the choice of demonstrations (Liu et al., 2022b) and their ordering in the prompt (Lu et al., 2022).

[1] Our code and datasets are available at https://github.com/nle18/mice
To address these challenges, we present Mixtures of In-Context Experts (MICE). We demonstrate MICE’s effectiveness on anaphora resolution in chemical synthesis protocols (see examples in Figure 1). Natural language understanding for protocols is an attractive use case for few-shot learning, as experimental procedures contain rich coreference and bridging links. Anaphora in protocols are expressed quite differently from those found in high-resource domains (e.g., newswire), and scientific protocols are not easily amenable to annotation by non-expert crowd workers.
MICE works as follows. Given an anaphor, such as “the mixture”, it uses in-context learning to predict a list of substances contained in the mixture that are referenced earlier in the procedure, for example: “Bromoacetyl bromide”, “compound 54” and “water”. With only a handful of training examples (e.g., 16 or 32), MICE generates an ensemble of up to (k choose d) in-context experts, each of which consists of a prompt containing d demonstrations chosen from the k available training examples. The experts’ predictions are then combined in a mixture model, where mixture weights are computed by comparing embeddings of the input to the demonstrations encoded by each in-context expert.
Although some in-context experts perform better than others, individual prompts act as local experts in different regions of the input space (Jacobs et al., 1991; Jordan and Jacobs, 1994; Shazeer et al., 2017), and no single prompt works better than others on all inputs (see Figure 2). Furthermore, if the same antecedent is predicted by multiple in-context experts, this provides independent evidence for the prediction, increasing the probability the answer is correct (Downey et al., 2005).
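The evidence-pooling idea above, where an antecedent predicted by many experts accumulates support, amounts to a weighted vote over the experts' predicted antecedent lists. The sketch below is an assumed illustration of that idea (the paper's exact combination rule may differ): each antecedent's score is the sum of the mixture weights of the experts that proposed it.

```python
from collections import defaultdict

def combine_predictions(expert_predictions, weights):
    """Pool evidence across experts: each expert contributes its mixture
    weight to every antecedent it predicted, so antecedents proposed by
    many (or highly weighted) experts rank higher."""
    scores = defaultdict(float)
    for preds, w in zip(expert_predictions, weights):
        for antecedent in preds:
            scores[antecedent] += w
    # Return antecedents ranked by accumulated evidence, best first.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

For instance, if two experts with weights 0.5 and 0.3 both predict “water”, its score of 0.8 outranks an antecedent proposed by a single expert.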
In extensive experiments, we show MICE significantly improves the performance of in-context
arXiv:2210.03690v2 [cs.CL] 14 Nov 2022