Few-Shot Anaphora Resolution in Scientific Protocols via
Mixtures of In-Context Experts
Nghia T. Le, Fan Bai, Alan Ritter
School of Interactive Computing
Georgia Institute of Technology
{nle18,fan.bai,alan.ritter}@cc.gatech.edu
Abstract

Anaphora resolution is an important task, which traditionally has required costly supervised training datasets for each new language, text genre, and domain. Meanwhile, prompting large language models with a few in-context examples has emerged as a promising approach to reduce labeling costs; however, there are a number of challenges in applying in-context learning to resolve anaphora. In this paper, we present MICE (Mixtures of In-Context Experts), which we demonstrate is effective for few-shot anaphora resolution in the domain of scientific protocols (Tamari et al., 2021). Given only a handful of training examples, MICE combines the predictions of hundreds of in-context experts, yielding a 30% increase in F1 score over a competitive prompt retrieval baseline. Furthermore, we show MICE can be used to train compact student models without sacrificing performance. As far as we are aware, this is the first work to present experimental results demonstrating the effectiveness of in-context learning on the task of few-shot anaphora resolution in scientific protocols.¹
1 Introduction

Prompting large language models (LMs) with in-context demonstrations has enabled surprisingly effective few-shot learning (Brown et al., 2020). However, more complex linguistic annotations over paragraph-length inputs, such as anaphora and coreference, have proven challenging (Yang et al., 2022). Prompting language models with demonstrations of anaphora and their corresponding antecedents requires encoding long sequences of tokens, limiting the number of demonstrations that can be used within a single prompt. Furthermore, the performance of in-context learning has been shown to be sensitive to the choice of demonstrations (Liu et al., 2022b) and their ordering in the prompt (Lu et al., 2022).

¹ Our code and datasets are available at https://github.com/nle18/mice
To address these challenges, we present Mixtures of In-Context Experts (MICE). We demonstrate MICE's effectiveness on anaphora resolution in chemical synthesis protocols (see examples in Figure 1). Natural language understanding for protocols makes an attractive use case for few-shot learning, as experimental procedures contain rich coreference and bridging links. Anaphora in protocols are expressed quite differently from those found in high-resource domains (e.g., newswire), and scientific protocols are not easily amenable to annotation by non-expert crowd workers.
MICE works as follows. Given an anaphor, such as “the mixture”, it uses in-context learning to predict a list of substances contained in the mixture that are referenced earlier in the procedure, for example: “Bromoacetyl bromide”, “compound 54”, and “water”. With only a handful of training examples (e.g., 16 or 32), MICE generates an ensemble of up to k^d in-context experts, each of which consists of a prompt containing d demonstrations chosen from the k available training examples. The experts' predictions are then combined in a mixture model, where mixture weights are computed by comparing embeddings of the input to the demonstrations encoded by each in-context expert.
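To make the ensemble construction concrete, the following is a minimal sketch that enumerates ordered d-tuples of demonstrations drawn from the k training examples, yielding up to k^d prompts (256 when k = 16 and d = 2, as in Figure 1). The function names and the shortened toy protocols are illustrative assumptions, not taken from the released code; the question template and the “|” answer marker follow the format described in §3.

```python
from itertools import product

# Toy few-shot training set: (document, anaphor, antecedents) triples.
# Real protocols are paragraph-length; these are shortened for illustration.
TRAIN = [
    ("Bromoacetyl bromide was added to compound 54 in water.",
     "the mixture", ["Bromoacetyl bromide", "compound 54", "water"]),
    ("NaH was suspended in THF and the suspension was stirred.",
     "the suspension", ["NaH", "THF"]),
]
d = 2  # demonstrations per prompt

def format_example(doc, anaphor, antecedents=None):
    """Render one example as document + template question (+ answer for demos)."""
    text = f"{doc}\nWhat does {anaphor} contain?"
    if antecedents is not None:
        text += " " + " | ".join(antecedents)  # answers joined by the "|" marker
    return text

def build_prompts(test_doc, test_anaphor):
    """Enumerate ordered d-tuples of demonstrations: up to k**d prompts."""
    prompts = []
    for demos in product(TRAIN, repeat=d):
        parts = [format_example(*demo) for demo in demos]
        parts.append(format_example(test_doc, test_anaphor))  # unanswered test input
        prompts.append("\n\n".join(parts))
    return prompts

# With k = 16 training examples and d = 2, this yields 16**2 = 256 prompts.
```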
Although some in-context experts perform better than others, individual prompts act as local experts in different regions of the input space (Jacobs et al., 1991; Jordan and Jacobs, 1994; Shazeer et al., 2017), and no single prompt works better than others on all inputs (see Figure 2). Furthermore, if the same antecedent is predicted by multiple in-context experts, this provides independent evidence for the prediction, increasing the probability the answer is correct (Downey et al., 2005).
In extensive experiments, we show MICE significantly improves the performance of in-context learning for anaphora resolution in synthetic procedures. For example, given 32 demonstrations, a single prompt achieves an F1 score of 38.6. By combining the predictions of an ensemble of 256 in-context experts, MICE achieves 53.9 F1.

Figure 1: Resolving antecedents of “the mixture” in a chemical synthesis procedure using MICE. Given a small training set of 16 examples and a test input, we construct 256 prompts, each with two in-context demonstrations. The prompts are then fed into a pre-trained language model (e.g., GPT-J) to generate candidate antecedents. The probabilities of each candidate antecedent are computed and combined in a mixture of in-context experts using a similarity-based gating function. MICE then selects the antecedents with the highest probabilities. In the figure, orange, blue, and red denote the anaphor, the true antecedents, and incorrect antecedents, respectively.
While MICE consistently improves in-context anaphora resolution, inference in MICE is costly, due to the large number of prompts involved. To address this limitation, we show that fine-tuning compact BERT-based models on data that is automatically labeled by MICE can yield performance improvements, while also producing models that support efficient inference (Schick and Schütze, 2021a,b; Lang et al., 2022).
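Since MICE's outputs can serve as silver labels, a compact student can be fine-tuned on them. Below is a minimal sketch of one plausible distillation recipe, framing antecedent resolution as binary classification over (document, anaphor, candidate) triples; this framing, the toy silver examples, and the hyperparameters are assumptions for illustration and not necessarily the student architecture used in the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical silver data produced by running MICE over unlabeled protocols:
# (document, anaphor, candidate antecedent, 0/1 label). Shortened toy examples.
silver = [
    ("Bromoacetyl bromide was added to compound 54 in water.",
     "the mixture", "water", 1),
    ("Bromoacetyl bromide was added to compound 54 in water.",
     "the mixture", "acetone", 0),
]

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for doc, anaphor, candidate, label in silver:
    # Pair the protocol text with a question about one candidate antecedent.
    enc = tok(doc, f"Does {anaphor} contain {candidate}?",
              truncation=True, return_tensors="pt")
    loss = model(**enc, labels=torch.tensor([label])).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```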
2 Anaphora Resolution in Scientific Protocols

Split-antecedent anaphors (Vala et al., 2016; Yu et al., 2020; Paun et al., 2022) are plural mentions that refer to two or more antecedents in the previous discourse. For instance, in the following text:

[Alice]antecedent and [Bob]antecedent went to the store. [They]anaphor bought some bread.

the word “[They]” refers to both “[Alice]” and “[Bob]”.
Similar references to multiple antecedents often appear in chemical synthesis protocols, for example, “the mixture”. These references arise naturally as the result of context change accommodation (Webber and Baldwin, 1992), and are crucial for understanding the steps needed to synthesize a molecule (Fang et al., 2021). Resolving anaphoric references in synthetic protocols could be beneficial for automating protocols described in natural language (Sanderson, 2019; Vaucher et al., 2021), in addition to automatically extracting chemical reaction databases from scientific literature (Lawson et al., 2014; Mysore et al., 2019). However, anaphora is costly to annotate (Yuan et al., 2022), and scientific protocols are not easily amenable to annotation by non-expert crowd workers (Kulkarni et al., 2018). This motivates the need for few-shot learning methods that can resolve anaphora in procedural texts without extensive annotated resources.

Figure 2: Heatmap visualizing the performance of 64 prompts on 64 sampled anaphors. Each prompt encodes two in-context demonstrations randomly sampled from 8 training examples. Each square represents the F1 of a single prompt applied to a single anaphor (typically these are associated with multiple antecedents). The prompts and test inputs are sorted from high (top, left) to low (bottom, right) F1. Note that no single prompt performs best on all test inputs. This suggests that it could be beneficial to combine lists of predicted antecedents made independently by many in-context experts.
3 Mixtures of In-Context Experts

While in-context learning has achieved good performance when prompted with a few examples, the performance can vary significantly depending on different prompt design choices (Lu et al., 2022; Liu et al., 2022b). Furthermore, anaphora resolution requires paragraph-length contexts, limiting the number of in-context examples that can be encoded in a single prompt (Figure 3). We address these challenges with MICE. We show MICE is an effective method for few-shot anaphora resolution in §5, and demonstrate that it can be used to automatically label data for fine-tuning more compact models, without sacrificing performance, in §4.1.
Figure 3: Distribution of the maximum number of in-context demonstrations of an anaphor, synthesis protocol, and corresponding antecedents that can be encoded in a single prompt. We compute the maximum number of demonstration tokens by subtracting a fixed budget for generated tokens (256) and the number of tokens in the test input from the maximum sequence length (2048). Demonstrations are randomly sampled until the maximum number of tokens is reached. Given the longer contexts needed to demonstrate anaphora resolution, a prompt can encode at most 8 demonstrations, far fewer than the 32 used in Brown et al. (2020).
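As a rough illustration of the budget computation described in this caption, the sketch below packs randomly sampled demonstrations until the remaining context window is exhausted. Using the GPT-2 BPE tokenizer is an assumption for the sketch (GPT-J shares this vocabulary), and sample_demos is an illustrative name, not from the released code.

```python
import random
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # GPT-J uses the GPT-2 BPE vocabulary

MAX_LEN, GEN_BUDGET = 2048, 256  # context window and reserved generation tokens

def sample_demos(train_examples, test_input):
    """Randomly add demonstrations until the prompt would exceed the window."""
    budget = MAX_LEN - GEN_BUDGET - len(tok(test_input)["input_ids"])
    demos, pool = [], train_examples[:]
    random.shuffle(pool)
    for demo in pool:
        cost = len(tok(demo)["input_ids"])
        if cost > budget:
            break
        demos.append(demo)
        budget -= cost
    return demos  # for paragraph-length protocols, rarely more than 8
```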
In-Context Learning We formulate the task of anaphora resolution in synthetic protocols as follows. The input includes a document D and a query anaphor a. Our goal is to identify a set of antecedents Y = {y_0, y_1, ..., y_m} that correspond to text spans in D. To tackle this problem via in-context learning, we frame it as a SQuAD-style extractive question answering task (Wu et al., 2020). Specifically, as shown in Figure 1, each example (D, a) is formatted as the concatenation of the document D and the template question “What does a contain?”. An autoregressive language model then completes this sequence by generating Y, with the antecedents separated by a special marker “|”. Following the typical approach to in-context learning, the prompt includes a few demonstrations in the prefix and ends with the test input.
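Concretely, a completion such as “Bromoacetyl bromide | compound 54 | water” can be parsed back into antecedent strings by splitting on the marker. A minimal sketch, assuming the answer occupies the first generated line:

```python
def parse_antecedents(generated: str) -> list[str]:
    """Split a generated completion on the '|' marker into antecedent strings."""
    first_line = generated.strip().split("\n")[0]  # keep only the answer line
    return [span.strip() for span in first_line.split("|") if span.strip()]

assert parse_antecedents("Bromoacetyl bromide | compound 54 | water") == \
    ["Bromoacetyl bromide", "compound 54", "water"]
```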
Mixture of Experts For a given test input x = (D, a), we aim to find the antecedents y_i with the highest probabilities P(y_i | x). The notation y_i denotes an antecedent from the union of antecedents generated by all prompts. MICE computes P(y_i | x) using a mixture of experts (Jacobs et al., 1991; Cho et al., 2019), treating the prompt, z, as a latent variable (Guu et al., 2020):

$$P(y_i \mid x) = \sum_{z} P(y_i \mid z, x)\, P(z \mid x) \qquad (1)$$
In Eq. 1, P(z | x) represents the likelihood that prompt z is constructed given x, and P(y_i | z, x) represents the probability that the LM predicts antecedent y_i when prompted with z and x.
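Eq. 1 can be read as a weighted vote: each expert contributes its probability for an antecedent y_i, scaled by its gate weight. A minimal sketch of this combination, assuming per-expert antecedent probabilities have already been extracted as dictionaries (illustrative names, not the released implementation):

```python
from collections import defaultdict

def mixture_scores(expert_probs, gate_weights):
    """Combine per-expert antecedent probabilities P(y|z,x) with gates P(z|x).

    expert_probs: list of dicts mapping antecedent string -> P(y | z, x)
    gate_weights: list of floats P(z | x), summing to 1
    """
    scores = defaultdict(float)
    for probs, weight in zip(expert_probs, gate_weights):
        for antecedent, p in probs.items():
            scores[antecedent] += weight * p  # Eq. 1: sum_z P(y|z,x) P(z|x)
    return dict(scores)
```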
Similarity-based Gating We compute P(z | x) by summing similarity scores s(x, u_1), ..., s(x, u_d) between x and the in-context demonstrations u_1, ..., u_d encoded in z:²

$$P(z \mid x) \propto \exp\left( \sum_{i=1}^{d} s(x, u_i) \right)$$

where s(x, u_i) is the cosine similarity between the embeddings of x and u_i. Details of the similarity measures used in our experiments are presented in §4.1.
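A minimal sketch of this gate, assuming each input and demonstration has already been embedded as a unit-normalized vector (the paper's actual similarity measures are those described in §4.1):

```python
import numpy as np

def gate_weights(x_emb, demo_embs_per_prompt):
    """Compute P(z|x) proportional to exp(sum_i cos(x, u_i)) over all prompts z.

    x_emb: (h,) unit-normalized embedding of the test input
    demo_embs_per_prompt: list of (d, h) arrays of unit-normalized demo embeddings
    """
    # With unit vectors, the dot product equals cosine similarity.
    logits = np.array([embs @ x_emb for embs in demo_embs_per_prompt]).sum(axis=1)
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()  # normalize so the gates sum to 1
```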
Estimating Antecedent Probabilities Computing probabilities P(y_i | z, x) that are comparable across variable-length antecedents is not easy. Longer sequences will naturally have smaller LM probabilities, suggesting the need for length normalization, or averaging per-token probabilities, neither of which we found to work well.³ Therefore, following Zhao et al. (2021), we estimate

² We also experimented with multiplying the similarity scores and observed similar results.

³ Similar to Zhao et al. (2021), we observe that, for a generated antecedent, the first-token probabilities vary the most, while probabilities of subsequent tokens are highly deterministic.