Few-Shot Anaphora Resolution in Scientific Protocols via
Mixtures of In-Context Experts
Nghia T. Le, Fan Bai, Alan Ritter
School of Interactive Computing
Georgia Institute of Technology
{nle18,fan.bai,alan.ritter}@cc.gatech.edu
Abstract

Anaphora resolution is an important task, which traditionally has required costly supervised training datasets for each new language, text genre, and domain. Meanwhile, prompting large language models with a few in-context examples has emerged as a promising approach to reduce labeling costs; however, there are a number of challenges in applying in-context learning to resolve anaphora. In this paper, we present MICE (Mixtures of In-Context Experts), which we demonstrate is effective for few-shot anaphora resolution in the domain of scientific protocols (Tamari et al., 2021). Given only a handful of training examples, MICE combines the predictions of hundreds of in-context experts, yielding a 30% increase in F1 score over a competitive prompt retrieval baseline. Furthermore, we show MICE can be used to train compact student models without sacrificing performance. As far as we are aware, this is the first work to present experimental results demonstrating the effectiveness of in-context learning on the task of few-shot anaphora resolution in scientific protocols.¹
1 Introduction

Prompting large language models (LMs) with in-context demonstrations has enabled surprisingly effective few-shot learning (Brown et al., 2020). However, more complex linguistic annotations over paragraph-length inputs, such as anaphora and coreference, have proven challenging (Yang et al., 2022). Prompting language models with demonstrations of anaphora and their corresponding antecedents requires encoding long sequences of tokens, limiting the number of demonstrations that can be used within a single prompt. Furthermore, the performance of in-context learning has been shown to be sensitive to the choice of demonstrations (Liu et al., 2022b) and their ordering in the prompt (Lu et al., 2022).

¹ Our code and datasets are available at https://github.com/nle18/mice
To address these challenges, we present Mixtures of In-Context Experts (MICE). We demonstrate MICE's effectiveness on anaphora resolution in chemical synthesis protocols (see examples in Figure 1). Natural language understanding for protocols makes an attractive use case for few-shot learning, as experimental procedures contain rich coreference and bridging links. Anaphora in protocols are expressed quite differently from those found in high-resource domains (e.g., newswire), and scientific protocols are not easily amenable to annotation by non-expert crowd workers.
MICE works as follows. Given an anaphor, such as “the mixture”, it uses in-context learning to predict a list of substances contained in the mixture that are referenced earlier in the procedure, for example: “Bromoacetyl bromide”, “compound 54”, and “water”. With only a handful of training examples (e.g., 16 or 32), MICE generates an ensemble of up to k^d in-context experts, each of which consists of a prompt containing d demonstrations chosen from the k available training examples. The experts' predictions are then combined in a mixture model, where mixture weights are computed by comparing embeddings of the input to the demonstrations encoded by each in-context expert.
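To make the ensemble construction concrete, the following is a minimal sketch that enumerates ordered d-tuples of demonstrations drawn from the k training examples, yielding up to k^d prompts (256 when k = 16 and d = 2, as in Figure 1). The function names and the shortened toy protocols are illustrative assumptions, not taken from the released code; the question template and the “|” answer marker follow the format described in §3.

```python
from itertools import product

# Toy few-shot training set: (document, anaphor, antecedents) triples.
# Real protocols are paragraph-length; these are shortened for illustration.
TRAIN = [
    ("Bromoacetyl bromide was added to compound 54 in water.",
     "the mixture", ["Bromoacetyl bromide", "compound 54", "water"]),
    ("NaH was suspended in THF and the suspension was stirred.",
     "the suspension", ["NaH", "THF"]),
]
d = 2  # demonstrations per prompt

def format_example(doc, anaphor, antecedents=None):
    """Render one example as document + template question (+ answer for demos)."""
    text = f"{doc}\nWhat does {anaphor} contain?"
    if antecedents is not None:
        text += " " + " | ".join(antecedents)  # answers joined by the "|" marker
    return text

def build_prompts(test_doc, test_anaphor):
    """Enumerate ordered d-tuples of demonstrations: up to k**d prompts."""
    prompts = []
    for demos in product(TRAIN, repeat=d):
        parts = [format_example(*demo) for demo in demos]
        parts.append(format_example(test_doc, test_anaphor))  # unanswered test input
        prompts.append("\n\n".join(parts))
    return prompts

# With k = 16 training examples and d = 2, this yields 16**2 = 256 prompts.
```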
Although some in-context experts perform better than others, individual prompts act as local experts in different regions of the input space (Jacobs et al., 1991; Jordan and Jacobs, 1994; Shazeer et al., 2017), and no single prompt works better than others on all inputs (see Figure 2). Furthermore, if the same antecedent is predicted by multiple in-context experts, this provides independent evidence for the prediction, increasing the probability the answer is correct (Downey et al., 2005).
In extensive experiments, we show MICE significantly improves the performance of in-context learning for anaphora resolution in synthetic procedures. For example, given 32 demonstrations, a single prompt achieves an F1 score of 38.6. By combining the predictions of an ensemble of 256 in-context experts, MICE achieves 53.9 F1.

Figure 1: Resolving antecedents of “the mixture” in a chemical synthesis procedure using MICE. Given a small training set of 16 examples and a test input, we construct 256 prompts, each with two in-context demonstrations. The prompts are then fed into a pre-trained language model (e.g., GPT-J) to generate candidate antecedents. The probabilities of each candidate antecedent are computed and combined in a mixture of in-context experts using a similarity-based gating function. MICE then selects the antecedents with the highest probabilities. In the figure, orange, blue, and red denote the anaphor, the true antecedents, and incorrect antecedents, respectively.
While MICE consistently improves in-context anaphora resolution, inference in MICE is costly, due to the large number of prompts involved. To address this limitation, we show that fine-tuning compact BERT-based models on data that is automatically labeled by MICE can yield performance improvements, while also producing models that support efficient inference (Schick and Schütze, 2021a,b; Lang et al., 2022).
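Since MICE's outputs can serve as silver labels, a compact student can be fine-tuned on them. Below is a minimal sketch of one plausible distillation recipe, framing antecedent resolution as binary classification over (document, anaphor, candidate) triples; this framing, the toy silver examples, and the hyperparameters are assumptions for illustration and not necessarily the student architecture used in the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical silver data produced by running MICE over unlabeled protocols:
# (document, anaphor, candidate antecedent, 0/1 label). Shortened toy examples.
silver = [
    ("Bromoacetyl bromide was added to compound 54 in water.",
     "the mixture", "water", 1),
    ("Bromoacetyl bromide was added to compound 54 in water.",
     "the mixture", "acetone", 0),
]

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for doc, anaphor, candidate, label in silver:
    # Pair the protocol text with a question about one candidate antecedent.
    enc = tok(doc, f"Does {anaphor} contain {candidate}?",
              truncation=True, return_tensors="pt")
    loss = model(**enc, labels=torch.tensor([label])).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```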
2 Anaphora Resolution in Scientific Protocols

Split-antecedent anaphors (Vala et al., 2016; Yu et al., 2020; Paun et al., 2022) are plural mentions that refer to two or more antecedents in the previous discourse. For instance, in the following text:

[Alice]antecedent and [Bob]antecedent went to the store. [They]anaphor bought some bread.

the word “[They]” refers to both “[Alice]” and “[Bob]”.
Similar references to multiple antecedents often appear in chemical synthesis protocols, for example, “the mixture”. These references arise naturally as the result of context change accommodation (Webber and Baldwin, 1992), and are crucial for understanding the steps needed to synthesize a molecule (Fang et al., 2021). Resolving anaphoric references in synthetic protocols could be beneficial for automating protocols described in natural language (Sanderson, 2019; Vaucher et al., 2021), in addition to automatically extracting chemical reaction databases from scientific literature (Lawson et al., 2014; Mysore et al., 2019). However, anaphora is costly to annotate (Yuan et al., 2022), and scientific protocols are not easily amenable to annotation by non-expert crowd workers (Kulkarni et al., 2018). This motivates the need for few-shot learning methods that can resolve anaphora in procedural texts without extensive annotated resources.

Figure 2: Heatmap visualizing the performance of 64 prompts on 64 sampled anaphors. Each prompt encodes two in-context demonstrations randomly sampled from 8 training examples. Each square represents the F1 of a single prompt applied to a single anaphor (typically these are associated with multiple antecedents). The prompts and test inputs are sorted from high (top, left) to low (bottom, right) F1. Note that no single prompt performs best on all test inputs. This suggests that it could be beneficial to combine lists of predicted antecedents made independently by many in-context experts.
3 Mixtures of In-Context Experts

While in-context learning has achieved good performance when prompted with a few examples, the performance can vary significantly depending on different prompt design choices (Lu et al., 2022; Liu et al., 2022b). Furthermore, anaphora resolution requires paragraph-length contexts, limiting the number of in-context examples that can be encoded in a single prompt (Figure 3). We address these challenges with MICE. We show MICE is an effective method for few-shot anaphora resolution in §5, and demonstrate that it can be used to automatically label data for fine-tuning more compact models, without sacrificing performance, in §4.1.
Figure 3: Distribution of the maximum number of in-context demonstrations of an anaphor, synthesis protocol, and corresponding antecedents that can be encoded in a single prompt. We compute the maximum number of demonstration tokens by subtracting a fixed budget for generated tokens (256) and the number of tokens in the test input from the maximum sequence length (2048). Demonstrations are randomly sampled until the maximum number of tokens is reached. Given the longer contexts needed to demonstrate anaphora resolution, a prompt can encode at most 8 demonstrations, far fewer than the 32 used in Brown et al. (2020).
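As a rough illustration of the budget computation described in this caption, the sketch below packs randomly sampled demonstrations until the remaining context window is exhausted. Using the GPT-2 BPE tokenizer is an assumption for the sketch (GPT-J shares this vocabulary), and sample_demos is an illustrative name, not from the released code.

```python
import random
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # GPT-J uses the GPT-2 BPE vocabulary

MAX_LEN, GEN_BUDGET = 2048, 256  # context window and reserved generation tokens

def sample_demos(train_examples, test_input):
    """Randomly add demonstrations until the prompt would exceed the window."""
    budget = MAX_LEN - GEN_BUDGET - len(tok(test_input)["input_ids"])
    demos, pool = [], train_examples[:]
    random.shuffle(pool)
    for demo in pool:
        cost = len(tok(demo)["input_ids"])
        if cost > budget:
            break
        demos.append(demo)
        budget -= cost
    return demos  # for paragraph-length protocols, rarely more than 8
```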
In-Context Learning We formulate the task of anaphora resolution in synthetic protocols as follows. The input includes a document D and a query anaphor a. Our goal is to identify a set of antecedents Y = {y_0, y_1, ..., y_m} that correspond to text spans in D. To tackle this problem via in-context learning, we frame it as a SQuAD-style extractive question answering task (Wu et al., 2020). Specifically, as shown in Figure 1, each example (D, a) is formatted as the concatenation of the document D and the template question “What does a contain?”. An autoregressive language model then completes this sequence by generating Y, with the antecedents separated by a special marker “|”. Following the typical approach to in-context learning, the prompt includes a few demonstrations in the prefix and ends with the test input.
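Concretely, a completion such as “Bromoacetyl bromide | compound 54 | water” can be parsed back into antecedent strings by splitting on the marker. A minimal sketch, assuming the answer occupies the first generated line:

```python
def parse_antecedents(generated: str) -> list[str]:
    """Split a generated completion on the '|' marker into antecedent strings."""
    first_line = generated.strip().split("\n")[0]  # keep only the answer line
    return [span.strip() for span in first_line.split("|") if span.strip()]

assert parse_antecedents("Bromoacetyl bromide | compound 54 | water") == \
    ["Bromoacetyl bromide", "compound 54", "water"]
```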
Mixture of Experts For a given test input x = (D, a), we aim to find the antecedents y_i with the highest probabilities P(y_i | x). The notation y_i denotes an antecedent from the union of antecedents generated by all prompts. MICE computes P(y_i | x) using a mixture of experts (Jacobs et al., 1991; Cho et al., 2019), treating the prompt, z, as a latent variable (Guu et al., 2020):

$$P(y_i \mid x) = \sum_{z} P(y_i \mid z, x)\, P(z \mid x) \qquad (1)$$
In Eq. 1, P(z | x) represents the likelihood that prompt z is constructed given x, and P(y_i | z, x) represents the probability that the LM predicts antecedent y_i when prompted with z and x.
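Eq. 1 can be read as a weighted vote: each expert contributes its probability for an antecedent y_i, scaled by its gate weight. A minimal sketch of this combination, assuming per-expert antecedent probabilities have already been extracted as dictionaries (illustrative names, not the released implementation):

```python
from collections import defaultdict

def mixture_scores(expert_probs, gate_weights):
    """Combine per-expert antecedent probabilities P(y|z,x) with gates P(z|x).

    expert_probs: list of dicts mapping antecedent string -> P(y | z, x)
    gate_weights: list of floats P(z | x), summing to 1
    """
    scores = defaultdict(float)
    for probs, weight in zip(expert_probs, gate_weights):
        for antecedent, p in probs.items():
            scores[antecedent] += weight * p  # Eq. 1: sum_z P(y|z,x) P(z|x)
    return dict(scores)
```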
Similarity-based Gating We compute P(z | x) by summing similarity scores s(x, u_1), ..., s(x, u_d) between x and the in-context demonstrations u_1, ..., u_d encoded in z:²

$$P(z \mid x) \propto \exp\left( \sum_{i=1}^{d} s(x, u_i) \right)$$

where s(x, u_i) is the cosine similarity between the embeddings of x and u_i. Details of the similarity measures used in our experiments are presented in §4.1.
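A minimal sketch of this gate, assuming each input and demonstration has already been embedded as a unit-normalized vector (the paper's actual similarity measures are those described in §4.1):

```python
import numpy as np

def gate_weights(x_emb, demo_embs_per_prompt):
    """Compute P(z|x) proportional to exp(sum_i cos(x, u_i)) over all prompts z.

    x_emb: (h,) unit-normalized embedding of the test input
    demo_embs_per_prompt: list of (d, h) arrays of unit-normalized demo embeddings
    """
    # With unit vectors, the dot product equals cosine similarity.
    logits = np.array([embs @ x_emb for embs in demo_embs_per_prompt]).sum(axis=1)
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()  # normalize so the gates sum to 1
```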
Estimating Antecedent Probabilities Computing probabilities P(y_i | z, x) that are comparable across variable-length antecedents is not easy. Longer sequences will naturally have smaller LM probabilities, suggesting the need for length normalization, or averaging per-token probabilities, neither of which we found to work well.³ Therefore, following Zhao et al. (2021), we estimate

² We also experimented with multiplying the similarity scores and observed similar results.

³ Similar to Zhao et al. (2021), we observe that, for a generated antecedent, the first-token probabilities vary the most, while probabilities of subsequent tokens are highly deterministic.