
She [Venus] is often described as looking at herself on the mirror, although this is physically impossible since viewers can see her [Venus] face reflected in their direction. This phenomenon [Venus gazing at herself on the mirror] is known as the Venus effect.
...
Nudes were extremely rare in seventeenth-century Spanish art, which was policed actively by members of the Spanish Inquisition. Despite this [the fact that nudes were extremely rare in seventeenth-century Spanish art, which was policed actively by members of the Spanish Inquisition], nudes by foreign artists were keenly collected by the court circle, and this painting [The Rokeby Venus] was hung in the houses of Spanish courtiers until 1813, when it was brought to England to hang in Rokeby Park, Yorkshire.
...
The painting [The Rokeby Venus] is believed to have been executed during one of Velázquez's [the artist] visits to Rome, and Prater has observed that in Rome the artist [Velázquez] "did indeed lead a life of considerable personal liberty..."

Figure 3: Example of output from the decontextualization prompt, applied to the Wikipedia page https://en.wikipedia.org/wiki/Rokeby_Venus
we did not consider many alternative prompts. Decontextualization was performed autoregressively, rewriting each sentence using the previous k decontextualized sentences as context.
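As a rough illustration, the following sketch implements this autoregressive loop; the prompt wording, the default window size k = 5, and the generate callable (standing in for a call to the prompted teacher model) are placeholders rather than the exact prompt described above.

    from typing import Callable, List

    def decontextualize(sentences: List[str],
                        generate: Callable[[str], str],
                        k: int = 5) -> List[str]:
        """Rewrite each sentence so it stands alone, conditioning on the
        previous k already-decontextualized sentences."""
        rewritten: List[str] = []
        for sentence in sentences:
            context = " ".join(rewritten[-k:])  # previous k rewritten sentences
            prompt = (
                "Add bracketed markup so the sentence can be understood "
                "without the surrounding document.\n"
                f"Context: {context}\n"
                f"Sentence: {sentence}\n"
                "Decontextualized:"
            )
            rewritten.append(generate(prompt).strip())
        return rewritten

Because each rewrite conditions only on the most recent rewritten sentences, information that has dropped out of the window can no longer be recovered, which is the failure mode discussed next.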
The capabilities and limitations of this approach are highlighted in Figure 3, which shows some typical outputs. The markup resolves the pronominal references she and her and the nominal references this painting and this phenomenon. Perhaps most impressively, the elliptical expression despite this is decontextualized with the markup [the fact that nudes were extremely rare...]. However, by the end of the document, we have lost track of the first name of the artist, so that the artist is decontextualized as only [Velázquez], rather than with the full name. Future work may address this issue by exploring more sophisticated strategies than simple autoregressive decontextualization.
Chain-of-thought question answering. In chain-of-thought prompting, the language model is asked to first generate a rationale before producing an answer (Wei et al., 2022). For open-book question answering, we take the rationale to be a sentence that is extracted from the passage and which contains the answer, as shown in Figure 5. We construct question-specific few-shot prompts by concatenating several exemplars in which a question, passage, rationale, and answer are shown, before providing the question and passage for the instance to be predicted. The exemplars are drawn from the training set, selecting questions with the highest BM25 similarity to the target question (Robertson et al., 2009). Exemplars are added until we reach a limit of 1024 sentencepiece tokens in the prompt (Kudo and Richardson, 2018); for the QuoRef dataset, this amounts to two or three exemplars in most cases.
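The prompt-construction step can be sketched as follows. The exemplar template, the count_tokens callable (standing in for the SentencePiece tokenizer), and the use of the rank_bm25 package for BM25 scoring are illustrative assumptions, not the exact implementation.

    from typing import Callable, Dict, List
    from rank_bm25 import BM25Okapi

    def build_cot_prompt(question: str,
                         passage: str,
                         train_set: List[Dict[str, str]],  # question/passage/rationale/answer
                         count_tokens: Callable[[str], int],
                         budget: int = 1024) -> str:
        # Rank training examples by BM25 similarity of their questions
        # to the target question.
        bm25 = BM25Okapi([ex["question"].lower().split() for ex in train_set])
        scores = bm25.get_scores(question.lower().split())
        order = sorted(range(len(train_set)), key=lambda i: -scores[i])

        target = f"Question: {question}\nPassage: {passage}\nRationale:"
        exemplars = ""
        for i in order:
            ex = train_set[i]
            block = (f"Question: {ex['question']}\nPassage: {ex['passage']}\n"
                     f"Rationale: {ex['rationale']}\nAnswer: {ex['answer']}\n\n")
            # Stop once adding another exemplar would exceed the token budget.
            if count_tokens(exemplars + block + target) > budget:
                break
            exemplars += block
        return exemplars + target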
To generate the rationales in the exemplars, we enumerate all sentences in the passage that contain an exact match to the answer and select the one with the highest BM25 similarity to the exemplar's question. Each sentence is considered in both its original surface form and with decontextualizing markup. If no sentence contains an exact match to the answer, then the question is not included as an exemplar. However, prompts are constructed for all training set examples, even when no rationale can be extracted using this heuristic.
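A minimal sketch of this rationale-selection heuristic is given below, assuming the passage has already been split into candidate sentences (with both the surface and decontextualized forms included in the list); the sentence splitting and whitespace tokenization are simplifications.

    from typing import List, Optional
    from rank_bm25 import BM25Okapi

    def select_rationale(question: str,
                         passage_sentences: List[str],
                         answer: str) -> Optional[str]:
        # Candidate rationales are sentences containing the answer verbatim.
        candidates = [s for s in passage_sentences if answer in s]
        if not candidates:
            return None  # no rationale: the example is not used as an exemplar
        # Pick the candidate most similar to the question under BM25.
        bm25 = BM25Okapi([s.lower().split() for s in candidates])
        scores = bm25.get_scores(question.lower().split())
        best = max(range(len(candidates)), key=lambda i: scores[i])
        return candidates[best]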
Rationale validation. Finally, to validate the ra-
tionales that were generated in the chain-of-thought
stage, we perform a final validation stage in which
the teacher model must answer questions based
only on the generated rationales. As in the previ-
ous stage, we include each training set example and
construct in-prompt exemplars by BM25 similar-
ity to other questions in the training set. Because
this stage does not include full passages, we can
fit many more exemplars while remaining under
the budget of 1024 tokens, on the order of 20 per
prompt. The resulting “faithful answers” are then
used to filter the fine-tuning data that is exposed to
the student model.
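As an illustrative sketch, the filtering step might look like the following; the string-match agreement criterion and the answer_from_rationale callable (the prompted teacher model restricted to the rationale alone) are assumptions made here for concreteness.

    from typing import Callable, Dict, List

    def filter_by_faithful_answer(
            examples: List[Dict[str, str]],  # question, rationale, answer
            answer_from_rationale: Callable[[str, str], str]) -> List[Dict[str, str]]:
        kept = []
        for ex in examples:
            # The teacher sees only the rationale, not the full passage.
            faithful = answer_from_rationale(ex["question"], ex["rationale"])
            # Keep the example only if the rationale alone supports the answer.
            if faithful.strip().lower() == ex["answer"].strip().lower():
                kept.append(ex)
        return kept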
3 Training the Student Model
The prompt chain described in Section 2 produces
markup-and-mask rationales and uses them to an-