Honest Students from Untrusted Teachers: Learning an Interpretable
Question-Answering Pipeline from a Pretrained Language Model
Jacob Eisenstein Daniel Andor Bernd Bohnet Michael Collins David Mimno
Google Research
Abstract
We propose a new style of rationale for open-
book question answering, called markup-and-
mask, which combines aspects of extractive and
free-text explanations. In the markup phase, the
passage is augmented with free-text markup
that enables each sentence to stand on its own
outside the discourse context. In the masking
phase, a sub-span of the marked-up passage is
selected. To train a system to produce markup-
and-mask rationales without annotations, we
leverage in-context learning. Specifically, we
generate silver annotated data by sending a se-
ries of prompts to a frozen pretrained language
model, which acts as a teacher. We then fine-
tune a smaller student model by training on the
subset of rationales that led to correct answers.
The student is “honest” in the sense that it is
a pipeline: the rationale acts as a bottleneck
between the passage and the answer, while the
“untrusted” teacher operates under no such con-
straints. Thus, we offer a new way to build trust-
worthy pipeline systems from a combination
of end-task annotations and frozen pretrained
language models.
1 Introduction
To be trustworthy and useful, a question answer-
ing system should be able to explain its reasoning
and offer evidence. In open-book question answer-
ing, such explanations often take the form of ra-
tionale masks, which are subsets of tokens from
the original passage (Lei et al., 2016). However,
a challenge for mask-based rationales is that sub-
spans of the original passage are not meant to be
read alone: coherent texts contain anaphora, ellip-
sis, and other cohesion-building elements that limit
the interpretability of individual subspans when
extracted from the discourse (Halliday and Hasan,
1976). An example is shown in Figure 1, in which
the key sentence mentions the answer only through
the nominal the grieving goddess. A sufficient rationale for this answer would have to include an additional sentence introducing the entity Astarte and binding it to the nominal in the sentence that describes the key event.

Question: What is the name of the person who revived Eshmun?
Passage: ... Eshmun, a young man from Beirut, was hunting in the woods when Astarte saw him [Eshmun] and was stricken by his [Eshmun] beauty. ... The grieving goddess [Astarte] revived Eshmun and transported him [Eshmun] to the heavens where she [Astarte] made him [Eshmun] into a god of heaven. ...
Answer: Astarte.

Figure 1: An example from QuoRef (Dasigi et al., 2019) with the generated rationale shown in dark text. The markup, shown in square brackets, makes it possible to find a more concise rationale than could be extracted from the original passage.
Despite their limitations, extractive rationales
have an important advantage over free-text expla-
nations: they are directly linked to the original
passage, making it easy for human readers to as-
sess the reliability of the evidence for themselves.
In this paper, we present a new style of explanation,
called markup-and-mask, which preserves the at-
tributability of extractive rationales while overcom-
ing the problems created by extracting propositions
from the discourse in which they were written. The
key idea is that discourse context is made explicit in
free-text markup and then rationales are extracted
from the marked-up passages.
Rather than annotating markup-and-mask ratio-
nales manually, we present a new training method
that leverages the in-context learning capability of
large pretrained language models (Figure 2). First,
we prompt a frozen language model to produce
markup that sequentially decontextualizes each sen-
tence in each passage in the training set. Next,
we prompt the same language model to produce answers and chain-of-thought rationales from the decontextualized passage.

Figure 2: Schematic of the prompt chain used to produce silver data to fine-tune the honest student. At the decontextualization stage, one prompt is applied per sentence in the passage in sequence; the remaining stages use exactly one prompt each.

Finally, we check that
the rationale supports the answer by prompting
the language model again, this time replacing the
full passage with the rationale. When the answer
approximately matches the ground truth, we add
the rationale and markup to a silver training set.
These silver annotations are used to train an “hon-
est student” that is constrained to follow a pipeline:
first generate question-neutral markup, then select
a question-based rationale, and finally produce an
answer using the rationale and not the passage.
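As a concrete illustration of the pipeline constraint, the following minimal sketch shows how an honest student could be run at inference time. The function name, the prompt strings, and the `student_model.generate` interface are assumptions made for this sketch rather than the paper's implementation; the point it illustrates is that the answering stage sees only the question and the rationale, never the original passage.

```python
def run_honest_student(student_model, passage: str, question: str) -> dict:
    """Illustrative three-stage pipeline; the rationale is the only path
    from the passage to the answer (the 'bottleneck')."""
    # Stage 1: question-neutral markup (decontextualization).
    marked_passage = student_model.generate(
        f"Passage: {passage}\nRewrite with markup:")

    # Stage 2: question-based rationale, a mask over the marked-up passage.
    rationale = student_model.generate(
        f"Question: {question}\nPassage: {marked_passage}\nRationale:")

    # Stage 3: answer from the rationale alone; the passage is not visible here.
    answer = student_model.generate(
        f"Question: {question}\nRationale: {rationale}\nAnswer:")

    return {"markup": marked_passage, "rationale": rationale, "answer": answer}
```

Because each stage conditions only on the output of the previous stage, the rationale fully mediates the passage's influence on the answer.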
Evaluation shows a number of favorable properties of this approach. Specifically, the honest student is able to generate rationales that: (1) support
accurate question answering; (2) help human raters
quickly and accurately judge whether the question-
answering system is correct; (3) quantify predictive
uncertainty; (4) are more likely to entail the pre-
dicted answers than non-pipeline rationales such
as “chain-of-thought”; and (5) accurately match
human-written decontextualizations. Evaluation
also reveals that the student models outperform
their teacher on all three of our key metrics — over-
all accuracy, entailment rate of rationales, and accu-
racy of decontextualizing markup — highlighting
the positive impact of distillation from pretrained
language models. To summarize the contributions
of this paper:
• We propose markup-and-mask rationales for open-book question answering, which preserve a direct link to the original evidence text but use markup to incorporate non-local information.
• We show that it is possible to train models to produce markup-and-mask rationales without explicit supervision, by leveraging the capabilities of a pretrained language model.
• We present a general strategy for using pretrained language models to help supervise interpretable pipeline systems in which annotations are available for only the end task.
• We empirically validate the proposed approach, showing that the resulting rationales are accurate, consistent, and useful.
2 Generating Markup-and-Mask
Annotations
Our goal is to fine-tune a student model to produce
markup-and-mask rationales. Lacking labeled ex-
amples, we obtain silver annotations by applying
three distinct prompting patterns to the pretrained
language model PaLM (Chowdhery et al., 2022), in its 540-billion-parameter version, which we refer to as the teacher model. Each prompt combines
passages and questions from open-book question
answering datasets, along with the outputs of pre-
vious prompts, in an approach that has been called
prompt chaining (Wu et al., 2022). There are three
steps to the silver annotation process: (1) decontex-
tualization; (2) chain-of-thought question answer-
ing; (3) rationale validation. The prompt chain is
shown in Figure 2.
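Read as pseudocode, the prompt chain is a filtering loop over the training set. The sketch below is an assumption-laden paraphrase of Figure 2: `decontextualize`, `cot_answer`, `answer_from_rationale`, and `approx_match` are hypothetical callables standing in for the three prompting stages and the answer-matching check described in this section.

```python
def build_silver_data(train_set, decontextualize, cot_answer,
                      answer_from_rationale, approx_match):
    """Hypothetical sketch of the prompt chain in Figure 2. Each stage argument
    is a callable wrapping one prompting pattern sent to the frozen teacher."""
    silver = []
    for passage, question, gold_answer in train_set:
        # (1) Question-neutral markup, generated one sentence at a time.
        marked = decontextualize(passage)
        # (2) Chain-of-thought QA over the marked-up passage.
        rationale, answer = cot_answer(question, marked)
        # (3) Validation: answer again, seeing only the rationale.
        faithful_answer = answer_from_rationale(question, rationale)
        # Keep the example only if the rationale alone supports the gold answer.
        if approx_match(faithful_answer, gold_answer):
            silver.append({"passage": passage, "question": question,
                           "markup": marked, "rationale": rationale,
                           "answer": gold_answer})
    return silver
```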
Decontextualization. The goal of the decon-
textualization step is to add free-text markup
of the style shown in Figure 1. Decontextualization examples are linearized as "Context: ... Passage: ... Rewrite:", with the language model prompted to complete the rewrite.
An example is shown in Figure 4. We use a hand-
crafted prompt with five exemplars, which were
written to include a few types of decontextual-
ization, including references to people, locations,
times, and events, as well as cases in which the
decontextualizing information was not present in
the context. Because this stage was relatively expensive (the teacher model must be queried for every sentence in the dataset) and because results
were promising from the first exploratory prompts,
we did not consider many alternative prompts. Decontextualization was performed autoregressively, rewriting each sentence using the previous k decontextualized sentences as context.

She [Venus] is often described as looking at herself on the mirror, although this is physically impossible since viewers can see her [Venus] face reflected in their direction. This phenomenon [Venus gazing at herself on the mirror] is known as the Venus effect.
...
Nudes were extremely rare in seventeenth-century Spanish art, which was policed actively by members of the Spanish Inquisition. Despite this [the fact that nudes were extremely rare in seventeenth-century Spanish art, which was policed actively by members of the Spanish Inquisition], nudes by foreign artists were keenly collected by the court circle, and this painting [The Rokeby Venus] was hung in the houses of Spanish courtiers until 1813, when it was brought to England to hang in Rokeby Park, Yorkshire.
...
The painting [The Rokeby Venus] is believed to have been executed during one of Velázquez's [the artist] visits to Rome, and Prater has observed that in Rome the artist [Velázquez] "did indeed lead a life of considerable personal liberty..."

Figure 3: Example of output from the decontextualization prompt, applied to the Wikipedia page https://en.wikipedia.org/wiki/Rokeby_Venus
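A minimal sketch of this autoregressive procedure is shown below, assuming a `teacher_generate` callable that wraps one query to the frozen teacher; the crude sentence splitter, the placeholder exemplar prefix, and the default value of k are illustrative choices, not details taken from the paper.

```python
import re

# Stand-in for the five handcrafted decontextualization exemplars (not reproduced here).
FEW_SHOT_PREFIX = "..."

def split_sentences(text: str) -> list[str]:
    # Crude sentence splitter, for illustration only.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def decontextualize_passage(teacher_generate, passage: str, k: int = 3) -> list[str]:
    """Rewrite each sentence with bracketed markup, conditioning on the previous k
    already-rewritten sentences (the value of k is not specified in this section)."""
    rewritten = []
    for sentence in split_sentences(passage):
        context = " ".join(rewritten[-k:])  # previously rewritten sentences as context
        prompt = (f"{FEW_SHOT_PREFIX}\n"
                  f"Context: {context}\n"
                  f"Passage: {sentence}\n"
                  f"Rewrite:")
        rewritten.append(teacher_generate(prompt).strip())
    return rewritten
```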
The capabilities and limitations of this approach
are highlighted in Figure 3, which shows some
typical outputs. The markup resolves pronominal
references she and her and the nominal references
this painting and this phenomenon. Perhaps most
impressively, the elliptical expression despite this
is decontextualized with the markup [the fact that
nudes were extremely rare. . . ]. However, by the
end of the document, we have lost track of the first
name of the artist, so that the artist is decontex-
tualized as only [Velázquez], rather than with the
full name. Future work may address this issue by
exploring more sophisticated strategies than simple
autoregressive decontextualization.
Chain-of-thought question answering. In
chain-of-thought prompting, the language model is
asked to first generate a rationale before producing
an answer (Wei et al., 2022). For open-book
question answering, we take the rationale to be a
sentence that is extracted from the passage and
which contains the answer, as shown in Figure 5.
We construct question-specific few-shot prompts
by concatenating several exemplars in which a
question, passage, rationale, and answer are shown,
before providing the question and passage for
the instance to be predicted. The exemplars are
drawn from the training set, selecting questions
with the highest BM25 similarity to the target
question (Robertson et al., 2009). Exemplars are added until we reach a limit of 1024 sentencepiece tokens in the prompt (Kudo and Richardson, 2018);
for the QuoRef dataset, this amounts to two or
three exemplars in most cases.
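The exemplar-selection step might be sketched as follows; the `rank_bm25` package and the `count_tokens` callable (standing in for a sentencepiece tokenizer) are assumptions, since the paper specifies only BM25 ranking over training questions and a 1024-token budget.

```python
from rank_bm25 import BM25Okapi  # assumed off-the-shelf BM25 implementation

def build_cot_prompt(target, exemplar_pool, count_tokens, budget=1024):
    """Rank exemplars by BM25 similarity of their questions to the target question,
    then append them until the token budget would be exceeded. (For simplicity the
    budget here covers only the exemplar blocks, not the full prompt.)"""
    bm25 = BM25Okapi([ex["question"].lower().split() for ex in exemplar_pool])
    scores = bm25.get_scores(target["question"].lower().split())
    ranked = sorted(range(len(exemplar_pool)), key=lambda i: -scores[i])

    parts, used = [], 0
    for i in ranked:
        ex = exemplar_pool[i]
        block = (f"Question: {ex['question']}\nPassage: {ex['passage']}\n"
                 f"Rationale: {ex['rationale']}\nAnswer: {ex['answer']}\n\n")
        cost = count_tokens(block)
        if used + cost > budget:
            break
        parts.append(block)
        used += cost

    # The instance to be predicted comes last; the model completes the rationale.
    parts.append(f"Question: {target['question']}\n"
                 f"Passage: {target['passage']}\nRationale:")
    return "".join(parts)
```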
To generate the rationales in the exemplars, we
enumerate all sentences in the passage that contain
an exact match to the answer and select the one
with the highest BM25 similarity to the exemplar’s
question. Each sentence is considered in both its
original surface form and with decontextualizing
markup. If no sentence contains an exact match to
the answer, then the question is not included as an
exemplar. However, prompts are constructed for
all training set examples, even when no rationale
can be extracted using this heuristic.
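A sketch of this heuristic, under the same assumptions as above (an off-the-shelf BM25 scorer, verbatim string containment for the exact-match test):

```python
from rank_bm25 import BM25Okapi  # assumed off-the-shelf BM25 implementation

def pick_exemplar_rationale(question, sentences, marked_sentences, answer):
    """Candidate rationales are sentences containing the answer verbatim, in either
    their original or their marked-up form; the most question-similar one (BM25) wins."""
    candidates = [s for pair in zip(sentences, marked_sentences) for s in pair
                  if answer in s]
    if not candidates:
        return None  # no exact match: this question is not used as an exemplar
    bm25 = BM25Okapi([c.lower().split() for c in candidates])
    scores = bm25.get_scores(question.lower().split())
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]
```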
Rationale validation. Finally, to validate the ra-
tionales that were generated in the chain-of-thought
stage, we perform a final step in which
the teacher model must answer questions based
only on the generated rationales. As in the previ-
ous stage, we include each training set example and
construct in-prompt exemplars by BM25 similar-
ity to other questions in the training set. Because
this stage does not include full passages, we can
fit many more exemplars while remaining under
the budget of 1024 tokens, on the order of 20 per
prompt. The resulting “faithful answers” are then
used to filter the fine-tuning data that is exposed to
the student model.
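The precise "approximately matches" criterion used to decide whether a faithful answer validates its rationale is not spelled out in this section. Purely as an assumption, a common choice in open-book QA is SQuAD-style answer normalization followed by a thresholded token-level F1, sketched below; the threshold value is a guess.

```python
import re
import string

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def approx_match(prediction: str, gold: str, threshold: float = 0.8) -> bool:
    """Token-level F1 between normalized strings; an illustrative stand-in for
    the paper's 'approximately matches' check."""
    pred_tokens, gold_tokens = normalize(prediction).split(), normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        return pred_tokens == gold_tokens
    common = sum(min(pred_tokens.count(t), gold_tokens.count(t))
                 for t in set(pred_tokens))
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return f1 >= threshold
```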
3 Training the Student Model
The prompt chain described in Section 2 produces markup-and-mask rationales and uses them to answer questions.