Knowledge Transfer from Answer Ranking to Answer Generation

Matteo Gabburo1, Rik Koncel-Kedziorski2, Siddhant Garg2, Luca Soldaini3, Alessandro Moschitti2
1University of Trento, 2Amazon Alexa AI, 3Allen Institute for AI
matteo.gabburo@unitn.it
{rikdz,sidgarg,amosch}@amazon.com
lucas@allenai.org

arXiv:2210.12865v1 [cs.CL] 23 Oct 2022
Abstract
Recent studies show that Question Answering (QA) based on Answer Sentence Selection (AS2) can be improved by generating an improved answer from the top-k ranked answer sentences (termed GenQA). This allows for synthesizing the information from multiple candidates into a concise, natural-sounding answer. However, creating large-scale supervised training data for GenQA models is very challenging. In this paper, we propose to train a GenQA model by transferring knowledge from a trained AS2 model, to overcome the aforementioned issue. First, we use an AS2 model to produce a ranking over answer candidates for a set of questions. Then, we use the top-ranked candidate as the generation target, and the next k top-ranked candidates as context for training a GenQA model. We also propose to use the AS2 model prediction scores for loss weighting and score-conditioned input/output shaping, to aid the knowledge transfer. Our evaluation on three public datasets and one large industrial dataset demonstrates the superiority of our approach over the AS2 baseline, and over GenQA trained using supervised data.
1 Introduction
In recent times, extractive QA research can be categorized into two broad directions for the task of producing the final answer for a question: (i) Answer Sentence Selection (AS2), which, given a question and a set of answer-sentence candidates (e.g., retrieved by a search engine), selects the sentences that correctly answer the question; and (ii) Machine Reading (MR), e.g., (Chen et al., 2017), which, given a question and a reference text, involves finding an exact text span that answers the question. AS2 models can perform more efficiently with large text databases (they originated from the TREC-QA track (Voorhees, 1999)), and there seems to be renewed research interest in these models for applications to personal assistants, e.g., Alexa (Garg et al., 2020; Matsubara et al., 2020a; Garg and Moschitti, 2021).

Work done as an intern at Amazon Alexa AI.
Work completed at Amazon Alexa AI.
Both approaches (AS2 and MR), when applied for QA over unstructured web text, while effective, may have certain drawbacks. Arbitrary web sentences may not contain all the information needed to answer a question, or may contain distracting extraneous information. Moreover, they may have a particular sentiment or style that is not suited to QA, or be too structurally reliant on longer discourse context to serve as a standalone answer. In light of this, researchers have been exploring text generation systems for writing 'better' answers. For example, in MR, RAG (Lewis et al., 2020b) generates an answer from a set of documents selected by dense passage retrieval models.
For AS2 systems, research has focused on learning to summarize answers from relevant paragraphs (Lewis et al., 2020a), or to synthesize information from the top-ranked candidates of an AS2 system (Hsu et al., 2021). The latter approach, termed GenQA, has shown improvements in terms of both answer accuracy and style suitability. A distinctive characteristic of GenQA over a generation-based approach for MR is the length of the answer: the former uses an entire sentence as the target, while the latter in practice uses a short text (primarily targeting entity names). In this work, we focus on GenQA, as we are interested in generating complete answer sentences from precise information selected by AS2 models.

A challenge for training effective GenQA models is the difficulty of obtaining large-scale, high-quality training data. Producing such data for GenQA typically requires human annotators to read questions and paragraphs of relevant background information, and then author a self-contained, natural answer (typically a sentence). This fairly involved procedure greatly diminishes the velocity of annotation. Existing datasets in research
works either offer limited coverage of the domains where GenQA can be applied (Bajaj et al., 2018), or are too small to be used as supervised training data (Muller et al., 2021). Generally, collecting a human-authored answer to a question given a context is significantly more expensive than annotating the correctness of an extracted web sentence as an answer to the same question. Consequently, there are a large number of annotated datasets (Wang et al., 2007; Yang et al., 2015; Garg et al., 2020) available for the latter type of annotation, aimed at training answer sentence selection (AS2) systems.
In this work, we propose a training paradigm for transferring the knowledge learned by a discriminative AS2 ranking model to train an answer generation QA system. Towards this, we learn a GenQA model using weak supervision provided by a trained AS2 model on an unlabeled dataset comprising questions and answer candidates. Specifically, for each question, the AS2 model is used to rank a set of answer candidates without any correctness/incorrectness labels for answering the question. The top-ranked answer is used as the generation target for the GenQA model, while the question along with the next k top-ranked answers is used as the input for the GenQA model.
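As a concrete sketch of this construction (the `as2_score` interface and all names are hypothetical illustrations, not the paper's actual code):

```python
def build_weak_example(question, candidates, as2_score, k=5):
    """Build one weakly supervised GenQA training example from AS2 rankings.

    `as2_score(question, candidate)` is assumed to return the AS2 model's
    correctness score for a question/candidate pair (hypothetical API).
    """
    # Rank all candidates by AS2 score, best first.
    ranked = sorted(candidates, key=lambda c: as2_score(question, c), reverse=True)
    target = ranked[0]          # top-ranked answer becomes the generation target
    context = ranked[1:k + 1]   # the next k candidates form the input context
    source = " ".join([question] + context)
    return source, target
```

Note that no correctness labels appear anywhere: the AS2 ranking alone decides which candidate plays the role of target and which play the role of context.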
We supplement the ranking order of answer candidates with the prediction confidence scores provided by the AS2 model for each answer candidate. This is done by modifying our knowledge transfer strategy in two ways. First, we weight the loss of each training instance (question + context, comprised of k answer candidates) using the AS2 model score of the top-ranked answer, which is to be used as the GenQA target. This allows the GenQA model to selectively learn more from 'good'-quality target answers in the weakly supervised training data (AS2 models are calibrated to produce higher confidence scores for correct answers). However, this loss weighting only considers the score of the output target, and does not exploit the scores of the input candidates. To overcome this limitation, we discretize the AS2 scores into l confidence buckets, add these bucket labels to the GenQA vocabulary, and finally prepend the corresponding label to each answer candidate in the input and/or the output. This confidence bucket label provides the GenQA model with an additional signal about the answer quality of each candidate as assigned by the AS2 model. We show that both these techniques improve QA accuracy, and can be combined to provide additional improvements.
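The two techniques can be sketched as follows, assuming AS2 scores in [0, 1]; the bucket token format, the helper names, and the exact input layout are illustrative assumptions, not the paper's specification:

```python
def bucket_label(score, num_buckets=5):
    """Map an AS2 confidence score in [0, 1] to a discrete bucket token.

    The token strings (e.g. '<conf_3>') would be added to the GenQA
    vocabulary; the format here is a hypothetical example.
    """
    bucket = min(int(score * num_buckets), num_buckets - 1)
    return f"<conf_{bucket}>"

def make_weighted_example(question, ranked_with_scores, k=5):
    """Build a score-conditioned training example plus a loss weight.

    `ranked_with_scores` is a list of (candidate, as2_score) pairs,
    sorted best-first by the AS2 model (assumed input format).
    """
    target, target_score = ranked_with_scores[0]
    context = ranked_with_scores[1:k + 1]
    # Prepend each input candidate's confidence bucket token.
    source_parts = [question] + [f"{bucket_label(s)} {c}" for c, s in context]
    # The top answer's score becomes this example's loss weight, so the
    # model learns more from targets the AS2 model is confident about.
    return " ".join(source_parts), target, target_score
```

In training, the returned weight would multiply the per-example generation loss (e.g., by computing the loss with no reduction and scaling before averaging).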
We empirically evaluate1 our proposed knowledge transfer technique from AS2 to GenQA on three popular public datasets: MS-MARCO NLG (Bajaj et al., 2018), WikiQA (Yang et al., 2015), and TREC-QA (Wang et al., 2007); and on one large-scale industrial QA dataset. Our results show that a GenQA model trained using our paradigm of weak supervision from an AS2 model can surprisingly outperform both the AS2 model used for knowledge transfer (the teacher) and a GenQA model trained on fully supervised data. On small datasets such as WikiQA and TREC-QA, we show that AS2 models trained even on small amounts of labeled data can be effectively used to weakly supervise a GenQA model, which can then outperform its teacher in QA accuracy. Additionally, on MS-MARCO NLG, where fully supervised GenQA training data is available, we show that an initial round of training with our weakly supervised methods yields additional performance improvements over standard supervised GenQA training. Qualitatively, the answers generated by our model are often more directly related to the question being asked, and are stylistically more natural-sounding and suitable as responses than answers from AS2 models, despite the model being trained only on sentences extracted from the web.
2 Related Work
Our work builds upon recent research in AS2, answer generation for QA, and transfer learning.

Answer Sentence Selection
Early approaches for AS2 use CNNs (Severyn and Moschitti, 2015) or alignment networks (Shen et al., 2017; Tran et al., 2018; Tay et al., 2018) to learn and score question and answer representations. Compare-and-aggregate architectures have also been extensively studied for AS2 (Wang and Jiang, 2017; Bian et al., 2017; Yoon et al., 2019). Tayyar Madabushi et al. (2018) exploited fine-grained question classification to further improve answer selection. Garg et al. (2020) achieved state-of-the-art results by fine-tuning transformer-based models on a large-scale QA dataset first, and then adapting them to smaller AS2 datasets. Matsubara et al. (2020b) combine multiple heterogeneous systems for AS2 to improve a QA pipeline, similar in spirit to GenQA. Several follow-up works have further improved the performance of AS2 using transformer models, multiple answer candidates (Zhang et al., 2021), and document-aware pre-training strategies (Di Liello et al., 2022a,b).

1We will release code and all trained model checkpoints at https://github.com/amazon-research/wqa-genqa-knowledge-transfer.
Answer Generation for QA
Answer generation for MR has been studied by Izacard and Grave (2021) and Lewis et al. (2020b), while Iida et al. (2019), Goodwin et al. (2020), and Deng et al. (2020) have studied question-based summarization (QS). Asai et al. (2022) incorporate the evidentiality of retrieved passages for training a generator, evaluated on the QA task of open-domain MR span extraction. Xu et al. (2021) obtain extractive answer spans from a generative model by leveraging the decoder cross-attention patterns. Fajcik et al. (2021) combine a generative reader with an extractive reader to aggregate evidence from multiple passages for open-domain span extraction.
All the previously described approaches focus on identifying short answer spans for answering questions. Research on generating complete sentences as answers (similar to the answer sentences produced by extractive AS2 systems) is rarer, but includes Hsu et al. (2021), who propose a QA pipeline for GenQA (see Fig. 1). This pipeline starts with an AS2 model that selects 'good' answer candidates, which are then used for generating the answer. Hsu et al. learn to generate natural responses to questions using the top-ranked candidates from the AS2 model as input context to the GenQA model. GenQA has also been explored for multilingual QA (Muller et al., 2021) by extending the answer generation approach to a multilingual setting, where the answer candidates (used as input to the GenQA model) can come from a mix of different languages.
In all these works, a major challenge is find-
ing training data for effectively training GenQA
models, which requires annotator-authored natural
responses. In this work, we alleviate this problem
by showing that it is possible to use AS2 ranked
candidates to create the input context and output tar-
get for training GenQA, achieving state-of-the-art
results.
Transfer Learning
Transfer learning is well studied in NLP, including pre-training (Devlin et al., 2019; Liu et al., 2019), multi-task learning (Luong et al., 2016), cross-lingual transfer (Schuster et al., 2019), and domain adaptation (Gururangan et al., 2020). Our work is squarely located in this space: our underlying language models are based on pre-training for text generation (Radford et al., 2019; Raffel et al., 2020); our main contribution is to show that knowledge can be transferred sequentially from a (discriminative) ranking task to a generation task. Recently, Wang et al. (2021) proposed a new domain adaptation method leveraging large unlabeled datasets and a query generator model. Izacard and Grave (2021) used retrieved text passages containing evidence to train a generative model for open-domain QA.

Figure 1: A GenQA model (Hsu et al., 2021) is a seq2seq model that takes as input a question and k answer candidates, and generates an answer.
3 Knowledge Transfer: AS2 → GenQA
Previous works on GenQA require labeled data for effectively training the GenQA model. To reduce the need for expensive large-scale training data for GenQA, we propose a training paradigm that uses unlabeled data while being weakly supervised by a discriminative AS2 model (as shown in Fig. 2).
3.1 Answer Sentence Selection (AS2)
AS2 is a popular task in QA, defined as follows: given a question q and a set of answer candidates C = {c_1, ..., c_n} (retrieved using a web index, a KB, etc.), find the answer candidate c_q ∈ C that best answers q. This is typically modeled as a binary classifier M over question-answer pairs, labeled as correct or incorrect. At inference time, the scores assigned by M can be used to produce a ranking over C, with c_q = argmax_i M(q, c_i).
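The selection rule above amounts to a single argmax over candidate scores; a minimal sketch, where the scoring callable stands in for the trained classifier M and its interface is an assumption:

```python
def select_answer(question, candidates, model):
    """Return c_q = argmax over c_i in C of M(q, c_i).

    `model(question, candidate)` stands in for the trained AS2 binary
    classifier M, returning a correctness score (assumed interface).
    """
    return max(candidates, key=lambda c: model(question, c))
```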
3.2 Generative QA (GenQA)
Generative QA refers to using a text generation model to generate an answer for a question. More specifically, when provided with a question q and a context c̄, the GenQA model M_G should generate a natural-sounding answer c_q = M_G(q, c̄) that correctly answers q. Following Hsu et al. (2021), we consider a set of k answer candidates as the context c̄ to be provided to M_G.
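Under this definition, the source sequence fed to M_G can be assembled by concatenating the question with the top-k candidates; a minimal sketch, where the separator token is an assumption rather than the paper's actual format:

```python
def genqa_input(question, ranked_candidates, k=5, sep=" [SEP] "):
    """Assemble the GenQA source sequence: the question followed by the
    top-k ranked answer candidates as context c̄ (separator is illustrative).
    """
    return sep.join([question] + ranked_candidates[:k])
```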