
ity of annotation. Existing datasets in research works either offer limited coverage of all the domains where GenQA can be applied (Bajaj et al., 2018), or are too small to be used as supervised training data (Muller et al., 2021). Generally, collecting a human-authored answer to a question given a context is significantly more expensive than annotating the correctness of an extracted web sentence as an answer to the same question. Consequently, a large number of annotated datasets (Wang et al., 2007; Yang et al., 2015; Garg et al., 2020) are available for the latter type, aimed at training answer sentence selection (AS2) systems.
In this work, we propose a training paradigm for transferring the knowledge learned by a discriminative AS2 ranking model to train an answer generation (GenQA) system. Towards this, we learn a GenQA model using weak supervision provided by a trained AS2 model on an unlabeled dataset comprising questions and answer candidates. Specifically, for each question, the AS2 model is used to rank a set of answer candidates without any correctness/incorrectness labels for answering the question. The top-ranked answer is used as the generation target for the GenQA model, while the question along with the next k top-ranked answers is used as the input for the GenQA model.
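This weakly supervised data construction can be sketched as follows; a minimal illustration, assuming AS2 scores are already computed per candidate (function and variable names are ours, not from the released code):

```python
def build_weak_example(question, candidates, scores, k=3):
    """Build one weakly supervised GenQA training pair from AS2 scores.

    `candidates` are unlabeled answer sentences for `question`; `scores`
    are the AS2 model's confidence for each candidate. No correctness
    labels are used: the top-ranked candidate becomes the generation
    target, and the next k candidates form the input context.
    """
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    target = ranked[0][0]                      # top-ranked answer -> generation target
    context = [c for c, _ in ranked[1:k + 1]]  # next-k answers -> input context
    source = " ".join([question] + context)
    return source, target
```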
We supplement the ranking order of answer candidates with the prediction confidence scores provided by the AS2 model for each answer candidate. This is done by modifying our knowledge transfer strategy in two ways. First, we weight the loss of each training instance (question + context, comprised of k answer candidates) by the AS2 model score of the top-ranked answer, which serves as the GenQA target. This allows the GenQA model to selectively learn more from 'good' quality target answers in the weakly supervised training data (AS2 models are calibrated to produce higher confidence scores for correct answers). However, this loss weighting only considers the score of the output target, and does not exploit the scores of the input candidates. To overcome this limitation, we discretize the AS2 scores into l confidence buckets, add these bucket labels to the GenQA vocabulary, and finally prepend the corresponding label to each answer candidate in the input and/or the output. This confidence bucket label provides the GenQA model with an additional signal about the answer quality of each candidate as assigned by the AS2 model.
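The two techniques can be sketched as a small extension of the data construction above; a hypothetical illustration assuming scores in [0, 1] and a `<conf_i>` token format (the token naming and helper functions are our assumptions, not the paper's exact implementation):

```python
def bucket_token(score, num_buckets=5):
    """Map an AS2 confidence in [0, 1] to one of l discrete bucket labels.

    These tokens would be added to the GenQA model's vocabulary.
    """
    i = min(int(score * num_buckets), num_buckets - 1)
    return f"<conf_{i}>"

def build_weighted_example(question, candidates, scores, k=3, num_buckets=5):
    """Build (source, target, loss_weight) for one weakly supervised instance.

    The instance's loss weight is the AS2 score of the top-ranked answer,
    and every input candidate is prefixed with its confidence bucket token.
    """
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    top_answer, top_score = ranked[0]
    loss_weight = top_score  # scale this instance's loss by the target's AS2 score
    context = [f"{bucket_token(s, num_buckets)} {c}" for c, s in ranked[1:k + 1]]
    source = " ".join([question] + context)
    target = f"{bucket_token(top_score, num_buckets)} {top_answer}"
    return source, target, loss_weight
```

During training, `loss_weight` would multiply the per-instance cross-entropy loss, so that instances whose targets the AS2 model is confident about contribute more to the gradient.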
We show that both these techniques improve QA accuracy, and can be combined to provide additional improvements.
We empirically evaluate¹ our proposed knowledge transfer technique from AS2 to GenQA on three popular public datasets: MS-MARCO NLG (Bajaj et al., 2018), WikiQA (Yang et al., 2015), and TREC-QA (Wang et al., 2007); and one large-scale industrial QA dataset. Our results show that the GenQA model trained using our paradigm of weak supervision from an AS2 model can surprisingly outperform both the AS2 model used for knowledge transfer (the teacher) and a GenQA model trained on fully supervised data. On small datasets such as WikiQA and TREC-QA, we show that AS2 models trained even on small amounts of labeled data can be effectively used to weakly supervise a GenQA model, which can then outperform its teacher in QA accuracy. Additionally, on MS-MARCO NLG, where fully supervised GenQA training data is available, we show that an initial round of training with our weakly supervised methods yields additional performance improvements over standard supervised GenQA training. Qualitatively, the answers generated by our model are often more directly related to the question being asked, and are stylistically more natural-sounding and suitable as responses than answers from AS2 models, despite being trained only on sentences extracted from the web.
2 Related Work
Our work builds upon recent research in AS2, answer generation for QA, and transfer learning.

Answer Sentence Selection
Early approaches for AS2 use CNNs (Severyn and Moschitti, 2015) or alignment networks (Shen et al., 2017; Tran et al., 2018; Tay et al., 2018) to learn and score question and answer representations. Compare-and-aggregate architectures have also been extensively studied for AS2 (Wang and Jiang, 2017; Bian et al., 2017; Yoon et al., 2019). Tayyar Madabushi et al. (2018) exploited fine-grained question classification to further improve answer selection. Garg et al. (2020) achieved state-of-the-art results by first fine-tuning transformer-based models on a large-scale QA dataset, and then adapting them to smaller AS2 datasets. Matsubara et al.
¹We will release code and all trained model checkpoints at https://github.com/amazon-research/wqa-genqa-knowledge-transfer