Multi-Type Conversational Question-Answer Generation
with Closed-ended and Unanswerable Questions
Seonjeong Hwang1, Yunsu Kim1,2, Gary Geunbae Lee1,2
1Graduate School of Artificial Intelligence, POSTECH, Pohang, South Korea
2Computer Science and Engineering, POSTECH, Pohang, South Korea
{seonjeongh, yunsu.kim, gblee}@postech.ac.kr
Abstract
Conversational question answering (CQA) facilitates an incremental and interactive understanding of a given context, but building a CQA system is difficult in many domains due to data scarcity. In this paper, we introduce a novel method to synthesize CQA data with various question types, including open-ended, closed-ended, and unanswerable questions. We design a different generation flow for each question type and effectively combine them in a single, shared framework. Moreover, we devise a hierarchical answerability classification (hierarchical AC) module that improves the quality of the synthetic data while acquiring unanswerable questions. Manual inspection shows that synthetic data generated with our framework have characteristics very similar to those of human-generated conversations. Across four domains, CQA systems trained on our synthetic data achieve performance close to that of systems trained on human-annotated data.
1 Introduction
Conversational question answering (CQA) aims to answer a question based on a given passage and the previous conversation. Unlike single-turn question answering (QA) (Rajpurkar et al., 2016, 2018; Kwiatkowski et al., 2019), CQA encourages questioners to ask follow-up questions incrementally, which suits services that require active interaction between humans and systems. However, manually creating large numbers of conversations is very costly, which is a barrier to the adoption of CQA in various domains.
To alleviate this issue, a few methods for conversational question generation have been studied (Gao et al., 2019; Pan et al., 2019; Nakanishi et al., 2019; Shen et al., 2021; Gu et al., 2021). Furthermore, in our previous studies (Hwang and Lee, 2021, 2022), we proposed approaches for automatically synthesizing multi-turn conversational question-answer (Q–A) pairs to build training data for CQA. However, our previous frameworks generate only open-ended questions, which cannot be answered succinctly. In real-world situations, concise answers such as yes, no, and unknown are essential for fast interaction and simplified conversations.
In this paper, we introduce MultiCQAG, a framework that can generate multiple types of CQA data. To this end, we add a generation flow for closed-ended Q–A pairs to our previous framework (Hwang and Lee, 2022). We also design a hierarchical answerability classification (hierarchical AC) module that collects yet another type of data, unanswerable questions, while improving data quality by removing invalid Q–A pairs.
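The filtering behavior of the hierarchical AC module can be pictured with a short sketch. This is entirely illustrative: the paper uses trained classifiers, whereas the `is_answerable` and `is_valid_answer` functions below are invented placeholder heuristics, and all names are assumptions rather than the paper's actual components.

```python
def is_answerable(question, passage):
    # Placeholder heuristic standing in for a trained answerability
    # classifier: a question counts as "answerable" if any of its
    # content words appear in the passage.
    words = {w for w in question.lower().split() if len(w) > 3}
    return any(w in passage.lower() for w in words)


def is_valid_answer(question, answer, passage):
    # Placeholder heuristic standing in for an answer-validity check:
    # the answer must occur in the passage or be a closed-ended answer.
    return answer.lower() in passage.lower() or answer.lower() in {"yes", "no"}


def hierarchical_ac(qa_pairs, passage):
    """Split synthetic Q-A pairs into kept, unanswerable, and discarded sets."""
    kept, unanswerable, discarded = [], [], []
    for question, answer in qa_pairs:
        if not is_answerable(question, passage):
            # First level: the question cannot be answered from the
            # passage, so keep it as an unanswerable training example.
            unanswerable.append((question, "unknown"))
        elif is_valid_answer(question, answer, passage):
            # Second level: the question is answerable and the generated
            # answer is consistent with the passage, so keep the pair.
            kept.append((question, answer))
        else:
            # An answerable question paired with an invalid answer is
            # simply removed from the synthetic data.
            discarded.append((question, answer))
    return kept, unanswerable, discarded
```

The two-level structure reflects the dual role described above: one decision yields unanswerable training examples, while the other prunes invalid pairs to raise data quality.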
In experiments, CQA systems trained on our synthetic datasets achieve an average F1 score of 77.2% across four new domains, only 5.4% below systems trained on human-annotated data. Moreover, manual evaluation shows that our synthetic data follow a distribution similar to that of human-annotated data.
The contributions of this work can be summarized as follows:

• We propose MultiCQAG, which synthesizes CQA data consisting of various question types, including open-ended, closed-ended, and unanswerable questions.

• We design a hierarchical AC algorithm that filters out invalid Q–A pairs and acquires unanswerable questions.
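The idea of combining a separate generation flow per question type in one shared framework can be sketched as a simple dispatch. This is purely illustrative: MultiCQAG's actual flows are neural generation modules, and the function and parameter names here are invented for exposition, not taken from the paper.

```python
def generate_qa(passage, history, q_type, flows):
    """Route generation to a per-type flow and tag the resulting Q-A pair.

    `flows` maps a question type (e.g. "open", "closed", "unanswerable")
    to a callable that produces a (question, answer) pair from the
    passage and conversation history.
    """
    if q_type not in flows:
        raise ValueError(f"unknown question type: {q_type}")
    question, answer = flows[q_type](passage, history)
    return {"type": q_type, "question": question, "answer": answer}
```

The shared framework corresponds to the single entry point; the per-type flows remain independent, which is what allows a new question type (here, closed-ended) to be added without disturbing the existing open-ended flow.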
2 Background
In our previous study, we proposed a conversational question-answer generation (CQAG) framework that automatically synthesizes CQA data from given passages and consists of two modules: contextual answer extraction (CAE) and conversational question generation (CQG) (Hwang and Lee,
arXiv:2210.12979v1 [cs.CL] 24 Oct 2022