
Multi-Type Conversational Question-Answer Generation
with Closed-ended and Unanswerable Questions
Seonjeong Hwang1, Yunsu Kim1,2, Gary Geunbae Lee1,2
1Graduate School of Artificial Intelligence, POSTECH, Pohang, South Korea
2Computer Science and Engineering, POSTECH, Pohang, South Korea
{seonjeongh, yunsu.kim, gblee}@postech.ac.kr
Abstract
Conversational question answering (CQA) facilitates an incremental and interactive understanding of a given context, but building a CQA system is difficult for many domains due to the problem of data scarcity. In this paper, we introduce a novel method to synthesize data for CQA with various question types, including open-ended, closed-ended, and unanswerable questions. We design a different generation flow for each question type and effectively combine them in a single, shared framework. Moreover, we devise a hierarchical answerability classification (hierarchical AC) module that improves the quality of the synthetic data while acquiring unanswerable questions. Manual inspections show that synthetic data generated with our framework have characteristics very similar to those of human-generated conversations. Across four domains, CQA systems trained on our synthetic data perform close to systems trained on human-annotated data.
1 Introduction
Conversational question answering (CQA) aims to answer a question based on a given passage and the previous conversation. Unlike single-turn question answering (QA) (Rajpurkar et al., 2016, 2018; Kwiatkowski et al., 2019), CQA encourages questioners to ask follow-up questions incrementally, which suits services that require active interaction between humans and systems. However, manually creating large amounts of conversations is very costly, which is a barrier to the use of CQA in various domains.
To alleviate this issue, a few methods for conversational question generation have been studied (Gao et al., 2019; Pan et al., 2019; Nakanishi et al., 2019; Shen et al., 2021; Gu et al., 2021). Furthermore, in our previous studies we proposed approaches for automatically synthesizing multi-turn conversational question-answer (Q–A) pairs in order to build training data for CQA (Hwang and Lee, 2021, 2022). However, our previous frameworks generate only open-ended questions that cannot be answered succinctly. In real-world situations, concise answers, such as yes, no, and unknown, are essential for fast interaction and simplified conversations.
In this paper, we introduce MultiCQAG, a framework that can generate multiple types of CQA data. To enable this, we insert a generation flow for closed-ended Q–A pairs into our previous framework (Hwang and Lee, 2022). We also design a hierarchical answerability classification (hierarchical AC) module that collects yet another type of data, unanswerable questions, while improving data quality by removing invalid Q–A pairs.
In experiments, CQA systems trained on our synthetic datasets achieve an average F1 score of 77.2% for four new domains, showing a difference of only 5.4% from those trained on human-annotated data. Moreover, we show by manual evaluation that our synthetic data have a data distribution similar to that of human-annotated data.
The contributions of this work can be summarized as follows:
• We propose MultiCQAG, which synthesizes CQA data consisting of various types of questions, including open-ended, closed-ended, and unanswerable questions.
• We design a hierarchical AC algorithm that filters out invalid Q–A pairs and acquires unanswerable questions.
2 Background
arXiv:2210.12979v1 [cs.CL] 24 Oct 2022
Figure 1: Generation pipeline of MultiCQAG. Conversation history is not used to generate the first Q–A pair of a conversation (dotted line).
In our previous study, we proposed a conversational question-answer generation (CQAG) framework that automatically synthesized data for CQA given passages and that consisted of two modules: contextual answer extraction (CAE) and conversational question generation (CQG) (Hwang and Lee, 2021). First, the CAE module extracts a potential answer span from a passage based on a previous conversation. Second, the CQG module generates a conversational question for the extracted answer. During generation, the framework uses previously generated Q–A pairs as the conversation history for the next generation. However, synthetic data generated by this framework only contain extractive answers that are inflexible in form. Moreover, there is a risk that errors generated by the CAE module can propagate to subsequent generations.
To resolve this problem, we developed CQAG-AR, which adopted an answer revision approach (Hwang and Lee, 2022). In this framework, the CQG with answer revision (CQG-AR) module generates a question for the extracted answer span and then modifies the answer span so that it better fits the question. However, CQAG-AR can only synthesize open-ended types of data and cannot generate closed-ended and unanswerable types, which are frequently used in human conversations. In this paper, we improve CQAG-AR to generate those different types of data in a single framework.
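The turn-by-turn generation loop described in this section can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the function name `synthesize_conversation`, the callables `extract_answer` and `generate_question`, and the `num_turns` parameter are hypothetical stand-ins for the trained CAE and CQG-AR models and for whatever stopping rule is actually used.

```python
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def synthesize_conversation(
    passage: str,
    extract_answer: Callable[[str, List[QAPair]], str],
    generate_question: Callable[[str, List[QAPair], str], QAPair],
    num_turns: int,
) -> List[QAPair]:
    """Generate Q-A pairs turn by turn, feeding earlier pairs back as history.

    `extract_answer` plays the role of the CAE module and `generate_question`
    the role of the CQG-AR module; both are hypothetical callables here.
    """
    history: List[QAPair] = []  # empty for the first turn
    for _ in range(num_turns):
        # CAE: pick an answer span conditioned on the passage and history.
        span = extract_answer(passage, list(history))
        # CQG-AR: produce a question and a revised answer for that span.
        question, revised_answer = generate_question(passage, list(history), span)
        # The new pair becomes part of the history for the next turn, which
        # is why an error made here can affect every later turn.
        history.append((question, revised_answer))
    return history
```

Because each turn conditions on all earlier pairs, the loop makes the error-propagation risk mentioned above concrete: a poor span or question at turn t is part of the input at every turn after t.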
3 Method
3.1 Generation Flows
As shown in Figure 1, we insert two generation flows between the CAE and CQG-AR modules to generate open-ended and closed-ended data. The CAE module $P(a_s \mid p, h; \theta_A)$ extracts an answer span $a_s$, a question-worthy phrase in the passage $p$, considering the conversation history $h$, which is the concatenation of previously generated Q–A pairs. After extracting the answer span, the data type to generate for the current turn is randomly selected according to a preset ratio (open-ended:yes:no). When the open-ended type is selected, the CQG-AR module generates an open-ended question $q_{open}$ and a revised answer $a_r$ for the answer span $a_s$, taking into account the answer context $c_a$ and the conversation history $h$, i.e., $P(q_{open}, a_r \mid c_a, h, a_s; \theta_Q)$, where the answer context is the chunk of the passage containing the answer span and the $N$ words before and after it. When the closed-ended type is chosen, however, the module generates a closed-ended question $q_{close}$ for yes or no based on the answer context and conversation history, i.e., $P(q_{close} \mid c_a, h, \text{yes/no}; \theta_Q)$.
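Two small steps in this flow, choosing the turn type from the preset ratio and cutting out the answer context, can be illustrated with a short sketch. The helper names `sample_turn_type` and `answer_context`, the whitespace tokenization, and the index convention (`start` inclusive, `end` exclusive) are assumptions made for illustration; the paper does not specify these details here.

```python
import random
from typing import List, Sequence

TURN_TYPES = ("open-ended", "yes", "no")


def sample_turn_type(ratio: Sequence[float], rng: random.Random) -> str:
    """Draw the data type for the current turn.

    `ratio` gives the relative weights for (open-ended, yes, no); the
    weights do not need to sum to one.
    """
    return rng.choices(TURN_TYPES, weights=ratio, k=1)[0]


def answer_context(words: List[str], start: int, end: int, n: int) -> List[str]:
    """Return the answer span words[start:end] plus up to n words on each side.

    The window is clipped at the passage boundaries.
    """
    left = max(0, start - n)
    right = min(len(words), end + n)
    return words[left:right]
```

For example, with `words = "a b c d e f g".split()`, the call `answer_context(words, 3, 4, 2)` keeps the span `d` together with two words on each side.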
We implement both modules in the same way as in CQAG-AR. However, in MultiCQAG, the two generation flows share the same model parameters $\theta_Q$ of the CQG-AR module, and the answer revision is only conducted for open-ended data. Therefore, the module is trained to return the same answer (yes/no) as the input instead of a revised answer for closed-ended data.
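The difference in training targets between the two flows can be made concrete with a minimal sketch. The `TrainingExample` record and the `target_answer` function are hypothetical; how the authors actually encode inputs and targets for the shared model is not described in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingExample:
    turn_type: str                        # "open-ended", "yes", or "no"
    answer_input: str                     # extracted span, or the literal "yes"/"no"
    question: str                         # target question for this turn
    revised_answer: Optional[str] = None  # only present for open-ended data


def target_answer(example: TrainingExample) -> str:
    """Return the answer the shared module should be trained to output."""
    if example.turn_type in ("yes", "no"):
        # Closed-ended flow: no revision, the input answer is echoed back.
        return example.answer_input
    if example.revised_answer is None:
        raise ValueError("open-ended examples need a revised answer")
    # Open-ended flow: the target is the revised answer, not the raw span.
    return example.revised_answer
```

Sharing one set of parameters means the model has to tell the flows apart from its input alone, which is why the closed-ended target simply repeats the given yes or no.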
3.2 Hierarchical Answerability Classification
Our framework has an autoregressive pipeline over multiple turns, so if an inappropriate Q–A pair is synthesized, the errors can propagate to subsequent data generation. Therefore, we devise a hierarchical AC module that determines whether a question can be answered based on the passage. If not, the module replaces the answer of an unanswerable question with "unknown".
3.2.1 Algorithm
We classify synthetic questions into three categories: (1) answerable in correct context, a question that can be answered given the context sentence of its synthetic answer; (2) answerable in different context, a question whose correct answer can be found in a sentence outside the context of its synthetic answer; and (3) unanswerable question, a question that cannot be answered with the information in the passage.
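The three categories can be read as the outcome of two nested answerability checks, sketched below. This is an assumption-laden illustration: the two boolean inputs stand in for classifier decisions that are not specified in this excerpt, the order of the checks is a guess at what "hierarchical" means, and the names `Answerability`, `categorize`, and `replace_if_unanswerable` are hypothetical.

```python
from enum import Enum


class Answerability(Enum):
    CORRECT_CONTEXT = "answerable in correct context"
    DIFFERENT_CONTEXT = "answerable in different context"
    UNANSWERABLE = "unanswerable question"


def categorize(answerable_in_context: bool, answerable_in_passage: bool) -> Answerability:
    """Map two answerability decisions onto the three categories.

    `answerable_in_context`: the question can be answered from the context
    sentence of its synthetic answer.
    `answerable_in_passage`: the question can be answered from somewhere in
    the passage.
    """
    if answerable_in_context:
        return Answerability.CORRECT_CONTEXT
    if answerable_in_passage:
        return Answerability.DIFFERENT_CONTEXT
    return Answerability.UNANSWERABLE


def replace_if_unanswerable(category: Answerability, synthetic_answer: str) -> str:
    """Apply the one rule stated in the text: unanswerable -> "unknown".

    Other categories are returned unchanged only because this excerpt does
    not say how they are handled.
    """
    if category is Answerability.UNANSWERABLE:
        return "unknown"
    return synthetic_answer
```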