Adversarial and Safely Scaled Question Generation
Sreehari Sankar, Zhihang Dong
October 19, 2022
Abstract
Question generation has recently gained a lot of research interest, especially with the advent
of large language models. In and of itself, question generation can be considered "AI-hard", as
there is no unanimously agreed-upon sense of what makes a question "good" or "bad". In this
paper, we tackle two fundamental problems in parallel. On the one hand, we try to solve the scaling
problem, where question-generation and question-answering applications have to be applied to a massive
amount of text without ground-truth labeling. The usual approach to this problem is to
either downsample or summarize; however, these approaches carry critical risks of misinformation.
On the other hand, and related to the misinformation problem, we try to solve
the 'safety' problem, as many public institutions rely on a much higher level of accuracy for the
content they provide. We introduce an adversarial approach that tackles the question generation
safety problem at scale. Specifically, we design a question answering system that
prunes out unanswerable questions that may be generated, and that further increases the quality of
the generated answers. We build a production-ready, easily pluggable pipeline that can be
used on any given body of text, that is scalable, and that is immune to generating hate speech,
profanity or misinformation. Based on the results, we are able to generate more than six times
as many quality questions as the abstractive approach, with a perceived
quality 44% higher according to a survey of 168 participants.
1 Introduction
Question generation is the task of generating questions given context and keywords. Recently, with the
advent of large language models, question generation has been applied to a wide variety of domains,
the most sensitive of which is probably question generation from children’s textbooks and stories for
educational purposes [ZHW+22,LCC+18]. Apart from these, there have been a lot of instances where
question generation is applicable to a large user base. When the data size is sufficiently large, human
supervision of these generative systems is not feasible. In order to scale such question generation
systems, abstractive summarization is one of the most widely used techniques [LTY21,DTCZ17,
DSC17,NO21,ANKM22].
With that being said, let us invite our readers to examine one decently interesting example of how
a state-of-the-art abstractive model (google/pegasus-xsum) can be shockingly wrong. In Figure 1,
to the best of the two authors' knowledge, we have failed to discover the commonality between
pumpkins, drugs and, especially, condoms; that question is left for future research. However, what is
more dangerous than comparing pumpkins against condoms is that abstractive summarization has been
shown to create misinformation [SBA+20, EPX19]. Misinformation is dangerous to society, par-
ticularly with the rise of fabricated misinformation [WAR18, ASL20]. It destroys trust
between people and institutions, particularly in the areas of public health [STL19], political engagement
[JZ20] and science [WB21].
Modern question generation frameworks usually follow a generative approach, where the
input is the "context" from which the questions are asked; if the process is "answer-guided", ad-
ditional keywords follow suit. One caveat is that many modern AI-generated question answering
models may produce false information. As these question answering frameworks are massively scaled,
we would soon reach a point where human monitoring becomes ineffective. Even where human-in-the-
loop monitoring is realistic, there have been mixed results for human-in-the-loop or crowdsourced
truth verification during influxes of information and breaking news [SRLB+21, RSP+21]. There are
certain scenarios in AI-generated question-answering applications where absolute truth must
hold: when such applications are used for elections, legal inquiries, or health documents,
we must generate answers from, and only from, traceable texts.

arXiv:2210.09467v1 [cs.IR] 17 Oct 2022
Figure 1: Example of an Abstractive Model Failing to Work
In this paper, we tackle the problem of scaling without inviting the risk of creating misinformation.
To do this, we avoid the use of summarizers to downsample the input text; instead, we devise a
'maximal generation' approach. We also propose an adversarial approach to question generation, and
results show that the questions generated by an adversarial system are more than 40% better than
those of a vanilla abstractive question generation system. As part of our adversarial system, we integrate a
question answering system alongside the question generator to filter out 'bad' questions. We use
a state-of-the-art question answering system trained on SQuAD v2.0 [RJL18], which contains negative
examples. We also show results from a model trained on SQuAD v1.1 to further probe the
effectiveness of our adversarial approach when training resources are limited. We pursue
question generation in a 'keyphrase-guided' format, perhaps more commonly known as the 'answer-focused'
format, where the inputs are a given "context" and a "keyphrase" within that context. Traditionally,
this "keyphrase" is assumed to be the answer to the question that is generated. This is as opposed to
the end-to-end format, where a body of text is given as input and a set of questions is produced as
output, with or without the answers. We pursue this avenue since we believe it is much more
controllable, relevant and deployable in a real-world application.
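The adversarial pruning loop described here can be sketched as follows. This is a minimal sketch, not the paper's actual code: `generate_question` and `answer_question` are hypothetical stand-ins for a fine-tuned T5 question generator and a SQuAD v2.0-trained question answering model (which returns an empty span for unanswerable questions).

```python
def adversarial_filter(context, keyphrases, generate_question, answer_question,
                       min_score=0.5):
    """Keep only questions the QA model can answer from the context.

    generate_question(context, keyphrase) -> question string
    answer_question(question, context)    -> (answer span, confidence score);
        a SQuAD v2.0-style model yields an empty span when unanswerable.
    """
    kept = []
    for phrase in keyphrases:
        question = generate_question(context, phrase)
        answer, score = answer_question(question, context)
        # Prune unanswerable or low-confidence questions. Note the final
        # answer is the QA model's extracted span, not the seed keyphrase.
        if answer and score >= min_score:
            kept.append({"question": question, "answer": answer,
                         "keyphrase": phrase})
    return kept
```

The design point is that the QA model acts as the adversary: any question it cannot ground in the context is discarded, so no question survives without a traceable answer span.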
Another contribution of this paper is the improvement in the coherence and relatedness of the
answers under this new adversarial approach, since the answer is generated by the question answering
system given the question, instead of being the keyword around which the question was generated.
That being said, we find that the keyword is often a subset of the answer span. Because
many of the generated questions cannot be answered in one phrase, the coherence between our
answers and questions is much higher, since our adversarial system extracts a span of text and is not
limited to one phrase. We identify this limitation as a major bottleneck for all question generation
systems developed thus far.
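To make the keyword-subset observation concrete, a trivial check (our own illustration, with made-up example strings) of whether a seed keyphrase sits inside the fuller span the QA system extracts:

```python
def keyphrase_in_span(keyphrase, answer_span):
    """Return True if the seed keyphrase is contained (case-insensitively)
    in the answer span extracted by the QA system."""
    return keyphrase.lower() in answer_span.lower()

# The keyphrase "1969" is only part of the fuller, more coherent span
# "on July 20, 1969" that a span-extracting QA model can return.
print(keyphrase_in_span("1969", "on July 20, 1969"))
```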
In summary, we attempt to solve the following pain points in this paper:

• We address the problem of scaling, and the problem of possible misinformation generation from
question generation systems, which is a direct result of the currently most widely used solution
to scaling.

• We propose an adversarial approach to question generation which incorporates question answering.
We prove that this approach not only eliminates 'bad'-quality questions, but also increases the
cohesiveness between the question and the answer.
2 Related Works
In this section, we split the literature review into two parts. The first part provides a brief
history of the development of question-generation models. We then follow it with a concise
discussion of the impact of AI-generated question-answering models on modern society.
2.1 Understanding Question Generation Models
While there have been several earlier works on question generation [HS10, BFE05], most are grammar-
focused methods, where rule-based techniques are employed, while others are
question-pattern focused, where commonly used question patterns are mined from large text corpora
and re-used. With the advent of sequence-to-sequence models, there has been renewed interest in
question generation as a field. We have seen several attempts at question generation using RNN-
based sequence-to-sequence models, like [WWF+20], where the authors created an end-to-end
RNN-based question generator, with query representation learning applied to query-based question
generation.
There is also RNN-based question generation based on knowledge graphs [RRKJ17]. Interestingly,
[DTCZ17] gives two distinct question generation approaches, one using a CNN and a retrieval-
based technique, and another using an RNN and a generation-based mechanism. Their work
attempts rather the opposite of ours: they use question generation
systems to increase the performance of question answering systems, whereas we use a question
answering system to further increase the quality of the question generation system. Notably, most
of the methods we have seen so far neglect the scaling problem almost completely and instead
choose to focus on the quality of the generated question. Since we do not find explicit solutions to
the scaling problem before the transformer era, we can only hypothesize that it was not imagined to
be a problem, because RNNs can take arbitrarily long sequences. Researchers have noted
decreased performance over longer sequences, but this is a generalized problem for all RNNs.
It is with the advent of transformers [VSP+17] that we find several approaches using the encoder-decoder
architecture for question generation. It is at this point that the scaling problem comes into the
picture, since transformers, unlike RNNs, have a limited input capacity, and therefore text has to be
chunked before being given as input. Since this is an extremely expansive list, with applications across
domains like visual question generation, and since there are dedicated surveys covering this topic,
we refer the reader to [PLCK19, KLP+20, CS18, DMPS21] for a very broad list of the various
implementations of transformers applied to question generation. Within this broad research
area, we identify answer-guided, transformer-based question generation as the best performing,
most widely usable, and most relevant to the real world. [LCCC20] is a transformer-based end-to-end
question generation approach; they use GPT-2 [RWC+19] for the actual generation process. There
have also been BERT [DCLT18] based models for question generation, such as the work in [CF19],
where the authors developed a sequential model for BERT-based question generation. These
works and results make compelling cases for using a full transformer for this task instead of just an
encoder or just a decoder stack. Due to the performance demonstrated in [LTY21, RSR+20], and
since a full transformer is an encoder-decoder network rather than just an encoder stack or a decoder
stack, we choose T5 [RSR+20] for the generation process. Going by the results in
[LTY21], we recognize that a well-trained Pegasus [ZZSL20] could give comparable performance, but
we leave those studies to future research. Pretrained transformers give state-of-the-art performance
on question generation tasks across a wide range of domains.
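The chunking constraint described above can be sketched as a sliding window over the token stream. This is our own illustrative sketch, not the paper's pipeline: the window sizes are assumptions, and `str.split` stands in for a real subword tokenizer.

```python
def chunk_text(text, max_tokens=512, overlap=64, tokenize=str.split):
    """Split text into overlapping chunks that each fit a transformer's
    input window, so no part of the source is dropped or summarized away."""
    tokens = tokenize(text)
    chunks, start = [], 0
    step = max_tokens - overlap  # stride; overlap preserves cross-chunk context
    while start < len(tokens):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # last window already covers the tail of the text
        start += step
    return chunks
```

Unlike summarization-based scaling, every token of the input survives in at least one chunk, which is what makes a 'maximal generation' approach possible over arbitrarily long texts.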
We also see a progression towards question generation with multiple-choice answers [LTY21], where
the authors also used T5 and Pegasus models for generation, and then created distractor options for
a given answer. This line of work is perhaps the closest to ours. Most critically, however,
they use summarizers to scale a given text to fit the input size of a transformer, and, further,
they take the keyword as the answer as a given. In our approach, the answer is regenerated using a
question answering system.
We also find considerable research on controllable generative models [HNHT21,
KMV+19]. However, for question generation, there currently exist no objective metrics by
which we can judge the semantic quality of a question. Although there are "syntactical" metrics, like
ROUGE [Lin04] and METEOR [BL05], these scores need references against which to test how good the produced
output is, and the same question can be asked in a variety of different ways.