Adversarial and Safely Scaled Question Generation
Sreehari Sankar, Zhihang Dong
October 19, 2022
Abstract
Question generation has recently gained a lot of research interest, especially with the advent
of large language models. In and of itself, question generation can be considered "AI-hard", as
there is no unanimously agreed-upon sense of what makes a question "good" or "bad". In this
paper, we tackle two fundamental problems in parallel. On the one hand, we try to solve the scaling
problem, where question-generation and question-answering applications have to be applied to a massive
amount of text without ground-truth labeling. The usual approach to this problem is to
either downsample or summarize; however, these approaches carry critical risks of misinformation.
On the other hand, and related to the misinformation problem, we try to solve
the 'safety' problem, as many public institutions rely on a much higher level of accuracy for the
content they provide. We introduce an adversarial approach that tackles the question generation
safety problem at scale. Specifically, we design a question answering system that
prunes out unanswerable questions that may be generated, and that further increases the quality of
the generated answers. We build a production-ready, easily pluggable pipeline that can be
used on any given body of text, that is scalable, and that is immune to generating hate speech,
profanity or misinformation. Based on the results, we are able to generate more than six times
as many quality questions as the abstractive approach, with a perceived
quality 44% higher according to a survey of 168 participants.
1 Introduction
Question generation is the task of generating questions given context and keywords. Recently, with the
advent of large language models, question generation has been applied to a wide variety of domains,
the most sensitive of which is probably question generation from children’s textbooks and stories for
educational purposes [ZHW+22,LCC+18]. Apart from these, there have been a lot of instances where
question generation is applicable to a large user base. When the data size is sufficiently large, human
supervision of these generative systems is not feasible. In order to scale such question generation
systems, abstractive summarization is one of the most widely used techniques [LTY21,DTCZ17,
DSC17,NO21,ANKM22].
With that being said, let us invite our readers to examine one decently interesting example of how
a state-of-the-art abstractive model (google/pegasus-xsum) can be shockingly wrong. In Figure 1,
to the best of the two authors' knowledge, we have failed to discover the commonality between
pumpkins, drugs and, especially, condoms; that question is left for future research. However, what is
more dangerous than comparing pumpkins against condoms is that abstractive summarization has been
shown to create misinformation [SBA+20, EPX19]. Misinformation is dangerous to society, par-
ticularly with the rise of fabricated misinformation [WAR18, ASL20]. It destroys trust
between people and institutions, particularly in the areas of public health [STL19], political engagement
[JZ20] and science [WB21].
Modern question generation frameworks usually follow a generative approach, where the
input is the "context" from which the questions are asked; if the process is "answer-guided", ad-
ditional keywords follow suit. One caveat is that many modern AI-generated question answering
models may produce false information. As these question answering frameworks are massively scaled,
we would soon reach a point where human monitoring becomes ineffective. Even where human-in-the-
loop monitoring is realistic, there have been mixed results for human-in-the-loop or crowdsourced
truth verification during influxes of information and breaking news [SRLB+21, RSP+21]. There are
certain scenarios in AI-generated question-answering applications where absolute truth must
hold: when such applications are used for elections, legal inquiries, or health documents,
we must generate answers from, and only from, traceable texts.

arXiv:2210.09467v1 [cs.IR] 17 Oct 2022
Figure 1: Example of an Abstractive Model Failing to Work
In this paper, we tackle the problem of scaling without inviting the risk of creating misinformation.
To do this, we avoid the use of summarizers to downsample the input text; instead, we devise a
'maximal generation' approach. We also propose an adversarial approach to question generation, and
results show that the questions generated by an adversarial system are more than 40% better than
those of a vanilla abstractive question generation system. As part of our adversarial system, we integrate a
question answering system alongside the question generator to filter out 'bad' questions. We use
a state-of-the-art question answering system trained on SQuAD v2.0 [RJL18], which contains negative
examples. We also show results from a model trained on SQuAD v1.1 to further probe the
effectiveness of our adversarial approach when training resources are limited. We pursue
question generation in a 'keyphrase-guided' format, perhaps more commonly known as the 'answer-focused'
format, where the inputs are a given "context" and a "keyphrase" within that context. Traditionally,
this "keyphrase" is assumed to be the answer to the question that is generated. This is as opposed to
the end-to-end format, where a body of text is given as input and a set of questions is produced as
output, with or without the answers. We pursue this avenue since we believe it is much more
controllable, relevant and deployable in a real-world application.
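The adversarial pruning loop described here can be sketched as follows. This is a minimal sketch, not the paper's actual code: `generate_question` and `answer_question` are hypothetical stand-ins for a fine-tuned T5 question generator and a SQuAD v2.0-trained question answering model (which returns an empty span for unanswerable questions).

```python
def adversarial_filter(context, keyphrases, generate_question, answer_question,
                       min_score=0.5):
    """Keep only questions the QA model can answer from the context.

    generate_question(context, keyphrase) -> question string
    answer_question(question, context)    -> (answer span, confidence score);
        a SQuAD v2.0-style model yields an empty span when unanswerable.
    """
    kept = []
    for phrase in keyphrases:
        question = generate_question(context, phrase)
        answer, score = answer_question(question, context)
        # Prune unanswerable or low-confidence questions. Note the final
        # answer is the QA model's extracted span, not the seed keyphrase.
        if answer and score >= min_score:
            kept.append({"question": question, "answer": answer,
                         "keyphrase": phrase})
    return kept
```

The design point is that the QA model acts as the adversary: any question it cannot ground in the context is discarded, so no question survives without a traceable answer span.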
Another contribution of this paper is the improvement in the coherence and relatedness of the
answers under this new adversarial approach, since the answer is generated by the question answering
system given the question, instead of being the keyword around which the question was generated.
That being said, we find that the keyword is often a subset of the answer span. Because
many of the generated questions cannot be answered in one phrase, the coherence between our
answers and questions is much higher, since our adversarial system extracts a span of text and is not
limited to one phrase. We identify this limitation as a major bottleneck for all question generation
systems developed thus far.
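To make the keyword-subset observation concrete, a trivial check (our own illustration, with made-up example strings) of whether a seed keyphrase sits inside the fuller span the QA system extracts:

```python
def keyphrase_in_span(keyphrase, answer_span):
    """Return True if the seed keyphrase is contained (case-insensitively)
    in the answer span extracted by the QA system."""
    return keyphrase.lower() in answer_span.lower()

# The keyphrase "1969" is only part of the fuller, more coherent span
# "on July 20, 1969" that a span-extracting QA model can return.
print(keyphrase_in_span("1969", "on July 20, 1969"))
```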
In summary, we attempt to solve the following pain points in this paper:

• We address the problem of scaling, and the problem of possible misinformation generation from
question generation systems, which is a direct result of the currently most widely used solution
to scaling.

• We propose an adversarial approach to question generation which incorporates question answering.
We prove that this approach not only eliminates 'bad'-quality questions, but also increases the
cohesiveness between the question and the answer.
2 Related Works
In this section, we split the literature review into two parts. The first part provides a brief
history of the development of question-generation models. We then follow it with a concise
discussion of the impact of AI-generated question-answering models on modern society.
2.1 Understanding Question Generation Models
While there have been several earlier works on question generation [HS10, BFE05], most are grammar-
focused methods, where rule-based techniques are employed, while others are
question-pattern focused, where commonly used question patterns are mined from large text corpora
and re-used. With the advent of sequence-to-sequence models, there has been renewed interest in
question generation as a field. We have seen several attempts at question generation using RNN-
based sequence-to-sequence models, like [WWF+20], where the authors created an end-to-end
RNN-based question generator, with query representation learning applied to query-based question
generation.
There is also RNN-based question generation based on knowledge graphs [RRKJ17]. Interestingly,
[DTCZ17] gives two distinct question generation approaches, one using a CNN and a retrieval-
based technique, and another using an RNN and a generation-based mechanism. Their work
attempts rather the opposite of ours: they use question generation
systems to increase the performance of question answering systems, whereas we use a question
answering system to further increase the quality of the question generation system. Notably, most
of the methods we have seen so far neglect the scaling problem almost completely and instead
choose to focus on the quality of the generated question. Since we do not find explicit solutions to
the scaling problem before the transformer era, we can only hypothesize that it was not imagined to
be a problem, because RNNs can take arbitrarily long sequences. Researchers have noted
decreased performance over longer sequences, but this is a generalized problem for all RNNs.
It is with the advent of transformers [VSP+17] that we find several approaches using the encoder-decoder
architecture for question generation. It is at this point that the scaling problem comes into the
picture, since transformers, unlike RNNs, have a limited input capacity, and therefore text has to be
chunked before being given as input. Since this is an extremely expansive list, with applications across
domains like visual question generation, and since there are dedicated surveys covering this topic,
we refer the reader to [PLCK19, KLP+20, CS18, DMPS21] for a very broad list of the various
implementations of transformers applied to question generation. Within this broad research
area, we identify answer-guided, transformer-based question generation as the best performing,
most widely usable, and most relevant to the real world. [LCCC20] is a transformer-based end-to-end
question generation approach; they use GPT-2 [RWC+19] for the actual generation process. There
have also been BERT [DCLT18] based models for question generation, such as the work in [CF19],
where the authors developed a sequential model for BERT-based question generation. These
works and results make compelling cases for using a full transformer for this task instead of just an
encoder or just a decoder stack. Due to the performance demonstrated in [LTY21, RSR+20], and
since a full transformer is an encoder-decoder network rather than just an encoder stack or a decoder
stack, we choose T5 [RSR+20] for the generation process. Going by the results in
[LTY21], we recognize that a well-trained Pegasus [ZZSL20] could give comparable performance, but
we leave those studies to future research. Pretrained transformers give state-of-the-art performance
on question generation tasks across a wide range of domains.
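The chunking constraint described above can be sketched as a sliding window over the token stream. This is our own illustrative sketch, not the paper's pipeline: the window sizes are assumptions, and `str.split` stands in for a real subword tokenizer.

```python
def chunk_text(text, max_tokens=512, overlap=64, tokenize=str.split):
    """Split text into overlapping chunks that each fit a transformer's
    input window, so no part of the source is dropped or summarized away."""
    tokens = tokenize(text)
    chunks, start = [], 0
    step = max_tokens - overlap  # stride; overlap preserves cross-chunk context
    while start < len(tokens):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # last window already covers the tail of the text
        start += step
    return chunks
```

Unlike summarization-based scaling, every token of the input survives in at least one chunk, which is what makes a 'maximal generation' approach possible over arbitrarily long texts.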
We also see a progression towards question generation with multiple-choice answers [LTY21], where
the authors also used T5 and Pegasus models for generation, and then created distractor options for
a given answer. This line of work is perhaps the closest to ours. Most critically, however,
they use summarizers to scale a given text to fit the input size of a transformer, and, further,
they take the keyword as the answer as a given. In our approach, the answer is regenerated using a
question answering system.
We also find considerable research on controllable generative models [HNHT21,
KMV+19]. However, for question generation, there currently exist no objective metrics by
which we can judge the semantic quality of a question. Although there are "syntactical" metrics, like
ROUGE [Lin04] and METEOR [BL05], these scores need references against which to test how good the produced
output is, and the same question can be asked in a variety of different ways.