2 Related Works
In this section, we split the literature review into two parts. The first part provides a brief
history of the development of question-generation models. The second offers a concise discussion
of the impact of AI-generated question-answering models on modern society.
2.1 Understanding Question Generation Models
While there have been several earlier works on question generation [HS10, BFE05], most are grammar-
focused, employing rule-based techniques, while others are question-pattern focused, mining commonly
used question patterns from large text corpora and re-using them. With the advent of sequence-to-sequence
models, there has been renewed interest in question generation as a field. There have been several
attempts at question generation using RNN-based sequence-to-sequence models; for example, in [WWF+20]
the authors created an end-to-end RNN-based question generator, applying query representation learning
to query-based question generation.
There is also RNN-based question generation built on knowledge graphs [RRKJ17]. Interestingly,
[DTCZ17] presents two distinct question-generation approaches: one using a CNN with a retrieval-based
technique, and another using an RNN with a generation-based mechanism. Their work attempts rather the
opposite of ours: they use question-generation systems to improve the performance of question-answering
systems, whereas we use a question-answering system to improve the quality of the question-generation
system. Notably, most of the methods discussed so far neglect the scaling problem almost completely
and instead focus on the quality of the generated questions. Since we find no explicit solutions to
the scaling problem before the transformer era, we can only hypothesize that it was not considered a
problem because RNNs can accept arbitrarily long sequences. Researchers have noted decreased
performance on longer sequences, but this is a general problem for all RNNs.
It is with the advent of transformers [VSP+17] that we find several approaches using the encoder-decoder
architecture for question generation. It is at this point that the scaling problem enters the
picture: transformers, unlike RNNs, have a limited input capacity, so text must be split into
chunks before being given as input. Since the list of applications is extremely expansive, spanning
domains such as visual question generation, and since dedicated surveys cover this topic, we refer
the reader to [PLCK19, KLP+20, CS18, DMPS21] for a broad overview of the various transformer
implementations applied to question generation. Within this broad research area, we identify
answer-guided, transformer-based question generation as the best performing, most widely usable,
and most relevant to real-world applications. [LCCC20] is a transformer-based end-to-end question
generation approach that uses GPT-2 [RWC+19] for the actual generation process. There have also
been BERT-based [DCLT18] models for question generation, such as the work in [CF19], where the
authors developed a sequential model for BERT-based question generation. However, the performance
demonstrated in [LTY21, RSR+20] makes a compelling case for using a full transformer, i.e., an
encoder-decoder network, for this task instead of only an encoder or only a decoder stack, so we
choose T5 [RSR+20] for the generation process. Going by the results in [LTY21], we recognize that a
well-trained Pegasus [ZZSL20] could give comparable performance, but we leave that study to future
research. Pretrained transformers give state-of-the-art question-generation performance across a
wide range of domains.
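The chunking step mentioned above can be sketched as follows. This is a minimal illustration, not any of the cited systems' actual pipelines: word counts stand in for tokens, and the window and stride values are hypothetical (a real pipeline would use the model's own tokenizer and context length).

```python
def chunk_text(text: str, max_tokens: int = 512, stride: int = 448) -> list[str]:
    """Split text into overlapping windows that fit a fixed-size transformer input.

    Words approximate tokens here; stride < max_tokens gives overlap so that
    sentences cut at a chunk boundary still appear whole in the next chunk.
    """
    words = text.split()
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # last window already covers the end of the text
    return chunks

# A 1000-word document becomes three overlapping 512-word (or shorter) chunks.
document = " ".join(str(i) for i in range(1000))
print(len(chunk_text(document)))
```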
We also see a progression towards question generation with multiple-choice answers [LTY21], where
the authors likewise used T5 and Pegasus models for generation and then created distractor options
for a given answer. This line of work is perhaps the closest to ours. Most critically, however, they
use summarizers to scale a given text to fit the input size of a transformer, and they take the
answer keyword as given. In our approach, the answer is regenerated using a question-answering
system.
We also find considerable research on controllable generative models [HNHT21, KMV+19]. However,
for question generation there currently exist no objective metrics by which we can judge the
semantic quality of a question. Although there are "syntactic" metrics, like ROUGE [Lin04] and
METEOR [BL05], these scores require references against which to test the produced output, and for
question generation the same question can be asked in a variety of different ways.