On QAPs generated from the official SAT practice
reading tests, evaluations by human judges indicate
that 97% of the QAPs are both grammatically and se-
mantically correct.
The initial training dataset is not large enough to
cover the tag-set sequences of a number of declara-
tive sentences in the SAT reading-test passages. We
manually add new interrogative sentences for some of
these declarative sentences and show that TSS-Learner
is able to learn new rules and use them to generate
additional adequate QAPs.
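The tag-set-sequence idea above can be illustrated with a toy lookup, in which a learned rule maps the POS tag sequence of a declarative sentence to a question template. This is a hypothetical simplification for illustration only; the tags, rules, and function names below are not the actual TSS-Learner implementation.

```python
# A "rule" maps the POS tag sequence of a declarative sentence to a
# transformation that produces an interrogative sentence.
# Tags follow the Penn Treebank convention (NNP = proper noun,
# VBD = past-tense verb); the single rule here is illustrative.
RULES = {
    ("NNP", "VBD", "NNP"): lambda toks: f"Who {toks[1]} {toks[2]}?",
}

def generate_question(tokens, tags):
    """Return a question if a learned rule matches the tag sequence,
    else None (an unmatched sequence is a candidate for rule learning)."""
    rule = RULES.get(tuple(tags))
    if rule is None:
        return None
    return rule(tokens)

print(generate_question(["Alice", "met", "Bob"], ["NNP", "VBD", "NNP"]))
# -> Who met Bob?
print(generate_question(["Cats", "sleep"], ["NNS", "VBP"]))
# -> None (no matching rule learned yet)
```

When a declarative sentence's tag sequence has no matching rule, a human-supplied interrogative sentence for it can be generalized into a new entry in the rule table, which is the learning step described above.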
The rest of the paper is organized as follows: We
summarize in Section 2 related work and present in
Section 3 a general framework of tag-set-sequence
learning. We then present in Section 4 an implemen-
tation of TSS-Learner for the English language and
describe evaluation results in Section 5. Section 6
concludes the paper.
2 RELATED WORK
Automatic question generation (QG), first studied by
Wolfe (1976) as a means to aid independent study,
has attracted much research along two lines of
methodology, each with a certain degree of success:
transformative methods and generative methods.
2.1 Transformative Methods
Transformative methods transform key phrases of a
single declarative sentence into interrogative sen-
tences; they include rule-based, semantics-based, and
template-based methods.
Rule-based methods parse sentences with a syn-
tactic parser to identify key phrases and transform a
sentence into an interrogative sentence using syntac-
tic rules. Examples include methods that identify key
phrases from input sentences and apply syntactic rules
for different types of questions (Varga and Ha, 2010),
generate QAPs using a syntactic parser, a POS tagger,
and an NE analyzer (Ali et al., 2010), transform a
sentence into a set of interrogative sentences using a
series of domain-independent rules (Danon and Last,
2017), and generate questions using relative pronouns
and adverbs from complex English sentences (Khullar
et al., 2018).
Semantics-based methods create interrogative
sentences using predicate-argument structures and se-
mantic roles (Mannem et al., 2010), semantic pattern
recognition (Mazidi and Nielsen, 2014), subtopics
based on Latent Dirichlet Allocation (Chali and
Hasan, 2015), or semantic-role labeling (Flor and Ri-
ordan, 2018).
Template-based methods are used for special-
purpose applications with built-in templates, includ-
ing methods based on Natural Language Generation
Markup Language (NLGML) (Cai et al., 2006), on
phrase structure parsing and enhanced XML (Rus
et al., 2007), on self questioning (Mostow and Chen,
2009), on enhanced self-questioning (Chen, 2009), on
pattern matching and templates similar to NLGML
(Wyse and Piwek, 2009), on templates with place-
holder variables (Lindberg, 2013), and on semantics
turned to templates (Lindberg et al., 2013).
2.2 Generative Methods
Recent advances in deep learning have shed new light
on generative methods. For example, the attention
mechanism (Luong et al., 2015) is used to determine
what content in a sentence should be asked about, and
the sequence-to-sequence (Bahdanau et al., 2014; Cho
et al., 2014) and long short-term memory (Sak
et al., 2014) mechanisms are used to generate each
word in an interrogative sentence (see, e.g., (Du et al.,
2017; Duan et al., 2017; Harrison and Walker, 2018;
Sachan and Xing, 2018)). These models, however,
only deal with question generation without generat-
ing correct answers. Moreover, training these models
requires a dataset comprising over 100K interrogative
sentences.
To generate answers, researchers have explored
ways to encode a passage (a sentence or multiple sen-
tences) and an answer word (or a phrase) as input,
and determine what interrogative sentences are to be
generated for a given answer (Zhou et al., 2018; Zhao
et al., 2018; Song et al., 2018). Kim et al. (2019)
pointed out that these models could generate a
number of answer-revealing questions (namely, ques-
tions that contain their own answers).
They then devised a new method by encoding an-
swers separately, at the expense of having substan-
tially more parameters. This method, however, suffers
from low accuracy, and it is also unknown whether the
generated interrogative sentences are grammatically
correct.
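The answer-revealing problem described above can be made concrete with a minimal check: a generated question is flagged if it already contains its answer. The substring test below is a hypothetical simplification for illustration; Kim et al. (2019) address the problem at the model level by encoding answers separately rather than by post-hoc filtering.

```python
def is_answer_revealing(question: str, answer: str) -> bool:
    """Return True if the generated question leaks its own answer.
    A case-insensitive substring match is a deliberately crude proxy."""
    return answer.lower() in question.lower()

# A question that restates its answer defeats the purpose of a QAP:
print(is_answer_revealing("Who wrote Hamlet with Shakespeare?", "Shakespeare"))
# -> True (the answer appears verbatim in the question)
print(is_answer_revealing("Who wrote Hamlet?", "Shakespeare"))
# -> False
```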
Recently, a new method was presented that performs
a downstream task of transformers with preprocess-
ing and postprocessing pipelines (TP3) for generating
QAPs (Zhang et al., 2022). The authors showed that
TP3 using pretrained T5 models (Raffel et al., 2020)
outperforms previous models, and human evaluations
confirm the high quality of QAPs generated by this
method. However, TP3 may generate silly questions
for certain chunks of text. This calls for further in-