Tag-Set-Sequence Learning for Generating Question-Answer Pairs
Cheng Zhang and Jie Wang
Department of Computer Science, University of Massachusetts, Lowell, MA 01854, USA
cheng_zhang@student.uml.edu, wang@cs.uml.edu
Keywords: Tag-Set-Sequence Learning, Question-Answer Pairs, Natural Language Processing
Abstract: Transformer-based QG models can generate question-answer pairs (QAPs) of high quality, but may also generate silly questions for certain texts. We present a new method called tag-set-sequence learning to tackle this problem, where a tag-set sequence is a sequence of tag sets that captures the syntactic and semantic information of the underlying sentence, and a tag set consists of one or more language-feature tags, including, for example, semantic-role-labeling, part-of-speech, named-entity-recognition, and sentiment-indication tags. We construct a system called TSS-Learner to learn tag-set sequences from given declarative sentences and the corresponding interrogative sentences, and to derive answers to the latter. We train a TSS-Learner model for the English language using a small training dataset and show that it can indeed generate adequate QAPs for certain texts on which transformer-based models do poorly. Human evaluation of the QAPs generated by TSS-Learner over SAT practice reading tests is encouraging.
1 INTRODUCTION
Multiple-choice questions (MCQs) are often used to assess whether students understand the main points of a given article. An MCQ consists of an interrogative sentence, a correct answer (a.k.a. the answer key), and a number of distractors. A QAP is an interrogative sentence together with its answer key. We study how to generate QAPs from declarative sentences.
Coverage of the main points of an article may be obtained by selecting important declarative sentences with a sentence-ranking algorithm such as CNATAR (Contextual Network and Topic Analysis Rank) (Zhang et al., 2021). QAPs may then be generated by applying a text-to-text transformer to a declarative sentence, a chosen answer key, and the surrounding chunk of sentences using, for example, TP3 (Transformer with Preprocessing and Postprocessing Pipelines) (Zhang et al., 2022), which generates interrogative sentences for the answer keys with much higher success rates than previous methods. TP3, however, may generate silly QAPs for certain chunks of text.
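To make this select-then-generate pipeline concrete, the following is a minimal sketch of how such a pipeline could be wired together. It is illustrative only: rank_sentences and choose_answer_key are hypothetical placeholders standing in for CNATAR and answer-key selection, and the plain t5-base checkpoint is an assumption, not the fine-tuned model used by TP3.

```python
# Illustrative select-then-generate pipeline (not the authors' TP3 implementation).
# rank_sentences() and choose_answer_key() are hypothetical helpers passed in by
# the caller; the checkpoint and prompt format below are assumptions.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def generate_qaps(article_sentences, rank_sentences, choose_answer_key, top_k=3):
    """Select important sentences, then ask a text-to-text model to form questions."""
    qaps = []
    for idx in rank_sentences(article_sentences)[:top_k]:      # hypothetical ranking step
        sentence = article_sentences[idx]
        answer = choose_answer_key(sentence)                    # hypothetical key selection
        context = " ".join(article_sentences[max(0, idx - 1): idx + 2])
        prompt = f"answer: {answer} context: {context}"         # assumed prompt format
        ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids
        out = model.generate(ids, max_length=64)
        question = tokenizer.decode(out[0], skip_special_tokens=True)
        qaps.append((question, answer))
    return qaps
```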
Training a deep-learning model such as TP3 to generate QAPs may be viewed as learning to speak in a language environment, akin to how children learn to talk from their surroundings. Learning to write well, on the other hand, requires formal education. This analogy motivates us to explore machine-learning mechanisms that mimic explicit rule learning.
To this end, we devise a method that uses a tag-set sequence to represent the pattern of a sentence, where each tag set consists of a few language-feature tags for the underlying word or phrase. Given a declarative sentence and a corresponding interrogative sentence, we learn the tag-set sequences of both sentences and derive the tag-set sequence of the answer key. When a new declarative sentence is given as input, we generate a QAP by first searching for a learned tag-set sequence that matches the tag-set sequence of the input sentence, and then using the learned tag-set sequence of the corresponding interrogative sentence to map the content of the input sentence into an interrogative sentence and a correct answer.
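To illustrate the matching and mapping idea, here is a toy sketch under simplifying assumptions: a tag-set sequence is modeled as a sequence of sets of tags, and a learned rule stores the question pattern as positions into the matched sentence. The data structures, tag names, and subset-matching criterion are our own illustrative choices, not the authors' implementation.

```python
# Toy illustration of tag-set-sequence matching (hypothetical data structures,
# not the authors' implementation). A tag-set sequence is modeled as a sequence
# of frozensets of tags; a learned rule pairs such a sequence with a question
# template over token positions and the positions of the answer key.
from dataclasses import dataclass

@dataclass
class Rule:
    pattern: tuple          # tag-set sequence of the declarative sentence
    question_slots: list    # literal words or integer positions into the input
    answer_slots: list      # positions of the answer key in the input

def matches(pattern, tss):
    """A pattern matches if every tag set in the pattern is a subset of the
    corresponding tag set of the input sentence."""
    return len(pattern) == len(tss) and all(p <= t for p, t in zip(pattern, tss))

def apply_rule(rule, tokens, tss):
    if not matches(rule.pattern, tss):
        return None
    question = " ".join(s if isinstance(s, str) else tokens[s] for s in rule.question_slots)
    answer = " ".join(tokens[i] for i in rule.answer_slots)
    return question + "?", answer

# Example: "Einstein was born in Ulm ." -> ("Where was Einstein born?", "Ulm")
tokens = ["Einstein", "was", "born", "in", "Ulm", "."]
tss = [frozenset({"PROPN", "PERSON", "ARG1"}), frozenset({"AUX"}), frozenset({"VERB", "V"}),
       frozenset({"ADP"}), frozenset({"PROPN", "GPE", "ARGM-LOC"}), frozenset({"PUNCT"})]
rule = Rule(pattern=(frozenset({"PERSON"}), frozenset({"AUX"}), frozenset({"VERB"}),
                     frozenset({"ADP"}), frozenset({"GPE"}), frozenset({"PUNCT"})),
            question_slots=["Where", 1, 0, 2], answer_slots=[4])
print(apply_rule(rule, tokens, tss))  # -> ('Where was Einstein born?', 'Ulm')
```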
We construct a general framework called TSS-Learner (Tag-Set-Sequence Learner) to carry out this learning task. We train a TSS-Learner model for the English language over a small initial training dataset using SRL (semantic-role-labeling), POS (part-of-speech), and NER (named-entity-recognition) tags, where each data entry consists of a well-written declarative sentence and a well-written interrogative sentence. We show that TSS-Learner can generate adequate QAPs for certain texts on which TP3 generates silly questions. Moreover, TSS-Learner can efficiently generate a reasonable number of adequate QAPs.
On the QAPs generated from the official SAT practice reading tests, evaluations by human judges indicate that 97% are both grammatically and semantically correct.
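As an illustration of how a tag-set sequence might be assembled from the language-feature tags mentioned above, the sketch below combines POS and NER tags from spaCy with a placeholder SRL tagger. The per-token grouping and the placeholder srl_tags function are simplifying assumptions; the paper's actual tag sets may be built at the phrase level and include additional features.

```python
# Sketch of turning a sentence into a tag-set sequence from POS/NER/SRL tags.
# spaCy supplies POS and NER; srl_tags() is a hypothetical stand-in for a
# semantic-role labeler (e.g., one producing PropBank-style roles).
import spacy

nlp = spacy.load("en_core_web_sm")

def srl_tags(tokens):
    """Placeholder SRL tagger: returns one role label per token (assumption)."""
    return ["O"] * len(tokens)

def tag_set_sequence(sentence):
    doc = nlp(sentence)
    roles = srl_tags([t.text for t in doc])
    seq = []
    for token, role in zip(doc, roles):
        tags = {token.pos_}                 # POS tag, e.g. PROPN, VERB
        if token.ent_type_:                 # NER tag if the token is part of an entity
            tags.add(token.ent_type_)
        if role != "O":                     # SRL role if available
            tags.add(role)
        seq.append(frozenset(tags))
    return [t.text for t in doc], seq

tokens, tss = tag_set_sequence("Einstein was born in Ulm.")
print(list(zip(tokens, tss)))
```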
The initial training dataset is not large enough to contain tag-set sequences that match those of a number of declarative sentences in the SAT reading-test passages. We manually add new interrogative sentences for some of these declarative sentences and show that TSS-Learner is able to learn the new rules and use them to generate additional adequate QAPs.
The rest of the paper is organized as follows: We summarize in Section 2 related work and present in Section 3 a general framework of tag-set-sequence learning. We then present in Section 4 an implementation of TSS-Learner for the English language and describe evaluation results in Section 5. Section 6 concludes the paper.
2 RELATED WORK
Automatic question generation (QG), first studied by Wolfe (Wolfe, 1976) as a means to aid independent study, has attracted much research along two lines of methodology with a certain degree of success: transformative methods and generative methods.
2.1 Transformative Methods
Transformative methods transform key phrases of a single declarative sentence into interrogative sentences; they include rule-based, semantics-based, and template-based methods.
Rule-based methods use a syntactic parser to identify key phrases and transform a sentence into an interrogative sentence based on syntactic rules. They include methods that identify key phrases from input sentences and use syntactic rules for different types of questions (Varga and Ha, 2010), generate QAPs using a syntactic parser, a POS tagger, and an NE analyzer (Ali et al., 2010), transform a sentence into a set of interrogative sentences using a series of domain-independent rules (Danon and Last, 2017), and generate questions using relative pronouns and adverbs from complex English sentences (Khullar et al., 2018).
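As a toy illustration of the rule-based idea (our own example, not taken from any of the systems cited above), the snippet below applies a single hand-written syntactic pattern that turns one declarative form into a question and its answer; real systems use parser-derived rules rather than regular expressions.

```python
# Toy rule-based transformation (illustrative only; not from any cited system).
# One hand-written rule: "<Subject> was born in <Place>." -> "Where was <Subject> born?"
import re

RULES = [
    # (declarative pattern, question template, answer group name)
    (re.compile(r"^(?P<subj>[A-Z][\w ]*?) was born in (?P<place>[A-Z][\w ]*)\.$"),
     "Where was {subj} born?", "place"),
]

def rule_based_qap(sentence):
    for pattern, template, answer_group in RULES:
        m = pattern.match(sentence)
        if m:
            return template.format(**m.groupdict()), m.group(answer_group)
    return None

print(rule_based_qap("Marie Curie was born in Warsaw."))
# -> ('Where was Marie Curie born?', 'Warsaw')
```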
Semantics-based methods create interrogative sentences using predicate-argument structures and semantic roles (Mannem et al., 2010), semantic pattern recognition (Mazidi and Nielsen, 2014), subtopics based on Latent Dirichlet Allocation (Chali and Hasan, 2015), or semantic-role labeling (Flor and Riordan, 2018).
Template-based methods are used for special-purpose applications with built-in templates, including methods based on Natural Language Generation Markup Language (NLGML) (Cai et al., 2006), on phrase structure parsing and enhanced XML (Rus et al., 2007), on self-questioning (Mostow and Chen, 2009), on enhanced self-questioning (Chen, 2009), on pattern matching and templates similar to NLGML (Wyse and Piwek, 2009), on templates with placeholder variables (Lindberg, 2013), and on semantics turned into templates (Lindberg et al., 2013).
2.2 Generative Methods
Recent advances in deep learning have shed new light on generative methods. For example, the attention mechanism (Luong et al., 2015) is used to determine what content in a sentence should be asked about, and the sequence-to-sequence (Bahdanau et al., 2014; Cho et al., 2014) and long short-term memory (Sak et al., 2014) mechanisms are used to generate each word in an interrogative sentence (see, e.g., (Du et al., 2017; Duan et al., 2017; Harrison and Walker, 2018; Sachan and Xing, 2018)). These models, however, only deal with question generation without generating correct answers. Moreover, training these models requires a dataset comprising over 100K interrogative sentences.
To generate answers, researchers have explored ways to encode a passage (a sentence or multiple sentences) and an answer word (or phrase) as input, and to determine what interrogative sentences should be generated for a given answer (Zhou et al., 2018; Zhao et al., 2018; Song et al., 2018). Kim et al. (Kim et al., 2019) pointed out that these models could generate a number of answer-revealing questions (namely, questions that contain their corresponding answers). They then devised a new method that encodes answers separately, at the expense of having substantially more parameters. This method, however, suffers from low accuracy, and it is also unknown whether the generated interrogative sentences are grammatically correct.
Recently, a new method was presented to perform a downstream task of transformers with preprocessing and postprocessing pipelines (TP3) for generating QAPs (Zhang et al., 2022). The authors showed that TP3 using pretrained T5 models (Raffel et al., 2020) outperforms previous models, and human evaluations also confirm the high quality of the QAPs generated by this method. However, TP3 may generate silly questions for certain chunks of text. This calls for further investigation.