On QAPs generated from the official SAT practice
reading tests, evaluations by human judges indicate
that 97% of the QAPs are both grammatically and se-
mantically correct.
The initial training dataset is not large enough to
cover the tag-set sequences of a number of declara-
tive sentences in the SAT reading-test passages. We
manually add new interrogative sentences for some of
these declarative sentences and show that TSS-Learner
is able to learn new rules and use them to generate
additional adequate QAPs.
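The tag-set-sequence idea above can be illustrated with a toy lookup, in which a learned rule maps the POS tag sequence of a declarative sentence to a question template. This is a hypothetical simplification for illustration only; the tags, rules, and function names below are not the actual TSS-Learner implementation.

```python
# A "rule" maps the POS tag sequence of a declarative sentence to a
# transformation that produces an interrogative sentence.
# Tags follow the Penn Treebank convention (NNP = proper noun,
# VBD = past-tense verb); the single rule here is illustrative.
RULES = {
    ("NNP", "VBD", "NNP"): lambda toks: f"Who {toks[1]} {toks[2]}?",
}

def generate_question(tokens, tags):
    """Return a question if a learned rule matches the tag sequence,
    else None (an unmatched sequence is a candidate for rule learning)."""
    rule = RULES.get(tuple(tags))
    if rule is None:
        return None
    return rule(tokens)

print(generate_question(["Alice", "met", "Bob"], ["NNP", "VBD", "NNP"]))
# -> Who met Bob?
print(generate_question(["Cats", "sleep"], ["NNS", "VBP"]))
# -> None (no matching rule learned yet)
```

When a declarative sentence's tag sequence has no matching rule, a human-supplied interrogative sentence for it can be generalized into a new entry in the rule table, which is the learning step described above.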
The rest of the paper is organized as follows: We
summarize in Section 2 related work and present in
Section 3 a general framework of tag-set-sequence
learning. We then present in Section 4 an implemen-
tation of TSS-Learner for the English language and
describe evaluation results in Section 5. Section 6
concludes the paper.
2 RELATED WORK
Automatic question generation (QG), first studied by
Wolfe (1976) as a means to aid independent study,
has attracted much research along two lines of
methodology, each with a certain degree of success:
transformative methods and generative methods.
2.1 Transformative Methods
Transformative methods transform key phrases of a
single declarative sentence into interrogative sen-
tences; they include rule-based, semantics-based, and
template-based methods.
Rule-based methods parse sentences with a syn-
tactic parser to identify key phrases and transform a
sentence into an interrogative sentence using syntac-
tic rules. Examples include methods that identify key
phrases from input sentences and apply syntactic rules
for different types of questions (Varga and Ha, 2010),
generate QAPs using a syntactic parser, a POS tagger,
and an NE analyzer (Ali et al., 2010), transform a
sentence into a set of interrogative sentences using a
series of domain-independent rules (Danon and Last,
2017), and generate questions using relative pronouns
and adverbs from complex English sentences (Khullar
et al., 2018).
Semantics-based methods create interrogative
sentences using predicate-argument structures and se-
mantic roles (Mannem et al., 2010), semantic pattern
recognition (Mazidi and Nielsen, 2014), subtopics
based on Latent Dirichlet Allocation (Chali and
Hasan, 2015), or semantic-role labeling (Flor and Ri-
ordan, 2018).
Template-based methods are used for special-
purpose applications with built-in templates, includ-
ing methods based on Natural Language Generation
Markup Language (NLGML) (Cai et al., 2006), on
phrase structure parsing and enhanced XML (Rus
et al., 2007), on self questioning (Mostow and Chen,
2009), on enhanced self-questioning (Chen, 2009), on
pattern matching and templates similar to NLGML
(Wyse and Piwek, 2009), on templates with place-
holder variables (Lindberg, 2013), and on semantics
turned to templates (Lindberg et al., 2013).
2.2 Generative Methods
Recent advances in deep learning have shed new light
on generative methods. For example, the attention
mechanism (Luong et al., 2015) is used to determine
what content in a sentence should be asked about, and
the sequence-to-sequence (Bahdanau et al., 2014; Cho
et al., 2014) and long short-term memory (Sak
et al., 2014) mechanisms are used to generate each
word in an interrogative sentence (see, e.g., (Du et al.,
2017; Duan et al., 2017; Harrison and Walker, 2018;
Sachan and Xing, 2018)). These models, however,
only deal with question generation without generat-
ing correct answers. Moreover, training these models
requires a dataset comprising over 100K interrogative
sentences.
To generate answers, researchers have explored
ways to encode a passage (a sentence or multiple sen-
tences) and an answer word (or a phrase) as input,
and determine what interrogative sentences are to be
generated for a given answer (Zhou et al., 2018; Zhao
et al., 2018; Song et al., 2018). Kim et al. (2019)
pointed out that these models could generate a
number of answer-revealing questions (namely, ques-
tions that contain their own answers).
They then devised a new method by encoding an-
swers separately, at the expense of having substan-
tially more parameters. This method, however, suffers
from low accuracy, and it is also unknown whether the
generated interrogative sentences are grammatically
correct.
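The answer-revealing problem described above can be made concrete with a minimal check: a generated question is flagged if it already contains its answer. The substring test below is a hypothetical simplification for illustration; Kim et al. (2019) address the problem at the model level by encoding answers separately rather than by post-hoc filtering.

```python
def is_answer_revealing(question: str, answer: str) -> bool:
    """Return True if the generated question leaks its own answer.
    A case-insensitive substring match is a deliberately crude proxy."""
    return answer.lower() in question.lower()

# A question that restates its answer defeats the purpose of a QAP:
print(is_answer_revealing("Who wrote Hamlet with Shakespeare?", "Shakespeare"))
# -> True (the answer appears verbatim in the question)
print(is_answer_revealing("Who wrote Hamlet?", "Shakespeare"))
# -> False
```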
Recently, a new method was presented that performs
a downstream task of transformers with preprocess-
ing and postprocessing pipelines (TP3) for generating
QAPs (Zhang et al., 2022). The authors showed that
TP3 using pretrained T5 models (Raffel et al., 2020)
outperforms previous models, and human evaluations
confirm the high quality of QAPs generated by this
method. However, TP3 may generate silly questions
for certain chunks of text. This calls for further in-