
Paragraph: It’s springtime of the pandemic. After the trauma of the last year, the quarantined are emerging into sunlight, and beginning to navigate travel, classrooms and restaurants. And they are discovering that when it comes to returning to the old ways, many feel out of sorts. Do they shake hands? Hug? With or without a mask?
Question: How are people adapting to life after the pandemic?
Table 2: Examples of our evaluation data containing paragraphs from news articles with human-written questions. More in Table 9 in Appendix A.
Training Data
Most prior work has successfully trained models for question generation using SQuAD (Rajpurkar et al., 2016), TriviaQA (Joshi et al., 2017), or NQ (Kwiatkowski et al., 2019b) datasets, the answers to which are typically short. To account for the open-ended nature of our desired questions, we rely on the ELI5 (Fan et al., 2019, Explain Like I’m Five) dataset. The dataset comprises 270K English-language threads in simple language from the Reddit forum of the same name (https://www.reddit.com/r/explainlikeimfive/), i.e., easily comprehensible to someone with minimal background knowledge.
Compared to existing datasets, ELI5 comprises diverse questions requiring long-form answers. It contains a significant number of open-ended how/why questions. Interestingly, even what questions tend to require paragraph-length explanations (What is the difference...). As seen in Table 8 in Appendix A, each question is open-ended, inquisitive, and requires an answer that is descriptive in nature. Finally, one of the advantages of the ELI5 dataset is that it covers diverse domains such as science, health, and politics. This quality makes ELI5 an ideal candidate to transfer to the news domain, which similarly covers a diverse range of topics.
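For instance, a single ELI5 thread pairs an open-ended question (the post title) with one or more long-form answers. A quick way to inspect this, assuming the original Hugging Face datasets release of ELI5 and its field names (this snippet is our own illustration, not part of the paper):

```python
from datasets import load_dataset

# Load the ELI5 training split (split and field names follow the
# Hugging Face release of the dataset).
eli5 = load_dataset("eli5", split="train_eli5")

example = eli5[0]
print(example["title"])               # open-ended question, e.g. a how/why question
print(example["answers"]["text"][0])  # paragraph-length, top-scored answer
```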
Evaluation Data
Since our goal is to generate open-ended questions from news articles, we specifically design our evaluation data to reflect the same. To achieve this goal, we obtain English-language articles from The New York Times website from January 2020 to June 2020. We obtained written consent from the copyright holder to use this content for research purposes. An additional advantage of crawling data from The New York Times website is that we can divide news articles by domain, as each news article appears in a specific section of the website. From a given URL such as https://www.nytimes.com/2021/12/10/science/astronaut-wings-faa-bezos-musk.html, we can tell that the article belongs to the Science domain. Additionally, as most pre-trained language models were trained prior to the COVID-19 pandemic, we also test how well they generalize to COVID-19-related news topics.
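As an illustration, the section label can be read directly off the URL path. The following sketch is our own (the paper does not describe its crawler) and assumes the common /YYYY/MM/DD/<section>/<slug>.html pattern of New York Times article URLs:

```python
from urllib.parse import urlparse

def nyt_section(url: str) -> str:
    """Extract the section (domain label) from a NYT article URL.

    Assumes the URL path follows /YYYY/MM/DD/<section>/<slug>.html,
    so the fourth path component names the section.
    """
    parts = urlparse(url).path.strip("/").split("/")
    return parts[3] if len(parts) > 3 else "unknown"

url = "https://www.nytimes.com/2021/12/10/science/astronaut-wings-faa-bezos-musk.html"
print(nyt_section(url))  # -> science
```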
Each news article from a particular domain is segmented into several paragraphs. We randomly sample 529 paragraphs spanning six domains: 55 paragraphs from Science, 66 from Climate, 98 from Technology, 110 from Health, 100 from NYRegion, and 100 from Business. While we understand that selecting standalone paragraphs might sometimes ignore the greater context, or suffer from co-reference issues, we carefully replace any such paragraphs with others from our larger pool.
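A minimal sketch of this per-domain sampling step (our own illustration; the paper does not release this script, and the pool structure here is assumed):

```python
import random

# Sampled paragraph counts per NYT section, as reported above (529 total).
QUOTAS = {"science": 55, "climate": 66, "technology": 98,
          "health": 110, "nyregion": 100, "business": 100}

def sample_paragraphs(paragraphs_by_domain, seed=0):
    """paragraphs_by_domain maps a section name to its pool of paragraphs."""
    rng = random.Random(seed)
    return {domain: rng.sample(paragraphs_by_domain[domain], quota)
            for domain, quota in QUOTAS.items()}
```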
As we do not have gold questions associated with each paragraph, we crowd-source human-written questions for each paragraph on Amazon Mechanical Turk. Each paragraph is shown to a distinct crowdworker, who is instructed to read the paragraph carefully and write an open-ended question that is answered by the entire passage. We recruit 96 distinct crowdworkers for this task. After the questions are collected from the first round of crowd-sourcing, two expert news media employees approve or reject them based on quality. Paragraphs with rejected questions are put up again, and through this iterative process and careful quality control we obtain one high-quality open-ended question associated with each paragraph. Tables 2 and 9 show selected paragraphs from our evaluation set with the associated human-written open-ended questions.
4 CONSISTENT Model
The backbone of our approach is a BART-large (Lewis et al., 2020) model fine-tuned on the ELI5 dataset of question-answer pairs. However, there are two major factors to consider in our end-to-end question generation pipeline: the generated questions (i) must be relevant to and factually consistent with the input paragraph, and (ii) must have their answer self-contained in the input paragraph. Our CONSISTENT model (Figures 2 and 3) addresses these issues as described below.
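For concreteness, the backbone fine-tuning step can be sketched with the Hugging Face transformers library as below. This is our own minimal sketch, not the paper's released code: the direction (ELI5 answer in, question out) follows the requirement that the answer be contained in the input, while the field names assume the Hugging Face ELI5 release and the hyperparameters are illustrative placeholders.

```python
from datasets import load_dataset
from transformers import (BartForConditionalGeneration, BartTokenizerFast,
                          DataCollatorForSeq2Seq, Trainer, TrainingArguments)

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Train on (answer -> question) pairs: the long-form answer plays the role
# of the input paragraph, and the thread title is the open-ended question.
eli5 = load_dataset("eli5", split="train_eli5")

def to_features(example):
    inputs = tokenizer(example["answers"]["text"][0],
                       max_length=1024, truncation=True)
    labels = tokenizer(text_target=example["title"],
                       max_length=64, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

train_set = eli5.map(to_features, remove_columns=eli5.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bart-large-eli5-qgen",
                           per_device_train_batch_size=4,
                           num_train_epochs=3),
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    train_dataset=train_set,
)
trainer.train()
```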
Factual Consistency
To ensure faithfulness to
the input paragraph, we need to design our model