CONSISTENT: Open-Ended Question Generation From News Articles
Tuhin Chakrabarty¹, Justin Lewis², Smaranda Muresan¹
1Department of Computer Science, Columbia University
2The New York Times R&D
tuhin.chakr@cs.columbia.edu, justin@justintlewis.com, smara@cs.columbia.edu
Abstract

Recent work on question generation has largely focused on factoid questions such as who, what, where, and when about basic facts. Generating open-ended why, how, and what questions that require long-form answers has proven more difficult. To facilitate the generation of open-ended questions, we propose CONSISTENT, a new end-to-end system for generating open-ended questions that are answerable from and faithful to the input text. Using news articles as a trustworthy foundation for experimentation, we demonstrate our model's strength over several baselines using both automatic and human-based evaluations. We contribute an evaluation dataset of expert-generated open-ended questions and discuss potential downstream applications for news media organizations.
1 Introduction
Factoid questions are relatively straightforward questions that can be answered with single words or short phrases (e.g., who, what, where, when). However, to capture the central idea of a long piece of text, one can ask an open-ended question (e.g., why, how, what) (Cao and Wang, 2021; Gao et al., 2022), which can essentially be viewed as an extreme summary of the text (Narayan et al., 2018) in the form of a question. Generating such questions is particularly difficult because the generated questions must be answerable from and faithful to the given input text (see Table 1).

"Answer-agnostic" (Du et al., 2017; Subramanian et al., 2018; Scialom and Staiano, 2020) and "answer-aware" (Lewis et al., 2021; Song et al., 2018; Zhao et al., 2018; Li et al., 2019) question generation have gained attention in NLP, but these approaches are usually trained by repurposing question answering datasets that are factual in nature, or with trivia-like factoid QA pair datasets where answers are entities or short phrases.
Work done at The New York Times R&D.
At the current rate of COVID-19 vaccination, experts say, it will take months to change the virus's trajectory. In the short term, they worry that the vaccine could present new risks if newly immunized people start socializing without taking precautions. It is not yet clear if the vaccine protects against asymptomatic infection, so vaccinated people may still be able to spread the virus to others.

Seq2Seq: Why are people so worried about the COVID-19 virus?
Seq2Seq + Control: Why is the current rate of vaccination for COVID-19 so worrisome?

Table 1: Example of open-ended questions requiring long-form answers, generated by fine-tuning a Seq2Seq model, BART (Lewis et al., 2020), and by adding explicit control with salient n-grams.
Prior work on long-form question answering (LFQA) (Kwiatkowski et al., 2019a; Fan et al., 2019) focuses on generating answers to open-ended questions that require explanations. We argue that these benchmarks can also be useful for generating diverse, human-like open-ended questions requiring long-form answers.

While question generation often helps in data augmentation for training models (Lewis et al., 2021; Pan et al., 2020), it can also support downstream consumer applications (Section 7). Leading news organizations often rely on human-written QA pairs for frequently asked questions (FAQ) news tools (Figure 1) or as representative headlines for news articles used in article recommendation panels. As seen in Figure 1, a news article about the likelihood of breakthrough infections after Covid-19 vaccination can be summarized in the form of representative question-answer pairs.
We propose a novel end-to-end system, CONSISTENT, for generating open-ended questions that are answerable from and faithful to the input document.
Figure 1: Human-written question-answer pairs as seen on a FAQ news tool about Covid-19 vaccination.
We fine-tune a state-of-the-art pre-trained seq2seq model (Lewis et al., 2020) to generate open-ended questions conditioned on an input paragraph. We further propose methods to ensure better controllability and faithfulness by steering the generated questions towards salient keywords in the paragraph, which act as "control codes" (Keskar et al., 2019). Well-formed generated questions can still be unanswerable, and the filtering methods used in prior work (Lewis et al., 2021) to ensure consistency are not applicable to our task owing to the increased answer length. Thus, we first rely on confidence scores obtained from pre-trained question answering models to filter out simple inconsistent questions. We further evaluate answerability by designing human-readable prompts that elicit answerability judgements from the T0pp model (Sanh et al., 2021), which has shown good zero-shot performance on several NLP benchmarks.
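As an illustration of the filtering stage, the sketch below shows how a pre-trained extractive QA model's confidence score can be used to discard inconsistent questions, and how an answerability judgement might be phrased as a human-readable prompt. The model name, threshold, and prompt wording are illustrative assumptions, not the exact choices used in our system.

```python
from transformers import pipeline

# Illustrative QA checkpoint; any extractive QA model could stand in here.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def passes_qa_filter(question: str, paragraph: str, threshold: float = 0.5) -> bool:
    # Keep a question only if the QA model answers it confidently from
    # the paragraph (the 0.5 threshold is a hypothetical value).
    result = qa(question=question, context=paragraph)
    return result["score"] >= threshold

def answerability_prompt(question: str, paragraph: str) -> str:
    # A human-readable prompt (illustrative wording) eliciting a yes/no
    # answerability judgement from an instruction-tuned model such as T0pp.
    return (
        f"{paragraph}\n\n"
        f'Can the question "{question}" be answered using only the passage '
        f"above? Yes or No?"
    )
```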
We release an evaluation dataset of 529 paragraphs across diverse domains along with human-written open-ended questions. Empirical evaluation using automatic metrics demonstrates that our model outperforms five baselines. Finally, expert evaluation of the two top-performing systems shows that our model is capable of generating high-quality, answerable open-ended questions spanning diverse news topics, rated 3.5 times better than a competitive baseline: a BART model (Lewis et al., 2020) fine-tuned on ELI5 (Fan et al., 2019, Explain Like I'm Five), an existing inquisitive question-answer dataset. Our novel evaluation dataset, code, and models are made publicly available.¹

¹https://github.com/tuhinjubcse/OpenDomainQuestionGeneration
2 Related Work
Question generation can primarily be answer-aware or answer-agnostic. Prior work on answer-agnostic question generation (Du et al., 2017; Subramanian et al., 2018; Nakanishi et al., 2019; Wang et al., 2019; Scialom et al., 2019) focuses on training models that extract question-worthy phrases or sentences and use this information to generate better questions. Scialom and Staiano (2020) paired questions with other sentences in the article that do not contain the answers to generate curiosity-driven questions. However, these approaches are trained by repurposing QA datasets that are factual (Rajpurkar et al., 2016) or conversational (Reddy et al., 2019; Choi et al., 2018). Cao and Wang (2021) focus on generating open-ended questions from input consisting of multiple sentences based on a question type ontology. Most recently, Ko et al. (2020) built question generation models by fine-tuning generative language models on 19K crowd-sourced inquisitive questions from news articles. These questions are elicited from readers as they naturally read through a document sentence by sentence, and are not required to be answerable from the given context or document.
Answer-aware question generation models (Lewis et al., 2021; Song et al., 2018; Zhao et al., 2018; Li et al., 2019) typically encode a passage P and an answer A, letting the decoder generate a question Q auto-regressively. These methods work well in practice and have been shown to improve downstream QA performance. However, despite their efficacy, these methods emphasize simple factoid questions whose answers are short and straightforward spans. Previous work on generating clarification questions (Rao and Daumé III, 2018, 2019; Majumder et al., 2021) uses questions crawled from forums and product reviews; the answers to the questions were used in the models to improve the utility of the generated questions.

Our work differs from prior work in that we focus on generating open-ended questions, which require long-form answers, from news articles. Unlike answer-aware question generation, where models ask a factoid question conditioned on an answer span, our task is challenging as it requires comprehension of the larger context as well as the ability to compress and represent the salient idea of the passage in the form of a question.
3 Data
It's springtime of the pandemic. After the trauma of the last year, the quarantined are emerging into sunlight, and beginning to navigate travel, classrooms and restaurants. And they are discovering that when it comes to returning to the old ways, many feel out of sorts. Do they shake hands? Hug? With or without a mask?

How are people adapting to life after the pandemic?

Table 2: Example from our evaluation data, containing paragraphs from news articles with human-written questions. More examples in Table 9 in Appendix A.
Training Data
Most prior work has successfully trained models for question generation using the SQuAD (Rajpurkar et al., 2016), TriviaQA (Joshi et al., 2017), or NQ (Kwiatkowski et al., 2019b) datasets, the answers in which are typically short. To account for the open-ended nature of our desired questions, we rely on the ELI5 (Fan et al., 2019, Explain Like I'm Five) dataset. The dataset comprises 270K English-language threads in simple language from the Reddit forum of the same name², i.e., easily comprehensible to someone with minimal background knowledge.

Compared to existing datasets, ELI5 comprises diverse questions requiring long-form answers. It contains a significant number of open-ended how/why questions. Interestingly, even what questions tend to require paragraph-length explanations (What is the difference...). As seen in Table 8 in Appendix A, each question is open-ended, inquisitive, and requires an answer that is descriptive in nature. Finally, one of the advantages of the ELI5 dataset is that it covers diverse domains such as science, health, and politics. This quality makes ELI5 an ideal candidate to transfer to the news domain, which similarly covers a diverse range of topics.

²https://www.reddit.com/r/explainlikeimfive/
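For concreteness, the following sketch shows one way to pull (source, target) training pairs out of ELI5 for question generation; the split and field names follow the Hugging Face "eli5" loader and may differ in other releases of the dataset.

```python
from datasets import load_dataset

# Split and field names follow the Hugging Face "eli5" loader.
eli5 = load_dataset("eli5", split="train_eli5")

def to_pair(example):
    # Pair each question title with its first (top-scoring) answer; for
    # question generation, the long-form answer serves as the source text
    # and the question as the generation target.
    return {"source": example["answers"]["text"][0],
            "target": example["title"]}

pairs = eli5.map(to_pair)
```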
Evaluation Data
Since our goal is to generate open-ended questions from news articles, we specifically design our evaluation data to reflect this. To achieve this goal, we obtain English-language articles from The New York Times website from January 2020 to June 2020. We obtained written consent from the copyright holder to use this content for research purposes. An additional advantage of crawling data from The New York Times website is that we can divide news articles by domain, as each news article appears in a specific section of the website. From a given URL³, we can tell that the article belongs to, for example, the Science domain. Additionally, as most pre-trained language models were trained prior to the Covid-19 pandemic, we also test how well they generalize to COVID-19 related news topics.

³https://www.nytimes.com/2021/12/10/science/astronaut-wings-faa-bezos-musk.html
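Since the section name is encoded directly in the article URL path, assigning a domain label is straightforward; a minimal sketch, assuming the year/month/day/section/slug pattern of the footnoted example URL:

```python
from urllib.parse import urlparse

def nyt_section(url: str) -> str:
    # NYT article paths follow /year/month/day/section/slug.html,
    # so the section is the fourth path component.
    parts = urlparse(url).path.strip("/").split("/")
    return parts[3]

# e.g. the footnoted article maps to the Science domain:
print(nyt_section("https://www.nytimes.com/2021/12/10/science/"
                  "astronaut-wings-faa-bezos-musk.html"))  # -> science
```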
Each news article from a particular domain is segmented into several paragraphs. We randomly sample 529 paragraphs spanning six domains: 55 paragraphs from Science, 66 from Climate, 98 from Technology, 110 from Health, 100 from NYRegion, and 100 from Business. While we understand that selecting standalone paragraphs might sometimes ignore the greater context or suffer from co-reference issues, we carefully replace any such paragraphs with others from our larger pool.
As we do not have gold questions associated with each paragraph, we crowd-source human-written questions for each paragraph on Amazon Mechanical Turk. Each paragraph is shown to a distinct crowdworker, who is instructed to read the paragraph carefully and write an open-ended question that is answered by the entire passage. We recruit 96 distinct crowdworkers for this task. After the questions are collected from the first round of crowd-sourcing, two expert news media employees approve or reject them based on quality. The paragraphs with rejected questions are put up again, and through this iterative process and careful quality control we obtain one high-quality open-ended question for each paragraph. Tables 2 and 9 show selected paragraphs from our evaluation set and the associated human-written open-ended questions.
4 CONSISTENT Model
The backbone of our approach is a BART-large model (Lewis et al., 2020) fine-tuned on the ELI5 dataset of question-answer pairs. However, there are two major factors to consider in our end-to-end question generation pipeline: the generated questions i) must be relevant and factually consistent with the input paragraph, and ii) must have their answer self-contained in the input paragraph. Our CONSISTENT model (Figures 2 and 3) addresses these issues as described below.
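As a minimal sketch of the backbone, the snippet below conditions a BART model on an input paragraph prefixed with salient keywords acting as control codes. The separator and keyword formatting are illustrative assumptions, not our exact input format; in practice the model would first be fine-tuned on ELI5 with the same formatting.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def generate_question(paragraph: str, keywords: list[str]) -> str:
    # Prepend salient keywords as control codes (the separator is an
    # illustrative choice) to steer generation towards salient content.
    source = " ; ".join(keywords) + " </s> " + paragraph
    inputs = tok(source, return_tensors="pt", truncation=True, max_length=1024)
    out = model.generate(**inputs, num_beams=5, max_length=64)
    return tok.decode(out[0], skip_special_tokens=True)
```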
Factual Consistency
To ensure faithfulness to the input paragraph, we need to design our model