Closed-book Question Generation via Contrastive Learning
Xiangjue Dong1, Jiaying Lu2
, Jianling Wang1
, James Caverlee1
1Texas A&M University, 2Emory University
{xj.dong, jlwang, caverlee}@tamu.edu, jiaying.lu@emory.edu
Abstract
Question Generation (QG) is a fundamental
NLP task for many downstream applications.
Recent studies on open-book QG, where sup-
portive answer-context pairs are provided to
models, have achieved promising progress.
However, generating natural questions under a
more practical closed-book setting that lacks
these supporting documents remains a
challenge. In this work, we propose a new
QG model for this closed-book setting that
is designed to better understand the seman-
tics of long-form abstractive answers and store
more information in its parameters through
contrastive learning and an answer reconstruc-
tion module. Through experiments, we vali-
date the proposed QG model on both public
datasets and a new WikiCQA dataset. Empir-
ical results show that the proposed QG model
outperforms baselines in both automatic eval-
uation and human evaluation. In addition, we
show how to leverage the proposed model to
improve existing question-answering systems.
These results further indicate the effectiveness
of our QG model for enhancing closed-book
question-answering tasks.
1 Introduction
Question Generation (QG) has a wide range of
applications, such as generating questions for ex-
ams (Jia et al.,2021;Lelkes et al.,2021;Dugan
et al.,2022) or children’s story books (Zhao et al.,
2022;Yao et al.,2022), recommending questions
for users in a dialogue system (Shukla et al.,2019;
Laban et al.,2020), improving visual (Li et al.,
2018;Lu et al.,2022) or textual question-answering
tasks (Duan et al.,2017;Lewis et al.,2019a;Zhang
and Bansal,2019;Sultan et al.,2020;Lyu et al.,
2021), asking clarification questions (Rao and
Daumé III,2019;Yu et al.,2020;Ren et al.,2021),
and generating queries for SQL (Wu et al.,2021)
or multimodal documents (Kim et al.,2021).
Equal Contribution
Previous works on QG are mainly under the open-
book setting, which aims to generate questions
based on factoid or human-generated short an-
swers under the assumption that there is access
to external knowledge like retrieved documents
or passages (Du et al.,2017;Zhao et al.,2018;
Kim et al.,2019;Fei et al.,2021). After Roberts
et al. (2020) demonstrated that feeding a large
pre-trained model input questions alone without
any external knowledge yields results competitive with retrieval-based methods on open-domain question-answering benchmarks, there has been increasing interest in the closed-book setting. This
closed-book setting is appealing in practice and
can be widely applied, e.g., in question sugges-
tion (Laban et al.,2020;Yin et al.,2021), query
recommendation (Kim et al.,2021), and other prac-
tical settings where extensive external knowledge
is unavailable.
However, generating questions without access
to such external knowledge is challenging for two
key reasons. First, without access to retrieved doc-
uments (or passages), simple open-domain strate-
gies like basing the answers on these documents (or
passages) are not possible under the closed-book
setting. Instead, models must rely on the answers
alone. Second, the data used by most of the closed-
book works (Lewis et al.,2021;Wang et al.,2021)
are variants of existing open-domain datasets, e.g.,
SQuAD (Rajpurkar et al.,2018), TriviaQA (Joshi
et al.,2017), WebQuestions (Berant et al.,2013)
that ignore the answer-related passages. These an-
swers in open-book works are usually short (e.g., entities) and thus easier for a language model to memorize and store in its parameters than long-form answers. Thus, this leads
to our motivating research question – How can we
empower a QG model to better understand the se-
mantics of long-form abstractive answers and store
more information in its parameters?
arXiv:2210.06781v2 [cs.CL] 10 Feb 2023
To tackle these challenges in the closed-book setting, this paper proposes a
new QG model with two unique characteristics: (i)
a contrastive learning loss designed to better under-
stand the semantics of the answers and the seman-
tic relationship between answers and ground-truth
questions at the contextual level; and (ii) an answer
reconstruction loss designed to measure the an-
swerability of the generated question. Contrastive
learning has shown promising results in many NLP
tasks, e.g., (Giorgi et al.,2021;Gao et al.,2021;
Yang et al.,2021) and aligns positive pairs bet-
ter with available supervised signals (Gao et al.,
2021); here we show how to learn question rep-
resentations by distinguishing features of correct
question-answer pairs from features of incorrectly
linked question-answer pairs. Further, to ensure that the generated questions are of good quality and can be answered by the answer used for generation, we frame the model as a generation-reconstruction process (Cao et al., 2019; Zhu et al., 2020): a pre-trained seq2seq model predicts the original answer given the generated question. In addition, we introduce a new closed-
book dataset with long-form abstractive answers –
WikiCQA – to complement existing datasets like
GooAQ (Khashabi et al.,2021) and ELI5 (Fan
et al.,2019) and show how to leverage our model
to generate synthetic data to improve closed-book
question-answering tasks.
Through experiments, we find that the proposed
QG model shows improvement through both auto-
matic and human evaluation metrics on WikiCQA
and two public datasets. Compared to the base-
line, the proposed QG framework shows an im-
provement of up to 2.0%, 2.7%, and 1.8% on
the ROUGE-L score on WikiCQA, GooAQ-S, and
ELI5, respectively, and 1.3% and 2.6% in terms of
relevance and correctness. Furthermore, we lever-
age the QG framework to generate synthetic QA
data from WikiHow summary data and pre-train
a closed-book QA model on it in both an unsu-
pervised and semi-supervised setting. The perfor-
mance is evaluated on both seen (WikiCQA) and
unseen (GooAQ-S, ELI5) datasets. We find consis-
tent improvements across these datasets, indicating
the QG model’s effectiveness in enhancing closed-
book question-answering tasks.
In conclusion, our contributions can be summa-
rized as follows:
• We propose a contrastive QG model, which to our knowledge is the first work to explore contrastive learning for QG under a closed-book setting.
• The proposed model outperforms baselines on three datasets. Human evaluation also indicates that the questions generated by our model are more informative than those of the baselines.
• We leverage the QG model as a data augmentation strategy to generate large-scale QA pairs. Consistent improvements on both seen and unseen datasets indicate the QG model's effectiveness in enhancing closed-book question-answering tasks.
2 Related Work
Many previous works on QG operate under the open-book setting, taking factoid short answers (Rajpurkar et al., 2016) or human-generated short answers (Kočiský et al., 2018) together with the corresponding passages to generate questions (Zhang
et al.,2021). Early approaches for question genera-
tion rely on rule-based methods (Labutov et al.,
2015;Khullar et al.,2018). To bypass hand-
crafted rules and sophisticated pipelines in QG,
Du et al. (2017) introduce a vanilla RNN-based
sequence-to-sequence approach with an attention
mechanism. The recently proposed pre-trained
transformer-based frameworks (Lewis et al.,2020;
Raffel et al.,2020) also improve the performance
of QG. In addition, Sultan et al. (2020) show that the lexical and factual diversity of QG provides better QA training. However, these successes cannot be directly transferred to the closed-book setting, where the model must generate questions relying solely on answers. In this work, we explore the
widely applicable closed-book QG setting, which
is still under-explored.
Contrastive Learning
aims to pull semantically
similar neighbors close and push non-neighbors
apart. It has achieved great success under both
supervised and unsupervised settings. In pioneering work, the contrastive loss function (Hadsell et al., 2006; Chopra et al., 2005) was proposed as a training objective in deep metric learning, considering both similar and dissimilar pairs. Recently,
Chen et al. (2020) propose the SimCLR framework to learn useful visual representations. Viewing contrastive learning as dictionary look-up, He
et al. (2020) present Momentum Contrast (MoCo)
to build dynamic dictionaries for contrastive learn-
ing. Some works apply contrastive learning to the NLP domain to learn better sentence representations (Giorgi et al., 2021; Gao et al., 2021).
In addition, contrastive learning has been applied
in multilingual neural machine translation (Pan
et al.,2021), abstractive summarization (Liu and
Liu,2021), and multi-document question genera-
tion (Cho et al., 2021). The most relevant recent work is Yang et al. (2021), who design two contrastive losses for paraphrase generation. In this
work, we adopt contrastive learning for improv-
ing representation learning in question generation
under a closed-book setting.
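The classic pairwise objective described above can be sketched as follows. This is a minimal illustration in the spirit of Hadsell et al. (2006); the function name and margin value are illustrative, not taken from any of the cited works:

```python
def margin_contrastive_loss(dist: float, similar: bool, margin: float = 1.0) -> float:
    """Pairwise contrastive loss over embedding distances:
    similar pairs are pulled together (loss grows with distance),
    dissimilar pairs are pushed apart up to a margin.

    dist: distance between the two embeddings.
    similar: True if the pair is a positive (semantically similar) pair.
    """
    if similar:
        return dist ** 2
    return max(0.0, margin - dist) ** 2
```

Dissimilar pairs already farther apart than the margin contribute zero loss, so the model is not penalized for negatives it has already separated.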
3 Proposed Approach
To answer our research question – How can we
empower a QG model to better understand the se-
mantics of long-form abstractive answers and store
more information in its parameters? – we propose a
closed-book QG model, which generates questions
directly without access to external knowledge. Formally, given an answer sentence $x$, a closed-book QG engine generates a natural question $y$. Figure 1 illustrates an overview of the proposed QG framework, which consists of three parts: question generation, contrastive learning, and answer reconstruction. The framework is optimized with the joint losses from these three parts simultaneously.
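The joint optimization can be sketched as a weighted sum of the three losses; the weights and function name below are illustrative assumptions, since the paper's exact weighting scheme is not shown in this excerpt:

```python
def joint_loss(l_qg: float, l_cl: float, l_ar: float,
               w_cl: float = 1.0, w_ar: float = 1.0) -> float:
    """Combine the question-generation (NLL), contrastive, and
    answer-reconstruction losses into one training objective.
    w_cl and w_ar are illustrative hyperparameters."""
    return l_qg + w_cl * l_cl + w_ar * l_ar
```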
Figure 1: An overview of the proposed closed-book QG framework, which consists of three parts: contrastive learning, question generation, and answer reconstruction. $A_i$ represents answer $i$; $Q_i$ represents question $i$. (Diagram not reproduced; it depicts answer and question encoders, a decoder, Gumbel-Softmax sampling, and a reconstructor.)
3.1 Question Generation
We first focus on question generation through a
sequence-to-sequence architecture which consists
of an encoder and a decoder (Sutskever et al.,2014;
Vaswani et al., 2017). The encoder takes an input sequence of source words $x = (x_1, x_2, \ldots, x_n)$ and maps it to a sequence of continuous representations $z = (z_1, z_2, \ldots, z_n)$. Then, the decoder takes $z$ and generates a sequence of target words $y = (y_1, y_2, \ldots, y_m)$ one token at a time. The closed-book QG task is defined as finding

$$\hat{y} = \arg\max_{y} P(y \mid x), \quad (1)$$

where $P(y \mid x)$ is the conditional likelihood of the predicted question sequence $y$ given answer $x$:

$$P(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, x). \quad (2)$$
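In practice the exact argmax in Eq. 1 is intractable, so decoders approximate it, e.g. with greedy or beam search. A toy greedy-decoding sketch (the decoder stub and token ids are illustrative stand-ins, not from the paper):

```python
def greedy_decode(next_token_dist, max_len: int, eos: int = 0):
    """Approximate Eq. 1 by greedily picking the most probable
    next token at each step.

    next_token_dist: callable mapping a prefix tuple of token ids
    to a dict {token_id: p(token | prefix, x)}; a toy stand-in
    for a real seq2seq decoder.
    """
    y = []
    for _ in range(max_len):
        dist = next_token_dist(tuple(y))
        token = max(dist, key=dist.get)  # most probable continuation
        if token == eos:
            break
        y.append(token)
    return y
```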
Given the answer-question pairs, the training objective of the generation part of the proposed framework is to minimize the Negative Log-Likelihood (NLL) of the training data:

$$\mathcal{L}_{qg} = -\sum_{i=1}^{N} \log p(q_i \mid A), \quad (3)$$

where $q_i$ is the $i$-th token in the generated question and $A$ is the answer.
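Under teacher forcing, Eq. 3 sums the per-token negative log-probabilities of the ground-truth question. A toy sketch (the probability tables below stand in for real decoder outputs):

```python
import math

def qg_nll(target_ids, step_distributions):
    """Negative log-likelihood of a ground-truth question (Eq. 3):
    L_qg = -sum_i log p(q_i | A), computed with teacher forcing.

    target_ids: token ids q_1..q_N of the ground-truth question.
    step_distributions: for each step i, a dict {token_id: prob}
    predicted by the decoder (toy stand-ins here).
    """
    return -sum(math.log(dist[t])
                for t, dist in zip(target_ids, step_distributions))
```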
A naive question generation model generates questions based on answers, but it neither models the semantics of the answers richly nor guarantees that the generated questions are semantically related to the answers. Intuitively, an encoded answer
should be similar to its question and dissimilar to
others. In addition, the generated question should
be able to be answered by the answers. Hence, this
motivates the following contrastive learning and
answer reconstruction modules.
3.2 Contrastive Learning
Contrastive learning aims to pull positive pairs together and push negative pairs apart to learn effective representations. Further, the supervised signals can produce
better sentence embeddings by improving align-
ment between positive pairs (Chen et al.,2020).
An effective QG model should be able to under-
stand the semantics of the answers and the semantic
relationship with the ground-truth questions. In particular, the encoded answer should be semantically similar to its ground-truth question and dissimilar to other questions. Thus, aiming to learn a similarity function that pulls the representation of an answer sequence and the representation of its ground-truth question sequence closer together, we design a contrastive loss in the representation space. Specifically, given a batch of positive pairs $S = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ and $y_i$ are semantically related inputs, the other $2(n-1)$ examples