QA Domain Adaptation using Hidden Space Augmentation and
Self-Supervised Contrastive Adaptation
Zhenrui Yue
UIUC
zhenrui3@illinois.edu
Huimin Zeng
UIUC
huiminz3@illinois.edu
(Both authors contributed equally to this research.)
Bernhard Kratzwald
EthonAI
bernhard.kratzwald@ethon.ai
Stefan Feuerriegel
LMU Munich
feuerriegel@lmu.de
Dong Wang
UIUC
dwang24@illinois.edu
Abstract
Question answering (QA) has recently shown impressive results for answering questions from customized domains. Yet, a common challenge is to adapt QA models to an unseen target domain. In this paper, we propose a novel self-supervised framework called QADA for QA domain adaptation. QADA introduces a novel data augmentation pipeline used to augment training QA samples. Different from existing methods, we enrich the samples via hidden space augmentation. For questions, we introduce multi-hop synonyms and sample augmented token embeddings with Dirichlet distributions. For contexts, we develop an augmentation method which learns to drop context spans via a custom attentive sampling strategy. Additionally, contrastive learning is integrated in the proposed self-supervised adaptation framework QADA. Unlike existing approaches, we generate pseudo labels and propose to train the model via a novel attention-based contrastive adaptation method. The attention weights are used to build informative features for discrepancy estimation that helps the QA model separate answers and generalize across source and target domains. To the best of our knowledge, our work is the first to leverage hidden space augmentation and attention-based contrastive adaptation for self-supervised domain adaptation in QA. Our evaluation shows that QADA achieves considerable improvements on multiple target datasets over state-of-the-art baselines in QA domain adaptation.
1 Introduction
Question answering (QA) is the task of finding answers for a given context and a given question. QA models are typically trained using data triplets consisting of context, question and answer. In the case of extractive QA, answers are represented as subspans in the context defined by a start position and an end position, while question and context are given as running text (e.g., Seo et al., 2016; Chen et al., 2017; Devlin et al., 2019; Kratzwald et al., 2019).
A common challenge in extractive QA is that QA models often suffer from performance deterioration upon deployment and thus make mistakes for user-generated inputs. The underlying reason for such deterioration can be traced back to the domain shift between training data (from the source domain) and test data (from the target domain) (Fisch et al., 2019; Miller et al., 2020; Zeng et al., 2022b).
Existing approaches to address domain shifts in extractive QA can be grouped as follows. One approach is to include labeled target examples or user feedback during training (Daumé III, 2007; Kratzwald and Feuerriegel, 2019a; Kratzwald et al., 2020; Kamath et al., 2020). Another approach is to generate labeled QA samples in the target domain for training (Lee et al., 2020; Yue et al., 2021a, 2022a). However, these approaches typically require large amounts of annotated data or extensive computational resources. As such, they tend to be ineffective in adapting existing QA models to an unseen target domain (Fisch et al., 2019). Only recently, a contrastive loss has been proposed to handle domain adaptation in QA (Yue et al., 2021b).
Several approaches have been used to address issues related to insufficient data and generalization in NLP tasks, yet outside of QA. For example, augmentation in the hidden space encourages more generalizable features for training (Verma et al., 2019; Chen et al., 2020, 2021). For domain adaptation, there are approaches that encourage the model to learn domain-invariant features via a domain critic (Lee et al., 2019; Cao et al., 2020), or adopt discrepancy regularization between the source and target domains (Kang et al., 2019; Yue et al., 2022b). However, to the best of our knowledge, no work has attempted to build a smooth and generalized feature space via hidden space augmentation and self-supervised domain adaptation.
In this paper, we propose a novel self-supervised QA domain adaptation framework for extractive QA called QADA. Our QADA framework is designed to handle domain shifts and should thus answer out-of-domain questions. QADA has three stages, namely pseudo labeling, hidden space augmentation, and self-supervised domain adaptation. First, we use pseudo labeling to generate and filter labeled target QA data. Second, the augmentation component integrates a novel pipeline for data augmentation to enrich training samples in the hidden space. For questions, we build upon multi-hop synonyms and introduce Dirichlet neighborhood sampling in the embedding space to generate augmented tokens. For contexts, we develop an attentive context cutoff method which learns to drop context spans via a sampling strategy using attention scores. Third, for training, we propose to train the QA model via a novel attention-based contrastive adaptation. Specifically, we use the attention weights to sample informative features that help the QA model separate answers and generalize across the source and target domains.
Our main contributions are as follows (the code for our QADA framework is publicly available at https://github.com/Yueeeeeeee/Self-Supervised-QA):

1. We propose a novel, self-supervised framework called QADA for domain adaptation in QA. QADA aims at answering out-of-domain questions and should thus handle the domain shift upon deployment in an unseen domain.

2. To the best of our knowledge, QADA is the first work in QA domain adaptation that (i) leverages hidden space augmentation to enrich training data; and (ii) integrates attention-based contrastive learning for self-supervised adaptation.

3. We demonstrate the effectiveness of QADA in an unsupervised setting where target answers are not accessible. Here, QADA can considerably outperform state-of-the-art baselines on multiple datasets for QA domain adaptation.
2 Related Work
Extractive QA has achieved significant progress recently (Devlin et al., 2019; Kratzwald et al., 2019; Lan et al., 2020; Zhang et al., 2020). Yet, the accuracy of QA models can drop drastically under domain shifts; that is, when deployed in an unseen domain that differs from the training distribution (Fisch et al., 2019; Talmor and Berant, 2019).
To overcome the above challenge, various approaches for QA domain adaptation have been proposed, which can be categorized as follows. (1) (Semi-)supervised adaptation uses partially labeled data from the target distribution for training (Yang et al., 2017; Kratzwald and Feuerriegel, 2019b; Yue et al., 2022a). (2) Unsupervised adaptation with question generation refers to settings where only context paragraphs in the target domain are available; QA samples are generated separately to train the QA model (Shakeri et al., 2020; Yue et al., 2021b). (3) Unsupervised adaptation has access to context and question information from the target domain, whereas answers are unavailable (Chung et al., 2018; Cao et al., 2020; Yue et al., 2022d). In this paper, we focus on the third category and study the problem of unsupervised QA domain adaptation.
Domain adaptation for QA: Several approaches have been developed to generate synthetic QA samples via question generation (QG) in an end-to-end fashion (i.e., seq2seq) (Du et al., 2017; Sun et al., 2018). Leveraging such samples from QG can also improve the QA performance in out-of-domain distributions (Golub et al., 2017; Tang et al., 2017, 2018; Lee et al., 2020; Shakeri et al., 2020; Yue et al., 2022a; Zeng et al., 2022a). Given unlabeled questions, there are two main approaches: domain adversarial training can be applied to reduce feature discrepancy between domains (Lee et al., 2019; Cao et al., 2020), while contrastive adaptation minimizes the domain discrepancy using maximum mean discrepancy (MMD) (Yue et al., 2021b, 2022d). We later use the idea from contrastive learning but tailor it carefully for our adaptation framework.
Data augmentation for NLP: Data augmentation for NLP aims at improving language understanding with diverse data samples. One approach is to apply token-level augmentation and enrich the training data with simple techniques (e.g., synonym replacement, token swapping, etc.) (Wei and Zou, 2019) or custom heuristics (McCoy et al., 2019). Alternatively, augmentation can be done in the hidden space of the underlying model (Chen et al., 2020). For example, one can drop partial spans in the hidden space, which aids generalization performance under distributional shifts (Chen et al., 2021), though only in NLP tasks outside of QA. To the best of our knowledge, we are the first to propose a hidden space augmentation pipeline tailored to QA data, in which different strategies are combined for question and context augmentation.
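As an illustration of span dropping in the hidden space, the following sketch zeroes out one random contiguous span of hidden states per sequence. This is a minimal example of the general cutoff idea cited above, not the attentive, attention-score-based variant QADA introduces; the cutoff ratio and masking-by-zeroing are our assumptions.

```python
import torch

def span_cutoff(hidden_states: torch.Tensor, cutoff_ratio: float = 0.1) -> torch.Tensor:
    """Drop (zero out) one random contiguous span of hidden states per sequence.
    hidden_states: (batch, seq_len, dim). Returns an augmented copy."""
    batch, seq_len, _ = hidden_states.shape
    augmented = hidden_states.clone()
    span = max(1, int(seq_len * cutoff_ratio))
    for b in range(batch):
        start = torch.randint(0, seq_len - span + 1, (1,)).item()
        augmented[b, start:start + span, :] = 0.0   # drop the span in hidden space
    return augmented

# Toy usage: a batch of 2 sequences with 16 tokens and 8-dim hidden states.
h = torch.randn(2, 16, 8)
h_aug = span_cutoff(h, cutoff_ratio=0.25)
```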
Contrastive learning for domain adaptation: Contrastive learning is used to minimize the distances of same-class samples and maximize the discrepancy among classes (Hadsell et al., 2006). For this, different metrics are adopted to measure pair-wise distances (e.g., triplet loss) or domain distances with MMD (Cheng et al., 2016; Schroff et al., 2015). Contrastive learning can also be used for domain adaptation by reducing the domain discrepancy: this "pulls together" intra-class features and "pushes apart" inter-class representations. Here, several applications are in computer vision (Kang et al., 2019). In QA domain adaptation, contrastive learning was applied with averaged token features to separate answer tokens and minimize the discrepancy between the source and target domains (Yue et al., 2021b, 2022d). However, our work is different in that we introduce a novel attention-based strategy to construct more informative features for discrepancy estimation and contrastive adaptation.
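For intuition, the sketch below computes an MMD-style contrastive term between answer and non-answer features from the source and target domains, with an attention-weighted pooling helper. The pooling scheme, the linear kernel, and the loss composition are our assumptions for illustration and do not reproduce QADA's exact formulation.

```python
import torch

def attention_pool(token_feats: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
    """Pool token features (n, d) into a single vector using attention scores (n,)."""
    weights = torch.softmax(attn, dim=0)
    return weights @ token_feats                                     # (d,)

def mmd(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Squared MMD with a linear kernel: ||mean(x) - mean(y)||^2."""
    return (x.mean(dim=0) - y.mean(dim=0)).pow(2).sum()

def contrastive_adaptation_loss(src_answer, src_other, tgt_answer, tgt_other):
    """Pull answer features together across domains, push answer and non-answer
    features apart. Inputs are (n, d) tensors of pooled feature vectors."""
    intra_class = mmd(src_answer, tgt_answer)                        # align answers across domains
    inter_class = mmd(torch.cat([src_answer, tgt_answer]),
                      torch.cat([src_other, tgt_other]))             # separate answers from the rest
    return intra_class - inter_class

# Toy usage with random 8-dim features.
d = 8
pooled = attention_pool(torch.randn(5, d), torch.randn(5))           # one attention-pooled feature
loss = contrastive_adaptation_loss(torch.randn(4, d), torch.randn(6, d),
                                   torch.randn(4, d), torch.randn(6, d))
```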
3 Setup
We consider the following problem setup for QA domain adaptation, where labeled source data and unlabeled target data are available for training. Our goal is to train a QA model $f$ that maximizes the performance in the target domain using both source data and unlabeled target data (Cao et al., 2020; Shakeri et al., 2020; Yue et al., 2021b, 2022d).
Data: Our research focuses on question answering under domain shift. Let $\mathcal{D}_s$ denote the source domain, and let $\mathcal{D}_t$ denote the (different) target domain. Then, labeled data from the source domain can be used for training, while, upon deployment, the model should perform well on data from the target domain. Specifically, training is two-fold: we first pretrain a QA model on the source domain $\mathcal{D}_s$ and, following this, the pretrained QA model is adapted to the target domain $\mathcal{D}_t$. The input data for each domain is as follows:
Labeled source data: Training data is provided by labeled QA data $X_s$ from the source domain $\mathcal{D}_s$. Here, each sample $(x^{(i)}_{s,c}, x^{(i)}_{s,q}, x^{(i)}_{s,a}) \in X_s$ is a triplet comprising a context $x^{(i)}_{s,c}$, a question $x^{(i)}_{s,q}$, and an answer $x^{(i)}_{s,a}$. As we consider extractive QA, the answer is represented by the start and end position in the context.
Unlabeled target data: We assume partial access to data from the target domain $\mathcal{D}_t$, that is, only contexts and unlabeled questions. The contexts and questions are first used for pseudo labeling, followed by self-supervised adaptation. Formally, we refer to the contexts and questions via $x^{(i)}_{t,c}$ and $x^{(i)}_{t,q}$, with $(x^{(i)}_{t,c}, x^{(i)}_{t,q}) \in X'_t$, where $X'_t$ is the unlabeled data from the target domain.
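Purely to fix the notation in code form, a minimal sketch of the two kinds of training samples might look as follows; the class and field names are our own, and the answer is stored as start and end positions in the context, as is standard for extractive QA.

```python
from dataclasses import dataclass

@dataclass
class SourceSample:
    """Labeled source-domain triplet (context, question, answer span)."""
    context: str
    question: str
    answer_start: int   # start position of the answer span in `context`
    answer_end: int     # end position of the answer span in `context`

@dataclass
class TargetSample:
    """Unlabeled target-domain pair; the answer is unknown and will be
    filled in later by pseudo labeling."""
    context: str
    question: str
```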
Model: The QA model can be represented by a function $f$; $f$ takes both a question and a context as input and predicts an answer, i.e., $x^{(i)}_a = f(x^{(i)}_q, x^{(i)}_c)$. Upon deployment, our goal is to maximize the model performance on $X_t$ in the target domain $\mathcal{D}_t$. Mathematically, this corresponds to the optimization of $f$ over the target data $X_t$:

$$\min_f \; \mathcal{L}_{ce}(f, X_t), \qquad (1)$$

where $\mathcal{L}_{ce}$ is the cross-entropy loss.
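To make the objective concrete, the sketch below computes the standard cross-entropy over start and end positions used in extractive QA; averaging the two position losses is a common convention we assume here, not necessarily the paper's exact choice.

```python
import torch
import torch.nn.functional as F

def qa_cross_entropy(start_logits, end_logits, start_positions, end_positions):
    """Cross-entropy loss for extractive QA: the answer span is defined by its
    start and end token positions, each classified over the sequence length.
    Logits: (batch, seq_len); positions: (batch,)."""
    start_loss = F.cross_entropy(start_logits, start_positions)
    end_loss = F.cross_entropy(end_logits, end_positions)
    return (start_loss + end_loss) / 2.0   # average the two position losses

# Toy usage: batch of 2 sequences of length 16 (random logits standing in for model output).
start_logits = torch.randn(2, 16, requires_grad=True)
end_logits = torch.randn(2, 16, requires_grad=True)
loss = qa_cross_entropy(start_logits, end_logits,
                        torch.tensor([3, 7]), torch.tensor([5, 9]))
loss.backward()
```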
4 The QADA Framework
4.1 Overview
Our proposed QADA framework has three stages to be performed in each epoch (see Fig. 1): (1) pseudo labeling, where pseudo labels are generated for the unlabeled target data; (2) hidden space augmentation, in which the proposed augmentation strategy is leveraged to generate virtual examples in the feature space; and (3) contrastive adaptation, which minimizes domain discrepancy to transfer source knowledge to the target domain.
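As a high-level illustration of how the three stages could be sequenced within one epoch, consider the sketch below; the function names and the stub bodies are placeholders we introduce for illustration, not components of the released code.

```python
# Placeholder stage functions: assumptions for illustration only.
def generate_pseudo_labels(model, unlabeled_batches):             # stage 1 (stub)
    """Attach the model's predicted answer to each unlabeled target batch."""
    return [(batch, model(batch)) for batch in unlabeled_batches]

def augment_hidden(model, batch):                                  # stage 2 (identity stub)
    return batch

def contrastive_adapt_step(model, source_batch, target_batch):     # stage 3 (no-op stub)
    pass

def adapt_one_epoch(model, source_batches, unlabeled_target_batches):
    """One adaptation epoch: (1) pseudo-label the unlabeled target data with the
    current model, (2) augment samples in the hidden space, (3) run
    attention-based contrastive adaptation over source and target batches."""
    target_batches = generate_pseudo_labels(model, unlabeled_target_batches)
    for source_batch, target_batch in zip(source_batches, target_batches):
        source_batch = augment_hidden(model, source_batch)
        target_batch = augment_hidden(model, target_batch)
        contrastive_adapt_step(model, source_batch, target_batch)
```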
To address the domain shift upon deployment, we use the aforementioned stages as follows. In the first stage, we generate pseudo labels for the unlabeled target data $X'_t$. Next, we enrich the set of training data via hidden space augmentation. In the adaptation stage, we train the QA model using both the source and the target data with our attention-based contrastive adaptation. We summarize the three stages in the following:
1. Pseudo labeling: First, we build labeled target data $\hat{X}_t$ via pseudo labeling. Formally, a source-pretrained QA model $f$ generates a (pseudo) answer $x^{(i)}_{t,a}$ for context $x^{(i)}_{t,c}$ and question $x^{(i)}_{t,q}$, $i = 1, \ldots$ Each sample $x^{(i)}_t \in \hat{X}_t$ now contains the original context, the original question, and the generated pseudo answer.