
tation and self-supervised domain adaptation.
In this paper, we propose a novel self-supervised domain adaptation framework for extractive QA called QADA. Our QADA framework is designed to handle domain shifts and should thus be able to answer out-of-domain questions. QADA has three stages, namely pseudo labeling, hidden space augmentation, and self-supervised domain adaptation. First, we use pseudo labeling to generate and filter labeled target QA data. Next, the augmentation component integrates a novel pipeline for data augmentation to enrich training samples in the hidden space. For questions, we build upon multi-hop synonyms and introduce Dirichlet neighborhood sampling in the embedding space to generate augmented tokens. For contexts, we develop an attentive context cutoff method that learns to drop context spans via a sampling strategy based on attention scores. Third, we train the QA model via a novel attention-based contrastive adaptation: specifically, we use the attention weights to sample informative features that help the QA model separate answers and generalize across the source and target domains.
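To make the two augmentation ideas concrete, the sketch below illustrates them in NumPy: a token embedding is replaced by a convex combination of itself and its synonym-neighborhood embeddings with Dirichlet-sampled weights, and a contiguous span of context hidden states is zeroed out, with the span start sampled in proportion to attention scores. The function names, the concentration parameter `alpha`, and the fixed `cutoff_len` are our own illustrative assumptions, not the actual implementation, which operates on the QA model's hidden states during training.

```python
import numpy as np

def dirichlet_neighborhood_sample(token_emb, neighbor_embs, alpha=1.0, rng=None):
    """Replace a token embedding by a convex combination of itself and its
    (multi-hop) synonym embeddings, with mixing weights drawn from a
    symmetric Dirichlet(alpha) distribution."""
    if rng is None:
        rng = np.random.default_rng()
    embs = np.vstack([token_emb, neighbor_embs])         # (k + 1, d)
    weights = rng.dirichlet(alpha * np.ones(len(embs)))  # non-negative, sums to 1
    return weights @ embs                                # (d,)

def attentive_context_cutoff(hidden, attn_scores, cutoff_len=3, rng=None):
    """Zero out a contiguous span of context hidden states, sampling the
    span start with probability proportional to the attention scores."""
    if rng is None:
        rng = np.random.default_rng()
    probs = attn_scores / attn_scores.sum()
    start = rng.choice(len(hidden), p=probs)
    out = hidden.copy()
    out[start:start + cutoff_len] = 0.0
    return out
```

Because the Dirichlet weights sum to one, each augmented token stays inside the convex hull of the original token and its neighbors, so the augmentation perturbs rather than replaces the token's meaning.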
The main contributions of our work are:1
1. We propose a novel, self-supervised framework called QADA for domain adaptation in QA. QADA aims at answering out-of-domain questions and should thus handle the domain shift upon deployment in an unseen domain.
2. To the best of our knowledge, QADA is the first work in QA domain adaptation that (i) leverages hidden space augmentation to enrich training data; and (ii) integrates attention-based contrastive learning for self-supervised adaptation.
3. We demonstrate the effectiveness of QADA in an unsupervised setting where target answers are not accessible. Here, QADA considerably outperforms state-of-the-art baselines on multiple datasets for QA domain adaptation.
2 Related Work
Extractive QA has achieved significant progress recently (Devlin et al., 2019; Kratzwald et al., 2019; Lan et al., 2020; Zhang et al., 2020). Yet, the accuracy of QA models can drop drastically under domain shifts; that is, when deployed in an unseen domain that differs from the training distribution (Fisch et al., 2019; Talmor and Berant, 2019).
1 The code for our QADA framework is publicly available at https://github.com/Yueeeeeeee/Self-Supervised-QA.
To overcome the above challenge, various approaches for QA domain adaptation have been proposed, which can be categorized as follows. (1) (Semi-)supervised adaptation uses partially labeled data from the target distribution for training (Yang et al., 2017; Kratzwald and Feuerriegel, 2019b; Yue et al., 2022a). (2) Unsupervised adaptation with question generation refers to settings where only context paragraphs in the target domain are available; QA samples are generated separately to train the QA model (Shakeri et al., 2020; Yue et al., 2021b). (3) Unsupervised adaptation has access to context and question information from the target domain, whereas answers are unavailable (Chung et al., 2018; Cao et al., 2020; Yue et al., 2022d). In this paper, we focus on the third category and study the problem of unsupervised QA domain adaptation.
Domain adaptation for QA: Several approaches have been developed to generate synthetic QA samples via question generation (QG) in an end-to-end fashion (i.e., seq2seq) (Du et al., 2017; Sun et al., 2018). Leveraging such samples from QG can also improve the QA performance in out-of-domain distributions (Golub et al., 2017; Tang et al., 2017, 2018; Lee et al., 2020; Shakeri et al., 2020; Yue et al., 2022a; Zeng et al., 2022a). Given unlabeled questions, there are two main approaches: domain adversarial training can be applied to reduce feature discrepancy between domains (Lee et al., 2019; Cao et al., 2020), while contrastive adaptation minimizes the domain discrepancy using maximum mean discrepancy (MMD) (Yue et al., 2021b, 2022d). We later use the idea from contrastive learning but tailor it carefully for our adaptation framework.
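As background, the MMD mentioned above measures the distance between two feature distributions from their samples. The minimal NumPy sketch below shows a generic (biased) empirical MMD estimator with a Gaussian kernel; the function names and the fixed bandwidth `sigma` are our assumptions for illustration, not the exact objective used in the cited works.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel matrix between rows of x and y."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

def mmd(source, target, sigma=1.0):
    """Biased empirical estimate of squared MMD between two samples:
    mean within-source kernel + mean within-target kernel
    - 2 * mean cross kernel."""
    return (gaussian_kernel(source, source, sigma).mean()
            + gaussian_kernel(target, target, sigma).mean()
            - 2 * gaussian_kernel(source, target, sigma).mean())
```

Minimizing such a term over source and target features pulls the two domains' representations together, which is the intuition behind the contrastive adaptation approaches discussed above.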
Data augmentation for NLP: Data augmentation for NLP aims at improving language understanding with diverse data samples. One approach is to apply token-level augmentation and enrich the training data with simple techniques (e.g., synonym replacement, token swapping) (Wei and Zou, 2019) or custom heuristics (McCoy et al., 2019). Alternatively, augmentation can be done in the hidden space of the underlying model (Chen et al., 2020). For example, one can drop partial spans in the hidden space, which aids generalization performance under distributional shifts (Chen et al.,