
Look to the Right: Mitigating Relative Position Bias
in Extractive Question Answering
Kazutoshi Shinoda1,2  Saku Sugawara2  Akiko Aizawa1,2
1The University of Tokyo
2National Institute of Informatics
shinoda@is.s.u-tokyo.ac.jp
{saku,aizawa}@nii.ac.jp
Abstract
Extractive question answering (QA) models tend to exploit spurious correlations to make predictions when a training set has unintended biases. This tendency results in models that do not generalize to examples where the correlations do not hold. Determining which spurious correlations QA models can exploit is crucial for building generalizable QA models for real-world applications; moreover, a method needs to be developed that prevents these models from learning spurious correlations even when a training set is biased. In this study, we discovered that the relative position of an answer, which is defined as the relative distance from an answer span to the closest question-context overlap word, can be exploited by QA models as a superficial cue for making predictions. Specifically, we find that when the relative positions in a training set are biased, the performance on examples with relative positions unseen during training is significantly degraded. To mitigate the performance degradation for unseen relative positions, we propose an ensemble-based debiasing method that does not require prior knowledge about the distribution of relative positions. We demonstrate that the proposed method mitigates the models' reliance on relative positions using the biased and full SQuAD dataset. We hope that this study can help enhance the generalization ability of QA models in real-world applications.1
1 Introduction
Deep learning-based natural language understanding (NLU) models are prone to using spurious correlations in the training set. This tendency results in models' poor generalization to out-of-distribution test sets (McCoy et al., 2019; Geirhos et al., 2020), which is a significant challenge in the field.

1Our code is available at https://github.com/KazutoshiShinoda/RelativePositionBias.
Context: ... This changed in [1924] with formal requirements developed for graduate degrees, including offering Doctorate (PhD) degrees ...
Question: The granting of Doctorate degrees first occurred in what year at Notre Dame?
Relative position: −1

Context: ... The other magazine, The Juggler, is released [twice] a year and focuses on student literature and artwork ...
Question: How often is Notre Dame's the Juggler published?
Relative position: −2

Table 1: Examples taken from SQuAD. Words such as "Doctorate" and "Juggler" appear in both the context and the question; the spans in brackets are the answers to the questions. In both examples, the answer is found by looking to the right from an overlapping word. See §2.1 for the definition of the relative position.
Question answering (QA) models trained on intentionally biased training sets are more likely to learn solutions based on spurious correlations rather than on causal relationships between inputs and labels. For example, QA models can learn question-answer type matching heuristics (Lewis and Fan, 2019) and absolute-positional correlations (Ko et al., 2020), particularly when a training set is biased toward examples with the corresponding spurious correlations. Collecting a fully unbiased dataset is challenging. Therefore, it is vital to discover possible dataset biases that can degrade generalization and to develop debiasing methods that learn generalizable solutions even when training on unintentionally biased datasets.
In extractive QA (e.g., Rajpurkar et al., 2016), in which answers to questions are spans in textual contexts, we find that the relative position of an answer, which is defined as the relative distance from an answer span to the closest word that appears in both a context and a question, can be exploited as a superficial cue by QA models. See Table 1 for examples.
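To make the definition concrete, below is a minimal Python sketch of how the relative position could be computed. It assumes whitespace tokenization, case-insensitive matching, and the sign convention implied by Table 1 (negative when the answer lies to the right of its closest overlap word); the function name `relative_position` and these simplifications are ours, and the paper's exact procedure (tokenization, normalization, stopword handling) is specified in §2.1 and may differ.

```python
from typing import List, Optional


def relative_position(context_tokens: List[str],
                      question_tokens: List[str],
                      answer_start: int) -> Optional[int]:
    """Signed distance from the answer's start token to the closest
    context token that also appears in the question.

    Negative values mean the overlap word is to the left of the answer,
    i.e., the answer is found by "looking to the right". This is a
    hypothetical reading of the definition in Sec. 2.1.
    """
    question_vocab = {t.lower() for t in question_tokens}
    overlap_indices = [i for i, t in enumerate(context_tokens)
                       if t.lower() in question_vocab]
    if not overlap_indices:
        return None  # no question-context overlap word exists
    closest = min(overlap_indices, key=lambda i: abs(i - answer_start))
    return closest - answer_start


# First example from Table 1 (simplified context snippet):
ctx = ("This changed in 1924 with formal requirements developed "
       "for graduate degrees").split()
q = ("The granting of Doctorate degrees first occurred in what year "
     "at Notre Dame").split()
print(relative_position(ctx, q, ctx.index("1924")))  # -1: "in" is one token to the left
```

Under these assumptions the sketch reproduces the value −1 for the first example in Table 1, since the overlap word "in" immediately precedes the answer span "1924".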