DyREx: Dynamic Query Representation for Extractive Question Answering

Urchade Zaratiana1,2, Niama El Khbir2, Dennis Núñez2, Pierre Holat1,2, Nadi Tomeh2, Thierry Charnois2

1FI Group, 2LIPN, Université Sorbonne Paris Nord - CNRS UMR 7030
Abstract
Extractive question answering (ExQA) is an essential task for Natural Language
Processing. The dominant approach to ExQA is one that represents the input
sequence tokens (question and passage) with a pre-trained transformer, then uses
two learned query vectors to compute distributions over the start and end answer
span positions. These query vectors lack the context of the inputs, which can be a
bottleneck for model performance. To address this problem, we propose DyREx,
a generalization of the vanilla approach where we dynamically compute query
vectors given the input, using an attention mechanism through transformer layers.
Empirical observations demonstrate that our approach consistently improves the
performance over the standard one. The code and accompanying files for running
the experiments are available at https://github.com/urchade/DyReX.
1 Introduction
Extractive question answering is a challenging task where the goal is to extract the answer span given
a question and a passage as inputs [Rajpurkar et al., 2016, Kwiatkowski et al., 2019]. The prevailing
approach achieves extractive question answering (ExQA) by first producing a contextualized
representation of the input, which is a concatenation of the question and the passage, using a pre-trained
transformer model. Two learned query vectors are then used to compute a probability distribution
over this input sequence representation to produce the start and end positions of the answer span.
This approach has demonstrated very strong and hard-to-beat results, which makes it the de facto
approach to extractive QA [Devlin et al., 2019, Liu et al., 2019, Joshi et al., 2020].
However, despite their high performance, we argue that these methods remain suboptimal since the
query vectors used to compute the start and end distributions are static, i.e., they are independent of
the input sequence, which can be a bottleneck for improving the performance of the model. Hence,
we propose to extend this by allowing the queries to dynamically aggregate information from the
input sequence to better answer the question. Our method, DyREx, iteratively refines the initial query
representations, allowing them to aggregate information from the source sequence through an attention
mechanism [Bahdanau et al., 2015, Vaswani et al., 2017]. More specifically, we make use of an
L-layer transformer decoder architecture, which allows (1) interaction between the queries through
self-attention to model the interdependence between the start and end of the answer span, and allows
(2) interaction between queries and the input sequence through cross-attention, which specializes the
queries to a specific input question and passage, giving more flexibility than a static representation.
We conduct extensive experiments on several extractive Question Answering benchmarks, including
SQuAD [Rajpurkar et al., 2016] and the MRQA datasets [Fisch et al., 2019]. Experimental results
demonstrate that our approach consistently improves the performance over the standard approach.
Correspondence to: zaratiana@lipn.fr
NeurIPS 2022 2nd Workshop on Efficient Natural Language and Speech Processing, New Orleans.
2 Model
2.1 Background: Vanilla QA model
We describe here the mainstream approach to extractive question answering tasks. In all the following, we call it the ExQA vanilla approach. It is typically performed by feeding the input text sequence $\{x_i\}_{i=1}^{N}$ (the concatenation of the question $Q$ and the passage $D$ containing the answer) into a pre-trained language model such as BERT [Devlin et al., 2019], producing contextualized token representations $\{h_i\}_{i=1}^{N} \in \mathbb{R}^d$, $d$ being the embedding dimension of the model. Then, to compute the probability of the start and end positions of the answer span, the following estimators are used:
$$p(\text{start} = i \mid Q, D) = \frac{\exp(q_s^{\top} h_i)}{\sum_{i'=1}^{N} \exp(q_s^{\top} h_{i'})} \qquad p(\text{end} = j \mid Q, D) = \frac{\exp(q_e^{\top} h_j)}{\sum_{j'=1}^{N} \exp(q_e^{\top} h_{j'})} \qquad (1)$$
where $q_s$ and $q_e \in \mathbb{R}^d$ are respectively the start and end queries, randomly initialized and updated during model learning. The training objective is to minimize the sum of the negative log-likelihood of the correct start and end positions $(\hat{i}, \hat{j})$:

$$\mathcal{L} = -\log p(\text{start} = \hat{i} \mid Q, D) - \log p(\text{end} = \hat{j} \mid Q, D) \qquad (2)$$
This approach was first proposed by Devlin et al. [2019], and is now used by most of the work on
transformer-based extractive question answering [Liu et al., 2019, Joshi et al., 2020, Shi et al., 2022].
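As a concrete illustration, below is a minimal PyTorch sketch of this vanilla head (Equations 1 and 2). The class and variable names are ours for exposition only and the snippet is not taken from any released implementation:

```python
# Minimal sketch of the vanilla ExQA head (Equations 1 and 2).
# Names (VanillaSpanHead, h, span_loss) are illustrative, not from any released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VanillaSpanHead(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Static start/end queries q_s and q_e: randomly initialized, learned with the model.
        self.q_start = nn.Parameter(torch.randn(d_model))
        self.q_end = nn.Parameter(torch.randn(d_model))

    def forward(self, h: torch.Tensor):
        # h: contextualized token representations from the pre-trained encoder, shape (batch, N, d).
        start_logits = h @ self.q_start  # (batch, N), i.e. q_s^T h_i
        end_logits = h @ self.q_end      # (batch, N), i.e. q_e^T h_j
        return start_logits, end_logits

def span_loss(start_logits, end_logits, start_gold, end_gold):
    # Sum of negative log-likelihoods of the gold start and end positions (Equation 2);
    # cross_entropy applies the softmax normalization of Equation 1 internally.
    return F.cross_entropy(start_logits, start_gold) + F.cross_entropy(end_logits, end_gold)
```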
2.2 Our model: DyREx
The learned query vectors $q_s$ and $q_e$ in the vanilla approach are shared among all sentences and are context insensitive. We presume that using such static queries is a constraining factor for performance improvement, so we propose to extend this approach by allowing the queries to dynamically aggregate information from the input sequence, so that the model can better adapt to the context.
In our model, the initial start and end query representations $q_s^0$ and $q_e^0$ are concatenated and fed to an $L$-layer transformer decoder [Vaswani et al., 2017] to obtain dynamic representations $q_s^L$ and $q_e^L$:

$$Q^L = \text{Trans\_Dec}_L(Q^0, H) \qquad (3)$$

with $Q^i = [q_e^i, q_s^i]$ the concatenated queries at layer $i$, $H = [h_1, h_2, \ldots, h_N]$ the concatenated token representations, and $\text{Trans\_Dec}_L$ being an $L$-layer transformer decoder.
More specifically, the $i$-th transformer layer consists of a bi-directional self-attention module $\text{self-att}^i$ applied between the queries to model the interdependence between the start and the end positions of the answer, a cross-attention $\text{cross-att}^i$ which updates the query representations by aggregating information from the input sequence embeddings, and finally a two-layer point-wise feedforward network $\text{FFN}^i$ with GeLU activation [Hendrycks and Gimpel, 2016]:

$$\begin{aligned} \tilde{Q}^i &= \text{self-att}^i(Q = Q^i,\, K = Q^i,\, V = Q^i) \\ \hat{Q}^i &= \text{cross-att}^i(Q = \tilde{Q}^i,\, K = H,\, V = H) \\ Q^{i+1} &= \text{FFN}^i(\hat{Q}^i) \end{aligned} \qquad (4)$$
Furthermore, an Add-Norm (skip connection [He et al., 2016] + layer normalization [Ba et al., 2016]) is inserted after each of the components as in Vaswani et al. [2017], but we do not show it here for better readability. Moreover, both self-att and cross-att are multi-head scaled dot-product attention from Vaswani et al. [2017], and the embedding dimension and the number of attention heads of the decoder layers are the same as for the token representation layer.
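For readers who prefer code, the following is a hedged PyTorch sketch of the query decoder described by Equations 3 and 4: self-attention between the two queries, cross-attention to the token representations H, a GeLU feed-forward block, and post-sublayer Add-Norm. The class names, the use of nn.MultiheadAttention, and the 4x feed-forward expansion are our assumptions for illustration; the actual implementation is available in the repository linked in the abstract.

```python
# Sketch of the DyREx query decoder (Equations 3 and 4), under the assumptions
# stated above; not the authors' released implementation.
import torch
import torch.nn as nn

class DyRExDecoderLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.self_att = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_att = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Two-layer point-wise feed-forward network with GeLU activation.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, queries: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # queries: (batch, 2, d) -- the concatenated queries Q^i = [q_e^i, q_s^i]
        # tokens:  (batch, N, d) -- encoder token representations H
        # Each sub-layer is followed by Add-Norm (residual + layer normalization).
        q = self.norm1(queries + self.self_att(queries, queries, queries)[0])
        q = self.norm2(q + self.cross_att(q, tokens, tokens)[0])
        return self.norm3(q + self.ffn(q))

class DyRExQueryDecoder(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_layers: int):
        super().__init__()
        # Initial queries Q^0 = [q_e^0, q_s^0], learned as in the vanilla model.
        self.init_queries = nn.Parameter(torch.randn(2, d_model))
        self.layers = nn.ModuleList(
            [DyRExDecoderLayer(d_model, n_heads) for _ in range(n_layers)]
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, N, d); returns dynamic queries Q^L of shape (batch, 2, d).
        q = self.init_queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        for layer in self.layers:
            q = layer(q, tokens)
        return q
```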
Finally, to compute the start and the end answer position probabilities, we use the same estimator as the vanilla model in Equation 1, substituting $q_s$ and $q_e$ by $q_s^L$ and $q_e^L$ respectively. Note that the vanilla model is a particular case of our model with a number of decoder layers $L = 0$.
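As a usage illustration only, the snippet below wires the dynamic queries $q_s^L$ and $q_e^L$ into the Equation 1 estimator. The tensor shapes (batch of 4, sequence length 384, hidden size 768) and the DyRExQueryDecoder class are assumptions carried over from the sketch above:

```python
import torch

# Assumed shapes: h stands in for the (batch, N, d) output of the pre-trained encoder;
# DyRExQueryDecoder is the illustrative class sketched above.
h = torch.randn(4, 384, 768)
decoder = DyRExQueryDecoder(d_model=768, n_heads=12, n_layers=3)
queries = decoder(h)                                    # (4, 2, 768): [q_e^L, q_s^L]
q_end, q_start = queries[:, 0], queries[:, 1]           # same ordering as Q^i in the paper
start_logits = torch.einsum("bnd,bd->bn", h, q_start)   # q_s^L . h_i   (Equation 1)
end_logits = torch.einsum("bnd,bd->bn", h, q_end)       # q_e^L . h_j
start_probs = start_logits.softmax(dim=-1)              # p(start = i | Q, D)
end_probs = end_logits.softmax(dim=-1)                  # p(end = j | Q, D)
```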