
DyREx: Dynamic Query Representation for Extractive
Question Answering
Urchade Zaratiana1,2∗, Niama El Khbir2, Dennis Núñez2, Pierre Holat1,2,
Nadi Tomeh2, Thierry Charnois2
1FI Group, 2LIPN, Université Sorbonne Paris Nord - CNRS UMR 7030
Abstract
Extractive question answering (ExQA) is an essential task for Natural Language
Processing. The dominant approach to ExQA represents the input
sequence tokens (question and passage) with a pre-trained transformer, then uses
two learned query vectors to compute distributions over the start and end answer
span positions. These query vectors are independent of the input, which can be a
bottleneck for model performance. To address this problem, we propose DyREx,
a generalization of the vanilla approach where we dynamically compute query
vectors given the input, using an attention mechanism through transformer layers.
Empirical observations demonstrate that our approach consistently improves the
performance over the standard one. The code and accompanying files for running
the experiments are available at https://github.com/urchade/DyReX.
1 Introduction
Extractive question answering is a challenging task where the goal is to extract the answer span given
a question and a passage as inputs [Rajpurkar et al., 2016, Kwiatkowski et al., 2019]. The prevailing
approach to extractive question answering (ExQA) first produces a contextualized representation
of the input, which is a concatenation of the question and the passage, using a pre-trained
transformer model. Two learned query vectors are then used to compute a probability distribution
over this input sequence representation to produce the start and end positions of the answer span.
This approach has demonstrated very strong and hard-to-beat results, which makes it the de facto
approach to extractive QA [Devlin et al., 2019, Liu et al., 2019, Joshi et al., 2020].
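To make the standard approach concrete, the following is a minimal sketch in plain Python, not the implementation used in our experiments. It assumes the contextualized token representations have already been produced by the pre-trained transformer. The names `span_distributions`, `q_start`, and `q_end`, as well as the toy values, are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def span_distributions(token_reprs, q_start, q_end):
    # Each static query scores every token by dot product; a softmax over
    # the sequence turns the scores into a distribution over positions.
    start_probs = softmax([dot(q_start, h) for h in token_reprs])
    end_probs = softmax([dot(q_end, h) for h in token_reprs])
    return start_probs, end_probs

# Toy representations for a 4-token input (hypothetical values; in practice
# these come from the pre-trained transformer).
token_reprs = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
# Static query vectors: learned once, identical for every input.
q_start = [1.0, 0.0]
q_end = [0.0, 1.0]

start_probs, end_probs = span_distributions(token_reprs, q_start, q_end)
start = max(range(len(start_probs)), key=start_probs.__getitem__)
end = max(range(len(end_probs)), key=end_probs.__getitem__)
```

Note that `q_start` and `q_end` never see `token_reprs` before scoring, which is the property discussed next.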
However, despite their high performance, we argue that these methods remain suboptimal since the
query vectors used to compute the start and end distributions are static, i.e., they are independent of
the input sequence, which can be a bottleneck for improving the performance of the model. Hence,
we propose to extend this by allowing the queries to dynamically aggregate information from the
input sequence to better answer the question. Our method, DyREx, iteratively refines the initial query
representations, allowing them to aggregate information from the source sequence through an attention
mechanism [Bahdanau et al., 2015, Vaswani et al., 2017]. More specifically, we make use of an
L-layer transformer decoder architecture, which allows (1) interaction between the queries through
self-attention to model the interdependence between the start and end of the answer span, and
(2) interaction between queries and the input sequence through cross-attention, which specializes the
queries to a specific input question and passage, giving more flexibility than a static representation.
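The refinement loop described above can be sketched as follows. This is a deliberately simplified illustration in plain Python, not the released implementation: it assumes single-head attention with no learned projection matrices, no feed-forward sublayer, and no layer normalization, all of which a full transformer decoder layer would include. The function names `attend`, `refine_queries`, and `dynamic_span_distributions` and the toy values are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def attend(query, keys):
    # Scaled dot-product attention for one query. Simplifying assumption:
    # the keys also serve as values and no projections are applied.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys))
            for i in range(len(query))]

def refine_queries(queries, token_reprs, num_layers):
    # queries = [start_query, end_query]; each layer applies two steps,
    # each with a residual connection.
    for _ in range(num_layers):
        # (1) Self-attention: the start and end queries attend to each other.
        queries = [add(q, attend(q, queries)) for q in queries]
        # (2) Cross-attention: each query aggregates information from the
        #     input sequence representation.
        queries = [add(q, attend(q, token_reprs)) for q in queries]
    return queries

def dynamic_span_distributions(token_reprs, init_queries, num_layers):
    q_start, q_end = refine_queries(init_queries, token_reprs, num_layers)
    start_probs = softmax([dot(q_start, h) for h in token_reprs])
    end_probs = softmax([dot(q_end, h) for h in token_reprs])
    return start_probs, end_probs

# Toy 4-token input and initial queries (hypothetical values).
token_reprs = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
init_queries = [[1.0, 0.0], [0.0, 1.0]]
start_probs, end_probs = dynamic_span_distributions(
    token_reprs, init_queries, num_layers=2)
```

With `num_layers=0` the queries are left unchanged and the sketch reduces to the static formulation, which is the sense in which the dynamic approach generalizes the standard one.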
We conduct extensive experiments on several extractive question answering benchmarks, including
SQuAD [Rajpurkar et al., 2016] and MRQA datasets [Fisch et al., 2019]. Experimental results
demonstrate that our approach consistently improves the performance over the standard approach.
∗Correspondence to: zaratiana@lipn.fr
NeurIPS 2022 2nd Workshop on Efficient Natural Language and Speech Processing, New Orleans.
arXiv:2210.15048v1 [cs.CL] 26 Oct 2022