
Given the initial evidence set (e.g., the car race entry list table in Fig. 2), our intermediary module produces a list of query-dependent evidence chains (e.g., the chain marked by the red line, consisting of the car race entry list and the driver’s Wikipedia page). We first propose a linker model (§3.1) to expand the candidate evidence set by including extra passages related to tables in the initial set (purple arrows in Fig. 2). This step enriches the evidence context, in particular supplying the reasoning chains needed for multi-hop questions.
Since there can be many links between one piece of evidence and others (i.e., a densely connected graph), considering all links is computationally infeasible for the downstream reader. Thus, we develop a chainer model (§3.2) that prunes the evidence graph conditioned on the question and then chains the evidence across hops to form query-dependent paths. We keep only the top-$K$ scored chains for reading so that the reader operates under a fixed computation budget.
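A minimal sketch of this pruning step, with hypothetical chains and scores (the actual chainer scoring is described in §3.2):

```python
import heapq

# Hypothetical (chain, score) pairs produced by the chainer for one question;
# in practice each chain is a sequence of evidence pieces across hops.
scored_chains = [
    (["entry list", "Tony Longhurst page"], 0.91),
    (["entry list", "race venue page"], 0.42),
    (["entry list"], 0.30),
    (["entry list", "sponsor page"], 0.15),
]

K = 2  # fixed reading budget for the downstream reader

# Keep only the top-K scored chains so the reader's input size is bounded.
top_k = heapq.nlargest(K, scored_chains, key=lambda pair: pair[1])
```

Because only `K` chains survive, the reader's cost no longer grows with the density of the evidence graph.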
Finally, Fusion-in-Decoder (FiD) (Izacard and Grave, 2021), a T5-based generative model (Raffel et al., 2019), is used as the reader to generate the final answer. The model first encodes each of the top-$K$ evidence chains independently along with the question. During decoding, the decoder can attend to all chains, thus fusing all the input information.
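The FiD fusion pattern can be illustrated with a toy sketch, where random vectors stand in for the T5 encoder and only the tensor shapes matter:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # toy hidden size
K = 3          # number of retained evidence chains
chain_len = 5  # tokens per (question + chain) input

def toy_encode(tokens):
    # Stand-in for the T5 encoder: one hidden state per input token.
    return rng.normal(size=(len(tokens), d))

# Each input is the question concatenated with one evidence chain.
inputs = [["q"] * 2 + ["chain%d" % i] * 3 for i in range(K)]

# FiD encodes each (question, chain) pair independently...
encoded = [toy_encode(tok) for tok in inputs]

# ...then concatenates all encoder states, so the decoder's cross-attention
# can attend over every chain at once, fusing the inputs.
fused = np.concatenate(encoded, axis=0)
```

The per-chain encoding keeps encoder cost linear in $K$, while the concatenated states let the decoder combine evidence across chains.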
3 Intermediary Modules
In this section, we present the two key components of CORE for supporting multi-hop reasoning: the linker for building evidence graphs and the chainer for forming query-dependent paths.
3.1 Linker
In this work, we mainly focus on linking an en-
tity mention in the retrieved evidence to the cor-
responding Wikipedia page for building evidence
graphs. This setup is related to recent entity linking work (Wu et al., 2020), but important modifications are needed for ODQA. In particular, instead of assuming the entity mention as given, we consider a more realistic end-to-end scenario for ODQA: the linker model must first propose candidate entity mentions (spans) for a given piece of evidence (e.g., “Tony Longhurst” in Fig. 2), and then link each proposed mention to its Wikipedia page. Another major difference is that we study entity mentions in tables instead of text. As tables contain more high-level summary information than text, using tables as pivots for constructing evidence graphs can potentially help improve the recall of
evidence chains for QA. Meanwhile, this task is challenging due to the mismatch between the lexical form of table cells and their linked passage titles. For example, the table “NCAA Division I women’s volleyball tournament” contains the cell VCU, which refers to VCU Rams rather than Virginia Commonwealth University, so simple lexical matching would not work.
In the following, we first describe the model
for entity mention proposal and then present a
novel entity linking model for mentions in tables.
Both models are based on a pretrained language model, BERT (Devlin et al., 2019). Following previous work (Oguz et al., 2020), we flatten the table row-wise into a sequence of tokens for deriving table representations from BERT. In particular,
we use $x_1, \ldots, x_N$ to denote an input sequence of length $N$. Typically, when using BERT, a [CLS] token is prepended to every input sequence, i.e., $[\mathrm{CLS}], x_1, \ldots, x_N$. The output is then a sequence of hidden states $\mathbf{h}_{[\mathrm{CLS}]}, \mathbf{h}_1, \ldots, \mathbf{h}_N \in \mathbb{R}^d$ from the last BERT layer, one for each input token, where $d$ is the hidden dimension.
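The row-wise flattening can be sketched as follows, on a hypothetical toy table; a real system would use BERT’s subword tokenizer rather than whitespace splitting:

```python
# Toy table: a header row plus a data row, as in the car race entry
# list example (hypothetical cell values).
table = [
    ["Team", "Driver"],
    ["Castrol Racing", "Tony Longhurst"],
]

def flatten_table(rows):
    # Flatten the table row-wise into one token sequence; a real system
    # would use BERT's subword tokenizer on each cell.
    tokens = []
    for row in rows:
        for cell in row:
            tokens.extend(cell.split())
    return tokens

x = flatten_table(table)
sequence = ["[CLS]"] + x  # prepend [CLS], matching the input format above
```

The resulting sequence is what the BERT encoders below consume.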
Entity Mention Proposal
In realistic settings, the ground-truth entity mention locations are not provided, and directly applying an off-the-shelf named entity recognition (NER) model can be sub-optimal because tables are structured very differently from free text. Thus, we develop a span proposal model to label the entity mentions in the table. Specifically,
we use BERT as the encoder (BERT$_m$) and add a linear projection to predict, for every token in the table, whether it is part of an entity mention:

$$\mathbf{h}^m_1, \ldots, \mathbf{h}^m_N = \mathrm{BERT}_m(t_1, \ldots, t_N), \quad (1)$$

$$\hat{\mathbf{y}} = \mathbf{W}\mathbf{h}^m, \quad (2)$$

where $\mathbf{h}^m \in \mathbb{R}^{N \times d}$ and $\mathbf{W} \in \mathbb{R}^{2 \times d}$. The model is trained with a token-level binary loss

$$-\frac{1}{N} \sum_{n=1}^{N} \left( y_n \log P(\hat{\mathbf{y}}_n)_1 + (1 - y_n) \log P(\hat{\mathbf{y}}_n)_0 \right), \quad (3)$$

where $y_n$ is the 0-1 label for the token at position $n$, and $P(\cdot)$ is the softmax function.
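A minimal numeric sketch of the projection and loss, with random vectors standing in for the BERT$_m$ hidden states and hypothetical 0-1 labels:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 6, 8  # toy sequence length and hidden size

# Stand-ins for the hidden states h^m (N x d) and the projection W (2 x d).
h_m = rng.normal(size=(N, d))
W = rng.normal(size=(2, d))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Two logits per token (index 0: not a mention, index 1: mention),
# turned into probabilities with a softmax.
y_hat = h_m @ W.T            # (N, 2)
P = softmax(y_hat)           # per-token class probabilities

# Toy 0-1 labels marking which tokens belong to an entity mention.
y = np.array([0, 1, 1, 0, 0, 1])

# Token-level binary loss averaged over the sequence.
loss = -np.mean(y * np.log(P[:, 1]) + (1 - y) * np.log(P[:, 0]))
```

At inference time, contiguous tokens with probability of class 1 above a threshold would form the proposed mention spans.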
Table Entity Linking
Once the candidate entity mentions are proposed, we follow Wu et al. (2020) in using a bi-encoder model for linking. Similarly, two BERT models are used to encode tables (BERT$_t$) and passages (BERT$_p$), respectively.
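The bi-encoder scoring can be sketched as follows, with random vectors standing in for the learned BERT$_t$ mention encoding and BERT$_p$ passage encodings (candidate titles are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # toy embedding size

# Stand-in for BERT_t's encoding of the table-side mention (e.g., the VCU cell).
mention_vec = rng.normal(size=d)

# Stand-ins for BERT_p's encodings of candidate passage titles.
passages = ["VCU Rams", "Virginia Commonwealth University", "VCU Medical Center"]
passage_vecs = rng.normal(size=(len(passages), d))

# Bi-encoder linking: score every candidate by a dot product with the
# mention encoding, then link to the highest-scoring passage.
scores = passage_vecs @ mention_vec
linked = passages[int(np.argmax(scores))]
```

Because the two encoders are independent, passage encodings can be precomputed and the dot-product search done efficiently over a large candidate set.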