
Dataset     Fact-related examples   Error rate
NQ          736 (20.4%)             31.9%
TriviaQA    3,738 (33.0%)           17.4%
WebQ        1,181 (58.1%)           42.5%
Table 1: Error rate of the state-of-the-art reader (i.e.,
FiD-base) on the subset of test-set examples that have
related fact triplets on the knowledge graph, i.e., examples
whose question entities are neighbors of the answer
entities in the retrieved passages through any relation.
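To make this selection criterion concrete, the following is a minimal sketch of how such fact-related examples could be identified. It is illustrative rather than the paper's actual preprocessing: it assumes entities have already been linked and are given as strings, that the KG is available as a set of (head, relation, tail) triples, and the function name is hypothetical.

# Sketch: an example is "fact-related" if any question entity is a KG
# neighbor (through any relation) of an answer entity found in the
# retrieved passages.
def is_fact_related(question_entities, answer_entities, triples):
    neighbors = {}
    for head, _relation, tail in triples:
        neighbors.setdefault(head, set()).add(tail)
        neighbors.setdefault(tail, set()).add(head)
    return any(
        answer in neighbors.get(question_entity, set())
        for question_entity in question_entities
        for answer in answer_entities
    )

# Toy usage with string-level entities (real preprocessing would link
# mentions to KG identifiers such as Wikidata QIDs).
triples = {("Bob Dylan", "award received", "Nobel Prize in Literature")}
print(is_fact_related({"Bob Dylan"}, {"Nobel Prize in Literature"}, triples))  # True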
We further examine how many of these examples
are answered correctly by state-of-the-art readers.
Table 1 shows that a large portion of examples
(e.g., 58.1% in WebQ) can be matched to related
fact triplets on the KG. However, without using the
KG, FiD frequently produces incorrect answers on
these subsets, leaving significant room for
improvement. Therefore, a framework that leverages
not only the textual information in retrieved
passages but also fact triplets from the KG is highly
desirable for improving reader performance.
In this paper, we propose a novel knowledge
GRAph enhanced PassagE reader, namely GRAPE,
to improve the reader performance for open-
domain QA. Considering the enormous size of KGs
and complex interweaving between entities (e.g.,
over 5 million entities and over 30 neighbors per
entity on Wikidata), direct reasoning on the entire
graph is intractable. Thus, we first construct a lo-
calized bipartite graph for each pair of question and
passage, where nodes represent entities contained
within them, and edges represent relationships be-
tween entities. Then, node representations are ini-
tialized with the hidden states of the corresponding
entities, extracted from the intermediate layer of
the reader model. Next, a graph neural network
learns node representations with relational knowl-
edge, and passes them back into the hidden states
of the reader model. Through this carefully crafted
design, GRAPE takes into account both aspects of
knowledge in a holistic framework.
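To make the graph-fusion step concrete, the sketch below illustrates the idea under simplifying assumptions: entity nodes are pooled from the reader's intermediate hidden states by mean pooling over their token spans, a single relation-aware message-passing layer updates them, and the updated states are added back at the entity positions. The class and function names, the pooling scheme, and the additive write-back are illustrative choices, not GRAPE's exact architecture.

import torch
import torch.nn as nn

class RelationalGNNLayer(nn.Module):
    """One round of relation-aware message passing over the localized
    question-passage entity graph (illustrative, not GRAPE's exact layer)."""

    def __init__(self, hidden_dim, num_relations):
        super().__init__()
        self.rel_emb = nn.Embedding(num_relations, hidden_dim)
        self.update = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, node_states, edges):
        # node_states: (num_nodes, hidden_dim); edges: list of (src, rel_id, dst).
        messages = torch.zeros_like(node_states)
        counts = torch.zeros(node_states.size(0), 1)
        for src, rel, dst in edges:
            messages[dst] = messages[dst] + node_states[src] + self.rel_emb.weight[rel]
            counts[dst] += 1
        messages = messages / counts.clamp(min=1)
        return torch.relu(self.update(torch.cat([node_states, messages], dim=-1)))

def fuse_entity_knowledge(hidden_states, entity_spans, edges, gnn_layer):
    """Pool entity vectors from the reader's intermediate hidden states,
    run the GNN over the localized graph, and write the updated node
    representations back at the entity token positions."""
    node_states = torch.stack(
        [hidden_states[start:end].mean(dim=0) for start, end in entity_spans])
    node_states = gnn_layer(node_states, edges)
    fused = hidden_states.clone()
    for (start, end), node in zip(entity_spans, node_states):
        fused[start:end] = fused[start:end] + node  # inject relational knowledge
    return fused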
To the best of our knowledge, this is the first
work to leverage knowledge graphs to enhance the
passage reader for open-domain QA. Our experi-
ments demonstrate that, given the same retriever
and the same set of retrieved passages, GRAPE
can achieve superior performance on three open-
domain QA benchmarks (i.e., NQ, TriviaQA, and
WebQ), with up to a 2.2-point improvement in exact
match score over state-of-the-art readers. In
particular, our proposed GRAPE nearly doubles
the improvement on the subset that can be
enhanced by fact triplets on the KG.
2 Related Work
Text-based open-domain QA
Mainstream
open-domain QA models employ a retriever-
reader architecture, and recent follow-up work
has mainly focused on improving the retriever
or the reader (Chen and Yih, 2020; Zhu et al.,
2021). For the retriever, most of them split text
paragraphs on Wikipedia pages into over 20
million disjoint chunks of 100 words, each of
which is called a passage. Traditional methods
such as TF-IDF and BM25 explore sparse retrieval
strategies by matching the overlapping contents
between questions and passages (Chen et al.,
2017; Yang et al., 2019). DPR (Karpukhin et al.,
2020) revolutionized the field by utilizing dense
contextualized vectors for passage indexing.
Further research has improved retrieval performance
through better training strategies (Qu et al.,
2021), passage re-ranking (Mao et al., 2021), or
directly generating passages (Yu et al., 2022a).
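As a rough illustration of the contrast between sparse and dense retrieval described above, the sketch below scores passages with an IDF-weighted term overlap (a simplified BM25 flavour) versus an inner product between embeddings (the DPR setup). The function names and toy vectors are assumptions for illustration, not the actual BM25 or DPR implementations.

import numpy as np

def sparse_score(question_terms, passage_terms, idf):
    """BM25-flavoured sketch: reward overlapping terms, weighted by IDF
    (term-frequency saturation and length normalization are omitted)."""
    return sum(idf.get(term, 0.0) for term in question_terms if term in passage_terms)

def dense_score(question_vec, passage_vecs):
    """DPR-style sketch: rank passages by the inner product between a question
    embedding and pre-computed passage embeddings."""
    return passage_vecs @ question_vec

# Toy usage; real systems encode text with BERT-style encoders and index
# roughly 20M passages with a nearest-neighbour library such as FAISS.
question_vec = np.random.rand(768)
passage_vecs = np.random.rand(5, 768)
ranking = np.argsort(-dense_score(question_vec, passage_vecs))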
For the reader, extractive readers aimed to
locate a span of words in the retrieved passages as
the answer (Karpukhin et al., 2020; Iyer et al., 2021;
Guu et al., 2020). In contrast, FiD and
RAG, the current state-of-the-art readers, leveraged
encoder-decoder models such as T5 to generate
answers (Lewis et al., 2020; Izacard and Grave,
2021). Nevertheless, these readers used only the text
corpus, failing to capture the complex relationships
between entities and hence sometimes producing
answers that contradict the facts.
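To illustrate the fusion-in-decoder idea behind FiD mentioned above, the following sketch encodes the question paired with each retrieved passage independently and lets a single decoder attend over the concatenated encoder states when generating the answer. Here `encode` and `decode` are stand-ins for a T5-style encoder and autoregressive decoder, so the interface is an assumption for illustration rather than FiD's actual code.

import torch

def fid_read(encode, decode, question_ids, passage_ids_list):
    """Fusion-in-Decoder sketch: each (question, passage) pair is encoded
    independently; the per-passage encoder states are then concatenated so
    that one decoder can attend over all passages while generating the answer."""
    per_passage_states = [
        encode(torch.cat([question_ids, passage_ids], dim=-1))  # (1, L_i, H)
        for passage_ids in passage_ids_list
    ]
    fused_states = torch.cat(per_passage_states, dim=1)         # (1, sum L_i, H)
    return decode(fused_states)                                  # answer token ids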
KG-enhanced methods for open-domain QA
Recent work has explored incorporating knowledge
graphs (KGs) into the retriever-reader pipeline for
open-domain QA (Min et al., 2019; Zhou et al.,
2020; Oguz et al., 2022; Yu et al., 2021; Hu et al.,
2022; Yu et al., 2022b). For example, UniK-QA
converted structured KG triples into text and merged
them with unstructured text into a unified index, so
that the retrieved evidence covers more knowledge.
Graph-Retriever (Min et al., 2019) and GNN-
encoder (Liu et al., 2022) explored passage-level
KG relations for better passage retrieval. KAQA
(Zhou et al., 2020) improved passage retrieval by
re-ranking candidate passages according to the KG
relations between them. KG-FiD (Yu et al., 2021)
utilized KG relations among retrieved passages to
re-rank and filter them. However, all of these retriever-