QA architectures, RAG (Lewis et al., 2020b) combines the information retrieval and answer generation stages in a differentiable manner. It uses
a combination of parametric and non-parametric
memory, where the parametric memory consists of
a pre-trained seq2seq BART (Lewis et al., 2019)
generator, and the non-parametric memory consists
of dense vector representations of Wikipedia arti-
cles indexed with the FAISS library (Johnson et al.,
2017). RAG first encodes a question into a dense
representation, retrieves the relevant passages from
an indexed Wikipedia knowledge base, and then
feeds them into the generator. The loss function allows both the generator and the question encoder to be fine-tuned at the same time. Lewis et al. (2020b) highlight RAG's ability to perform well on Wikipedia-based general question-answering datasets like Natural Questions (Kwiatkowski et al., 2019). Other recent work also highlights that the outputs generated by RAG models are considerably more factual because generation is conditioned on the retrieved documents, offering a possible answer to the hallucination problem of generative language models. Shuster et al. (2021) also highlight how RAG reduces hallucinations in knowledge-grounded conversational
tasks, where the task is to generate responses to
dialogues based on a large Wikipedia knowledge
base. Xu et al. (2021) illustrate the effectiveness
of RAG in chat-bot frameworks and highlight how
RAG models are better able to recall and summarize conversations than standard seq2seq models with only parametric memory. This paper aims
to understand how RAG could be extended to an
end2end model and adapted to specific domains.
To the best of our knowledge, this is the first time RAG has been investigated for domain adaptation in the context of ODQA.
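The retrieve-then-generate workflow described above can be sketched as follows. This is a toy illustration, not RAG's implementation: the hash-style encoder stands in for DPR's BERT-based question and passage encoders, the brute-force inner-product search stands in for a FAISS index over dense Wikipedia passage vectors, and the generator step is only shown as the concatenated input that a BART model would condition on.

```python
# Minimal sketch of RAG's retrieve-then-generate workflow (toy stand-ins
# for DPR encoders, a FAISS index, and the BART generator).

def toy_encode(text, dim=8):
    """Map text to a fixed-size dense vector (stand-in for a DPR encoder)."""
    vec = [0.0] * dim
    for i, ch in enumerate(text.lower()):
        vec[i % dim] += ord(ch) / 1000.0
    return vec

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_top_k(question_vec, passage_index, k=2):
    """Exact maximum inner-product search (FAISS approximates this at scale)."""
    scored = [(inner_product(question_vec, vec), pid)
              for pid, vec in passage_index.items()]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:k]]

# A tiny "knowledge base" in place of indexed Wikipedia articles.
passages = {
    "p1": "Paris is the capital of France.",
    "p2": "The Eiffel Tower is in Paris.",
    "p3": "Python is a programming language.",
}
index = {pid: toy_encode(text) for pid, text in passages.items()}

question = "What is the capital of France?"
top_ids = retrieve_top_k(toy_encode(question), index, k=2)

# The generator (BART in RAG) conditions on the question plus each
# retrieved passage; here we only form the concatenated inputs.
generator_inputs = [f"{question} [SEP] {passages[pid]}" for pid in top_ids]
print(top_ids)
```

Because retrieval scores are inner products of dense vectors, the question encoder sits inside the differentiable pipeline, which is what allows RAG's loss to update it alongside the generator.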
2.2 REALM-like end2end Retrieval-Augmented Architectures
REALM (Guu et al., 2020) is a retrieval-augmented model similar to RAG. It introduced
a novel masked language pre-training step that in-
volves an end-to-end trainable retriever. In the
REALM work, the authors first train the entire
model on the masked language prediction task
and then fine-tune it on question-answering tasks
(keeping the retriever frozen). In contrast, the original RAG model uses an already-trained DPR retriever and conducts only partial end-to-end training with a BART reader model. As a result, RAG is less computationally expensive than REALM, and its code is available open-source.
In our work, we explore and extend the original RAG architecture for domain adaptation, adapting some concepts of our RAG-end2end extension from REALM. REALM updates its retriever only during pre-training, which uses the masked language modeling (MLM) task (Devlin et al., 2018); during downstream fine-tuning, it keeps the retriever fixed. However, the REALM end-to-end training code is not open-sourced, possibly due to its computational complexity. Compared to REALM, RAG is a combination of already pre-trained language models, so users do not need to go through a heavy pre-training stage. Due to these engineering-friendly features and its open availability, we conducted our experiments with RAG and extended it into an end-to-end trainable retrieval augmentation model. It is also important to highlight that
no prior work has explored the domain adaptation of retrieval-augmented models for question answering; instead, most works focus on general question answering with Wikipedia-based knowledge bases.
Similar to REALM's end2end architecture, recent work (Sachan et al., 2021) extended RAG and highlighted that training the retriever could improve overall performance on question-answering datasets like Natural Questions. Unlike our work, the authors did not focus on the domain adaptation of retrieval-augmented models; they mainly explored the ability to train neural retrievers in an end-to-end way using retrieval-augmented models. Similarly, another related work (Singh et al., 2021) extended retrieval-augmented architectures to an end-to-end model and illustrated that this could improve question-answering accuracy. Singh et al. (2021) mainly focused on improving document reading and answer generation rather than domain adaptation.
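The end-to-end trainability discussed throughout this section rests on a single idea: the model marginalizes the generator's answer likelihood over the retrieval distribution, so the loss reaches the question encoder through the document scores. The numeric sketch below illustrates this with made-up scores and probabilities; it is a worked example of the marginalization, not code from any of the cited systems.

```python
import math

# Toy illustration of why retrieval-augmented models can be trained
# end-to-end: the answer likelihood is marginalized over the top-k
# retrieved documents, p(y|x) = sum_z p(z|x) * p(y|x,z), where p(z|x)
# is a softmax over question-document inner products. Since p(z|x)
# depends on the question encoder, the loss -log p(y|x) gives that
# encoder a gradient. All numbers below are invented for illustration.

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

doc_scores = [2.0, 1.0, 0.5]       # question-document inner products
p_doc = softmax(doc_scores)        # p(z|x): the retrieval distribution
p_ans_given_doc = [0.9, 0.4, 0.1]  # p(y|x,z): generator likelihoods

# Marginalize over documents, then take the negative log-likelihood.
p_answer = sum(pz * py for pz, py in zip(p_doc, p_ans_given_doc))
loss = -math.log(p_answer)
```

A retriever trained this way learns to rank highly the documents under which the generator assigns the answer high probability, which is the behavior Sachan et al. (2021) and Singh et al. (2021) exploit.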
3 Model Architecture and Training
Procedure
In this work, we extend RAG to finetune all compo-
nents, including the DPR retriever, and dynamically
update the external knowledge base during train-
ing. We hypothesize that these asynchronous updates to the knowledge base help with domain adaptation. Figure 1
demonstrates the main workflow of our model. In