
Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot
Document-Level Question Answering
Tavish McDonald1, Brian Tsan2, Amar Saini1, Juanita Ordonez1,
Luis Gutierrez1, Phan Nguyen1, Blake Mason1, Brenda Ng1
1Lawrence Livermore National Laboratory; 2University of California, Merced;
{mcdonald53, saini5, ordonez2, gutierrez74, nguyen97, mason35, ng30}@llnl.gov; btsan@ucmerced.edu
Abstract
Researchers produce thousands of scholarly documents containing valuable technical knowledge. The community faces the laborious task of reading these documents to identify, extract, and synthesize information. To automate information gathering, document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge. Finetuning QA systems requires access to labeled data (tuples of context, question, and answer). However, data curation for document QA is uniquely challenging because the context (i.e., the text passage containing the evidence needed to answer the question) must be retrieved from potentially long, ill-formatted documents. Existing QA datasets sidestep this challenge by providing short, well-defined contexts that are unrealistic in real-world applications. We present a three-stage document QA approach: (1) text extraction from PDF; (2) evidence retrieval from the extracted text to form well-posed contexts; (3) QA over these contexts to return high-quality answers, whether extractive, abstractive, or Boolean. Using the QASPER dataset for evaluation, our Detect-Retrieve-Comprehend (DRC) system achieves a +7.19 improvement in Answer-F1 over existing baselines due to superior context selection. Our results demonstrate that DRC holds tremendous promise as a flexible framework for practical scientific document QA.
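To make the three-stage pipeline concrete, the following minimal sketch wires off-the-shelf components together in the same Detect-Retrieve-Comprehend order. The specific libraries (PyMuPDF, Sentence-Transformers, a Flan-T5 reader), model names, and function names are illustrative assumptions, not the components evaluated in this paper.

```python
# Minimal sketch of a Detect-Retrieve-Comprehend style pipeline.
# PyMuPDF, Sentence-Transformers, and Flan-T5 are illustrative stand-ins,
# not the components used by DRC itself.
import fitz                                                   # PyMuPDF
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

retriever = SentenceTransformer("all-MiniLM-L6-v2")           # example retriever
reader = pipeline("text2text-generation", model="google/flan-t5-base")  # example reader

def extract_paragraphs(pdf_path: str) -> list[str]:
    """Stage 1 (Detect): extract raw text from the PDF and split it into paragraphs."""
    doc = fitz.open(pdf_path)
    text = "\n".join(page.get_text() for page in doc)
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def retrieve_evidence(question: str, paragraphs: list[str], k: int = 3) -> list[str]:
    """Stage 2 (Retrieve): rank paragraphs by embedding similarity to the question."""
    q_emb = retriever.encode(question, convert_to_tensor=True)
    p_emb = retriever.encode(paragraphs, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, p_emb)[0]
    top = scores.argsort(descending=True)[:k]
    return [paragraphs[int(i)] for i in top]

def answer(question: str, pdf_path: str) -> str:
    """Stage 3 (Comprehend): generate an answer conditioned on retrieved evidence."""
    evidence = retrieve_evidence(question, extract_paragraphs(pdf_path))
    prompt = f"question: {question} context: {' '.join(evidence)}"
    return reader(prompt, max_new_tokens=64)[0]["generated_text"]

print(answer("What is the seed lexicon?", "paper.pdf"))
```

Separating retrieval from reading keeps the reader's input short enough for a standard transformer context window, which is the practical constraint the second stage is meant to address.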
1 Introduction
Growth in new machine learning publications has exploded in recent years, with much of this activity occurring outside traditional publication venues. For example, arXiv hosts researchers' manuscripts detailing the latest progress and burgeoning initiatives. In 2021 alone, over 68,000 machine learning papers were submitted to arXiv. Since 2015, submissions to this category have increased yearly at an average rate of 52%. While it is admirable that the accelerated pace of AI research has produced many innovative works and manuscripts, the sheer number of papers makes it prohibitively difficult to keep pace with the latest developments in the field. Increasingly, researchers turn to scientific search engines (e.g., Semantic Scholar and Zeta Alpha), powered by neural information retrieval, to find relevant literature.
Figure 1: QASPER questions require PDF text extraction and evidence retrieval to generate an answer. The illustrated example pairs the question "What is the seed lexicon?" with a retrieved evidence passage ("The seed lexicon consists of positive and negative predicates. If the predicate of an extracted event is in the seed lexicon and does not involve complex phenomena like negation, we assign the corresponding polarity score (+1 for positive events and -1 for negative events) to the event. ...") and the generated answer "a vocabulary of positive and negative predicates that helps determine the polarity score of an event".
To date, scientific search engines (Fadaee et al. 2020; Zhao and Lee 2020; Parisot and Zavrel 2022) have focused on serving recommendations based on semantic similarity and lexical matching between a query phrase and a collection of document-derived contents, particularly titles and abstracts. Other efforts to elicit the details of scholarly papers have extracted quantified experimental results from structured tables (Kardas et al. 2020) and generated detailed summaries from the hierarchical content of scientific documents (Sotudeh, Cohan, and Goharian 2020).
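As a rough illustration of the matching these engines perform over titles and abstracts, the sketch below blends dense embedding similarity with BM25 lexical scores. The corpus, the encoder choice, and the equal-weight blend are assumptions made purely for illustration and do not describe the cited systems.

```python
# Illustrative sketch of query-to-abstract matching that combines semantic
# (embedding cosine) and lexical (BM25) signals. The corpus, model, and
# weighting below are assumptions for illustration only.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "Title A. Abstract describing a neural relation extraction method ...",
    "Title B. Abstract describing result extraction from ML paper tables ...",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")         # example encoder
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])  # lexical index

def hybrid_scores(query: str, weight: float = 0.5) -> list[float]:
    """Blend cosine similarity with BM25 scores; a real system would
    normalize the two score ranges before mixing, which this toy skips."""
    dense = util.cos_sim(encoder.encode(query, convert_to_tensor=True),
                         encoder.encode(corpus, convert_to_tensor=True))[0]
    lexical = bm25.get_scores(query.lower().split())
    return [weight * float(d) + (1 - weight) * float(l)
            for d, l in zip(dense, lexical)]

print(hybrid_scores("neural relation extraction baselines"))
```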
While these scientific search engines suffice for topic exploration, once a set of papers is identified as relevant, researchers want to probe deeper for information to address specific questions conditioned on their prior domain knowledge (e.g., What baselines is the neural relation extractor compared to?). While one can gain a sense of the main findings of a paper by reading the abstract, the answers to these probing questions are frequently found in the details of the methodology, experimental setup, and results sections. Furthermore, questions may require synthesis of document passages to produce an abstractive answer rather than simply extracting a contiguous span. Reading and manually cross-referencing the results of several papers is a labor-intensive way to glean specific knowledge from scientific documents. Therefore, effective tools to help automate knowledge discovery are sorely needed.
A promising approach to extracting knowledge from scientific publications is document-level question answering (QA): using an open set of questions to comprehend fig-