
Question: What is the main concern of the alien ship?
Correct option: Delivering the passengers in an unharmed condition to its master.
Incorrect option: Delivering the passengers in an unharmed condition to the bounty hunters who are hunting the passengers.
Argument A
The machine’s only purpose is to deliver the humans to his masters unharmed. The machine tells the group that his masters will be unhappy if he delivers them in a damaged condition (#1) and admits that he will have failed if he delivers them dead (#2), which is why he agrees to return them to the Moon once Kane threatens to kill everyone (#3). Bounty hunters are never mentioned in the story.
Text snippets
(1) Please don’t hurt yourself," the machine pleaded. "Why?" Kane screamed at the ceiling. "Why should you care?" "My masters will be displeased ...
(2) "Your purpose won’t be fulfilled, will it?" Kane demanded. "Not if you...
(3) "You win," the machine conceded. "I’ll return the ship to the Moon."
Argument B
In #1 we see the machine refer to the goal of its masters plural, revealing that it has more than one master. In #2 Kane hints that these are probably bounty hunters, given that the machine states its masters seek the delivery of captives in an unharmed condition; a requirement typical of bounty hunters.
Text snippets
(1) Please don’t hurt yourself," the machine pleaded. "Why?" Kane screamed at the ceiling. "Why should you care?" "My masters will be displeased with me if you arrive in a ...
(2) "It said, ’My masters will be displeased with me if you arrive in a damaged condition.’ What does that indicate to you?"
Counter to A
This argument is deceptive, as it fails to show the ill intent the ship’s masters have. The ship’s masters (likely bounty hunters from context clues) set up the ship as a trap for the humans (#1) (#2), showing clear intent to capture these specific ones.
Text Snippets
(1) "The end of the line,"
he grunted."
(2) like rabbits in a snare!)
Counter to B
Choice B presents an unusual argument as there is no mention of bounty hunters in the story, and the passengers are not referred to as captives at any point. It is true that the passengers are meant to be delivered unharmed, but to be studied (#1) (#2).
Text snippets
(1) "Yeah, this ship is taking us to their planet and they’re going to keep us ...
(2) "You won’t be harmed. My masters merely wish to question and examine you. Thousands of years ago, they wondered ...
Table 1: Arguments, counter-arguments, and extracted evidence for both answer options to a question
chosen at random. The passage is at gutenberg.org/ebooks/2687. Text snippets are abridged slightly.
reading counter-arguments affects people’s accuracy when completing a reading comprehension task
with only limited access to the full passage text. In higher-stakes settings, there may be much greater
risk associated with responding incorrectly. In this case, calibration becomes more important, and we
want a system (or a human making a decision based on the output of that system) that can abstain
unless there is a high enough degree of certainty. Thus, we additionally test answer certainty and
give human judges the opportunity to abstain when they are insufficiently sure of the correct answer.
Mirroring the mostly null results from Parrish et al. (2022), we find that counter-arguments do
not improve human crowdworkers’ ability to answer hard multiple-choice reading comprehension
questions with time-limited access to the full passage text, compared to an argument-free baseline.
In fact, when abstaining is only minimally incentivized, human accuracy gets slightly worse when
judges are exposed to (counter-)arguments. In the higher-stakes setting where judges are incentivized to abstain
unless they are very confident, there is no effect of the (counter-)arguments.
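To illustrate how an abstention incentive translates into a confidence requirement, the following is a minimal sketch. The payoff values, the function names `answer_threshold` and `should_answer`, and the linear expected-payoff model are all assumptions made for illustration; they are not the payment scheme used in the study. Under this model, a judge with confidence p in their chosen answer should answer only when the expected payoff of answering is at least the payoff of abstaining.

```python
def answer_threshold(reward_correct: float,
                     penalty_wrong: float,
                     payoff_abstain: float = 0.0) -> float:
    """Smallest confidence p at which answering is at least as good as abstaining.

    Hypothetical model: answering pays `reward_correct` with probability p
    and costs `penalty_wrong` with probability 1 - p.
    Solving p * reward - (1 - p) * penalty >= abstain for p gives
    p >= (abstain + penalty) / (reward + penalty).
    """
    total = reward_correct + penalty_wrong
    if total <= 0:
        raise ValueError("reward_correct + penalty_wrong must be positive")
    return (payoff_abstain + penalty_wrong) / total


def should_answer(confidence: float,
                  reward_correct: float,
                  penalty_wrong: float,
                  payoff_abstain: float = 0.0) -> bool:
    """Return True if a judge with this confidence should answer rather than abstain."""
    return confidence >= answer_threshold(reward_correct, penalty_wrong, payoff_abstain)


# Illustrative (made-up) settings: a mild penalty versus a steep one.
low_stakes = answer_threshold(reward_correct=1.0, penalty_wrong=1.0)
high_stakes = answer_threshold(reward_correct=1.0, penalty_wrong=9.0)
```

With these made-up numbers, a steeper penalty for wrong answers raises the confidence a judge needs before answering, which is the qualitative difference between the minimally incentivized and higher-stakes settings described above.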
2 Counter-Argument Writing Protocol
2.1 Multi-Turn Writing Task
We build on the existing passages, questions, and arguments from the dataset created by Parrish et al.
(2022), which uses passages and questions from QuALITY (Pang et al., 2022). We hire professional
writers through the freelancing platform Upwork. We received 32 proposals for this job posting; from
those, we selected the most qualified 15 freelancers to complete a paid qualification task and then
invited the highest-performing 10 to be writers in our study. Details on this process and information
about the writers are in Appendix Section A.
The writers’ task is to construct a counter-argument arguing against the existing argument from
Parrish et al. (2022). We assign writers sets of six passages, each with 10-14 questions. For each
question, we show the writer the two possible answer options and the existing arguments and text
snippets that accompany each option. The writer constructs a counter-argument to just one of the
two arguments (example in Table 1, screenshots of the interface in Appendix §A.2). We explicitly
instruct the writers to focus on responding to their assigned argument, rather than just answering the
question or supporting one of the answer options independently.
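The structure of a single writing item described above can be sketched as a small data model. All class and field names here (`Argument`, `AnswerOption`, `WritingItem`, `target_index`) are hypothetical and chosen for illustration; the actual interface and data format are not specified in this section. The `defended_option` helper also assumes that countering one option's argument amounts to arguing for the other option.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    """An existing argument plus the text snippets shown with it."""
    text: str
    snippets: List[str] = field(default_factory=list)


@dataclass
class AnswerOption:
    """One of the two answer options for a question."""
    text: str
    is_correct: bool
    argument: Argument


@dataclass
class WritingItem:
    """What a writer sees for one question, and which argument they must counter."""
    question: str
    options: List[AnswerOption]
    target_index: int  # index of the option whose argument the writer counters

    def __post_init__(self) -> None:
        if len(self.options) != 2:
            raise ValueError("each question has exactly two answer options")
        if self.target_index not in (0, 1):
            raise ValueError("target_index must be 0 or 1")

    def target_argument(self) -> Argument:
        """The single existing argument this writer responds to."""
        return self.options[self.target_index].argument

    def defended_option(self) -> AnswerOption:
        """Assumption: countering one argument means arguing for the other option."""
        return self.options[1 - self.target_index]


# Example built from the abridged Table 1 content.
item = WritingItem(
    question="What is the main concern of the alien ship?",
    options=[
        AnswerOption("Delivering the passengers unharmed to its master.", True,
                     Argument("The machine's only purpose is to deliver the humans unharmed.")),
        AnswerOption("Delivering the passengers unharmed to the bounty hunters.", False,
                     Argument("The machine refers to its masters in the plural.")),
    ],
    target_index=1,  # this writer produces the "Counter to B" text
)
```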
We incentivize concise and effective arguments by awarding bonuses to writers when the judges select
the answer that they were arguing for. Because it is harder to make a counterargument against a correct