
on the Co-Wizard. In line with prior work (Roh
et al., 2020; Codevilla et al., 2018; Mueller et al.,
2018), we developed a set of high-level physical
actions from pilot studies for the Co-Wizard to
control the vehicle. Each action is mapped to a
rule-based local trajectory planner to generate a list
of waypoints that the vehicle will drive through.
The continuous control (steering, throttle, brake)
of the vehicle is performed by a PID controller.
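As a rough illustration of the last step, the sketch below shows how a textbook PID loop could turn a speed error into throttle and brake commands. It is a minimal sketch, not the authors' controller: the class `PID`, the function `speed_control`, the gains, and the clipping of both outputs to [0, 1] are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PID:
    """Textbook PID controller (gains are hypothetical, not from the paper)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        """Update the internal state and return the control signal."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def speed_control(pid: PID, target_kmh: float, current_kmh: float,
                  dt: float) -> Tuple[float, float]:
    """Map the PID output to (throttle, brake), each clipped to [0, 1]."""
    u = pid.step(target_kmh - current_kmh, dt)
    throttle = min(max(u, 0.0), 1.0)   # positive output accelerates
    brake = min(max(-u, 0.0), 1.0)     # negative output brakes
    return throttle, brake
```

A steering loop could reuse the same `PID` class with the heading error towards the next waypoint as its input.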
In a complex navigation task with multiple sub-
goals, belief tracking over plans, goals, task status,
and knowledge becomes crucial (Ma et al., 2012;
Misu et al., 2014). Besides controlling the vehicle
and communicating with the participant, the
Co-Wizard also annotates the intended actions (re-
ferred to as mental actions) during and after the
interaction, e.g., by noting down the navigation
plan by clicking junctions on the intended trajec-
tory from current position to the destination. The
set of physical and mental actions is described
in Table 2, and more implementation details are
available in Appendix A.6.
Physical Actions  Args                    Descriptions
LaneFollow        -                       Default behaviour, follow the current lane.
LaneSwitch        Angle (Rotation)        Switch to a neighboring lane.
JTurn             Angle (Rotation)        Turn to a connecting road at a junction.
UTurn             -                       Make a U-turn to the opposite direction.
Stop              -                       Brake the vehicle manually.
Start             -                       Start the vehicle manually.
SpeedChange       Speed (±5)              Change the desired cruise speed by 5 km/h.
LightChange       Light State (On/Off)    Change the front light state.

Mental Actions    Args                    Descriptions
PlanUpdate        List[Junction ID]       Indicate intended trajectory towards a destination.
GoalUpdate        List[Landmark]          Indicate current goal as an intended landmark.
StatusUpdate      Tuple[Landmark,Status]  Indicate a change in task status.
KnowledgeUpdate   x,y                     Guess the location of an unknown landmark.
Other             -                       Other belief state updates.
Table 2: The space of primitive physical actions and
mental actions of the Co-Wizard.
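To make the action space concrete, the following sketch encodes the action names from Table 2 as Python enums and wraps one annotated action in a small record. Only the action names come from the table; the enum member names, the `ActionEvent` class, and the representation of arguments as a plain tuple are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Tuple


class PhysicalAction(Enum):
    LANE_FOLLOW = "LaneFollow"
    LANE_SWITCH = "LaneSwitch"
    J_TURN = "JTurn"
    U_TURN = "UTurn"
    STOP = "Stop"
    START = "Start"
    SPEED_CHANGE = "SpeedChange"
    LIGHT_CHANGE = "LightChange"


class MentalAction(Enum):
    PLAN_UPDATE = "PlanUpdate"
    GOAL_UPDATE = "GoalUpdate"
    STATUS_UPDATE = "StatusUpdate"
    KNOWLEDGE_UPDATE = "KnowledgeUpdate"
    OTHER = "Other"


@dataclass(frozen=True)
class ActionEvent:
    """One Co-Wizard action with its arguments (hypothetical container)."""
    action: Enum
    args: Tuple[Any, ...] = ()

    @property
    def is_physical(self) -> bool:
        return isinstance(self.action, PhysicalAction)


# Example: raise the cruise speed by 5 km/h, then record a plan
# as a list of junction IDs (the IDs here are made up).
speed_up = ActionEvent(PhysicalAction.SPEED_CHANGE, (5,))
plan = ActionEvent(MentalAction.PLAN_UPDATE, ([12, 7, 3],))
```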
3.2 Interface for Ad-Wizard Activities
The Ad-Wizard is able to introduce environmental
exceptions and task exceptions.
• Environmental Exceptions: Triggered by changes
to the environment. These include direct
environmental changes, which challenge the
vehicle's perceptual processing and motivate
participants to request adaptations without
changing the plan or goal (e.g., drive slowly in
foggy weather and turn the headlights on at night).
Environmental exceptions can also be introduced
by creating roadblocks, which motivate new plans
by blocking the original ones.
• Task Exceptions: Brought about by changing the
tasks specified in the storyboard, i.e., by deleting,
adding, or changing a landmark to visit. The
Ad-Wizard sends a message to prompt the
participant in the message interface with
appropriate context, and modifies the task
interface that specifies the landmarks to visit.
Since the Co-Wizard does not have a task
interface, the participant needs to communicate
with the Co-Wizard in natural language to inform
it of the status of a subgoal, especially when a
change of the current subgoal is indicated by
the Ad-Wizard.
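The three kinds of task exception described above (deleting, adding, or changing a landmark) amount to simple edits of the participant's landmark list. The sketch below illustrates this under stated assumptions: the function `apply_task_exception`, the operation strings, and the landmark names are hypothetical and do not come from the DOROTHIE implementation.

```python
from typing import List, Optional


def apply_task_exception(landmarks: List[str], op: str, landmark: str,
                         new_landmark: Optional[str] = None) -> List[str]:
    """Return a new landmark list after one task exception.

    op is one of "add", "delete", or "change" (hypothetical labels).
    """
    updated = list(landmarks)  # leave the caller's list untouched
    if op == "add":
        updated.append(landmark)
    elif op == "delete":
        updated.remove(landmark)
    elif op == "change":
        if new_landmark is None:
            raise ValueError("'change' needs a replacement landmark")
        updated[updated.index(landmark)] = new_landmark
    else:
        raise ValueError(f"unknown task exception: {op}")
    return updated


# Example with made-up landmarks: the second stop is swapped out.
tasks = apply_task_exception(["cafe", "bank"], "change", "bank", "school")
```

Because only the participant sees the updated list, any such edit has to reach the Co-Wizard through dialogue.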
The rich dynamics of the environment and tasks
in DOROTHIE create uncertainty and ambiguity,
which require the Co-Wizard to actively initiate
conversation with the human partner and find
a way to handle these unexpected situations
collaboratively. The Ad-Wizard interface is
illustrated in more detail in Figure 10 in
Appendix A.7.
3.3 Data Collection
Using DOROTHIE, we recruited 40 naïve human
subjects as participants for data collection. Each
subject went through an average of 4.5 sessions. In
each session, a storyboard was given to the subject
which required the agent to visit two to six land-
marks/destinations. Each storyboard was generated
from four different towns, with all task templates,
landmark locations, street names and departure lo-
cations randomly shuffled. While shown the map,
the Co-Wizard (an experimenter) did not have ac-
cess to some of the destinations, e.g., the location
of a friend’s house or a person to pick up. Such
knowledge disparities motivate rich situated com-
munication and challenge the agent to understand
language instructions of different granularity. As
the Co-Wizard and the human subject communi-
cated with each other to achieve the goal, the Ad-
Wizard (another experimenter) was tasked to create
different types of unexpected events that were rele-
vant to the current goal. The knowledge disparity
and unexpected events together drive the commu-
nication. Details of the task setups are available in
Appendix A.4.
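The randomization described above can be pictured as sampling a town and then a shuffled subset of two to six landmarks. The sketch below is only an illustration of that idea: the function `sample_storyboard`, the dictionary layout, and the town and landmark names are assumptions, and task templates, street names, and departure locations are left out for brevity.

```python
import random
from typing import Dict, List


def sample_storyboard(towns: Dict[str, List[str]],
                      rng: random.Random) -> Dict[str, object]:
    """Pick a town, then 2 to 6 of its landmarks in random order.

    Assumes every town lists at least two candidate landmarks.
    """
    town = rng.choice(sorted(towns))
    candidates = list(towns[town])
    rng.shuffle(candidates)
    n = rng.randint(2, min(6, len(candidates)))
    return {"town": town, "landmarks": candidates[:n]}


# Made-up towns and landmarks, with a fixed seed for reproducibility.
towns = {
    "TownA": ["cafe", "bank", "school", "park"],
    "TownB": ["museum", "station", "hospital"],
}
storyboard = sample_storyboard(towns, random.Random(0))
```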
4 Situated Dialogue Navigation (SDN)
Our data collection effort has led to Situated
Dialogue Navigation (SDN), a fine-grained outdoor
navigation benchmark. Each session was replayed
at 10 FPS following prior work (Roh et al., 2020) to
obtain multi-faceted and time-synchronized infor-
mation, e.g., a first-person view of the environment,
speech input from the participant, discrete actions,