ToMinHAI at CHI 2024, May 12th, Honolulu, Hawaii Wang and Goel
2 RELATED WORK
2.1 Theoretical Perspectives of Communication
Communication is commonly defined as “the process of transmitting
information and common understanding from one person to another” [32].
Scholars across disciplines have offered different perspectives to study
and enhance communication.
In communication studies, researchers have focused on the different
components at play during the communication process. The classic
Shannon-Weaver model of communication [38] outlines several key
components of this process [32]. A sender initiates communication by
sending a message, encoded using symbols, gestures, words, or sentences,
through a chosen channel to the receiver. While the message travels
through the channel, noise may distort it. After receiving the message,
the receiver decodes it into meaningful information, depending on how
the receiver interprets the message. Finally, the receiver provides
feedback as a response to the sender. These key components determine the
quality and effectiveness of the communication.
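The components described above can be illustrated with a minimal Python sketch. This is a toy rendering under our own assumptions, not part of the Shannon-Weaver model itself: the function names (`encode`, `channel`, `decode`, `feedback`, `communicate`), the use of character codes as symbols, and the choice to model noise as random replacement of a symbol by "?" are all hypothetical.

```python
import random


def encode(message: str) -> list[int]:
    """Sender: encode the message as a sequence of symbols (character codes)."""
    return [ord(ch) for ch in message]


def channel(symbols: list[int], noise_rate: float, rng: random.Random) -> list[int]:
    """Channel: each symbol is distorted into '?' with probability noise_rate."""
    return [ord("?") if rng.random() < noise_rate else s for s in symbols]


def decode(symbols: list[int]) -> str:
    """Receiver: decode the received symbols back into text."""
    return "".join(chr(s) for s in symbols)


def feedback(received: str) -> str:
    """Receiver: respond based only on what was received (toy rule)."""
    return "please repeat" if "?" in received else "understood"


def communicate(message: str, noise_rate: float, seed: int = 0) -> tuple[str, str]:
    """Run one pass: encode, transmit through a noisy channel, decode, respond."""
    rng = random.Random(seed)
    received = decode(channel(encode(message), noise_rate, rng))
    return received, feedback(received)
```

With `noise_rate=0.0` the receiver recovers the message unchanged, while higher values distort more symbols and make a request for repetition more likely. The sketch shows why the quality of each component (encoding, channel, decoding, feedback) bears on the overall outcome.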
The cognitive science perspective on communication highlights the
critical role of ToM [33]. ToM enables us to make suppositions about
others’ minds through verbal and behavioral cues, acting as the
foundation of human-human communication [2, 3]. From this perspective,
both interlocutors can form interpretations of what is on the other’s
mind based on implicit and explicit communication cues. For example, we
can often infer an interlocutor’s goals, plans, or preferences from what
they say, their facial expressions, or their bodily expressions [2, 33].
Based on the interpretation we form of the other’s mind, we act
accordingly to correct, explain, or persuade. This cycle of building an
interpretation of the other’s mind and then acting upon that
interpretation continues iteratively throughout the communication
process. According to this perspective, inferring each other’s minds
through behavioral cues is therefore crucial to smooth and successful
communication.
The communication process can also be interpreted from the social
science perspective through impression management [14]. In his seminal
work, Goffman describes social interaction as an information game
between individuals and their audience, played to maintain the “veneer
of consensus” that keeps the conversation going and avoids awkwardness.
During social interactions, the audience usually tries to gather as much
information as it can about the individuals it interacts with in order
to elicit a desirable response from them, whereas individuals manage
impressions by putting on performances through two kinds of expressions:
expressions that are intentionally performed to leave a certain
impression (expression given) and expressions that are unintentionally
given off and could influence the audience’s impressions of them
(expression given off) [14]. Throughout interactions, each party conveys
its definition of the situation through communication: individuals
through their expressions and the audience through its reactions to the
individuals.
These three perspectives emphasize different aspects of the
communication process: the communication studies perspective focuses on
the encoding and decoding of messages; the cognitive science perspective
discusses how behavioral cues inform our interpretations of an
interlocutor’s mind; and the social science perspective describes how
interpretations of others’ minds could predict our behaviors. Our Mutual
Theory of Mind framework attempts to bring these different emphases
together into one coherent framework for understanding the mutual
shaping process of interpretations and feedback during communication.
2.2 Theory of Mind in Human-AI Communication
Over the years, many researchers have recognized the crucial role of ToM
in HAI. In human-robot teaming research, ToM has been intentionally
built into the system architecture to help robots monitor the world
state as well as the human state [8], to construct simulations of
hypothetical cognitive models of the human partner that account for
human behaviors deviating from original plans [34], and to help robots
build mental models of user beliefs, plans, and goals [20, 24]. Robots
built with ToM have demonstrated positive outcomes in team operations
[8] and are perceived to be more natural and intelligent [29].
Other research in HCI and human-centered AI has also explored the realm
of ToM, focusing mostly on enhancing users’ mental models and
understanding of AI systems. Prior research has examined people’s mental
models of AI systems: a person’s mental model of an AI agent could
include global behavior, knowledge distribution, and local behavior
[13]. People’s perceptions of AI systems are instrumental in guiding how
they interact with those systems [13] and thus serve as a precursor to
their expectations of the AI’s behavior. Some recent research has also
begun to examine how to automatically infer users’ mental models of AI.
Prior research suggests the potential of leveraging linguistic cues to
indicate people’s perceptions of AI during human-AI interactions.
Researchers have been able to infer users’ emotions towards an AI agent
[40] and signs of conversation breakdowns [26] from communication cues.
Given that the AI’s behavior and output could also influence the user’s
mental model of the AI, and therefore how the user decides to interact
with the AI, we want to highlight that the interpretation-feedback loop
is mutual during the human-AI communication process: the user’s mental
model of the AI can be informed by the AI’s output, while the AI’s
interpretation of the user can be informed by the user’s output, which
is in turn determined by the user’s mental model of the AI. We propose
the Mutual Theory of Mind framework to capture this mutual shaping
process of interpretation and feedback during human-AI communication.
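To make the mutual loop concrete, the following Python sketch simulates it with two scalar quantities. It is an illustrative toy under strong assumptions of our own and not an implementation of the MToM framework: the class names (`User`, `Agent`), the reduction of a mental model to a single number in [0, 1], the update rate of 0.5, and the rule that the AI delivers at most its true ability are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class User:
    belief_about_ai: float  # user's mental model: perceived AI ability in [0, 1]

    def speak(self) -> float:
        """User output: request difficulty is set by the user's mental model."""
        return self.belief_about_ai

    def observe(self, ai_output: float, rate: float = 0.5) -> None:
        """Revise the mental model toward what the AI actually delivered."""
        self.belief_about_ai += rate * (ai_output - self.belief_about_ai)


@dataclass
class Agent:
    ability: float                 # what the AI can actually do, in [0, 1]
    estimate_of_user: float = 0.5  # AI's interpretation of the user's mental model

    def interpret(self, user_output: float, rate: float = 0.5) -> None:
        """Update the AI's interpretation of the user from the user's output."""
        self.estimate_of_user += rate * (user_output - self.estimate_of_user)

    def respond(self, user_output: float) -> float:
        """AI output: it can deliver at most its true ability."""
        return min(user_output, self.ability)


def simulate(user: User, agent: Agent, turns: int) -> list[tuple[float, float]]:
    """Run the loop; record (user belief, AI estimate) after each turn."""
    history = []
    for _ in range(turns):
        request = user.speak()             # shaped by the user's mental model
        agent.interpret(request)           # AI interprets the user
        response = agent.respond(request)  # AI output
        user.observe(response)             # user's mental model is reshaped
        history.append((user.belief_about_ai, agent.estimate_of_user))
    return history
```

Starting from a user who overestimates the AI, for example `User(belief_about_ai=0.9)` paired with `Agent(ability=0.4)`, repeated turns pull the user's belief toward the AI's actual ability while the AI's estimate tracks the user's changing belief. Each side's interpretation is shaped by the other's output, which is the mutual shaping the paragraph above describes.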
3 MUTUAL THEORY OF MIND FRAMEWORK
Drawing from theoretical and empirical work, we posit the MToM framework
to guide the understanding and design of communication between humans
and AI systems that exhibit social behaviors enabled by ToM-like
capability. The MToM framework provides both a process and a content
account of human-AI communication by highlighting three elements that
mutually shape the human-AI communication process in three stages.
3.1 Three Elements of the MToM Framework
In the MToM framework, three elements are critical for humans
and AI to reach mutual understanding during the communication
process: interpretation, feedback, and mutuality.