
prompted with a stimulus, does an agent reproduce an internal representation similar enough to that intended? Call this understanding-as-representation. Many have proposed related theories (Shanahan and Mitchell, 2022; Barsalou, 2008; Hofstadter and Sander, 2013; Jackendoff et al., 2012; Grice, 1989). In this view, if someone unthinkingly blurts out the correct answer to a question, they would not have understood. While a thermostat reproduces a certain representation given a temperature, this representation is not similar to a person’s. Some have said that models appear not to understand because their interrogators fail to present stimuli in a model-understandable way (Michael 2020 summarizes). Exactly: I am concerned with human language understanding, not any possible form of understanding.
To advance a science of understanding, I argue
that as-reliability is necessary, as-representation is
sufficient, and as-mapping is neither.
I reject the premise of as-mapping that the way we use words is separate from our meanings. While current work in NLP poorly approximates shared intentionality,3 I disagree that this is the only route to meaning.4 We could imagine a very large look-up table. There is no boundary between what is and what is not a language.5
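To make the look-up table thought experiment concrete, here is a minimal, purely illustrative sketch (the entries, fallback reply, and function name are hypothetical): an agent whose every reply is retrieved rather than computed, yet which could, with enough entries, pass an input-output reliability test.

```python
# Hypothetical sketch of the "very large look-up table" agent.
# Replies are retrieved, not derived from any internal representation;
# only the coverage of the table limits its apparent reliability.
LOOKUP_TABLE = {
    "I'm unhappy.": "Why aren't you happy?",
    "What is the capital of France?": "Paris.",
    # ... imagine vastly more stimulus-response pairs
}

def respond(stimulus: str) -> str:
    """Return a canned reply; fall back to a stock response for unseen input."""
    return LOOKUP_TABLE.get(stimulus, "Tell me more.")

print(respond("I'm unhappy."))  # -> Why aren't you happy?
```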
I accept as-reliability in theory. Enough data and parameters should yield a language-performant agent indistinguishable from a human tested on byte streams passed along a wire. Similarly, Potts (2022) argues that a self-supervised foundation model could do so. Still, I am skeptical of what I call the Scaling Paradigm: the claim that scale alone is a realistic approach.
I think that hill climbing works, but we’re climbing the wrong hill.
3 Climbing the Right Hill
As-representation and as-reliability are compatible: we may care about representation but more easily look for reliability. I argue that input-output behavioral tests are necessary but may not be sufficient to attribute understanding; we may need to look inside.6
3 The meaning to which Bender and Koller (2020) say models have no access.
4 Millikan offers an account where inner representations exist but are not shared (Millikan, 2017).
5 Bender and Koller (2020) permit meaning in models which ground linguistic form on images.
6 Compare Churchland and Churchland (1990).
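As one hedged illustration of what “looking inside” could mean in practice (a probing classifier, not a method proposed in this paper), we can fit a simple model to an agent’s internal states to test whether a property of interest is recoverable there; the hidden states below are synthetic stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: 200 "hidden states" (dimension 64), each labeled with a
# property we care about (e.g., whether the input sentence was negated).
hidden_states = rng.normal(size=(200, 64))
labels = (hidden_states[:, 3] > 0).astype(int)  # toy property tied to one dimension

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.5, random_state=0
)

# Fit a linear probe; high held-out accuracy suggests the property is
# represented internally, something an input-output test alone cannot show.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
```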
Nonetheless, Alisha, when messaging with
Bowen, has no need to look inside Bowen’s head to
verify that he understood the following exchange:
A: I’m unhappy.
B: Why aren’t you happy?
Our human bias is to assume that other agents understand until evidence proves otherwise (Weizenbaum, 1976). This is pragmatic; until recently humans did not encounter non-human agents who could respond somewhat reliably. Humans assume a similarity of representation: that others have the same inductive biases.
We can’t make that assumption with our models. We can’t assume that a chat-bot has a bias to coo over babies (cf. Hrdy 2009). This is why Turing’s (1948) test doesn’t work: the smoke-and-mirrors programs which won the Loebner prize unintentionally parody input-output tests (Minsky, 1995). Reliability, while useful, alone does not advance a science of understanding. As-reliability does not tell us which biases induce understanding. It is not causal.
Granted, humans’ internal representations are difficult to measure and may change at each point of access, and in AI we’ve historically leaned too heavily on certain putative representations. Sutton (2019) calls this a "bitter lesson."
So why talk of representation? I agree with the "bitter lesson," but I also know that there is no such thing as a free lunch; human language occupies a small manifold in the space of possible functions. I don’t argue that we should replicate natural functions, but rather that we should be honest about human strengths, lest we wander off into fruitless regions of state space. To do logic, at some internal level a system is going to have to appear to use the parts of logic.
Advancing as-representation does not mean we know what representations underlie human language, nor that we must use certain ones.
Advancing as-representation does mean that we pay attention to the constraints on human language usage (§4). We should use those to guide our benchmark tests for reliability. We should not get lost in our proxies, especially what the Scaling Paradigm assumes (§5).
4 Under-specification of Meaning
Language is dynamic (e.g. has a history), intersubjective (multi-agent), grounded in a large number of modalities (senses), collectively intentional (in a