Language Models Understand Us, Poorly
Jared Moore
University of Washington School of Computer Science
jared@jaredmoore.org
Abstract
Some claim language models understand us. Others won't hear it. To clarify, I investigate three views of human language understanding: as-mapping, as-reliability, and as-representation (§2). I argue that while behavioral reliability is necessary for understanding, internal representations are sufficient; they climb the right hill (§3). I review state-of-the-art language and multi-modal models: they are pragmatically challenged by under-specification of form (§4). I question the Scaling Paradigm: limits on resources may prohibit scaled-up models from approaching understanding (§5). Last, I describe how as-representation advances a science of understanding. We need work which probes model internals, adds more of human language, and measures what models can learn (§6).
1 Introduction
A theme of EMNLP this year is "unresolved issues in NLP." Hence I consider what it means to understand human language, whether current language models understand, and whether future models will.
Recent large language models have achieved impressive results on benchmark tasks (Thoppilan et al., 2022; Brown et al., 2020). These results challenge ordained wisdom on the representations necessary for language production. We've seen improved results from multi-modal models (Saharia et al., 2022; Ramesh et al., 2022, 2021; Shuster et al., 2020; Radford et al., 2022; Borsos et al., 2022), what some call foundation models (Bommasani et al., 2021). Some models even span images, text, and games (Reed et al., 2022). Michael et al. (2022) identify language understanding and scaling as pertinent and much debated questions in NLP.
So what's next? I identify three views on language understanding (§2): understanding-as-mapping, understanding-as-reliability, and understanding-as-representation. Through examples of recent limitations of language models (§4), I argue for understanding-as-representation because it climbs the right hill (§3). In particular, I question the assumption that scaling current models is a computationally feasible path to human-like understanding (§5). Because of the large gap between human and model understanding, I think it is generally misapplied to say that models "understand" (§6.1). Better applied are examples of promising work on understanding (§6.2).
2 Views on Understanding
Some argue that there is a strict barrier which separates human from machine understanding (Bender and Koller, 2020; Searle, 1980). Understanding-as-mapping puts understanding in terms of an absolute mapping between form and meaning. Here, meaning comes from what a series of forms describes. Those forms can be composed in a variety of ways to yield different, legible meanings.[1] Often, those with this view imply humans have special access to meaning.
[1] Goldberg (2015) reviews compositionality.
Others argue that we ought to be rid of the distinction between human and machine understanding. They imply models will close the gap soon enough (Manning, 2022; Agüera y Arcas, 2022; Kurzweil, 2005; Turing, 1950). Understanding-as-reliability puts understanding as a question of reliable communication: can one agent expect another agent to respond to stimuli in a certain way?[2] This view assumes that scaling alone will lead to an agent capable of human-like language; system internals don't matter. For example, in the most extreme case we can imagine a very large look-up table with state (cf. Russell and Norvig 2021): a mapping from every input sequence to a sensible output sequence.
[2] Michael (2020) names this the behaviorist view.
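As a toy illustration of this extreme case (a sketch of the thought experiment, not a proposal; the class name and table contents are hypothetical placeholders), such a stateful look-up agent can be written in a few lines:

# Toy sketch of the "very large look-up table with state": replies are keyed
# on the entire conversation history so far, so the agent can be behaviorally
# reliable without anything resembling an internal representation of meaning.
# The table entries below are hypothetical placeholders.


class LookupTableAgent:
    def __init__(self, table):
        # Maps a tuple of all utterances so far (the state) to the next reply.
        self.table = table
        self.history = ()

    def respond(self, utterance):
        self.history += (utterance,)
        # A truly "very large" table would cover every input sequence;
        # a finite one must fall back to a canned reply.
        reply = self.table.get(self.history, "I don't follow.")
        self.history += (reply,)
        return reply


if __name__ == "__main__":
    toy_table = {("I'm unhappy.",): "Why aren't you happy?"}
    agent = LookupTableAgent(toy_table)
    print(agent.respond("I'm unhappy."))  # -> Why aren't you happy?

Such an agent could pass narrow reliability checks while containing nothing that as-representation would count as understanding.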
In this paper, I put understanding in terms of internal, dynamical representation: when prompted with a stimulus, does an agent reproduce an internal representation similar enough to that intended? Call this understanding-as-representation. Many have proposed related theories (Shanahan and Mitchell, 2022; Barsalou, 2008; Hofstadter and Sander, 2013; Jackendoff et al., 2012; Grice, 1989). In this view, if someone unthinkingly blurts out the correct answer to a question, they would not have understood. While a thermostat reproduces a certain representation given a temperature, this representation is not similar to a person's. Some have said that models appear not to understand because their interrogators fail to present stimuli in a model-understandable way (Michael 2020 summarizes). Exactly: I am concerned with human language understanding, not any possible form of understanding.
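As a purely illustrative sketch of how "similar enough" might be operationalized (the encoders, vectors, and threshold below are assumptions of this sketch, not proposals of the paper), one could compare two agents' internal encodings of the same stimulus under a similarity measure:

# Hypothetical sketch: treat "similar enough" internal representations as
# cosine similarity between two agents' encodings of the same stimulus.
# The encoders below are toy placeholders standing in for whatever probe
# would extract a person's or a model's internal state.
import numpy as np


def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def represents_similarly(encode_a, encode_b, stimulus, threshold=0.8):
    """True if the two agents' representations of `stimulus` count as
    'similar enough' under this toy criterion."""
    return cosine_similarity(encode_a(stimulus), encode_b(stimulus)) >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def encode_human(stimulus):
        # Toy stand-in for a person's internal representation of the stimulus.
        seed = sum(stimulus.encode())
        return np.random.default_rng(seed).normal(size=8)

    def encode_model(stimulus):
        # Toy stand-in for a probed model state: the person's vector plus noise.
        return encode_human(stimulus) + rng.normal(scale=0.1, size=8)

    print(represents_similarly(encode_human, encode_model, "I'm unhappy."))

The difficulty, of course, lies in the encoders: human representations are hard to measure (§3), and model representations require probing internals (§6).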
To advance a science of understanding, I argue
that as-reliability is necessary, as-representation is
sufficient, and as-mapping is neither.
I reject the premise of as-mapping that the way we use words is separate from our meanings. While current work in NLP poorly approximates shared intentionality,[3] I disagree that this is the only route to meaning.[4] We could imagine a very large look-up table. There is no boundary between what is and what is not a language.[5]
[3] The meaning to which Bender and Koller (2020) say models have no access.
[4] Millikan offers an account where inner representations exist but are not shared (Millikan, 2017).
[5] Bender and Koller (2020) permit meaning in models which ground linguistic form on images.
I accept as-reliability in theory. Enough data and parameters should yield a language-performant agent indistinguishably similar to a human tested on byte streams passed along a wire. Similarly, Potts (2022) argues that a self-supervised foundation model could do so. Still, I am skeptical of what I call the Scaling Paradigm, that scale alone is a realistic approach.
I think that hill climbing works but we're climbing the wrong hill.
3 Climbing the Right Hill
As-representation and as-reliability are compatible: we may care about representation but more easily look for reliability. I argue that input-output behavioral tests are necessary but may not be sufficient to attribute understanding; we may need to look inside.[6]
[6] Compare Churchland and Churchland (1990).
Nonetheless, Alisha, when messaging with
Bowen, has no need to look inside Bowen’s head to
verify that he understood the following exchange:
A: I’m unhappy.
B: Why aren’t you happy?
Our human bias is to assume that other agents understand until evidence proves otherwise (Weizenbaum, 1976). This is pragmatic; until recently humans did not encounter non-human agents who could respond somewhat reliably. Humans assume a similarity of representation, that others have the same inductive biases.
We can't make that assumption with our models. We can't assume that a chat-bot has a bias to coo over babies (cf. Hrdy 2009). This is why Turing's (1948) test doesn't work: the smoke and mirror programs which won the Loebner prize unintentionally parody input-output tests (Minsky, 1995). Reliability, while useful, alone does not advance a science of understanding. As-reliability does not tell us which biases induce understanding. It is not causal.
Granted, humans’ internal representations are
difficult to measure, may change at each point of
access, and in AI we’ve historically leaned too
heavily on certain putative representations. Sutton
(2019) calls this a "bitter lesson."
So why talk of representation? I agree with the "bitter lesson" but I also know that there is no such thing as a free lunch; human language occupies a small manifold in the space of possible functions. I don't argue to replicate natural functions but rather to be honest about human strengths lest we wander off into fruitless regions of state space. To do logic, at some internal level a system is going to have to appear to use the parts of logic.
Advancing as-representation does not mean we know what representations underlie human language nor that we must use certain ones.
Advancing as-representation does mean that we pay attention to the constraints on human language usage (§4). We should use those to guide our benchmark tests for reliability. We should not get lost in our proxies, especially what the Scaling Paradigm assumes (§5).
4 Under-specification of Meaning
Language is dynamic (e.g. has a history), intersubjective (multi-agent), grounded in a large number
of modalities (senses), collectively intentional (in a