Language Models Understand Us, Poorly
Jared Moore
University of Washington School of Computer Science
jared@jaredmoore.org
Abstract
Some claim language models understand us. Others won't hear it. To clarify, I investigate three views of human language understanding: as-mapping, as-reliability, and as-representation (§2). I argue that while behavioral reliability is necessary for understanding, internal representations are sufficient; they climb the right hill (§3). I review state-of-the-art language and multi-modal models: they are pragmatically challenged by under-specification of form (§4). I question the Scaling Paradigm: limits on resources may prohibit scaled-up models from approaching understanding (§5). Last, I describe how as-representation advances a science of understanding. We need work which probes model internals, adds more of human language, and measures what models can learn (§6).
1 Introduction
A theme of EMNLP this year is "unresolved issues in NLP." Hence I consider what it means to understand human language, whether current language models understand, and whether future models will.
Recent large language models have achieved impressive results on benchmark tasks (Thoppilan et al., 2022; Brown et al., 2020). These results challenge ordained wisdom on the representations necessary for language production. We've seen improved results from multi-modal models (Saharia et al., 2022; Ramesh et al., 2022, 2021; Shuster et al., 2020; Radford et al., 2022; Borsos et al., 2022), what some call foundation models (Bommasani et al., 2021). Some models even span images, text, and games (Reed et al., 2022). Michael et al. (2022) identify language understanding and scaling as pertinent and much debated questions in NLP.
So what's next? I identify three views on language understanding (§2): understanding-as-mapping, understanding-as-reliability, and understanding-as-representation. Through examples of recent limitations of language models (§4), I argue for understanding-as-representation because it climbs the right hill (§3). In particular, I question the assumption that scaling current models is a computationally feasible path to human-like understanding (§5). Because of the large gap between human and model understanding, I think it is generally misapplied to say that models "understand" (§6.1). Better applied are examples of promising work on understanding (§6.2).
2 Views on Understanding
Some argue that there is a strict barrier which separates human from machine understanding (Bender and Koller, 2020; Searle, 1980). Understanding-as-mapping puts understanding in terms of an absolute mapping between form and meaning. Here, meaning comes from what a series of forms describes. Those forms can be composed in a variety of ways to yield different, legible meanings.[1] Often, those with this view imply humans have special access to meaning.
[1] Goldberg (2015) reviews compositionality.
Others argue that we ought to be rid of the distinction between human and machine understanding. They imply models will close the gap soon enough (Manning, 2022; Agüera y Arcas, 2022; Kurzweil, 2005; Turing, 1950). Understanding-as-reliability puts understanding as a question of reliable communication: can one agent expect another agent to respond to stimuli in a certain way?[2] This view assumes that scaling alone will lead to an agent capable of human-like language; system internals don't matter. For example, in the most extreme case we can imagine a very large look-up table with state (cf. Russell and Norvig 2021): a mapping from every input sequence to a sensible output sequence.
[2] Michael (2020) names this the behaviorist view.
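As a toy illustration of this extreme case (a sketch of the thought experiment, not a proposal; the class name and table contents are hypothetical placeholders), such a stateful look-up agent can be written in a few lines:

# Toy sketch of the "very large look-up table with state": replies are keyed
# on the entire conversation history so far, so the agent can be behaviorally
# reliable without anything resembling an internal representation of meaning.
# The table entries below are hypothetical placeholders.


class LookupTableAgent:
    def __init__(self, table):
        # Maps a tuple of all utterances so far (the state) to the next reply.
        self.table = table
        self.history = ()

    def respond(self, utterance):
        self.history += (utterance,)
        # A truly "very large" table would cover every input sequence;
        # a finite one must fall back to a canned reply.
        reply = self.table.get(self.history, "I don't follow.")
        self.history += (reply,)
        return reply


if __name__ == "__main__":
    toy_table = {("I'm unhappy.",): "Why aren't you happy?"}
    agent = LookupTableAgent(toy_table)
    print(agent.respond("I'm unhappy."))  # -> Why aren't you happy?

Such an agent could pass narrow reliability checks while containing nothing that as-representation would count as understanding.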
In this paper, I put understanding in terms of internal, dynamical representation: when prompted with a stimulus, does an agent reproduce an internal representation similar enough to that intended? Call this understanding-as-representation. Many have proposed related theories (Shanahan and Mitchell, 2022; Barsalou, 2008; Hofstadter and Sander, 2013; Jackendoff et al., 2012; Grice, 1989). In this view, if someone unthinkingly blurts out the correct answer to a question, they would not have understood. While a thermostat reproduces a certain representation given a temperature, this representation is not similar to a person's. Some have said that models appear not to understand because their interrogators fail to present stimuli in a model-understandable way (Michael 2020 summarizes). Exactly: I am concerned with human language understanding, not any possible form of understanding.
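As a purely illustrative sketch of how "similar enough" might be operationalized (the encoders, vectors, and threshold below are assumptions of this sketch, not proposals of the paper), one could compare two agents' internal encodings of the same stimulus under a similarity measure:

# Hypothetical sketch: treat "similar enough" internal representations as
# cosine similarity between two agents' encodings of the same stimulus.
# The encoders below are toy placeholders standing in for whatever probe
# would extract a person's or a model's internal state.
import numpy as np


def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def represents_similarly(encode_a, encode_b, stimulus, threshold=0.8):
    """True if the two agents' representations of `stimulus` count as
    'similar enough' under this toy criterion."""
    return cosine_similarity(encode_a(stimulus), encode_b(stimulus)) >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def encode_human(stimulus):
        # Toy stand-in for a person's internal representation of the stimulus.
        seed = sum(stimulus.encode())
        return np.random.default_rng(seed).normal(size=8)

    def encode_model(stimulus):
        # Toy stand-in for a probed model state: the person's vector plus noise.
        return encode_human(stimulus) + rng.normal(scale=0.1, size=8)

    print(represents_similarly(encode_human, encode_model, "I'm unhappy."))

The difficulty, of course, lies in the encoders: human representations are hard to measure (§3), and model representations require probing internals (§6).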
To advance a science of understanding, I argue
that as-reliability is necessary, as-representation is
sufficient, and as-mapping is neither.
I reject the premise of as-mapping that the way we use words is separate from our meanings. While current work in NLP poorly approximates shared intentionality,[3] I disagree that this is the only route to meaning.[4] We could imagine a very large look-up table. There is no boundary between what is and what is not a language.[5]
[3] The meaning to which Bender and Koller (2020) say models have no access.
[4] Millikan offers an account where inner representations exist but are not shared (Millikan, 2017).
[5] Bender and Koller (2020) permit meaning in models which ground linguistic form on images.
I accept as-reliability in theory. Enough data and parameters should yield a language-performant agent indistinguishably similar to a human tested on byte streams passed along a wire. Similarly, Potts (2022) argues that a self-supervised foundation model could do so. Still, I am skeptical of what I call the Scaling Paradigm, that scale alone is a realistic approach.
I think that hill climbing works but we're climbing the wrong hill.
3 Climbing the Right Hill
As-representation and as-reliability are compatible: we may care about representation but more easily look for reliability. I argue that input-output behavioral tests are necessary but may not be sufficient to attribute understanding; we may need to look inside.[6]
[6] Compare Churchland and Churchland (1990).
Nonetheless, Alisha, when messaging with
Bowen, has no need to look inside Bowen’s head to
verify that he understood the following exchange:
A: I’m unhappy.
B: Why aren’t you happy?
Our human bias is to assume that other agents understand until evidence proves otherwise (Weizenbaum, 1976). This is pragmatic; until recently humans did not encounter non-human agents who could respond somewhat reliably. Humans assume a similarity of representation, that others have the same inductive biases.
We can't make that assumption with our models. We can't assume that a chat-bot has a bias to coo over babies (cf. Hrdy 2009). This is why Turing's (1948) test doesn't work: the smoke and mirror programs which won the Loebner prize unintentionally parody input-output tests (Minsky, 1995). Reliability, while useful, alone does not advance a science of understanding. As-reliability does not tell us which biases induce understanding. It is not causal.
Granted, humans’ internal representations are
difficult to measure, may change at each point of
access, and in AI we’ve historically leaned too
heavily on certain putative representations. Sutton
(2019) calls this a "bitter lesson."
So why talk of representation? I agree with the "bitter lesson" but I also know that there is no such thing as a free lunch; human language occupies a small manifold in the space of possible functions. I don't argue to replicate natural functions but rather to be honest about human strengths lest we wander off into fruitless regions of state space. To do logic, at some internal level a system is going to have to appear to use the parts of logic.
Advancing as-representation does not mean we know what representations underlie human language nor that we must use certain ones.
Advancing as-representation does mean that we pay attention to the constraints on human language usage (§4). We should use those to guide our benchmark tests for reliability. We should not get lost in our proxies, especially what the Scaling Paradigm assumes (§5).
4 Under-specification of Meaning
Language is dynamic (e.g. has a history), intersubjective (multi-agent), grounded in a large number
of modalities (senses), collectively intentional (in a