Understanding Transformer Memorization Recall Through Idioms
Adi Haviv τ   Ido Cohen τ   Jacob Gidron τ   Roei Schuster µ   Yoav Goldberg αβ   Mor Geva α
τ Tel Aviv University   µ Wild Moose   β Bar-Ilan University   α Allen Institute for AI
adi.haviv@cs.tau.ac.il, roei@wildmoose.ai, pipek@google.com,
{its.ido, jacob.u.gidron, yoav.goldberg}@gmail.com
∗Now at Google Research.
Abstract
To produce accurate predictions, language models (LMs) must balance between generalization and memorization. Yet, little is known about the mechanism by which transformer LMs employ their memorization capacity. When does a model decide to output a memorized phrase, and how is this phrase then retrieved from memory? In this work, we offer the first methodological framework for probing and characterizing recall of memorized sequences in transformer LMs. First, we lay out criteria for detecting model inputs that trigger memory recall, and propose idioms as inputs that typically fulfill these criteria. Next, we construct a dataset of English idioms and use it to compare model behavior on memorized vs. non-memorized inputs. Specifically, we analyze the internal prediction construction process by interpreting the model's hidden representations as a gradual refinement of the output probability distribution. We find that across different model sizes and architectures, memorized predictions are a two-step process: early layers promote the predicted token to the top of the output distribution, and upper layers increase model confidence. This suggests that memorized information is stored and retrieved in the early layers of the network. Last, we demonstrate the utility of our methodology beyond idioms in memorized factual statements. Overall, our work makes a first step towards understanding memory recall, and provides a methodological basis for future studies of transformer memorization.¹

¹Our code and data are available at https://github.com/adihaviv/idiomem/.
1 Introduction
Transformer language models (LMs) memorize instances from their training data (Carlini et al., 2021; Zhang et al., 2021b), and evidence is building that such memorization is an important precondition for their predictive abilities (Lee et al., 2022; Feldman, 2020; Feldman and Zhang, 2020; Raunak et al., 2021; Raunak and Menezes, 2022). Still, it is unknown when models decide to output memorized sequences, and how these sequences are being retrieved internally from memory. Current methods for analyzing memorization (Feldman and Zhang, 2020; Zhang et al., 2021b; Carlini et al., 2022) use definitions that are based on model performance, which changes between models and often also between training runs. Moreover, these methods study memorization behavior in terms of the model's "black-box" behavior rather than deriving a behavioral profile of memory recall itself.
Our first contributions are to provide a definition and construct a dataset that allows probing memorization recall in LMs. We define a set of criteria for identifying memorized sequences that do not depend on model behavior:² sequences that have a single plausible completion that is independent of context and can be inferred only given the entire sequence. We show that many idioms (e.g., "play it by ear") fulfill these conditions, allowing us to probe and analyze memorization behavior. Furthermore, we construct a dataset of such English idioms, dubbed IDIOMEM, and release it publicly for the research community.
Next, to analyze memory recall behavior, we compare the construction process of predictions that involve memory recall with those that do not. To this end, given a LM, we create two sets of memorized and non-memorized idioms from IDIOMEM (Fig. 1, A). We then adopt a view of the transformer inference pass as a gradual refinement of the output probability distribution (Geva et al., 2021; Elhage et al., 2021).

²Literature often purports to "define memorization", resulting in a multitude of technical definitions with subtle differences, although we would expect this concept to be consistent and intuitive. Thus, instead of explicitly defining "memorization", we will define sufficient criteria for detecting memorized instances.
Figure 1: Our methodological framework for probing and analyzing memorized predictions of a given LM: (A) we create two sets of memorized (mem-idiom) and non-memorized (non-mem-idiom) idioms by probing the LM with instances from IDIOMEM, (B) for each instance, we extract hidden features of the prediction computation: the rank and probability of the predicted token across layers, and (C) we compare the prediction process of memorized idioms versus non-memorized idioms and short sequences from Wikipedia (wiki). Memorized predictions exhibit two characteristic phases: candidate promotion and confidence boosting.
Concretely, the token representation at any layer is interpreted as a "hidden" probability distribution over the output vocabulary (Geva et al., 2022) (Fig. 1, B). This interpretation allows tracking the prediction across layers in the evolving distribution. We find a clear difference in model behavior between memorized and non-memorized predictions (Fig. 1, C). This difference persists across different transformer architectures and sizes: retrieval from memory happens in two distinct phases, corresponding to distinct roles of the transformer parameters and layers: (1) candidate promotion, where the memorized prediction's rank in the hidden distribution rises in the first layers, and (2) confidence boosting, where, in the last few layers, the prediction's probability grows substantially faster than before. This is unlike non-memorized predictions, where the two phases are less pronounced and often indistinct. We further confirm these phases of memorized predictions through intervention in the network's FFN sublayers, which have been shown to play an important role in the prediction construction process (Geva et al., 2022; Mickus et al., 2022). Concretely, zeroing out hidden FFN neurons in early layers deteriorates memory recall, while intervention in upper layers does not affect it.
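To make the intervention concrete, the following is a minimal, hedged sketch (not the paper's exact experimental setup): it zeroes the intermediate FFN activations of the first few GPT-2 blocks with forward hooks and checks whether an idiom completion survives. The model size, the number of intervened layers, and the example idiom are illustrative assumptions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def zero_ffn(module, inputs, output):
    # `output` is the FFN intermediate projection of a GPT-2 block (c_fc);
    # returning zeros effectively removes that block's hidden FFN neurons.
    return torch.zeros_like(output)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

k = 4  # intervene on the first k layers (illustrative choice)
handles = [block.mlp.c_fc.register_forward_hook(zero_ffn)
           for block in model.transformer.h[:k]]

ids = tokenizer("play it by", return_tensors="pt").input_ids
with torch.no_grad():
    pred = model(ids).logits[0, -1].argmax()
# With early-layer FFNs ablated, the memorized completion " ear" is expected to degrade.
print(tokenizer.decode(pred.item()))

for h in handles:
    h.remove()
```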
Last, we show our findings extend to types of memory recall beyond idioms by applying our method to factual statements from the LAMA-UHN dataset (Poerner et al., 2020) (e.g., "The native language of Jean Marais is French"). For factual statements that were completed correctly by the LM, we observe the same two phases as in memorized idioms, further indicating their connection to memory recall.
To summarize, we construct a novel dataset of idioms, usable for probing LM memorization irrespective of the model architecture or training parameterization. We then design a probing methodology that extracts carefully devised features of the internal inference procedure in transformers. By applying our methodology and using our new dataset, we discover a profile that characterizes memory recall across transformer LMs and types of memorized instances. Our released dataset, probing framework, and findings open the door for future work on transformer memorization, to ultimately demystify the internals of neural memory in LMs.
2 Criteria for Detecting Memory Recall
To study memory recall, we require a set of inputs that trigger this process. Prior work on memorization focused on detecting instances whose inclusion in the training data has a specific influence on model behavior, such as increased accuracy on those instances (Feldman and Zhang, 2020; Magar and Schwartz, 2022; Carlini et al., 2022, 2021, 2019). As a result, memorized instances differ across models and training parameterizations. Our goal is instead to find a stable dataset of sequences for which correctly predicting the completion indicates memorization recall. This will greatly reduce the overhead of studying memorization and facilitate useful comparisons across models and studies.
To build such a dataset, we start by defining a general set of criteria that are predicates on sequence features, entirely independent of the LM being probed. Given a textual sequence of $n$ words, we call the first $n-1$ words the prompt and the $n$-th word the target. We focus on the task of predicting the target given the prompt, i.e., predicting the last word in a sequence given its prefix.³ Such predictions can be based on either generalization or memorization, and we are interested in isolating memorized cases to study model behavior on them. Particularly, we are looking for sequences for which success in this task implies memorization recall.
We argue that the following criteria are sufficient for detecting such memorized sequences:
1. Single target, independent of context: We require that the target is the only correct continuation, regardless of the textual context where the prompt is placed.⁴
2. Irreducible prompt: The target is the single correct completion only if the entire prompt is given exactly. Changing or removing parts of the prompt would make the correct target non-unique.
Claim 2.1. Assume a sequence fulfills the above criteria. Then, if a LM correctly predicts the target, it is highly likely that this prediction involves memory recall.
Justification. First, observe that most natural-language prompts have many possible continuations. For example, consider the sentence "to get there fast, you can take this ____". Likely continuations include "route", "highway", "road", "train", "plane", "advice", inter alia. Note that there are several divergent interpretations or contexts for the prompt, and for each, language offers many different ways to express a similar meaning. A prediction that is a product of generalization, i.e., one derived from context and knowledge of language, always has plausible alternatives, depending on the context and stylistic choice of words. Hence, the relationship between the entire prompt and the target, where the target is the single correct continuation, is something that needs to be memorized rather than derived via generalization. A LM that predicts the single correct continuation either memorized this relationship, or used "cues" from the prompt that happen to provide an indication towards the correct continuation. To illustrate the latter, consider the sequence "it's raining cats and ____", which has a single correct continuation, "dogs", but a LM might predict it without observing this sequence during training, due to the semantic proximity of "cats" and "dogs". Our second criterion excludes such cases by requiring that the correct continuation is only likely given the entire sequence.

Therefore, a LM that correctly completes a sequence that fulfills both criteria is likely to have recalled it from memory.

³In cases where tokenization divides the target into sub-tokens, our task becomes predicting the target's first token.
⁴We assume that contexts are naturally-occurring and not adversarial.
In the next section, we argue that idioms are a special case of such sequences, and are thus useful for studying memorization (§3).
3 The Utility of Idioms for Studying Memorization
An idiom is a group of words with a meaning that is not deducible from the meanings of its individual words. For example, consider the phrase "play it by ear": there is a disconnect between its nonsensical literal meaning (to play something by a human-body organ called 'ear') and its intended idiomatic meaning (to improvise).
A key observation is that idioms often satisfy our criteria (§2), and can therefore be used to probe memorization. First, by definition, idioms are expected to be non-compositional (Dankers et al., 2022). They are special "hard-coded" phrases that carry a specific meaning. As a result, their prompts each have a single correct continuation, regardless of their context (criterion 1). For example, consider the prompt "crying over spilt ____": a generalizing prediction would allow this slot to be filled by any spillable item, like wine, water, or juice, while a memorized prediction will retrieve only milk in this context. Notably, while this is an empirical characterization of many idioms, there might be exceptions, e.g., contexts that are adversarially chosen to change the completion. Second, many idioms are "irreducible"; for example, the sub-sequences "crying over" or "over spilt" by themselves have but a scant connection to the word "milk".
Still, not all idioms fulfill the criteria. For example, even when the idiom is far from literal, its constituents sometimes strongly indicate the correct continuation, as in the case of "it's raining cats and ____" (as explained in §2).
Source          # of Idioms   Idiom Length (words)
MAGPIE          590           4.5 ± 0.9
LIDIOMS         149           5.1 ± 1.2
EF              97            5.6 ± 1.9
EPIE            76            4.4 ± 0.7
Total (unique)  814           4.7 ± 1.8

Table 1: Statistics per data source in IDIOMEM.
To construct a dataset of memorization-probing sequences, we will carefully curate a set of English idioms and filter out ones that do not fulfill our criteria.
3.1 The IDIOMEM Dataset
We begin with existing datasets of English idioms: MAGPIE (Haagsma et al., 2020),⁵ EPIE (Saxena and Paul, 2020), and the English subset of LIDIOMS (Moussallem et al., 2018). We enrich this collection with idioms scraped from the website "Education First" (EF).⁶ We then split each idiom into a prompt containing all but the last word and a target that is the last word. Next, we filter out idioms that do not comply with our criteria (§2) or whose target can be predicted from their prompt based on spurious correlations rather than memorization. To this end, we use three simple rules:
Short idioms. We observe that prompts of idioms with just a few words often have multiple plausible continuations that are not necessarily the idiom's target, violating our first criterion. For example, the prompt "break a ____" has many possible continuations (e.g., "window", "promise", and "heart") in addition to its idiomatic continuation "leg". To exclude such cases, we filter out idioms with fewer than 4 words.
Idioms whose target is commonly predicted from the prompt's sub-sequences. We filter such cases to ensure the prompt fulfills our second criterion (prompt irreducibility). To detect these cases, we use an ensemble of pretrained LMs: GPT2-M, ROBERTA-BASE (Liu et al., 2019), T5-BASE (Kale and Rastogi, 2020), and ELECTRA-BASE-GENERATOR (Clark et al., 2020), and check for each model whether there is an n-gram (1 ≤ n ≤ 4) in the prompt from which the model predicts the target. We filter out idioms for which a majority (≥ 3) of the models predicted the target (for some n-gram).
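As an illustrative sketch of this test, the snippet below uses a single causal LM in place of the four-model ensemble and a greedy top-1 criterion; it also assumes only proper sub-sequences of the prompt are checked (the full prompt is the memorization test itself). These simplifications are our assumptions, not the paper's exact procedure.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def target_predictable_from_ngram(prompt_words, target, max_n=4):
    """Return True if some proper n-gram (1 <= n <= max_n) of the prompt alone
    already makes the model predict the target's first sub-token."""
    target_id = tokenizer.encode(" " + target)[0]
    for n in range(1, max_n + 1):
        if n >= len(prompt_words):  # assumption: skip the full prompt
            break
        for start in range(len(prompt_words) - n + 1):
            ngram = " ".join(prompt_words[start:start + n])
            ids = tokenizer(ngram, return_tensors="pt").input_ids
            with torch.no_grad():
                top_id = model(ids).logits[0, -1].argmax().item()
            if top_id == target_id:
                return True
    return False

# Per the discussion in Sec. 2, "it's raining cats and" -> "dogs" is the kind
# of case this test is meant to flag (and hence filter out of the dataset).
print(target_predictable_from_ngram("it's raining cats and".split(), "dogs"))
```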
⁵We take idioms with annotation confidence of >75% and exclude frequently occurring literal interpretations.
⁶https://www.ef.com/wwen/english-resources/english-idioms/
Prompt                               Target     Pred.   Sim.   IDIOMEM
"make a mountain out of a"           molehill                  ✓
"think outside the"                  box                       ✓
"there's no such thing as a free"    lunch                     ✓
"go back to the drawing"             board                     ✓
"boys will be"                       boys               ✓
"take it or leave"                   it         ✓       ✓

Table 2: Example English idioms included and excluded from IDIOMEM by our filters of predictable target (Pred.) and prompt-target similarity (Sim.).
Idioms whose targets are semantically similar to tokens in the prompt. To further ensure prompt irreducibility, we embed the prompt's tokens and the target token using GloVe word embeddings (Pennington et al., 2014). We measure the cosine similarity between the target token and each token in the prompt separately, and take the maximum over all prompt tokens. We filter out idioms where this value is higher than 0.75 (the threshold was tuned manually using a small validation set of idioms).
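A minimal sketch of this last filter follows; the specific GloVe variant and the gensim loader are our assumptions (the paper only states that GloVe embeddings are used).

```python
import gensim.downloader as api

# Pretrained GloVe vectors; the variant below is an illustrative choice.
glove = api.load("glove-wiki-gigaword-300")

def max_prompt_target_similarity(prompt_words, target):
    """Maximum cosine similarity between the target and any prompt token;
    idioms with a value above 0.75 are filtered out."""
    if target not in glove:
        return 0.0
    sims = [glove.similarity(word, target) for word in prompt_words if word in glove]
    return max(sims, default=0.0)

# "boys will be ____" -> "boys": similarity 1.0 with the prompt token "boys",
# so this idiom is filtered out; "think outside the ____" -> "box" should pass.
print(max_prompt_target_similarity(["boys", "will", "be"], "boys"))
print(max_prompt_target_similarity(["think", "outside", "the"], "box"))
```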
Overall, 55.7% of the idioms were filtered out, including 48.5% by length, 6.1% by the predictable-target test, and an additional 1.6% by the prompt-target similarity test, resulting in a dataset of 814 idioms, named IDIOMEM. Further statistics are provided in Tab. 1, and example idioms in Tab. 2.
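For concreteness, the sketch below shows how IDIOMEM can be used to split idioms into memorized and non-memorized sets for a given LM (Fig. 1, A). The model choice, the greedy top-1 criterion, and the inline idiom list are assumptions for illustration; in practice the prompts and targets would be read from the released dataset.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# A few inline examples standing in for the full IDIOMEM dataset.
idioms = [("play it by", "ear"), ("crying over spilt", "milk"),
          ("go back to the drawing", "board")]

memorized, non_memorized = [], []
for prompt, target in idioms:
    target_id = tokenizer.encode(" " + target)[0]  # target's first sub-token
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        pred_id = model(ids).logits[0, -1].argmax().item()
    (memorized if pred_id == target_id else non_memorized).append((prompt, target))

print(f"{len(memorized)} memorized, {len(non_memorized)} non-memorized")
```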
4 Probing Methodology
Background and Notation. Assume a transformer LM with $L$ layers, a hidden dimension $d$, and an input/output embedding matrix $E \in \mathbb{R}^{|V| \times d}$ over a vocabulary $V$. Denote by $s = \langle s_1, \dots, s_t \rangle$ the input sequence to the LM, and let $h_i^{\ell}$ be the output for token $i$ at layer $\ell$, for all $\ell \in \{1, \dots, L\}$ and $i \in \{1, \dots, t\}$. The model's prediction for a token $s_i$ is obtained by projecting its last hidden representation $h_i^{L}$ to the embedding matrix, i.e., $\mathrm{softmax}(E \cdot h_i^{L})$.
Following Geva et al. (2021, 2022), we interpret the prediction for a token $s_i$ by viewing its corresponding sequence of hidden representations $h_i^{1}, \dots, h_i^{L}$ as an evolving distribution over the vocabulary. Concretely, we read the "hidden" distribution at layer $\ell$ by applying the same projection to the hidden representation at that layer: $p_i^{\ell} = \mathrm{softmax}(E \cdot h_i^{\ell})$. Using this interpretation, we track the probability and rank of the predicted token in the output distribution across layers. A token's rank is its position in the output distribution when tokens are sorted by decreasing probability.