Finding Memo:
Extractive Memorization in Constrained Sequence Generation Tasks
Vikas Raunak Arul Menezes
Microsoft Azure AI
Redmond, Washington
{viraunak,arulm}@microsoft.com
Abstract
Memorization presents a challenge for several constrained Natural Language Generation (NLG) tasks such as Neural Machine Translation (NMT), wherein the proclivity of neural models to memorize noisy and atypical samples reacts adversely with the noisy (web crawled) datasets. However, previous studies of memorization in constrained NLG tasks have only focused on counterfactual memorization, linking it to the problem of hallucinations. In this work, we propose a new, inexpensive algorithm for extractive memorization (exact training data generation under insufficient context) in constrained sequence generation tasks and use it to study extractive memorization and its effects in NMT. We demonstrate that extractive memorization poses a serious threat to NMT reliability by qualitatively and quantitatively characterizing the memorized samples as well as the model behavior in their vicinity. Based on empirical observations, we develop a simple algorithm which elicits non-memorized translations of memorized samples from the same model, for a large fraction of such samples. Finally, we show that the proposed algorithm could also be leveraged to mitigate memorization in the model through finetuning. We have released the code to reproduce our results at https://github.com/vyraun/Finding-Memo.
1 Introduction
Previous studies (Arpit et al., 2017; Feldman, 2020; Zhang et al., 2021a) have shown that neural networks capture regular patterns in the training data (generalization) while simultaneously fitting noisy and atypical samples using brute force (memorization). For constrained Natural Language Generation tasks such as Neural Machine Translation (NMT), which rely heavily on noisy (web crawled) data for training high-capacity neural networks, this creates an inherent reliability problem. For example, memorizations could manifest themselves in the form of catastrophic translation errors on specific samples despite high average model performance (Raunak et al., 2021). It is also likely that the memorization of a specific sample could corrupt the translations of samples in its vicinity. Therefore, exploring, quantifying and alleviating the impact of memorization is of critical importance for improving the reliability of such systems. Yet, most of the work on memorization in natural language processing (NLP) has focused either on classification (Zheng and Jiang, 2022) or on unconstrained generation tasks, predominantly language modeling (Carlini et al., 2021; Zhang et al., 2021b; Kharitonov et al., 2021; Chowdhery et al., 2022; Tirumala et al., 2022; Tänzer et al., 2022; Haviv et al., 2022). In this work, we fill a gap in the literature by developing an analogue of extractive memorization for constrained sequence generation tasks in general and NMT in particular. Our main contributions are:

1. We propose a new, inexpensive algorithm for studying extractive memorization in constrained sequence generation tasks and use it to characterize memorization in NMT.

2. We demonstrate that extractive memorization poses a serious threat to NMT reliability by quantitatively and qualitatively analyzing the memorized samples and the neighborhood effects of such memorization. We also demonstrate that the memorized instances could be used to generate errors in disparate systems.

3. Based on an analysis of the neighborhood effects of memorization, we develop a simple memorization mitigation algorithm which produces non-memorized (higher quality) outputs for a large fraction of memorized samples.

4. We show that the outputs produced by the memorization mitigation algorithm could also be used to directly impart corrective behavior into the model through finetuning.
Repetitions | Total Samples | Memorized | Ratio (%) | Perturb Prefix | Perturb Suffix | Perturb Start
1           | 100,000       | 174       | 0.17      | 17.58 %        | 43.24 %        | 12.29 %
2           | 100,000       | 317       | 0.32      | 11.67 %        | 62.84 %        |  4.98 %
3           |   5,381       |  17       | 0.32      | 28.42 %        | 49.52 %        | 18.82 %
4           |   1,885       |   5       | 0.26      | 27.40 %        | 34.00 %        |  8.00 %
5           |     976       |   7       | 0.72      | 26.67 %        | 70.00 %        | 11.42 %
1-5         | 208,242       | 520       | 0.25      | 16.65 %        | 51.65 %        |  8.00 %

Table 1: Quantifying Extractive Memorization: number of memorized samples (using Algorithm 1) and neighborhood effects of memorization (using Algorithm 2) across different training data frequency buckets.
2 Related Work
Our work is concerned with the phenomenon of memorization in constrained natural language generation in general and NMT in particular. The main challenge in analyzing memorization is to determine which samples have been memorized by the model during training. There exist two key algorithms to elicit memorized samples, each yielding a distinctive operational definition of memorization:

1. Counterfactual Memorization: Feldman and Zhang (2020) study label memorization and propose to estimate the memorization value of a training sample by training multiple models on different random subsets of the training data and then measuring the deviation in the sample's classification accuracy under inclusion/exclusion (a compact restatement follows this list). This definition of memorization was further extended to arbitrary performance measures by Raunak et al. (2021) to study memorization in NMT and by Zhang et al. (2021b) to study memorization in language models. However, a practical limitation of analysis based on this definition is the prohibitive computational cost (multiple model trainings) associated with computing memorization values for each training sample.

2. Extractive Memorization: Carlini et al. (2021) propose a data-extraction based definition of memorization to study memorization in language models. Therein, a training string s is extractable if there exists a prefix c that could exactly generate s under an appropriate sampling strategy (e.g. greedy decoding). This definition has the benefit of being computationally inexpensive, although it doesn't have any existing analogue for constrained natural language generation tasks such as NMT.
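For concreteness, the counterfactual notion in item 1 can be summarized as the following inclusion/exclusion gap. The notation here is ours; the cited works estimate the two quantities with models trained on random subsets of the training data rather than exact leave-one-out retraining:

$$
\mathrm{mem}(x_i, y_i) \;=\; \Pr_{f \sim \mathcal{A}(S)}\big[f(x_i) = y_i\big] \;-\; \Pr_{f \sim \mathcal{A}(S \setminus \{(x_i, y_i)\})}\big[f(x_i) = y_i\big],
$$

where $S$ is the training set and $\mathcal{A}$ the training algorithm; Raunak et al. (2021) and Zhang et al. (2021b) replace the 0/1 accuracy with an arbitrary per-sample performance measure for generation tasks.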
In the next section, we define extractive memorization for constrained sequence generation tasks and apply it to NMT; in Section 4 we estimate the neighborhood effect of such memorizations, and in Section 5 we propose a simple algorithm for recovering correct translations of memorized samples.
3 Extractive Memorization
We present our definition of extractive memorization as Algorithm 1. Analogous to extractive memorization in language models (Carlini et al., 2021), this definition labels an input sentence (source) as memorized if its transduction (translation) could be replicated exactly with a prefix considerably shorter than the full input sentence, under greedy decoding. Operationally, we set the prefix ratio threshold (p) to 0.75.
Algorithm 1: Extractive Memorization in NMT
  Data: Trained NMT model T, training dataset S, prefix ratio threshold p
  Result: Memorized samples M, prefix lengths L
  Greedily translate sources in S using T;
  M1 = sources whose translations match their references;
  Greedily translate prefixes of sources in M1 using T;
  M2 = sources with a prefix producing the reference;
  for each source M2[i] in M2 do
      n = length of the source M2[i];
      l = length of the smallest prefix producing the reference;
      if l / n <= p then
          add M2[i] to M and add l to L;
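A minimal Python sketch of Algorithm 1 is given below; it is not the released implementation (see the repository linked in the abstract). The `translate` argument is a hypothetical greedy-decoding wrapper around the trained NMT model, and prefixes are taken over whitespace-separated words purely for illustration.

```python
def find_memorized(pairs, translate, p=0.75):
    """Sketch of Algorithm 1.

    pairs: iterable of (source, reference) training pairs.
    translate: callable performing greedy decoding with the trained model.
    Returns (M, L): memorized sources and their shortest prefix lengths.
    """
    M, L = [], []
    # Step 1 (M1): keep sources whose full greedy translation matches the reference.
    m1 = [(src, ref) for src, ref in pairs if translate(src) == ref]
    # Step 2 (M2): among those, find the shortest prefix that still yields the reference.
    for src, ref in m1:
        words = src.split()
        n = len(words)
        for l in range(1, n + 1):
            if translate(" ".join(words[:l])) == ref:
                # Flag as memorized only if the prefix is sufficiently shorter
                # than the full source (prefix ratio threshold p).
                if l / n <= p:
                    M.append(src)
                    L.append(l)
                break
    return M, L
```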
Next, we apply this definition of memorization to a strong Transformer-Big (Vaswani et al., 2017) baseline trained on the 48.2M WMT20 En-De parallel corpus (Barrault et al., 2020). We describe the dataset, model and training details in Appendix A.

Qualitatively, we observe that the memorized samples detected by Algorithm 1 mostly consist of low-quality samples – templatized source sentences and noisy translations. To analyze the results quantitatively, similar to Carlini et al. (2022), we bucket the training data pairs in terms of their repetitions in the training data. Owing to the sparsity of data with greater than 5 repetitions, we report results in the range of 1-5 repetitions. Further, for repetition values 1 and 2, we select 100K random samples for