Reducing Training Sample Memorization in GANs by Training with Memorization Rejection
Andrew Bai andrewbai@cs.ucla.edu
Department of Computer Science
University of California, Los Angeles (UCLA)
Cho-Jui Hsieh chohsieh@cs.ucla.edu
Department of Computer Science
University of California, Los Angeles (UCLA)
Wendy Kan wendykan@google.com
Google
Hsuan-Tien Lin htlin@cs.ntu.edu.tw
Department of Computer Science and Information Engineering
National Taiwan University
Abstract
Generative adversarial networks (GANs) continue to be a popular research direction due
to their high generation quality. It has been observed that many state-of-the-art GANs generate
samples that are more similar to the training set than a holdout testing set from the same
distribution, hinting that some training samples are implicitly memorized in these models. This
memorization behavior is unfavorable in many applications that demand the generated
samples to be sufficiently distinct from known samples. Nevertheless, it is unclear whether
it is possible to reduce memorization without compromising the generation quality. In
this paper, we propose memorization rejection, a training scheme that rejects generated
samples that are near-duplicates of training samples during training. Our scheme is simple,
generic, and can be directly applied to any GAN architecture. Experiments on multiple
datasets and GAN models validate that memorization rejection effectively reduces training
sample memorization, and in many cases does not sacrifice the generation quality. Code to
reproduce the experiment results can be found at https://github.com/jybai/MRGAN.
1 Introduction
There has been much progress made on improving the generation quality of Generative Adversarial Networks
(GANs) (Brock et al., 2019; Goodfellow et al., 2014; Karras et al., 2020; Wu et al., 2019; Zhao et al., 2020;
Zhang et al., 2019). Despite GANs being capable of generating high-fidelity samples, it has been recently
observed that they tend to memorize training samples due to their high model complexity coupled with a
finite number of training samples (Meehan et al., 2020; Lopez-Paz & Oquab, 2018; Gulrajani et al., 2020;
Borji, 2021). This naturally leads to the following questions: Are GANs learning the underlying distribution
or merely memorizing training samples? More fundamentally, what is the relationship between learning
and memorizing for GANs? Studying these questions is important since generative models that output
near-duplicates of the training data are undesirable for many applications. For example, Repecka et al.
(2021) proposed to learn the diversity of natural protein sequences with GANs and generate new protein
structures to aid medicine development. Frid-Adar et al. (2018) leveraged GANs to generate augmented
medical images and increase the size of training data for improving liver lesion classification.
Although measuring and preventing memorization in supervised learning is well-studied, handling
memorization in generative modeling is non-trivial. For supervised learning, training sample memorization typically
arXiv:2210.12231v1 [cs.LG] 21 Oct 2022
results in overfitting and can be diagnosed by benchmarking on a holdout testing dataset. In contrast, a
generative model that completely memorizes the training data and only generates near-duplicates of the training
data can still perform well on common distribution-matching-based quality metrics, even when evaluated on
a holdout testing set.
Recently various metrics and detection methods have been proposed to analyze the severity of memorization
after GAN models are trained (Borji, 2021; Bounliphone et al., 2016; Esteban et al., 2017; Lopez-Paz &
Oquab, 2018; Liu et al., 2017; Gulrajani et al., 2020; Thanh-Tung & Tran, 2020; Nalisnick et al., 2019).
Some of these methods rely on training a new neural network for measuring sample distance while others
rely on traditional statistical tests. However, it is still unclear how to actively reduce memorization during
GAN training. We thus aim to answer the following questions in this paper: is it possible to efficiently
reduce memorization during the training phase? If so, to what extent can memorization be reduced without
sacrificing the generation quality? Our contributions are as follows:
1. We confirm that while the distance of a generated instance to the training data is generally
correlated with its quality, this is not the case for instances that are already sufficiently close. Therefore,
it is possible to reduce memorization without sacrificing generation quality.
2. We propose memorization rejection, a simple training scheme that can effectively reduce
memorization in GANs. The method is based on the key insight that a generated sample being sufficiently
similar to its nearest neighbor in the training data implies good enough quality, and further
optimizing it causes the model to overfit and memorize. To the best of our knowledge, this is the first
method proposed for reducing training data memorization in GAN training.
3. Experimental results demonstrate that our proposed method is effective in reducing training sample
memorization. We provide a guideline for estimating the optimal hyperparameter that maximally
reduces memorization while minimally impacting the generation quality.
2 Preliminaries
Consider an input space $\mathcal{X}$ and an $N$-dimensional code space $\mathcal{Z} = \mathbb{R}^N$. For instance, when considering
RGB images, $\mathcal{X}$ is simply $\mathbb{R}^{3 \times w \times h}$, where $w$ and $h$ are respectively the width and height of the image (in
this paper, $\mathcal{X} = \mathbb{R}^{3 \times w \times h}$ if not specified otherwise). Generative adversarial networks (GANs; Goodfellow
et al., 2014) typically consist of a generator function and a discriminator function. The generator function
$G_\theta : \mathcal{Z} \to \mathcal{X}$, parameterized by $\theta \in \Theta$, decodes from $\mathcal{Z}$ to $\mathcal{X}$. The discriminator function $D_\phi : \mathcal{X} \to \mathbb{R}$,
parameterized by $\phi \in \Phi$, maps any $x \in \mathcal{X}$ to a real value that reflects how likely $x$ comes from an underlying
distribution $p(X)$. A typical objective of a GAN optimizes the minimax loss between $G_\theta$ and $D_\phi$:
$$\min_{\theta \in \Theta} \max_{\phi \in \Phi} \; \mathbb{E}_{x \sim p(X)}\left[\log D_\phi(x)\right] + \mathbb{E}_{z \sim q(Z)}\left[\log(1 - D_\phi(G_\theta(z)))\right],$$
where $q(Z)$ is a controllable distribution (e.g., Gaussian). GANs aim to approximate $p(X)$ by $G_\theta(q(Z))$ with
the adversarial help of the discriminator $D_\phi(x)$. In particular, the generator is optimized to increase the
likelihood of generated instances, with the likelihood gauged by the discriminator, while the discriminator is
optimized to increase the likelihood of instances sampled from the real distribution $p(X)$ and decrease the
likelihood of instances generated from the fake distribution $G_\theta(q(Z))$. Since it is infeasible to sample from
$p(X)$ directly, a training set $X_T \subseteq \mathcal{X}$ of $N$ instances is used to approximate the population instead.
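As a concrete reference point, the minimax value above can be evaluated from a batch of discriminator outputs. The following NumPy sketch is our own illustration (all names are ours, not the paper's code); the discriminator ascends this value while the generator descends it:

```python
import numpy as np

def gan_minimax_value(d_real, d_fake):
    """Monte Carlo estimate of the GAN minimax objective for one batch.

    d_real: discriminator outputs D_phi(x) on real samples, values in (0, 1).
    d_fake: discriminator outputs D_phi(G_theta(z)) on generated samples.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    # E[log D(x)] + E[log(1 - D(G(z)))]
    return np.log(d_real).mean() + np.log1p(-d_fake).mean()

# At the uninformative point D(.) = 1/2, the value is
# log(1/2) + log(1/2) = -2 log 2, the classic equilibrium value.
value = gan_minimax_value([0.5, 0.5], [0.5, 0.5])
```

A discriminator that separates real from fake (e.g., outputs near 1 on real, near 0 on fake) pushes this value above the equilibrium, which is what the inner maximization seeks.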
2.1 Quantitative evaluation of sample similarity
The most commonly used method to detect training sample memorization is by visualizing nearest neighbors
of generated images in the training data (Brock et al., 2019; Borji, 2018). If the visualized samples look
similar to their nearest neighbors in the training data, it is reasonable to suspect that the model is trying to
memorize the training data. Given a generated sample $x \sim G_\theta(q(Z))$ and an embedding function $f : \mathcal{X} \to \mathbb{R}^k$,
the nearest neighbor of $x$ in the training set is defined as
$$\mathrm{NN}_{f, X_T}(x) = \arg\min_{x' \in X_T} \left( 1 - \frac{\langle f(x), f(x') \rangle}{\| f(x) \| \cdot \| f(x') \|} \right).$$
The cosine similarity is conventionally used for evaluating the similarity of latent vectors (Salton & Buckley,
1988; Le-Khac et al., 2020; Borji, 2021), but other distance metrics could also be chosen. To avoid sensitivity
to noise in the input space, $f$ is usually chosen to project to a latent space embedded with higher-level
semantics. It is widely believed that a pretrained image classification model can extract high-level semantics
and serve as a robust latent space for distance measurement. For example, the calculation of FID involves first
passing the set of images through the Inception v3 (Szegedy et al., 2016) classification model pretrained on
ImageNet for feature extraction. A well-chosen $f$ retrieves nearest neighbors that align well with human
perception. Following this definition, the distance to the nearest neighbor can serve as a quantitative measure
of sample similarity:
$$d_{f, X_T}(x) = \min_{x' \in X_T} \left( 1 - \frac{\langle f(x), f(x') \rangle}{\| f(x) \| \cdot \| f(x') \|} \right).$$
Thus, the problem of reducing memorization can be formulated as regulating the nearest neighbor distance
of generated samples, which motivates our proposed algorithm.
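Once features are extracted, $d_{f,X_T}$ reduces to a nearest-neighbor search under cosine distance. A minimal NumPy sketch (our own illustrative code; in the paper's setting $f$ would be a pretrained feature extractor such as Inception v3):

```python
import numpy as np

def nn_cosine_distance(x_feat, train_feats):
    """Cosine distance from one embedded sample to its nearest training neighbor.

    x_feat:      (k,) embedding f(x) of a generated sample.
    train_feats: (n, k) embeddings f(x') of the training set X_T.
    Returns (index of the nearest neighbor, distance d_{f, X_T}(x)).
    """
    # Normalize so that 1 - dot product equals the cosine distance.
    x = x_feat / np.linalg.norm(x_feat)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    dists = 1.0 - t @ x  # shape (n,)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])
```

Because cosine distance ignores magnitude, a scaled copy of a training embedding has distance zero to it, which is the degenerate "memorized" case the metric is meant to flag.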
2.2 Quantitative evaluation of memorization
Meehan et al. (2020) proposed a non-parametric test score $C_T$ for measuring the degree of training sample
memorization of a generative model based on sample similarity. Their key insight is that a model should
generate samples that are, on average, as similar to the training data as an independently drawn test sample
from the same distribution. The model is memorizing if the generated samples are, on average, more similar
to the training data than an independently drawn test sample from the same distribution.

The memorization test is based on the Mann-Whitney U test, a non-parametric statistical test of the
ordinal relationship between two sets of samples, with the null hypothesis that they are drawn from the same
distribution. In this case, the two sets of samples are the nearest neighbor distances (with respect to the
training data) of a generated set and a reference testing set. The more severe the memorization, the more
negative the U statistic, and vice versa. Additionally, to better detect local memorization, the input domain
can be divided into subspaces and the test score aggregated over memorization tests performed on each
of the subspaces. In this paper, we adopt the definition of memorization as characterized by the $C_T$ values.
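The per-test statistic can be sketched as follows. This is an illustrative implementation of the standard normal-approximated Mann-Whitney U (without tie correction), not the authors' released code:

```python
import numpy as np

def mann_whitney_z(gen_dists, test_dists):
    """Normal-approximated Mann-Whitney U statistic, reported as a z-score.

    gen_dists:  nearest-neighbor distances of generated samples to X_T.
    test_dists: nearest-neighbor distances of held-out test samples to X_T.
    A strongly negative z means the generated samples sit systematically
    closer to the training data than test samples do, i.e., memorization.
    """
    x = np.asarray(gen_dists, dtype=float)
    y = np.asarray(test_dists, dtype=float)
    n, m = len(x), len(y)
    # U = #{(i, j): x_i > y_j} + 0.5 * #{ties}, counted via broadcasting.
    greater = (x[:, None] > y[None, :]).sum()
    ties = (x[:, None] == y[None, :]).sum()
    u = greater + 0.5 * ties
    mu = n * m / 2.0
    sigma = np.sqrt(n * m * (n + m + 1) / 12.0)  # no tie correction
    return (u - mu) / sigma
```

Under the null hypothesis the two distance samples are exchangeable and the z-score hovers near zero; the local variant described above would apply this same test within each subspace and aggregate the scores.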
2.3 Generation quality and memorization
Figure 1: Nearest neighbor distance distribution of the
reference testing set (CIFAR10.1) versus BigGAN.
Good generation quality and reduced memorization
can coexist. In the ideal case, if the generator per-
fectly fits the underlying data distribution, then the
generated samples have perfect quality and are in no
way more similar to the training data than another
independent sample from the distribution. However,
GAN models are imperfect. Figure 1 shows the
nearest neighbor distance distribution (approximated
by 2K samples) of a generated set from BigGAN and
a reference testing set (CIFAR10.1). If the model
successfully learned the data distribution, the
expectations of the two nearest neighbor distance
distributions should be identical. However, samples
generated from BigGAN (orange line) are in fact closer
to the training data than samples from the reference
testing set (highlighted in orange), which indicates
the memorization phenomenon.
In general, it is true that generated samples with
smaller nearest neighbor distances are associated
with better quality. Smaller distances imply being closer to the training distribution. Figure 2
visualizes a subset of 5K samples from a BigGAN trained on CIFAR10. The images are sorted by their nearest
neighbor distance. From top to bottom, each row shows 10 images from the 20%, 40%, 60%, 80%, and 100%