Generative Adversarial Nets:
Can we generate a new dataset based on only one
training set?
Lan V. Truong
Department of Engineering
University of Cambridge
Cambridge, CB2 1PZ
lt407@cam.ac.uk
Abstract
A generative adversarial network (GAN) is a class of machine learning frameworks designed by Goodfellow et al. in 2014. In the GAN framework, the generative model is pitted against an adversary: a discriminative model that learns to determine whether a sample comes from the model distribution or the data distribution. A GAN generates new samples from the same distribution as the training set. In this work, we aim to generate a new dataset that has a different distribution from the training set. In addition, the Jensen-Shannon divergence between the distributions of the generated and training datasets can be controlled by some target $\delta \in [0, 1]$. Our work is motivated by applications in generating new kinds of rice which have similar characteristics to a good rice variety.
1 INTRODUCTION
Representation learning is a set of techniques that allows a system to automatically discover, from raw data, the representations needed for feature detection or classification. It replaces manual feature engineering and allows a machine both to learn the features and to use them to perform a specific task. Feature learning can be either supervised or unsupervised. In supervised feature learning, features are learned from labeled input data; examples include supervised neural networks, multilayer perceptrons, and (supervised) dictionary learning. In unsupervised feature learning, features are learned from unlabeled input data; examples include dictionary learning, independent component analysis, autoencoders, matrix factorization, and various forms of clustering.
1.1 Related Papers
In the last few years, deep learning based generative models have attracted more and more interest due to some remarkable improvements in the field. Relying on huge amounts of data, well-designed network architectures, and smart training techniques, deep generative models have shown an impressive ability to produce highly realistic pieces of content of various kinds, such as images, texts, and sounds. Among these deep generative models, two major families stand out and deserve special attention: Generative Adversarial Networks (GANs) [2] and Variational Autoencoders (VAEs) [4].
A variational autoencoder can be defined as an autoencoder [5] whose training is regularised to avoid overfitting and to ensure that the latent space has good properties that enable a generative process.
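For reference, this regularisation is usually carried by the Kullback-Leibler term of the evidence lower bound (ELBO); a standard form, stated here for context rather than taken from this paper, is

\[ \log p_\theta(\mathbf{x}) \;\ge\; \mathbb{E}_{q_\phi(\mathbf{z}\mid\mathbf{x})}\big[\log p_\theta(\mathbf{x}\mid\mathbf{z})\big] \;-\; \mathrm{KL}\big(q_\phi(\mathbf{z}\mid\mathbf{x}) \,\|\, p(\mathbf{z})\big), \]

where the KL term pulls the approximate posterior $q_\phi(\mathbf{z}\mid\mathbf{x})$ toward the prior $p(\mathbf{z})$, giving the latent space the smooth structure a generative process needs.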
Tolstikhin et al. proposed the Wasserstein Autoencoder (WAE), which minimizes a penalized form of the Wasserstein distance between the model distribution and the target (data) distribution [7]. WAE shares many of the properties of VAEs, such as stable training, an encoder-decoder architecture, and a nice latent manifold structure, while generating samples of better quality as measured by the FID score.
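Up to notation, the penalized objective in [7] takes the form

\[ \inf_{Q(Z\mid X)} \;\; \mathbb{E}_{P_X}\,\mathbb{E}_{Q(Z\mid X)}\big[c\big(X, G(Z)\big)\big] \;+\; \lambda\, \mathcal{D}_Z\big(Q_Z, P_Z\big), \]

where $c$ is a reconstruction cost, $Q_Z$ is the aggregated posterior of the encoder, $P_Z$ is the latent prior, and $\mathcal{D}_Z$ is an arbitrary divergence weighted by $\lambda > 0$; this is a standard statement of the WAE objective rather than a quotation from this paper.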
A generative adversarial network (GAN) is a class of machine learning frameworks designed by Goodfellow et al. in 2014 [2]. In a GAN, the generative model learns to map from a latent space to a data distribution of interest, while the discriminative model distinguishes candidates produced by the generator from the true data distribution. The generative network's training objective is to increase the error rate of the discriminative network. Generative adversarial networks have applications in many fields such as fashion, art, advertising, science, video games, and audio synthesis. There is a veritable zoo of GAN variants. Conditional GANs [2] are similar to standard GANs except that they allow the model to generate samples conditioned on additional information. For example, if we want to generate a cat face given a dog picture, we could use a conditional GAN. The GAN game is a general framework and can be run with any reasonable parametrization of the generator $G$ and discriminator $D$. In the original paper, the authors demonstrated it using multilayer perceptron networks and convolutional neural networks. Many alternative architectures have been tried, such as the deep convolutional GAN [6], the self-attention GAN [1], and Flow-GAN [3].
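For concreteness, the game described above is, in Goodfellow et al.'s formulation [2], the minimax objective

\[ \min_G \max_D \;\; \mathbb{E}_{\mathbf{x} \sim p_{\mathrm{data}}}\big[\log D(\mathbf{x})\big] \;+\; \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}\big[\log\big(1 - D(G(\mathbf{z}))\big)\big], \]

where $D(\mathbf{x})$ is the probability that the discriminator assigns to $\mathbf{x}$ being a real sample and $G(\mathbf{z})$ maps latent noise to a candidate sample.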
1.2 Motivations
There are some variants of GAN, such as the conditional GAN, which allow the use of multiple data distributions together with the generated ones. However, these variants require at least two different training sets to generate a new one. In many practical applications, we would like to generate a new dataset which has the same characteristics as a reference one. In this work, we aim to develop a new variant of GAN which allows us to perform this task. Our work is motivated by applications in generating new kinds of rice which have similar characteristics to a good rice variety.
More specifically, assume that we have $L$ datasets with unknown distributions $p_1, p_2, \ldots, p_L$ for some $L \geq 1$. We aim to generate a new dataset which has a different distribution from the training datasets. In addition, the Jensen-Shannon divergence between the distribution of the generated dataset and a mixture data distribution can be controlled, i.e.,

\[ \mathrm{JSD}\Big( \sum_{l=1}^{L} \alpha_l p_l, \; p_g \Big) \;\le\; \delta \]

for some given non-negative tuple $(\alpha_1, \alpha_2, \ldots, \alpha_L)$ satisfying $\sum_{l=1}^{L} \alpha_l = 1$ and $\delta \in [0, 1]$. For $L = 1$, our algorithm generates a new dataset such that the Jensen-Shannon divergence between the distributions of the generated and the training data is upper bounded by some target $\delta \in [0, 1]$.
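As a sanity check for the constraint above, the divergence can be estimated from empirical histograms; the following is a minimal sketch, where the weights, histograms, and bound are illustrative values rather than numbers from our experiments:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical mixture weights and per-dataset histograms; the names
# (alpha, p, p_g) mirror the notation above but the numbers are made up.
alpha = np.array([0.6, 0.4])                # mixture weights, sum to 1
p = np.array([[0.10, 0.20, 0.30, 0.40],     # empirical histogram of p_1
              [0.25, 0.25, 0.25, 0.25]])    # empirical histogram of p_2
p_mix = alpha @ p                           # mixture sum_l alpha_l * p_l
p_g = np.array([0.15, 0.22, 0.28, 0.35])    # histogram of the generated data

# SciPy returns the Jensen-Shannon *distance* (the square root of the
# divergence); squaring it, with base=2, gives a JSD value in [0, 1].
jsd = jensenshannon(p_mix, p_g, base=2) ** 2

delta = 0.05                                # target upper bound
print(f"JSD = {jsd:.4f}, within target: {jsd <= delta}")
```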
This additional "controllable property" is very important in many applications. For example, we sometimes need to generate images of a new breed of cats which shares most of its properties with an existing breed. In many other applications, we can increase the number of newly generated images by relaxing the distance requirement between the distributions of the data and the generated samples, compared with GANs or conditional GANs.
1.3 Contributions
Our main contributions include:

• We develop a new technique which allows us to control the total variation between the distributions of the random vectors $x$ and $y$, where $y = x + z$ and $z$ is a sparse random vector with a fixed distribution (a toy sketch of this perturbation appears after this list).

• We propose a mechanism which allows us to loosen the Jensen-Shannon divergence between the generated distribution and the data distribution in Goodfellow et al.'s model [2].

• We extend this new model to allow the use of multiple data distributions, as in the conditional GAN.

• We illustrate our ideas on the CIFAR-10 and CIFAR-100 datasets, generating new datasets based on only one dataset or a mixture of these two datasets for different values of $\delta$.
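The sketch below illustrates the sparse-perturbation idea from the first bullet. The function name, the Bernoulli-Gaussian choice for $z$, and all parameter values are illustrative assumptions, not the construction used in our proofs:

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_perturb(x, sparsity=0.05, sigma=0.1):
    """Return y = x + z for a sparse random z: each coordinate of z is
    nonzero with probability `sparsity`, with a Gaussian magnitude.
    As sparsity -> 0, z = 0 with high probability, which keeps the
    distributions of x and y close in total variation."""
    mask = rng.random(x.shape) < sparsity         # Bernoulli support of z
    noise = rng.normal(0.0, sigma, size=x.shape)  # Gaussian magnitudes
    return x + mask * noise

# Illustrative use on a batch of flattened images.
x = rng.random((64, 3072))                        # e.g. CIFAR-sized vectors
y = sparse_perturb(x)
print("fraction of coordinates perturbed:", float(np.mean(y != x)))
```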