DOT-VAE: Disentangling One Factor at a Time
Vaishnavi Patil, Matthew Evanusa, and Joseph JaJa
University of Maryland, College Park, MD 20740, USA
{vspatil, mevanusa, josephj}@umd.edu
Abstract. As we enter the era of machine learning characterized by an
overabundance of data, discovery, organization, and interpretation of the
data in an unsupervised manner becomes a critical need. One promising
approach to this endeavour is the problem of Disentanglement, which
aims at learning the underlying generative latent factors, called the factors of variation, of the data and encoding them in disjoint latent representations. Recent advances have made efforts to solve this problem for synthetic datasets generated by a fixed set of independent factors of variation. Here, we propose to extend this to real-world datasets with a countable number of factors of variation. We propose a novel framework which augments the latent space of a Variational Autoencoder with a disentangled space and is trained using a Wake-Sleep-inspired two-step algorithm for unsupervised disentanglement. Our network learns to disentangle interpretable, independent factors from the data "one at a time" and encode them in different dimensions of the disentangled latent space, while making no prior assumptions about the number of factors or their joint distribution. We demonstrate its quantitative and qualitative effectiveness by evaluating the latent representations learned on two synthetic benchmark datasets, dSprites and 3DShapes, and on a real dataset, CelebA.
Keywords: Deep learning · Representation learning · Unsupervised Disentanglement.
1 Introduction
Deep learning models, which are now widely adopted across multiple Artifi-
cial Intelligence tasks ranging from vision to music generation to game playing
[16,23,22], owe their success to their ability to learn representations from the
data rather than requiring hand-crafted features. However, this self-learning of
abstract representations comes at the known cost of the resulting representa-
tions being cryptic and inscrutable to human observers [8]. A more comprehen-
sive representation of the data where the essential indivisible, semantic concepts
are encoded in structurally disentangled parts could lead to successful domain
adaptation and transfer learning [1] and facilitate robust downstream learning
more effectively [27]. Learning these latent representations from the data alone
without the need of laborious labeling by human observers constitutes the prob-
lem of Unsupervised Disentanglement. In this work, we attempt to address the
problem of unsupervised disentanglement via a novel Variational Autoencoder
based framework and training algorithm.
Though there is no commonly accepted formalized notion of disentangle-
ment or validation metrics [11], recent works have characterized disentangled
representations based on natural intuition. The intuition, as stated by [1], is that a disentangled representation is a representation of the data which encodes each factor of variation in disjoint sets of the latent representation. [21] state further that a change in a single factor of variation produces a change in only the subset of the learned latent representation which corresponds to that factor. Here, a factor of variation is an abstract, human-defined concept that assumes different values for different examples in the dataset. This intuition is closely related to the independent mechanisms assumption [26], which casts the informative factors as components of a causal mechanism. This assumption permits interventions on one factor without affecting the other factors or the representations corresponding to the other factors; each factor can thus be independently controlled. In our work, we use independent interventions on the learned disentangled representations, which encode the different factors, to generate samples, and we restrict the differences between the corresponding representations of the data and the sample to those pertaining to the intervened factor. This process of using interventions and generating new samples resembles the sleep phase of the wake-sleep algorithm.
Most current Variational Autoencoder (VAE) based state-of-the-art (SOTA) methods make the implicit assumption that there is a fixed number of independent factors, common to all the data points in the dataset. However, in real datasets, in addition to the independent factors common to all points in the dataset, there might also be some correlated, noisy factors pertinent to only certain data points. While the approaches based on Generative Adversarial Networks (GANs) do not make this assumption, they learn only a subset of the disentangled factors, whose number is heuristically chosen. We believe, however, that one of the main goals of disentanglement is to glean insights into the data, and the number of factors of variation is generally a quantity we do not have access to. To this end, our method
augments the entangled latent space of a VAE with a disentangled latent code,
and iteratively encodes each factor, common to all the data points, in a single
disentangled code using interventions. This process allows our model to learn
any number of factors, without prior knowledge or "hardcoding", thus making
it better suited for real datasets.
1.1 Main Contributions
Our contributions in the proposed work are:
– We introduce a novel, completely unsupervised method for solving disentanglement, which offers the mode-covering properties of a VAE along with the interpretability of the factors afforded by GANs, to better encode the factors of variation in the disentangled code, while encoding the other informative factors in the entangled representations.
– Our proposed model is the first unsupervised method that is capable of learning an arbitrary number of latent factors via iterative unsupervised interventions in the latent space.
– We test and evaluate our algorithm on two standard datasets across multiple quantitative SOTA metrics, and qualitatively on one dataset. Our qualitative empirical results on synthetic datasets show that our model successfully disentangles independent factors. Across all quantitative metrics, our model generally outperforms existing methods for unsupervised disentanglement.
2 Disentangling One Factor at a Time using Interventions
We base our framework on the VAE, which assumes that data $x$ is generated from a set of latent features $z \in \mathbb{R}^d$ with a prior $p(z)$. A generator $p_\theta(x|z)$ maps the latent features to the high-dimensional data $x$. The generator, modeled as a neural network, is trained to maximize the total log-likelihood of the data, $\log p_\theta(X)$. However, due to the intractability of calculating the exact log-likelihood, the VAE instead maximizes an evidence lower bound (ELBO) using an approximate posterior distribution $q_\phi(z|x)$ modeled by an encoder neural network. For given data, this encoder projects the data to a lower-dimensional representation $z$, such that the data can be reconstructed by the generator given only the representation. Without any structural constraints, the dimensions of the inferred latent representation $z$ are informative but entangled.
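For concreteness, the bound being maximized can be written in its standard textbook form, using the notation above:
\[
\log p_\theta(x) \;\ge\; \mathcal{L}(\theta, \phi; x) \;=\; \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x \mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big),
\]
where the first term rewards faithful reconstruction of $x$ from the inferred code and the KL term keeps the approximate posterior close to the prior $p(z)$.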
Disjoint Latent Sets: To encode the factors of variation in an interpretable way, following [6] we augment the unstructured variables $z$ with a set of structured variables $c = \{c_1, c_2, \cdots, c_K\}$, each of which is tasked with disentangling an independent semantic attribute of the data. We train our network to systematically separate the meaningful latent factors shared across the dataset into $c$ from the entangled representation $z$, both of which are important for maximizing the log-likelihood of the data. However, it is only the most informative, common factors of variation encoded in $c$ that we are interested in, as the remaining factors in $z$ are confounded and contain the "noise" in the dataset, i.e., features that only a few data samples contain.
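As a minimal sketch of how such an augmented latent space can be wired up (the layer sizes, dimensionalities, and shared backbone below are illustrative assumptions, not the paper's reference architecture):

```python
import torch
import torch.nn as nn

class DisjointLatentEncoder(nn.Module):
    """Encoder with two heads: an entangled code z and a disentangled code c.

    Hypothetical sketch: the dimensions and the shared backbone are
    illustrative assumptions, not the paper's exact architecture.
    """

    def __init__(self, x_dim=4096, hidden=256, z_dim=10, c_dim=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Each head parameterizes a diagonal Gaussian posterior.
        self.z_mu = nn.Linear(hidden, z_dim)
        self.z_logvar = nn.Linear(hidden, z_dim)
        self.c_mu = nn.Linear(hidden, c_dim)
        self.c_logvar = nn.Linear(hidden, c_dim)

    @staticmethod
    def reparameterize(mu, logvar):
        # Standard reparameterization trick: sample = mu + sigma * eps.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, x):
        h = self.backbone(x)
        z = self.reparameterize(self.z_mu(h), self.z_logvar(h))
        c = self.reparameterize(self.c_mu(h), self.c_logvar(h))
        return z, c  # the generator is conditioned on the pair (c, z)
```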
The generator is now conditioned both on the disentangled latent codes and the entangled latent space $(c, z)$ and describes a causal mechanism [27], where each causal component $c_k$ is independently controllable and a change in a particular index $k$ has no effect on any other index $c_j$ $(j \neq k)$. Manipulating each $c_k$ should correspond to a distinct, semantic change in the generated sample, corresponding to the factor encoded, without entangling with changes effected by the other factors or with the entangled code $z$. To this end, we perform interventions as proposed in [27] (described in detail below) that go into the latent code and change a single dimension $c_k$ of the disentangled code $c$ while keeping the rest of the code fixed.
Code available at https://github.com/DOTFactor/DOTFactor
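To make the intervention operation concrete, the following is a hypothetical sketch (the function name and the choice to resample the intervened dimension from the prior are illustrative assumptions; the exact procedure is described in the sections that follow):

```python
import torch

def intervene_on_dim(c: torch.Tensor, k: int, prior_std: float = 1.0) -> torch.Tensor:
    """Return a copy of the disentangled code c with only dimension k changed.

    Hypothetical sketch: the intervened dimension is resampled from the
    N(0, prior_std^2) prior, while every other dimension of c (and the
    entangled code z, which is untouched) stays fixed.
    """
    c_prime = c.clone()
    c_prime[:, k] = prior_std * torch.randn(c.shape[0], device=c.device)
    return c_prime

# Usage (encoder/generator are stand-ins for the model's networks):
#   z, c = encoder(x)
#   x_tilde = generator(torch.cat([intervene_on_dim(c, k), z], dim=-1))
```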