DOT-VAE: Disentangling One Factor at a Time
Vaishnavi Patil, Matthew Evanusa, and Joseph JaJa
University of Maryland, College Park, MD 20740, USA
{vspatil, mevanusa, josephj}@umd.edu
Abstract. As we enter the era of machine learning characterized by an
overabundance of data, discovery, organization, and interpretation of the
data in an unsupervised manner becomes a critical need. One promising
approach to this endeavour is the problem of Disentanglement, which
aims at learning the underlying generative latent factors, called the factors of variation, of the data and encoding them in disjoint latent representations. Recent advances have made efforts to solve this problem for synthetic datasets generated by a fixed set of independent factors of variation. Here, we propose to extend this to real-world datasets with a countable number of factors of variation. We propose a novel framework which augments the latent space of a Variational Autoencoder with a disentangled space and is trained using a Wake-Sleep-inspired two-step algorithm for unsupervised disentanglement. Our network learns to disentangle interpretable, independent factors from the data "one at a time" and encode them in different dimensions of the disentangled latent space, while making no prior assumptions about the number of factors or their joint distribution. We demonstrate its quantitative and qualitative effectiveness by evaluating the latent representations learned on two synthetic benchmark datasets, dSprites and 3DShapes, and on a real dataset, CelebA.
Keywords: Deep learning · Representation learning · Unsupervised Disentanglement.
1 Introduction
Deep learning models, which are now widely adopted across multiple Artifi-
cial Intelligence tasks ranging from vision to music generation to game playing
[16,23,22], owe their success to their ability to learn representations from the
data rather than requiring hand-crafted features. However, this self-learning of
abstract representations comes at the known cost of the resulting representa-
tions being cryptic and inscrutable to human observers [8]. A more comprehen-
sive representation of the data where the essential indivisible, semantic concepts
are encoded in structurally disentangled parts could lead to successful domain
adaptation and transfer learning [1] and facilitate robust downstream learning
more effectively [27]. Learning these latent representations from the data alone
without the need of laborious labeling by human observers constitutes the prob-
lem of Unsupervised Disentanglement. In this work, we attempt to address the
problem of unsupervised disentanglement via a novel Variational Autoencoder
based framework and training algorithm.
Though there is no commonly accepted formalized notion of disentangle-
ment or validation metrics [11], recent works have characterized disentangled
representations based on natural intuition. The intuition, as stated by [1], is that a disentangled representation is a representation of the data which encodes each factor of variation in disjoint sets of the latent representation. [21] state further that a change in a single factor of variation produces a change in only the subset of the learned latent representation which corresponds to that factor. Here, a factor of variation is an abstract, human-defined concept that assumes different values for different examples in the dataset. This intuition is closely related to the independent mechanisms assumption [26], which casts the informative factors as components of a causal mechanism. This assumption permits interventions on one factor without affecting the other factors or the representations corresponding to the other factors; each factor can thus be independently controlled. In our work, we use independent interventions on the learned disentangled representations, which encode the different factors, to generate samples, and we restrict the differences between the corresponding representations of the data and the sample to those pertaining to the intervened factor. This process of using interventions and generating new samples resembles the sleep phase of the wake-sleep algorithm.
Most current Variational Autoencoder (VAE) based state-of-the-art (SOTA) methods make the implicit assumption that there is a fixed number of independent factors, common to all the data points in the dataset. However, in real datasets, in addition to the independent factors common to all points in the dataset, there might also be some correlated, noisy factors pertinent to only certain data points. While the approaches based on Generative Adversarial Networks (GANs) do not make this assumption, they learn only a subset of the disentangled factors, whose number is heuristically chosen. We believe, however, that one of the main goals of disentanglement is to glean insights into the data, and the number of factors of variation is generally a quantity we do not have access to. To this end, our method
augments the entangled latent space of a VAE with a disentangled latent code,
and iteratively encodes each factor, common to all the data points, in a single
disentangled code using interventions. This process allows our model to learn
any number of factors, without prior knowledge or "hardcoding", thus making
it better suited for real datasets.
1.1 Main Contributions
Our contributions in the proposed work are:
– We introduce a novel, completely unsupervised method for solving disentanglement, which offers the mode-covering properties of a VAE along with the interpretability of the factors afforded by GANs, to better encode the factors of variation in the disentangled code, while encoding the other informative factors in the entangled representations.
– Our proposed model is the first unsupervised method that is capable of learning an arbitrary number of latent factors via iterative unsupervised interventions in the latent space.
– We test and evaluate our algorithm on two standard datasets across multiple quantitative SOTA metrics, and qualitatively on one dataset. Our qualitative empirical results on synthetic datasets show that our model successfully disentangles independent factors. Across all quantitative metrics, our model generally outperforms existing methods for unsupervised disentanglement.
2 Disentangling One Factor at a Time using Interventions
We base our framework on the VAE, which assumes that data $x$ is generated from a set of latent features $z \in \mathbb{R}^d$ with a prior $p(z)$. A generator $p_\theta(x|z)$ maps the latent features to the high-dimensional data $x$. The generator, modeled as a neural network, is trained to maximize the total log-likelihood of the data, $\log p_\theta(X)$. However, due to the intractability of calculating the exact log-likelihood, the VAE instead maximizes an evidence lower bound (ELBO) using an approximate posterior distribution $q_\phi(z|x)$ modeled by an encoder neural network. For given data, this encoder projects the data to a lower-dimensional representation $z$, such that the data can be reconstructed by the generator given only the representation. Without any structural constraints, the dimensions of the inferred latent representation $z$ are informative but entangled.
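For concreteness, the bound being maximized can be written in its standard textbook form, using the notation above:
\[
\log p_\theta(x) \;\ge\; \mathcal{L}(\theta, \phi; x) \;=\; \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x \mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big),
\]
where the first term rewards faithful reconstruction of $x$ from the inferred code and the KL term keeps the approximate posterior close to the prior $p(z)$.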
Disjoint Latent Sets: To encode the factors of variation in an interpretable way, following [6] we augment the unstructured variables $z$ with a set of structured variables $c = \{c_1, c_2, \cdots, c_K\}$, each of which is tasked with disentangling an independent semantic attribute of the data. We train our network to systematically separate the meaningful latent factors shared across the dataset into $c$ from the entangled representation $z$, both of which are important for maximizing the log-likelihood of the data. However, it is only the most informative, common factors of variation encoded in $c$ that we are interested in, as the remaining factors in $z$ are confounded and contain the "noise" in the dataset, i.e., features that only a few data samples contain.
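As a minimal sketch of how such an augmented latent space can be wired up (the layer sizes, dimensionalities, and shared backbone below are illustrative assumptions, not the paper's reference architecture):

```python
import torch
import torch.nn as nn

class DisjointLatentEncoder(nn.Module):
    """Encoder with two heads: an entangled code z and a disentangled code c.

    Hypothetical sketch: the dimensions and the shared backbone are
    illustrative assumptions, not the paper's exact architecture.
    """

    def __init__(self, x_dim=4096, hidden=256, z_dim=10, c_dim=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Each head parameterizes a diagonal Gaussian posterior.
        self.z_mu = nn.Linear(hidden, z_dim)
        self.z_logvar = nn.Linear(hidden, z_dim)
        self.c_mu = nn.Linear(hidden, c_dim)
        self.c_logvar = nn.Linear(hidden, c_dim)

    @staticmethod
    def reparameterize(mu, logvar):
        # Standard reparameterization trick: sample = mu + sigma * eps.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, x):
        h = self.backbone(x)
        z = self.reparameterize(self.z_mu(h), self.z_logvar(h))
        c = self.reparameterize(self.c_mu(h), self.c_logvar(h))
        return z, c  # the generator is conditioned on the pair (c, z)
```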
The generator is now conditioned both on the disentangled latent codes and the entangled latent space $(c, z)$ and describes a causal mechanism [27], where each causal component $c_k$ is independently controllable and a change in a particular index $k$ has no effect on any other index $c_j$ $(j \neq k)$. Manipulating each $c_k$ should correspond to a distinct, semantic change in the generated sample, corresponding to the factor encoded, without entangling with changes effected by the other factors or with the entangled code $z$. To this end, we perform interventions as proposed in [27] (described in detail below) that go into the latent code and change a single dimension $c_k$ of the disentangled code $c$ while keeping the rest of the code fixed.
Code available at https://github.com/DOTFactor/DOTFactor
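To make the intervention operation concrete, the following is a hypothetical sketch (the function name and the choice to resample the intervened dimension from the prior are illustrative assumptions; the exact procedure is described in the sections that follow):

```python
import torch

def intervene_on_dim(c: torch.Tensor, k: int, prior_std: float = 1.0) -> torch.Tensor:
    """Return a copy of the disentangled code c with only dimension k changed.

    Hypothetical sketch: the intervened dimension is resampled from the
    N(0, prior_std^2) prior, while every other dimension of c (and the
    entangled code z, which is untouched) stays fixed.
    """
    c_prime = c.clone()
    c_prime[:, k] = prior_std * torch.randn(c.shape[0], device=c.device)
    return c_prime

# Usage (encoder/generator are stand-ins for the model's networks):
#   z, c = encoder(x)
#   x_tilde = generator(torch.cat([intervene_on_dim(c, k), z], dim=-1))
```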