
Temporally Disentangled Representation Learning
Weiran Yao
CMU
weiran@cmu.edu
Guangyi Chen
CMU & MBZUAI
guangyichen1994@gmail.com
Kun Zhang
CMU & MBZUAI
kunz1@cmu.edu
Abstract
Recently in the field of unsupervised representation learning, strong identifiability results for disentanglement of causally-related latent variables have been established by exploiting certain side information, such as class labels, in addition to independence. However, most existing work is constrained by functional form assumptions, such as independent sources or linear transitions, and distributional assumptions, such as stationary, exponential-family distributions. It is unknown whether the underlying latent variables and their causal relations are identifiable if they have arbitrary, nonparametric causal influences in between. In this work, we establish the identifiability theory of nonparametric latent causal processes from their nonlinear mixtures under fixed temporal causal influences and analyze how distribution changes can further benefit the disentanglement. We propose TDRL, a principled framework to recover time-delayed latent causal variables and identify their relations from measured sequential data under stationary environments and under different distribution shifts. Specifically, the framework can factorize unknown distribution shifts into transition distribution changes under fixed and time-varying latent causal relations, and into global changes in observation. Through experiments, we show that time-delayed latent causal influences are reliably identified and that our approach considerably outperforms existing baselines that do not correctly exploit this modular representation of changes. Our code is available at: https://github.com/weirayao/tdrl.
1 Introduction
Causal reasoning for time-series data is a fundamental task in numerous fields [1, 2, 3]. Most existing work focuses on estimating the temporal causal relations among observed variables. However, in many real-world scenarios, the observed signals (e.g., image pixels in videos) do not have direct causal edges, but are generated by latent temporal processes or confounders that are causally related. Inspired by these scenarios, this work aims to uncover causally-related latent processes and their relations from observed temporal variables. Estimating latent causal structure from observations, which we assume are unknown (but invertible) nonlinear mixtures of the latent processes, is very challenging. It has been found in [4, 5] that without exploiting an appropriate class of assumptions in estimation, the latent variables are not identifiable in the most general case. As a result, one cannot make causal claims about the recovered relations in the latent space.
Recently, in the field of unsupervised representation learning, strong identifiability results for the latent variables have been established [6, 7, 8, 9, 10] by using certain side information in nonlinear Independent Component Analysis (ICA), such as class labels, in addition to independence. For time-series data, history information is widely used as the side information for the identifiability of latent processes. To establish identifiability, the existing approaches enforce different sets of functional and distributional form assumptions as constraints in estimation; for example, (1) PCL [7], GCL [8], HM-NLICA [11] and SlowVAE [12] assume mutually independent sources in the data-generating process. However, this assumption may severely distort the identifiability if the latent variables have time-delayed causal relations in between (i.e., a causally-related process); (2) SlowVAE [12] and SNICA [13] assume linear relations, which may distort the identifiability results if the underlying transitions are nonlinear; and (3) SlowVAE [12] assumes that the process noise is drawn from a Laplacian distribution, while i-VAE [9] assumes that the conditional transition distribution is part of the exponential family. However, in real-world scenarios, one cannot choose a proper set of functional and distributional form assumptions without knowing in advance the parametric forms of the latent temporal processes. Our first step is hence to understand under what conditions the latent causal processes are identifiable if they have nonparametric transitions in between. With the proposed condition, our approach allows recovery of latent temporal causally-related processes in stationary environments without knowing their parametric forms in advance.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.13647v1 [cs.LG] 24 Oct 2022

Table 1: Attributes of nonlinear ICA theories for time series. A check (✓) denotes that a method has an attribute or can be applied to a setting, whereas a cross (✗) denotes the opposite. TDRL is our approach.

Theory    | Time-varying Relation | Causally-related Process | Partitioned Subspace | Nonparametric Transition | Applicable to Stationary Environment
PCL       | ✗ | ✗ | ✗ | ✓ | ✓
GCL       | ✓ | ✗ | ✗ | ✓ | ✓
HM-NLICA  | ✗ | ✗ | ✗ | ✗ | ✗
SlowVAE   | ✗ | ✗ | ✗ | ✗ | ✓
SNICA     | ✓ | ✓ | ✗ | ✗ | ✗
i-VAE     | ✓ | ✗ | ✗ | ✗ | ✗
LEAP      | ✗ | ✓ | ✗ | ✓ | ✗
TDRL      | ✓ | ✓ | ✓ | ✓ | ✓
[Figure 1 here: three example video domains over t = 1, 2, ..., T with settings (Mass = 1, Gravity = 5, Noise = 0), (Mass = 1.5, Gravity = 10, Noise = 0), and (Mass = 2, Gravity = 15, Noise = 0.1), feeding observations $x_t$ into latent causal process estimation with time-delayed transitions over latents $\hat{z}_{i,t}$ and change factors $\hat{\theta}^{\mathrm{dyn}}_r$, $\hat{\theta}^{\mathrm{obs}}_r$.]

Figure 1: TDRL: Temporally Disentangled Representation Learning. We exploit fixed causal dynamics and distribution changes from changing causal influences and global observation changes to identify the underlying causal processes. $\hat{z}_{i,t}$ is the estimated latent process. $\hat{\theta}^{\mathrm{dyn}}_r$ is the change factor for transition dynamics, i.e., representing mass and gravity in this example. $\hat{\theta}^{\mathrm{obs}}_r$ is the change factor for observation, i.e., noise scale.
On the other hand, nonstationarity has greatly improved the identifiability results for learning the latent causal structure [14, 15, 16]. For instance, LEAP [14] established the identifiability of latent temporal processes, but in limited nonstationary cases, under the condition that the distribution of the noise terms of the latent processes varies across all segments. Our second step is to analyze how distribution shifts benefit our stationary condition and to extend our condition to a general nonstationary case. Accordingly, our approach enables the recovery of latent temporal causal processes in a general nonstationary environment with time-varying relations, such as changes in the influencing strength or switching some edges off [17] over time or domains.

Given the identifiability results, we propose a learning framework, called TDRL, to recover nonparametric time-delayed latent causal variables and identify their relations from measured temporal data under stationary environments and under nonstationary environments in which it is unknown in advance how the joint distribution changes across domains (we define this as "unknown distribution shifts"). For instance, Fig. 1 shows an example of multiple video domains of a physical system under different mass, gravity, and environment rendering settings.¹ With TDRL, the differences across segments are characterized by the learned change factors $\hat{\theta}^{\mathrm{dyn}}_r$ of domain $r$ (note that the domain index is given to the model), which encode changes in transition dynamics, and by the changes in observation or styles modeled by $\hat{\theta}^{\mathrm{obs}}_r$ (we use "causal dynamics" and "latent causal relations/influences" interchangeably). We then present a generalized time-series data generative model that takes these change factors as arguments for modeling the distribution changes. Specifically, we factorize unknown distribution shifts into transition distribution changes in stationary processes, time-varying latent causal relations, and global changes in observation by constructing partitioned latent subspaces, and propose provable conditions under which nonparametric latent causal processes can be identified from their nonlinear invertible mixtures. We demonstrate through a number of real-world datasets, including video and motion capture data, that time-delayed latent causal influences are reliably identified from observed variables under stationary environments and unknown distribution shifts. Through experiments, we show that our approach considerably outperforms existing baselines that do not correctly leverage this modular representation of changes.

¹ The variables and functions with "hat" are estimated by the model; the ones without "hat" are ground truth.
2 Related Work

Causal Discovery from Time Series. Inferring the causal structure from time-series data is critical to many fields including machine learning [1], econometrics [2], and neuroscience [3]. Most existing work focuses on estimating the temporal causal relations between observed variables. For this task, constraint-based methods [18] apply conditional independence tests to recover the causal structure, while score-based methods [19, 20] define score functions to guide a search process. Furthermore, [21, 22] propose to fuse both conditional independence tests and score-based methods. Granger causality [23] and its nonlinear variations [24, 25] are also widely used.
Nonlinear ICA for Time Series. Temporal structure and nonstationarities were recently used to achieve identifiability in nonlinear ICA. Time-contrastive learning (TCL [6]) used the independent-sources assumption and leveraged sufficient variability in the variance terms of different data segments. Permutation-based contrastive learning (PCL [7]) proposed a learning framework which discriminates between true independent sources and permuted ones, and is identifiable under the uniformly dependent assumption. HM-NLICA [11] combined nonlinear ICA with a Hidden Markov Model (HMM) to automatically model nonstationarity without manual data segmentation. i-VAE [9] introduced VAEs to approximate the true joint distribution over observed and auxiliary nonstationary regimes; it assumes that the conditional distribution is within the exponential family to achieve identifiability of the latent space. The most recent literature on nonlinear ICA for time series includes LEAP [14] and (i-)CITRIS [26, 27]. LEAP proposed a nonparametric condition leveraging nonstationary noise terms. However, all latent processes are assumed to change across contexts, the distribution changes need to be modeled by nonstationary noise, and it does not exploit the stationary nonparametric components for identifiability. Alternatively, CITRIS proposed to use intervention target information for identification of scalar and multidimensional latent causal factors. This approach does not suffer from functional or distributional form constraints, but needs access to active interventions.
3 Problem Formulation
3.1 Time Series Generative Model
Stationary Model. As a fundamental case, we first present a regular, stationary time-series generative process in which the observations $x_t$ come from a nonlinear (but invertible) mixing function $g$ that maps the time-delayed causally-related latent variables $z_t$ to $x_t$. The latent variables or processes $z_t$ have stationary, nonparametric time-delayed causal relations. Let $\tau$ be the time lag:

$$\underbrace{x_t = g(z_t)}_{\text{Nonlinear mixing}}, \qquad \underbrace{z_{i,t} = f_i\big(\{z_{j,t-\tau} \mid z_{j,t-\tau} \in \mathrm{Pa}(z_{i,t})\},\ \epsilon_{i,t}\big)}_{\text{Stationary nonparametric transition}} \ \ \text{with} \ \ \underbrace{\epsilon_{i,t} \sim p_{\epsilon_i}}_{\text{Stationary noise}}.$$

Note that with nonparametric causal transitions, the noise term $\epsilon_{i,t} \sim p_{\epsilon_i}$ (where $p_{\epsilon_i}$ denotes the distribution of $\epsilon_{i,t}$) and the time-delayed parents $\mathrm{Pa}(z_{i,t})$ of $z_{i,t}$ (i.e., the set of latent factors that directly cause $z_{i,t}$) interact and are transformed in an arbitrarily nonlinear way to generate $z_{i,t}$. Under the stationarity assumptions, the mixing function $g$, the transition functions $f_i$, and the noise distributions $p_{\epsilon_i}$ are invariant. Finally, we assume that the noise terms are mutually independent (i.e., spatially and temporally independent), which implies that instantaneous causal influence between latent causal processes is not allowed by the formulation. The stationary time-series model in the fundamental case is used to establish the identifiability results under fixed causal dynamics in Section 4.1.
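To make the data-generating process concrete, the following sketch simulates a two-dimensional instance of this model. The particular transition functions and mixing map are our own illustrative choices (the theory only requires that $g$ be invertible and that the noise terms be mutually independent); note the noise enters the transitions non-additively, as the nonparametric formulation allows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 2, 500

# Illustrative nonparametric transitions f_i: noise and parents interact nonlinearly.
def f1(z_prev, e):
    return np.sin(z_prev[0]) + 0.5 * z_prev[1] * e   # noise modulated by a parent

def f2(z_prev, e):
    return np.tanh(z_prev[0] + z_prev[1]) + e ** 3   # non-Gaussian effective noise

# Invertible nonlinear mixing g: componentwise tanh followed by a full-rank matrix.
A = np.array([[1.0, 0.7], [0.3, 1.2]])
def g(z):
    return A @ np.tanh(z)

z = np.zeros((T, n))
x = np.zeros((T, n))
for t in range(1, T):
    eps = rng.standard_normal(n)   # mutually independent stationary noise
    z[t, 0] = f1(z[t - 1], eps[0])
    z[t, 1] = f2(z[t - 1], eps[1])
    x[t] = g(z[t])                 # only x is observed; z is latent
```

The identifiability question of Section 4 is then: given only `x`, under what conditions can `z` be recovered up to permutation and component-wise invertible transformations?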
Nonstationary Model. We further consider two violations of the stationarity assumptions in the fundamental case, which lead to two nonstationary time-series models. Let $u$ denote the domain or regime index. Suppose there exist $m$ regimes of data, i.e., $u_r$ with $r = 1, 2, \ldots, m$, with unknown distribution shifts. In practice, the changing parameters of the joint distribution across domains often lie in a low-dimensional manifold [28]. Moreover, if the distribution is causally factorized, the distributions often change in a minimal and sparse way [29]. Based on these assumptions, we introduce the low-dimensional minimal change factor $(\theta^{\mathrm{dyn}}_r, \theta^{\mathrm{obs}}_r)$, which was proposed in [30], to respectively capture distribution shifts in the transition functions and in the observation. The vector $\theta_r = (\theta^{\mathrm{dyn}}_r, \theta^{\mathrm{obs}}_r)$ has a constant value in each domain but varies across domains. The formulation of the nonstationary time-series model is in line with [30]. The nonstationary model is used to establish the identifiability results under nonstationary cases in Section 4.2, where we show that the violation of stationarity in both ways can even further improve the identifiability results. We first present the two nonstationary cases. (1) Changing Causal Dynamics. The causal influences between the latent temporal processes change across domains in this setting. We model this by adding the transition change factor $\theta^{\mathrm{dyn}}_r$ as an input to the transition function: $z_{i,t} = f_i\big(\{z_{j,t-\tau} \mid z_{j,t-\tau} \in \mathrm{Pa}(z_{i,t})\},\ \theta^{\mathrm{dyn}}_r,\ \epsilon_{i,t}\big)$. (2) Global Observation Changes. The global properties of the time series (e.g., video styles) change across domains in this setting. Our model captures them using latent variables that represent global styles; these latent variables are generated by a bijection $f_i$ that transforms the noise terms $\epsilon_{i,t}$ into the latent variables with change factor $\theta^{\mathrm{obs}}_r$: $z_{i,t} = f_i\big(\theta^{\mathrm{obs}}_r,\ \epsilon_{i,t}\big)$. Finally, we can deal with a more general nonstationary case by combining the three types of latent processes in the latent space in a modular way. (3) Modular Distribution Shifts.

$$\begin{aligned}
z^{\mathrm{fix}}_{s,t} &= f_s\big(\{z_{i,t-\tau} \mid z_{i,t-\tau} \in \mathrm{Pa}(z^{\mathrm{fix}}_{s,t})\},\ \epsilon_{s,t}\big),\\
z^{\mathrm{chg}}_{c,t} &= f_c\big(\{z_{i,t-\tau} \mid z_{i,t-\tau} \in \mathrm{Pa}(z^{\mathrm{chg}}_{c,t})\},\ \theta^{\mathrm{dyn}}_r,\ \epsilon_{c,t}\big),\\
z^{\mathrm{obs}}_{o,t} &= f_o\big(\theta^{\mathrm{obs}}_r,\ \epsilon_{o,t}\big),\\
x_t &= g(z_t).
\end{aligned} \tag{1}$$
The latent space has three blocks $z_t = (z^{\mathrm{fix}}_t, z^{\mathrm{chg}}_t, z^{\mathrm{obs}}_t)$, where $z^{\mathrm{fix}}_{s,t}$ is the $s$-th component of the fixed-dynamics part, $z^{\mathrm{chg}}_{c,t}$ is the $c$-th component of the changing-dynamics part, and $z^{\mathrm{obs}}_{o,t}$ is the $o$-th component of the observation changes. The functions $[f_s, f_c, f_o]$ capture the fixed transitions, the changing transitions, and the observation changes for each dimension of $z_t$ in Eq. 1.
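As a toy illustration of the three latent blocks in Eq. 1, the sketch below simulates $m$ domains, one latent per block. The functional forms and the per-domain change-factor values are our own assumptions for illustration, not quantities learned by TDRL.

```python
import numpy as np

rng = np.random.default_rng(1)
T, m = 200, 3
theta_dyn = np.array([0.2, 0.6, 1.0])   # hypothetical per-domain transition change factors
theta_obs = np.array([0.1, 0.5, 2.0])   # hypothetical per-domain observation change factors

def simulate_domain(r):
    z_fix, z_chg, z_obs = np.zeros(T), np.zeros(T), np.zeros(T)
    for t in range(1, T):
        e = rng.standard_normal(3)
        # fixed-dynamics block: the transition does not depend on the domain r
        z_fix[t] = 0.8 * np.tanh(z_fix[t - 1]) + e[0]
        # changing-dynamics block: theta_dyn[r] modulates the causal strength
        z_chg[t] = theta_dyn[r] * np.sin(z_fix[t - 1] + z_chg[t - 1]) + e[1]
        # observation block: a bijection of the noise, no time-delayed parents
        z_obs[t] = theta_obs[r] * e[2]
    return np.stack([z_fix, z_chg, z_obs], axis=1)

data = {r: simulate_domain(r) for r in range(m)}   # one (T, 3) latent trajectory per domain
```

Across domains, only the changing-dynamics and observation blocks shift in distribution; the fixed-dynamics block stays stationary, which is exactly the modular structure the framework exploits.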
3.2 Identifiability of Latent Causal Processes and Time-Delayed Latent Causal Relations
We define the identifiability of time-delayed latent causal processes in the representation function
space in
Definition 1
. Furthermore, if the estimated latent processes can be identified at least up
to permutation and component-wise invertible nonlinearities, the latent causal relations are also
immediately identifiable because conditional independence relations fully characterize time-delayed
causal relations in a time-delayed causally sufficient system, in which there are no latent causal
confounders in the (latent) causal processes. Note that invertible component-wise transformations on
latent causal processes do not change their conditional independence relations.
Definition 1 (Identifiable Latent Causal Processes). Formally, let $\{x_t\}_{t=1}^T$ be a sequence of observed variables generated by the true temporally causal latent processes specified by $(f_i, \theta_r, p_{\epsilon_i}, g)$ given in Eq. 1. A learned generative model $(\hat{f}_i, \hat{\theta}_r, \hat{p}_{\epsilon_i}, \hat{g})$ is observationally equivalent to $(f_i, \theta_r, p_{\epsilon_i}, g)$ if the model distribution $p_{\hat{f}_i, \hat{\theta}_r, \hat{p}_{\epsilon_i}, \hat{g}}(\{x_t\}_{t=1}^T)$ matches the data distribution $p_{f_i, \theta_r, p_{\epsilon_i}, g}(\{x_t\}_{t=1}^T)$ everywhere. We say the latent causal processes are identifiable if observational equivalence leads to identifiability of the latent variables up to a permutation $\pi$ and component-wise invertible transformations $T$:

$$p_{\hat{f}_i, \hat{\theta}_r, \hat{p}_{\epsilon_i}, \hat{g}}(\{x_t\}_{t=1}^T) = p_{f_i, \theta_r, p_{\epsilon_i}, g}(\{x_t\}_{t=1}^T) \;\Longrightarrow\; \hat{g}^{-1}(x_t) = (\pi \circ T)\big(g^{-1}(x_t)\big), \quad \forall x_t \in \mathcal{X}, \tag{2}$$

where $\mathcal{X}$ is the observation space.
4 Identifiability Theory
We establish the identifiability theory of nonparametric time-delayed latent causal processes under three different types of distribution shifts. W.l.o.g., we consider the latent processes with maximum time lag $L = 1$. The extensions to arbitrary time lags are discussed in Appendix S1.5. Let $k$ be the element index of the latent space $z_t$, and let the latent size be $n$. In particular, (1) under fixed temporal causal influences, we leverage the distribution changes of $p(z_{k,t} \mid z_{t-1})$ for different values of $z_{t-1}$; (2) when the underlying causal relations change over time, we exploit the changing causal influences on $p(z_{k,t} \mid z_{t-1}, u_r)$ under different domains $u_r$; and (3) under global observation changes, the nonstationarity of $p(z_{k,t} \mid u_r)$ under different values of $u_r$ is exploited. The proofs are provided in Appendix S1. Comparisons with existing theories are in Appendix S1.3.
4.1 Identifiability under Fixed Temporal Causal Influence
Let $\eta_{kt} \triangleq \log p(z_{k,t} \mid z_{t-1})$. Assume that $\eta_{kt}$ is twice differentiable in $z_{k,t}$ and is differentiable in $z_{l,t-1}$, $l = 1, 2, \ldots, n$. Note that the parents of $z_{k,t}$ may be only a subset of $z_{t-1}$; if $z_{l,t-1}$ is not a parent of $z_{k,t}$, then $\frac{\partial \eta_{kt}}{\partial z_{l,t-1}} = 0$. Below we provide a sufficient condition for the identifiability of $z_t$, followed by a discussion of specific unidentifiable and identifiable cases to illustrate how general it is.
Theorem 1 (Identifiability under a Fixed Temporal Causal Model). Suppose there exists an invertible function $\hat{g}$ that maps $x_t$ to $\hat{z}_t$, i.e.,

$$\hat{z}_t = \hat{g}(x_t) \tag{3}$$

such that the components of $\hat{z}_t$ are mutually independent conditional on $\hat{z}_{t-1}$. Let

$$v_{k,t} \triangleq \Big( \tfrac{\partial^2 \eta_{kt}}{\partial z_{k,t} \partial z_{1,t-1}}, \tfrac{\partial^2 \eta_{kt}}{\partial z_{k,t} \partial z_{2,t-1}}, \ldots, \tfrac{\partial^2 \eta_{kt}}{\partial z_{k,t} \partial z_{n,t-1}} \Big)^{\!\top}, \qquad \mathring{v}_{k,t} \triangleq \Big( \tfrac{\partial^3 \eta_{kt}}{\partial z_{k,t}^2 \partial z_{1,t-1}}, \tfrac{\partial^3 \eta_{kt}}{\partial z_{k,t}^2 \partial z_{2,t-1}}, \ldots, \tfrac{\partial^3 \eta_{kt}}{\partial z_{k,t}^2 \partial z_{n,t-1}} \Big)^{\!\top}. \tag{4}$$

If, for each value of $z_t$, the vectors $v_{1,t}, \mathring{v}_{1,t}, v_{2,t}, \mathring{v}_{2,t}, \ldots, v_{n,t}, \mathring{v}_{n,t}$, as $2n$ vector functions in $z_{1,t-1}, z_{2,t-1}, \ldots, z_{n,t-1}$, are linearly independent, then $z_t$ must be an invertible, component-wise transformation of a permuted version of $\hat{z}_t$.
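For a concretely specified transition model, the linear independence condition can be checked numerically. The sketch below does this for a heteroscedastic-noise transition $z_{k,t} = q_k(z_{t-1}) + \epsilon_{k,t}/b_k(z_{t-1})$ with standard Gaussian noise (our own choice of $q_k$ and $b_k$; an instance of an identifiable case, where $\eta_{kt} = \log b_k - \tfrac{1}{2} b_k^2 (z_{k,t} - q_k)^2 - \tfrac{1}{2}\log 2\pi$): it evaluates the $2n$ vector functions of Eq. 4 at many values of $z_{t-1}$ and checks that the stacked samples have full rank $2n$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2

def q(u):
    return np.array([np.sin(u[0]) + 0.5 * u[1], np.tanh(u[0] + u[1])])

def dq(u):  # dq[k, l] = d q_k / d u_l
    s = 1.0 / np.cosh(u[0] + u[1]) ** 2
    return np.array([[np.cos(u[0]), 0.5], [s, s]])

def b(u):   # positive noise-precision functions of the parents
    return np.exp(np.array([0.3 * u[0] + 0.1 * u[1], -0.1 * u[0] + 0.2 * u[1]]))

def db(u):  # db[k, l] = d b_k / d u_l
    bk = b(u)
    return np.array([[0.3 * bk[0], 0.1 * bk[0]], [-0.1 * bk[1], 0.2 * bk[1]]])

def v_vectors(w, u):
    """v_k and v_ring_k of Eq. 4, computed analytically at z_t = w, z_{t-1} = u."""
    bk, dbk, dqk = b(u), db(u), dq(u)
    resid = (w - q(u))[:, None]
    v = -2.0 * bk[:, None] * dbk * resid + (bk ** 2)[:, None] * dqk  # 2nd-order cross-derivs
    v_ring = -2.0 * bk[:, None] * dbk                                # 3rd-order cross-derivs
    return v, v_ring

w = np.array([0.4, -0.2])                  # a fixed value of z_t
samples = rng.normal(size=(50, n))         # evaluation points for z_{t-1}
rows = [[] for _ in range(2 * n)]
for u in samples:
    v, v_ring = v_vectors(w, u)
    for k in range(n):
        rows[2 * k].append(v[k])
        rows[2 * k + 1].append(v_ring[k])
M = np.array([np.concatenate(r) for r in rows])
rank = np.linalg.matrix_rank(M)            # full rank 2n <=> linear independence holds
```

For an additive-Gaussian transition ($b_k$ constant), `v_ring` would vanish identically and the rank would drop, matching the unidentifiable case N2 below.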
The linear independence condition in Theorem 1 is the core condition to guarantee the identifiability of $z_t$ from the observed $x_t$. To make this condition more intuitive, below we consider specific unidentifiable cases, in which there is no temporal dependence in $z_t$ or the noise terms in $z_t$ are additive Gaussian, and two identifiable cases, in which $z_t$ has additive, heterogeneous noise or follows some linear, non-Gaussian temporal process.
Let us start with two unidentifiable cases. In case N1, $z_t$ is an independent and identically distributed (i.i.d.) process, i.e., there is no causal influence from any component of $z_{t-1}$ to any $z_{k,t}$. In this case, $v_{k,t}$ and $\mathring{v}_{k,t}$ (defined in Eq. 4) are always $0$ for $k = 1, 2, \ldots, n$, since $p(z_{k,t} \mid z_{t-1})$ does not involve $z_{t-1}$. So the linear independence condition is violated. In fact, this is the regular nonlinear ICA problem with i.i.d. data, and it is well known that the underlying independent variables are not identifiable [5]. In case N2, all $z_{k,t}$ follow an additive noise model with Gaussian noise terms, i.e.,

$$z_t = q(z_{t-1}) + \epsilon_t, \tag{5}$$

where $q$ is a transformation and the components of the Gaussian vector $\epsilon_t$ are mutually independent and also independent from $z_{t-1}$. Then $\frac{\partial^2 \eta_{kt}}{\partial z_{k,t}^2}$ is constant, and $\frac{\partial^3 \eta_{kt}}{\partial z_{k,t}^2 \partial z_{l,t-1}} \equiv 0$, violating the linear independence condition. In the following proposition we give some alternative solutions and verify the unidentifiability in this case.
Proposition 1 (Unidentifiability under Gaussian Noise). Suppose $x_t = g(z_t)$ was generated by Eq. 5, where the components of $\epsilon_t$ are mutually independent Gaussian and also independent from $z_{t-1}$. Then any $\hat{z}_t = D_1 U D_2 \cdot z_t$, where $D_1$ is an arbitrary non-singular diagonal matrix, $U$ is an arbitrary orthogonal matrix, and $D_2$ is a diagonal matrix with $\mathrm{Var}^{-1/2}(\epsilon_{k,t})$ as its $k$-th diagonal entry, is a valid solution satisfying the condition that the components of $\hat{z}_t$ are mutually independent conditional on $\hat{z}_{t-1}$.
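The intuition is that $D_2$ whitens the Gaussian noise, any rotation $U$ preserves the independence of whitened Gaussian components, and $D_1$ rescales. A minimal numerical check of Proposition 1 (with arbitrary choices of the noise variances, $U$, and $D_1$): conditional on $z_{t-1}$, the covariance of $\hat{z}_t = D_1 U D_2 z_t$ equals $D_1 U D_2 \Sigma D_2 U^\top D_1 = D_1^2$, which is diagonal, so the Gaussian components of $\hat{z}_t$ are conditionally independent.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
sigma = np.array([0.5, 1.0, 2.0])             # noise std devs, Var(eps_k) = sigma_k^2

D2 = np.diag(1.0 / sigma)                     # Var^{-1/2}(eps_k) on the diagonal
U, _ = np.linalg.qr(rng.normal(size=(n, n)))  # a random orthogonal matrix
D1 = np.diag([2.0, 0.7, 1.3])                 # arbitrary non-singular diagonal matrix
M = D1 @ U @ D2

# Conditional on z_{t-1}, z_t = q(z_{t-1}) + eps_t, so
# Cov(zhat_t | z_{t-1}) = M @ diag(sigma^2) @ M.T, which collapses to D1 @ D1.
cond_cov = M @ np.diag(sigma ** 2) @ M.T
off_diag = cond_cov - np.diag(np.diag(cond_cov))
```

Because the off-diagonal entries vanish for every orthogonal $U$, there is a whole continuum of spurious solutions, which is exactly the unidentifiability the proposition states.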
Roughly speaking, for a randomly chosen conditional density function $p(z_{k,t} \mid z_{t-1})$ in which $z_{k,t}$ is not independent from $z_{t-1}$ (i.e., there is temporal dependence in the latent processes) and which does not follow an additive noise model with Gaussian noise, the chance for its specific second- and third-order partial derivatives to be linearly dependent is slim. Now let us consider two cases in which the latent temporal processes $z_t$ are naturally identifiable. First, consider case Y1, where $z_{k,t}$ follows a heterogeneous noise process, in which the noise variance depends on its parents:

$$z_{k,t} = q_k(z_{t-1}) + \frac{1}{b_k(z_{t-1})}\,\epsilon_{k,t}. \tag{6}$$

Here we assume $\epsilon_{k,t}$ is standard Gaussian and $\epsilon_{1,t}, \epsilon_{2,t}, \ldots, \epsilon_{n,t}$ are mutually independent and independent from $z_{t-1}$. $\frac{1}{b_k}$, which depends on $z_{t-1}$, is the standard deviation of the noise in $z_{k,t}$. (For