
Temporally Disentangled Representation Learning
Weiran Yao
CMU
weiran@cmu.edu
Guangyi Chen
CMU & MBZUAI
guangyichen1994@gmail.com
Kun Zhang
CMU & MBZUAI
kunz1@cmu.edu
Abstract
Recently in the field of unsupervised representation learning, strong identifiability results for disentanglement of causally-related latent variables have been established by exploiting certain side information, such as class labels, in addition to independence. However, most existing work is constrained by functional form assumptions, such as independent sources or linear transitions, and distributional assumptions, such as stationary, exponential-family distributions. It is unknown whether the underlying latent variables and their causal relations are identifiable if they have arbitrary, nonparametric causal influences in between. In this work, we establish the identifiability theory of nonparametric latent causal processes from their nonlinear mixtures under fixed temporal causal influences and analyze how distribution changes can further benefit the disentanglement. We propose TDRL, a principled framework to recover time-delayed latent causal variables and identify their relations from measured sequential data under stationary environments and under different distribution shifts. Specifically, the framework can factorize unknown distribution shifts into transition distribution changes under fixed and time-varying latent causal relations, and into global changes in observation. Through experiments, we show that time-delayed latent causal influences are reliably identified and that our approach considerably outperforms existing baselines that do not correctly exploit this modular representation of changes. Our code is available at: https://github.com/weirayao/tdrl.
1 Introduction
Causal reasoning for time-series data is a fundamental task in numerous fields [1, 2, 3]. Most existing work focuses on estimating the temporal causal relations among observed variables. However, in many real-world scenarios, the observed signals (e.g., image pixels in videos) do not have direct causal edges, but are generated by latent temporal processes or confounders that are causally related. Inspired by these scenarios, this work aims to uncover causally-related latent processes and their relations from observed temporal variables. Estimating latent causal structure from observations, which we assume are unknown (but invertible) nonlinear mixtures of the latent processes, is very challenging. It has been found in [4, 5] that without exploiting an appropriate class of assumptions in estimation, the latent variables are not identifiable in the most general case. As a result, one cannot make causal claims about the recovered relations in the latent space.
Recently, in the field of unsupervised representation learning, strong identifiability results for the latent variables have been established [6, 7, 8, 9, 10] by using certain side information in nonlinear Independent Component Analysis (ICA), such as class labels, in addition to independence. For time-series data, history information is widely used as the side information for the identifiability of latent processes. To establish identifiability, the existing approaches enforce different sets of functional and distributional form assumptions as constraints in estimation; for example, (1) PCL [7], GCL [8], HM-NLICA [11] and SlowVAE [12] assume mutually independent sources in the data-generating process. However, this assumption may severely distort the identifiability if the latent variables have time-delayed causal relations in between (i.e., a causally-related process); (2) SlowVAE [12] and SNICA [13] assume linear relations, which may distort the identifiability results if the underlying transitions are nonlinear; and (3) SlowVAE [12] assumes that the process noise is drawn from a Laplacian distribution, while i-VAE [9] assumes that the conditional transition distribution is part of the exponential family. However, in real-world scenarios, one cannot choose a proper set of functional and distributional form assumptions without knowing in advance the parametric forms of the latent temporal processes. Our first step is hence to understand under what conditions the latent causal processes are identifiable if they have nonparametric transitions in between. With the proposed condition, our approach allows recovery of latent temporal causally-related processes in stationary environments without knowing their parametric forms in advance.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.13647v1 [cs.LG] 24 Oct 2022

Table 1: Attributes of nonlinear ICA theories for time series. A check (✓) denotes that a method has an attribute or can be applied to a setting, whereas a cross (✗) denotes the opposite. TDRL is our approach.

Theory    | Time-varying Relation | Causally-related Process | Partitioned Subspace | Nonparametric Transition | Applicable to Stationary Environment
PCL       | ✗ | ✗ | ✗ | ✓ | ✓
GCL       | ✓ | ✗ | ✗ | ✓ | ✓
HM-NLICA  | ✗ | ✗ | ✗ | ✗ | ✗
SlowVAE   | ✗ | ✗ | ✗ | ✗ | ✓
SNICA     | ✓ | ✓ | ✗ | ✗ | ✗
i-VAE     | ✓ | ✗ | ✗ | ✗ | ✗
LEAP      | ✗ | ✓ | ✗ | ✓ | ✗
TDRL      | ✓ | ✓ | ✓ | ✓ | ✓
[Figure 1 here: three example video domains over t = 1, 2, ..., T with settings (Mass = 1, Gravity = 5, Noise = 0), (Mass = 1.5, Gravity = 10, Noise = 0), and (Mass = 2, Gravity = 15, Noise = 0.1), feeding observations $x_t$ into latent causal process estimation with time-delayed transitions over latents $\hat{z}_{i,t}$ and change factors $\hat{\theta}^{\mathrm{dyn}}_r$, $\hat{\theta}^{\mathrm{obs}}_r$.]

Figure 1: TDRL: Temporally Disentangled Representation Learning. We exploit fixed causal dynamics and distribution changes from changing causal influences and global observation changes to identify the underlying causal processes. $\hat{z}_{i,t}$ is the estimated latent process. $\hat{\theta}^{\mathrm{dyn}}_r$ is the change factor for transition dynamics, i.e., representing mass and gravity in this example. $\hat{\theta}^{\mathrm{obs}}_r$ is the change factor for observation, i.e., noise scale.
On the other hand, nonstationarity has greatly improved the identifiability results for learning the latent causal structure [14, 15, 16]. For instance, LEAP [14] established the identifiability of latent temporal processes, but in limited nonstationary cases, under the condition that the distribution of the noise terms of the latent processes varies across all segments. Our second step is to analyze how distribution shifts benefit our stationary condition and to extend our condition to a general nonstationary case. Accordingly, our approach enables the recovery of latent temporal causal processes in a general nonstationary environment with time-varying relations, such as changes in the influencing strength or switching some edges off [17] over time or domains.

Given the identifiability results, we propose a learning framework, called TDRL, to recover nonparametric time-delayed latent causal variables and identify their relations from measured temporal data under stationary environments and under nonstationary environments in which it is unknown in advance how the joint distribution changes across domains (we define this as "unknown distribution shifts"). For instance, Fig. 1 shows an example of multiple video domains of a physical system under different mass, gravity, and environment rendering settings.¹ With TDRL, the differences across segments are characterized by the learned change factors $\hat{\theta}^{\mathrm{dyn}}_r$ of domain $r$ (note that the domain index is given to the model), which encode changes in transition dynamics, and by the changes in observation or styles modeled by $\hat{\theta}^{\mathrm{obs}}_r$ (we use "causal dynamics" and "latent causal relations/influences" interchangeably). We then present a generalized time-series data generative model that takes these change factors as arguments for modeling the distribution changes. Specifically, we factorize unknown distribution shifts into transition distribution changes in stationary processes, time-varying latent causal relations, and global changes in observation by constructing partitioned latent subspaces, and propose provable conditions under which nonparametric latent causal processes can be identified from their nonlinear invertible mixtures. We demonstrate through a number of real-world datasets, including video and motion capture data, that time-delayed latent causal influences are reliably identified from observed variables under stationary environments and unknown distribution shifts. Through experiments, we show that our approach considerably outperforms existing baselines that do not correctly leverage this modular representation of changes.

¹ The variables and functions with "hat" are estimated by the model; the ones without "hat" are ground truth.
2 Related Work

Causal Discovery from Time Series. Inferring the causal structure from time-series data is critical to many fields including machine learning [1], econometrics [2], and neuroscience [3]. Most existing work focuses on estimating the temporal causal relations between observed variables. For this task, constraint-based methods [18] apply conditional independence tests to recover the causal structure, while score-based methods [19, 20] define score functions to guide a search process. Furthermore, [21, 22] propose to fuse both conditional independence tests and score-based methods. Granger causality [23] and its nonlinear variations [24, 25] are also widely used.
Nonlinear ICA for Time Series. Temporal structure and nonstationarities were recently used to achieve identifiability in nonlinear ICA. Time-contrastive learning (TCL [6]) used the independent-sources assumption and leveraged sufficient variability in the variance terms of different data segments. Permutation-based contrastive learning (PCL [7]) proposed a learning framework which discriminates between true independent sources and permuted ones, and is identifiable under the uniformly dependent assumption. HM-NLICA [11] combined nonlinear ICA with a Hidden Markov Model (HMM) to automatically model nonstationarity without manual data segmentation. i-VAE [9] introduced VAEs to approximate the true joint distribution over observed and auxiliary nonstationary regimes; it assumes that the conditional distribution is within the exponential family to achieve identifiability of the latent space. The most recent literature on nonlinear ICA for time series includes LEAP [14] and (i-)CITRIS [26, 27]. LEAP proposed a nonparametric condition leveraging nonstationary noise terms. However, all latent processes are assumed to change across contexts, the distribution changes need to be modeled by nonstationary noise, and it does not exploit the stationary nonparametric components for identifiability. Alternatively, CITRIS proposed to use intervention target information for identification of scalar and multidimensional latent causal factors. This approach does not suffer from functional or distributional form constraints, but needs access to active interventions.
3 Problem Formulation
3.1 Time Series Generative Model
Stationary Model. As a fundamental case, we first present a regular, stationary time-series generative process in which the observations $x_t$ come from a nonlinear (but invertible) mixing function $g$ that maps the time-delayed causally-related latent variables $z_t$ to $x_t$. The latent variables or processes $z_t$ have stationary, nonparametric time-delayed causal relations. Let $\tau$ be the time lag:

$$\underbrace{x_t = g(z_t)}_{\text{Nonlinear mixing}}, \qquad \underbrace{z_{i,t} = f_i\big(\{z_{j,t-\tau} \mid z_{j,t-\tau} \in \mathrm{Pa}(z_{i,t})\},\ \epsilon_{i,t}\big)}_{\text{Stationary nonparametric transition}} \ \ \text{with} \ \ \underbrace{\epsilon_{i,t} \sim p_{\epsilon_i}}_{\text{Stationary noise}}.$$

Note that with nonparametric causal transitions, the noise term $\epsilon_{i,t} \sim p_{\epsilon_i}$ (where $p_{\epsilon_i}$ denotes the distribution of $\epsilon_{i,t}$) and the time-delayed parents $\mathrm{Pa}(z_{i,t})$ of $z_{i,t}$ (i.e., the set of latent factors that directly cause $z_{i,t}$) interact and are transformed in an arbitrarily nonlinear way to generate $z_{i,t}$. Under the stationarity assumptions, the mixing function $g$, the transition functions $f_i$, and the noise distributions $p_{\epsilon_i}$ are invariant. Finally, we assume that the noise terms are mutually independent (i.e., spatially and temporally independent), which implies that instantaneous causal influence between latent causal processes is not allowed by the formulation. The stationary time-series model in the fundamental case is used to establish the identifiability results under fixed causal dynamics in Section 4.1.
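To make the data-generating process concrete, the following sketch simulates a two-dimensional instance of this model. The particular transition functions and mixing map are our own illustrative choices (the theory only requires that $g$ be invertible and that the noise terms be mutually independent); note the noise enters the transitions non-additively, as the nonparametric formulation allows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 2, 500

# Illustrative nonparametric transitions f_i: noise and parents interact nonlinearly.
def f1(z_prev, e):
    return np.sin(z_prev[0]) + 0.5 * z_prev[1] * e   # noise modulated by a parent

def f2(z_prev, e):
    return np.tanh(z_prev[0] + z_prev[1]) + e ** 3   # non-Gaussian effective noise

# Invertible nonlinear mixing g: componentwise tanh followed by a full-rank matrix.
A = np.array([[1.0, 0.7], [0.3, 1.2]])
def g(z):
    return A @ np.tanh(z)

z = np.zeros((T, n))
x = np.zeros((T, n))
for t in range(1, T):
    eps = rng.standard_normal(n)   # mutually independent stationary noise
    z[t, 0] = f1(z[t - 1], eps[0])
    z[t, 1] = f2(z[t - 1], eps[1])
    x[t] = g(z[t])                 # only x is observed; z is latent
```

The identifiability question of Section 4 is then: given only `x`, under what conditions can `z` be recovered up to permutation and component-wise invertible transformations?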
Nonstationary Model. We further consider two violations of the stationarity assumptions in the fundamental case, which lead to two nonstationary time-series models. Let $u$ denote the domain or regime index. Suppose there exist $m$ regimes of data, i.e., $u_r$ with $r = 1, 2, \ldots, m$, with unknown distribution shifts. In practice, the changing parameters of the joint distribution across domains often lie in a low-dimensional manifold [28]. Moreover, if the distribution is causally factorized, the distributions often change in a minimal and sparse way [29]. Based on these assumptions, we introduce the low-dimensional minimal change factor $(\theta^{\mathrm{dyn}}_r, \theta^{\mathrm{obs}}_r)$, which was proposed in [30], to respectively capture distribution shifts in the transition functions and in the observation. The vector $\theta_r = (\theta^{\mathrm{dyn}}_r, \theta^{\mathrm{obs}}_r)$ has a constant value in each domain but varies across domains. The formulation of the nonstationary time-series model is in line with [30]. The nonstationary model is used to establish the identifiability results under nonstationary cases in Section 4.2, where we show that the violation of stationarity in both ways can even further improve the identifiability results. We first present the two nonstationary cases. (1) Changing Causal Dynamics. The causal influences between the latent temporal processes change across domains in this setting. We model this by adding the transition change factor $\theta^{\mathrm{dyn}}_r$ as an input to the transition function: $z_{i,t} = f_i\big(\{z_{j,t-\tau} \mid z_{j,t-\tau} \in \mathrm{Pa}(z_{i,t})\},\ \theta^{\mathrm{dyn}}_r,\ \epsilon_{i,t}\big)$. (2) Global Observation Changes. The global properties of the time series (e.g., video styles) change across domains in this setting. Our model captures them using latent variables that represent global styles; these latent variables are generated by a bijection $f_i$ that transforms the noise terms $\epsilon_{i,t}$ into the latent variables with change factor $\theta^{\mathrm{obs}}_r$: $z_{i,t} = f_i\big(\theta^{\mathrm{obs}}_r,\ \epsilon_{i,t}\big)$. Finally, we can deal with a more general nonstationary case by combining the three types of latent processes in the latent space in a modular way. (3) Modular Distribution Shifts.

$$\begin{aligned}
z^{\mathrm{fix}}_{s,t} &= f_s\big(\{z_{i,t-\tau} \mid z_{i,t-\tau} \in \mathrm{Pa}(z^{\mathrm{fix}}_{s,t})\},\ \epsilon_{s,t}\big),\\
z^{\mathrm{chg}}_{c,t} &= f_c\big(\{z_{i,t-\tau} \mid z_{i,t-\tau} \in \mathrm{Pa}(z^{\mathrm{chg}}_{c,t})\},\ \theta^{\mathrm{dyn}}_r,\ \epsilon_{c,t}\big),\\
z^{\mathrm{obs}}_{o,t} &= f_o\big(\theta^{\mathrm{obs}}_r,\ \epsilon_{o,t}\big),\\
x_t &= g(z_t).
\end{aligned} \tag{1}$$
The latent space has three blocks $z_t = (z^{\mathrm{fix}}_t, z^{\mathrm{chg}}_t, z^{\mathrm{obs}}_t)$, where $z^{\mathrm{fix}}_{s,t}$ is the $s$-th component of the fixed-dynamics part, $z^{\mathrm{chg}}_{c,t}$ is the $c$-th component of the changing-dynamics part, and $z^{\mathrm{obs}}_{o,t}$ is the $o$-th component of the observation changes. The functions $[f_s, f_c, f_o]$ capture the fixed transitions, the changing transitions, and the observation changes for each dimension of $z_t$ in Eq. 1.
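As a toy illustration of the three latent blocks in Eq. 1, the sketch below simulates $m$ domains, one latent per block. The functional forms and the per-domain change-factor values are our own assumptions for illustration, not quantities learned by TDRL.

```python
import numpy as np

rng = np.random.default_rng(1)
T, m = 200, 3
theta_dyn = np.array([0.2, 0.6, 1.0])   # hypothetical per-domain transition change factors
theta_obs = np.array([0.1, 0.5, 2.0])   # hypothetical per-domain observation change factors

def simulate_domain(r):
    z_fix, z_chg, z_obs = np.zeros(T), np.zeros(T), np.zeros(T)
    for t in range(1, T):
        e = rng.standard_normal(3)
        # fixed-dynamics block: the transition does not depend on the domain r
        z_fix[t] = 0.8 * np.tanh(z_fix[t - 1]) + e[0]
        # changing-dynamics block: theta_dyn[r] modulates the causal strength
        z_chg[t] = theta_dyn[r] * np.sin(z_fix[t - 1] + z_chg[t - 1]) + e[1]
        # observation block: a bijection of the noise, no time-delayed parents
        z_obs[t] = theta_obs[r] * e[2]
    return np.stack([z_fix, z_chg, z_obs], axis=1)

data = {r: simulate_domain(r) for r in range(m)}   # one (T, 3) latent trajectory per domain
```

Across domains, only the changing-dynamics and observation blocks shift in distribution; the fixed-dynamics block stays stationary, which is exactly the modular structure the framework exploits.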
3.2 Identifiability of Latent Causal Processes and Time-Delayed Latent Causal Relations
We define the identifiability of time-delayed latent causal processes in the representation function
space in
Definition 1
. Furthermore, if the estimated latent processes can be identified at least up
to permutation and component-wise invertible nonlinearities, the latent causal relations are also
immediately identifiable because conditional independence relations fully characterize time-delayed
causal relations in a time-delayed causally sufficient system, in which there are no latent causal
confounders in the (latent) causal processes. Note that invertible component-wise transformations on
latent causal processes do not change their conditional independence relations.
Definition 1 (Identifiable Latent Causal Processes). Formally, let $\{x_t\}_{t=1}^T$ be a sequence of observed variables generated by the true temporally causal latent processes specified by $(f_i, \theta_r, p_{\epsilon_i}, g)$ given in Eq. 1. A learned generative model $(\hat{f}_i, \hat{\theta}_r, \hat{p}_{\epsilon_i}, \hat{g})$ is observationally equivalent to $(f_i, \theta_r, p_{\epsilon_i}, g)$ if the model distribution $p_{\hat{f}_i, \hat{\theta}_r, \hat{p}_{\epsilon_i}, \hat{g}}(\{x_t\}_{t=1}^T)$ matches the data distribution $p_{f_i, \theta_r, p_{\epsilon_i}, g}(\{x_t\}_{t=1}^T)$ everywhere. We say the latent causal processes are identifiable if observational equivalence leads to identifiability of the latent variables up to a permutation $\pi$ and component-wise invertible transformations $T$:

$$p_{\hat{f}_i, \hat{\theta}_r, \hat{p}_{\epsilon_i}, \hat{g}}(\{x_t\}_{t=1}^T) = p_{f_i, \theta_r, p_{\epsilon_i}, g}(\{x_t\}_{t=1}^T) \;\Longrightarrow\; \hat{g}^{-1}(x_t) = (\pi \circ T)\big(g^{-1}(x_t)\big), \quad \forall x_t \in \mathcal{X}, \tag{2}$$

where $\mathcal{X}$ is the observation space.
4 Identifiability Theory
We establish the identifiability theory of nonparametric time-delayed latent causal processes under three different types of distribution shifts. W.l.o.g., we consider the latent processes with maximum time lag $L = 1$. The extensions to arbitrary time lags are discussed in Appendix S1.5. Let $k$ be the element index of the latent space $z_t$, and let the latent size be $n$. In particular, (1) under fixed temporal causal influences, we leverage the distribution changes of $p(z_{k,t} \mid z_{t-1})$ for different values of $z_{t-1}$; (2) when the underlying causal relations change over time, we exploit the changing causal influences on $p(z_{k,t} \mid z_{t-1}, u_r)$ under different domains $u_r$; and (3) under global observation changes, the nonstationarity of $p(z_{k,t} \mid u_r)$ under different values of $u_r$ is exploited. The proofs are provided in Appendix S1. Comparisons with existing theories are in Appendix S1.3.
4.1 Identifiability under Fixed Temporal Causal Influence
Let $\eta_{kt} \triangleq \log p(z_{k,t} \mid z_{t-1})$. Assume that $\eta_{kt}$ is twice differentiable in $z_{k,t}$ and is differentiable in $z_{l,t-1}$, $l = 1, 2, \ldots, n$. Note that the parents of $z_{k,t}$ may be only a subset of $z_{t-1}$; if $z_{l,t-1}$ is not a parent of $z_{k,t}$, then $\frac{\partial \eta_{kt}}{\partial z_{l,t-1}} = 0$. Below we provide a sufficient condition for the identifiability of $z_t$, followed by a discussion of specific unidentifiable and identifiable cases to illustrate how general it is.
Theorem 1 (Identifiability under a Fixed Temporal Causal Model). Suppose there exists an invertible function $\hat{g}$ that maps $x_t$ to $\hat{z}_t$, i.e.,

$$\hat{z}_t = \hat{g}(x_t) \tag{3}$$

such that the components of $\hat{z}_t$ are mutually independent conditional on $\hat{z}_{t-1}$. Let

$$v_{k,t} \triangleq \Big( \tfrac{\partial^2 \eta_{kt}}{\partial z_{k,t} \partial z_{1,t-1}}, \tfrac{\partial^2 \eta_{kt}}{\partial z_{k,t} \partial z_{2,t-1}}, \ldots, \tfrac{\partial^2 \eta_{kt}}{\partial z_{k,t} \partial z_{n,t-1}} \Big)^{\!\top}, \qquad \mathring{v}_{k,t} \triangleq \Big( \tfrac{\partial^3 \eta_{kt}}{\partial z_{k,t}^2 \partial z_{1,t-1}}, \tfrac{\partial^3 \eta_{kt}}{\partial z_{k,t}^2 \partial z_{2,t-1}}, \ldots, \tfrac{\partial^3 \eta_{kt}}{\partial z_{k,t}^2 \partial z_{n,t-1}} \Big)^{\!\top}. \tag{4}$$

If, for each value of $z_t$, the vectors $v_{1,t}, \mathring{v}_{1,t}, v_{2,t}, \mathring{v}_{2,t}, \ldots, v_{n,t}, \mathring{v}_{n,t}$, as $2n$ vector functions in $z_{1,t-1}, z_{2,t-1}, \ldots, z_{n,t-1}$, are linearly independent, then $z_t$ must be an invertible, component-wise transformation of a permuted version of $\hat{z}_t$.
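For a concretely specified transition model, the linear independence condition can be checked numerically. The sketch below does this for a heteroscedastic-noise transition $z_{k,t} = q_k(z_{t-1}) + \epsilon_{k,t}/b_k(z_{t-1})$ with standard Gaussian noise (our own choice of $q_k$ and $b_k$; an instance of an identifiable case, where $\eta_{kt} = \log b_k - \tfrac{1}{2} b_k^2 (z_{k,t} - q_k)^2 - \tfrac{1}{2}\log 2\pi$): it evaluates the $2n$ vector functions of Eq. 4 at many values of $z_{t-1}$ and checks that the stacked samples have full rank $2n$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2

def q(u):
    return np.array([np.sin(u[0]) + 0.5 * u[1], np.tanh(u[0] + u[1])])

def dq(u):  # dq[k, l] = d q_k / d u_l
    s = 1.0 / np.cosh(u[0] + u[1]) ** 2
    return np.array([[np.cos(u[0]), 0.5], [s, s]])

def b(u):   # positive noise-precision functions of the parents
    return np.exp(np.array([0.3 * u[0] + 0.1 * u[1], -0.1 * u[0] + 0.2 * u[1]]))

def db(u):  # db[k, l] = d b_k / d u_l
    bk = b(u)
    return np.array([[0.3 * bk[0], 0.1 * bk[0]], [-0.1 * bk[1], 0.2 * bk[1]]])

def v_vectors(w, u):
    """v_k and v_ring_k of Eq. 4, computed analytically at z_t = w, z_{t-1} = u."""
    bk, dbk, dqk = b(u), db(u), dq(u)
    resid = (w - q(u))[:, None]
    v = -2.0 * bk[:, None] * dbk * resid + (bk ** 2)[:, None] * dqk  # 2nd-order cross-derivs
    v_ring = -2.0 * bk[:, None] * dbk                                # 3rd-order cross-derivs
    return v, v_ring

w = np.array([0.4, -0.2])                  # a fixed value of z_t
samples = rng.normal(size=(50, n))         # evaluation points for z_{t-1}
rows = [[] for _ in range(2 * n)]
for u in samples:
    v, v_ring = v_vectors(w, u)
    for k in range(n):
        rows[2 * k].append(v[k])
        rows[2 * k + 1].append(v_ring[k])
M = np.array([np.concatenate(r) for r in rows])
rank = np.linalg.matrix_rank(M)            # full rank 2n <=> linear independence holds
```

For an additive-Gaussian transition ($b_k$ constant), `v_ring` would vanish identically and the rank would drop, matching the unidentifiable case N2 below.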
The linear independence condition in Theorem 1 is the core condition to guarantee the identifiability of $z_t$ from the observed $x_t$. To make this condition more intuitive, below we consider specific unidentifiable cases, in which there is no temporal dependence in $z_t$ or the noise terms in $z_t$ are additive Gaussian, and two identifiable cases, in which $z_t$ has additive, heterogeneous noise or follows some linear, non-Gaussian temporal process.
Let us start with two unidentifiable cases. In case N1, $z_t$ is an independent and identically distributed (i.i.d.) process, i.e., there is no causal influence from any component of $z_{t-1}$ to any $z_{k,t}$. In this case, $v_{k,t}$ and $\mathring{v}_{k,t}$ (defined in Eq. 4) are always $0$ for $k = 1, 2, \ldots, n$, since $p(z_{k,t} \mid z_{t-1})$ does not involve $z_{t-1}$. So the linear independence condition is violated. In fact, this is the regular nonlinear ICA problem with i.i.d. data, and it is well known that the underlying independent variables are not identifiable [5]. In case N2, all $z_{k,t}$ follow an additive noise model with Gaussian noise terms, i.e.,

$$z_t = q(z_{t-1}) + \epsilon_t, \tag{5}$$

where $q$ is a transformation and the components of the Gaussian vector $\epsilon_t$ are mutually independent and also independent from $z_{t-1}$. Then $\frac{\partial^2 \eta_{kt}}{\partial z_{k,t}^2}$ is constant, and $\frac{\partial^3 \eta_{kt}}{\partial z_{k,t}^2 \partial z_{l,t-1}} \equiv 0$, violating the linear independence condition. In the following proposition we give some alternative solutions and verify the unidentifiability in this case.
Proposition 1 (Unidentifiability under Gaussian Noise). Suppose $x_t = g(z_t)$ was generated by Eq. 5, where the components of $\epsilon_t$ are mutually independent Gaussian and also independent from $z_{t-1}$. Then any $\hat{z}_t = D_1 U D_2 \cdot z_t$, where $D_1$ is an arbitrary non-singular diagonal matrix, $U$ is an arbitrary orthogonal matrix, and $D_2$ is a diagonal matrix with $\mathrm{Var}^{-1/2}(\epsilon_{k,t})$ as its $k$-th diagonal entry, is a valid solution satisfying the condition that the components of $\hat{z}_t$ are mutually independent conditional on $\hat{z}_{t-1}$.
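The intuition is that $D_2$ whitens the Gaussian noise, any rotation $U$ preserves the independence of whitened Gaussian components, and $D_1$ rescales. A minimal numerical check of Proposition 1 (with arbitrary choices of the noise variances, $U$, and $D_1$): conditional on $z_{t-1}$, the covariance of $\hat{z}_t = D_1 U D_2 z_t$ equals $D_1 U D_2 \Sigma D_2 U^\top D_1 = D_1^2$, which is diagonal, so the Gaussian components of $\hat{z}_t$ are conditionally independent.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
sigma = np.array([0.5, 1.0, 2.0])             # noise std devs, Var(eps_k) = sigma_k^2

D2 = np.diag(1.0 / sigma)                     # Var^{-1/2}(eps_k) on the diagonal
U, _ = np.linalg.qr(rng.normal(size=(n, n)))  # a random orthogonal matrix
D1 = np.diag([2.0, 0.7, 1.3])                 # arbitrary non-singular diagonal matrix
M = D1 @ U @ D2

# Conditional on z_{t-1}, z_t = q(z_{t-1}) + eps_t, so
# Cov(zhat_t | z_{t-1}) = M @ diag(sigma^2) @ M.T, which collapses to D1 @ D1.
cond_cov = M @ np.diag(sigma ** 2) @ M.T
off_diag = cond_cov - np.diag(np.diag(cond_cov))
```

Because the off-diagonal entries vanish for every orthogonal $U$, there is a whole continuum of spurious solutions, which is exactly the unidentifiability the proposition states.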
Roughly speaking, for a randomly chosen conditional density function $p(z_{k,t} \mid z_{t-1})$ in which $z_{k,t}$ is not independent from $z_{t-1}$ (i.e., there is temporal dependence in the latent processes) and which does not follow an additive noise model with Gaussian noise, the chance for its specific second- and third-order partial derivatives to be linearly dependent is slim. Now let us consider two cases in which the latent temporal processes $z_t$ are naturally identifiable. First, consider case Y1, where $z_{k,t}$ follows a heterogeneous noise process, in which the noise variance depends on its parents:

$$z_{k,t} = q_k(z_{t-1}) + \frac{1}{b_k(z_{t-1})}\,\epsilon_{k,t}. \tag{6}$$

Here we assume $\epsilon_{k,t}$ is standard Gaussian and $\epsilon_{1,t}, \epsilon_{2,t}, \ldots, \epsilon_{n,t}$ are mutually independent and independent from $z_{t-1}$. $\frac{1}{b_k}$, which depends on $z_{t-1}$, is the standard deviation of the noise in $z_{k,t}$. (For