
Non-Parametric and Regularized Dynamical
Wasserstein Barycenters for Sequential Observations
Kevin C. Cheng∗, IEEE Student Member, Eric L. Miller∗, IEEE Fellow,
Michael C. Hughes†, Shuchin Aeron∗, IEEE Senior Member
Abstract
We consider probabilistic models for sequential observations which exhibit gradual transitions among a finite number of states.
We are particularly motivated by applications such as human activity analysis where observed accelerometer time series contains
segments representing distinct activities, which we call pure states, as well as periods characterized by continuous transition
among these pure states. To capture this transitory behavior, the dynamical Wasserstein barycenter (DWB) model of [1] associates
with each pure state a data-generating distribution and models the continuous transitions among these states as a Wasserstein
barycenter of these distributions with dynamically evolving weights. Focusing on the univariate case where Wasserstein distances
and barycenters can be computed in closed form, we extend [1], specifically by relaxing the parameterization of the pure states as
Gaussian distributions. We highlight issues related to the uniqueness in identifying the model parameters as well as uncertainties
induced when estimating a dynamically evolving distribution from a limited number of samples. To ameliorate non-uniqueness,
we introduce regularization that imposes temporal smoothness on the dynamics of the barycentric weights. A quantile-based
approximation of the pure state distributions yields a finite dimensional estimation problem which we numerically solve using
cyclic descent alternating between updates to the pure-state quantile functions and the barycentric weights. We demonstrate the
utility of the proposed algorithm in segmenting both simulated and real world human activity time series.
Index Terms
Wasserstein barycenter, displacement interpolation, dynamical model, sequential data, time series analysis, sliding window,
non-parametric, quantile function, human activity analysis.
I. INTRODUCTION
We consider a probabilistic model for sequentially observed data where the observation at each point in time depends on a
dynamically evolving latent state. We are particularly motivated by systems that continuously move among a set of canonical
behaviors, which we call pure states. Over some periods, the system may reside entirely in one of the pure states while over
other periods, the system is transitioning among these pure states in a temporally smooth manner. There are many applications
where such a model is appropriate including climate modeling [2], sleep analysis [3], simulating physical systems [4], as well
as characterizing human activity from video [5] or wearable-derived accelerometry [6] data. Using the last case as an example,
there will be periods when the individual will be engaged in a well-defined activity such as standing or running. During these
intervals, the data can be modeled as drawn from a probability distribution specific to that canonical state. Given the high
sampling rates of modern sensors, there also may be intervals where multiple consecutive observations reflect the gradual
transition between or among pure states. Over these periods the distribution of the data is given by a suitable combination of
the pure state distributions. Therefore, one possible model for these types of systems consists of three components: a set of
distributions containing the data-generating distribution for each pure state, a continuously evolving latent state which captures
the transition dynamics of the system as it moves among these pure states, and a means of interpolating among these pure
state distributions to characterize the data distribution in the transition regions.
These types of systems pose some unique considerations that are not sufficiently addressed by prior work in time series
modeling. The two most common methods for modeling latent state systems are continuous and discrete state-space models.
Continuous state-space models [7], [8], [9] have no natural way to identify those pure states in which the system may persist
for periods of time. In discrete state-space models such as hidden Markov models [10], [11], [12], the dynamics are captured
by a temporally varying state vector whose elements represent the probability that the system resides in each of a countable
number of discrete (or in our terminology, pure) states. For these models, the data-generating distribution associated with this
latent state vector is given as a convex combination, i.e., a linear mixture, of the pure state distributions. As argued in [1], this is
also an insufficient model for the problems which interest us. As an example, for modeling human activity, the data distribution
illustrated in Fig. 1 produced by a convex combination of the underlying “standing” and “running” pure state distributions can
be interpreted as “sometimes standing” and “sometimes running,” which is not a proper description of the gradual transition
that actually occurs.
∗ Tufts University, Dept. of Electrical and Computer Engineering
† Tufts University, Dept. of Computer Science
This research was sponsored by the U.S. Army DEVCOM Soldier Center under the Measuring and Advancing Soldier Tactical Readiness and Effectiveness
program and Cooperative Agreement Number W911QY-19-2-0003. We also acknowledge support from the U.S. National Science Foundation under award
HDR-1934553 for the Tufts T-TRIPODS Institute. Shuchin Aeron is supported in part by NSF CCF:1553075, NSF RAISE 1931978, NSF ERC planning
1937057, and AFOSR FA9550-18-1-0465. Michael C. Hughes is supported in part by NSF IIS-1908617. Eric L. Miller is supported in part by NSF grants
1934553, 1935555, 1931978, and 1937057.
Code repository: https://github.com/kevin-c-cheng/DWB_Nonparametric
©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including
reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or
reuse of any copyrighted component of this work in other works. DOI: 10.1109/TSP.2023.3303616
arXiv:2210.01918v3 [cs.LG] 21 Sep 2023
Fig. 1: Comparison of convex combination vs Wasserstein barycenter for modeling human activity transitions. The
beep test (BT) dataset consists of a subject running back-and-forth between two points, stopping at each point (see Sec. V-B
for more details). (a) The probability distribution functions (PDFs) of the vertical acceleration of the system’s two pure states
(stand, run) are estimated using a KDE with a Gaussian kernel whose mean corresponds to the observed data when the system
resides in these pure states. Modeling a transition from stand to run via the time-varying weights (b) for t = 1, ..., 5, we show
the resulting data distributions during this transition region according to a convex combination (c) and Wasserstein barycenter
(d) interpolation model.
A better model for interpolating the data distribution during the transition period between standing and running would
smoothly shift probability mass between the two pure state distributions. As illustrated in Fig. 1, such blending can be
achieved through displacement interpolation [13] between two pure states or, for more than two pure states, as a Wasserstein
barycenter of the associated distributions [14]. Using this perspective, Dynamical Wasserstein Barycenters (DWB) [1] were
recently proposed to model a dynamically evolving distribution as a sequence of Wasserstein barycenters constructed as a
time-varying convex combination of the pure state distributions. The dynamical weights, which lie on the probability simplex,
are taken to be the latent state of the model. A Bayesian model is proposed in [1], whose parameters were determined via
maximum a posteriori estimation.
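The contrast between the two interpolation schemes can be sketched numerically. The example below is a minimal illustration, assuming two univariate Gaussian pure states with made-up means and standard deviations (they are not taken from the BT dataset). It relies on the fact that, for univariate Gaussians, averaging quantile functions yields a Gaussian whose mean and standard deviation are the weighted averages of those of the pure states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pure states (illustrative values only): "stand" and "run".
mean_stand, std_stand = 0.0, 0.1
mean_run, std_run = 3.0, 0.1
w = 0.5        # weight on "run"; 1 - w is the weight on "stand"
n = 20000

# Convex combination (mixture): every sample comes from one pure state or the other.
from_run = rng.random(n) < w
mixture = np.where(from_run,
                   rng.normal(mean_run, std_run, n),
                   rng.normal(mean_stand, std_stand, n))

# Wasserstein barycenter: quantile functions are averaged, so for Gaussians the
# result is Gaussian with interpolated mean and standard deviation.
bary_mean = (1 - w) * mean_stand + w * mean_run
bary_std = (1 - w) * std_stand + w * std_run
barycenter = rng.normal(bary_mean, bary_std, n)

def fraction_near_midpoint(samples, radius=0.5):
    """Fraction of samples within `radius` of the interpolated mean."""
    return float(np.mean(np.abs(samples - bary_mean) < radius))

frac_mixture = fraction_near_midpoint(mixture)
frac_barycenter = fraction_near_midpoint(barycenter)
```

Under these assumed values, almost no mixture samples fall near the midpoint between the two states, whereas nearly all barycenter samples do, which mirrors the "sometimes standing, sometimes running" versus "gradually shifting" distinction.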
Here we expand on [1] by highlighting two challenging characteristics of the DWB model and two improvements that address
certain limitations of the original DWB approach. The first characteristic relates to uniqueness in DWB model identification,
where multiple combinations of pure state distributions and barycentric weights can produce the same Wasserstein barycenter.
Although this is true for multidimensional distributions, here we use a univariate formulation to more transparently demonstrate
how this non-uniqueness is captured in an inverse-scaling relationship between the model’s latent state and pure state parameters.
The second characteristic is related to a tradeoff in tracking and estimating an evolving data distribution from a single instance
of a time series using a windowed approach to collect samples. Smaller windows lack the number of samples to ensure small
statistical error in the estimation of the data distribution at a given point in time. On the other hand, larger windows span
longer periods during which, under relatively fast dynamics, the data distribution can change significantly, again increasing
the estimation error. In a simulated example, where the dynamics consist of constant rate transitions between two Gaussian
states, we show that there exists an optimal window size that balances these two effects and discuss the dependency of this
window size tradeoff on the temporal dynamics of the latent state and pure states of the system.
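The window size tradeoff can be reproduced in a small simulation. The following is a rough sketch under assumed settings (pure states N(0, 1) and N(10, 1), a linear weight ramp over 1001 time steps, and a 100-point quantile grid); it is not the experiment reported in the paper. The squared Wasserstein-2 error is approximated by the mean squared difference between empirical and true quantiles at the window center.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Illustrative setup (assumed values): pure states N(0, 1) and N(10, 1) with a
# constant-rate transition, so the barycenter at time t is N(10 * w_t, 1).
T = 1001
weights = np.linspace(0.0, 1.0, T)
means = 10.0 * weights
center = T // 2

# Quantile grid and the true quantile function at the window center.
xi = (np.arange(100) + 0.5) / 100
z = np.array([NormalDist().inv_cdf(p) for p in xi])
true_quantiles = means[center] + z

def mean_sq_w2_error(window_size, trials=50):
    """Average approximate squared W2 error of the windowed empirical distribution."""
    half = window_size // 2
    errors = []
    for _ in range(trials):
        series = rng.normal(means, 1.0)                  # one realization of the series
        window = series[center - half:center + half + 1]
        est_quantiles = np.quantile(window, xi)          # empirical quantile function
        errors.append(np.mean((est_quantiles - true_quantiles) ** 2))
    return float(np.mean(errors))

errors = {size: mean_sq_w2_error(size) for size in (5, 101, 1001)}
```

In this setting the intermediate window should give the smallest error: the 5-sample window suffers from statistical error, while the 1001-sample window averages over the entire transition.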
Our first improvement addresses a limitation of [1]: its use of a probabilistic prior for the dynamics of the
DWB weight vector. That approach may introduce additional unnecessary or potentially undesirable probabilistic properties on
the latent state process such as a limiting distribution [15] which fails to adequately regularize the DWB estimation problem.
Instead, we propose here a regularization scheme that imposes temporal smoothness by penalizing the difference between the
simplex-constrained, latent state vectors at adjacent points in time. Drawing from the field of compositional analysis [16], the
Bhattacharyya-arccos distance [17] proves to be well-suited to our needs. As a consequence of the aforementioned inverse-scaling
relationship, introducing this latent state regularizer impacts the model’s pure state distributions in a manner that causes
them to diverge from the data. Therefore, we also introduce a regularizer to counteract this effect to ensure that the learned
pure state distributions are representative of data while the system resides in each pure state.
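As a rough illustration of the latent state regularizer, the sketch below computes the Bhattacharyya-arccos distance (the arccosine of the Bhattacharyya coefficient) between simplex vectors and sums its square over adjacent time steps. The squared-sum form of the penalty and the function names are assumptions made for illustration; the exact regularizer is specified in Sec. IV.

```python
import numpy as np

def bhattacharyya_arccos(x, y):
    """Arccosine of the Bhattacharyya coefficient between simplex vectors."""
    coeff = np.sum(np.sqrt(x * y), axis=-1)
    # Clipping guards against round-off pushing the coefficient slightly above 1.
    return np.arccos(np.clip(coeff, 0.0, 1.0))

def smoothness_penalty(weights):
    """Sum of squared distances between consecutive rows of a (T x K) array.

    The squared-sum form is an assumption for illustration only.
    """
    d = bhattacharyya_arccos(weights[:-1], weights[1:])
    return float(np.sum(d ** 2))

# Two hypothetical weight paths from pure state 1 to pure state 2 over 11 steps.
ramp = np.linspace(0.0, 1.0, 11)
gradual = np.stack([1.0 - ramp, ramp], axis=1)
abrupt = np.stack([(ramp < 0.5).astype(float), (ramp >= 0.5).astype(float)], axis=1)

penalty_gradual = smoothness_penalty(gradual)
penalty_abrupt = smoothness_penalty(abrupt)
```

With this penalty, a single jump between two vertices of the simplex costs (π/2)², while the gradual path spreads the same total change over many small steps and is penalized less.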
Our second improvement removes the restriction in [1] that pure states be modeled parametrically as multivariate
Gaussians. Here we adopt a non-parametric approach and focus on the univariate case, where the Wasserstein-2
distance between distributions is equivalent to the 2-norm between their respective quantile functions [18]. Using a discrete
approximation to the pure state quantile functions leads to a convenient finite dimensional, regularized linear least squares
problem for estimating the pure states.
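To see why the discretized problem is linear in the pure states, consider the following minimal sketch. It assumes the barycentric weights are known and fixed, uses synthetic data, and omits both the regularizer and any monotonicity constraint on the quantile vectors, so it illustrates only the unregularized least squares core and not the full estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes: T time steps, K pure states, N quantile levels.
T, K, N = 200, 3, 50

# Synthetic ground truth (assumed for illustration): increasing quantile vectors.
true_Q = np.stack([np.sort(rng.normal(loc, 1.0, N)) for loc in (0.0, 4.0, 8.0)])

# Known simplex-valued weights, one row per time step.
X = rng.dirichlet(np.ones(K), size=T)

# Observed (windowed) quantiles: barycenter quantiles X @ Q plus small noise.
Y = X @ true_Q + 0.01 * rng.standard_normal((T, N))

# With X fixed, minimizing ||X Q - Y||_F^2 over Q is ordinary linear least squares.
Q_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
max_abs_error = float(np.max(np.abs(Q_hat - true_Q)))
```

Each column of `Q_hat` is solved independently, which is what makes the quantile-based representation convenient compared with a parametric family.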
Our numerical experiments empirically validate our analysis and improvements to the DWB model. Using simulated data,
we demonstrate in a controlled setting how we effectively regularize our model parameters with proper consideration of the
inverse-scaling analysis and the impact of window size on the accuracy of the model parameters. Additionally, using real
world human activity data, we show how our non-parametric approach leads to improved estimation of the system’s pure state
distributions as well as improved fit of the time-evolving distribution of the observed data compared to [1].
In summary, the primary contributions of this work consist of the following:
1) We highlight the non-uniqueness of the parameters corresponding to a Wasserstein barycenter by detailing the inverse-
scaling relationship between the pure state distributions and the simplex-valued barycentric weights.
2) We explore the impact of the window size on the ability to accurately estimate a dynamically evolving data distribution
by exploring the tradeoff between the errors associated with large and small windows and the dependency of this tradeoff
on the dynamics and pure states of the system.
3) We propose regularizers for the model parameters that impose temporal smoothness in the latent states in a manner that
addresses the non-uniqueness of the model.
4) We propose a flexible, non-parametric representation for univariate pure state distributions using a discrete approximation
to the quantile function that results in a finite dimensional formulation for DWB learning.
The remainder of the paper is organized as follows: in Sec. II, we provide an overview of the Wasserstein distance and
barycenter focusing on the univariate case. In Sec. III, we discuss the DWB model, highlighting the non-uniqueness and inverse-
scaling property of the Wasserstein barycenter as well as the impact of the window size on the estimation of a dynamically
evolving data distribution. In Sec. IV, we develop a variational problem for learning a DWB model, followed by a discussion of
the regularization approach, and discretization of the pure state distributions required to obtain a finite dimensional estimation
problem. We then formally state our non-parametric and regularized DWB variational problem and provide an algorithm to
estimate the model parameters. In Sec. V we use simulated data to demonstrate the non-uniqueness, impact of window size
and regularization terms discussed in this work and use real world human activity data to demonstrate the advantages of the
non-parametric DWB approach relative to the Gaussian model.
II. TECHNICAL BACKGROUND
The Wasserstein-2 distance is a metric on the space of probability distributions on $\mathbb{R}^d$ with finite second moments [18], [19].
For two random variables $q$ and $s$ with distributions $\rho_q$ and $\rho_s$, the squared Wasserstein-2 distance is defined via
$$W_2^2(\rho_q, \rho_s) = \inf_{\pi \in \Pi(\rho_q, \rho_s)} \mathbb{E}_{q,s \sim \pi} \|q - s\|_2^2, \tag{1}$$
where $\pi$ denotes the joint distribution of $q$ and $s$, and $\Pi(\rho_q, \rho_s)$ is the set of all joint distributions with marginals $\rho_q$, $\rho_s$. In
this work, we refer to Eq. (1) as the squared Wasserstein distance.
Given a set of distributions $\rho_{q_{1:K}} = \{\rho_{q_1}, \rho_{q_2}, \ldots, \rho_{q_K}\}$ and a vector $x \in \Delta^K$, where $\Delta^K$ denotes the standard $K$-simplex,
the Wasserstein barycenter is the distribution that minimizes the weighted (with respect to elements in $x$) squared Wasserstein
distance to the set of distributions [14] and is given by
$$\rho_B = B(x, \rho_{q_{1:K}}) = \operatorname*{argmin}_{\rho} \sum_{k=1}^{K} x[k]\, W_2^2(\rho, \rho_{q_k}), \tag{2}$$
where $x[k]$ denotes the $k$-th element of the vector $x$. When $\rho_q$ and $\rho_s$ are univariate distributions with cumulative distribution
functions $P_q$, $P_s$, the squared Wasserstein distance in Eq. (1) becomes [19], [20]
$$W_2^2(\rho_q, \rho_s) = \int_0^1 \left(P_q^{-1}(\xi) - P_s^{-1}(\xi)\right)^2 d\xi. \tag{3}$$
Here $P_q^{-1}$ and $P_s^{-1}$ are quantile functions, the generalized inverses [21] of the cumulative distribution functions, given by
$$P^{-1}(\xi) = \inf\{g \in \mathbb{R} : P(g) \geq \xi\}. \tag{4}$$
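For a finite sample, the generalized inverse can be evaluated directly from the order statistics. The sketch below is one way to do this, assuming the convention inf{g : P(g) ≥ ξ} applied to the empirical cumulative distribution function; the function name and the sample values are illustrative.

```python
import numpy as np

def empirical_quantile(samples, xi):
    """Generalized inverse of the empirical CDF: inf{g : P(g) >= xi}."""
    ordered = np.sort(np.asarray(samples, dtype=float))
    n = ordered.size
    # P(g) >= xi first holds at the ceil(xi * n)-th order statistic (1-based).
    index = np.ceil(np.asarray(xi) * n).astype(int) - 1
    return ordered[np.clip(index, 0, n - 1)]

samples = [2.0, 0.5, 1.0, 4.0]
levels = np.array([0.25, 0.5, 0.75, 1.0])
quantiles = empirical_quantile(samples, levels)
```

With four samples the empirical CDF jumps by 0.25 at each sorted value, so the quantile function steps through the sorted sample as ξ crosses multiples of 0.25.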
It follows from Eq. (3) and Eq. (2) that the Wasserstein barycenter of a set of univariate distributions with quantile functions
$P^{-1}_{q_{1:K}}$ will have quantile function [20]
$$P_B^{-1} = \sum_{k=1}^{K} x[k]\, P_{q_k}^{-1}. \tag{5}$$
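The univariate formulas lend themselves to a short numerical sketch. The code below approximates Eq. (3) by a Riemann sum over a uniform grid of quantile levels and forms the barycenter quantile function of Eq. (5) as a weighted sum. The grid size, sample sizes, and Gaussian test distributions are assumptions chosen for illustration.

```python
import numpy as np

def quantiles_on_grid(samples, num_levels=200):
    """Empirical quantile function evaluated on a midpoint grid in (0, 1)."""
    xi = (np.arange(num_levels) + 0.5) / num_levels
    return np.quantile(samples, xi)

def w2_squared(quantiles_a, quantiles_b):
    """Riemann-sum approximation of Eq. (3) on a shared uniform grid."""
    return float(np.mean((quantiles_a - quantiles_b) ** 2))

def barycenter_quantiles(weights, pure_state_quantiles):
    """Eq. (5): weighted sum of the rows of a (K x N) quantile array."""
    return np.asarray(weights) @ np.asarray(pure_state_quantiles)

rng = np.random.default_rng(0)
q1 = quantiles_on_grid(rng.normal(0.0, 1.0, 50000))
q2 = quantiles_on_grid(rng.normal(3.0, 1.0, 50000))

shift_cost = w2_squared(q1, q2)                        # near 3**2 for a pure shift
midpoint = barycenter_quantiles([0.5, 0.5], [q1, q2])
half_cost = w2_squared(midpoint, q1)
```

Because the barycenter with equal weights lies halfway along the displacement interpolation, its squared distance to either pure state is one quarter of the squared distance between the pure states.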
III. THE DYNAMICAL WASSERSTEIN BARYCENTER MODEL
As shown in Fig. 2, the DWB model [1] describes the distribution of a time series $y_t$ at time $t$ as
$$y_t \sim \rho_{B_t} = B(x_t, \rho_{q_{1:K}}), \tag{6}$$
where $\rho_{q_k}$, $k = 1, 2, \ldots, K$, are the distributions of the pure states and the barycentric weights $x_t \in \Delta^K$ capture the dynamics
of the transitions among these pure states.
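A time series following Eq. (6) can be simulated by inverse transform sampling, since in the univariate case the barycenter quantile function is the weighted sum of the pure state quantile functions. The sketch below assumes two Gaussian pure states and a piecewise-linear weight trajectory; all numerical values are illustrative.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Hypothetical Gaussian pure states (assumed values): means and standard deviations.
means = np.array([0.0, 3.0])
stds = np.array([0.2, 1.0])

# Latent weights: stay in state 1, transition linearly, then stay in state 2.
ramp = np.concatenate([np.zeros(100), np.linspace(0.0, 1.0, 100), np.ones(100)])
x = np.stack([1.0 - ramp, ramp], axis=1)            # shape (T, K), rows on the simplex

# Inverse transform sampling: draw u_t ~ Uniform(0, 1), evaluate every pure state
# quantile function at u_t, and combine them with the weights x_t.
u = rng.uniform(1e-12, 1.0, size=len(x))            # lower bound avoids inv_cdf(0)
z = np.array([NormalDist().inv_cdf(p) for p in u])  # standard normal quantiles
pure_quantiles = means + stds * z[:, None]          # shape (T, K)
y = np.sum(x * pure_quantiles, axis=1)              # one sample y_t per time step
```

The first and last 100 samples are draws from the two pure states, while the middle segment shifts gradually in both location and spread.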
Fig. 2: DWB model diagram. The DWB models the distribution $\rho_{B_t}$ from which the time series $y_t$ is sampled as the
Wasserstein barycenter of a set of pure state distributions $\rho_{q_{1:K}}$ and barycentric weights $x_{1:T}$, the latent state of the model.
Fig. 3: Diagram of the non-uniqueness and inverse-scaling effect of the parameters of a Wasserstein barycenter. Consider
a set of three pure states with quantile functions $P^{-1}_{q_{1:3}}$ and simplex-valued weight $x_B \in \Delta^3$ where $\rho_B = B(x_B, \rho_{q_{1:3}})$ with
quantile function $P_B^{-1} = \sum_{k=1}^{3} x_B[k]\, P^{-1}_{q_k}$. We construct a family of distinctly different pure state quantile functions $\bar{P}^{-1}_{q_{1:K}}$
and barycentric weights $\bar{x}_B$ which produce the exact same barycenter $P_B^{-1} = \sum_{k=1}^{3} \bar{x}_B[k]\, \bar{P}^{-1}_{q_k}$. Let $x_0$ be another point on
the simplex where $\rho_0 = B(x_0, \rho_{q_{1:3}})$ has quantile function $P_0^{-1} = \sum_{k=1}^{3} x_0[k]\, P^{-1}_{q_k}$. Given $x_0$ and $x_B$, let $\bar{x}_B$ given by
Eq. (7) be any point on the line connecting $x_0$ through $x_B$ to the edge of the simplex. Moving $\bar{x}_B$ away from $x_0$ along the
line connecting $x_0$ and $x_B$ (orange segments) causes the pure state quantile functions $\bar{P}^{-1}_{q_{1:3}}$ to move from $P^{-1}_{q_{1:3}}$ towards
$P_0^{-1}$. This corresponds to $\alpha \in [\alpha_0, 1]$, where $\alpha_0$ is the smallest value of $\alpha$ such that $\bar{x}_B$ still lies on the simplex. Conversely,
moving $\bar{x}_B$ towards $x_0$ (blue segments) results in the pure state quantile functions moving away from $P_0^{-1}$. This corresponds
to $\alpha \in [1, \alpha_m]$, where $\alpha_m$ is the largest value of $\alpha$ such that all $\bar{P}^{-1}_{q_{1:3}}$ remain in the set of quantile functions $\mathcal{Q}$.
Given $y_t$, $t = 1, 2, \ldots, T$, modeled via Eq. (6), the problem is to estimate the DWB model parameters, which consist of the
pure state distributions and the sequence of barycentric weights.
Below we discuss two key characteristics that pose challenges for estimating the parameters of the DWB model. The first
is the non-uniqueness of the parameters (i.e., the pure state distributions and the barycentric weights) that yield a Wasserstein
barycenter. The second relates to the complications that arise when we are provided only a single time series for learning a
DWB model.
A. Non-uniqueness in the Parameters of a Wasserstein Barycenter
The issue of uniqueness refers to the fact that a Wasserstein barycenter is not described by a unique set of pure state
distributions and barycentric weights. While the statement is true regardless of dimension (see Appendix A for an example),
given the focus of this paper, we examine the univariate case in some detail. Specifically, we provide a construction that
illustrates an inverse-scaling relation between the family of pure state distributions and barycentric weights that yields the
same Wasserstein barycenter.
As shown in Fig. 3, assume we have a set of pure state distributions $\rho_{q_{1:K}}$ indexed by $k = 1, 2, \ldots, K$ with quantile
functions $P^{-1}_{q_k}$ and barycentric weights $x_B \in \Delta^K$ which give rise to the barycenter $\rho_B = B(x_B, \rho_{q_{1:K}})$ with quantile function
$P_B^{-1} = \sum_{k=1}^{K} x_B[k]\, P^{-1}_{q_k}$ (Eq. (5)). For now, $x_B$ is assumed to lie in the interior of the simplex and we consider below the
cases where $x_B$ is on a lower dimensional face or vertex. Let us choose another point $x_0 \neq x_B$ corresponding to barycentric
quantile function $P_0^{-1} = \sum_{k=1}^{K} x_0[k]\, P^{-1}_{q_k}$ (Eq. (5)). We construct a family of barycentric weights $\bar{x}_B$ and pure state quantile
functions $\bar{P}^{-1}_{q_{1:K}}$ corresponding to distributions $\bar{\rho}_{q_{1:K}}$ such that $\rho_B = B(\bar{x}_B, \bar{\rho}_{q_{1:K}})$, or in other words,
$P_B^{-1} = \sum_{k=1}^{K} \bar{x}_B[k]\, \bar{P}^{-1}_{q_k}$.
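The non-uniqueness can be checked numerically with a simple affine rescaling about the reference pair (x_0, P_0^{-1}). The specific form used below is an assumption chosen to match the role that the caption of Fig. 3 assigns to α; it is not necessarily identical to Eq. (7), which is not reproduced in this section. The Gaussian pure states and the weight vectors are likewise illustrative.

```python
import numpy as np
from statistics import NormalDist

# Quantile grid and three hypothetical Gaussian pure states (assumed values).
xi = (np.arange(99) + 1) / 100
z = np.array([NormalDist().inv_cdf(p) for p in xi])
P = np.stack([m + s * z for m, s in [(0.0, 1.0), (4.0, 0.5), (8.0, 2.0)]])  # (K, N)

x_B = np.array([0.5, 0.3, 0.2])       # weights of the barycenter of interest
x_0 = np.array([1 / 3, 1 / 3, 1 / 3]) # reference point on the simplex
P_B = x_B @ P                         # barycenter quantile function, Eq. (5)
P_0 = x_0 @ P                         # quantile function at the reference point

def rescale(alpha):
    """Illustrative inverse scaling about (x_0, P_0); not necessarily Eq. (7)."""
    x_bar = x_0 + (x_B - x_0) / alpha   # weights move away from x_0 when alpha < 1
    P_bar = P_0 + alpha * (P - P_0)     # pure states move toward P_0 when alpha < 1
    return x_bar, P_bar

x_bar, P_bar = rescale(0.8)
P_B_bar = x_bar @ P_bar
```

Scaling the weights by 1/α and the pure state quantile functions by α about the reference pair leaves the barycenter unchanged, because the rescaled weights still sum to one and the two scale factors cancel. For α < 1 each rescaled pure state is a convex combination of valid quantile functions and therefore remains non-decreasing; for α > 1 monotonicity has to be checked, which is the role of the upper limit on α described in Fig. 3.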