
Non-Parametric and Regularized Dynamical

Wasserstein Barycenters for Sequential Observations

Kevin C. Cheng∗, IEEE Student Member, Eric L. Miller∗, IEEE Fellow,

Michael C. Hughes†, Shuchin Aeron∗, IEEE Senior Member

Abstract

We consider probabilistic models for sequential observations which exhibit gradual transitions among a finite number of states.

We are particularly motivated by applications such as human activity analysis where observed accelerometer time series contains

segments representing distinct activities, which we call pure states, as well as periods characterized by continuous transition

among these pure states. To capture this transitory behavior, the dynamical Wasserstein barycenter (DWB) model of [1] associates

with each pure state a data-generating distribution and models the continuous transitions among these states as a Wasserstein

barycenter of these distributions with dynamically evolving weights. Focusing on the univariate case where Wasserstein distances

and barycenters can be computed in closed form, we extend [1], specifically by relaxing the parameterization of the pure states as

Gaussian distributions. We highlight issues related to the uniqueness in identifying the model parameters as well as uncertainties

induced when estimating a dynamically evolving distribution from a limited number of samples. To ameliorate non-uniqueness,

we introduce regularization that imposes temporal smoothness on the dynamics of the barycentric weights. A quantile-based

approximation of the pure state distributions yields a finite dimensional estimation problem which we numerically solve using

cyclic descent alternating between updates to the pure-state quantile functions and the barycentric weights. We demonstrate the

utility of the proposed algorithm in segmenting both simulated and real world human activity time series.

Index Terms

Wasserstein barycenter, displacement interpolation, dynamical model, sequential data, time series analysis, sliding window,

non-parametric, quantile function, human activity analysis.

I. INTRODUCTION

We consider a probabilistic model for sequentially observed data where the observation at each point in time depends on a

dynamically evolving latent state. We are particularly motivated by systems that continuously move among a set of canonical

behaviors, which we call pure states. Over some periods, the system may reside entirely in one of the pure states while over

other periods, the system is transitioning among these pure states in a temporally smooth manner. There are many applications

where such a model is appropriate including climate modeling [2], sleep analysis [3], simulating physical systems [4], as well

as characterizing human activity from video [5] or wearable-derived accelerometry [6] data. Using the last case as an example,

there will be periods when the individual will be engaged in a well-defined activity such as standing or running. During these

intervals, the data can be modeled as drawn from a probability distribution specific to that canonical state. Given the high

sampling rates of modern sensors, there also may be intervals where multiple consecutive observations reflect the gradual

transition between or among pure states. Over these periods the distribution of the data is given by a suitable combination of

the pure state distributions. Therefore, one possible model for these types of systems consists of three components: a set of

distributions containing the data-generating distribution for each pure state, a continuously evolving latent state which captures

the transition dynamics of the system as it moves among these pure states, and a means of interpolating among these pure

state distributions to characterize the data distribution in the transition regions.

These types of systems pose some unique considerations that are not sufficiently addressed by prior work in time series

modeling. The two most common methods for modeling latent state systems are continuous and discrete state-space models.

Continuous state-space models [7], [8], [9] have no natural way to identify those pure states in which the system may persist

for periods of time. In discrete state-space models such as hidden Markov models [10], [11], [12], the dynamics are captured

by a temporally varying state vector whose elements represent the probability that the system resides in each of a countable

number of discrete (or in our terminology, pure) states. For these models, the data-generating distribution associated with this

∗Tufts University, Dept. of Electrical and Computer Engineering

†Tufts University, Dept. of Computer Science

This research was sponsored by the U.S. Army DEVCOM Soldier Center under the Measuring and Advancing Soldier Tactical Readiness and Effectiveness

program and Cooperative Agreement Number W911QY-19-2-0003. We also acknowledge support from the U.S. National Science Foundation under award

HDR-1934553 for the Tufts T-TRIPODS Institute. Shuchin Aeron is supported in part by NSF CCF:1553075, NSF RAISE 1931978, NSF ERC planning

1937057, and AFOSR FA9550-18-1-0465. Michael C. Hughes is supported in part by NSF IIS-1908617. Eric L. Miller is supported in part by NSF grants

1934553, 1935555, 1931978, and 1937057.

Code repository: https://github.com/kevin-c-cheng/DWB Nonparametric

reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or

reuse of any copyrighted component of this work in other works. DOI: 10.1109/TSP.2023.3303616

arXiv:2210.01918v3 [cs.LG] 21 Sep 2023

Fig. 1: Comparison of convex combination vs Wasserstein barycenter for modeling human activity transitions. The

beep test (BT) dataset consists of a subject running back-and-forth between two points, stopping at each point (see Sec. V-B

for more details). (a) The probability distribution functions (PDF) of the vertical acceleration of the system’s two pure states

(stand, run) are estimated using a KDE with a Gaussian kernel whose mean corresponds to the observed data when the system

resides in these pure states. Modeling a transition from stand to run via the time-varying weights (b) for $t = 1, \ldots, 5$, we show

the resulting data distributions during this transition region according to a convex combination (c) and Wasserstein barycenter

(d) interpolation model.

latent state vector is given as a convex combination, i.e., a linear mixture, of the pure state distributions. As argued in [1], this is

also an insufficient model for the problems which interest us. As an example, for modeling human activity, the data distribution

illustrated in Fig. 1 produced by a convex combination of the underlying “standing” and “running” pure state distributions can

be interpreted as “sometimes standing” and “sometimes running,” which is not a proper description of the gradual transition

that actually occurs.

A better model for interpolating the data distribution during the transition period between standing and running would

smoothly shift probability mass between the two pure state distributions. As illustrated in Fig. 1, such blending can be

achieved through displacement interpolation [13] between two pure states or, for more than two pure states, as a Wasserstein

barycenter of the associated distributions [14]. Using this perspective, Dynamical Wasserstein Barycenters (DWB) [1] were

recently proposed to model a dynamically evolving distribution as a sequence of Wasserstein barycenters constructed as a

time-varying convex combination of the pure state distributions. The dynamical weights, which lie on the probability simplex,

are taken to be the latent state of the model. A Bayesian model is proposed in [1], whose parameters were determined via

maximum a posteriori estimation.
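The contrast between the two interpolation models can be sketched numerically. The example below is a minimal illustration, assuming two hypothetical Gaussian pure states (the names `STAND` and `RUN` and their parameters are invented for this sketch and are not estimated from the beep test data). It compares how much probability mass each model places in an interval lying between the two pure-state means at the midpoint of a transition.

```python
from statistics import NormalDist

# Hypothetical pure states; the means and standard deviations are illustrative
# assumptions and are not taken from the beep test data.
STAND = NormalDist(mu=0.0, sigma=0.2)
RUN = NormalDist(mu=3.0, sigma=1.0)


def mixture_cdf(g, w):
    """CDF of the convex combination (1 - w) * STAND + w * RUN."""
    return (1.0 - w) * STAND.cdf(g) + w * RUN.cdf(g)


def barycenter_quantile(xi, w):
    """Quantile function of the univariate Wasserstein barycenter with
    weights (1 - w, w): a weighted average of the pure-state quantiles."""
    return (1.0 - w) * STAND.inv_cdf(xi) + w * RUN.inv_cdf(xi)


def barycenter_cdf(g, w):
    """For univariate Gaussian pure states, averaging quantile functions
    averages the means and the standard deviations, so the barycenter is
    again Gaussian and its CDF is available in closed form."""
    mu = (1.0 - w) * STAND.mean + w * RUN.mean
    sigma = (1.0 - w) * STAND.stdev + w * RUN.stdev
    return NormalDist(mu, sigma).cdf(g)


w = 0.5            # halfway through the transition
lo, hi = 1.0, 2.0  # an interval between the two pure-state means
mixture_mass = mixture_cdf(hi, w) - mixture_cdf(lo, w)
barycenter_mass = barycenter_cdf(hi, w) - barycenter_cdf(lo, w)
print(f"mass in [{lo}, {hi}]: mixture={mixture_mass:.3f}, "
      f"barycenter={barycenter_mass:.3f}")
```

Under these assumed parameters the barycenter concentrates its mass between the pure states, whereas the mixture leaves most of its mass near the two original modes, which is the qualitative behavior described for Fig. 1.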

Here we expand on [1] by highlighting two challenging characteristics of the DWB model and two improvements that address

certain limitations of the original DWB approach. The first characteristic relates to uniqueness in DWB model identification,

where multiple combinations of pure state distributions and barycentric weights can produce the same Wasserstein barycenter.

Although this is true for multidimensional distributions, here we use a univariate formulation to more transparently demonstrate

how this non-uniqueness is captured in an inverse-scaling relationship between the model’s latent state and pure state parameters.

The second characteristic is related to a tradeoff in tracking and estimating an evolving data distribution from a single instance

of a time series using a windowed approach to collect samples. Smaller windows lack the number of samples to ensure small

statistical error in the estimation of the data distribution at a given point in time. On the other hand, larger windows span

longer periods during which, under relatively faster dynamics, the data distribution can change significantly, again increasing

the estimation error. In a simulated example, where the dynamics consist of constant rate transitions between two Gaussian

states, we show that there exists an optimal window size that balances these two effects and discuss the dependency of this

window size tradeoff on the temporal dynamics of the latent state and pure states of the system.
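The window-size tradeoff can be explored with a small simulation. The following is only a sketch under stated assumptions: the two Gaussian pure states, the transition length `T`, the window sizes, and the error measure (squared Wasserstein distance between the sorted window samples and the true quantiles at the window center) are all choices made for illustration and are not the experimental setup of Sec. V.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical simulation settings (illustrative assumptions only).
MU = (0.0, 3.0)     # pure-state means
SIGMA = (0.2, 1.0)  # pure-state standard deviations
T = 401             # number of time steps in one constant-rate transition


def barycenter_params(t):
    """Mean and standard deviation of the Gaussian barycenter at time index t
    when the weight on the second pure state grows linearly from 0 to 1."""
    w = np.asarray(t, dtype=float) / (T - 1)
    return (1 - w) * MU[0] + w * MU[1], (1 - w) * SIGMA[0] + w * SIGMA[1]


def window_error(window, n_trials, rng):
    """Average squared W2 error between the empirical distribution of a window
    centered at T // 2 and the true barycenter at the window center."""
    center = T // 2
    half = window // 2
    t = np.arange(center - half, center + half + 1)
    mean_t, std_t = barycenter_params(t)
    n = t.size
    # True quantiles at the window center on a midpoint grid of n levels,
    # matched against the n order statistics of the window.
    xi = (np.arange(n) + 0.5) / n
    z = np.array([NormalDist().inv_cdf(u) for u in xi])
    mean_c, std_c = barycenter_params(center)
    target = mean_c + std_c * z
    errors = []
    for _ in range(n_trials):
        y = mean_t + std_t * rng.standard_normal(n)  # one sample per time step
        errors.append(np.mean((np.sort(y) - target) ** 2))
    return float(np.mean(errors))


rng = np.random.default_rng(0)
window_sizes = [5, 21, 81, 321]
errors = [window_error(w, n_trials=200, rng=rng) for w in window_sizes]
for w, e in zip(window_sizes, errors):
    print(f"window={w:4d}  mean squared W2 error={e:.4f}")
```

With settings of this kind one would expect the error to be large for very small windows (few samples) and for very large windows (the distribution drifts within the window), with a smaller error in between. The location of the minimum depends on the assumed transition rate and pure states.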

Our first improvement addresses the limitations of the choice in [1] of using a probabilistic prior for the dynamics of the

DWB weight vector. That approach may introduce additional unnecessary or potentially undesirable probabilistic properties on

the latent state process such as a limiting distribution [15] which fails to adequately regularize the DWB estimation problem.

Instead, we propose here a regularization scheme that imposes temporal smoothness by penalizing the difference between the

simplex-constrained, latent state vectors at adjacent points in time. Drawing from the field of compositional analysis [16], the

Bhattacharyya-arccos distance [17] proves to be well-suited to our needs. As a consequence of the aforementioned

inverse-scaling relationship, introducing this latent state regularizer impacts the model’s pure state distributions in a manner that causes

them to diverge from the data. Therefore, we also introduce a regularizer to counteract this effect to ensure that the learned

pure state distributions are representative of data while the system resides in each pure state.
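As a rough illustration of the kind of smoothness penalty described above, the sketch below computes the Bhattacharyya-arccos distance between simplex-valued vectors, taken here in its common form as the arccosine of the Bhattacharyya coefficient. The function names and the way the distances are summed over time are assumptions made for this example; the exact regularizer (for instance any squaring or weighting) is defined later in the paper and is not reproduced here.

```python
import numpy as np


def bhattacharyya_arccos(x, y):
    """arccos of the Bhattacharyya coefficient sum_k sqrt(x[k] * y[k])
    between two vectors on the probability simplex."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coefficient = np.sum(np.sqrt(x * y))
    # Clip to guard against round-off pushing the coefficient past 1.
    return float(np.arccos(np.clip(coefficient, 0.0, 1.0)))


def smoothness_penalty(weights):
    """Sum of distances between consecutive rows of a (T, K) weight matrix.
    This aggregation is a hypothetical choice for illustration."""
    weights = np.asarray(weights, dtype=float)
    return sum(
        bhattacharyya_arccos(weights[t], weights[t + 1])
        for t in range(len(weights) - 1)
    )


# A gradual transition and an abrupt switch between the same two pure states.
smooth = np.array([[1.0, 0.0], [0.75, 0.25], [0.5, 0.5], [0.25, 0.75], [0.0, 1.0]])
abrupt = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
print(smoothness_penalty(smooth), smoothness_penalty(abrupt))
```

The distance is zero for identical weight vectors and reaches pi / 2 for weight vectors with disjoint support, so it remains bounded on the simplex.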

Our second improvement removes the restriction in [1] where a parametric approach to model pure states with multivariate

Gaussians was employed. Here we adopt a non-parametric approach and focus on the univariate case where the Wasserstein-2

distance between distributions is equivalent to the 2-norm between their respective quantile functions [18]. Using a discrete

approximation to the pure state quantile functions leads to a convenient finite dimensional, regularized linear least squares

problem for estimating the pure states.

Our numerical experiments empirically validate our analysis and improvements to the DWB model. Using simulated data,

we demonstrate in a controlled setting how we effectively regularize our model parameters with proper consideration of the

inverse-scaling analysis and the impact of window size on the accuracy of the model parameters. Additionally, using real

world human activity data, we show how our non-parametric approach leads to improved estimation of the system’s pure state

distributions as well as improved fit of the time-evolving distribution of the observed data compared to [1].

In summary, the primary contributions of this work consist of the following:

1) We highlight the non-uniqueness of the parameters corresponding to a Wasserstein barycenter by detailing the inverse-

scaling relationship between the pure state distributions and the simplex-valued barycentric weights.

2) We explore the impact of the window size on the ability to accurately estimate a dynamically evolving data distribution

by exploring the tradeoff between the errors associated with large and small windows and the dependency of this tradeoff

on the dynamics and pure states of the system.

3) We propose regularizers for the model parameters that impose temporal smoothness in the latent states in a manner that

addresses the non-uniqueness of the model.

4) We propose a flexible, non-parametric representation for univariate pure state distributions using a discrete approximation

to the quantile function that results in a finite dimensional formulation for DWB learning.

The remainder of the paper is organized as follows: in Sec. II, we provide an overview of the Wasserstein distance and

barycenter focusing on the univariate case. In Sec. III, we discuss the DWB model, highlighting the non-uniqueness and inverse-

scaling property of the Wasserstein barycenter as well as the impact of the window size on the estimation of a dynamically

evolving data distribution. In Sec. IV, we develop a variational problem for learning a DWB model, followed by a discussion of

the regularization approach, and discretization of the pure state distributions required to obtain a finite dimensional estimation

problem. We then formally state our non-parametric and regularized DWB variational problem and provide an algorithm to

estimate the model parameters. In Sec. V we use simulated data to demonstrate the non-uniqueness, impact of window size

and regularization terms discussed in this work and use real world human activity data to demonstrate the advantages of the

non-parametric DWB approach relative to the Gaussian model.

II. TECHNICAL BACKGROUND

The Wasserstein-2 distance is a metric on the space of probability distributions on $\mathbb{R}^d$ with finite second moments [18], [19]. For two random variables $q$ and $s$ with distributions $\rho_q$ and $\rho_s$, the squared Wasserstein-2 distance is defined via

$$W_2^2(\rho_q, \rho_s) = \inf_{\pi \in \Pi(\rho_q, \rho_s)} \mathbb{E}_{q,s \sim \pi} \|q - s\|_2^2, \tag{1}$$

where $\pi$ denotes the joint distribution of $q$ and $s$, and $\Pi(\rho_q, \rho_s)$ is the set of all joint distributions with marginals $\rho_q$ and $\rho_s$. In this work, we refer to Eq. (1) as the squared Wasserstein distance.

Given a set of distributions $\rho_{q_{1:K}} = \{\rho_{q_1}, \rho_{q_2}, \ldots, \rho_{q_K}\}$ and a vector $x \in \Delta^K$, where $\Delta^K$ denotes the standard $K$-simplex, the Wasserstein barycenter is the distribution that minimizes the weighted (with respect to the elements of $x$) squared Wasserstein distance to the set of distributions [14] and is given by

$$\rho_B = B(x, \rho_{q_{1:K}}) = \operatorname*{argmin}_{\rho} \sum_{k=1}^{K} x[k]\, W_2^2(\rho, \rho_{q_k}), \tag{2}$$

where $x[k]$ denotes the $k$-th element of the vector $x$. When $\rho_q$ and $\rho_s$ are univariate distributions with cumulative distribution functions $P_q$, $P_s$, the squared Wasserstein distance in Eq. (1) becomes [19], [20]

$$W_2^2(\rho_q, \rho_s) = \int_0^1 \left(P_q^{-1}(\xi) - P_s^{-1}(\xi)\right)^2 d\xi. \tag{3}$$

Here $P_q^{-1}$ and $P_s^{-1}$ are quantile functions, the generalized inverses [21] of the cumulative distribution functions, given by

$$P^{-1}(\xi) = \inf\{g \in \mathbb{R} : P(g) \geq \xi\}. \tag{4}$$
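To make Eq. (3) and Eq. (4) concrete, the sketch below evaluates the generalized inverse of an empirical CDF and approximates the squared Wasserstein distance between two sample sets with a midpoint rule. The function names and the grid size `n_grid` are assumptions chosen for illustration; this is not the estimator used later in the paper.

```python
import numpy as np


def quantile_function(samples, xi):
    """Generalized inverse of the empirical CDF, following Eq. (4):
    the smallest sample g whose empirical CDF value P(g) is at least xi."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = s.size
    # P(s[i]) = (i + 1) / n, so the smallest index with P >= xi is ceil(xi * n) - 1.
    idx = np.ceil(np.asarray(xi, dtype=float) * n).astype(int) - 1
    return s[np.clip(idx, 0, n - 1)]


def squared_w2(samples_q, samples_s, n_grid=1000):
    """Midpoint-rule approximation of the integral in Eq. (3) on (0, 1)."""
    xi = (np.arange(n_grid) + 0.5) / n_grid
    diff = quantile_function(samples_q, xi) - quantile_function(samples_s, xi)
    return float(np.mean(diff ** 2))


# Shifting every sample by a constant c gives a squared distance of c ** 2.
q = np.array([3.0, 1.0, 2.0, 0.0])
print(squared_w2(q, q + 2.0))
```

Because the univariate distance depends only on the quantile functions, no optimization over couplings is needed, which is the property the paper relies on when restricting attention to the univariate case.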

It follows from Eq. (3) and Eq. (2) that the Wasserstein barycenter of a set of univariate distributions with quantile functions $P_{q_{1:K}}^{-1}$ will have quantile function [20]

$$P_B^{-1} = \sum_{k=1}^{K} x[k]\, P_{q_k}^{-1}. \tag{5}$$
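The step from Eq. (2) and Eq. (3) to Eq. (5) can be sketched as follows. This is an informal derivation, assuming the objective may be minimized pointwise in $\xi$ over candidate quantile functions $P^{-1}$; see [20] for a rigorous treatment.

```latex
\begin{align*}
\sum_{k=1}^{K} x[k]\, W_2^2(\rho, \rho_{q_k})
  &= \int_0^1 \sum_{k=1}^{K} x[k]
     \left(P^{-1}(\xi) - P_{q_k}^{-1}(\xi)\right)^2 d\xi, \\
\frac{\partial}{\partial P^{-1}(\xi)} \sum_{k=1}^{K} x[k]
     \left(P^{-1}(\xi) - P_{q_k}^{-1}(\xi)\right)^2
  &= 2 \sum_{k=1}^{K} x[k]
     \left(P^{-1}(\xi) - P_{q_k}^{-1}(\xi)\right) = 0 \\
\Longrightarrow \quad P^{-1}(\xi)
  &= \sum_{k=1}^{K} x[k]\, P_{q_k}^{-1}(\xi),
  \qquad \text{using } \sum_{k=1}^{K} x[k] = 1.
\end{align*}
```

Since each $P_{q_k}^{-1}$ is nondecreasing and the weights $x[k]$ are nonnegative, the pointwise minimizer is itself nondecreasing and is therefore a valid quantile function.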

III. THE DYNAMICAL WASSERSTEIN BARYCENTER MODEL

As shown in Fig. 2, the DWB model [1] describes the distribution of a time series $y_t$ at time $t$ as

$$y_t \sim \rho_{B_t} = B(x_t, \rho_{q_{1:K}}), \tag{6}$$

where $\rho_{q_k}$, $k = 1, 2, \ldots, K$, are the distributions of the pure states and the barycentric weights $x_t \in \Delta^K$ capture the dynamics of the transitions among these pure states.
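In the univariate case, Eq. (6) combined with Eq. (5) suggests a simple way to simulate data from the model by inverse transform sampling: draw a uniform level $\xi_t$ at each time step and evaluate the weighted average of the pure-state quantile functions at that level. The sketch below assumes two hypothetical Gaussian pure states and a linear weight trajectory; the function name `sample_dwb` and all parameter values are illustrative and do not come from the paper.

```python
import numpy as np
from statistics import NormalDist


def sample_dwb(weights, quantile_fns, rng):
    """Draw one observation per row of `weights` (shape (T, K)) from the
    univariate barycenter whose quantile function is given by Eq. (5)."""
    weights = np.asarray(weights, dtype=float)
    # Keep the levels strictly inside (0, 1) so that quantile functions of
    # distributions with unbounded support stay finite.
    xi = np.clip(rng.uniform(size=weights.shape[0]), 1e-12, 1.0 - 1e-12)
    # pure_quantiles[t, k] is the k-th pure-state quantile evaluated at xi[t].
    pure_quantiles = np.array([[f(u) for f in quantile_fns] for u in xi])
    return np.sum(weights * pure_quantiles, axis=1)


# Hypothetical example: a linear transition from the first to the second state.
T = 200
w = np.linspace(0.0, 1.0, T)
weights = np.column_stack([1.0 - w, w])
quantile_fns = [NormalDist(0.0, 0.2).inv_cdf, NormalDist(3.0, 1.0).inv_cdf]
y = sample_dwb(weights, quantile_fns, np.random.default_rng(0))
print(y.shape)
```

Each row of `weights` plays the role of $x_t$, so early samples are drawn from a distribution close to the first pure state and late samples from one close to the second.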

Fig. 2: DWB model diagram. The DWB models the distribution $\rho_{B_t}$ from which the time series $y_t$ is sampled as the Wasserstein barycenter of a set of pure state distributions $\rho_{q_{1:K}}$ and barycentric weights $x_{1:T}$, the latent state of the model.

Fig. 3: Diagram of the non-uniqueness and inverse-scaling effect of the parameters of a Wasserstein barycenter. Consider a set of three pure states with quantile functions $P_{q_{1:3}}^{-1}$ and simplex-valued weight $x_B \in \Delta^3$, where $\rho_B = B(x_B, \rho_{q_{1:3}})$ with quantile function $P_B^{-1} = \sum_{k=1}^{3} x_B[k] P_{q_k}^{-1}$. We construct a family of distinctly different pure state quantile functions $\bar{P}_{q_{1:3}}^{-1}$ and barycentric weights $\bar{x}_B$ which produce the exact same barycenter, $P_B^{-1} = \sum_{k=1}^{3} \bar{x}_B[k] \bar{P}_{q_k}^{-1}$. Let $x_0$ be another point on the simplex, where $\rho_0 = B(x_0, \rho_{q_{1:3}})$ has quantile function $P_0^{-1} = \sum_{k=1}^{3} x_0[k] P_{q_k}^{-1}$. Given $x_0$ and $x_B$, let $\bar{x}_B$, given by Eq. (7), be any point on the line connecting $x_0$ through $x_B$ to the edge of the simplex. Moving $\bar{x}_B$ away from $x_0$ along the line connecting $x_0$ and $x_B$ (orange segments) causes the pure state quantile functions $\bar{P}_{q_{1:3}}^{-1}$ to move from $P_{q_{1:3}}^{-1}$ towards $P_0^{-1}$. This corresponds to $\alpha \in [\alpha_0, 1]$, where $\alpha_0$ is the smallest value of $\alpha$ such that $\bar{x}_B$ still lies on the simplex. Conversely, moving $\bar{x}_B$ towards $x_0$ (blue segments) results in the pure state quantile functions moving away from $P_0^{-1}$. This corresponds to $\alpha \in [1, \alpha_m]$, where $\alpha_m$ is the largest value of $\alpha$ such that all $\bar{P}_{q_{1:3}}^{-1}$ remain in the set of quantile functions $Q$.

Given $y_t$, $t = 1, 2, \ldots, T$, modeled via Eq. (6), the problem is to estimate the DWB model parameters, which consist of the

pure state distributions and the sequence of barycentric weights.

Below we discuss two key characteristics that pose challenges for estimating the parameters of the DWB model. The first

is the non-uniqueness of the parameters (i.e., the pure state distributions and the barycentric weights) that yield a Wasserstein

barycenter. The second relates to the complications that arise when we are provided only a single time series for learning a

DWB model.

A. Non-uniqueness in the Parameters of a Wasserstein Barycenter

The issue of uniqueness refers to the fact that a Wasserstein barycenter is not described by a unique set of pure state

distributions and barycentric weights. While the statement is true regardless of dimension (see Appendix A for an example),

given the focus of this paper, we examine the univariate case in some detail. Specifically, we provide a construction that

illustrates an inverse-scaling relation between the family of pure state distributions and barycentric weights that yields the

same Wasserstein barycenter.

As shown in Fig. 3, assume we have a set of pure state distributions $\rho_{q_{1:K}}$ indexed by $k = 1, 2, \ldots, K$ with quantile functions $P_{q_k}^{-1}$ and barycentric weights $x_B \in \Delta^K$, which give rise to the barycenter $\rho_B = B(x_B, \rho_{q_{1:K}})$ with quantile function $P_B^{-1} = \sum_{k=1}^{K} x_B[k] P_{q_k}^{-1}$ (Eq. (5)). For now, $x_B$ is assumed to lie in the interior of the simplex; we consider below the cases where $x_B$ is on a lower dimensional face or vertex. Let us choose another point $x_0 \neq x_B$ corresponding to barycentric quantile function $P_0^{-1} = \sum_{k=1}^{K} x_0[k] P_{q_k}^{-1}$ (Eq. (5)). We construct a family of barycentric weights $\bar{x}_B$ and pure state quantile functions $\bar{P}_{q_{1:K}}^{-1}$, corresponding to distributions $\bar{\rho}_{q_{1:K}}$, such that $\rho_B = B(\bar{x}_B, \bar{\rho}_{q_{1:K}})$, or in other words, $P_B^{-1} = \sum_{k=1}^{K} \bar{x}_B[k] \bar{P}_{q_k}^{-1}$.
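The non-uniqueness can be checked numerically. The sketch below uses one parameterization that is consistent with the description in Fig. 3, namely $\bar{x}_B = x_0 + (x_B - x_0)/\alpha$ and $\bar{P}_{q_k}^{-1} = P_0^{-1} + \alpha (P_{q_k}^{-1} - P_0^{-1})$. This form is an assumption made for illustration; Eq. (7) is not reproduced in this excerpt and the paper's construction may be written differently. The pure states, grid, and weights are likewise hypothetical.

```python
import numpy as np


def rescale(x_B, x_0, Q, alpha):
    """Return alternative weights and pure-state quantiles (rows of a (K, N)
    array) under the assumed inverse-scaling parameterization."""
    P_0 = x_0 @ Q                         # quantile function of the barycenter at x_0
    x_bar = x_0 + (x_B - x_0) / alpha     # weights move away from x_0 when alpha < 1
    Q_bar = P_0 + alpha * (Q - P_0)       # pure states move towards P_0 when alpha < 1
    return x_bar, Q_bar


# Hypothetical pure states: location-scale quantile functions on a coarse grid.
z = np.linspace(-2.0, 2.0, 50)            # stand-in for standardized quantile values
mu = np.array([0.0, 3.0, 6.0])
sigma = np.array([0.5, 1.0, 2.0])
Q = mu[:, None] + sigma[:, None] * z      # shape (K, N) with K = 3

x_B = np.array([0.5, 0.3, 0.2])
x_0 = np.array([1.0, 1.0, 1.0]) / 3.0
x_bar, Q_bar = rescale(x_B, x_0, Q, alpha=0.8)

same_barycenter = np.allclose(x_bar @ Q_bar, x_B @ Q)
on_simplex = bool(np.all(x_bar >= 0.0)) and abs(x_bar.sum() - 1.0) < 1e-12
valid_quantiles = bool(np.all(np.diff(Q_bar, axis=1) >= 0.0))
print(same_barycenter, on_simplex, valid_quantiles)
```

Under this assumed form, scaling the weights away from $x_0$ by $1/\alpha$ is exactly offset by scaling the pure-state quantile functions towards $P_0^{-1}$ by $\alpha$, so the barycenter is unchanged as long as the new weights stay on the simplex and the new quantile functions stay nondecreasing.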
