the cluster structure is refined by means of an improvement
algorithm that reconsiders the sequence of observations and
performs a greedy, local maximization of the log-likelihood
function of the BMC model.
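To illustrate the idea, the greedy improvement step can be sketched as follows. This is our own minimal reimplementation under simplifying assumptions (maximum-likelihood block transition estimates, probability mass spread uniformly within the destination cluster), not the authors' exact algorithm; the function names `bmc_loglik` and `greedy_improve` are ours.

```python
import numpy as np

def bmc_loglik(traj, sigma, K):
    """Log-likelihood of a trajectory under a fitted BMC.

    sigma[s] is the cluster label of state s; transition probabilities
    depend only on cluster memberships, with probability mass spread
    uniformly over the states of the destination cluster.
    """
    N = np.zeros((K, K))  # cluster-level transition counts
    for x, y in zip(traj[:-1], traj[1:]):
        N[sigma[x], sigma[y]] += 1
    sizes = np.bincount(sigma, minlength=K)  # states per cluster
    rows = N.sum(axis=1, keepdims=True)
    P = np.divide(N, rows, out=np.zeros_like(N), where=rows > 0)
    ll = 0.0
    for x, y in zip(traj[:-1], traj[1:]):
        # observed transitions always have positive estimated probability
        ll += np.log(P[sigma[x], sigma[y]] / sizes[sigma[y]])
    return ll

def greedy_improve(traj, sigma, K, max_sweeps=20):
    """Greedily reassign single states to other clusters as long as the
    log-likelihood increases: a local maximization, as described above."""
    sigma = np.array(sigma)
    best = bmc_loglik(traj, sigma, K)
    for _ in range(max_sweeps):
        improved = False
        for s in range(len(sigma)):
            current = sigma[s]
            for k in range(K):
                if k == current:
                    continue
                sigma[s] = k
                ll = bmc_loglik(traj, sigma, K)
                if ll > best + 1e-12:
                    best, current, improved = ll, k, True
                sigma[s] = current  # keep the best label found so far
        if not improved:
            break
    return sigma, best
```

Recomputing the full log-likelihood per candidate move is deliberately naive; an efficient implementation would update the counts incrementally.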
A broad study of this clustering algorithm's performance on sequential data from real-world processes has, however, not been provided. We tackle this problem by investigating the clustering algorithm under a variety of scenarios, thereby demonstrating its applicability to real-world data. This is the main contribution of the present paper.
Evaluating the actual performance of such a clustering algorithm in a real-world scenario is, however, a nontrivial task, since the latent cluster structure is unknown, as opposed to a scenario where the data is generated synthetically. Hence, we also propose a set of tools for evaluating the quality of the clusters and the BMC model. We therefore go a step beyond clustering benchmark evaluation and use statistical tools to assess the validity of the BMC assumption in the data.
Our goal here is not to compare the performance of the BMC clustering algorithm to that of other algorithms. Such a comparison would only be fair if both algorithms had access to the same amount of training data. Note that the BMC algorithm is explicitly designed to operate in sparse regimes where the amount of data is small. Most model-free algorithms, on the other hand, such as those based on deep learning, perform optimally in a regime where one has access to large amounts of training data. A comparison would consequently be vacuous, its outcome being determined mainly
by the choice of the amount of training data. Our goal is rather
to supplement the theoretical understanding of the BMC-based
algorithm with a practical viewpoint.
In order to practically evaluate the clustering algorithm for
BMCs, we investigate the following questions:
• How can the BMC model practically aid in the exploration of nonsynthetic sequential data obtained from real-world processes?
• Can the quality of the found clusters be evaluated when no ground-truth clustering is known?
• How can one statistically decide whether the BMC model is an appropriate model for the sequence of observations? How can it be detected that either a simpler model than a BMC would suffice, or that a richer model is required?
• Can the algorithm be expected to give meaningful results despite the sparsity and complexity of real-world data? Is the clustering algorithm robust to model violations?
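To give a concrete flavor of the model-selection question, one generic approach (our own illustration, not necessarily the paper's exact test) is to compare nested candidate models by the Bayesian information criterion, for instance an i.i.d. model against a first-order Markov chain; richer-versus-simpler comparisons involving the BMC follow the same pattern. The function name `bic_iid_vs_markov` is ours.

```python
import numpy as np

def bic_iid_vs_markov(traj, n_states):
    """Compare an i.i.d. model against a first-order Markov chain by BIC
    (lower is better). Decides whether sequential structure is present."""
    traj = np.asarray(traj)
    T = len(traj)
    # i.i.d. model: multinomial over states, n_states - 1 free parameters
    counts = np.bincount(traj, minlength=n_states)
    p = counts / T
    ll_iid = np.sum(counts[counts > 0] * np.log(p[counts > 0]))
    k_iid = n_states - 1
    # Markov model: empirical transition matrix, n(n - 1) free parameters
    N = np.zeros((n_states, n_states))
    for x, y in zip(traj[:-1], traj[1:]):
        N[x, y] += 1
    rows = N.sum(axis=1, keepdims=True)
    P = np.divide(N, rows, out=np.zeros_like(N), where=rows > 0)
    ll_mc = np.sum(N[N > 0] * np.log(P[N > 0]))
    k_mc = n_states * (n_states - 1)
    pen = np.log(T)  # BIC penalty per parameter
    return k_iid * pen - 2.0 * ll_iid, k_mc * pen - 2.0 * ll_mc
```

On a strongly sequential trajectory the Markov model wins despite its larger penalty, whereas on exchangeable data the i.i.d. model is preferred.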
To obtain insight into these questions, we develop practical
tools for evaluating the merit of a detected clustering, which
we then apply to several data sets. These evaluation tools
are designed to cope with the aforementioned fact that
ground-truth clusters are not known to us. The datasets we consider are diverse: they come from the fields of microbiology, natural language processing, ethology, and finance. Specifically, we investigate sequences of:
a. Codons in human Deoxyribonucleic Acid (DNA).
b. Words in Wikipedia articles.
c. Global Positioning System (GPS) data describing the movement of bison in a field.
d. Companies in the Standard and Poor's 500 (S&P500) with the highest daily return.
On each dataset we apply the BMC clustering algorithm to
uncover underlying clusters, and we evaluate these clusters
using the collection of tools which are developed in the present
paper. It is here found that the BMC model indeed aids in
exploratory data analysis of real-world sequential data.
For example, in DNA the algorithm leads us to rediscover
phenomena which are known in the biological community
as codon–pair bias and dinucleotide bias. In the text-based
sequential data, we find that the BMC-based improvement algorithm improves performance on downstream natural language processing tasks. The model evaluation tools of the present paper prove informative here: they uncover model violations suggestive of future
methodological expansion. Our findings in the GPS data are
particularly striking. There, a scatter plot of the raw data yields a picture that is difficult to interpret. After clustering, a picture can be displayed that provides significantly more insight; compare Figure 3 and Figure 6. It is further notable that the sequential structure of the data gives rise to geographical features that would have been difficult, if not impossible, to extract based solely on a distance-based notion of similarity, such as that employed in K-means. The S&P500 dataset
gives the least clear conclusions out of the four datasets but is a
good illustrative example for our evaluation tools in a difficult
setting. The difficulty of this dataset is due to the combination
of sparsity and a nuisance factor. With these different examples, along with the evaluation tools developed, we answer the questions posed above in the positive, and we conclude that BMC-based clustering can be successfully applied to real-world sequential data.
Let us finally contemplate the practical implication of our
findings as it pertains to using clustering within sequential data
as a method to speed up a subsequent optimization procedure.
Canonical machine learning algorithms that one would like to apply to sequential data, such as Q-learning or SARSA, slow down to an impractical degree if the observations come from a high-dimensional state space.
By solving such learning problems on an accurate lower-
dimensional representation of the state space, the numerical
complexity can be reduced dramatically [3], [4]. The present paper suggests that future integration of clustering in sequential
data applications such as natural language processing and
machine learning is indeed practically feasible.
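To make this concrete, the following sketch runs tabular Q-learning on a toy chain environment in which the Q-table is indexed by cluster labels rather than raw states, shrinking the table from n states to K clusters. The environment, the contiguous cluster map, and the function name are our own illustrative choices, not a construction from the cited works.

```python
import numpy as np

def q_learning_on_clusters(n_states=6, K=3, episodes=300, max_steps=40,
                           alpha=0.2, gamma=0.95, eps=0.3, seed=0):
    """Tabular Q-learning on a chain MDP, with the Q-table indexed by
    cluster labels instead of raw states (n_states rows -> K rows)."""
    rng = np.random.default_rng(seed)
    sigma = np.arange(n_states) * K // n_states  # contiguous cluster map
    goal = n_states - 1                          # reward at the right end
    Q = np.zeros((K, 2))                         # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = int(rng.integers(n_states - 1))      # random non-goal start
        for _ in range(max_steps):
            c = sigma[s]
            if rng.random() < eps:
                a = int(rng.integers(2))         # explore
            else:
                a = int(np.argmax(Q[c]))         # exploit
            s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            done = s2 == goal
            r = 1.0 if done else 0.0
            # bootstrap from the cluster of the successor state
            target = r + (0.0 if done else gamma * np.max(Q[sigma[s2]]))
            Q[c, a] += alpha * (target - Q[c, a])
            s = s2
            if done:
                break
    return Q, sigma
```

The price of the aggregation is that states sharing a cluster share a value, which is accurate only insofar as the clustering respects the underlying dynamics.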
Structure of this paper
We introduce the problem of clustering in sequential data in
Section 2. We describe the BMC as well as other models that
appear in our experiments in Section 3, and briefly discuss
the advantages of a model-based approach. Next, we give an
overview of related literature in Section 4, and we introduce
the clustering algorithm in Section 5. There, we also describe our C++ implementation of this clustering algorithm, which we made publicly available as a Python library. Section 6 describes
practical tools to evaluate clusters found in datasets in the
absence of knowledge of the underlying ground truth. Section 7 introduces the datasets and explains our preprocessing procedures; Sections 8 and 9 then extensively evaluate the clusters.
Finally, Section 10 concludes with a brief summary.