Detection and Evaluation of
Clusters within Sequential Data
Alexander Van Werde, Albert Senen–Cerda, Gianluca Kosmella, Jaron Sanders
The authors are with the Faculty of Mathematics and Computer Science, Eindhoven University of Technology, Groene Loper 5, 5612 AZ Eindhoven, The Netherlands. E-mail: a.van.werde@tue.nl, a.senen.cerda@tue.nl, g.k.kosmella@tue.nl, jaron.sanders@tue.nl.
Abstract—Motivated by theoretical advancements in dimensionality reduction techniques, we use a recent model, called Block Markov Chains, to conduct a practical study of clustering in real-world sequential data. Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees and can be deployed in sparse data regimes. Despite these favorable theoretical properties, a thorough evaluation of these algorithms in realistic settings has been lacking.

We address this issue and investigate the suitability of these clustering algorithms in exploratory data analysis of real-world sequential data. In particular, our sequential data is derived from human DNA, written text, animal movement data, and financial markets. In order to evaluate the determined clusters, and the associated Block Markov Chain model, we further develop a set of evaluation tools. These tools include benchmarking, spectral noise analysis, and statistical model selection tools. An efficient implementation of the clustering algorithm and the new evaluation tools is made available together with this paper. Practical challenges associated with real-world data are encountered and discussed. It is ultimately found that the Block Markov Chain model assumption, together with the tools developed here, can indeed produce meaningful insights in exploratory data analyses despite the complexity and sparsity of real-world data.
1 Introduction
Modern data often consists of observations that were obtained from some complex process, and that became available sequentially. The specific order in which the observations occur often matters: future observations typically correlate with past observations. By identifying a relation between subsequent observations within sequential data, one may hope to gain insight into the underlying complex process. The high-dimensional nature of modern data can, however, make understanding the sequential structure difficult. Many algorithms that one may want to apply to the data slow down to an infeasible degree in a high-dimensional regime, and human interpretation of the data may become tedious. To avoid these issues it is desirable to identify a latent structure which respects the sequential structure but has reduced dimensions.
In this paper we are concerned with uncovering lower-dimensional structures within sequential data when this latent structure is hidden from the observer but still detectable from the specific sequence of observations. Practical tools for the discovery, evaluation, and presentation of the lower-dimensional structures from real-world sequential data are developed and applied. The term “real-world data” is used here in contrast to synthetic data. For synthetic data one often has prior knowledge concerning a ground-truth latent structure, and this facilitates evaluation of latent structures which are output by an algorithm. Such prior knowledge is not available for real-world data, complicating the evaluation.
We focus on a popular class of methods for discovering latent structure in datasets: clustering algorithms. Clustering algorithms work by clustering together data points from a dataset that are “similar” in some sense. If for instance data has a geometric structure for which a notion of distance is applicable, then one may call two points similar if their distance is small. This distance-based notion of similarity can be leveraged with the well-known $K$-means algorithm for clustering point clouds. If instead the data has a graph structure, then it is natural to call two vertices of the graph similar if they connect to other vertices in similar ways. This connection-based notion is made rigorous in the Stochastic Block Model for random graphs.
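As a concrete illustration of the distance-based notion, the following minimal Python sketch clusters a synthetic point cloud with $K$-means; the data and all names here are purely illustrative and do not come from the datasets studied in this paper.

```python
# Minimal illustration of distance-based clustering on a synthetic point cloud.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs in the plane.
points = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                    rng.normal(3.0, 0.5, size=(50, 2))])

labels = KMeans(n_clusters=2, n_init=10).fit_predict(points)
# Points within the same blob receive the same label (up to label permutation).
```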
A natural notion of similarity for sequential observations, whose order matters, may similarly be given by the following informal criterion: “two observations are similar if and only if they transition into other observations in similar ways.” This transition-based notion is made formal in Block Markov Chains (BMCs). Specifically, the BMC model assumes that the observations are the states of a Markov Chain (MC) in which the state space can be partitioned in such a manner that the transition rate between two states only depends on the parts of the partition (clusters) in which these two states lie.
The problem of clustering the observations in a single sequence of observations of a BMC was recently investigated theoretically [1], [2]. For example, given a sample path generated by a BMC, the authors of [2] provide information-theoretic thresholds below which exact clustering is impossible because insufficient data is available. The authors also provide a clustering algorithm which can provably recover the underlying clusters whenever these conditions are satisfied. The clustering algorithm consists of two steps. First, an initial guess for the underlying cluster structure is found by means of a spectral initialization. Here the term spectral refers to the fact that singular vectors of a random matrix associated to the sample path are employed. These singular vectors are used to construct a low-rank approximation of the random matrix, after which a $K$-means algorithm is applied. Second, the initial guess for the cluster structure is refined by means of an improvement algorithm that reconsiders the sequence of observations and performs a greedy, local maximization of the log-likelihood function of the BMC model.
A broad study on the performance of this clustering algorithm for sequential data obtained from real-world processes was, however, not provided. We tackle this problem by investigating the clustering algorithm under a variety of scenarios, thus demonstrating its applicability to real-world data. This is the main contribution of the present paper. Evaluating the actual performance of such a clustering algorithm in a real-world scenario is a nontrivial task since the latent cluster structure is unknown, as opposed to a scenario where data is generated synthetically. Hence, we also propose a set of tools for evaluating the quality of the clusters and the BMC model. We therefore go a step beyond clustering benchmark evaluation and use statistical tools to assess the validity of the BMC assumption in the data.
Our goal here is not to compare the performance of the BMC clustering algorithm relative to other algorithms. A comparison would only be fair if both algorithms had access to the same amount of training data. Note, however, that the BMC algorithm is explicitly designed for sparse regimes where the amount of data is small, whereas most model-free algorithms, such as those based on deep learning, perform optimally in a regime where one has access to large amounts of training data. A comparison would consequently be vacuous; its outcome would mainly be determined by the choice of the amount of training data. Our goal is rather to supplement the theoretical understanding of the BMC-based algorithm with a practical viewpoint.
In order to practically evaluate the clustering algorithm for BMCs, we investigate the following questions:
• How can the BMC model practically aid in data exploration of nonsynthetic sequential data obtained from real-world processes?
• Can the quality of the found clusters be evaluated when no ground-truth clustering is known?
• How can one statistically decide whether the BMC model is an appropriate model for the sequence of observations?
• How can it be detected that either a simpler model than a BMC would suffice, or a richer model is required?
• Can the algorithm be expected to give meaningful results despite the sparsity and complexity of real-world data?
• Are the clustering algorithms robust to model violations?
To obtain insight into these questions, we develop practical tools for evaluating the merit of a detected clustering, which we then apply to several datasets. These evaluation tools are designed to cope with the aforementioned fact that ground-truth clusters are not known to us. The datasets which we consider are diverse as they come from the fields of microbiology, natural language processing, ethology, and finance. Specifically, we investigate sequences of:
a. Codons in human Deoxyribonucleic Acid (DNA).
b. Words in Wikipedia articles.
c. Global Positioning System (GPS) data describing the movement of bison in a field.
d. Companies in the Standard and Poor’s 500 (S&P500) with the highest daily return.
On each dataset we apply the BMC clustering algorithm to uncover underlying clusters, and we evaluate these clusters using the collection of tools which are developed in the present paper. It is found that the BMC model indeed aids in exploratory data analyses of real-world sequential data. For example, in DNA the algorithm leads us to rediscover phenomena which are known in the biological community as codon-pair bias and dinucleotide bias. In the text-based sequential data we find that the BMC-based improvement algorithm improves performance on downstream natural language processing tasks. The model evaluation tools of the present paper are found to be informative here: they uncover some model violations which are suggestive of future methodological expansion. Our findings in the GPS data are particularly striking. There, a scatter plot of the data gives rise to a picture which is difficult to interpret. After clustering, a picture can be displayed which provides significantly more insight; compare Figure 3 and Figure 6. It is further notable that the sequential structure of the data gives rise to geographical features which would have been difficult, if not impossible, to extract based solely on a distance-based notion of similarity as is employed in $K$-means. The S&P500 dataset gives the least clear conclusions out of the four datasets but is a good illustrative example for our evaluation tools in a difficult setting. The difficulty of this dataset is due to the combination of sparsity and a nuisance factor. With these different examples, along with the evaluation tools developed, we answer the questions posed above in the positive, and we conclude that BMC-based clustering can be successfully applied to real-world sequential data.
Let us finally contemplate the practical implications of our findings as they pertain to using clustering within sequential data as a method to speed up a subsequent optimization procedure. Canonical machine learning algorithms that one would like to apply to sequential data, such as $Q$-learning or SARSA, slow down to an impractical degree if the observations are from a high-dimensional state space. By solving such learning problems on an accurate lower-dimensional representation of the state space, the numerical complexity can be reduced dramatically [3], [4]. The present paper suggests that future integration of clustering in sequential data applications such as natural language processing and machine learning is indeed practically feasible.
Structure of this paper
We introduce the problem of clustering in sequential data in Section 2. We describe the BMC as well as other models that appear in our experiments in Section 3, and briefly discuss the advantages of a model-based approach. Next, we give an overview of related literature in Section 4, and we introduce the clustering algorithm in Section 5. There we also describe our C++ implementation of this clustering algorithm, which we made publicly available as a Python library. Section 6 describes practical tools to evaluate clusters found in datasets in the absence of knowledge of the underlying ground truth. Section 7 introduces the datasets and explains our preprocessing procedures; Sections 8 and 9 then extensively evaluate the clusters. Finally, Section 10 concludes with a brief summary.
2 Problem formulation
We suppose that we have obtained an ordered sequence of $\ell \in \mathbb{N}_+$ discrete observations
$$X_{1:\ell} := X_1 \to X_2 \to \cdots \to X_\ell \tag{1}$$
from some complex process. The observations can be real numbers (one- or higher-dimensional) or abstract system states, as long as the observations come from a finite set. We assume specifically that there exists a number $n \in \mathbb{N}_+$ such that the sequence satisfies $X_t \in [n] := \{1, \ldots, n\}$ for $t \in [\ell]$. Here, $n$ can be interpreted as the number of distinct, discrete observations that are possible.
Given such an ordered sequence of observations, we wonder whether there exists a map $\sigma_n : [n] \to [K]$, with $1 \le K \le n$ an integer, such that the ordered sequence
$$\sigma_n(X_{1:\ell}) := \sigma_n(X_1) \to \sigma_n(X_2) \to \cdots \to \sigma_n(X_\ell) \tag{2}$$
captures dynamics of the underlying complex process. Observe that the map $\sigma_n$ defines $K$ clusters
$$\mathcal{V}_k := \{ i \in [n] \mid \sigma_n(i) = k \} \tag{3}$$
for $k \in [K]$. Furthermore, $\mathcal{V}_k \cap \mathcal{V}_l = \emptyset$ whenever $k \neq l$, and $\cup_{k=1}^{K} \mathcal{V}_k = [n]$.
The clusters $\mathcal{V}_1, \ldots, \mathcal{V}_K$ are particularly interesting when $K \ll n$. In such a case the clustered process $\{\sigma_n(X_t)\}_t$ lives in a much smaller observation space than the original process $\{X_t\}_t$. The reduction may then prove to be beneficial for computational tasks since the time complexity of some algorithms depends on the size of the observation space. If (2) furthermore indeed captures the dynamics of the complex process, then it is not unreasonable to expect that the clusters $\mathcal{V}_k$ could themselves be meaningful, thus allowing for human interpretation of the data.
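As a minimal sketch of the reduction in (2), suppose the map $\sigma_n$ is stored as an integer array indexed by states; the clustered sequence is then obtained by a simple relabeling. The arrays below are hypothetical and only serve to illustrate the mechanics.

```python
import numpy as np

# Hypothetical example with n = 6 observable states and K = 2 clusters.
sigma = np.array([0, 0, 1, 1, 0, 1])   # sigma[i] is the cluster of state i
X = np.array([0, 2, 3, 1, 5, 4, 2])    # an observed sequence X_1, ..., X_ell

clustered = sigma[X]                   # the sequence sigma_n(X_1), ..., sigma_n(X_ell)
print(clustered)                       # [0 1 1 0 1 0 1]
```

The clustered sequence takes values in a space of size $K$ instead of $n$, which is the source of the computational savings discussed above.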
3 Models
3.1 Main model: BMC
The main model in this paper is given by BMCs. Formally, a 1st-order BMC is a discrete-time stochastic process $\{X_t\}_{t \ge 0}$ on a state space $\mathcal{V} := [n]$ that satisfies not only the MC property
$$\mathbb{P}[X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \ldots, X_0 = i_0] = \mathbb{P}[X_{t+1} = j \mid X_t = i] \quad \forall j, i, i_{t-1}, \ldots, i_0 \in [n]; \tag{4}$$
but also that there exists a cluster assignment map $\sigma_n : [n] \to [K]$ such that there exists a stochastic matrix $p \in \mathbb{R}^{K \times K}$ with
$$P_{i,j} := \mathbb{P}[X_{t+1} = j \mid X_t = i] = \frac{p_{\sigma_n(i), \sigma_n(j)}}{\# \mathcal{V}_{\sigma_n(j)}}. \tag{5}$$
Here $\mathcal{V}_k$ is defined as in (3). The $p_{k,l} \in [0,1]$ are called the cluster transition probabilities and satisfy $\sum_{l=1}^{K} p_{k,l} = 1$ for $k \in [K]$. The matrix $(P_{i,j})_{i=1,j=1}^{n}$ is called the state transition matrix. Figure 1 depicts a BMC with $K = 3$ clusters.
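To make the definition concrete, the following sketch builds the state transition matrix $P$ from a cluster transition matrix $p$ and an assignment $\sigma_n$ as prescribed by (5), and samples a path from it. The cluster assignment below is an arbitrary illustrative choice; only $p$ is taken from Figure 1.

```python
import numpy as np

def bmc_transition_matrix(p, sigma):
    """State transition matrix from (5): P_ij = p[sigma(i), sigma(j)] / #V_{sigma(j)}."""
    sigma = np.asarray(sigma)
    cluster_sizes = np.bincount(sigma)                 # #V_k for each cluster k
    return p[np.ix_(sigma, sigma)] / cluster_sizes[sigma]

def sample_path(P, length, seed=0):
    """Sample X_1, ..., X_length from the Markov chain with transition matrix P."""
    rng = np.random.default_rng(seed)
    path = np.empty(length, dtype=int)
    path[0] = rng.integers(P.shape[0])
    for t in range(1, length):
        path[t] = rng.choice(P.shape[0], p=P[path[t - 1]])
    return path

p = np.array([[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.3, 0.7, 0.0]])  # as in Figure 1
sigma = np.array([0, 0, 1, 1, 2, 2])                               # n = 6, K = 3
P = bmc_transition_matrix(p, sigma)
assert np.allclose(P.sum(axis=1), 1.0)                             # P is stochastic
X = sample_path(P, length=1000)
```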
The BMC model can be viewed as an ideal case for the setup of (2): the reduced process $\{\sigma_n(X_t)\}_t$ captures not merely some part of the dynamics of the true process, but rather all of the order-dependent dynamics. Indeed, observe that for any $t > 1$ it holds that, conditional on $\sigma_n(X_t) = k$, the observation $X_t$ is chosen uniformly at random in the cluster $\mathcal{V}_k$. The previous state $X_{t-1}$ hence influences the next cluster $\sigma_n(X_t)$ but does not provide any further information about the precise element in $\mathcal{V}_{\sigma_n(X_t)}$.
If the MC associated to $p$ is ergodic, then the BMC has a unique state equilibrium distribution $\Pi \in [0,1]^n$. This state equilibrium distribution moreover has the symmetry property that $\Pi_j$ only depends on the cluster assignment $\sigma_n(j)$ for any $j \in [n]$. That is, for any initial value $i_0 \in [n]$,
$$\Pi_j := \lim_{t \to \infty} \mathbb{P}[X_t = j \mid X_0 = i_0] = \frac{1}{\# \mathcal{V}_{\sigma_n(j)}} \lim_{t \to \infty} \mathbb{P}[\sigma_n(X_t) = \sigma_n(j) \mid \sigma_n(X_0) = \sigma_n(i_0)] =: \frac{\pi_{\sigma_n(j)}}{\# \mathcal{V}_{\sigma_n(j)}}. \tag{6}$$
Note that $\pi \in [0,1]^K$ is here the cluster equilibrium distribution of the MC on $[K]$ which is associated to the cluster transition matrix $p$. One can characterize $\pi$ as the unique vector satisfying $\pi^{\top} p = \pi^{\top}$ and $\sum_{k=1}^{K} \pi_k = 1$.
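As a sketch of this computation, $\pi$ can be obtained numerically as the left Perron eigenvector of $p$, after which $\Pi$ follows from (6); the code below continues the illustrative example from above.

```python
import numpy as np

def cluster_equilibrium(p):
    """The vector pi with pi^T p = pi^T and entries summing to one."""
    eigvals, eigvecs = np.linalg.eig(p.T)
    pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])  # eigenvalue 1 is largest
    return pi / pi.sum()

p = np.array([[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.3, 0.7, 0.0]])
sigma = np.array([0, 0, 1, 1, 2, 2])

pi = cluster_equilibrium(p)                  # cluster equilibrium distribution
Pi = pi[sigma] / np.bincount(sigma)[sigma]   # (6): Pi_j = pi_{sigma(j)} / #V_{sigma(j)}
assert np.isclose(Pi.sum(), 1.0)
```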
Fig. 1. A visualization of a BMC with $K = 3$ clusters and $p = [[0.9, 0.1, 0], [0, 0.1, 0.9], [0.3, 0.7, 0]]$. The thick arrows visualize the cluster transition probabilities $p_{k,l}$, while the thin arrows visualize the transitions of a sample path $\{X_t\}_t$. Figure courtesy of [5].
3.2 Other models for experimentation
Recall that one of our goals is to develop tools which aid in evaluating whether the BMC model is an appropriate model. In this setting it is oftentimes useful to have some alternative models to compare with. The models that we have used are collected here for easy reference.
3.2.1 0th-order BMCs
Let $K \in [n]$ and consider an arbitrary probability distribution $\eta : [K] \to [0,1]$. A 0th-order BMC is then a BMC with cluster transition matrix $p_{k,l} := \eta_l$ for all $k, l \in [K]$. The 0th-order BMC will serve as a benchmark to assert whether the structures we find are actually due to the sequential nature of the process and do not admit a simpler explanation.

Namely, observe that in a 0th-order BMC each next sample $X_{t+1}$ is independent of the previous sample $X_t$. A 0th-order BMC therefore generates a sequence of independent and identically distributed random variables. This is contrary to a 1st-order BMC, which generates a sequence of dependent random variables. The probability of a specific observation does depend on the cluster of the observation, and specifically is identical for every observation within that cluster.
3.2.2 rth-order MCs
Conversely, one can also consider models with higher-order dependencies than the 1st-order BMC has.

Consider a discrete-time stochastic process $\{Y_t\}_{t=1}^{\ell}$ (not necessarily a MC) that satisfies $Y_t \in [n]$ for some $n \in \mathbb{N}_+$. We say that $\{Y_t\}_{t \ge 1}$ is an $r$th-order MC if and only if for all $t \in [\ell - r]$, all $\mathbf{i}_r = (i_1, \ldots, i_r) \in [n]^r$ and $j \in [n]$,
$$\mathbb{P}[Y_{t+1} = j \mid Y_t = i_r, Y_{t-1} = i_{r-1}, \ldots, Y_{t-r+1} = i_1, Y_{t-r} = s_{t-r}, \ldots, Y_1 = s_1] = \mathbb{P}[Y_{t+1} = j \mid Y_t = i_r, Y_{t-1} = i_{r-1}, \ldots, Y_{t-r+1} = i_1] =: P^r_{\mathbf{i}_r, j} \tag{7}$$
for some transition matrix $P^r \in [0,1]^{n^r \times n}$.
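For experimentation it is useful to be able to estimate such a higher-order kernel from data. A minimal sketch, assuming the sequence is given as a list of integers in $[n]$, counts length-$(r+1)$ windows; the example sequence below is hypothetical.

```python
import numpy as np
from collections import defaultdict

def empirical_rth_order_kernel(X, n, r):
    """Estimate P^r_{i_r, j} by counting length-(r + 1) windows of the sequence X."""
    counts = defaultdict(lambda: np.zeros(n))
    for t in range(len(X) - r):
        context = tuple(X[t:t + r])        # the r most recent observations
        counts[context][X[t + r]] += 1.0   # followed by the next observation
    return {c: v / v.sum() for c, v in counts.items()}

X = [0, 1, 2, 0, 1, 2, 0, 1, 0, 1, 2]
P2 = empirical_rth_order_kernel(X, n=3, r=2)
print(P2[(0, 1)])                          # estimated row of P^2 for context (0, 1)
```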
3.2.3 Perturbed BMCs
Finally, we consider an alternative model which concerns the scenario where a BMC captures the dynamics only partially. Specifically, a perturbed BMC mixes a 1st-order BMC on $[n]$ that has transition matrix $P_{\mathrm{BMC}}$ with a generic 1st-order MC on $[n]$ that has transition matrix $Q$, by consideration of the MC with transition matrix
$$P_{\mathrm{Perturbed}} := (1 - \varepsilon) P_{\mathrm{BMC}} + \varepsilon Q. \tag{8}$$
The parameter $\varepsilon \in [0,1]$ measures how many transitions are affected by the non-BMC part $Q$.

Concretely: let $\{B_t\}_{t \ge 0}$ denote a sequence of independent, identically distributed Bernoulli random variables, each taking the value 1 with probability $\varepsilon$. The perturbed BMC corresponds to the MC $\{X^{\varepsilon}_t\}_{t \ge 0}$ whose conditional transition probabilities are given by
$$\mathbb{P}[X^{\varepsilon}_{t+1} = j \mid X^{\varepsilon}_t = i, B_t = b] = \begin{cases} P_{\mathrm{BMC},ij} & \text{if } b = 0, \\ Q_{ij} & \text{otherwise.} \end{cases} \tag{9}$$
In other words, a sequence $X^{\varepsilon}_0 \to \cdots \to X^{\varepsilon}_\ell$ from the perturbed BMC is generated by randomly selecting either the transition matrix $P_{\mathrm{BMC}}$ of a 1st-order BMC, or the transition matrix $Q$ of some other 1st-order MC, for each transition. Whenever we use a perturbed BMC, we specify $Q$ on the spot.
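A minimal sketch of this construction is given below. The BMC kernel is built as in (5); the perturbation kernel, here a random stochastic matrix, is merely a stand-in for whatever $Q$ is specified in a given experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.array([0, 0, 1, 1, 2, 2])                          # illustrative assignment
p = np.array([[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.3, 0.7, 0.0]])
P_bmc = p[np.ix_(sigma, sigma)] / np.bincount(sigma)[sigma]   # BMC kernel as in (5)

Q = rng.dirichlet(np.ones(6), size=6)    # stand-in generic row-stochastic matrix
eps = 0.05
P_perturbed = (1 - eps) * P_bmc + eps * Q                     # the mixed kernel (8)

def perturbed_step(i, rng=rng):
    """One transition as in (9): a Bernoulli(eps) coin selects the kernel."""
    kernel = Q if rng.random() < eps else P_bmc
    return rng.choice(6, p=kernel[i])
```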
3.3 Concerning model misspecification
It is unlikely that the complex process $\{X_t\}_t$ in (1) is exactly a BMC. One may consequently wonder about the dangers of model misspecification:
(a) Is the clustering algorithm robust to violations of the model assumption?
(b) When concerned with executing a downstream task on $X_{1:\ell}$, does the BMC model assumption provide any benefit when compared to models with fewer assumptions?
In this regard we would like to point out that the data which we consider is not only complex but oftentimes also sparse. Let us illustrate the principle by a numerical experiment whose precise setup may be found in Supplement 12.

To model a violation of the model assumptions while retaining a sensible notion of ground-truth communities, we considered the perturbed BMC model as defined in Section 3.2. Concerning (a), we find that for small perturbation levels $\varepsilon$ it is still possible to exactly recover the underlying clusters based on the construction of the algorithm; see Figure 2(a). Concerning (b), we consider the scenario where the goal is to estimate the underlying transition kernel $P$ of a Markovian process based on a sample path of length $\ell$; see Figure 2(b). We find that clustering worsens performance when $\ell$ is large because of a lack of expressivity: the true kernel $P$ is not exactly a BMC kernel. On the other hand, when $\ell$ is small, the clustering improves performance. This is because the reduction in the number of parameters makes the algorithm less prone to overfitting. The answer to (b) is thus that it can be advantageous to rely on the BMC model assumption when data is sparse.
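To clarify why a BMC-based estimator can win in the sparse regime, the sketch below contrasts the unconstrained maximum likelihood estimator with a BMC estimator that pools transition counts per cluster pair before spreading them out as in (5). This is a simplified illustration that assumes the clustering $\sigma_n$ is already known and that no row of counts is empty; it is not necessarily the exact estimator used in the experiment.

```python
import numpy as np

def empirical_kernel_estimate(N):
    """Unconstrained MLE: row-normalized counts, of the order of n^2 parameters."""
    return N / N.sum(axis=1, keepdims=True)

def bmc_kernel_estimate(N, sigma, K):
    """BMC estimator: aggregate counts per cluster pair, then spread as in (5)."""
    Z = np.eye(K)[sigma]                          # n x K one-hot cluster membership
    M = Z.T @ N @ Z                               # K x K counts per cluster pair
    p_hat = M / M.sum(axis=1, keepdims=True)      # of the order of K^2 parameters
    return p_hat[np.ix_(sigma, sigma)] / np.bincount(sigma, minlength=K)[sigma]
```

Because the pooled estimator has of the order of $K^2$ parameters instead of $n^2$, its variance is far smaller when $\ell$ is small, at the price of a bias whenever the true kernel is not exactly of the form (5); this is precisely the trade-off visible in Figure 2(b).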
4 Related literature
This section provides references that theoretically study the problem of clustering in BMCs and MCs, which have taken inspiration from references on the problem of community detection in random graphs. We also provide related references on clustering and time-series, as well as on state space reduction in decision-theoretic problems. Finally, we give references on statistical tools that we employ.
Clustering in BMCs and MCs
Cluster detection in BMCs was studied theoretically in [1], [2]. This research has yielded an algorithm that provably detects clusters from the shortest sample paths whenever possible [2]. As evident from the proofs, studying the spectrum of a BMC can yield sharp bounds for recoverability [2]. Spectral properties of the random matrices constructed from sample paths of BMCs were investigated further in [6], [5]. The more recent paper [5] proves convergence of a bulk of singular values to a limiting distribution in the dense regime $\ell = \Theta(n^2)$; a result that we use and refine in our experiments (see Section 6.4).
Other related clustering algorithms that also use spectral decompositions to learn low-rank structures from trajectories of MCs are studied in [7], [8]. Finally, for scenarios in which observations of a dynamical system are switched by a MC with low-rank structure, [9] provides a method for the problem of recovering a latent transition model. A similar objective is considered in [10] for long sample paths.
Community detection in random graphs
Community detection in random graphs, such as those produced by the Stochastic Block Model, is an active area of research. The distinction with clustering in BMCs is that the vertices within a single observation of a random graph are clustered, instead of the observations within sequential data. We refer the reader to [11] for an extensive overview of cluster recovery within the context of the Stochastic Block Model, and to [12] for an overview of community detection in graphs.
Other clustering of sequential data
In the reviews [13], [14], research that relates to both clustering and time-series/sequential data is divided into three categories. Clustering between different time-series is called “whole-time-series clustering”; survey papers include [14], [15]. Another category is clustering of subsequences of a time-series, where individual time-series are extracted via a sliding window. In 2003 it was shown that the algorithms available at that time extracted essentially random and therefore meaningless clusters, because they relied on assumptions unlikely to be met by non-synthetic data [16]. The papers [17], [18] are contributions seeking to obtain meaningful subsequence time-series clustering. Finally, there is clustering of states within a time-series, called “time-point clustering,” under which fall, for example, problems like segmenting an $n$-element sequence into $k$ segments which can come from $h$ different sources; see e.g. [19]. Other examples referenced in this category are [20], [21]. This category is closest to the clustering algorithm that we employ.
[Figure 2: two panels. (a) Expected misclassification ratio $\mathbb{E}[\mathcal{E}]$ versus perturbation level $\varepsilon$; legend: Stochastic Uniform, Degree, Heavy-Tailed, Sparse. (b) Expected estimation error $R(\ell)$ versus sample path length $\ell$; legend: Empirical, BMC, Uniform.]
Fig. 2. (a) The expected proportion of states which are misclassified, in terms of the perturbation level $\varepsilon$, for four different perturbation models and a sample path of length $\ell_n = \lfloor 30 n \ln(n) \rfloor$ with a state space of size $n = 500$. (b) Estimated expected estimation error $R(\ell) := \mathbb{E}[\| P - \hat{P}(\ell) \|]$ for three different estimators. The ground-truth model in this experiment was a perturbed BMC with a heavy-tailed perturbation of strength $\varepsilon = 0.05$ on a state space of size $n = 1000$. In red: the empirical estimator $\hat{P}_{\mathrm{Empirical}}$, which is the maximum likelihood estimator for a Markov chain with no additional assumptions. In blue: the BMC estimator $\hat{P}_{\mathrm{BMC}}$. In green: the trivial estimator $\hat{P}_{\mathrm{Uniform},ij} := 1/n$, which does not even use the data. Let us emphasize that the precise values in (a) and (b) depend on the precise parameters of the BMC which was perturbed. The general shape of the curves is expected to generalize, but the precise numbers, e.g. that a perturbation up to $\varepsilon \approx 0.1$ is tolerable, will depend on the specifics.
State space reduction in decision-theoretic problems
Studying clustering in MCs is motivated by the necessity for effective state space reduction techniques in decision-theoretic problems, for example in Reinforcement Learning, Markov Decision Processes, and Multi-Armed Bandit problems. It is for example known that learning a latent space reduces regret in Multi-Armed Bandit problems [22], [23]. State aggregation and low-rank approximation methods have been studied for Markov Decision Processes as well as Reinforcement Learning; see [24] and [25], [26], [27], respectively. The idea to cluster states in Reinforcement Learning based on the process’ trajectory was first explored in the seminal papers [3], [4].
A few related experiments from the fields of microbiology, natural language processing, ethology, and finance
The first application of a MC model was by Andrei Markov, who investigated a sequence of 20000 letters in A. S. Pushkin’s poem “Eugeny Onegin” [28]. Since then, MCs and hidden Markov models have been used in natural language processing [29].

Clustering in DNA, specifically clustering the sequence of nucleotides or the sequence of codons as a MC, has been demonstrated [30], [31], [32]; the current paper is however the first time that a BMC is used for this task.
Using similar means as for the animal movement data in this paper, GPS coordinate sequences for New York City taxi trips are investigated in [1], [8], [5]. In both examples the low-dimensional representation reveals insight into taxi customer and animal movement behavior, respectively. The taxi trip data is however of a quite different nature compared to the animal movement data, because in the taxi trip data far-away entrance and drop-off locations can constitute a transition, while in the animal movement data the transitions between states are only due to an animal moving from one area to another in time intervals of roughly no more than an hour.
In [33] the transitions between the Dow Jones closing prices are described as a MC close to equilibrium. Further references for MC models in finance include [34], [35]. Other Markovian models, like Hidden Markov Models, are also often used in the analysis of financial data; see for example [36].
Related statistical tools
Based on a likelihood ratio test statistic and a chi-square test statistic, [37] tests if two MCs on the same state space have the same transition matrix. The same problem is considered in [38] by means of a divergence test statistic. Further results are discussed in [39, §3.4.2].

Testing for the order of a MC can be done with the chi-square and likelihood ratio test statistics [37] and using $\phi$-divergence test statistics [40]. Further hypothesis tests for Markovian models with random time observations are considered in [41], and specifically a goodness-of-fit test on the distribution is described. For selection of the order of a MC, general information criteria may also be used [42], [43], [44].
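As an illustration of how such an order test can be carried out in practice, the sketch below implements the classical likelihood-ratio test of a 0th-order against a 1st-order chain from a matrix of transition counts. This is a textbook construction offered for orientation; it is not necessarily the exact statistic of the cited references.

```python
import numpy as np
from scipy.stats import chi2

def lr_test_order_0_vs_1(N):
    """Likelihood-ratio test of H0: 0th order vs H1: 1st order, from counts N.

    Under H0 the statistic G is asymptotically chi-square distributed with
    (n - 1)^2 degrees of freedom (Anderson & Goodman, 1957).
    """
    n = N.shape[0]
    P_hat = N / N.sum(axis=1, keepdims=True)   # 1st-order MLE, per-row normalization
    q_hat = N.sum(axis=0) / N.sum()            # 0th-order MLE, column marginal
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = N * np.log(P_hat / q_hat)      # zero counts contribute nothing
    G = 2.0 * np.nansum(terms)
    return G, chi2.sf(G, df=(n - 1) ** 2)      # statistic and asymptotic p-value
```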
5 Clustering algorithm
In this section we describe the clustering algorithm from [2], which was designed to infer the map $\sigma_n$ from the sample path of a BMC. The reason we use this particular clustering algorithm is that it has a mathematical guarantee that it can recover the clusters of BMCs accurately even if the number of observations $\ell$ is small compared to the number of possible transitions $n^2$. This is useful for our purposes because observations are generally noisy and few in practice.
The clustering algorithm in [2] first constructs an empirical frequency matrix $\hat{N}$ element-wise from the sequence of observations $X_{1:\ell}$: for $i, j \in [n]$,
$$\hat{N}_{ij} := \sum_{t=1}^{\ell - 1} \mathbb{1}[X_t = i, X_{t+1} = j]. \tag{10}$$
Depending on the sparsity of the frequency matrix, characterized by the ratio $\ell / n^2$, regularization is applied by trimming: all entries of rows and columns of $\hat{N}$ corresponding to a desired number of states with the largest degrees, which we denote by $\Gamma$, are set to zero. The clustering algorithm then executes two steps on the resulting trimmed frequency matrix $\hat{N}_{\Gamma}$:

Step 1. Initialize with a spectral clustering algorithm.
Step 2. Iterate with a cluster improvement algorithm.

We provide pseudocode for these algorithms in Supplement 11.
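To fix ideas, the following sketch shows what Step 1 might look like in plain Python: the frequency matrix of (10), the trimming operation, and a spectral initialization via a rank-$K$ approximation followed by $K$-means on its rows. This is a simplified illustration; the full algorithm, including the precise trimming rule and the improvement step, is provided by the implementation released with this paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def frequency_matrix(X, n):
    """The empirical frequency matrix N_hat of (10)."""
    N = np.zeros((n, n))
    np.add.at(N, (X[:-1], X[1:]), 1.0)       # count each observed transition
    return N

def trim(N, Gamma):
    """Zero the rows and columns of the Gamma states with the largest degrees."""
    N = N.copy()
    if Gamma > 0:
        heavy = np.argsort(N.sum(axis=0) + N.sum(axis=1))[-Gamma:]
        N[heavy, :] = 0.0
        N[:, heavy] = 0.0
    return N

def spectral_init(N, K, seed=0):
    """Cluster the rows of the best rank-K approximation of N with K-means."""
    U, s, Vt = np.linalg.svd(N, full_matrices=False)
    N_K = (U[:, :K] * s[:K]) @ Vt[:K]        # best rank-K approximation of N
    return KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(N_K)

# Usage on a sequence X (a NumPy integer array with values in {0, ..., n - 1}):
# sigma_hat = spectral_init(trim(frequency_matrix(X, n), Gamma), K)
```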
Given some initial guess, here provided by a spectral algorithm, the cluster improvement algorithm consists of local updates that greedily increase the log-likelihood function of the BMC model.