Estimation of the Order of Non-Parametric Hidden Markov Models using the Singular Values of an Integral Operator Marie Du Roy de Chaumaray marie.du-roy-de-chaumarayuniv-rennes2.fr

2025-04-29 0 0 619.49KB 31 页 10玖币

侵权投诉

Estimation of the Order of Non-Parametric Hidden Markov

Models using the Singular Values of an Integral Operator

Marie Du Roy de Chaumaray marie.du-roy-de-chaumaray@univ-rennes2.fr

Mathematical Research Institute of Rennes IRMAR

Rennes University, Rennes, France

Salima El Kolei salima.el-kolei@ensai.fr

Univ. Rennes, Ensai, CNRS, CREST

UMR 9194, F-35000 Rennes, France

Marie-Pierre Etienne marie-pierre.etienne@institut-agro.fr

Mathematical Research Institute of Rennes IRMAR

Rennes University, Rennes, France

Matthieu Marbac matthieu.marbac-lourdelle@ensai.fr

Univ. Rennes, Ensai, CNRS, CREST

UMR 9194, F-35000 Rennes, France

Abstract

Interested in estimating the order of a ﬁnite-state Hidden Markov Model (HMM) with nonpara-

metric emission distributions, a new method that only requires full rank transition matrix and not

linearly dependent emission distributions is introduced. This method relies on the equality between

the order of the HMM and the rank of a speciﬁc integral operator. Since only the empirical counter-

part of the singular values of the operator can be obtained, a thresholding procedure is proposed.

At a non-asymptotic level, an upper-bound on the probability of overestimating the order of the

HMM is provided. At an asymptotic level, the consistency of the estimator is established. In addi-

tion a general heuristic that can be successfully applied to several problems in spectral analysis for

designing a data-driven procedure for the threshold is introduced. The approach has the advantage

of not requiring any knowledge of an upper-bound on the order of the HMM. Moreover, diﬀerent

types of data (including circular or mixed-type data) can be managed. The relevance of the ap-

proach is illustrated on numerical experiments and on real data considering multivariate data with

directional variables.

Keywords: Hidden Markov models, Latent state model, Model selection, Non-parametric esti-

mation.

1 Introduction

A discrete-time homogeneous hidden Markov model (HMM) deﬁnes the distribution of an observed

process (Yt)t∈Nand a latent process (Xt)t∈Nsuch that the sequence of unobserved states (Xt)t∈N

follows a Markov chain and the observations (Yt)t∈Nare independent given the state sequence

(Xt)t∈N.The conditional distribution of Yt, called emission distribution, only depend on the current

state Xt. This paper focuses on ﬁnite state HMMs where the latent process has a ﬁnite state

space {1, . . . , L}, the integer Lbeing called the order of the HMM. In this framework, the model

is completely described by the order L, the initial distribution and the transition matrix of the

hidden chain, and the emission distributions. Since the marginal distribution of each Ytis a ﬁnite

mixture model, ﬁnite state HMMs can be seen as an extension of ﬁnite mixture models where the

©?? Marie Du Roy de Chaumaray, Salima El Kolei, Marie-Pierre Etienne and Matthieu Marbac.

License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/.

arXiv:2210.03559v2 [math.ST] 27 Nov 2023

assumption of independence between observations is relaxed (i.e., Ytand Yt′are not independent).

HMMs are a popular tool for modeling the dependency structure for univariate and multivariate

processes driven by a latent Markov chain (see Juang and Rabiner (1991); Yang et al. (1995); Krogh

et al. (2001); Choo et al. (2004); Zucchini and MacDonald (2009) for examples of applications).

They also provide tractable models for circular time series widely used in biology, meteorology and

climate applications to model for instance the speed and the direction of wind, ocean current, or

animal movements (see Holzmann et al. (2006); Bulla et al. (2012); Mastrantonio and Calise (2016)).

Inferring the right order of the latent chain is an important issue, which precedes the estimation of

the model parameters and their interpretation. This paper focuses on the estimation of the order

Lfrom univariate and multivariate data (Yt)t∈Nin a non-parametric setting. The estimation of L

is indeed achieved without any parametric assumption on the emission distributions, since we only

require the linear independence between their probability distribution functions.

Initial developments on HMMs have been made in a parametric framework, which considers

that the emission distributions belong to some given parametric distribution family. Consider-

ing the order of the HMM as known, the existing literature provides the parameter identiﬁability

(Petrie, 1969), the algorithm for assessing the maximum likelihood estimator (MLE; Baum et al.

(1970)), the consistency of the MLE (Leroux, 1992) and its asymptotic normality (Bickel et al.,

1998). The identiﬁcation of the order is more challenging and represents a diﬃcult task, mainly

because of a loss of identiﬁability of the model parameters when the order is overestimated. The

standard assumptions used to control the likelihood ratio test statistics, are thus not satisﬁed when

the order is overestimated. For instance, Gassiat and Keribin (2000) show that this statistic can

diverge even for bounded parameters. Note that this issue already appears when estimating the

number of components in parametric ﬁnite mixture models (Ciuperca, 2002). Therefore, the order

of parametric HMMs can be estimated by homogeneous tests (Holzmann and Schwaiger, 2016),

penalized likelihood approaches (Volant et al., 2014) or cross-validation approaches (Celeux and

Durand, 2008). Using tools from information theory, Gassiat and Boucheron (2003) have shown

the strong consistency of the estimator of the order obtained by penalized maximum likelihood.

Moreover, Bayesian approaches can be used by penalizing the likelihood and thus avoiding the

issues due to the lack of identiﬁability of the parameters when the order is overestimated (Gassiat

and Rousseau, 2014). Alternatively, Robert et al. (2000) propose a Bayesian inference of the order

Lthrough a reversible jump Markov Chain Monte Carlo method (MCMC). In Chopin (2007), the

author also proposes a Bayesian strategy based on sequential Monte Carlo ﬁlter and MCMC. How-

ever, all these approaches consider parametric emission distributions but it is not always possible

to restrict the model to such a convenient ﬁnite-dimensional space. Moreover they provide biased

results when their parametric assumptions are violated. In such cases, non-parametric approaches

can be used to model the emission distributions.

Non-parametric HMMs have been proved to be useful in a wide range of applications (see Zhao

(2011) for ﬁnancial applications, Couvreur and Couvreur (2000) for voice activity detection, Lam-

bert et al. (2003) for climate state identiﬁcation and Yau et al. (2011) for genomic applications).

Nevertheless, identiﬁability of the parameters of ﬁnite state HMMs with non-parametric emission

distributions has been investigated recently. Gassiat and Rousseau (2016) consider the case of

translation HMMs. They show that all the model parameters (including the inﬁnite dimensional

parameters) are identiﬁable as soon as the matrix that deﬁnes the joint distribution of two consec-

utive latent variables, is non-singular and the translation parameters are distinct. Note that their

conditions are weaker than those used to obtain identiﬁability for location-scale mixture models. In-

deed, for the latter, constraints must be added such as considering symmetric distributions (Hunter

et al., 2007). This additional assumption is no longer required for translation HMMs because of

the dependency between a pair of consecutive observations. Based on the results of parameter

identiﬁability for a mixture of products of univariate distributions (Allman et al., 2009), Gassiat

et al. (2016) state weaker suﬃcient conditions for parameter identiﬁability since they consider a

full rank transition matrix of the latent chain and linearly independent emission probability dis-

tributions. The method introduced by the present paper for estimating the order of an HMM, is

developed under these assumptions. Note that the assumptions made on the emission distributions

have been weakened again by Alexandrovich et al. (2016) since they only require that the emission

distributions are diﬀerent.

To estimate the (ﬁnite and inﬁnite dimensional) parameters of non-parametric HMMs, kernel-

based (Bonhomme et al., 2016b) or wavelet-based (Jin and Mokhtarian, 2006) approaches can be

used. Alternatively, Bonhomme et al. (2016b) and De Castro et al. (2017) extended the spectral

method proposed by Hsu et al. (2012) for estimating parametric HMMs, in order to deal with

a non-parametric framework. However, all these methods are developed for a known order of

the HMM. Estimating the order of a generic non-parametric HMM is still a challenging problem

and to the best of our knowledge Leh´ericy (2019) is the only paper to consider this problem in

this non-parametric setting. The author proposes two methods that provide strongly consistent

estimators of the order of the HMM. The ﬁrst method considers a minimization of a penalized

least-square criterion that relies on a projection of the emission distributions onto a family of

nested parametric subspaces. For each subspace and each number of latent states, the criterion

used for model selection is computed by minimizing the empirical counterpart of the penalized

L2distance. Thus, the method provides an estimator of the order of the HMM together with

estimators of the emission distributions. The second method uses an estimator of the rank of

a matrix computed from the distribution of a pair of consecutive observations. More precisely,

this method relies on a spectral approach applied on the matrix containing the coordinates of

the density of a pair of consecutive observations in some orthonormal basis. Thus, this method

could be seen as an extension of the spectral method described in the Section 5 of Supplementary

material of Bonhomme et al. (2016a) to HMM. These two methods are complementary in practice.

Indeed, numerical experiments presented in Leh´ericy (2019) show that the penalized least-square

method is more eﬃcient for moderate sample sizes. Indeed, the non-convex criterion raises many

problems for the minimization, in practice. To overcome this diﬃculty, the author proposes to use

an approximate minimization algorithm (see Hansen and Auger (2011)) that requires a good initial

condition since it might otherwise remain trapped in a local minima which renders this approach

time-consuming for multivariate data and large sample. Furthermore, considering all the subspaces

and all the possible numbers of latent states makes this method computationally greedy. Therefore,

the spectral method should be considered for large sample sizes. Both methods involve an unknown

tuning parameter (i.e., constant in the penalty term of the penalized criterion and threshold for

the spectral method) but also choices of the subspaces (i.e., family of nested parametric subspaces

or the orthonormal basis) that can highly impact the results (see our numerical experiments).

This paper introduces a new simple method for selecting the order of a non-parametric HMM,

by using the rank of an integral operator relying on the distribution of a pair of consecutive obser-

vations. This approach is inspired by the one proposed in Kwon and Mbakop (2021) to estimate

the number of components in nonparametric i.i.d. mixture models but we go further in adapting

this framework for dependent and latent observations leading to new and diﬀerent theoretical re-

sults. Furthermore, we propose a more general heuristics for designing a data-driven procedure

which can be successfully applied to several problems in spectral analysis and give theoretical guar-

antees. The interest of this approach from integral operators lies in the fact that unlike most of

the spectral methods based on noisy matrices (Bonhomme et al., 2016b; De Castro et al., 2017;

Leh´ericy, 2019), the method does not require any choices of a functional basis or its number of

elements. Hence, the proposed method does not require any knowledge of an upper bound of the

order of the HMM. Moreover, diﬀerent types of data (including circular or mixed-type data) can

be managed. Since the distribution of the pair of consecutive observations is estimated with kernel

method, only the empirical counter-part of the singular values of the operator can be obtained, we

propose to use our new data-driven method for the thresholding procedure. At a non-asymptotic

level, an upper-bound on the probability of overestimating the order of the HMM is provided. At

an asymptotic level, the consistency of the estimator is established. The control at non-asymptotic

and asymptotic levels are obtained by a concentration inequality of the Hilbert-Schmidt norm of

the empirical version of the operator. The statistical tools needed to establish these results diﬀer

from those used in Kwon and Mbakop (2021), and consequently the results are diﬀerent. Thus,

using concentration results speciﬁc to Markov chains, a concentration inequality is obtained by

considering a sum of two terms, where one term does not depend on the bandwidth and the second

term does not depend on the probability of overestimating the order. Note that the bound obtained

in the i.i.d. context considers a product between the bandwidth, the probability of overestimating

the order and the sample size. This bound contains only terms that depends on the kernel and the

bandwidth. Contrary to this setting and because of the dependency of observations the concentra-

tion inequality that we obtain depends on some unknown constant of the HMM (e.g., the mixing

time). To circumvent this issue and practical convenience, we propose a data-driven procedure

based on an unsupervised classiﬁcation of the singular values of the operator and computed on

mini-batches, for estimating the constant in the concentration inequality. Note that in Leh´ericy

(2019) the model selection for the spectral method is also based on a thresholding rule applied on

the singular values whose choice is a delicate issue since it depends on the functional basis and on

the number of elements. Hence, in his paper the author proposes an empirical method based on

slope heuristic for the practical application. However, this approach requires an additional tuning

parameter that states the number of singular values used to apply the slope heuristic. In theory,

for the spectral methods to work, the rank of the spectral matrix needs to be equal to the order

of the chain. Thus, it is necessary that the number of elements of the orthonormal basis tends to

inﬁnity, otherwise we only obtain an estimator of an upper-bound of the order. However, deﬁning

the thresholding rule for the case of increasing number of basis elements is still an open problem

for the spectral methods. Indeed, for instance, the rank study performed in Kleibergen and Paap

(2006) should be extended to matrices with increasing dimension (but ﬁxed rank). Thus, in prac-

tice, the number of basis elements is set a priori. This number corresponds to an upper-bound on

the order of the HMM. To the best of our knowledge, since the proposed method avoids the use

of functional basis, it is the ﬁrst method which does not make assumptions on an upper-bound of

the order to be estimated. Numerical studies illustrate the relevance of this proposal and show also

that this new data-driven procedure guarantees good results for our estimator, but also improves

the spectral results of Leh´ericy (2019).

This paper is organized as follows. Section 2 introduces the speciﬁc integral operator. Section 3

presents the ﬁnite-sample size and the asymptotic properties of the estimator (including its consis-

tency). Section 4 describes the new data-driven procedure with a theoretical justiﬁcation. Section 5

is devoted to the computational aspects of the methods. Section 6 illustrates the consistency of the

estimator on simulated data and shows the relevance of the proposed method on benchmark data

(including circular data). Section 7 shows the contribution of our approach on one real-life data

set. Section 8 gives a conclusion and all the proofs are given in Appendix.

2 Order of a HMM and rank of integral operators

2.1 Hidden Markov model

Let Y= (Y⊤

1,...,Y⊤

n+1)⊤be a stationary sequence of random vectors Yt, where Yt∈Rdfollows

a ﬁnite state hidden Markov model (HMM) with Llatent states. This model assumes that there

exists a stationary Markov chain X= (X1, . . . , Xn+1)⊤that is unobserved, where Xt∈ {1, . . . , L}.

Moreover, conditionally on X, the Yt’s are independent and their distribution only depends on

the current state Xt. The Markov chain is deﬁned by a full rank transition matrix Ahaving

π= (π1, . . . , πL)⊤as stationary distribution. Finally, the densities of the emission distributions

f1, . . . , fLare assumed to be linearly independent, where fℓdeﬁnes the conditional distribution of

Ytgiven Xt=ℓ. The density of yis deﬁned by

p(y) = X

x∈{1,...,L}n+1

πx1fx1(y1)

t=1

A[xt, xt+1]fxt+1 (yt+1).(1)

The conditions made on the transition matrix and on the emission distributions are stated by

the following set of assumptions. Note that these assumptions are mild and have been considered

already in Gassiat et al. (2016) to state the identiﬁability of an HMM based on the distribution of

three consecutive observations (see also De Castro et al. (2016, 2017)).

Assumption 1 •The transition matrix Ahas full rank, is irreducible and aperiodic with sta-

tionary distribution π= (π1, . . . , πL)⊤.

•The densities deﬁning the emission distributions {fℓ}L

ℓ=1 are linearly independent ( i.e., if

ξ= (ξ1, . . . , ξL)∈RLis such that for any z∈Rd,PL

ℓ=1 ξℓfℓ(z) = 0 then ξ=0) and are

square integrable on Rd.

Under Assumption 1, the identiﬁability of the ﬁnite and inﬁnite parameters of a HMM can be

obtained from the distribution of three consecutive observations (Gassiat et al., 2016) or from the

distribution of a pair of consecutive observations when the emission distributions are deﬁned as

translations of the same distribution (Gassiat and Rousseau, 2016).

The aim is to make inference on the order L. This can be achieved by using the distribution of

a pair of consecutive observations. From (1), the distribution of a pair of consecutive observations

(Y⊤

t,Y⊤

t+1)⊤is deﬁned by the density

p(yt,yt+1) =

ℓ=1

πℓfℓ(yt)gℓ(yt+1),(2)

where gℓis the density of Yt+1 given Xt=ℓand is deﬁned by

gℓ(yt+1) =

m=1

A[ℓ, m]fm(yt+1).(3)

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

EstimationoftheOrderofNon-ParametricHiddenMarkovModelsusingtheSingularValuesofanIntegralOperatorMarieDuRoydeChaumaraymarie.du-roy-de-chaumaray@univ-rennes2.frMathematicalResearchInstituteofRennesIRMARRennesUniversity,Rennes,FranceSalimaElKoleisalima.el-kolei@ensai.frUniv.Rennes,Ensai,CNRS,CRESTUMR91...

展开>> 收起<<

Estimation of the Order of Non-Parametric Hidden Markov Models using the Singular Values of an Integral Operator Marie Du Roy de Chaumaray marie.du-roy-de-chaumarayuniv-rennes2.fr.pdf

共31页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Estimation of the Order of Non-Parametric Hidden Markov Models using the Singular Values of an Integral Operator Marie Du Roy de Chaumaray marie.du-roy-de-chaumarayuniv-rennes2.fr

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: