Estimation of the Order of Non-Parametric Hidden Markov Models using the Singular Values of an Integral Operator Marie Du Roy de Chaumaray marie.du-roy-de-chaumarayuniv-rennes2.fr

2025-04-29 0 0 619.49KB 31 页 10玖币
侵权投诉
Estimation of the Order of Non-Parametric Hidden Markov
Models using the Singular Values of an Integral Operator
Marie Du Roy de Chaumaray marie.du-roy-de-chaumaray@univ-rennes2.fr
Mathematical Research Institute of Rennes IRMAR
Rennes University, Rennes, France
Salima El Kolei salima.el-kolei@ensai.fr
Univ. Rennes, Ensai, CNRS, CREST
UMR 9194, F-35000 Rennes, France
Marie-Pierre Etienne marie-pierre.etienne@institut-agro.fr
Mathematical Research Institute of Rennes IRMAR
Rennes University, Rennes, France
Matthieu Marbac matthieu.marbac-lourdelle@ensai.fr
Univ. Rennes, Ensai, CNRS, CREST
UMR 9194, F-35000 Rennes, France
Abstract
Interested in estimating the order of a finite-state Hidden Markov Model (HMM) with nonpara-
metric emission distributions, a new method that only requires full rank transition matrix and not
linearly dependent emission distributions is introduced. This method relies on the equality between
the order of the HMM and the rank of a specific integral operator. Since only the empirical counter-
part of the singular values of the operator can be obtained, a thresholding procedure is proposed.
At a non-asymptotic level, an upper-bound on the probability of overestimating the order of the
HMM is provided. At an asymptotic level, the consistency of the estimator is established. In addi-
tion a general heuristic that can be successfully applied to several problems in spectral analysis for
designing a data-driven procedure for the threshold is introduced. The approach has the advantage
of not requiring any knowledge of an upper-bound on the order of the HMM. Moreover, different
types of data (including circular or mixed-type data) can be managed. The relevance of the ap-
proach is illustrated on numerical experiments and on real data considering multivariate data with
directional variables.
Keywords: Hidden Markov models, Latent state model, Model selection, Non-parametric esti-
mation.
1 Introduction
A discrete-time homogeneous hidden Markov model (HMM) defines the distribution of an observed
process (Yt)tNand a latent process (Xt)tNsuch that the sequence of unobserved states (Xt)tN
follows a Markov chain and the observations (Yt)tNare independent given the state sequence
(Xt)tN.The conditional distribution of Yt, called emission distribution, only depend on the current
state Xt. This paper focuses on finite state HMMs where the latent process has a finite state
space {1, . . . , L}, the integer Lbeing called the order of the HMM. In this framework, the model
is completely described by the order L, the initial distribution and the transition matrix of the
hidden chain, and the emission distributions. Since the marginal distribution of each Ytis a finite
mixture model, finite state HMMs can be seen as an extension of finite mixture models where the
©?? Marie Du Roy de Chaumaray, Salima El Kolei, Marie-Pierre Etienne and Matthieu Marbac.
License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/.
arXiv:2210.03559v2 [math.ST] 27 Nov 2023
assumption of independence between observations is relaxed (i.e., Ytand Ytare not independent).
HMMs are a popular tool for modeling the dependency structure for univariate and multivariate
processes driven by a latent Markov chain (see Juang and Rabiner (1991); Yang et al. (1995); Krogh
et al. (2001); Choo et al. (2004); Zucchini and MacDonald (2009) for examples of applications).
They also provide tractable models for circular time series widely used in biology, meteorology and
climate applications to model for instance the speed and the direction of wind, ocean current, or
animal movements (see Holzmann et al. (2006); Bulla et al. (2012); Mastrantonio and Calise (2016)).
Inferring the right order of the latent chain is an important issue, which precedes the estimation of
the model parameters and their interpretation. This paper focuses on the estimation of the order
Lfrom univariate and multivariate data (Yt)tNin a non-parametric setting. The estimation of L
is indeed achieved without any parametric assumption on the emission distributions, since we only
require the linear independence between their probability distribution functions.
Initial developments on HMMs have been made in a parametric framework, which considers
that the emission distributions belong to some given parametric distribution family. Consider-
ing the order of the HMM as known, the existing literature provides the parameter identifiability
(Petrie, 1969), the algorithm for assessing the maximum likelihood estimator (MLE; Baum et al.
(1970)), the consistency of the MLE (Leroux, 1992) and its asymptotic normality (Bickel et al.,
1998). The identification of the order is more challenging and represents a difficult task, mainly
because of a loss of identifiability of the model parameters when the order is overestimated. The
standard assumptions used to control the likelihood ratio test statistics, are thus not satisfied when
the order is overestimated. For instance, Gassiat and Keribin (2000) show that this statistic can
diverge even for bounded parameters. Note that this issue already appears when estimating the
number of components in parametric finite mixture models (Ciuperca, 2002). Therefore, the order
of parametric HMMs can be estimated by homogeneous tests (Holzmann and Schwaiger, 2016),
penalized likelihood approaches (Volant et al., 2014) or cross-validation approaches (Celeux and
Durand, 2008). Using tools from information theory, Gassiat and Boucheron (2003) have shown
the strong consistency of the estimator of the order obtained by penalized maximum likelihood.
Moreover, Bayesian approaches can be used by penalizing the likelihood and thus avoiding the
issues due to the lack of identifiability of the parameters when the order is overestimated (Gassiat
and Rousseau, 2014). Alternatively, Robert et al. (2000) propose a Bayesian inference of the order
Lthrough a reversible jump Markov Chain Monte Carlo method (MCMC). In Chopin (2007), the
author also proposes a Bayesian strategy based on sequential Monte Carlo filter and MCMC. How-
ever, all these approaches consider parametric emission distributions but it is not always possible
to restrict the model to such a convenient finite-dimensional space. Moreover they provide biased
results when their parametric assumptions are violated. In such cases, non-parametric approaches
can be used to model the emission distributions.
Non-parametric HMMs have been proved to be useful in a wide range of applications (see Zhao
(2011) for financial applications, Couvreur and Couvreur (2000) for voice activity detection, Lam-
bert et al. (2003) for climate state identification and Yau et al. (2011) for genomic applications).
Nevertheless, identifiability of the parameters of finite state HMMs with non-parametric emission
distributions has been investigated recently. Gassiat and Rousseau (2016) consider the case of
translation HMMs. They show that all the model parameters (including the infinite dimensional
parameters) are identifiable as soon as the matrix that defines the joint distribution of two consec-
utive latent variables, is non-singular and the translation parameters are distinct. Note that their
conditions are weaker than those used to obtain identifiability for location-scale mixture models. In-
2
deed, for the latter, constraints must be added such as considering symmetric distributions (Hunter
et al., 2007). This additional assumption is no longer required for translation HMMs because of
the dependency between a pair of consecutive observations. Based on the results of parameter
identifiability for a mixture of products of univariate distributions (Allman et al., 2009), Gassiat
et al. (2016) state weaker sufficient conditions for parameter identifiability since they consider a
full rank transition matrix of the latent chain and linearly independent emission probability dis-
tributions. The method introduced by the present paper for estimating the order of an HMM, is
developed under these assumptions. Note that the assumptions made on the emission distributions
have been weakened again by Alexandrovich et al. (2016) since they only require that the emission
distributions are different.
To estimate the (finite and infinite dimensional) parameters of non-parametric HMMs, kernel-
based (Bonhomme et al., 2016b) or wavelet-based (Jin and Mokhtarian, 2006) approaches can be
used. Alternatively, Bonhomme et al. (2016b) and De Castro et al. (2017) extended the spectral
method proposed by Hsu et al. (2012) for estimating parametric HMMs, in order to deal with
a non-parametric framework. However, all these methods are developed for a known order of
the HMM. Estimating the order of a generic non-parametric HMM is still a challenging problem
and to the best of our knowledge Leh´ericy (2019) is the only paper to consider this problem in
this non-parametric setting. The author proposes two methods that provide strongly consistent
estimators of the order of the HMM. The first method considers a minimization of a penalized
least-square criterion that relies on a projection of the emission distributions onto a family of
nested parametric subspaces. For each subspace and each number of latent states, the criterion
used for model selection is computed by minimizing the empirical counterpart of the penalized
L2distance. Thus, the method provides an estimator of the order of the HMM together with
estimators of the emission distributions. The second method uses an estimator of the rank of
a matrix computed from the distribution of a pair of consecutive observations. More precisely,
this method relies on a spectral approach applied on the matrix containing the coordinates of
the density of a pair of consecutive observations in some orthonormal basis. Thus, this method
could be seen as an extension of the spectral method described in the Section 5 of Supplementary
material of Bonhomme et al. (2016a) to HMM. These two methods are complementary in practice.
Indeed, numerical experiments presented in Leh´ericy (2019) show that the penalized least-square
method is more efficient for moderate sample sizes. Indeed, the non-convex criterion raises many
problems for the minimization, in practice. To overcome this difficulty, the author proposes to use
an approximate minimization algorithm (see Hansen and Auger (2011)) that requires a good initial
condition since it might otherwise remain trapped in a local minima which renders this approach
time-consuming for multivariate data and large sample. Furthermore, considering all the subspaces
and all the possible numbers of latent states makes this method computationally greedy. Therefore,
the spectral method should be considered for large sample sizes. Both methods involve an unknown
tuning parameter (i.e., constant in the penalty term of the penalized criterion and threshold for
the spectral method) but also choices of the subspaces (i.e., family of nested parametric subspaces
or the orthonormal basis) that can highly impact the results (see our numerical experiments).
This paper introduces a new simple method for selecting the order of a non-parametric HMM,
by using the rank of an integral operator relying on the distribution of a pair of consecutive obser-
vations. This approach is inspired by the one proposed in Kwon and Mbakop (2021) to estimate
the number of components in nonparametric i.i.d. mixture models but we go further in adapting
this framework for dependent and latent observations leading to new and different theoretical re-
3
sults. Furthermore, we propose a more general heuristics for designing a data-driven procedure
which can be successfully applied to several problems in spectral analysis and give theoretical guar-
antees. The interest of this approach from integral operators lies in the fact that unlike most of
the spectral methods based on noisy matrices (Bonhomme et al., 2016b; De Castro et al., 2017;
Leh´ericy, 2019), the method does not require any choices of a functional basis or its number of
elements. Hence, the proposed method does not require any knowledge of an upper bound of the
order of the HMM. Moreover, different types of data (including circular or mixed-type data) can
be managed. Since the distribution of the pair of consecutive observations is estimated with kernel
method, only the empirical counter-part of the singular values of the operator can be obtained, we
propose to use our new data-driven method for the thresholding procedure. At a non-asymptotic
level, an upper-bound on the probability of overestimating the order of the HMM is provided. At
an asymptotic level, the consistency of the estimator is established. The control at non-asymptotic
and asymptotic levels are obtained by a concentration inequality of the Hilbert-Schmidt norm of
the empirical version of the operator. The statistical tools needed to establish these results differ
from those used in Kwon and Mbakop (2021), and consequently the results are different. Thus,
using concentration results specific to Markov chains, a concentration inequality is obtained by
considering a sum of two terms, where one term does not depend on the bandwidth and the second
term does not depend on the probability of overestimating the order. Note that the bound obtained
in the i.i.d. context considers a product between the bandwidth, the probability of overestimating
the order and the sample size. This bound contains only terms that depends on the kernel and the
bandwidth. Contrary to this setting and because of the dependency of observations the concentra-
tion inequality that we obtain depends on some unknown constant of the HMM (e.g., the mixing
time). To circumvent this issue and practical convenience, we propose a data-driven procedure
based on an unsupervised classification of the singular values of the operator and computed on
mini-batches, for estimating the constant in the concentration inequality. Note that in Leh´ericy
(2019) the model selection for the spectral method is also based on a thresholding rule applied on
the singular values whose choice is a delicate issue since it depends on the functional basis and on
the number of elements. Hence, in his paper the author proposes an empirical method based on
slope heuristic for the practical application. However, this approach requires an additional tuning
parameter that states the number of singular values used to apply the slope heuristic. In theory,
for the spectral methods to work, the rank of the spectral matrix needs to be equal to the order
of the chain. Thus, it is necessary that the number of elements of the orthonormal basis tends to
infinity, otherwise we only obtain an estimator of an upper-bound of the order. However, defining
the thresholding rule for the case of increasing number of basis elements is still an open problem
for the spectral methods. Indeed, for instance, the rank study performed in Kleibergen and Paap
(2006) should be extended to matrices with increasing dimension (but fixed rank). Thus, in prac-
tice, the number of basis elements is set a priori. This number corresponds to an upper-bound on
the order of the HMM. To the best of our knowledge, since the proposed method avoids the use
of functional basis, it is the first method which does not make assumptions on an upper-bound of
the order to be estimated. Numerical studies illustrate the relevance of this proposal and show also
that this new data-driven procedure guarantees good results for our estimator, but also improves
the spectral results of Leh´ericy (2019).
This paper is organized as follows. Section 2 introduces the specific integral operator. Section 3
presents the finite-sample size and the asymptotic properties of the estimator (including its consis-
tency). Section 4 describes the new data-driven procedure with a theoretical justification. Section 5
4
is devoted to the computational aspects of the methods. Section 6 illustrates the consistency of the
estimator on simulated data and shows the relevance of the proposed method on benchmark data
(including circular data). Section 7 shows the contribution of our approach on one real-life data
set. Section 8 gives a conclusion and all the proofs are given in Appendix.
2 Order of a HMM and rank of integral operators
2.1 Hidden Markov model
Let Y= (Y
1,...,Y
n+1)be a stationary sequence of random vectors Yt, where YtRdfollows
a finite state hidden Markov model (HMM) with Llatent states. This model assumes that there
exists a stationary Markov chain X= (X1, . . . , Xn+1)that is unobserved, where Xt∈ {1, . . . , L}.
Moreover, conditionally on X, the Yt’s are independent and their distribution only depends on
the current state Xt. The Markov chain is defined by a full rank transition matrix Ahaving
π= (π1, . . . , πL)as stationary distribution. Finally, the densities of the emission distributions
f1, . . . , fLare assumed to be linearly independent, where fdefines the conditional distribution of
Ytgiven Xt=. The density of yis defined by
p(y) = X
x∈{1,...,L}n+1
πx1fx1(y1)
n
Y
t=1
A[xt, xt+1]fxt+1 (yt+1).(1)
The conditions made on the transition matrix and on the emission distributions are stated by
the following set of assumptions. Note that these assumptions are mild and have been considered
already in Gassiat et al. (2016) to state the identifiability of an HMM based on the distribution of
three consecutive observations (see also De Castro et al. (2016, 2017)).
Assumption 1 The transition matrix Ahas full rank, is irreducible and aperiodic with sta-
tionary distribution π= (π1, . . . , πL).
The densities defining the emission distributions {f}L
=1 are linearly independent ( i.e., if
ξ= (ξ1, . . . , ξL)RLis such that for any zRd,PL
=1 ξf(z) = 0 then ξ=0) and are
square integrable on Rd.
Under Assumption 1, the identifiability of the finite and infinite parameters of a HMM can be
obtained from the distribution of three consecutive observations (Gassiat et al., 2016) or from the
distribution of a pair of consecutive observations when the emission distributions are defined as
translations of the same distribution (Gassiat and Rousseau, 2016).
The aim is to make inference on the order L. This can be achieved by using the distribution of
a pair of consecutive observations. From (1), the distribution of a pair of consecutive observations
(Y
t,Y
t+1)is defined by the density
p(yt,yt+1) =
L
X
=1
πf(yt)g(yt+1),(2)
where gis the density of Yt+1 given Xt=and is defined by
g(yt+1) =
L
X
m=1
A[ℓ, m]fm(yt+1).(3)
5
摘要:

EstimationoftheOrderofNon-ParametricHiddenMarkovModelsusingtheSingularValuesofanIntegralOperatorMarieDuRoydeChaumaraymarie.du-roy-de-chaumaray@univ-rennes2.frMathematicalResearchInstituteofRennesIRMARRennesUniversity,Rennes,FranceSalimaElKoleisalima.el-kolei@ensai.frUniv.Rennes,Ensai,CNRS,CRESTUMR91...

展开>> 收起<<
Estimation of the Order of Non-Parametric Hidden Markov Models using the Singular Values of an Integral Operator Marie Du Roy de Chaumaray marie.du-roy-de-chaumarayuniv-rennes2.fr.pdf

共31页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:31 页 大小:619.49KB 格式:PDF 时间:2025-04-29

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 31
客服
关注