Data-Driven Quickest Change Detection in (Hidden)
Markov Models
Qi Zhang, Zhongchang Sun, Luis C. Herrera, and Shaofeng Zou
Abstract—The paper investigates the problems of quickest
change detection in Markov models and hidden Markov models
(HMMs). Sequential observations are taken from a (hidden)
Markov model. At some unknown time, an event occurs in the
system and changes the transition kernel of the Markov model
and/or the emission probability of the HMM. The objective is to
detect the change quickly, while controlling the average running
length (ARL) to false alarm. The data-driven setting is studied,
where no knowledge of the pre- and post-change distributions is
available. Kernel-based data-driven algorithms are developed,
which can be applied in settings with a continuous state space, can be
updated in a recursive fashion, and are computationally efficient.
Lower bounds on the ARL and upper bounds on the worst-case
average detection delay (WADD) are derived. The WADD is at
most of the order of the logarithm of the ARL. The algorithms
are further numerically validated on two practical problems of
fault detection in DC microgrid and photovoltaic systems.
Index Terms—Maximum Mean Discrepancy, Kernel Method,
Fault Detection, Non-i.i.d.
I. INTRODUCTION
In the problem of quickest change detection (QCD), se-
quential observations are generated from a stochastic process.
At some unknown time (change-point), an event occurs and
causes the data-generating distribution to undergo a change.
The objective is to quickly detect the change, while controlling
the false alarm rate. This QCD problem has been widely
studied in the literature [3]–[7]. It models a wide range
of applications, e.g., fault detection in DC microgrids [8],
quality control in online manufacturing systems [9], spectrum
monitoring in wireless communications [10], early detection
of epidemics [11], and signal processing in genetics [12].
Most existing studies investigate the setting where pre-
and post-change samples are independent and identically dis-
tributed (i.i.d.), respectively. In [13]–[15], the authors prove
that the CuSum algorithm is optimal when the samples are
i.i.d. However, the i.i.d. assumption may not hold in practice,
e.g., in photovoltaic systems [16], samples are not independent
over time. In [17]–[20], the general QCD problem for the non-
i.i.d. setting is studied, where the approaches are model-based
with known pre- and post-change distributions. In [21]–[23],
the case where samples are generated from a Markov model is
investigated. In these works, both the pre- and post-change
distributions are known exactly, and therefore those algorithms
This paper was presented in part at the 2023 IEEE International Confer-
ence on Acoustics, Speech, and Signal Processing [1] and the 2023 IEEE
International Symposium on Information Theory [2].
The authors are with the Department of Electrical Engineering, Univer-
sity at Buffalo, Buffalo, NY 14228 USA (e-mail: qzhang48@buffalo.edu,
zhongcha@buffalo.edu, lcherrer@buffalo.edu, szou3@buffalo.edu).
may not be applicable if such information is unavailable or
the performance may degrade significantly if such information
is inaccurate. In this paper, we first study the QCD problem in
Markov models under the data-driven setting, where neither
the pre- nor the post-change transition kernel is known. We
then focus on the more challenging hidden Markov models
(HMMs), where the pre- and post-change transition kernels
and emission probability distributions are unknown.
A. Related Works
For Markov models, the data-driven setting where the pre-
and the post-change transition kernels are unknown is studied
in, e.g., [24], [25]. QCD in Markov models with known
pre-change and unknown post-change transition kernels is
investigated in [24]. The maximum likelihood estimate (MLE)
of the unknown post-change transition kernel is employed
to construct a generalized CuSum-type test. However, this
method only works for finite-state Markov chains or parameterized
Markov transition kernels, and is therefore not applicable
to general problems with a continuous state space. QCD in Markov
models with both pre- and post-change transition kernels
unknown is studied in [25], where a kernel-based method is
developed. A kernel method of maximum mean discrepancy
(MMD) [26] is employed to construct a CuSum-type test. The
kernel MMD method for data-driven QCD under the i.i.d.
setting is also investigated in, e.g., [27], [28]. However, those works
focus on the case with i.i.d. samples, and thus their approaches
cannot be directly applied to our Markov models.
In [29]–[33], QCD in HMMs is studied. All these studies
investigate the model-based setting with both pre- and post-
change information known. However, the information may
not be accurate or available in practice, especially when the
change is due to some unexpected fault. In [29], the model-
based setting with known pre- and post-change distributions
for QCD in HMMs with a finite state space is investigated.
The $L_1$-norm of products of Markov random matrices is
employed to construct a CuSum test. This approach, however,
only applies to HMMs with a finite state space. In [30], the
same problem is investigated. The ratio of the L1-norms of
products of Markov random matrices is employed to construct
the log-likelihood in their Shiryayev–Roberts–Pollak test. In
[32], a recursive algorithm is designed for the same problem.
A quasi-generalized likelihood ratio is employed to update
the algorithm in a recursive fashion. The proposed approach
however only applies to HMMs with two states and does not
generalize to problems with a continuous state space. QCD
in HMMs under the Bayesian setting is studied in [31], which
arXiv:2210.11988v5 [eess.SP] 8 Nov 2023
assumes the change point follows some prior distribution. All
the above studies investigate QCD in HMMs with a finite state
space and need both the pre- and post-change transition kernels
and emission probability distributions.
In [33], QCD in auto-regressive (AR) models is studied,
where observations are i.i.d. before the change and follow
an AR model after the change. A computationally efficient
algorithm is proposed. However, for the data-driven setting,
only a lower bound on the average running length (ARL) is
provided. QCD in AR models is also studied in [16], where the
likelihood ratio is approximated using a data-driven method.
However, the authors did not provide any theoretical guarantee.
B. Major Contributions
We develop kernel-based data-driven algorithms for QCD
in (hidden) Markov models. We keep a sample buffer of size
$m$. Once the buffer is filled, we calculate the MMD between
samples in the buffer and a reference batch of pre-change
samples. We use this MMD to construct a CuSum-type test.
Subsequently, we clear the buffer and continue sampling. The
statistic has a negative drift, causing the CuSum to fluctuate
around 0 before the change. After the change, it has a positive
drift, leading the CuSum to exceed the threshold quickly.
We theoretically demonstrate that the lower bound on ARL
increases exponentially with the threshold and that the upper
bound on the worst-case average detection delay (WADD) is
linear in the threshold for both Markov models and HMMs.
Hence, the WADD is at most of the order of the logarithm of
the ARL, which matches the corresponding result for QCD under the
model-based setting. Combined with the universal lower bound
on the WADD in [17], our algorithm is order-level optimal.
The primary difficulty in our proof arises from the fact
that the samples are generated from (hidden) Markov models,
and thus it is essential to explicitly characterize the bias in the
MMD estimate for the analysis of ARL and WADD.
When this paper was in preparation, the first version of [25]
was posted, in which the ARL lower bound grows only linearly with
the threshold, indicating overly frequent false alarms.
A later version of [25] provides an improved result: the ARL is at least
exponential in the threshold, and the upper bound on the WADD
is linear in the threshold. However, the lower bound on the ARL
in [25] requires the threshold to be sufficiently large, whereas
we need no such assumption. In addition, [25] focuses
on Markov models. To the best of the authors’ knowledge,
we are the first to develop data-driven algorithms for QCD
in HMMs with theoretical performance guarantees and order-
level optimal performance. Our algorithm has a computational
complexity of $O(mn)$ up to time $n$, whereas the complexity of the
algorithm in [25] is $O(m^2 n)$. Our simulation results also
demonstrate that our algorithm has a smaller WADD for a
given ARL, and is computationally efficient.
We conduct extensive experiments on synthetic data and
two practical fault detection problems in power systems.
The numerical results show that for fault detection in DC
microgrids, our algorithm outperforms the algorithm proposed
in [25] in data-driven settings. For line-to-line fault detection
in photovoltaic systems, our data-driven algorithm outperforms
the model-based methods of [34], [35] when those methods are
applied with inaccurate system knowledge.
This journal paper provides complete proofs of the results,
which were omitted due to space limitations in the conference
papers [1], [2]. Moreover, we also add extensive experiments
on practical applications of fault detection in DC microgrid
and photovoltaic systems, and compare our algorithms with
existing approaches.
II. PROBLEM FORMULATION

Markov Models: Consider a Markov chain $\{X_t\}_{t=1}^{\infty}$ defined on a probability space $(\mathcal{X}, \mathcal{F}, \mathbb{P})$ with $P(X_t \in A \mid X_{t-1}, \ldots, X_1) = P(X_t \in A \mid X_{t-1})$ for each $t \geq 1$ and $A \subseteq \mathcal{X}$. Define the transition kernel $P : \mathcal{X} \to \mathcal{P}(\mathcal{X})$ for the Markov chain $\{X_t\}_{t=1}^{\infty}$, where $P(X_t \mid X_{t-1})$ is the transition probability density of $X_t$ given $X_{t-1}$, and $\mathcal{P}(\mathcal{X})$ denotes the probability simplex on $\mathcal{X}$. The transition probability density changes at an unknown time $\tau$ from $P$ to $Q : \mathcal{X} \to \mathcal{P}(\mathcal{X})$.
Hidden Markov Models: The state $X_t$ of the Markov chain cannot be observed directly. Instead, we have access to an observable sequence $\{X'_t\}_{t=1}^{\infty}$ which is adjoined to the Markov chain $\{X_t\}_{t=1}^{\infty}$ such that $\{X_t, X'_t\}_{t=1}^{\infty}$ is a Markov chain. For any measurable set $A \subseteq \mathcal{X}$, the following conditions hold:
$$P(X_t \in A \mid X_{t-1}, \ldots, X_1, X'_{t-1}, \ldots, X'_1) = P(X_t \in A \mid X_{t-1}),$$
$$P(X'_t \in A \mid X_t, \ldots, X_1, X'_{t-1}, \ldots, X'_1) = P'(X'_t \in A \mid X_t).$$
Let $P' : \mathcal{X} \to \mathcal{P}(\mathcal{X})$ be the emission probability. At some unknown time $\tau$, the transition kernel changes from $P$ to $Q$ and/or the emission probability changes from $P'$ to $Q'$.
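To make this change model concrete, the sketch below simulates a toy two-state Gaussian HMM whose emission distribution shifts at the change point. All parameter values (transition matrix, emission means, the shift from $N(1,1)$ to $N(3,1)$) are illustrative choices, not taken from the paper.

```python
import numpy as np

def sample_hmm(n, tau, rng):
    """Two-state Gaussian HMM; the emission mean in state 1 shifts at tau."""
    P = np.array([[0.95, 0.05],   # transition kernel (unchanged by the fault)
                  [0.10, 0.90]])
    states = np.empty(n, dtype=int)   # hidden states X_t
    obs = np.empty(n)                 # observations X'_t
    states[0] = 0
    for t in range(n):
        if t > 0:
            states[t] = rng.choice(2, p=P[states[t - 1]])
        mean = 0.0 if states[t] == 0 else 1.0
        if t >= tau and states[t] == 1:
            mean = 3.0                # post-change emission: N(1, 1) -> N(3, 1)
        obs[t] = rng.normal(mean, 1.0)
    return states, obs

rng = np.random.default_rng(0)
states, observations = sample_hmm(2000, tau=1000, rng=rng)
```

A data-driven detector sees only `observations`; the hidden `states` and the change time `tau` are unknown to it.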
The objective is to detect the change quickly subject to false alarm constraints. In this paper, we investigate the data-driven setting, where $P$, $P'$, $Q$, and $Q'$ are all unknown.

Let $\hat{T}$ be a stopping time, and let $\mathbb{P}_\tau$ ($\mathbb{E}_\tau$) be the probability measure (expectation) when the change happens at time $\tau$. Let $\mathbb{P}_\infty$ ($\mathbb{E}_\infty$) be the probability measure (expectation) if no change happens. There is a reference sequence of samples $\{Y_t\}_{t=1}^{\infty}$ generated from the transition kernel $P$ [25], [27], [28]. Use $\mathcal{F}_t$ to denote the $\sigma$-field generated by $\{X_i, Y_i\}_{i=1}^{t}$. For HMMs, there is a reference sequence of observations $\{Y'_t\}_{t=1}^{\infty}$ generated from an HMM with transition kernel $P$ and emission probability $P'$; denote the corresponding hidden states by $Y_t$. In this case, use $\mathcal{F}_t$ to denote the $\sigma$-field generated by $\{X_i, X'_i, Y_i, Y'_i\}_{i=1}^{t}$. Define the ARL and the WADD for $\hat{T}$:
$$\mathrm{ARL}(\hat{T}) = \mathbb{E}_\infty[\hat{T}], \qquad (1)$$
$$\mathrm{WADD}(\hat{T}) = \sup_{\tau \geq 1} \operatorname{ess\,sup} \, \mathbb{E}_\tau\big[(\hat{T} - \tau)^+ \,\big|\, \mathcal{F}_{\tau-1}\big]. \qquad (2)$$
Here the ARL quantifies the frequency of false alarms, and
the WADD measures the number of samples required to
raise an alarm after the change. The objective is to minimize the WADD
subject to the ARL constraint:
$$\min_{\hat{T}} \ \mathrm{WADD}(\hat{T}), \quad \text{s.t. } \mathrm{ARL}(\hat{T}) \geq \psi, \qquad (3)$$
where $\psi > 0$ is some pre-specified constant. We assume that the Markov chains with transition kernels $P$ and $Q$ are uniformly ergodic [24], [25], i.e., for any measurable $A \subseteq \mathcal{X}$ and $x \in \mathcal{X}$,
$$|\mathbb{P}_\infty(X_{t+i} \in A \mid X_i = x) - \pi_P(A)| \leq R_P \lambda_P^t, \qquad (4)$$
$$|\mathbb{P}_1(X_{t+i} \in A \mid X_i = x) - \pi_Q(A)| \leq R_Q \lambda_Q^t, \qquad (5)$$
where $\pi_P$ and $\pi_Q$ denote the invariant distributions of the Markov models with transition kernels $P$ and $Q$, and $0 < R_P, R_Q < \infty$, $0 < \lambda_P, \lambda_Q < 1$ are some constants.
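As a numerical sanity check (not from the paper), the geometric bound in (4) can be verified for a hypothetical two-state chain, where the second eigenvalue $\lambda = 0.7$ of the transition matrix governs the decay rate:

```python
import numpy as np

# A hypothetical two-state transition matrix (rows sum to 1); its
# eigenvalues are 1 and 0.7, so lambda_P = 0.7 in the bound (4).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Invariant distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()   # normalizes to pi = (2/3, 1/3)

# Total-variation distance sup_A |P^t(x, A) - pi(A)|, worst case over x.
Pt = np.eye(2)
for t in range(1, 21):
    Pt = Pt @ P
    tv = 0.5 * np.abs(Pt - pi).sum(axis=1).max()
    # Geometric decay with R_P = 1, lambda_P = 0.7, as in (4).
    assert tv <= 1.0 * 0.7 ** t + 1e-12
```

For this chain the total-variation gap shrinks by a factor of 0.7 per step, so the constants $(R_P, \lambda_P) = (1, 0.7)$ satisfy the uniform ergodicity condition.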
It can be demonstrated that under $\mathbb{P}_\infty$, the invariant distribution of $\{X_t, X'_t\}_{t=1}^{\infty}$ is $\pi_P(x) P'(x' \mid x)$, and under $\mathbb{P}_1$, the invariant distribution is $\pi_Q(x) Q'(x' \mid x)$. Let $\pi'_P(x') = \int_{\mathcal{X}} \pi_P(dx) \, P'(x' \mid x)$ and $\pi'_Q(x') = \int_{\mathcal{X}} \pi_Q(dx) \, Q'(x' \mid x)$ denote the marginal distributions of the observation under the respective invariant distributions.
A. Maximum Mean Discrepancy

In this section, we provide a brief introduction to kernel mean embedding and MMD [26], [36]. Consider a positive definite kernel function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ and its associated reproducing kernel Hilbert space (RKHS), denoted by $\mathcal{H}_k$. Let $\langle \cdot, \cdot \rangle_{\mathcal{H}_k}$ be the inner product in the RKHS. For any $x, y \in \mathcal{X}$, we have $k(x, \cdot) \in \mathcal{H}_k$ and $k(x, y) = \langle k(x, \cdot), k(y, \cdot) \rangle_{\mathcal{H}_k}$. By the reproducing property, for any function $f \in \mathcal{H}_k$, $f(x) = \langle f, k(x, \cdot) \rangle_{\mathcal{H}_k}$. In this paper, we study a bounded kernel function $k$, i.e., $0 \leq k(x, y) \leq 1$. Let $\mu_F = \mathbb{E}_{X \sim F}[k(X, \cdot)]$ be the kernel mean embedding of a probability distribution $F$. For a characteristic kernel $k$, $\mu_F = \mu_G$ if and only if the distributions $F$ and $G$ are identical [26]. The MMD between distributions $F$ and $G$ is defined as
$$D(F, G) = \sup_{f : \|f\|_{\mathcal{H}_k} \leq 1} \mathbb{E}_{X \sim F}[f(X)] - \mathbb{E}_{Y \sim G}[f(Y)]. \qquad (6)$$
The squared MMD can be written equivalently as [37]:
$$D^2(F, G) = \mathbb{E}_{X, \bar{X} \sim F}[k(X, \bar{X})] + \mathbb{E}_{Y, \bar{Y} \sim G}[k(Y, \bar{Y})] - 2\,\mathbb{E}_{X \sim F, Y \sim G}[k(X, Y)]. \qquad (7)$$
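As a concrete illustration (this code is a sketch, not taken from the paper), the squared-MMD expression (7) can be estimated from two sample sets using a bounded Gaussian kernel, which satisfies $0 \le k(x, y) \le 1$:

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Bounded Gaussian (RBF) kernel: 0 <= k(x, y) <= 1."""
    return np.exp(-(x - y) ** 2 / (2.0 * bandwidth ** 2))

def squared_mmd(xs, ys, bandwidth=1.0):
    """Biased empirical estimate of D^2(F, G) from (7):
    mean of k over X-pairs + mean over Y-pairs - 2 * mean over cross pairs."""
    kxx = gaussian_kernel(xs[:, None], xs[None, :], bandwidth).mean()
    kyy = gaussian_kernel(ys[:, None], ys[None, :], bandwidth).mean()
    kxy = gaussian_kernel(xs[:, None], ys[None, :], bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
# Same distribution -> estimate near 0; a mean shift -> clearly positive.
same = squared_mmd(rng.normal(0, 1, 500), rng.normal(0, 1, 500))
diff = squared_mmd(rng.normal(0, 1, 500), rng.normal(2, 1, 500))
```

The Gaussian kernel is characteristic, so the population MMD is zero only when the two distributions coincide; the empirical estimate above is the biased V-statistic version with $O(n^2)$ cost per pair of blocks.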
III. MARKOV MODELS

In this section, we present our results for Markov models. Markov chains with distinct transition kernels may have identical stationary distributions [25]. As a result, a straightforward extension of existing QCD methods designed for the i.i.d. setting [27], [28], which estimates the MMD between the stationary distributions $\pi_P$ and $\pi_Q$, may fail even when $P \neq Q$. To address this issue, we utilize the second-order Markov chain [25]. We define the product $\sigma$-algebra on $\mathcal{X} \times \mathcal{X}$, generated by the collection of all measurable rectangles: $\mathcal{F} \otimes \mathcal{F} = \sigma\{A \times B : A \subseteq \mathcal{X}, B \subseteq \mathcal{X}\}$. Define the second-order probability measure $\widetilde{\pi}_P$ on the measurable space $(\mathcal{X} \times \mathcal{X}, \mathcal{F} \otimes \mathcal{F})$ as follows: $\widetilde{\pi}_P(A \times B) = \int_A \pi_P(dx) \, P(X_{i+1} \in B \mid X_i = x)$, where $A$ and $B$ are elements of the $\sigma$-algebra $\mathcal{F}$. According to Lemma 1 in [25], if the transition kernels are distinct, then the second-order probability measures are distinct. The kernel function and MMD discussed in Section II-A extend straightforwardly to the space $\widetilde{\mathcal{X}} = \mathcal{X} \times \mathcal{X}$.

It can be easily shown that under $\mathbb{P}_\infty$ ($\mathbb{P}_1$), the second-order Markov chain $\{\widetilde{X}_t = (X_t, X_{t+1})\}_{t=1}^{\infty}$ is uniformly ergodic. We then construct the test statistic and the stopping rule using the second-order Markov chain. Our approach is to estimate the MMD between the invariant measures of the second-order Markov chains, and to employ this estimate to construct a CuSum-type test.
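The point of the second-order construction can be seen in a small experiment (illustrative, not from the paper): an i.i.d. $N(0,1)$ sequence and a stationary AR(1) chain share the same $N(0,1)$ stationary distribution, so a first-order MMD cannot separate them, while the MMD on consecutive pairs $(x_t, x_{t+1})$ can.

```python
import numpy as np

def second_order_chain(xs):
    """Map a trajectory x_1, x_2, ... to the pairs (x_t, x_{t+1})."""
    xs = np.asarray(xs, dtype=float)
    return np.stack([xs[:-1], xs[1:]], axis=1)

def rbf(a, b, h=1.0):
    """Gaussian kernel matrix on the (product) space, 0 <= k <= 1."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * h * h))

def mmd2(a, b, h=1.0):
    """Biased squared-MMD estimate, as in (7)."""
    return rbf(a, a, h).mean() + rbf(b, b, h).mean() - 2.0 * rbf(a, b, h).mean()

# Two chains with the SAME N(0, 1) stationary distribution but different
# dynamics: an i.i.d. sequence vs. a stationary AR(1) with phi = 0.8.
rng = np.random.default_rng(1)
n, phi = 2000, 0.8
iid = rng.normal(size=n)
ar = np.empty(n)
ar[0] = rng.normal()
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + np.sqrt(1.0 - phi ** 2) * rng.normal()

# First-order marginals coincide, so the marginal MMD is near zero, while
# the pair distributions differ and the second-order MMD separates them.
marg_mmd = mmd2(iid[:, None], ar[:, None])
pair_mmd = mmd2(second_order_chain(iid), second_order_chain(ar))
```

Here `marg_mmd` stays near zero while `pair_mmd` is clearly positive, mirroring the role of $\widetilde{\pi}_P$ versus $\pi_P$ in the construction above.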
As a first step, we divide the samples into non-overlapping blocks of size $m$. Let $\widetilde{X}_t = (X_t, X_{t+1})$ and $\widetilde{Y}_t = (Y_t, Y_{t+1})$ be the second-order Markov chains. For the observation sequence $\{\widetilde{X}_t\}_{t=1}^{\infty}$, the pre-change stationary distribution is $\widetilde{\pi}_P$ and the post-change stationary distribution is $\widetilde{\pi}_Q$. For the reference sequence $\{\widetilde{Y}_t\}_{t=1}^{\infty}$, the stationary distribution is $\widetilde{\pi}_P$. For the $t$-th block, $t = 0, 1, 2, \ldots$, define the empirical measure of $\{\widetilde{X}_{mt+i}\}_{i=1}^{m-1}$ as $F^t_{\widetilde{X}} = \frac{1}{m-1} \sum_{i=1}^{m-1} \delta_{\widetilde{X}_{mt+i}}$, where $\delta_{\widetilde{X}_i}$ is the Dirac measure. Similarly, we can define $F^t_{\widetilde{Y}}$. The MMD $D(F^t_{\widetilde{X}}, F^t_{\widetilde{Y}})$ can then be computed using (7). Define $S_t = D(F^t_{\widetilde{X}}, F^t_{\widetilde{Y}}) - \sigma$, where $0 < \sigma < D(\widetilde{\pi}_P, \widetilde{\pi}_Q)$ is a positive constant that will be determined later. Then, our stopping time is defined as
$$T(c) = \inf\Big\{ mt + t : \max_{0 \leq i \leq t} \sum_{j=i}^{t} S_j > c \Big\}, \qquad (8)$$
where $c > 0$ is a predetermined threshold.
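A hedged end-to-end sketch of a stopping rule in the spirit of (8) is given below. For readability it uses a toy one-dimensional Gaussian mean-shift stream and scores raw blocks with a plain biased MMD, rather than the second-order pairs used in the paper; the block size, drift $\sigma$, and threshold $c$ are all illustrative choices.

```python
import numpy as np

def mmd(xs, ys, h=1.0):
    """Biased empirical MMD between two 1-d sample blocks."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * h * h))
    d2 = k(xs, xs).mean() + k(ys, ys).mean() - 2.0 * k(xs, ys).mean()
    return np.sqrt(max(d2, 0.0))

def block_cusum(stream, reference, m=50, sigma=0.4, c=1.5):
    """Score each non-overlapping block of size m against a reference
    block, and update the CuSum statistic recursively."""
    w = 0.0
    for t in range(len(stream) // m):
        s = mmd(stream[t * m:(t + 1) * m], reference[t * m:(t + 1) * m]) - sigma
        w = max(0.0, w) + s          # recursive CuSum update
        if w > c:
            return (t + 1) * m       # alarm at the end of block t
    return None                      # no alarm raised

rng = np.random.default_rng(2)
tau = 500                            # change point (mean shift 0 -> 2)
stream = np.concatenate([rng.normal(0.0, 1.0, tau),
                         rng.normal(2.0, 1.0, 1500)])
reference = rng.normal(0.0, 1.0, 2000)   # pre-change reference sequence
alarm = block_cusum(stream, reference)
```

With these toy parameters the score $S_t$ has a negative mean before the change and a positive mean after it, so the alarm fires a few blocks after $\tau$.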
Our algorithm can be updated in a recursive fashion:
$$\max_{0 \leq i \leq t} \sum_{j=i}^{t} S_j = \max\Big\{0,\ \max_{0 \leq i \leq t-1} \sum_{j=i}^{t-1} S_j\Big\} + S_t.$$
Furthermore, for every block of size $m$, the computational complexity of the MMD is $O(m^2)$. At time $n$, there are $\frac{n}{m}$ non-overlapping blocks in total. Thus, the total computational complexity up to time $n$ is $O(mn)$. In contrast, the algorithm in [25] uses overlapping blocks and has an overall computational complexity of $O(m^2 n)$.
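The equivalence between the direct maximization and the recursive CuSum update can be checked numerically; the snippet below is a toy verification, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.normal(-0.1, 1.0, size=200)   # toy drift-corrected scores S_0, ..., S_199

w = 0.0
for t, s in enumerate(S):
    w = max(0.0, w) + s               # recursive CuSum update
    # Direct evaluation of max_{0 <= i <= t} sum_{j=i}^{t} S_j.
    direct = max(S[i:t + 1].sum() for i in range(t + 1))
    assert abs(w - direct) < 1e-9
```

The recursion needs only $O(1)$ work per block, which is what keeps the overall complexity at $O(mn)$ instead of re-scanning all past blocks.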
Intuitively, when $m$ is large, we have $D(F^t_{\widetilde{X}}, F^t_{\widetilde{Y}}) \approx D(\widetilde{\pi}_P, \widetilde{\pi}_P) = 0$ before the change and $D(F^t_{\widetilde{X}}, F^t_{\widetilde{Y}}) \approx D(\widetilde{\pi}_P, \widetilde{\pi}_Q) > 0$ after the change. For $0 < \sigma < D(\widetilde{\pi}_P, \widetilde{\pi}_Q)$, the test statistic has a negative drift before the change, causing the CuSum statistic to fluctuate around 0, while after the change it has a positive drift, driving the CuSum statistic over the threshold quickly.
We present the upper bound on the WADD of (8) in the following theorem. Define $a_P = \sqrt{\frac{2 - 2\lambda_P + 4R_P}{(m-1)(1 - \lambda_P)}}$, $a_Q = \sqrt{\frac{2 - 2\lambda_Q + 4R_Q}{(m-1)(1 - \lambda_Q)}}$, $a = a_P + a_Q$, and $d = D(\widetilde{\pi}_P, \widetilde{\pi}_Q) - \sigma$.

Theorem 1. The WADD for the stopping time in (8) can be upper bounded as follows:
$$\mathrm{WADD}(T(c)) \leq \frac{2admc}{(a-d)^2} + \frac{(a+d)mc}{(a-d)^2} + 2m + \frac{a + ad}{d - a}\, m = O(mc). \qquad (9)$$
Proof. Let $\widetilde{\tau}$ be the index of the last sample within the block where the change occurs. Thus, samples after $\widetilde{\tau}$ are generated from the transition kernel $Q$. Moreover, we have $(T(c) - \tau)^+ \leq (T(c) - \widetilde{\tau})^+ + m$. Let
$$\xi = \frac{a_P + a_Q + \sqrt{(D(\widetilde{\pi}_P, \widetilde{\pi}_Q) - \sigma)(a_P + a_Q)}}{D(\widetilde{\pi}_P, \widetilde{\pi}_Q) - \sigma - a_P - a_Q}.$$