Data-Driven Quickest Change Detection in (Hidden)
Markov Models
Qi Zhang, Zhongchang Sun, Luis C. Herrera, and Shaofeng Zou
Abstract—The paper investigates the problems of quickest
change detection in Markov models and hidden Markov models
(HMMs). Sequential observations are taken from a (hidden)
Markov model. At some unknown time, an event occurs in the
system and changes the transition kernel of the Markov model
and/or the emission probability of the HMM. The objective is to
detect the change quickly, while controlling the average running
length (ARL) to false alarm. The data-driven setting is studied,
where no knowledge of the pre- and post-change distributions is
available. Kernel-based data-driven algorithms are developed,
which can be applied in settings with a continuous state space, can be
updated in a recursive fashion, and are computationally efficient.
Lower bounds on the ARL and upper bounds on the worst-case
average detection delay (WADD) are derived. The WADD is at
most of the order of the logarithm of the ARL. The algorithms
are further numerically validated on two practical problems of
fault detection in DC microgrid and photovoltaic systems.
Index Terms—Maximum Mean Discrepancy, Kernel Method,
Fault Detection, Non-i.i.d.
I. INTRODUCTION
In the problem of quickest change detection (QCD), se-
quential observations are generated from a stochastic process.
At some unknown time (change-point), an event occurs and
causes the data-generating distribution to undergo a change.
The objective is to quickly detect the change, while controlling
the false alarm rate. This QCD problem has been widely
studied in the literature [3]–[7]. It models a wide range
of applications, e.g., fault detection in DC microgrids [8],
quality control in online manufacturing systems [9], spectrum
monitoring in wireless communications [10], early detection
of epidemics [11], and signal processing in genetics [12].
Most existing studies investigate the setting where pre-
and post-change samples are independent and identically dis-
tributed (i.i.d.), respectively. In [13]–[15], the authors prove
that the CuSum algorithm is optimal when the samples are
i.i.d. However, the i.i.d. assumption may not hold in practice,
e.g., in photovoltaic systems [16], samples are not independent
over time. In [17]–[20], the general QCD problem for the non-
i.i.d. setting is studied, where the approaches are model-based
with known pre- and post-change distributions. In [21]–[23],
the case where samples are generated from a Markov model is
investigated. In these works, both the pre- and post-change
distributions are known exactly, and therefore those algorithms
This paper was presented in part at the 2023 IEEE International Confer-
ence on Acoustics, Speech, and Signal Processing [1] and the 2023 IEEE
International Symposium on Information Theory [2].
The authors are with the Department of Electrical Engineering, Univer-
sity at Buffalo, Buffalo, NY 14228 USA (e-mail: qzhang48@buffalo.edu,
zhongcha@buffalo.edu, lcherrer@buffalo.edu, szou3@buffalo.edu).
may not be applicable if such information is unavailable or
the performance may degrade significantly if such information
is inaccurate. In this paper, we first study the QCD problem in
Markov models under the data-driven setting, where neither
the pre- nor the post-change transition kernel is known. We
then focus on the more challenging hidden Markov models
(HMMs), where the pre- and post-change transition kernels
and emission probability distributions are unknown.
A. Related Works
For Markov models, the data-driven setting where the pre-
and the post-change transition kernels are unknown is studied
in, e.g., [24], [25]. QCD in Markov models with known
pre-change and unknown post-change transition kernels is
investigated in [24]. The maximum likelihood estimate (MLE)
of the unknown post-change transition kernel is employed
to construct a generalized CuSum-type test. However, this
method only works for finite-state Markov chains or parameterized
Markov transition kernels, and is therefore not applicable
to general problems with a continuous state space. QCD in Markov
models with both pre- and post-change transition kernels
unknown is studied in [25], where a kernel-based method is
developed. A kernel method of maximum mean discrepancy
(MMD) [26] is employed to construct a CuSum-type test. The
kernel MMD method for data-driven QCD under the i.i.d.
setting is also investigated in, e.g., [27], [28]. However, those works
focus on the case with i.i.d. samples, and thus their approaches
cannot be directly applied to our Markov models.
In [29]–[33], QCD in HMMs is studied. All these studies
investigate the model-based setting with both pre- and post-
change information known. However, the information may
not be accurate or available in practice, especially when the
change is due to some unexpected fault. In [29], the model-
based setting with known pre- and post-change distributions
for QCD in HMMs with a finite state space is investigated.
The $L_1$-norm of products of Markov random matrices is
employed to construct a CuSum test. This approach, however,
only applies to HMMs with a finite state space. In [30], the
same problem is investigated. The ratio of the L1-norms of
products of Markov random matrices is employed to construct
the log-likelihood in their Shiryayev–Roberts–Pollak test. In
[32], a recursive algorithm is designed for the same problem.
A quasi-generalized likelihood ratio is employed to update
the algorithm in a recursive fashion. The proposed approach
however only applies to HMMs with two states and does not
generalize to problems with a continuous state space. QCD
in HMMs under the Bayesian setting is studied in [31], which
arXiv:2210.11988v5 [eess.SP] 8 Nov 2023
assumes the change point follows some prior distribution. All
the above studies investigate QCD in HMMs with a finite state
space and need both the pre- and post-change transition kernels
and emission probability distributions.
In [33], QCD in auto-regressive (AR) models is studied,
where observations are i.i.d. before the change and follow
an AR model after the change. A computationally efficient
algorithm is proposed. However, for the data-driven setting,
only a lower bound on the average running length (ARL) is
provided. QCD in AR models is also studied in [16], where the
likelihood ratio is approximated using a data-driven method.
However, the authors did not provide any theoretical guarantee.
B. Major Contributions
We develop kernel-based data-driven algorithms for QCD
in (hidden) Markov models. We keep a sample buffer of size
$m$. Once the buffer is filled, we calculate the MMD between
samples in the buffer and a reference batch of pre-change
samples. We use this MMD to construct a CuSum-type test.
Subsequently, we clear the buffer and continue sampling. The
statistic has a negative drift, causing the CuSum to fluctuate
around 0 before the change. After the change, it has a positive
drift, leading the CuSum to exceed the threshold quickly.
We theoretically demonstrate that the lower bound on ARL
increases exponentially with the threshold and that the upper
bound on the worst-case average detection delay (WADD) is
linear in the threshold for both Markov models and HMMs.
Hence, the WADD is at most of the order of the logarithm of
the ARL, which matches the corresponding result for QCD under the
model-based setting. Combined with the universal lower bound
on the WADD in [17], our algorithm is order-level optimal.
The primary difficulty in our proof arises from the fact
that the samples are generated from (hidden) Markov models,
and thus it is essential to explicitly characterize the bias in the
MMD estimate for the analysis of ARL and WADD.
When this paper was in preparation, the first version of [25]
was posted, in which the ARL lower bound grows only linearly with
the threshold, indicating overly frequent false alarms.
A later version of [25] provides an improved result: the ARL is at least
exponential in the threshold, and the upper bound on the WADD
is linear in the threshold. However, the lower bound on the ARL
in [25] requires the threshold to be sufficiently large, whereas
we need no such assumption. In addition, [25] focuses
on Markov models. To the best of the authors’ knowledge,
we are the first to develop data-driven algorithms for QCD
in HMMs with theoretical performance guarantees and order-
level optimal performance. Our algorithm has a computational
complexity of $O(mn)$ up to time $n$, whereas the complexity of the
algorithm in [25] is $O(m^2 n)$. Our simulation results also
demonstrate that our algorithm has a smaller WADD for a
given ARL, and is computationally efficient.
We conduct extensive experiments on synthetic data and
two practical fault detection problems in power systems.
The numerical results show that for fault detection in DC
microgrids, our algorithm outperforms the algorithm proposed
in [25] in data-driven settings. For line-to-line fault detection
in photovoltaic systems, our data-driven algorithm outperforms
the model-based methods of [34], [35] when those methods are
applied with inaccurate system knowledge.
This journal paper provides complete proofs of the results,
which were omitted due to space limitations in the conference
papers [1], [2]. Moreover, we also add extensive experiments
on practical applications of fault detection in DC microgrid
and photovoltaic systems, and compare our algorithms with
existing approaches.
II. PROBLEM FORMULATION

Markov Models: Consider a Markov chain $\{X_t\}_{t=1}^{\infty}$ defined on a probability space $(\mathcal{X}, \mathcal{F}, \mathbb{P})$ with $P(X_t \in A \mid X_{t-1}, \ldots, X_1) = P(X_t \in A \mid X_{t-1})$ for each $t \geq 1$ and $A \subseteq \mathcal{X}$. Define the transition kernel $P : \mathcal{X} \to \mathcal{P}(\mathcal{X})$ for the Markov chain $\{X_t\}_{t=1}^{\infty}$, where $P(X_t \mid X_{t-1})$ is the transition probability density of $X_t$ given $X_{t-1}$, and $\mathcal{P}(\mathcal{X})$ denotes the probability simplex on $\mathcal{X}$. The transition probability density changes at an unknown time $\tau$ from $P$ to $Q : \mathcal{X} \to \mathcal{P}(\mathcal{X})$.
Hidden Markov Models: The state $X_t$ of the Markov chain cannot be observed directly. Instead, we have access to an observable sequence $\{X'_t\}_{t=1}^{\infty}$ which is adjoined to the Markov chain $\{X_t\}_{t=1}^{\infty}$ such that $\{X_t, X'_t\}_{t=1}^{\infty}$ is a Markov chain. For any measurable set $A \subseteq \mathcal{X}$, the following conditions hold:
$$P(X_t \in A \mid X_{t-1}, \ldots, X_1, X'_{t-1}, \ldots, X'_1) = P(X_t \in A \mid X_{t-1}),$$
$$P(X'_t \in A \mid X_t, \ldots, X_1, X'_{t-1}, \ldots, X'_1) = P'(X'_t \in A \mid X_t).$$
Let $P' : \mathcal{X} \to \mathcal{P}(\mathcal{X})$ be the emission probability. At some unknown time $\tau$, the transition kernel changes from $P$ to $Q$ and/or the emission probability changes from $P'$ to $Q'$.
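To make this change model concrete, the sketch below simulates a toy two-state Gaussian HMM whose emission distribution shifts at the change point. All parameter values (transition matrix, emission means, the shift from $N(1,1)$ to $N(3,1)$) are illustrative choices, not taken from the paper.

```python
import numpy as np

def sample_hmm(n, tau, rng):
    """Two-state Gaussian HMM; the emission mean in state 1 shifts at tau."""
    P = np.array([[0.95, 0.05],   # transition kernel (unchanged by the fault)
                  [0.10, 0.90]])
    states = np.empty(n, dtype=int)   # hidden states X_t
    obs = np.empty(n)                 # observations X'_t
    states[0] = 0
    for t in range(n):
        if t > 0:
            states[t] = rng.choice(2, p=P[states[t - 1]])
        mean = 0.0 if states[t] == 0 else 1.0
        if t >= tau and states[t] == 1:
            mean = 3.0                # post-change emission: N(1, 1) -> N(3, 1)
        obs[t] = rng.normal(mean, 1.0)
    return states, obs

rng = np.random.default_rng(0)
states, observations = sample_hmm(2000, tau=1000, rng=rng)
```

A data-driven detector sees only `observations`; the hidden `states` and the change time `tau` are unknown to it.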
The objective is to detect the change quickly subject to false alarm constraints. In this paper, we investigate the data-driven setting, where $P$, $P'$, $Q$, and $Q'$ are all unknown.

Let $\hat{T}$ be a stopping time, and let $\mathbb{P}_\tau$ ($\mathbb{E}_\tau$) be the probability measure (expectation) when the change happens at time $\tau$. Let $\mathbb{P}_\infty$ ($\mathbb{E}_\infty$) be the probability measure (expectation) if no change happens. There is a reference sequence of samples $\{Y_t\}_{t=1}^{\infty}$ generated from the transition kernel $P$ [25], [27], [28]. Use $\mathcal{F}_t$ to denote the $\sigma$-field generated by $\{X_i, Y_i\}_{i=1}^{t}$. For HMMs, there is a reference sequence of observations $\{Y'_t\}_{t=1}^{\infty}$ generated from an HMM with transition kernel $P$ and emission probability $P'$; denote the corresponding hidden states by $Y_t$. In this case, use $\mathcal{F}_t$ to denote the $\sigma$-field generated by $\{X_i, X'_i, Y_i, Y'_i\}_{i=1}^{t}$. Define the ARL and the WADD for $\hat{T}$:
$$\mathrm{ARL}(\hat{T}) = \mathbb{E}_\infty[\hat{T}], \qquad (1)$$
$$\mathrm{WADD}(\hat{T}) = \sup_{\tau \geq 1} \operatorname{ess\,sup} \, \mathbb{E}_\tau\big[(\hat{T} - \tau)^+ \,\big|\, \mathcal{F}_{\tau-1}\big]. \qquad (2)$$
Here the ARL quantifies the frequency of false alarms, and
the WADD measures the number of samples required to
raise an alarm after the change. The objective is to minimize the WADD
subject to the ARL constraint:
$$\min_{\hat{T}} \ \mathrm{WADD}(\hat{T}), \quad \text{s.t. } \mathrm{ARL}(\hat{T}) \geq \psi, \qquad (3)$$
where $\psi > 0$ is some pre-specified constant. We assume that the Markov chains with transition kernels $P$ and $Q$ are uniformly ergodic [24], [25], i.e., for any measurable $A \subseteq \mathcal{X}$ and $x \in \mathcal{X}$,
$$|\mathbb{P}_\infty(X_{t+i} \in A \mid X_i = x) - \pi_P(A)| \leq R_P \lambda_P^t, \qquad (4)$$
$$|\mathbb{P}_1(X_{t+i} \in A \mid X_i = x) - \pi_Q(A)| \leq R_Q \lambda_Q^t, \qquad (5)$$
where $\pi_P$ and $\pi_Q$ denote the invariant distributions of the Markov models with transition kernels $P$ and $Q$, and $0 < R_P, R_Q < \infty$, $0 < \lambda_P, \lambda_Q < 1$ are some constants.
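As a numerical sanity check (not from the paper), the geometric bound in (4) can be verified for a hypothetical two-state chain, where the second eigenvalue $\lambda = 0.7$ of the transition matrix governs the decay rate:

```python
import numpy as np

# A hypothetical two-state transition matrix (rows sum to 1); its
# eigenvalues are 1 and 0.7, so lambda_P = 0.7 in the bound (4).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Invariant distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()   # normalizes to pi = (2/3, 1/3)

# Total-variation distance sup_A |P^t(x, A) - pi(A)|, worst case over x.
Pt = np.eye(2)
for t in range(1, 21):
    Pt = Pt @ P
    tv = 0.5 * np.abs(Pt - pi).sum(axis=1).max()
    # Geometric decay with R_P = 1, lambda_P = 0.7, as in (4).
    assert tv <= 1.0 * 0.7 ** t + 1e-12
```

For this chain the total-variation gap shrinks by a factor of 0.7 per step, so the constants $(R_P, \lambda_P) = (1, 0.7)$ satisfy the uniform ergodicity condition.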
It can be demonstrated that under $\mathbb{P}_\infty$, the invariant distribution of $\{X_t, X'_t\}_{t=1}^{\infty}$ is $\pi_P(x) P'(x' \mid x)$, and under $\mathbb{P}_1$, the invariant distribution is $\pi_Q(x) Q'(x' \mid x)$. Let $\pi'_P(x') = \int_{\mathcal{X}} \pi_P(dx) \, P'(x' \mid x)$ and $\pi'_Q(x') = \int_{\mathcal{X}} \pi_Q(dx) \, Q'(x' \mid x)$ denote the marginal distributions of the observation under the respective invariant distributions.
A. Maximum Mean Discrepancy

In this section, we provide a brief introduction to kernel mean embedding and MMD [26], [36]. Consider a positive definite kernel function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ and its associated reproducing kernel Hilbert space (RKHS), denoted by $\mathcal{H}_k$. Let $\langle \cdot, \cdot \rangle_{\mathcal{H}_k}$ be the inner product in the RKHS. For any $x, y \in \mathcal{X}$, we have $k(x, \cdot) \in \mathcal{H}_k$ and $k(x, y) = \langle k(x, \cdot), k(y, \cdot) \rangle_{\mathcal{H}_k}$. By the reproducing property, for any function $f \in \mathcal{H}_k$, $f(x) = \langle f, k(x, \cdot) \rangle_{\mathcal{H}_k}$. In this paper, we study a bounded kernel function $k$, i.e., $0 \leq k(x, y) \leq 1$. Let $\mu_F = \mathbb{E}_{X \sim F}[k(X, \cdot)]$ be the kernel mean embedding of a probability distribution $F$. For a characteristic kernel $k$, $\mu_F = \mu_G$ if and only if the distributions $F$ and $G$ are identical [26]. The MMD between distributions $F$ and $G$ is defined as
$$D(F, G) = \sup_{f : \|f\|_{\mathcal{H}_k} \leq 1} \mathbb{E}_{X \sim F}[f(X)] - \mathbb{E}_{Y \sim G}[f(Y)]. \qquad (6)$$
The squared MMD can be written equivalently as [37]:
$$D^2(F, G) = \mathbb{E}_{X, \bar{X} \sim F}[k(X, \bar{X})] + \mathbb{E}_{Y, \bar{Y} \sim G}[k(Y, \bar{Y})] - 2\,\mathbb{E}_{X \sim F, Y \sim G}[k(X, Y)]. \qquad (7)$$
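As a concrete illustration (this code is a sketch, not taken from the paper), the squared-MMD expression (7) can be estimated from two sample sets using a bounded Gaussian kernel, which satisfies $0 \le k(x, y) \le 1$:

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Bounded Gaussian (RBF) kernel: 0 <= k(x, y) <= 1."""
    return np.exp(-(x - y) ** 2 / (2.0 * bandwidth ** 2))

def squared_mmd(xs, ys, bandwidth=1.0):
    """Biased empirical estimate of D^2(F, G) from (7):
    mean of k over X-pairs + mean over Y-pairs - 2 * mean over cross pairs."""
    kxx = gaussian_kernel(xs[:, None], xs[None, :], bandwidth).mean()
    kyy = gaussian_kernel(ys[:, None], ys[None, :], bandwidth).mean()
    kxy = gaussian_kernel(xs[:, None], ys[None, :], bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
# Same distribution -> estimate near 0; a mean shift -> clearly positive.
same = squared_mmd(rng.normal(0, 1, 500), rng.normal(0, 1, 500))
diff = squared_mmd(rng.normal(0, 1, 500), rng.normal(2, 1, 500))
```

The Gaussian kernel is characteristic, so the population MMD is zero only when the two distributions coincide; the empirical estimate above is the biased V-statistic version with $O(n^2)$ cost per pair of blocks.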
III. MARKOV MODELS

In this section, we present our results for Markov models. Markov chains with distinct transition kernels may have identical stationary distributions [25]. As a result, a straightforward extension of existing QCD methods designed for the i.i.d. setting [27], [28], which estimates the MMD between the stationary distributions $\pi_P$ and $\pi_Q$, may fail even when $P \neq Q$. To address this issue, we utilize the second-order Markov chain [25]. We define the product $\sigma$-algebra on $\mathcal{X} \times \mathcal{X}$, generated by the collection of all measurable rectangles: $\mathcal{F} \otimes \mathcal{F} = \sigma\{A \times B : A \subseteq \mathcal{X}, B \subseteq \mathcal{X}\}$. Define the second-order probability measure $\widetilde{\pi}_P$ on the measurable space $(\mathcal{X} \times \mathcal{X}, \mathcal{F} \otimes \mathcal{F})$ as follows: $\widetilde{\pi}_P(A \times B) = \int_A \pi_P(dx) \, P(X_{i+1} \in B \mid X_i = x)$, where $A$ and $B$ are elements of the $\sigma$-algebra $\mathcal{F}$. According to Lemma 1 in [25], if the transition kernels are distinct, then the second-order probability measures are distinct. The kernel function and MMD discussed in Section II-A extend straightforwardly to the space $\widetilde{\mathcal{X}} = \mathcal{X} \times \mathcal{X}$.

It can be easily shown that under $\mathbb{P}_\infty$ ($\mathbb{P}_1$), the second-order Markov chain $\{\widetilde{X}_t = (X_t, X_{t+1})\}_{t=1}^{\infty}$ is uniformly ergodic. We then construct the test statistic and the stopping rule using the second-order Markov chain. Our approach is to estimate the MMD between the invariant measures of the second-order Markov chains, and to employ this estimate to construct a CuSum-type test.
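The point of the second-order construction can be seen in a small experiment (illustrative, not from the paper): an i.i.d. $N(0,1)$ sequence and a stationary AR(1) chain share the same $N(0,1)$ stationary distribution, so a first-order MMD cannot separate them, while the MMD on consecutive pairs $(x_t, x_{t+1})$ can.

```python
import numpy as np

def second_order_chain(xs):
    """Map a trajectory x_1, x_2, ... to the pairs (x_t, x_{t+1})."""
    xs = np.asarray(xs, dtype=float)
    return np.stack([xs[:-1], xs[1:]], axis=1)

def rbf(a, b, h=1.0):
    """Gaussian kernel matrix on the (product) space, 0 <= k <= 1."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * h * h))

def mmd2(a, b, h=1.0):
    """Biased squared-MMD estimate, as in (7)."""
    return rbf(a, a, h).mean() + rbf(b, b, h).mean() - 2.0 * rbf(a, b, h).mean()

# Two chains with the SAME N(0, 1) stationary distribution but different
# dynamics: an i.i.d. sequence vs. a stationary AR(1) with phi = 0.8.
rng = np.random.default_rng(1)
n, phi = 2000, 0.8
iid = rng.normal(size=n)
ar = np.empty(n)
ar[0] = rng.normal()
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + np.sqrt(1.0 - phi ** 2) * rng.normal()

# First-order marginals coincide, so the marginal MMD is near zero, while
# the pair distributions differ and the second-order MMD separates them.
marg_mmd = mmd2(iid[:, None], ar[:, None])
pair_mmd = mmd2(second_order_chain(iid), second_order_chain(ar))
```

Here `marg_mmd` stays near zero while `pair_mmd` is clearly positive, mirroring the role of $\widetilde{\pi}_P$ versus $\pi_P$ in the construction above.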
As a first step, we divide the samples into non-overlapping blocks of size $m$. Let $\widetilde{X}_t = (X_t, X_{t+1})$ and $\widetilde{Y}_t = (Y_t, Y_{t+1})$ be the second-order Markov chains. For the observation sequence $\{\widetilde{X}_t\}_{t=1}^{\infty}$, the pre-change stationary distribution is $\widetilde{\pi}_P$ and the post-change stationary distribution is $\widetilde{\pi}_Q$. For the reference sequence $\{\widetilde{Y}_t\}_{t=1}^{\infty}$, the stationary distribution is $\widetilde{\pi}_P$. For the $t$-th block, $t = 0, 1, 2, \ldots$, define the empirical measure of $\{\widetilde{X}_{mt+i}\}_{i=1}^{m-1}$ as $F^t_{\widetilde{X}} = \frac{1}{m-1} \sum_{i=1}^{m-1} \delta_{\widetilde{X}_{mt+i}}$, where $\delta_{\widetilde{X}_i}$ is the Dirac measure. Similarly, we can define $F^t_{\widetilde{Y}}$. The MMD $D(F^t_{\widetilde{X}}, F^t_{\widetilde{Y}})$ can then be computed using (7). Define $S_t = D(F^t_{\widetilde{X}}, F^t_{\widetilde{Y}}) - \sigma$, where $0 < \sigma < D(\widetilde{\pi}_P, \widetilde{\pi}_Q)$ is a positive constant that will be determined later. Then, our stopping time is defined as
$$T(c) = \inf\Big\{ mt + t : \max_{0 \leq i \leq t} \sum_{j=i}^{t} S_j > c \Big\}, \qquad (8)$$
where $c > 0$ is a predetermined threshold.
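A hedged end-to-end sketch of a stopping rule in the spirit of (8) is given below. For readability it uses a toy one-dimensional Gaussian mean-shift stream and scores raw blocks with a plain biased MMD, rather than the second-order pairs used in the paper; the block size, drift $\sigma$, and threshold $c$ are all illustrative choices.

```python
import numpy as np

def mmd(xs, ys, h=1.0):
    """Biased empirical MMD between two 1-d sample blocks."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * h * h))
    d2 = k(xs, xs).mean() + k(ys, ys).mean() - 2.0 * k(xs, ys).mean()
    return np.sqrt(max(d2, 0.0))

def block_cusum(stream, reference, m=50, sigma=0.4, c=1.5):
    """Score each non-overlapping block of size m against a reference
    block, and update the CuSum statistic recursively."""
    w = 0.0
    for t in range(len(stream) // m):
        s = mmd(stream[t * m:(t + 1) * m], reference[t * m:(t + 1) * m]) - sigma
        w = max(0.0, w) + s          # recursive CuSum update
        if w > c:
            return (t + 1) * m       # alarm at the end of block t
    return None                      # no alarm raised

rng = np.random.default_rng(2)
tau = 500                            # change point (mean shift 0 -> 2)
stream = np.concatenate([rng.normal(0.0, 1.0, tau),
                         rng.normal(2.0, 1.0, 1500)])
reference = rng.normal(0.0, 1.0, 2000)   # pre-change reference sequence
alarm = block_cusum(stream, reference)
```

With these toy parameters the score $S_t$ has a negative mean before the change and a positive mean after it, so the alarm fires a few blocks after $\tau$.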
Our algorithm can be updated in a recursive fashion:
$$\max_{0 \leq i \leq t} \sum_{j=i}^{t} S_j = \max\Big\{0,\ \max_{0 \leq i \leq t-1} \sum_{j=i}^{t-1} S_j\Big\} + S_t.$$
Furthermore, for every block of size $m$, the computational complexity of the MMD is $O(m^2)$. At time $n$, there are $\frac{n}{m}$ non-overlapping blocks in total. Thus, the total computational complexity up to time $n$ is $O(mn)$. In contrast, the algorithm in [25] uses overlapping blocks and has an overall computational complexity of $O(m^2 n)$.
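The equivalence between the direct maximization and the recursive CuSum update can be checked numerically; the snippet below is a toy verification, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.normal(-0.1, 1.0, size=200)   # toy drift-corrected scores S_0, ..., S_199

w = 0.0
for t, s in enumerate(S):
    w = max(0.0, w) + s               # recursive CuSum update
    # Direct evaluation of max_{0 <= i <= t} sum_{j=i}^{t} S_j.
    direct = max(S[i:t + 1].sum() for i in range(t + 1))
    assert abs(w - direct) < 1e-9
```

The recursion needs only $O(1)$ work per block, which is what keeps the overall complexity at $O(mn)$ instead of re-scanning all past blocks.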
Intuitively, when $m$ is large, we have $D(F^t_{\widetilde{X}}, F^t_{\widetilde{Y}}) \approx D(\widetilde{\pi}_P, \widetilde{\pi}_P) = 0$ before the change and $D(F^t_{\widetilde{X}}, F^t_{\widetilde{Y}}) \approx D(\widetilde{\pi}_P, \widetilde{\pi}_Q) > 0$ after the change. For $0 < \sigma < D(\widetilde{\pi}_P, \widetilde{\pi}_Q)$, the test statistic has a negative drift before the change, causing the CuSum statistic to fluctuate around 0, while after the change it has a positive drift, driving the CuSum statistic over the threshold quickly.
We present the upper bound on the WADD of (8) in the following theorem. Define $a_P = \sqrt{\frac{2 - 2\lambda_P + 4R_P}{(m-1)(1 - \lambda_P)}}$, $a_Q = \sqrt{\frac{2 - 2\lambda_Q + 4R_Q}{(m-1)(1 - \lambda_Q)}}$, $a = a_P + a_Q$, and $d = D(\widetilde{\pi}_P, \widetilde{\pi}_Q) - \sigma$.

Theorem 1. The WADD for the stopping time in (8) can be upper bounded as follows:
$$\mathrm{WADD}(T(c)) \leq \frac{2admc}{(a-d)^2} + \frac{(a+d)mc}{(a-d)^2} + 2m + \frac{a + ad}{d - a}\, m = O(mc). \qquad (9)$$
Proof. Let $\widetilde{\tau}$ be the index of the last sample within the block where the change occurs. Thus, samples after $\widetilde{\tau}$ are generated from the transition kernel $Q$. Moreover, we have $(T(c) - \tau)^+ \leq (T(c) - \widetilde{\tau})^+ + m$. Let
$$\xi = \frac{a_P + a_Q + \sqrt{(D(\widetilde{\pi}_P, \widetilde{\pi}_Q) - \sigma)(a_P + a_Q)}}{D(\widetilde{\pi}_P, \widetilde{\pi}_Q) - \sigma - a_P - a_Q}.$$