Sequential change-point detection:
Computation versus statistical performance
Haoyun Wang1, Yao Xie1
1School of Industrial and Systems Engineering, Georgia Institute of Technology,
Atlanta, Georgia, 30332, USA
Abstract
Change-point detection studies the problem of detecting changes in the underlying
distribution of a data stream as soon as possible after the change happens.
Modern large-scale, high-dimensional, and complex streaming data call for computationally
and memory-efficient sequential change-point detection algorithms that are also
statistically powerful. This gives rise to a computation versus statistical power trade-
off, an aspect less emphasized in the classic literature. This tutorial takes
this new perspective and reviews several sequential change-point detection procedures,
ranging from classic sequential change-point detection algorithms to more recent non-
parametric procedures that consider computation, memory efficiency, and model ro-
bustness in the algorithm design. Our survey also contains classic performance analysis,
which provides useful techniques for analyzing new procedures.
Keywords: Sequential change-point detection, Anomaly detection, Statistical signal processing
1 Introduction
Sequential change-point detection has been a classic topic in statistics since the 1920s (Shewhart, 1925, 1931), with motivations in quality control: the objective is to monitor the manufacturing process by examining its statistical properties and signal a change in quality when the distribution of certain features of the products deviates from the desired one.
Since then, change-point detection has found many applications in other fields, including monitoring power networks (Chen et al., 2015), internet traffic (Lakhina et al., 2004), sensor networks (Hallac et al., 2015; Raghavan and Veeravalli, 2010), social networks (Raginsky et al., 2012; Peel and Clauset, 2015), medical image processing (Malladi et al., 2013), cybersecurity (Tartakovsky et al., 2012b), video surveillance (Lee and Kriegman, 2005), COVID-19 intervention (Dehning et al., 2020), and so on. Recently, there has been much interest in out-of-distribution detection (Ren et al., 2019; Magesh et al., 2022), which aims to detect a shift of the underlying distribution of the test data away from that of the training data and is closely related to sequential change-point detection.
∗yao.xie@isye.gatech.edu
arXiv:2210.05181v3 [math.ST] 1 Jun 2023
This tutorial considers sequential (also known as online or quickest, in some contexts) change-point detection problems, where the goal is to monitor a sequence of data or a time series for any change and raise an alarm as quickly as possible after the change has occurred, to prevent potential loss. This problem differs from the other major type of change-point detection, which is offline and aims to detect and localize (possibly multiple) change-points from sequential data in retrospect (see Truong et al. (2020) for a review). This tutorial focuses on the sequential setting.
Shewhart (1925) introduced a control chart that computes a statistic from every sample, where a low statistic value indicates the process is still within the desired control; it was later generalized to compute a statistic from several consecutive samples. Page (1954) proposed the CUSUM procedure based on the log-likelihood ratio between two exactly known distributions, one for the in-control state and one for the anomaly. Lorden (1971) and Moustakides (1986) proved that CUSUM has strong optimality properties. The Shiryaev-Roberts (SR) procedure (Shiryaev, 1963; Roberts, 1966), inspired by the Bayesian setting, is similar to CUSUM and is also optimal in several senses (Pollak, 1985; Pollak and Tartakovsky, 2009; Polunchenko and Tartakovsky, 2010; Tartakovsky et al., 2012a). More recent works on change-point detection focus on relaxing the strong assumptions of parametric and exactly known distributions. The generalized likelihood ratio (GLR) procedure (Lorden, 1971; Siegmund and Venkatraman, 1995) aims to detect a change to an unknown distribution of a given parametric form by searching for the most probable post-change scenario. These classic procedures and their variants can also be found in Basseville et al. (1993) and Tartakovsky et al. (2014). Sparks (2000) proposed the first adaptive CUSUM, which detects an unknown mean shift by inserting an adaptive estimate of the new mean into the CUSUM recursion, followed by Lorden and Pollak (2005); Abbasi and Haq (2019); Cao et al. (2018); Xie et al. (2022). Such procedures allow an unknown post-change distribution while retaining the computational benefit of CUSUM. More recently, Romano et al. (2023) and Ward et al. (2022) presented computationally efficient approaches that implement the GLR test statistic for Gaussian mean-shift and Poisson parameter-shift detection, respectively. The above works belong to parametric change-point detection, i.e., they assume the pre- and post-change distributions belong to a parametric family and detect a certain type of change (for instance, a mean shift or a covariance change). Many non-parametric and distribution-free change-point detection algorithms have also been developed, such as kernel-based methods (Harchaoui et al., 2008; Li et al., 2019; Song and Chen, 2022) and graph-based methods (Chen and Zhang, 2015; Chu and Chen, 2019).
Sequential change-point detection is different from the traditional fixed-sample hypothesis test. Because the change-point is unknown, it is fundamentally a repeated likelihood ratio test. Moreover, the test statistics are correlated when scanning through time for a potential change-point. This correlation is significant, complicates the analysis, and must be explicitly characterized. For instance, in the GLR procedure, we are interested in the probability that the maximum likelihood ratio over all segments of consecutive observations exceeds a certain threshold. In this paper, we review two analytical techniques for tackling this challenge: an earlier technique based on renewal theory (Siegmund, 1985), and a more recent method based on a change-of-measure technique for the extreme values of Gaussian random fields (Yakir, 2013).
Despite many methodological and theoretical developments in sequential change-point detection, the trade-off between (computational and memory) efficiency and statistical performance has been less emphasized; many detection algorithms take such considerations into account only implicitly. The trade-off is becoming more prominent in modern applications due to the need to process large-scale streaming data in real time and detect changes as quickly as possible. For example, in social network monitoring, a single Twitter community can contain hundreds of users or more, let alone the entire network.
Thus, in this tutorial, we aim to contrast classic and new approaches through the lens of
computational efficiency versus statistical performance tradeoff; we hope this can enlighten
developing new techniques for analyzing new algorithms. We highlight the tradeoff by pre-
senting the classic and new performance analysis techniques, ranging from the most standard
parametric models to the more recent nonparametric and distribution-free models introduced
to achieve computational benefits, such as robustness and flexibility in handling real data.
We focus on proof techniques such as renewal theory and the change-of-measure technique,
as they are essential for analyzing the classic performance metrics such as the Average Run
Length (ARL) and Expected Detection Delay (EDD).
The rest of the paper is organized as follows. Section 2 describes the formulation of the sequential change-point detection problem and several classic detection procedures. In Section 3, we define the performance metrics and present the classic renewal-theory analysis of CUSUM. Sections 4, 5, and 6 each give a detailed review of recent literature on handling unknown distributions, from parametric to distribution-free. Section 7 concludes our review.
We would like to acknowledge that there are other recent techniques for analyzing alter-
native performance metrics, such as Maillard (2019); Yu et al. (2020); Chen et al. (2022),
which we do not discuss in this paper due to space limitations.
2 Basics of change-point detection and problem setup
2.1 Problem setup
Consider a basic sequential change-point detection problem, where the aim is to detect an unknown change-point. We are given samples $x_1, x_2, \ldots$ such that before the change happens, the data follow a distribution $f_0 \in \mathcal{F}_0$, and after the change happens, the distribution shifts to $f_1 \in \mathcal{F}_1$. At each time step $t$, after collecting the sample $x_t$, we can decide whether to raise an alarm or not. Our goal is to raise an alarm as soon as possible after the change has happened, under a false alarm constraint.
The simplest setting is when the sets are singletons, i.e., the pre- and post-change distributions are $f_0$ and $f_1$, respectively. This happens when we understand the physical nature of the process or there is enough historical data to estimate the pre-change distribution $f_0$. The post-change distribution $f_1$ can be either a target anomaly or the smallest change we wish to detect. The anomaly may occur at some time $v \in \mathbb{N}$, resulting in $x_1, \ldots, x_{v-1} \overset{\text{i.i.d.}}{\sim} f_0$ and $x_v, x_{v+1}, \ldots \overset{\text{i.i.d.}}{\sim} f_1$. If the change never occurs, we write $v = \infty$. Here we assume the change-point $v$ is deterministic but unknown, and the observations before and after the change-point are all independent and identically distributed.
A change-point detection procedure is a stopping time $\tau$ with the following property: for each $t = 1, 2, \ldots$, the event $\{\tau > t\}$ is measurable with respect to $\sigma(x_1, \ldots, x_t)$, the $\sigma$-field generated by the data up to time $t$. In other words, whether the procedure decides to stop right after observing the $t$-th sample depends solely on the history, not the future. Common detection procedures are based on the choice of a certain detection statistic: one computes the detection statistic $T_t$ from the most recent data and stops the first time that $T_t$ signals a potential change-point, typically by comparing it with a predetermined threshold $b$; the choice of $b$ balances the false alarm rate against the detection delay. Thus, the detection procedure can be defined as a stopping time
$$\tau = \min\{t \geq 1 : T_t > b\}.$$
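For illustration, the stopping rule above can be simulated in a few lines. The sketch below is not from this paper: it uses the classic CUSUM statistic, with its standard recursion $T_t = \max(0,\, T_{t-1} + \log(f_1(x_t)/f_0(x_t)))$ and $T_0 = 0$, as the detection statistic for a known Gaussian mean shift; the means, threshold, and change-point location are arbitrary values chosen for the demonstration.

```python
import random

def cusum_stopping_time(stream, mu0, mu1, sigma, b):
    """Return the stopping time min{t >= 1 : T_t > b} using the CUSUM
    statistic for a known Gaussian mean shift mu0 -> mu1.

    For Gaussians with common variance sigma^2, the log-likelihood ratio
    of one sample x is ((mu1 - mu0) / sigma**2) * (x - (mu0 + mu1) / 2).
    """
    T = 0.0
    for t, x in enumerate(stream, start=1):
        llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2)
        T = max(0.0, T + llr)   # CUSUM recursion
        if T > b:
            return t            # raise the alarm
    return None                 # no alarm within the observed data

# Simulate a stream with a change-point at v = 100 (illustrative values).
random.seed(0)
v, mu0, mu1, sigma = 100, 0.0, 1.0, 1.0
stream = [random.gauss(mu0, sigma) for _ in range(v - 1)] + \
         [random.gauss(mu1, sigma) for _ in range(200)]
tau = cusum_stopping_time(stream, mu0, mu1, sigma, b=5.0)
```

With these values, the alarm typically fires shortly after $v = 100$; as discussed in Section 2.2, the detection delay scales like $b / D(f_1 \| f_0)$.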
Since, in practice, there are usually abundant pre-change samples with which to estimate $f_0$ to high accuracy, here we only consider the case of a known pre-change distribution $f_0$. Uncertainty in the pre-change distribution can also be addressed, though, and we give such examples in Sections 5 and 6; in such cases, it is harder to control false alarms, and the detection procedure is generally more complex computationally. We would often like to treat the post-change distribution $f_1$ as unknown, since it is due to an unknown anomaly. Then, instead of considering a single distribution $f_1$, we consider a post-change distribution belonging to a parametric family $f_1(\cdot, \theta)$ with parameter $\theta \in \Theta_1$.
2.2 Computation and robustness considerations versus statistical
performance
Two commonly used performance metrics for change-point detection are the average run length (ARL) and the expected detection delay (EDD), which we describe below and introduce more formally in Section 3.1. The ARL (related to the false alarm rate) is the expected stopping time under the pre-change distribution (i.e., when there is no change-point), and the EDD is the expected stopping time after a change-point has happened. Typically, one chooses a threshold to control the ARL so that $\mathrm{ARL} \geq \gamma$ for some chosen large lower bound $\gamma$, and measures the statistical performance by the EDD for that fixed $\gamma$. The well-known lower bound (see, e.g., Lorden (1971)) is $\mathrm{EDD} \geq \log\gamma\,(1 + o(1)) / D(f_1 \| f_0)$ as $\gamma \to \infty$, where $D(f_1 \| f_0)$ is the Kullback-Leibler (KL) divergence between $f_1$ and $f_0$. A larger KL divergence means it is easier to distinguish $f_1$ from $f_0$, and thus a smaller EDD is possible.
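To make the bound tangible: for a unit-variance Gaussian mean shift from $f_0 = \mathcal{N}(\mu_0, 1)$ to $f_1 = \mathcal{N}(\mu_1, 1)$, the KL divergence has the closed form $D(f_1 \| f_0) = (\mu_1 - \mu_0)^2 / 2$, so the first-order lower bound $\log\gamma / D(f_1 \| f_0)$ (ignoring the $o(1)$ term) can be evaluated directly. A small illustration with arbitrary numbers:

```python
import math

def kl_gauss(mu1, mu0, sigma=1.0):
    """KL divergence D(N(mu1, sigma^2) || N(mu0, sigma^2))."""
    return (mu1 - mu0) ** 2 / (2 * sigma ** 2)

gamma = 1e4  # target lower bound on the ARL (arbitrary)
for shift in (0.5, 1.0, 2.0):
    d = kl_gauss(shift, 0.0)
    bound = math.log(gamma) / d  # first-order EDD lower bound
    print(f"mean shift {shift}: KL = {d:.3f}, EDD lower bound ~ {bound:.1f} samples")
```

Doubling the shift multiplies the KL divergence by four and shrinks the first-order delay bound by the same factor.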
Another way to describe the objective is to minimize
$$-\mathrm{ARL} + \lambda\,\mathrm{EDD},$$
for some hyper-parameter $\lambda > 0$. If a detection procedure $\tau$ minimizes the above for a certain $\lambda$, it also minimizes the EDD among all detection procedures with $\mathrm{ARL} \geq \gamma = \mathrm{ARL}(\tau)$. Most existing change-point detection procedures can be regarded as trying to solve the following minimax problem (with slightly varying definitions of the ARL and EDD):
$$\min_{\tau \in \mathcal{T}} \max_{f_1 \in \mathcal{F}_1} \; -\mathrm{ARL}_{f_0}(\tau) + \lambda\,\mathrm{EDD}_{f_1}(\tau), \tag{1}$$
where $\mathcal{T}$ is the set of stopping times and $\mathcal{F}_1$ is the set of possible post-change distributions. We include $f_0$, $f_1$ in the subscripts to show the dependence of the performance metrics on those distributions, but since there is often a large amount of reference data to estimate the pre-change distribution, here we only consider $f_1$ to be unknown. The statistical performance of a detection procedure can then be represented by the value $\max_{f_1 \in \mathcal{F}_1} -\mathrm{ARL}_{f_0}(\tau) + \lambda\,\mathrm{EDD}_{f_1}(\tau)$. One can expect that as the size of $\mathcal{F}_1$ grows, a detection procedure either becomes more complex or has worse statistical performance. This leads to the trade-off between computation, model robustness (the size of $\mathcal{F}_1$), and statistical performance.
Given observations $x_1, x_2, \ldots, x_t$, to fully utilize the observations and achieve the best statistical performance, we would intuitively use all the past samples in the detection statistic, as is the case for the CUSUM and GLR procedures. However, this can become prohibitive in practice as $t$, the duration for which we have run the detection procedure, grows larger and larger (unless the algorithm is fully recursive, such as CUSUM).
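To make this contrast concrete, the sketch below (illustrative only, not a specific procedure from the literature) compares an $O(1)$ recursive CUSUM update for a known post-change mean with a GLR-type statistic for an unknown Gaussian mean (taking $\mu_0 = 0$ and $\sigma = 1$), whose per-step cost and memory grow linearly in $t$ because every candidate change-point must be rescanned.

```python
def cusum_update(T, x, mu0, mu1, sigma=1.0):
    """O(1) time and memory: fold the new sample into the running statistic."""
    llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2)
    return max(0.0, T + llr)

def glr_statistic(history):
    """GLR statistic for an unknown post-change mean (mu0 = 0, sigma = 1):
    max over candidate change-points k of S_{k:t}^2 / (2 (t - k + 1)),
    where S_{k:t} = x_k + ... + x_t.  O(t) time and memory per step,
    since all past samples must be stored and rescanned.
    """
    best, s = 0.0, 0.0
    for n, x in enumerate(reversed(history), start=1):  # n = t - k + 1
        s += x
        best = max(best, s * s / (2 * n))
    return best
```

In practice, the GLR scan is often truncated to a sliding window of the most recent samples, capping the per-step cost at the window size at some loss of power against changes that occurred further in the past.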
A practical online change-point detection algorithm should have constant, $O(1)$, computational complexity and memory requirement per iteration (unit time), but this is not likely to be