Sequential change-point detection:
Computation versus statistical performance
Haoyun Wang1, Yao Xie1
1School of Industrial and Systems Engineering, Georgia Institute of Technology,
Atlanta, Georgia, 30332, USA
Abstract
Change-point detection studies the problem of detecting changes in the underlying
distribution of a data stream as soon as possible after the change happens.
Modern large-scale, high-dimensional, and complex streaming data call for computationally
and memory-efficient sequential change-point detection algorithms that are also
statistically powerful. This gives rise to a computation versus statistical power trade-
off, an aspect less emphasized in the classic literature. This tutorial takes
this new perspective and reviews several sequential change-point detection procedures,
ranging from classic sequential change-point detection algorithms to more recent non-
parametric procedures that consider computation, memory efficiency, and model ro-
bustness in the algorithm design. Our survey also contains classic performance analysis,
which provides useful techniques for analyzing new procedures.
Keywords: Sequential change-point detection, Anomaly detection, Statistical signal processing
1 Introduction
Sequential change-point detection has been a classic topic in statistics since the 1920s (Shewhart, 1925, 1931), with motivations in quality control: the objective is to monitor the manufacturing process by examining its statistical properties and signal a change in quality when the distribution of certain features of the products deviates from the desired one.
Since then, change-point detection has found many applications in other fields, including monitoring power networks (Chen et al., 2015), internet traffic (Lakhina et al., 2004), sensor networks (Hallac et al., 2015; Raghavan and Veeravalli, 2010), social networks (Raginsky et al., 2012; Peel and Clauset, 2015), medical image processing (Malladi et al., 2013), cybersecurity (Tartakovsky et al., 2012b), video surveillance (Lee and Kriegman, 2005), COVID-19 intervention (Dehning et al., 2020), and so on. Recently, there has been much interest in out-of-distribution detection (Ren et al., 2019; Magesh et al., 2022), which aims to detect a shift of the underlying distribution of the test data away from that of the training data and is closely related to sequential change-point detection.
∗yao.xie@isye.gatech.edu
arXiv:2210.05181v3 [math.ST] 1 Jun 2023
This tutorial considers sequential (also known as online or quickest, in some contexts) change-point detection problems, where the goal is to monitor a sequence of data or a time series for any change and raise an alarm as quickly as possible after the change has occurred, to prevent potential loss. This problem differs from the other major type of change-point detection, which is offline and aims to detect and localize (possibly multiple) change-points from sequential data in retrospect (see Truong et al. (2020) for a review). This tutorial focuses on the sequential setting.
Shewhart (1925) introduced a control chart that computes a statistic from every sample, where a low statistic value indicates the process is still within the desired control; it was later generalized to compute a statistic from several consecutive samples. Page (1954) proposed the CUSUM procedure based on the log-likelihood ratio between two exactly known distributions, one for the in-control state and one for the anomaly. Lorden (1971) and Moustakides (1986) proved that CUSUM has strong optimality properties. The Shiryaev-Roberts (SR) procedure (Shiryaev, 1963; Roberts, 1966), inspired by the Bayesian setting, is similar to CUSUM and is also optimal in several senses (Pollak, 1985; Pollak and Tartakovsky, 2009; Polunchenko and Tartakovsky, 2010; Tartakovsky et al., 2012a). More recent works on change-point detection focus on relaxing the strong assumptions of parametric and exactly known distributions. The generalized likelihood ratio (GLR) procedure (Lorden, 1971; Siegmund and Venkatraman, 1995) aims to detect a change to an unknown distribution of a given parametric form by searching for the most probable post-change scenario. These classic procedures and their variants can also be found in Basseville et al. (1993) and Tartakovsky et al. (2014). Sparks (2000) proposed the first adaptive CUSUM, which detects an unknown mean shift by inserting an adaptive estimate of the new mean into the CUSUM recursion, followed by Lorden and Pollak (2005); Abbasi and Haq (2019); Cao et al. (2018); Xie et al. (2022). Such procedures allow an unknown post-change distribution while retaining the computational benefit of CUSUM. More recently, Romano et al. (2023) and Ward et al. (2022) presented computationally efficient approaches that implement the GLR test statistic for Gaussian mean-shift and Poisson parameter-shift detection, respectively. The above works belong to parametric change-point detection, i.e., they assume the pre- and post-change distributions belong to a parametric family and detect a certain type of change (for instance, a mean shift or a covariance change). Many non-parametric and distribution-free change-point detection algorithms have also been developed, such as kernel-based methods (Harchaoui et al., 2008; Li et al., 2019; Song and Chen, 2022) and graph-based methods (Chen and Zhang, 2015; Chu and Chen, 2019).
Sequential change-point detection is different from the traditional fixed-sample hypothesis test. Because the change-point is unknown, it is fundamentally a repeated likelihood ratio test. Moreover, the test statistics are correlated when scanning through time for a potential change-point. This correlation is significant, complicates the analysis, and must be explicitly characterized. For instance, in the GLR procedure, we are interested in the probability that the maximum likelihood ratio over all segments of consecutive observations exceeds a certain threshold. In this paper, we review two analytical techniques for tackling this challenge: an earlier technique based on renewal theory (Siegmund, 1985), and a more recent method based on a change-of-measure technique for the extreme values of Gaussian random fields (Yakir, 2013).
Despite many methodological and theoretical developments in sequential change-point detection, the trade-off between (computational and memory) efficiency and statistical performance has been less emphasized; many detection algorithms take such considerations into account only implicitly. The trade-off is becoming more prominent in modern applications due to the need to process large-scale streaming data in real time and detect changes as quickly as possible. For example, in social network monitoring, a single Twitter community can contain hundreds of users or more, let alone the entire network.
Thus, in this tutorial, we aim to contrast classic and new approaches through the lens of
computational efficiency versus statistical performance tradeoff; we hope this can enlighten
developing new techniques for analyzing new algorithms. We highlight the tradeoff by pre-
senting the classic and new performance analysis techniques, ranging from the most standard
parametric models to the more recent nonparametric and distribution-free models introduced
to achieve computational benefits, such as robustness and flexibility in handling real data.
We focus on proof techniques such as renewal theory and the change-of-measure technique,
as they are essential for analyzing the classic performance metrics such as the Average Run
Length (ARL) and Expected Detection Delay (EDD).
The rest of the paper is organized as follows. Section 2 describes the formulation of the sequential change-point detection problem and several classic detection procedures. In Section 3, we define the performance metrics and present the classic renewal-theory analysis of CUSUM. Sections 4, 5, and 6 each give a detailed review of recent literature on handling unknown distributions, from parametric to distribution-free. Section 7 concludes our review.
We would like to acknowledge that there are other recent techniques for analyzing alter-
native performance metrics, such as Maillard (2019); Yu et al. (2020); Chen et al. (2022),
which we do not discuss in this paper due to space limitations.
2 Basics of change-point detection and problem setup
2.1 Problem setup
Consider a basic sequential change-point detection problem, where the aim is to detect an unknown change-point. We are given samples $x_1, x_2, \ldots$ such that before the change happens, the data follow a distribution $f_0 \in \mathcal{F}_0$, and after the change happens, the distribution shifts to $f_1 \in \mathcal{F}_1$. At each time step $t$, after collecting the sample $x_t$, we can decide whether to raise an alarm or not. Our goal is to raise an alarm as soon as possible after the change has happened, under a false alarm constraint.
The simplest setting is when the sets are singletons, i.e., the pre- and post-change distributions are $f_0$ and $f_1$, respectively. This happens when we understand the physical nature of the process or there is enough historical data to estimate the pre-change distribution $f_0$. The post-change distribution $f_1$ can be either a target anomaly or the smallest change we wish to detect. The anomaly may occur at some time $v \in \mathbb{N}$, resulting in $x_1, \ldots, x_{v-1} \overset{\text{i.i.d.}}{\sim} f_0$ and $x_v, x_{v+1}, \ldots \overset{\text{i.i.d.}}{\sim} f_1$. If the change never occurs, we write $v = \infty$. Here we assume the change-point $v$ is deterministic but unknown, and the observations before and after the change-point are all independent and identically distributed.
A change-point detection procedure is a stopping time $\tau$ with the following property: for each $t = 1, 2, \ldots$, the event $\{\tau > t\}$ is measurable with respect to $\sigma(x_1, \ldots, x_t)$, the $\sigma$-field generated by the data up to time $t$. In other words, whether the procedure decides to stop right after observing the $t$-th sample depends solely on the history, not the future. Common detection procedures are based on the choice of a certain detection statistic: one computes the detection statistic $T_t$ from the most recent data and stops the first time that $T_t$ signals a potential change-point, typically by comparing it with a predetermined threshold $b$; the choice of $b$ balances the false alarm rate against the detection delay. Thus, the detection procedure can be defined as a stopping time
$$\tau = \min\{t \geq 1 : T_t > b\}.$$
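For illustration, the stopping rule above can be simulated in a few lines. The sketch below is not from this paper: it uses the classic CUSUM statistic, with its standard recursion $T_t = \max(0,\, T_{t-1} + \log(f_1(x_t)/f_0(x_t)))$ and $T_0 = 0$, as the detection statistic for a known Gaussian mean shift; the means, threshold, and change-point location are arbitrary values chosen for the demonstration.

```python
import random

def cusum_stopping_time(stream, mu0, mu1, sigma, b):
    """Return the stopping time min{t >= 1 : T_t > b} using the CUSUM
    statistic for a known Gaussian mean shift mu0 -> mu1.

    For Gaussians with common variance sigma^2, the log-likelihood ratio
    of one sample x is ((mu1 - mu0) / sigma**2) * (x - (mu0 + mu1) / 2).
    """
    T = 0.0
    for t, x in enumerate(stream, start=1):
        llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2)
        T = max(0.0, T + llr)   # CUSUM recursion
        if T > b:
            return t            # raise the alarm
    return None                 # no alarm within the observed data

# Simulate a stream with a change-point at v = 100 (illustrative values).
random.seed(0)
v, mu0, mu1, sigma = 100, 0.0, 1.0, 1.0
stream = [random.gauss(mu0, sigma) for _ in range(v - 1)] + \
         [random.gauss(mu1, sigma) for _ in range(200)]
tau = cusum_stopping_time(stream, mu0, mu1, sigma, b=5.0)
```

With these values, the alarm typically fires shortly after $v = 100$; as discussed in Section 2.2, the detection delay scales like $b / D(f_1 \| f_0)$.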
Since, in practice, there are usually abundant pre-change samples with which to estimate $f_0$ to high accuracy, here we only consider the case of a known pre-change distribution $f_0$. Uncertainty in the pre-change distribution can also be addressed, though, and we give such examples in Sections 5 and 6; in such cases, it is harder to control false alarms, and the detection procedure is generally more complex computationally. We would often like to treat the post-change distribution $f_1$ as unknown, since it is due to an unknown anomaly. Then, instead of considering a single distribution $f_1$, we consider a post-change distribution belonging to a parametric family $f_1(\cdot, \theta)$ with parameter $\theta \in \Theta_1$.
2.2 Computation and robustness considerations versus statistical
performance
Two commonly used performance metrics for change-point detection are the average run length (ARL) and the expected detection delay (EDD), which we describe below and introduce more formally in Section 3.1. The ARL (related to the false alarm rate) is the expected stopping time under the pre-change distribution (i.e., when there is no change-point), and the EDD is the expected stopping time after a change-point has happened. Typically, one chooses a threshold to control the ARL so that $\mathrm{ARL} \geq \gamma$ for some chosen large lower bound $\gamma$, and measures the statistical performance by the EDD for that fixed $\gamma$. The well-known lower bound (see, e.g., Lorden (1971)) is $\mathrm{EDD} \geq \log\gamma\,(1 + o(1)) / D(f_1 \| f_0)$ as $\gamma \to \infty$, where $D(f_1 \| f_0)$ is the Kullback-Leibler (KL) divergence between $f_1$ and $f_0$. A larger KL divergence means it is easier to distinguish $f_1$ from $f_0$, and thus a smaller EDD is possible.
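To make the bound tangible: for a unit-variance Gaussian mean shift from $f_0 = \mathcal{N}(\mu_0, 1)$ to $f_1 = \mathcal{N}(\mu_1, 1)$, the KL divergence has the closed form $D(f_1 \| f_0) = (\mu_1 - \mu_0)^2 / 2$, so the first-order lower bound $\log\gamma / D(f_1 \| f_0)$ (ignoring the $o(1)$ term) can be evaluated directly. A small illustration with arbitrary numbers:

```python
import math

def kl_gauss(mu1, mu0, sigma=1.0):
    """KL divergence D(N(mu1, sigma^2) || N(mu0, sigma^2))."""
    return (mu1 - mu0) ** 2 / (2 * sigma ** 2)

gamma = 1e4  # target lower bound on the ARL (arbitrary)
for shift in (0.5, 1.0, 2.0):
    d = kl_gauss(shift, 0.0)
    bound = math.log(gamma) / d  # first-order EDD lower bound
    print(f"mean shift {shift}: KL = {d:.3f}, EDD lower bound ~ {bound:.1f} samples")
```

Doubling the shift multiplies the KL divergence by four and shrinks the first-order delay bound by the same factor.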
Another way to describe the objective is to minimize
$$-\mathrm{ARL} + \lambda\,\mathrm{EDD},$$
for some hyper-parameter $\lambda > 0$. If a detection procedure $\tau$ minimizes the above for a certain $\lambda$, it also minimizes the EDD among all detection procedures with $\mathrm{ARL} \geq \gamma = \mathrm{ARL}(\tau)$. Most existing change-point detection procedures can be regarded as trying to solve the following minimax problem (with slightly varying definitions of the ARL and EDD):
$$\min_{\tau \in \mathcal{T}} \max_{f_1 \in \mathcal{F}_1} \; -\mathrm{ARL}_{f_0}(\tau) + \lambda\,\mathrm{EDD}_{f_1}(\tau), \tag{1}$$
where $\mathcal{T}$ is the set of stopping times and $\mathcal{F}_1$ is the set of possible post-change distributions. We include $f_0$, $f_1$ in the subscripts to show the dependence of the performance metrics on those distributions, but since there is often a large amount of reference data to estimate the pre-change distribution, here we only consider $f_1$ to be unknown. The statistical performance of a detection procedure can then be represented by the value $\max_{f_1 \in \mathcal{F}_1} -\mathrm{ARL}_{f_0}(\tau) + \lambda\,\mathrm{EDD}_{f_1}(\tau)$. One can expect that as the size of $\mathcal{F}_1$ grows, a detection procedure either becomes more complex or has worse statistical performance. This leads to the trade-off between computation, model robustness (the size of $\mathcal{F}_1$), and statistical performance.
Given observations $x_1, x_2, \ldots, x_t$, to fully utilize the observations and achieve the best statistical performance, we would intuitively use all the past samples in the detection statistic, as is the case for the CUSUM and GLR procedures. However, this can become prohibitive in practice as $t$, the duration for which we have run the detection procedure, grows larger and larger (unless the algorithm is fully recursive, such as CUSUM).
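To make this contrast concrete, the sketch below (illustrative only, not a specific procedure from the literature) compares an $O(1)$ recursive CUSUM update for a known post-change mean with a GLR-type statistic for an unknown Gaussian mean (taking $\mu_0 = 0$ and $\sigma = 1$), whose per-step cost and memory grow linearly in $t$ because every candidate change-point must be rescanned.

```python
def cusum_update(T, x, mu0, mu1, sigma=1.0):
    """O(1) time and memory: fold the new sample into the running statistic."""
    llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2)
    return max(0.0, T + llr)

def glr_statistic(history):
    """GLR statistic for an unknown post-change mean (mu0 = 0, sigma = 1):
    max over candidate change-points k of S_{k:t}^2 / (2 (t - k + 1)),
    where S_{k:t} = x_k + ... + x_t.  O(t) time and memory per step,
    since all past samples must be stored and rescanned.
    """
    best, s = 0.0, 0.0
    for n, x in enumerate(reversed(history), start=1):  # n = t - k + 1
        s += x
        best = max(best, s * s / (2 * n))
    return best
```

In practice, the GLR scan is often truncated to a sliding window of the most recent samples, capping the per-step cost at the window size at some loss of power against changes that occurred further in the past.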
A practical online change-point detection algorithm should have constant, $O(1)$, computational complexity and memory requirement per iteration (unit time), but this is not likely to be