Sequential Gradient Descent and Quasi-Newton’s
Method for Change-Point Analysis
Xianyang Zhang^1 and Trisha Dawn^1
^1 Texas A&M University. Address correspondence to Xianyang Zhang (zhangxiany@stat.tamu.edu).
Abstract. One common approach to detecting change-points is minimizing a cost function over possible numbers and locations of change-points. The framework includes several well-established procedures, such as the penalized likelihood and minimum description length. Such an approach requires finding the cost value repeatedly over different segments of the data set, which can be time-consuming when (i) the data sequence is long and (ii) obtaining the cost value involves solving a non-trivial optimization problem. This paper introduces a new sequential method (SE) that can be coupled with gradient descent (SeGD) and quasi-Newton's method (SeN) to find the cost value efficiently. The core idea is to update the cost value using the information from previous steps without re-optimizing the objective function. The new method is applied to change-point detection in generalized linear models and penalized regression. Numerical studies show that the new approach can be orders of magnitude faster than the Pruned Exact Linear Time (PELT) method without sacrificing estimation accuracy.
Keywords. Change-point detection, Dynamic programming, Generalized linear models,
Penalized linear regression, Stochastic gradient descent.
1 Introduction
Change-point analysis is concerned with detecting and locating structural breaks in the underlying model of a data sequence. The first work on change-point analysis goes back to the 1950s, when the goal was to locate a shift in the mean of an independent and identically distributed Gaussian sequence for industrial quality control purposes (Page, 1954, 1955). Since then, change-point analysis has generated important activity in statistics and various application settings such as signal processing, climate science, economics, financial analysis, medical science, and bioinformatics. We refer the readers to Brodsky and Darkhovsky (1993); Csörgő et al. (1997); Tartakovsky et al. (2014) for book-length treatments and Aue and Horváth (2013); Niu et al. (2016); Aminikhanghahi and Cook (2017); Truong et al. (2020); Liu et al. (2021) for reviews on this subject.

There are two main branches of change-point detection methods: online methods that aim to detect changes as early as they occur in an online setting and offline methods that retrospectively detect changes when all samples have been observed. The focus of this paper is on the offline setting. A typical offline change-point detection method involves three
major components: the cost function, the search method, and the penalty/constraint (Truong et al., 2020). The choice of the cost function and search method has a crucial impact on the method's computational complexity. As increasingly larger data sets are being collected in modern applications, there is an urgent need to develop more efficient algorithms to handle such big data sets. Examples include testing for structural breaks in genetics data and detecting changes in the volatility of big financial data.
One popular way to tackle the change-point detection problem is to cast it into a model-selection problem by solving a penalized optimization problem over possible numbers and locations of change-points. The framework includes several well-established procedures, such as the penalized likelihood and minimum description length. The corresponding optimization can be solved exactly using dynamic programming (Auger and Lawrence, 1989; Jackson et al., 2005), whose computational cost is $\sum_{t=1}^{T}\sum_{s=1}^{t} q(s)$, where $T$ is the number of data points and $q(s)$ denotes the time complexity for calculating the cost function value based on $s$ data points. Killick et al. (2012) introduced the pruned exact linear time (PELT) algorithm with a pruning step in dynamic programming. PELT reduces the computational cost without affecting the exactness of the resulting segmentation. Rigaill (2010) proposed an alternative pruned dynamic programming algorithm with the aim of reducing the computational effort. However, in the worst-case scenario, the computational cost of dynamic programming coupled with the above pruning strategies remains of the order $O\big(\sum_{t=1}^{T}\sum_{s=1}^{t} q(s)\big)$.
Unlike the pruning strategy, this paper aims to improve the computational efficiency
of dynamic programming from a different perspective. We focus on the class of problems
where the cost function involves solving a non-trivial optimization problem without a
closed-form solution. Dynamic programming requires repeatedly solving the optimization
over different data sequence segments, which can be very time-consuming for big data.
This paper makes the following contributions to address the issue.
1. A new sequential updating method (SE) that can be coupled with gradient descent (SeGD) and quasi-Newton's method (SeN) is proposed to update the parameter estimate and the cost value in dynamic programming. The new strategy avoids repeatedly optimizing the objective function based on each data segment. It thus significantly improves the computational efficiency of the vanilla PELT, especially when the cost function involves solving a non-trivial optimization problem without a closed-form solution. Though our algorithm is no longer exact, numerical studies suggest that the new method achieves almost the same estimation accuracy as PELT does.

2. SeGD is related to stochastic gradient descent (SGD) with without-replacement sampling (Shamir, 2016; Nagaraj et al., 2019; Rajput et al., 2020). The main difference is that our update follows the time order of the data points, and hence no sampling or additional randomness is introduced. Using some techniques from SGD and transductive learning theory, we obtain the convergence rate of the approximate cost value derived from the algorithm to the true cost value.

3. The proposed method applies to a broad class of statistical models, such as parametric likelihood models, generalized linear models, nonparametric models, and penalized regression.
Finally, we mention two other routes to reduce the computational complexity in change-point analysis. The first one is to relax the $\ell_0$ penalty on the number of parameters to an $\ell_1$ penalty (such as the total variation penalty) on the parameters to encourage a piecewise constant solution. The resulting convex optimization problem can be solved in nearly linear time (Harchaoui and Lévy-Leduc, 2010). In contrast, our method directly tackles the problem with the $\ell_0$ penalty. The second approach includes different approximation schemes, including window-based methods, binary segmentation and its variants (Vostrikova, 1981; Fryzlewicz, 2014), and bottom-up segmentation (Keogh et al., 2001). These methods are usually quite efficient and can be combined with various test statistics, though they only provide approximate solutions. Our method can be regarded as a new approximation scheme for the $\ell_0$ penalization problem.

Dynamic programming:  $\sum_{t=1}^{T}\sum_{s=1}^{t} q(s)$
PELT:                 $\sum_{t=1}^{T}\sum_{s\in R_t} q(s)$
SE:                   $q_0\sum_{t=1}^{T}|R_t|$

Table 1: Comparison of the computational complexity (time complexity of each method). Here $q(s)$ denotes the time complexity for calculating the cost value based on $s$ data points and $q_0$ is the time complexity for performing the one-step update described in Section 3.
The rest of the paper is organized as follows. In Section 2, we briefly review dynamic programming and the pruning scheme in change-point analysis. We describe the details of the SE algorithms in Section 3, including the motivation, their application to generalized linear models, and an extension to handle the case where the cost value involves solving a penalized optimization problem. We study the convergence property of the algorithm in Section 4. Section 5 presents numerical results for synthetic and real data. Section 6 concludes.
2 Dynamic Programming and Pruning
2.1 Dynamic programming
Change-point analysis concerns the partition of a data set ordered by time (space or other variables) into piecewise homogeneous segments such that each piece shares the same behavior. Specifically, we denote the data by $z = (z_1, \ldots, z_T)$. For $1 \le s \le t \le T$, let $z_{s:t} = (z_s, \ldots, z_t)$. If we assume that there are $k$ change-points in the data, then we can split the data into $k+1$ distinct segments. We let the location of the $j$th change-point be $\tau_j$ for $j = 1, 2, \ldots, k$, and set $\tau_0 = 0$ and $\tau_{k+1} = T$. The $(j+1)$th segment contains the data $z_{\tau_j+1}, \ldots, z_{\tau_{j+1}}$ for $j = 0, 1, \ldots, k$. We let $\tau = (\tau_1, \ldots, \tau_k)$ be the set of change-point locations. The problem we aim to address is to infer both the number of change-points and their locations.
Throughout the discussion, we let $C(z_{s+1:t})$ for $s < t$ denote the cost for a segment consisting of the data points $z_{s+1}, \ldots, z_t$. Of particular interest is the cost function defined as
$$
C(z_{s+1:t}) = \min_{\theta \in \Theta} \sum_{i=s+1}^{t} l(z_i, \theta), \qquad (1)
$$
where $l(\cdot, \theta)$ is the individual cost parameterized by $\theta$, which belongs to a compact parameter space $\Theta \subseteq \mathbb{R}^d$. Examples include (i) $l(z_i, \theta)$ is the negative log-likelihood of $z_i$; (ii) $l(z_i, \theta) = L(f(x_i, \theta), y_i)$ with $z_i = (x_i, y_i)$, where $L$ is a loss function and $f(\cdot, \theta)$ is an unknown regression function parameterized by $\theta$. See more details and discussions in Section 3.2.
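To make (1) concrete, the following Python sketch computes the segment cost for a logistic-regression loss, one instance of example (ii), by numerical minimization. It is an illustration under our own naming conventions, not the authors' implementation.

import numpy as np
from scipy.optimize import minimize

def neg_log_lik(theta, X, y):
    # Sum of individual losses l(z_i, theta) over the segment, with z_i = (x_i, y_i)
    # and l taken to be the logistic-regression negative log-likelihood.
    eta = X @ theta
    return np.sum(np.logaddexp(0.0, eta) - y * eta)

def segment_cost(X, y, s, t):
    # C(z_{s+1:t}) in (1): minimize the segment loss over theta numerically.
    # Rows s, ..., t-1 of (X, y) correspond to the observations z_{s+1}, ..., z_t.
    theta0 = np.zeros(X.shape[1])
    res = minimize(neg_log_lik, theta0, args=(X[s:t], y[s:t]), method="BFGS")
    return res.fun

Because no closed-form minimizer exists here, each call re-runs an iterative solver; this is exactly the cost that the sequential method of Section 3 seeks to avoid.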
In this paper, we consider segmenting data by solving a penalized optimization problem. For $0 \le k \le T-1$, define
$$
C_{k,T} = \min_{\tau} \sum_{j=0}^{k} C(z_{\tau_j+1:\tau_{j+1}}).
$$
We estimate the number of change-points by minimizing a linear combination of the cost value and a penalty function $f$, i.e.,
$$
\min_{k}\{C_{k,T} + f(k,T)\}.
$$
If the penalty function is linear in $k$ with $f(k,T) = \beta_T(k+1)$ for some $\beta_T > 0$, then we can write the objective function as
$$
\min_{k}\{C_{k,T} + f(k,T)\} = \min_{k,\tau} \sum_{j=0}^{k}\left\{C(z_{\tau_j+1:\tau_{j+1}}) + \beta_T\right\}.
$$
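As a quick illustration (our own sketch, not the paper's code), the penalized objective $\sum_{j=0}^{k}\{C(z_{\tau_j+1:\tau_{j+1}}) + \beta_T\}$ can be evaluated for any candidate segmentation given a segment-cost routine, e.g. the segment_cost sketch above:

def penalized_cost(cost, tau, T, beta):
    # cost(s, t) should return C(z_{s+1:t}); tau = (tau_1, ..., tau_k); beta plays the role of beta_T.
    boundaries = [0] + list(tau) + [T]   # tau_0 = 0 and tau_{k+1} = T
    return sum(cost(boundaries[j], boundaries[j + 1]) + beta
               for j in range(len(boundaries) - 1))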
One way to solve the penalized optimization problem is through the dynamic programming approach (Killick et al., 2012; Jackson et al., 2005). Consider segmenting the data $z_{1:t}$. Denote by $F(t)$ the minimum value of the penalized cost $\min_k\{C_{k,t} + f(k,t)\}$ for segmenting such data. We derive a recursion for $F(t)$ by conditioning on the last change-point location,
$$
\begin{aligned}
F(t) &= \min_{k,\tau} \sum_{j=0}^{k}\left\{C(z_{\tau_j+1:\tau_{j+1}}) + \beta_T\right\} \\
&= \min_{k,\tau}\left[\sum_{j=0}^{k-1}\left\{C(z_{\tau_j+1:\tau_{j+1}}) + \beta_T\right\} + C(z_{\tau_k+1:t}) + \beta_T\right] \\
&= \min_{\tau}\left[\min_{\tilde{k},\tau}\sum_{j=0}^{\tilde{k}}\left\{C(z_{\tau_j+1:\tau_{j+1}}) + \beta_T\right\} + C(z_{\tau+1:t}) + \beta_T\right] \\
&= \min_{\tau}\left\{F(\tau) + C(z_{\tau+1:t}) + \beta_T\right\}, \qquad (2)
\end{aligned}
$$
where $\tau_{k+1} = t$ in the first equation and $\tau_{\tilde{k}+1} = \tau$ in the third equation. The segmentation can be recovered by taking the argument $\tau$ that minimizes (2), i.e.,
$$
\tau^{*} = \operatorname*{argmin}_{0 \le \tau < t}\,\{F(\tau) + C(z_{\tau+1:t}) + \beta_T\}, \qquad (3)
$$
which gives the optimal location of the last change-point in the segmentation of $z_{1:t}$. The procedure is repeated until all the change-point locations are identified.
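The recursion (2) and the backtracking step (3) can be written compactly as in the sketch below. This is a generic illustration of the dynamic programming (optimal partitioning) scheme under the paper's notation, with cost(s, t) returning $C(z_{s+1:t})$ and beta playing the role of $\beta_T$; it is our own sketch, not the authors' code.

def dynamic_programming(cost, T, beta):
    F = [0.0] * (T + 1)      # F[t] = minimum penalized cost for segmenting z_{1:t}; F[0] = 0
    last = [0] * (T + 1)     # last[t] = the minimizer tau* in (3)
    for t in range(1, T + 1):
        F[t], last[t] = min((F[tau] + cost(tau, t) + beta, tau) for tau in range(t))
    # Recover the change-point locations by repeatedly applying (3).
    change_points, t = [], T
    while last[t] > 0:
        change_points.append(last[t])
        t = last[t]
    return sorted(change_points), F[T]

Each step minimizes over all candidate last change-points $0 \le \tau < t$, so the routine evaluates the segment cost $O(T^2)$ times, which is what motivates the pruning discussed next.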
Figure 1: Illustration of dynamic programming in change-point detection.
2.2 Pruning
A popular way to increase the efficiency of dynamic programming is to prune the candidate set for finding the last change-point in each iteration. For the cost function in (1), we have for any $\tau < t < T$, $C(z_{\tau+1:t}) + C(z_{t+1:T}) \le C(z_{\tau+1:T})$. Killick et al. (2012) showed that for some $t > \tau$, if
$$
F(\tau) + C(z_{\tau+1:t}) > F(t),
$$
then at any future point $t' > t$, $\tau$ can never be the optimal location of the most recent change-point prior to $t'$. Define a sequence of sets $\{R_t\}_{t=1}^{T}$ recursively as
$$
R_t = \left\{\tau \in R_{t-1} \cup \{t-1\} : F(\tau) + C(z_{\tau+1:t-1}) \le F(t-1)\right\}.
$$
Then $F(t)$ can be computed as
$$
F(t) = \min_{\tau \in R_t}\left\{F(\tau) + C(z_{\tau+1:t}) + \beta_T\right\}
$$
and the minimizer $\tau^{*}$ in (3) belongs to $R_t$. This pruning technique forms the basis of the Pruned Exact Linear Time (PELT) algorithm. Under suitable conditions that allow the expected number of change-points to increase linearly with $T$, Killick et al. (2012) showed that the expected computational cost of PELT is bounded by $LT$ for some constant $L < \infty$. In the worst case where no pruning occurs, the computational cost of PELT is the same as that of the vanilla dynamic programming.
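The pruning rule can be grafted onto the previous sketch by maintaining the candidate set $R_t$ explicitly. Again this is our own illustration of PELT under the condition stated above, not the authors' implementation; cost(s, t) returns $C(z_{s+1:t})$.

def pelt(cost, T, beta):
    F = [0.0] * (T + 1)
    last = [0] * (T + 1)
    R = [0]                                      # candidate set R_t for the last change-point
    for t in range(1, T + 1):
        vals = {tau: F[tau] + cost(tau, t) for tau in R}
        last[t] = min(vals, key=vals.get)        # tau* in (3), restricted to R_t
        F[t] = vals[last[t]] + beta
        # Pruning: drop any tau with F(tau) + C(z_{tau+1:t}) > F(t), then add t as a new candidate.
        R = [tau for tau in R if vals[tau] <= F[t]] + [t]
    change_points, t = [], T
    while last[t] > 0:
        change_points.append(last[t])
        t = last[t]
    return sorted(change_points), F[T]

When pruning is effective, the inner minimization runs over $|R_t|$ candidates rather than $t$, which is the source of the complexity reduction summarized in Table 1.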
3 Methodology
3.1 Sequential algorithms
For large-scale data, the computational cost of PELT can still be prohibitive due to the burden of repeatedly solving the optimization problem (1). For many statistical models, the time complexity for obtaining $C(z_{s+1:t})$ is linear in the number of observations $t - s$. Therefore, in the worst-case scenario, the overall time complexity can be as high as $O(T^3)$. To alleviate the problem, we propose a fast algorithm that sequentially updates the cost function using a gradient-type method, reducing the computational cost while maintaining similar estimation accuracy. Instead of repeatedly solving the optimization problem to obtain the cost value for each data segment, we propose to update the cost value using the information from previous steps, without re-optimizing the objective function.
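The flavor of such an update can be sketched as follows. This is an illustrative sketch under our own assumptions (a plain one-step gradient update with a user-chosen step size and an additive cost approximation); the precise SeGD and quasi-Newton updates are developed in the remainder of this section. The idea is that when the segment $z_{s+1:t}$ is extended to $z_{s+1:t+1}$, the previous parameter estimate is reused and refined with the new observation rather than re-optimized from scratch.

def sequential_update(theta_prev, cost_prev, z_new, loss_l, grad_l, step):
    # One gradient step on the newly added observation (illustrative, assumed step rule);
    # grad_l(z, theta) is the gradient of the individual loss l(z, theta).
    theta_new = theta_prev - step * grad_l(z_new, theta_prev)
    # Approximate cost of the extended segment: previous cost plus the loss at the new point.
    cost_new = cost_prev + loss_l(z_new, theta_new)
    return theta_new, cost_new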