Sequential Gradient Descent and Quasi-Newton’s
Method for Change-Point Analysis
Xianyang Zhang^1 and Trisha Dawn^1
^1 Texas A&M University. Address correspondence to Xianyang Zhang (zhangxiany@stat.tamu.edu).
Abstract. One common approach to detecting change-points is minimizing a cost function over possible numbers and locations of change-points. The framework includes several well-established procedures, such as the penalized likelihood and minimum description length. Such an approach requires finding the cost value repeatedly over different segments of the data set, which can be time-consuming when (i) the data sequence is long and (ii) obtaining the cost value involves solving a non-trivial optimization problem. This paper introduces a new sequential method (SE) that can be coupled with gradient descent (SeGD) and quasi-Newton's method (SeN) to find the cost value efficiently. The core idea is to update the cost value using the information from previous steps without re-optimizing the objective function. The new method is applied to change-point detection in generalized linear models and penalized regression. Numerical studies show that the new approach can be orders of magnitude faster than the Pruned Exact Linear Time (PELT) method without sacrificing estimation accuracy.
Keywords. Change-point detection, Dynamic programming, Generalized linear models,
Penalized linear regression, Stochastic gradient descent.
1 Introduction
Change-point analysis is concerned with detecting and locating structural breaks in the underlying model of a data sequence. The first work on change-point analysis goes back to the 1950s, when the goal was to locate a shift in the mean of an independent and identically distributed Gaussian sequence for industrial quality control purposes (Page, 1954, 1955). Since then, change-point analysis has generated important activity in statistics and various application settings such as signal processing, climate science, economics, financial analysis, medical science, and bioinformatics. We refer the readers to Brodsky and Darkhovsky (1993); Csörgő et al. (1997); Tartakovsky et al. (2014) for book-length treatments and Aue and Horváth (2013); Niu et al. (2016); Aminikhanghahi and Cook (2017); Truong et al. (2020); Liu et al. (2021) for reviews on this subject.

There are two main branches of change-point detection methods: online methods that aim to detect changes as early as they occur in an online setting and offline methods that retrospectively detect changes when all samples have been observed. The focus of this paper is on the offline setting. A typical offline change-point detection method involves three
major components: the cost function, the search method, and the penalty/constraint (Truong et al., 2020). The choice of the cost function and search method has a crucial impact on the method's computational complexity. As increasingly larger data sets are being collected in modern applications, there is an urgent need to develop more efficient algorithms to handle such big data sets. Examples include testing for structural breaks in genetics data and detecting changes in the volatility of big financial data.
One popular way to tackle the change-point detection problem is to cast it into a model-selection problem by solving a penalized optimization problem over possible numbers and locations of change-points. The framework includes several well-established procedures, such as the penalized likelihood and minimum description length. The corresponding optimization can be solved exactly using dynamic programming (Auger and Lawrence, 1989; Jackson et al., 2005), whose computational cost is $\sum_{t=1}^{T}\sum_{s=1}^{t} q(s)$, where $T$ is the number of data points and $q(s)$ denotes the time complexity for calculating the cost function value based on $s$ data points. Killick et al. (2012) introduced the pruned exact linear time (PELT) algorithm with a pruning step in dynamic programming. PELT reduces the computational cost without affecting the exactness of the resulting segmentation. Rigaill (2010) proposed an alternative pruned dynamic programming algorithm with the aim of reducing the computational effort. However, in the worst-case scenario, the computational cost of dynamic programming coupled with the above pruning strategies remains of the order $O\big(\sum_{t=1}^{T}\sum_{s=1}^{t} q(s)\big)$.
Unlike the pruning strategy, this paper aims to improve the computational efficiency
of dynamic programming from a different perspective. We focus on the class of problems
where the cost function involves solving a non-trivial optimization problem without a
closed-form solution. Dynamic programming requires repeatedly solving the optimization
over different data sequence segments, which can be very time-consuming for big data.
This paper makes the following contributions to address the issue.
1. A new sequential updating method (SE) that can be coupled with gradient descent (SeGD) and quasi-Newton's method (SeN) is proposed to update the parameter estimate and the cost value in dynamic programming. The new strategy avoids repeatedly optimizing the objective function based on each data segment. It thus significantly improves the computational efficiency of the vanilla PELT, especially when the cost function involves solving a non-trivial optimization problem without a closed-form solution. Though our algorithm is no longer exact, numerical studies suggest that the new method achieves almost the same estimation accuracy as PELT does.

2. SeGD is related to stochastic gradient descent (SGD) with without-replacement sampling (Shamir, 2016; Nagaraj et al., 2019; Rajput et al., 2020). The main difference is that our update follows the time order of the data points, and hence no sampling or additional randomness is introduced. Using some techniques from SGD and transductive learning theory, we obtain the convergence rate of the approximate cost value derived from the algorithm to the true cost value.

3. The proposed method applies to a broad class of statistical models, such as parametric likelihood models, generalized linear models, nonparametric models, and penalized regression.
Finally, we mention two other routes to reduce the computational complexity in change-point analysis. The first one is to relax the $\ell_0$ penalty on the number of parameters to an $\ell_1$ penalty (such as the total variation penalty) on the parameters to encourage a piecewise constant solution. The resulting convex optimization problem can be solved in nearly linear time (Harchaoui and Lévy-Leduc, 2010). In contrast, our method directly tackles the problem with the $\ell_0$ penalty. The second approach includes different approximation schemes, including window-based methods, binary segmentation and its variants (Vostrikova, 1981; Fryzlewicz, 2014), and bottom-up segmentation (Keogh et al., 2001). These methods are usually quite efficient and can be combined with various test statistics, though they only provide approximate solutions. Our method can be regarded as a new approximation scheme for the $\ell_0$ penalization problem.

Dynamic programming:  $\sum_{t=1}^{T}\sum_{s=1}^{t} q(s)$
PELT:                 $\sum_{t=1}^{T}\sum_{s\in R_t} q(s)$
SE:                   $q_0\sum_{t=1}^{T}|R_t|$

Table 1: Comparison of the computational complexity (time complexity of each method). Here $q(s)$ denotes the time complexity for calculating the cost value based on $s$ data points and $q_0$ is the time complexity for performing the one-step update described in Section 3.
The rest of the paper is organized as follows. In Section 2, we briefly review dynamic programming and the pruning scheme in change-point analysis. We describe the details of the SE algorithms in Section 3, including the motivation, their application to generalized linear models, and an extension to handle the case where the cost value involves solving a penalized optimization problem. We study the convergence property of the algorithm in Section 4. Section 5 presents numerical results for synthetic and real data. Section 6 concludes.
2 Dynamic Programming and Pruning
2.1 Dynamic programming
Change-point analysis concerns the partition of a data set ordered by time (space or other variables) into piecewise homogeneous segments such that each piece shares the same behavior. Specifically, we denote the data by $z = (z_1, \ldots, z_T)$. For $1 \le s \le t \le T$, let $z_{s:t} = (z_s, \ldots, z_t)$. If we assume that there are $k$ change-points in the data, then we can split the data into $k+1$ distinct segments. We let the location of the $j$th change-point be $\tau_j$ for $j = 1, 2, \ldots, k$, and set $\tau_0 = 0$ and $\tau_{k+1} = T$. The $(j+1)$th segment contains the data $z_{\tau_j+1}, \ldots, z_{\tau_{j+1}}$ for $j = 0, 1, \ldots, k$. We let $\tau = (\tau_1, \ldots, \tau_k)$ be the set of change-point locations. The problem we aim to address is to infer both the number of change-points and their locations.
Throughout the discussion, we let $C(z_{s+1:t})$ for $s < t$ denote the cost for a segment consisting of the data points $z_{s+1}, \ldots, z_t$. Of particular interest is the cost function defined as
$$
C(z_{s+1:t}) = \min_{\theta \in \Theta} \sum_{i=s+1}^{t} l(z_i, \theta), \qquad (1)
$$
where $l(\cdot, \theta)$ is the individual cost parameterized by $\theta$, which belongs to a compact parameter space $\Theta \subseteq \mathbb{R}^d$. Examples include (i) $l(z_i, \theta)$ is the negative log-likelihood of $z_i$; (ii) $l(z_i, \theta) = L(f(x_i, \theta), y_i)$ with $z_i = (x_i, y_i)$, where $L$ is a loss function and $f(\cdot, \theta)$ is an unknown regression function parameterized by $\theta$. See more details and discussions in Section 3.2.
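To make (1) concrete, the following Python sketch computes the segment cost for a logistic-regression loss, one instance of example (ii), by numerical minimization. It is an illustration under our own naming conventions, not the authors' implementation.

import numpy as np
from scipy.optimize import minimize

def neg_log_lik(theta, X, y):
    # Sum of individual losses l(z_i, theta) over the segment, with z_i = (x_i, y_i)
    # and l taken to be the logistic-regression negative log-likelihood.
    eta = X @ theta
    return np.sum(np.logaddexp(0.0, eta) - y * eta)

def segment_cost(X, y, s, t):
    # C(z_{s+1:t}) in (1): minimize the segment loss over theta numerically.
    # Rows s, ..., t-1 of (X, y) correspond to the observations z_{s+1}, ..., z_t.
    theta0 = np.zeros(X.shape[1])
    res = minimize(neg_log_lik, theta0, args=(X[s:t], y[s:t]), method="BFGS")
    return res.fun

Because no closed-form minimizer exists here, each call re-runs an iterative solver; this is exactly the cost that the sequential method of Section 3 seeks to avoid.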
In this paper, we consider segmenting data by solving a penalized optimization problem. For $0 \le k \le T-1$, define
$$
C_{k,T} = \min_{\tau} \sum_{j=0}^{k} C(z_{\tau_j+1:\tau_{j+1}}).
$$
We estimate the number of change-points by minimizing a linear combination of the cost value and a penalty function $f$, i.e.,
$$
\min_{k}\{C_{k,T} + f(k,T)\}.
$$
If the penalty function is linear in $k$ with $f(k,T) = \beta_T(k+1)$ for some $\beta_T > 0$, then we can write the objective function as
$$
\min_{k}\{C_{k,T} + f(k,T)\} = \min_{k,\tau} \sum_{j=0}^{k}\left\{C(z_{\tau_j+1:\tau_{j+1}}) + \beta_T\right\}.
$$
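As a quick illustration (our own sketch, not the paper's code), the penalized objective $\sum_{j=0}^{k}\{C(z_{\tau_j+1:\tau_{j+1}}) + \beta_T\}$ can be evaluated for any candidate segmentation given a segment-cost routine, e.g. the segment_cost sketch above:

def penalized_cost(cost, tau, T, beta):
    # cost(s, t) should return C(z_{s+1:t}); tau = (tau_1, ..., tau_k); beta plays the role of beta_T.
    boundaries = [0] + list(tau) + [T]   # tau_0 = 0 and tau_{k+1} = T
    return sum(cost(boundaries[j], boundaries[j + 1]) + beta
               for j in range(len(boundaries) - 1))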
One way to solve the penalized optimization problem is through the dynamic programming approach (Killick et al., 2012; Jackson et al., 2005). Consider segmenting the data $z_{1:t}$. Denote by $F(t)$ the minimum value of the penalized cost $\min_k\{C_{k,t} + f(k,t)\}$ for segmenting such data. We derive a recursion for $F(t)$ by conditioning on the last change-point location,
$$
\begin{aligned}
F(t) &= \min_{k,\tau} \sum_{j=0}^{k}\left\{C(z_{\tau_j+1:\tau_{j+1}}) + \beta_T\right\} \\
&= \min_{k,\tau}\left[\sum_{j=0}^{k-1}\left\{C(z_{\tau_j+1:\tau_{j+1}}) + \beta_T\right\} + C(z_{\tau_k+1:t}) + \beta_T\right] \\
&= \min_{\tau}\left[\min_{\tilde{k},\tau}\sum_{j=0}^{\tilde{k}}\left\{C(z_{\tau_j+1:\tau_{j+1}}) + \beta_T\right\} + C(z_{\tau+1:t}) + \beta_T\right] \\
&= \min_{\tau}\left\{F(\tau) + C(z_{\tau+1:t}) + \beta_T\right\}, \qquad (2)
\end{aligned}
$$
where $\tau_{k+1} = t$ in the first equation and $\tau_{\tilde{k}+1} = \tau$ in the third equation. The segmentation can be recovered by taking the argument $\tau$ that minimizes (2), i.e.,
$$
\tau^{*} = \operatorname*{argmin}_{0 \le \tau < t}\,\{F(\tau) + C(z_{\tau+1:t}) + \beta_T\}, \qquad (3)
$$
which gives the optimal location of the last change-point in the segmentation of $z_{1:t}$. The procedure is repeated until all the change-point locations are identified.
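The recursion (2) and the backtracking step (3) can be written compactly as in the sketch below. This is a generic illustration of the dynamic programming (optimal partitioning) scheme under the paper's notation, with cost(s, t) returning $C(z_{s+1:t})$ and beta playing the role of $\beta_T$; it is our own sketch, not the authors' code.

def dynamic_programming(cost, T, beta):
    F = [0.0] * (T + 1)      # F[t] = minimum penalized cost for segmenting z_{1:t}; F[0] = 0
    last = [0] * (T + 1)     # last[t] = the minimizer tau* in (3)
    for t in range(1, T + 1):
        F[t], last[t] = min((F[tau] + cost(tau, t) + beta, tau) for tau in range(t))
    # Recover the change-point locations by repeatedly applying (3).
    change_points, t = [], T
    while last[t] > 0:
        change_points.append(last[t])
        t = last[t]
    return sorted(change_points), F[T]

Each step minimizes over all candidate last change-points $0 \le \tau < t$, so the routine evaluates the segment cost $O(T^2)$ times, which is what motivates the pruning discussed next.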
Figure 1: Illustration of dynamic programming in change-point detection.
2.2 Pruning
A popular way to increase the efficiency of dynamic programming is to prune the candidate set for finding the last change-point in each iteration. For the cost function in (1), we have for any $\tau < t < T$, $C(z_{\tau+1:t}) + C(z_{t+1:T}) \le C(z_{\tau+1:T})$. Killick et al. (2012) showed that for some $t > \tau$, if
$$
F(\tau) + C(z_{\tau+1:t}) > F(t),
$$
then at any future point $t' > t$, $\tau$ can never be the optimal location of the most recent change-point prior to $t'$. Define a sequence of sets $\{R_t\}_{t=1}^{T}$ recursively as
$$
R_t = \left\{\tau \in R_{t-1} \cup \{t-1\} : F(\tau) + C(z_{\tau+1:t-1}) \le F(t-1)\right\}.
$$
Then $F(t)$ can be computed as
$$
F(t) = \min_{\tau \in R_t}\left\{F(\tau) + C(z_{\tau+1:t}) + \beta_T\right\}
$$
and the minimizer $\tau^{*}$ in (3) belongs to $R_t$. This pruning technique forms the basis of the Pruned Exact Linear Time (PELT) algorithm. Under suitable conditions that allow the expected number of change-points to increase linearly with $T$, Killick et al. (2012) showed that the expected computational cost of PELT is bounded by $LT$ for some constant $L < \infty$. In the worst case where no pruning occurs, the computational cost of PELT is the same as that of the vanilla dynamic programming.
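The pruning rule can be grafted onto the previous sketch by maintaining the candidate set $R_t$ explicitly. Again this is our own illustration of PELT under the condition stated above, not the authors' implementation; cost(s, t) returns $C(z_{s+1:t})$.

def pelt(cost, T, beta):
    F = [0.0] * (T + 1)
    last = [0] * (T + 1)
    R = [0]                                      # candidate set R_t for the last change-point
    for t in range(1, T + 1):
        vals = {tau: F[tau] + cost(tau, t) for tau in R}
        last[t] = min(vals, key=vals.get)        # tau* in (3), restricted to R_t
        F[t] = vals[last[t]] + beta
        # Pruning: drop any tau with F(tau) + C(z_{tau+1:t}) > F(t), then add t as a new candidate.
        R = [tau for tau in R if vals[tau] <= F[t]] + [t]
    change_points, t = [], T
    while last[t] > 0:
        change_points.append(last[t])
        t = last[t]
    return sorted(change_points), F[T]

When pruning is effective, the inner minimization runs over $|R_t|$ candidates rather than $t$, which is the source of the complexity reduction summarized in Table 1.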
3 Methodology
3.1 Sequential algorithms
For large-scale data, the computational cost of PELT can still be prohibitive due to the burden of repeatedly solving the optimization problem (1). For many statistical models, the time complexity for obtaining $C(z_{s+1:t})$ is linear in the number of observations $t - s$. Therefore, in the worst-case scenario, the overall time complexity can be as high as $O(T^3)$. To alleviate the problem, we propose a fast algorithm that sequentially updates the cost function using a gradient-type method, reducing the computational cost while maintaining similar estimation accuracy. Instead of repeatedly solving the optimization problem to obtain the cost value for each data segment, we propose to update the cost value using the information from previous steps, without re-optimizing the objective function.
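The flavor of such an update can be sketched as follows. This is an illustrative sketch under our own assumptions (a plain one-step gradient update with a user-chosen step size and an additive cost approximation); the precise SeGD and quasi-Newton updates are developed in the remainder of this section. The idea is that when the segment $z_{s+1:t}$ is extended to $z_{s+1:t+1}$, the previous parameter estimate is reused and refined with the new observation rather than re-optimized from scratch.

def sequential_update(theta_prev, cost_prev, z_new, loss_l, grad_l, step):
    # One gradient step on the newly added observation (illustrative, assumed step rule);
    # grad_l(z, theta) is the gradient of the individual loss l(z, theta).
    theta_new = theta_prev - step * grad_l(z_new, theta_prev)
    # Approximate cost of the extended segment: previous cost plus the loss at the new point.
    cost_new = cost_prev + loss_l(z_new, theta_new)
    return theta_new, cost_new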