
A Momentum Accelerated Adaptive Cubic Regularization Method for
Nonconvex Optimization
Yihang Gao∗  Michael K. Ng∗
Abstract
The cubic regularization method (CR) and its adaptive version (ARC) are popular Newton-type methods for solving unconstrained non-convex optimization problems, owing to their global convergence to local minima under mild conditions. The main aim of this paper is to develop a momentum-accelerated adaptive cubic regularization method (ARCm) to improve convergence performance. With a proper choice of momentum step size, we show the global convergence of ARCm, and local convergence can also be guaranteed under the KL (Kurdyka-Łojasiewicz) property. Such global and local convergence can also be established when inexact solvers with low computational costs are employed in the iteration procedure.
Numerical results for non-convex logistic regression and robust linear regression models are reported to demonstrate that the proposed ARCm significantly outperforms state-of-the-art cubic regularization methods (e.g., CR, momentum-based CR, ARC) and the trust region method. In particular, the number of iterations required by ARCm is less than 10% to 50% of that required by the most competitive method (ARC) in the experiments.
Keywords. Adaptive cubic regularization, momentum, KL property, inexact solvers, global and local convergence
1 Introduction
Most machine learning tasks involve solving challenging
(non-convex) optimization problems
\[
(1.1) \qquad \min_{x \in \mathbb{R}^d} f(x).
\]
Second-order critical points are usually the preferred so-
lutions for (1.1) since many machine learning problems
have been shown to have only strict saddle points and
global minima without spurious local minima [8, 16].
Second-order methods that exploit Hessian information
have been proposed to escape saddle points [15, 5] and
enjoy faster local convergence [14, 19, 21]. The cubic regularization method (CR), one of the second-order methods, was first proposed by Griewank [9] and later independently by Nesterov and Polyak [15] and by Weiser et al. [18]; it is widely applied in solving inverse problems [4, 12], regression models [19, 20], as well as minimax problems [11].
∗Department of Mathematics, The University of Hong Kong, Pokfulam, Hong Kong SAR.
The vanilla CR method is formulated as
\[
s_k \in \arg\min_{s} \; \nabla f(x_k)^\top s + \frac{1}{2} s^\top \nabla^2 f(x_k) s + \frac{M_k}{6} \|s\|_2^3,
\qquad x_{k+1} = x_k + s_k,
\]
where M_k := M is a fixed pre-defined constant for all iterations. In recent decades, various techniques have been adopted to improve the performance of CR. Wang et al. [17] accelerated the vanilla CR by a momentum term (CRm); the method works well mainly by enlarging the step size of s_k when the cubic penalty parameter M is over-estimated.
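As an illustration, a minimal numerical sketch of one vanilla CR step and of a momentum-augmented variant in the spirit of CRm is given below. The names (`cr_step`, `crm_step`), the toy objective, the fixed momentum step size `beta`, and the use of a generic BFGS routine for the cubic subproblem are all our own illustrative assumptions: practical CR/ARC implementations use specialized subproblem solvers, and the momentum step size in [17] is chosen adaptively rather than fixed.

```python
# Hedged sketch of one vanilla CR iteration and a momentum-augmented
# variant in the spirit of CRm. The cubic subproblem is solved with a
# generic BFGS routine purely for illustration.
import numpy as np
from scipy.optimize import minimize

def cr_step(g, H, M):
    """Minimize the cubic model m(s) = g^T s + 0.5 s^T H s + (M/6)||s||_2^3."""
    model = lambda s: g @ s + 0.5 * s @ H @ s + (M / 6.0) * np.linalg.norm(s) ** 3
    return minimize(model, np.zeros_like(g), method="BFGS").x

def crm_step(g, H, M, x_curr, x_prev, beta=0.3):
    """CR step plus a momentum term along the previous displacement
    (fixed beta here; the CRm rule is adaptive)."""
    s = cr_step(g, H, M)
    return x_curr + s + beta * (x_curr - x_prev)

# Toy nonconvex objective: f(x) = x0^4/4 - x0^2/2 + x1^2,
# with local minimizers at (+-1, 0) and a strict saddle at the origin.
grad = lambda x: np.array([x[0] ** 3 - x[0], 2.0 * x[1]])
hess = lambda x: np.array([[3.0 * x[0] ** 2 - 1.0, 0.0], [0.0, 2.0]])

x = np.array([0.9, 0.5])
for _ in range(30):
    x = x + cr_step(grad(x), hess(x), M=5.0)
# x approaches the local minimizer (1, 0)
```

Note that both the momentum direction and its step size depend only on past iterates, so the extra cost over a plain CR step is negligible.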
Cartis et al. [4] proposed the adaptive cubic regularization method (ARC), which adaptively assigns M_k based on the quality of the step s_k, in analogy to the trust region method (TR) [5]. Stochastic (subsampled) ARC variants (e.g., Kohler et al. [13] and Zhou et al. [20]) were developed to reduce the overall computation, where the gradient ∇f(x_k) and the Hessian ∇²f(x_k) are inexactly evaluated. To the best of our knowledge, ARC behaves better in most applications compared with CR methods that fix M (e.g., vanilla CR and CRm). A natural question is how to further accelerate ARC with little extra cost in each iteration.
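The adaptive assignment of M_k can be sketched by a simple ratio test, in the spirit of (but not identical to) the rule of Cartis et al. [4]: the ratio rho compares the actual decrease f(x_k) − f(x_k + s_k) with the decrease predicted by the cubic model, and the penalty is relaxed or tightened accordingly. The thresholds `eta1`, `eta2` and the factor `gamma` below are illustrative placeholders, not the constants used in [4].

```python
def update_M(rho, M, eta1=0.1, eta2=0.9, gamma=2.0):
    """Illustrative ARC-style penalty update. rho is the ratio of the
    actual objective decrease to the decrease predicted by the cubic
    model; thresholds and factors are placeholders for illustration."""
    if rho >= eta2:            # very successful step: relax the penalty
        return max(M / gamma, 1e-8)
    elif rho >= eta1:          # successful step: keep the penalty
        return M
    else:                      # unsuccessful step: tighten the penalty
        return gamma * M
```

A large rho means the cubic model over-estimated the difficulty of the step, so M_k can safely shrink; a small or negative rho signals an unreliable model, and the step is rejected with a larger M_k.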
In this paper, we investigate a momentum-
accelerated adaptive cubic regularization method
(ARCm). Here are our contributions and the outline
of the paper.
•We develop a momentum-accelerated adaptive cubic regularization method (ARCm). We adopt a general scheme for momentum, which is more suitable for ARC than CR (see Theorem 2.1). The extra computation is cheap and negligible, as both the momentum and its step size are solely based on {s_k} and are free of gradient or Hessian evaluations.
•With a proper setting of the step size for momentum (see Algorithm 1), the global convergence of ARCm to second-order critical points is guaranteed
arXiv:2210.05987v1 [math.OC] 12 Oct 2022