sampling methods [3–8], ∇f(x) is replaced by (an approximation of) the Goldstein ε-
subdifferential ∂εf(x) [9]. For several algorithms from these classes, a linear rate of
convergence (at best) was proven [10–12]. Since, for smooth optimization, Newton’s
method yields (at least) superlinear convergence by using second-order derivatives,
the question arises whether such a fast method can be constructed for the nonsmooth
case as well. It turns out that this task is quite challenging: In [13] it was called a
“wondrous grail” and it can be related to Problem 14 in [14].
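For orientation, we briefly recall the Goldstein ε-subdifferential mentioned above: assuming that f is locally Lipschitz (and hence differentiable almost everywhere), it can be written as
\[
  \partial_\varepsilon f(x)
  = \operatorname{cl}\operatorname{conv}\bigl\{ \nabla f(y) : y \in \bar{B}_\varepsilon(x),\ f \text{ is differentiable at } y \bigr\},
\]
i.e., as the closed convex hull of all gradients of f in the closed ε-ball around x (see [9] for the precise definition). Gradient sampling methods approximate this set by the convex hull of gradients evaluated at finitely many randomly sampled points from this ball.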
To the best of our knowledge, there are currently two methods for which superlin-
ear convergence can be proven for certain classes of nonsmooth optimization problems.
The first method is the bundle-based VU-algorithm from [15]. The idea of this method
is to identify a subspace U along which f behaves smoothly and to then perform a
Newton-like step tangent to U. One of the challenges of this approach is to identify the
subspace U automatically. It appears that the only mechanism that is guaranteed to
achieve this is described in [16], and requires the objective to be piecewise twice contin-
uously differentiable with convex selection functions (with some additional technical
assumptions). The second method is the recent SuperPolyak [17]. It is based on the
subproblem of the bundle method by Polyak [18], but uses iteratively computed bun-
dle information and treats the cutting-plane inequalities as equalities. It is designed
for objective functions that possess a sharp minimum (i.e., the objective value grows
linearly away from the minimum) and whose optimal value is known. While
these are the only methods (that we are aware of) for which superlinear convergence
can be proven, there are several other methods that attempt to achieve this rate by
using some form of second-order information: In [19–21], methods were proposed that
arise by infusing second-order information into cutting-plane models. Alternatively,
there are methods that directly generalize quasi-Newton to the nonsmooth case [22–
28]. However, there are certain drawbacks to all of the approaches mentioned so far:
Both the VU-algorithm and SuperPolyak require certain special structures of f. For
[19–21], a proper definition of “second-order information” of a nonsmooth function is
missing, which makes it difficult to exploit this information theoretically for proving
fast convergence. Similarly, for the generalized quasi-Newton approaches, there is no
corresponding “nonsmooth Newton’s method” from which their convergence theory
can be inherited.
The goal of this article is to construct a fast method that avoids the above
drawbacks. The basic idea is to use the maximum of existing second-order Taylor
expansions around a point as a local model of the objective function, as was already
proposed in [20]. There, whenever the Hessian matrix does not exist at a point y, “the
Hessian at some point infinitely close to y” (cf. [20], p. 2) is taken instead. In con-
trast to this, we will base our model on the second-order ε-jet, which can be seen as
a second-order version of the Goldstein ε-subdifferential. It contains the coefficients
of all existing second-order Taylor expansions (and their limits) in an ε-ball around a
given point, thereby making the aforementioned model well-defined.
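To convey the idea already here (with notation that is merely illustrative; the formal definitions are given below), one may think of the second-order ε-jet at x as a set of pairs (g, H) of first- and second-order Taylor coefficients collected from the ε-ball around x. The resulting local model is then roughly of the form
\[
  \phi_{x,\varepsilon}(d) = \max_{(g,H)} \Bigl\{ f(x) + g^{\top} d + \tfrac{1}{2}\, d^{\top} H d \Bigr\},
\]
where the maximum is taken over the (approximated) ε-jet at x and d denotes the step away from x.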
Based on minimization of this model, we first derive a purely abstract descent method
(Algo. 4.1) for the case where the entire ε-jet is available at every point. To obtain a
method that is actually implementable, we then derive a sampling strategy to approx-
imate the ε-jet (Algo. 4.2), for which availability of a single element of the 0-jet at