
An important family of efficient methods for solving problems of the form of model (OP)
is that of Lagrangian-based methods, most notably augmented Lagrangian methods [35, 25, 37].
Starting with the classical proximal method of multipliers [39], and continuing to more recent works [40, 12, 13, 9], such methods, which are based on proximal/projection computations (due to the nonsmooth functions R_X and R_Y in Problem (OP)), have been successfully developed and corresponding provable convergence rates have been established. However, in many cases of interest such proximal/projection computations are intractable in high-dimensional problems, for instance when either R_X or R_Y is a nuclear norm regularizer for matrices, which underlies many recovery problems for low-rank matrices and tensors (see, for instance, [7, 6, 16]), or an indicator function for a polytope. This has motivated the growing interest in recent years in so-called projection-free methods, which are mostly based on linear minimization oracles (LMOs) instead of proximal/projection oracles, via the Frank-Wolfe method (a.k.a. conditional gradient; see for instance [26]). Such methods are often much more efficient to implement for high-dimensional problems (e.g., when R_X or R_Y is a nuclear norm regularizer or an indicator function for a polytope capturing some well-studied combinatorial structure; see for instance [26, 24]), and have been studied in [33, 43, 41]. However, these methods suffer from slow convergence rates compared to
their proximal/projection-based counterparts, with the worst-case guaranteed convergence rate being at best O(1/√T), where T is the number of iterations executed. This rate is not known to be improvable even under additional standard curvature assumptions such as strong convexity of the function f(·) in Problem (OP)¹. A recent attempt to obtain faster projection-free methods under relatively mild assumptions was made in [22]; however, as we discuss in detail in the appendix (see Section A), there is a major flaw in their proof that does not seem easily fixable. We also refer the interested reader to the excellent discussions in [22] on major issues with other previous attempts to prove faster rates for projection-free methods.
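To make the oracle distinction concrete, the following minimal sketch (our own illustration; the function names and the simple least-squares setup are not from this work) implements the Frank-Wolfe method over the nuclear-norm ball, where the LMO only needs a rank-one matrix built from the top singular-vector pair of the gradient, whereas a Euclidean projection onto the same set would require a full SVD:

```python
import numpy as np

def nuclear_norm_lmo(grad, radius=1.0):
    """LMO for the nuclear-norm ball of the given radius:
    argmin_{||V||_* <= radius} <grad, V> = -radius * u1 v1^T,
    where (u1, v1) is the top singular pair of grad.
    A full SVD is used here for simplicity; in practice a Lanczos-type
    solver computes just the top pair, which is far cheaper."""
    U, _, Vt = np.linalg.svd(grad, full_matrices=False)
    return -radius * np.outer(U[:, 0], Vt[0, :])

def frank_wolfe(f_grad, x0, radius, T):
    """Vanilla Frank-Wolfe with the standard open-loop step size.
    Note: this illustrates the oracle for smooth minimization over a ball;
    the O(1/sqrt(T)) rate discussed in the text concerns the harder
    affine-constrained Problem (OP)."""
    x = x0.copy()
    for t in range(T):
        v = nuclear_norm_lmo(f_grad(x), radius)   # one LMO call per iteration
        gamma = 2.0 / (t + 2)
        x = (1 - gamma) * x + gamma * v           # stays inside the ball
    return x

# Toy usage: minimize 0.5 * ||X - B||_F^2 over the unit nuclear-norm ball.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
x = frank_wolfe(lambda X: X - B, np.zeros((5, 5)), radius=1.0, T=50)
```

Each iterate is a convex combination of rank-one extreme points, so feasibility is maintained for free, and the per-iteration cost is dominated by a single leading singular-pair computation.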
For the simpler problem of minimizing a smooth convex objective function over a convex and compact set, in particular when the feasible set is a polytope, a nuclear norm ball of matrices, or a spectrahedron (the set of positive semidefinite matrices with unit trace),
several recent works showed how simple modifications of the Frank-Wolfe method can lead
to provably faster convergence rates, under standard curvature assumptions, see for instance
[19, 30, 3, 17, 1, 20]. Thus, in the context of the significantly more complex Problem (OP), our
work considers the following natural question:
Can we design a projection-free augmented Lagrangian-based method that, at least under
standard curvature assumptions, improves upon the current O(1/√T) convergence rate?
We answer this question in the affirmative by providing a projection-free method with a rate of O(1/T), both in terms of the objective function residual and the feasibility gap of the
affine constraint in Problem (OP).
Our approach departs in two respects from previous projection-free methods, which guarantee only a rate of O(1/√T). First, as already suggested, we make a curvature assumption on Problem (OP): we introduce a condition we call the primal quadratic gap (see the definition in the sequel). In particular, this condition holds whenever the smooth function f(·) in Problem (OP) is strongly convex, but it also holds when f(·) is a composition of a strongly convex function with a linear transformation (e.g., a least squares objective, which need not be strongly convex) and R_X, R_Y are indicators of polytopes. Second, while previous projection-free methods rely
on the availability of a linear minimization oracle (LMO), in this work we consider a slightly stronger oracle, which was already considered in recent works [1, 21, 20] (those works, however, do not apply to problems such as Problem (OP), which includes affine constraints), namely the weak proximal oracle (WPO)². In a nutshell, this oracle solves a certain relaxed version of the
¹ This is not surprising since it is well known that, in general, and as opposed to projection/proximal-based methods, the Frank-Wolfe method does not benefit from strong convexity; see for instance the discussions in [19, 17, 1].
² The term “weak proximal oracle” was originally coined in [20].