
ROBUST DATA-DRIVEN ACCELERATED MIRROR DESCENT
Hong Ye Tan⋆  Subhadip Mukherjee⋆†  Junqi Tang⋆
Andreas Hauptmann‡§  Carola-Bibiane Schönlieb⋆
⋆University of Cambridge, DAMTP, Cambridge CB3 0WA, U.K.
†Department of Computer Science, University of Bath, U.K.
‡University of Oulu, Research Unit of Mathematical Sciences, P.O.Box 8000, 90014 University of Oulu
§University College London, Department of Mathematics, 25 Gordon St, London WC1H 0AY, U.K.
ABSTRACT
Learning-to-optimize is an emerging framework that
leverages training data to speed up the solution of certain
optimization problems. One such approach is based on the
classical mirror descent algorithm, where the mirror map is
modelled using input-convex neural networks. In this work,
we extend this functional parameterization approach by intro-
ducing momentum into the iterations, based on the classical
accelerated mirror descent. Our approach combines short-
time accelerated convergence with stable long-time behavior.
We empirically demonstrate additional robustness with re-
spect to multiple parameters on denoising and deconvolution
experiments.
Index Terms—Mirror descent, accelerated optimization,
learning-to-optimize, input-convex neural networks
1. INTRODUCTION
Learning-to-optimize is a general framework of methods that
seek to minimize some objective functions quickly (in as few
iterations as possible). This has been proposed in various
settings such as unsupervised learning or using primal-dual
methods, with applications such as imaging inverse problems
and neural network training [1, 2, 3]. A recent work focuses
on introducing a learned functional parameterization into the
classical mirror descent (MD) algorithm, obtaining faster con-
vergence rates with approximate convergence guarantees [4].
In this work, we propose a learned convex optimization
approach based on the classical accelerated mirror descent
(AMD) scheme. Our aim is to minimize convex functions f
drawn from a function class F of qualitatively similar
problems, such as variational image denoising. We study and
demonstrate the convergence and robustness of our proposed
learned scheme, and compare it to existing learned and
classical optimization methods. Section 2 outlines the AMD
scheme, and Section 3 contains various experimental results
on the robustness of our proposed method.
1.1. Mirror Descent
Mirror descent is a generalization of gradient descent (GD),
first introduced by Nemirovsky and Yudin [5]. By exploiting
the geometry of the cost function, MD achieves competitive
convergence rate bounds for certain problems, including on-
line learning and tomographic reconstruction [6, 7]. We first
outline the method as presented by Beck and Teboulle [8].
Let Ψ be a convex function on a closed convex set
X ⊆ R^n, and let (R^n)^* denote the corresponding dual space
of R^n. We denote by R̄ = R ∪ {+∞} the extended real
line. Recall that the convex conjugate (or Fenchel con-
jugate) of Ψ, denoted Ψ^* : (R^n)^* → R̄, is given by
Ψ^*(y) = sup_{x∈X} (⟨y, x⟩ − Ψ(x)). Ψ induces a Bregman
divergence B_Ψ : X × X → R̄, defined as:
B_Ψ(x, y) = Ψ(x) − Ψ(y) − ⟨∇Ψ(y), x − y⟩.
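As a quick numerical illustration (not from the paper): with the mirror potential Ψ(x) = ½‖x‖², the Bregman divergence reduces to half the squared Euclidean distance. A minimal Python sketch, with the helper name `bregman` chosen here for illustration:

```python
import numpy as np

def bregman(psi, grad_psi, x, y):
    # B_Psi(x, y) = Psi(x) - Psi(y) - <grad Psi(y), x - y>
    return psi(x) - psi(y) - grad_psi(y) @ (x - y)

# Psi(x) = 0.5 * ||x||^2, so grad Psi(x) = x.
psi = lambda x: 0.5 * (x @ x)
grad_psi = lambda x: x

x = np.array([1.0, 2.0])
y = np.array([0.0, 1.0])
d = bregman(psi, grad_psi, x, y)  # equals 0.5 * ||x - y||^2 = 1.0
```

Note that B_Ψ is in general asymmetric in its two arguments, so it is a divergence rather than a metric.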
Definition 1 (Mirror potential). We say Ψ : X → R is
a mirror potential if it is continuously differentiable and
strongly convex. We call the gradient ∇Ψ : X → (R^n)^* a
mirror map.
If Ψ is a mirror potential, then the convex conjugate Ψ^* is
everywhere differentiable, and moreover satisfies ∇Ψ^* =
(∇Ψ)^{-1} [8, 9]. The standard MD update rule for minimizing
a convex f with initialization x_0 ∈ X is as follows [8]:
y_k = ∇Ψ(x_k) − t_k ∇f(x_k),    x_{k+1} = ∇Ψ^*(y_k).    (1)
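As an illustrative sketch of update rule (1) (a hand-crafted example assumed here, not the paper's learned scheme): taking the negative-entropy potential Ψ(x) = Σ_i (x_i log x_i − x_i) on the positive orthant gives ∇Ψ(x) = log x and ∇Ψ^*(y) = exp(y), so (1) becomes exponentiated gradient descent:

```python
import numpy as np

def mirror_descent(grad_f, x0, step, iters):
    # Update (1): y_k = grad Psi(x_k) - t_k * grad f(x_k); x_{k+1} = grad Psi*(y_k).
    # Here grad Psi = log and grad Psi* = exp (negative-entropy potential).
    x = x0.copy()
    for _ in range(iters):
        y = np.log(x) - step * grad_f(x)  # gradient step in the dual space
        x = np.exp(y)                     # map back via grad Psi* = (grad Psi)^{-1}
    return x

# Minimize f(x) = 0.5 * ||x - b||^2 over the positive orthant.
b = np.array([0.5, 2.0, 1.0])
grad_f = lambda x: x - b
x = mirror_descent(grad_f, np.ones(3), step=0.1, iters=500)  # iterates approach b
```

The iterates stay strictly positive by construction, illustrating how the mirror map encodes the constraint geometry without an explicit projection step.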
For convex Lipschitz f, MD attains an O(1/√k) convergence
rate. As with GD, algorithmic extensions such as
stochasticity, ergodicity, and acceleration are available for
MD [10, 11, 12, 13]. MD can additionally be interpreted as
natural GD, with a Riemannian metric induced by the mirror
map [14, 15]. While the inclusion of the mirror map makes
this method more flexible than GD, current applications are
limited by the requirement of a hand-crafted mirror map. For
efficient computation, such a mirror map is further restricted
by the requirement of a closed-form convex conjugate, which
is generally unavailable.
arXiv:2210.12238v2 [math.OC] 2 Jun 2023