An Approximation-Based Regularized Extra-Gradient Method for
Monotone Variational Inequalities
Kevin Huang∗, Shuzhong Zhang†
October 11, 2022
Abstract
In this paper, we propose a general extra-gradient scheme for solving monotone variational
inequalities (VI), referred to here as Approximation-based Regularized Extra-gradient method
(ARE). The first step of ARE solves a VI subproblem with an approximation operator satisfying
a $p$th-order Lipschitz bound with respect to the original mapping, further coupled with the
gradient of a $(p+1)$th-order regularization. The optimal global convergence is guaranteed by
including an additional extra-gradient step, while a $p$th-order superlinear local convergence is
shown to hold if the VI is strongly monotone. The proposed ARE is inclusive and general,
in the sense that a variety of solution methods can be formulated within this framework as
different manifestations of approximations, and their iteration complexities follow through
in a unified fashion. The ARE framework relates to the first-order methods, while opening
up possibilities for developing higher-order methods specifically for structured problems that
guarantee the optimal iteration complexity bounds.
Keywords: variational inequality, extra-gradient method, tensor method, composite operators.
1 Introduction
Let $X \subseteq \mathbb{R}^n$ be a convex set and let $F(x): \mathbb{R}^n \mapsto \mathbb{R}^n$ be a vector mapping. The following problem is known as the variational inequality problem (VI):
$$\text{Find } x^* \in X \text{ such that } F(x^*)^\top (x - x^*) \ge 0 \text{ for all } x \in X.$$
As a notation, we denote the solution set as
$$\mathrm{VI}_X(F(x)) := \{\, x^* \mid x^* \in X \text{ such that } F(x^*)^\top (x - x^*) \ge 0 \text{ for all } x \in X \,\}.$$
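For intuition (this illustration is ours, with hypothetical data, not taken from the paper): when $X = \mathbb{R}^n$ the condition reduces to $F(x^*) = 0$, and when $F = \nabla f$ it recovers the first-order optimality condition for minimizing $f$ over $X$. The minimal sketch below solves a small affine VI on a box by a projected fixed-point iteration and then verifies the defining inequality on a grid of test points:

```python
import numpy as np

# Hypothetical affine mapping F(x) = A x + b on the box X = [0, 1]^2;
# the symmetric part of A is positive definite, so F is (strongly) monotone.
A = np.array([[2.0, 1.0], [-1.0, 2.0]])
b = np.array([-1.0, 1.0])
F = lambda x: A @ x + b
proj = lambda x: np.clip(x, 0.0, 1.0)          # Euclidean projection onto X

# Solve the VI by the projected fixed-point iteration x <- P_X(x - t F(x)).
x = np.zeros(2)
for _ in range(2000):
    x = proj(x - 0.1 * F(x))

# Verify F(x*)^T (y - x*) >= 0 for sample points y in X.
ys = np.stack(np.meshgrid(np.linspace(0, 1, 21),
                          np.linspace(0, 1, 21)), axis=-1).reshape(-1, 2)
print(x, min(F(x) @ (y - x) for y in ys) >= -1e-6)   # [0.5, 0.] True
```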
We assume $\mathrm{VI}_X(F(x))$ is non-empty throughout this paper. The study of finite-dimensional VI
problems dates back to the 1960s, when the complementarity problem was developed to solve for various
equilibria, such as the economic equilibrium, the traffic equilibrium, and, in general, the Nash equilibrium.
∗Department of Industrial and Systems Engineering, University of Minnesota, huan1741@umn.edu
†Department of Industrial and Systems Engineering, University of Minnesota, zhangs@umn.edu
For a comprehensive study of the applications, theories and algorithms of VI, readers are referred
to the celebrated monograph by Facchinei and Pang [5].
In this paper, we are interested in a specific class of VI, where the operator $F$ is monotone:
$$\langle F(x) - F(y),\, x - y \rangle \;\ge\; \mu \|x - y\|^2, \quad \forall\, x, y \in X, \tag{1}$$
for some $\mu \ge 0$. If there exists some $\mu > 0$ such that (1) holds, the VI is referred to as strongly monotone,
and $\mathrm{VI}_X(F(x))$ is a singleton.
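As a quick illustration (again ours, not from the paper): an affine mapping $F(x) = Ax + b$ satisfies (1) with $\mu = \lambda_{\min}\big((A + A^\top)/2\big)$, since $\langle F(x) - F(y), x - y \rangle = (x-y)^\top A (x-y)$. A two-line numerical check for the mapping used above:

```python
import numpy as np

A = np.array([[2.0, 1.0], [-1.0, 2.0]])       # hypothetical example matrix
mu = np.linalg.eigvalsh((A + A.T) / 2).min()  # smallest eigenvalue of the symmetric part
print(mu)                                     # 2.0 > 0: F(x) = A x + b is strongly monotone
```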
The earliest methods developed to solve VI of this type are the projection method due to Sibony [33]:
$$x^{k+1} := \arg\min_{x \in X}\; \langle F(x^k), x - x^k \rangle + \frac{\gamma_k}{2} \|x - x^k\|^2, \tag{2}$$
and the proximal point method due to Martinet [14]:
$$x^{k+1} \in \mathrm{VI}_X\big( F(x) + \gamma_k (x - x^k) \big), \tag{3}$$
for a positive sequence $\{\gamma_k\}_{k \ge 0}$. These two methods form the basis of most, if not all, methods developed for
monotone VI in the research community thus far.
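Note that (2) is exactly a projected gradient step, $x^{k+1} = P_X\big(x^k - F(x^k)/\gamma_k\big)$. A minimal sketch of both updates on the affine example above; purely for illustration, the proximal step (3) is instantiated in its unconstrained closed form (i.e., with $X = \mathbb{R}^n$), while the projection step keeps the box:

```python
import numpy as np

A = np.array([[2.0, 1.0], [-1.0, 2.0]]); b = np.array([-1.0, 1.0])
F = lambda x: A @ x + b
proj = lambda x: np.clip(x, 0.0, 1.0)
gamma = 5.0

x_proj = x_prox = np.zeros(2)
for _ in range(200):
    # (2): a projected gradient step with step size 1/gamma.
    x_proj = proj(x_proj - F(x_proj) / gamma)
    # (3): proximal point step; for unconstrained affine F it solves
    #      (A + gamma I) x = gamma x^k - b exactly.
    x_prox = np.linalg.solve(A + gamma * np.eye(2), gamma * x_prox - b)

print(x_proj)   # -> [0.5, 0.]   (solution of the VI on the box)
print(x_prox)   # -> [0.6, -0.2] (root of F, the unconstrained VI solution)
```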
Korpelevich [10] first introduced an extra step in the update as follows:
$$\begin{aligned} x^{k+0.5} &:= \arg\min_{x \in X}\; \langle F(x^k), x - x^k \rangle + \frac{\gamma_k}{2} \|x - x^k\|^2, \\ x^{k+1} &:= \arg\min_{x \in X}\; \langle F(x^{k+0.5}), x - x^k \rangle + \frac{\gamma_k}{2} \|x - x^k\|^2. \end{aligned} \tag{4}$$
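In projection form, (4) reads $x^{k+0.5} = P_X\big(x^k - F(x^k)/\gamma_k\big)$ followed by $x^{k+1} = P_X\big(x^k - F(x^{k+0.5})/\gamma_k\big)$. A minimal sketch on a merely monotone example ($\mu = 0$, hypothetical data), where the extra step is what makes the iteration converge while the plain projection update (2) spirals away from the solution:

```python
import numpy as np

# A rotation field: monotone with mu = 0 (its symmetric part is zero).
A = np.array([[0.0, 1.0], [-1.0, 0.0]]); b = np.array([-0.5, 0.5])
F = lambda x: A @ x + b                    # unique root at x* = (0.5, 0.5)
proj = lambda x: np.clip(x, 0.0, 1.0)
gamma = 4.0

x = np.zeros(2)
for _ in range(5000):
    x_half = proj(x - F(x) / gamma)        # extrapolation step of (4)
    x = proj(x - F(x_half) / gamma)        # update step, using F(x_half)
print(x)                                   # -> [0.5, 0.5]
```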
The iteration complexity of the extra-gradient method (4) was later established by Tseng [37]. In
particular, if the operator is strongly monotone ($\mu > 0$), it is $O\left(\kappa \ln \frac{1}{\epsilon}\right)$ for an $\epsilon$-solution, where
$\kappa = \frac{L}{\mu}$ is the condition number of a Lipschitz continuous operator with constant $L$. This is a
significant improvement over the $O\left(\kappa^2 \ln \frac{1}{\epsilon}\right)$ complexity of the vanilla projection method (2), and it is in fact
optimal among first-order methods (i.e., those using only the information of $F(\cdot)$) applied to such a class of
problems (with the lower bound recently established by Zhang et al. [40]). Many algorithms developed
for monotone VI thereafter adopt this concept of an extra-step update and can be considered
variants of the extra-gradient method, such as the modified forward-backward method [36], the mirror-prox
method [19], the dual-extrapolation method [24, 27], the hybrid proximal extra-gradient method [18],
and the extra-point method [7].
To facilitate the discussion, let us first introduce a few terms that will be used throughout
the paper. The term “$p$th-order method” will be used following the convention in optimization.
In particular, by considering $F(x) = \nabla f(x)$ specifically as the gradient mapping of some function
$f(x)$, a first-order method in VI refers to one using only the information from the operator $F(\cdot)$, and
a $p$th-order method refers to one using the $(p-1)$th-order derivative of the operator, $\nabla^{p-1} F$. As
a result, the term “gradient” will also be used to refer to $F(\cdot)$, owing to the background of VI in
solving saddle-point and optimization models. In a $p$th-order method, the Lipschitz continuity of
$\nabla^{p-1} F(x)$ is assumed with constant $L_p$:
$$\|\nabla^{p-1} F(x) - \nabla^{p-1} F(y)\| \;\le\; L_p \|x - y\|. \tag{5}$$
In this paper, the proposed Approximation-based Regularized Extra-gradient method (ARE) can be
viewed as a generalization of the extra-gradient method (4) in some sense. While the intermediate
iterate $x^{k+0.5}$ in the extra-gradient method (4) is updated by a gradient projection step, in ARE it is
replaced by solving a VI subproblem:
$$x^{k+0.5} := \mathrm{VI}_X\big( \tilde{F}(x; x^k) + \gamma_k \|x - x^k\|^{p-1} (x - x^k) \big), \tag{6}$$
where $\tilde{F}(x; x^k)$ is an approximation mapping at $x^k$ that satisfies a $p$th-order Lipschitz bound with
respect to $F(x)$ (to be formally defined later), and $\|x - x^k\|^{p-1}(x - x^k)$ is the gradient mapping of
a $(p+1)$th-order regularization. Therefore, we refer to the update in (6) as a $(p+1)$th-order regularized
VI subproblem. A common choice of $\tilde{F}(x; x^k)$ is the Taylor approximation of $F(x)$ at $x^k$, namely,
$$\tilde{F}(x; x^k) := \sum_{i=0}^{p-1} \frac{1}{i!} \nabla^i F(x^k) [x - x^k]^i.$$
Such a choice of $\tilde{F}(x; x^k)$ not only recovers the extra-gradient method when $p = 1$, but also gives a
succinct update principle for higher-order methods when $p > 1$. However, the Taylor approximation
need not be the only motivation for ARE. We show that the key underlying condition is the
aforementioned $p$th-order Lipschitz bound; therefore, any approximation satisfying this condition
yields a valid method under the general framework of ARE. This not only generalizes
the existing methods but also opens up the possibility of developing methods different from those
in the literature, and we will discuss several such specific schemes in Section 6. By applying the
abstraction of “approximation” in ARE, a unifying and concise analysis is available to establish
iteration complexity bounds that can be readily specialized to any concrete approximation in various
methods, given the different problem structures at hand.
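To make (6) concrete, consider the unconstrained case $X = \mathbb{R}^n$ with $p = 2$ and the Taylor choice $\tilde{F}(x; x^k) = F(x^k) + \nabla F(x^k)(x - x^k)$. Writing $d = x - x^k$, the subproblem amounts to solving $F(x^k) + \nabla F(x^k) d + \gamma_k \|d\| d = 0$, which can be reduced to a one-dimensional search over $r = \|d\|$. The sketch below is only an illustration of this reduction, not the solution procedure prescribed in this paper; it assumes $\|d(r)\|$ is nonincreasing in $r$ (valid, e.g., for symmetric positive semidefinite Jacobians):

```python
import numpy as np

def are_subproblem_p2(Fk, Jk, gamma, r_hi=1e6, tol=1e-10):
    """Solve F_k + J_k d + gamma * ||d|| * d = 0 for d (unconstrained, p = 2).

    For a fixed radius r = ||d||, d(r) = -(J_k + gamma*r*I)^{-1} F_k; we then
    bisect on r until the consistency condition ||d(r)|| = r holds.
    Assumes ||d(r)|| is nonincreasing in r (true for symmetric PSD J_k).
    """
    n = len(Fk)
    d_of = lambda r: -np.linalg.solve(Jk + gamma * r * np.eye(n), Fk)
    lo, hi = 0.0, r_hi
    for _ in range(200):
        r = (lo + hi) / 2
        if np.linalg.norm(d_of(r)) > r:
            lo = r          # ||d(r)|| still above r: the crossing is to the right
        else:
            hi = r
        if hi - lo < tol * (1 + hi):
            break
    return d_of((lo + hi) / 2)

# Tiny usage example with hypothetical data: F(x) = A x + b, so the Jacobian is A.
A = np.array([[2.0, 1.0], [-1.0, 2.0]]); b = np.array([-1.0, 1.0])
xk = np.zeros(2)
d = are_subproblem_p2(A @ xk + b, A, gamma=1.0)
x_half = xk + d    # the intermediate iterate of (6); ARE then takes an
print(x_half)      # extra-gradient step using F(x_half), in the spirit of (4)
```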
The rest of the paper is organized as follows. Section 2 reviews relevant first-order and higher-
order methods for solving monotone VI. Section 3 formally presents ARE and analyzes the global
convergence for both monotone and strongly monotone cases. Section 4 continues the discussion
with strongly monotone VI and establishes local superlinear convergence. A modified ARE
algorithm is introduced to guarantee both global linear and local superlinear convergence. Section 5
is devoted to the discussion of solving the VI subproblem with the approximation operator under
special cases. In Section 6, we present several structured ARE schemes for operators of the
composite form $F(x) = H(x) + G(x)$ and discuss their connections to existing methods.
We further discuss two specialized approximation concepts, the outer approximation and the
inner approximation, for the general composite form $F(x) = H(G(x))$. Numerical results from
preliminary experiments are demonstrated in Section 7, and we conclude the paper in Section 8.
2 Literature Review
Historically, Martinet [14] first introduced the notion of the proximal point method, which was later
studied and popularized by Rockafellar [32]. Tseng [37] studied the linear convergence of the proximal
point method, the extra-gradient method, and the matrix-splitting method, provided that a certain error bound
is satisfied; a strongly monotone operator is an immediate example of such a case. As a
matter of fact, subsequently developed methods such as the modified forward-backward method [36], the
mirror-prox method [19], the dual-extrapolation method [24, 27], the hybrid proximal extra-gradient (HPE)
method [18], and the extra-point method [7] are all based on the concept of the extra-gradient.
Another type of method, known as the optimistic gradient descent ascent (OGDA) method, was first
proposed by Popov [31]:
$$x^{k+1} := P_X\big( x^k - \alpha F(x^k) - \eta (F(x^k) - F(x^{k-1})) \big), \tag{7}$$
for some positive $\alpha, \eta > 0$, where $P_X$ denotes the projection operator onto $X$. Unlike the update
in the extra-gradient method (4), which uses an extra step, OGDA requires only one update (one
projection) per iteration and instead uses the information from the previous iterate $x^{k-1}$. The
optimal convergence of OGDA, in both monotone and strongly monotone VI, is established by
Mokhtari et al. [15, 16]. The extra-point method proposed by Huang and Zhang [7] extends the
concepts of the extra-gradient method, OGDA, Nesterov’s acceleration in optimization [21], and
the “heavy-ball” method by Polyak [30], and combines them in a unifying update scheme. If the
parameters associated with these different components satisfy a certain constraint set, it is shown that
the optimal iteration complexity is guaranteed. There is another line of work that studies variants of
extra-gradient type methods [38, 11, 9] and proximal point methods [35, 12, 29] with the anchoring
update, where in each iteration the initial iterate is used as a component of a convex combination.
The iterates produced by these different methods are shown to converge [39], at the same rate
as the optimal convergence rate (to the solution), and the iteration complexities are improved by
constant factors compared to the vanilla extra-gradient method.
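Returning to (7): a minimal sketch of the OGDA update on the same rotation-field example used earlier (hypothetical data; the choice $\eta = \alpha$ below is a common instantiation, shown purely for illustration):

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, 0.0]]); b = np.array([-0.5, 0.5])
F = lambda x: A @ x + b
proj = lambda x: np.clip(x, 0.0, 1.0)
alpha = eta = 0.2                         # illustrative choice eta = alpha

x = np.zeros(2)
F_prev = F(x)
for _ in range(5000):
    F_cur = F(x)                          # the only new operator evaluation per iteration
    x = proj(x - alpha * F_cur - eta * (F_cur - F_prev))   # one projection, cf. (7)
    F_prev = F_cur
print(x)   # -> [0.5, 0.5], same solution as the extra-gradient run
```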
The above methods are known as first-order methods. The lower bound of the iteration
complexity for first-order methods applied to monotone VI is $\Omega\left(\frac{1}{\epsilon}\right)$, as established by Nemirovsky
and Yudin [20], while for strongly monotone VI it is $\Omega\left(\kappa \ln \frac{1}{\epsilon}\right)$, shown by Zhang et
al. [40] in the context of strongly-convex-strongly-concave saddle-point problems. Methods such
as the extra-gradient method, the mirror-prox method, the dual-extrapolation method [24, 27], HPE, OGDA,
and the extra-point method have been proven to achieve these lower bounds, and hence are optimal.
The work of Taji et al. [34] is among the first to consider second-order methods for solving VI. A
linearized VI subproblem with operator $F(x^k) + \nabla F(x^k)(x - x^k)$ is solved in each iteration, and the
merit function $f(x) = \max_{x' \in X} \langle F(x), x - x' \rangle - \frac{\mu}{2}\|x - x'\|^2$ is used to prove the global convergence, with
an additional local quadratic convergence. However, no explicit iteration complexity was established
for second-order methods until recently. Following the line of research in [34], Huang and Zhang
[6] specifically consider the unconstrained strongly-convex-strongly-concave saddle-point problem and
incorporate the idea of cubic regularization (originally proposed by Nesterov in the context of
optimization [26]), proving the global iteration complexity $O\left(\left(\kappa^2 + \frac{\kappa L_2}{\mu}\right) \ln \frac{1}{\epsilon}\right)$, where $L_2$ is the
Lipschitz constant of the Hessian information, in addition to the local quadratic convergence.
Another line of research on second-order methods was started by Monteiro and Svaiter [17]. They
propose a Newton Proximal Extragradient (NPE) method, which can be viewed as a special case
of HPE with a large step size. In HPE, the first step approximately solves the proximal point
update (3) (denoted $x^{k+0.5}$), while the second step is a regular extra-gradient step. The “large
step size” condition, which is key to guaranteeing a superior convergence rate, requires:
$$\frac{1}{\gamma_k} \;\ge\; \frac{\theta}{\|x^{k+0.5} - x^k\|} \tag{8}$$
for some constant $\theta > 0$. Note that since $x^{k+0.5}$ depends on $\gamma_k$, a certain procedure is required
to determine $x^{k+0.5}$ and $\gamma_k$ such that (8) also holds. Observing that the set of $\gamma_k$ satisfying
the condition is in fact a closed interval, they develop a bisection method that iteratively reduces
the range of $\gamma_k$ and solves for $x^{k+0.5}$ for each fixed $\gamma_k$ until the condition is satisfied. They show
that for monotone VI, NPE admits an $O\left(1/\epsilon^{2/3}\right)$ iteration complexity for the ergodic mean of $x^{k+0.5}$ over
$0 \le k \le N-1$, which is an improvement over the optimal first-order complexity $O\left(\frac{1}{\epsilon}\right)$.
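Schematically, this search couples the subproblem oracle with a scalar bisection. The sketch below is a simplified illustration: `solve_subproblem` is a hypothetical black-box oracle, and the two-sided thresholds merely stand in for the exact interval condition in [17]:

```python
import numpy as np

def find_gamma(xk, solve_subproblem, theta_lo=0.1, theta_hi=10.0,
               gamma_lo=1e-6, gamma_hi=1e6, max_iter=100):
    """Bisect on gamma so that the large-step quantity ||x_half - xk|| / gamma
    (cf. (8)) lands in [theta_lo, theta_hi].

    `solve_subproblem(xk, gamma)` is an assumed black-box oracle returning
    x^{k+0.5} for a fixed gamma; larger gamma pulls x_half toward xk, so the
    quantity below decreases in gamma.
    """
    for _ in range(max_iter):
        gamma = np.sqrt(gamma_lo * gamma_hi)      # bisect on a log scale
        x_half = solve_subproblem(xk, gamma)
        q = np.linalg.norm(x_half - xk) / gamma
        if q < theta_lo:
            gamma_hi = gamma                      # quantity too small: try smaller gamma
        elif q > theta_hi:
            gamma_lo = gamma                      # quantity too large: try larger gamma
        else:
            return gamma, x_half
    raise RuntimeError("bisection did not terminate")

# Demo with an exact proximal oracle for the affine mapping F(x) = A x + b.
A = np.array([[2.0, 1.0], [-1.0, 2.0]]); b = np.array([-1.0, 1.0])
oracle = lambda xk, g: np.linalg.solve(A + g * np.eye(2), g * xk - b)
print(find_gamma(np.zeros(2), oracle))
```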
. While
NPE can also be expressed in the form of second-order mirror-prox method, Bullins and Lai [3]
propose a “higher-order mirror-prox method”, extending the second-order mirror-prox method to
pth-order and establish O1/ 2
p+1 iteration complexity. They replace the linearization F(xk) +
F(xk)(xxk) with the Taylor approximation of F(xk),
p1
P
i=0
1
i!iF(xk)[xxk]i, together with an
higher-order constraint on γkand xk+0.5similar to (8). They also demonstrate an explicit procedure
to instantiate the proposed method in unconstrained problem with p= 2 and a bisection method
to search for xk+0.5and γk. In [28], Ostroukhov et al. further extend the higher-order mirror-
prox method to strongly monotone VI by incorporating the restart procedure, which yields global
iteration complexity OLp
µ2
p+1 ln 1
. The local quadratic convergence is then guaranteed by
incorporating CRN-SPP proposed in [6]. Nesterov in [22] proposes solving constrained convex
optimization with cubic regularized Newton method and extends the results to monotone VI with
cubic regularized Newton modification of the dual-extrapolation method [24]. The global iteration
complexity is shown to be O(1
) for monotone VI, with local quadratic convergence for strongly
monotone VI.
Recently, there have been new developments in higher-order methods for VI that are closely related to
the work in this paper. Jiang and Mokhtari [8] propose the Generalized Optimistic Method, which
is a general $p$th-order variant of OGDA. Instead of using $F(x^k)$ to approximate the proximal point
update direction $F(x^{k+1})$ with correction $F(x^k) - F(x^{k-1})$ as in OGDA (7), they propose to use a
general approximation $P(x^{k+1}; I_k)$ with correction $F(x^k) - P(x^k; I_{k-1})$, where $P(x; I_k)$ can contain
$p$th-order information and $I_k$ is the information up to the $k$th iteration. Unlike ARE, the Generalized
Optimistic Method does not require an additional projection step in the update, nor does it require
restarts to establish linear convergence for strongly monotone VI (see [28] and the discussion in
Section 3.2). However, it still requires a bisection subroutine to solve the higher-order
subproblem, similar to the higher-order mirror-prox method [3], while ARE does not. Adil
et al. [1] propose a $p$th-order method that improves upon the higher-order mirror-prox method
in [3]. The improvement comes from incorporating the gradient of a $(p+1)$th-order regularization
in the higher-order VI subproblem, which makes the bisection subroutine unnecessary, and the
global complexity is improved by a logarithmic factor. The special case of ARE where the Taylor
approximation is used as the approximation operator $\tilde{F}(x; x^k)$ in the $(p+1)$th-order regularized VI
subproblem (6) coincides with the method proposed in [1] for solving monotone VI. In this paper,
we further develop the global and local convergence for strongly monotone VI (see Sections 3.2 and
4) and discuss the possibilities beyond the Taylor approximation in the subproblem, which is one of the
key underlying motivations behind the ARE framework. Lin and Jordan [13] propose a $p$th-order
generalization of Nesterov’s dual extrapolation method [24], referred to as Perseus. Like ARE
and [1], Perseus does not require bisection subroutines, by virtue of solving the VI subproblem with
higher-order regularization.