
has been achieved in [17, 16]. Furthermore, to the best of our knowledge, OGDA and EG methods have not been extensively explored in nonconvex-nonconcave settings, except in a few recent works on structured nonconvex-nonconcave problems in which the analysis is done through the lens of a variational inequality; this line of work is discussed in the Nonconvex-nonconcave paragraph below. Moreover, Guo et al. [18] recently established the convergence rate of OGDA in the NC-SC setting; however, they impose a µ-PL assumption on Φ(x), which is a strong assumption and in turn allows them to state the convergence rate in terms of the objective gap. We make no such assumption on the primal function, and hence, unlike [18], we measure convergence by the gradient norm of the primal function.
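To make the update rules discussed above concrete, the following minimal sketch runs OGDA, with plain GDA for contrast, on a bilinear toy problem f(x, y) = xy; the objective, step size, and iteration count are illustrative assumptions of this sketch, not settings from the cited works.

```python
import numpy as np

def F(z):
    """Vector field of the toy saddle problem f(x, y) = x*y:
    F(x, y) = (df/dx, -df/dy) = (y, -x); the saddle point is (0, 0)."""
    return np.array([z[1], -z[0]])

eta = 0.1
z0 = np.array([1.0, 1.0])

# Plain GDA: z_{t+1} = z_t - eta * F(z_t). On this problem it spirals outward.
z_gda = z0.copy()
for _ in range(500):
    z_gda = z_gda - eta * F(z_gda)

# OGDA: z_{t+1} = z_t - 2*eta*F(z_t) + eta*F(z_{t-1}).
z_prev, z = z0.copy(), z0 - eta * F(z0)   # warm-start with one gradient step
for _ in range(500):
    z_prev, z = z, z - 2 * eta * F(z) + eta * F(z_prev)

print(np.linalg.norm(z_gda), np.linalg.norm(z))  # GDA grows, OGDA contracts
```

The single extra gradient memory term is what lets OGDA converge on this merely monotone instance where GDA diverges.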
Nonconvex-strongly-concave (NC-SC) problems.
In the deterministic setting, Lin et al. [30] demonstrated the first non-asymptotic convergence of GDA to an ε-stationary point of Φ(x), with a gradient complexity of O(κ²ε⁻²). Lin et al. [31] and Zhang et al. [55] proposed triple-loop algorithms achieving a gradient complexity of O(√κ ε⁻²) by leveraging ideas from catalyst methods (adding α‖x − x₀‖² to the objective function) and inexact proximal point methods, which nearly matches the existing lower bound [27, 55, 20]. Approximating the inner-loop optimization of the catalyst idea by one step of GDA, Yang et al. [52] developed a single-loop algorithm called smoothed AGDA, which provably converges to an ε-stationary point with a gradient complexity of O(κε⁻²).
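The catalyst idea above can be sketched in a few lines: add a proximal term to the objective, approximately solve the regularized subproblem with a few GDA steps, then move the prox center. The quadratic toy objective, step sizes, and loop counts below are illustrative assumptions, not the analyzed algorithms of [31, 55, 52].

```python
# Illustrative toy objective (an assumption of this sketch), strongly
# concave in y: f(x, y) = -x**2/2 + 2*x*y - y**2.
def grad_x(x, y):
    return -x + 2 * y

def grad_y(x, y):
    return 2 * x - 2 * y

# Catalyst-style outer loop: at each step, run a few GDA iterations on the
# regularized objective f(x, y) + (alpha/2)*(x - x0)**2, then update the
# prox center x0 to the current iterate.
alpha, eta = 2.0, 0.05
x = y = x0 = 1.0
for _ in range(300):
    for _ in range(5):                         # inexact inner solve
        gx = grad_x(x, y) + alpha * (x - x0)   # gradient of regularized objective
        gy = grad_y(x, y)
        x, y = x - eta * gx, y + eta * gy
    x0 = x                                     # move the prox center

print(abs(grad_x(x, y)), abs(grad_y(x, y)))  # both driven toward zero
```

Collapsing the inner loop to a single GDA step is, roughly, the structural move behind the single-loop smoothed AGDA scheme mentioned above.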
In the stochastic setting, Lin et al. [30] showed that stochastic GDA, with the ratio of the primal to the dual learning rate chosen as O(1/κ²), converges to an ε-stationary point with a gradient complexity of O(κ³ε⁻⁴). Chen et al. [4] proposed a double-loop algorithm whose outer loop performs one step of gradient descent on the primal variable and whose inner loop performs multiple steps of gradient ascent; with this idea, they achieved a gradient complexity of O(κ³ε⁻⁴) with a fixed batch size. However, their algorithm is double-loop, and the iteration complexity of the inner loop is O(κ). Yang et al. [52] also introduced a stochastic version of the smoothed AGDA algorithm mentioned earlier, showing a gradient complexity of O(κ²ε⁻⁴) with a fixed batch size; this is the best-known rate for NC-PL problems, where the PL condition is an even weaker assumption than strong concavity.
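The two-timescale stepsize choice of [30] can be sketched as follows; the toy objective, noise model, and condition number below are illustrative assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy (an assumption of this sketch): f(x, y) = x*y - (mu/2)*y**2,
# so Phi(x) = max_y f(x, y) = x**2 / (2*mu) and grad Phi(x) = x / mu.
mu = 1.0
kappa = 10.0                  # condition number, chosen for illustration

eta_y = 0.05                  # dual (ascent) step size
eta_x = eta_y / kappa**2      # primal step size smaller by the 1/kappa^2 ratio

x, y = 1.0, 1.0
for _ in range(20000):
    gx = y + 0.01 * rng.standard_normal()            # noisy df/dx
    gy = x - mu * y + 0.01 * rng.standard_normal()   # noisy df/dy
    x -= eta_x * gx
    y += eta_y * gy

print(abs(x / mu))  # |grad Phi(x)| is driven toward a small noise floor
```

The much faster dual timescale lets y approximately track the inner maximizer y*(x), so the slow primal iterate effectively descends on Φ(x).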
Nonconvex-concave.
Recently, due to the surge of GANs [14] and adversarially robust neural network training, a line of research has focused on nonconvex-concave and even nonconvex-nonconcave minimax optimization problems [36, 29, 41, 44, 48, 13, 32, 33, 24]. For the nonconvex-concave setting, to the best of our knowledge, Rafique et al. [44] were the first to propose a provable nonconvex-concave minimax algorithm, the Proximally Guided Stochastic Mirror Descent method, which achieves a gradient complexity of O(ε⁻⁶) for finding a stationary point. Nouiehed et al. [41] presented a double-loop algorithm for nonconvex-concave problems with constraints on both x and y, achieving an O(ε⁻⁷) rate. Lin et al. [30] provided the first analysis of the classic (S)GDA algorithm for nonconvex-strongly-concave and nonconvex-concave functions; in the nonconvex-concave setting, they achieve O(ε⁻⁶) for GDA and O(ε⁻⁸) for SGDA. Zhang et al. [53] proposed smoothed GDA and also achieve an O(ε⁻⁸) rate. Thekumparampil et al. [48] proposed the Proximal Dual Implicit Accelerated Gradient method and achieved the best-known rate of O(ε⁻³) for nonconvex-concave problems. Kong and Monteiro [26] proposed an accelerated inexact proximal point method that also achieves an O(ε⁻³) rate. Lin et al. [31] designed a near-optimal algorithm with an O(ε⁻³) rate using an acceleration method. However, these algorithms require double or triple loops and are not as easy to implement as GDA, OGDA, or EG.
Nonconvex-nonconcave.
Minimax optimization problems can be cast as a special case of variational inequality problems (VIPs) [1, 34]. Thus, one way of studying convergence in nonconvex-nonconcave problems is to leverage variants of variational inequality properties, such as the monotone variational inequality, the Minty variational inequality (MVI), the weak MVI, and negative comonotonicity, which are weaker assumptions than convexity-concavity. For instance, Loizou et al. [35] showed linear convergence of SGDA under expected co-coercivity, a condition that can hold for non-monotone problems. Moreover, it has been shown that deterministic EG attains a gradient complexity of O(ε⁻²) in the aforementioned settings [7, 10, 47, 42]. Alternatively, another line of work establishes convergence under notions weaker than strong convexity, such as the Polyak-Łojasiewicz (PL) condition or ρ-weak convexity. Yang et al. [51] established linear convergence of the AGDA algorithm under the two-sided PL condition, and Hajizadeh et al. [19] obtained similar results for EG under the weakly-convex-weakly-concave assumption.
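The EG update analyzed in this line of work can be sketched through the VI lens as follows; the bilinear toy operator below is a merely monotone instance chosen for illustration, not one of the nonconvex-nonconcave problems treated in the cited papers.

```python
import numpy as np

def F(z):
    """VI operator of the toy saddle problem f(x, y) = x*y:
    F(x, y) = (df/dx, -df/dy) = (y, -x)."""
    return np.array([z[1], -z[0]])

eta = 0.1
z = np.array([1.0, 1.0])
for _ in range(500):
    z_half = z - eta * F(z)     # extrapolation (look-ahead) step
    z = z - eta * F(z_half)     # update using the gradient at the look-ahead point

print(np.linalg.norm(z))  # contracts toward the solution z* = (0, 0)
```

Evaluating the operator at the extrapolated point is what distinguishes EG from GDA and yields convergence under the monotone and Minty-type conditions above.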