SARAH-based Variance-reduced Algorithm for
Stochastic Finite-sum Cocoercive Variational
Inequalities
Aleksandr Beznosikov^{1,2}, Alexander Gasnikov^{1,3,4}
1 Moscow Institute of Physics and Technology, Dolgoprudny, Russia
2 HSE University, Moscow, Russia
3 IITP RAS, Moscow, Russia
4 Caucasus Mathematical Center, Adyghe State University, Maikop, Russia
Abstract. Variational inequalities are a broad formalism that encompasses a vast
number of applications. Motivated by applications in machine learning and beyond,
stochastic methods are of great importance. In this paper we consider the problem of
stochastic finite-sum cocoercive variational inequalities. For this class of problems, we
investigate the convergence of the method based on the SARAH variance reduction
technique. We show that for strongly monotone problems it is possible to achieve linear
convergence to a solution using this method. Experiments confirm the importance and
practical applicability of our approach.

Keywords: stochastic optimization · variational inequalities · finite-sum problems
1 Introduction
In this paper we focus on the following unconstrained variational inequality (VI)
problem:
Find $z \in \mathbb{R}^d$ such that $F(z) = 0$, \quad (1)
where $F : \mathbb{R}^d \to \mathbb{R}^d$ is some operator. This formulation is broad and encompasses
many popular classes of tasks arising in practice. The simplest, yet widely
encountered, example of a VI is the minimization problem:
$$\min_{z \in \mathbb{R}^d} f(z).$$
To represent it in the form (1), it is sufficient to take $F(z) = \nabla f(z)$. As another
popular practical example, we can consider a saddle point, or min-max, problem:
$$\min_{x \in \mathbb{R}^{d_x}} \max_{y \in \mathbb{R}^{d_y}} g(x, y).$$
Here we need to take $F(z) = [\nabla_x g(x, y), -\nabla_y g(x, y)]$.
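As a concrete illustration (not taken from the paper), the sketch below builds such an operator $F$ for a toy regularized bilinear saddle point $g(x, y) = x^\top A y + \tfrac{\lambda}{2}\|x\|^2 - \tfrac{\lambda}{2}\|y\|^2$, whose partial gradients are available in closed form; the names and the problem instance are illustrative assumptions.

```python
import numpy as np

def make_saddle_operator(A, lam):
    """Build F(z) = [grad_x g, -grad_y g] for the toy saddle function
    g(x, y) = x^T A y + (lam/2)*||x||^2 - (lam/2)*||y||^2."""
    dx, dy = A.shape

    def F(z):
        x, y = z[:dx], z[dx:]
        grad_x = A @ y + lam * x      # grad_x g(x, y)
        grad_y = A.T @ x - lam * y    # grad_y g(x, y)
        return np.concatenate([grad_x, -grad_y])

    return F

rng = np.random.default_rng(0)
F_toy = make_saddle_operator(rng.standard_normal((3, 2)), lam=0.5)
z = rng.standard_normal(5)
print(F_toy(z))   # a point z with F(z) = 0 solves the VI (1)
```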
From a machine learning perspective, it is not the deterministic formulation (1) that is
of interest, but the stochastic one. More specifically, we want to consider
the setup with the operator $F(z) = \mathbb{E}_{\xi \sim \mathcal{D}}[F_\xi(z)]$, where $\xi$ is a random variable,
$\mathcal{D}$ is some distribution, and $F_\xi : \mathbb{R}^d \to \mathbb{R}^d$ is a stochastic operator. But it is often the
case (especially in practical problems) that the distribution $\mathcal{D}$ is unknown and
we only have some samples from $\mathcal{D}$. Then one can replace the expectation with a finite-sum Monte
Carlo approximation, i.e.
$$F(z) = \frac{1}{n} \sum_{i=1}^{n} F_i(z). \quad (2)$$
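To make the finite-sum structure (2) concrete, here is a minimal sketch assuming a toy setup where each $F_i$ is the gradient of a per-sample ridge-regression loss; the data, names, and regularization are illustrative, and the later sketches in this section reuse these `F_i` and `F`.

```python
import numpy as np

# Toy finite-sum operator (illustrative): F_i(z) is the gradient of
# f_i(z) = 0.5*(a_i^T z - b_i)^2 + (mu/2)*||z||^2, and F is their average, as in (2).
rng = np.random.default_rng(1)
n, d, mu = 100, 5, 0.1
A_data = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def F_i(i, z):
    return (A_data[i] @ z - b[i]) * A_data[i] + mu * z

def F(z):
    return np.mean([F_i(i, z) for i in range(n)], axis=0)
```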
In the case of minimization problems, statements of the form (1)+(2) are also
called empirical risk minimization [37]. These types of problems arise both in
classical machine learning problems such as simple regressions and in complex,
large-scale problems such as neural networks [23]. When it comes to saddle point
problems, the so-called adversarial approach has become popular in recent times.
Here one can highlight Generative Adversarial Networks (GANs) [13] and the
adversarial training of models [40,25].
Based on the examples mentioned above, it can be noted that for operators
of the form (2), computing the full value of $F$ is a possible but not desirable
operation, since it is typically very expensive compared to computing a single
operator $F_i$. Therefore, when constructing an algorithm for the problem (1)+(2),
one wants to avoid computing (or compute very rarely) the full operator $F$. This
task can be addressed within the stochastic gradient descent (SGD) framework. Currently,
stochastic methods for minimization problems already have an extensive background
[15]. The first methods of this type were proposed back in the 1950s by [35]. For
example, in its most classic variant, SGD can be written as follows:
$$z^{k+1} = z^k - \eta v^k, \quad (3)$$
where $\eta > 0$ is a predefined step-size and $v^k = \nabla f_i(z^k)$, where $i \in [n]$ is chosen
randomly [38]. In this case, the variance of $v^k$ is the main source of slower
convergence, or of convergence only to a neighbourhood of the solution [7,31,21].
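In the VI setting, update (3) is used with the stochastic operator in place of the stochastic gradient. Below is a minimal sketch reusing the toy `F_i` and `n` defined after (2); the step size and iteration budget are illustrative assumptions.

```python
def sgd_vi(z0, eta=0.05, iters=2000, seed=0):
    """Plain SGD-type iteration (3) for the VI: z <- z - eta * F_i(z)."""
    rng = np.random.default_rng(seed)
    z = z0.copy()
    for _ in range(iters):
        i = rng.integers(n)        # pick one component operator at random
        z = z - eta * F_i(i, z)    # v^k = F_i(z^k)
    return z
```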
But for minimization problems of the finite-sum type, one can achieve stronger
theoretical and practical results compared to the method (3). This requires the
use of a variance reduction technique. Recently, many variance-reduced variants
of SGD have been proposed, including SAG/SAGA [36,10,34], SVRG [19,3,39],
MISO [27], SARAH [29,32,30,18], SPIDER [11], STORM [9], PAGE [24]. The
essence of one of the earliest and best known variance-reduced methods, SVRG,
is to use $v^k = \nabla f_i(z^k) - \nabla f_i(\tilde{z}) + \nabla f(\tilde{z})$, where $i \in [n]$ is picked at random
and the point $\tilde{z}$ is updated very rarely (hence
we do not need to compute the full gradient often). With this type of method it
is possible to achieve linear convergence to the solution. But for both convex
and non-convex smooth minimization problems, the best theoretical guarantees
of convergence are given by another variance reduction technique, SARAH (and its
modifications: SPIDER, STORM, PAGE).
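For intuition, here is a matching SVRG-style sketch for operators, again reusing the toy `F`, `F_i`, and `n` from above; the epoch length and step size are illustrative, and this is only a schematic of the estimator described in the text, not the exact methods from the cited works.

```python
def svrg_vi(z0, eta=0.05, epochs=20, epoch_len=None, seed=0):
    """SVRG-style estimator for VIs: v = F_i(z) - F_i(z_ref) + F(z_ref)."""
    rng = np.random.default_rng(seed)
    m = epoch_len or n
    z = z0.copy()
    for _ in range(epochs):
        z_ref = z.copy()
        full = F(z_ref)                 # rare full-operator evaluation
        for _ in range(m):
            i = rng.integers(n)
            v = F_i(i, z) - F_i(i, z_ref) + full
            z = z - eta * v
    return z
```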
In turn, stochastic methods are also investigated for variational inequalities
and saddle point problems [20,12,17,28,16,6,14,4,5], including methods based on
variance reduction techniques [33,8,2,1,22,4,5]. Most of these methods are based
on the SVRG approach. At the same time, SARAH-based methods have not been
explored for VIs. But as we noted earlier, these methods are the most attractive
from the theoretical point of view for minimization problems. The purpose of
this paper is to partially close this question by studying the SARAH approach for stochastic
finite-sum variational inequalities.
2 Problem setup and assumptions
Notation. We use $\langle x, y \rangle := \sum_{i=1}^{d} x_i y_i$ to denote the standard inner product of
$x, y \in \mathbb{R}^d$, where $x_i$ corresponds to the i-th component of $x$ in the standard basis
of $\mathbb{R}^d$. It induces the $\ell_2$-norm in $\mathbb{R}^d$ in the following way: $\|x\|_2 := \sqrt{\langle x, x \rangle}$.
Recall that we consider the problem (1), where the operator F has the form
(2). Additionally, we assume
Assumption 1 (Cocoercivity). Each operator $F_i$ is $\ell$-cocoercive, i.e. for all
$u, v \in \mathbb{R}^d$ we have
$$\|F_i(u) - F_i(v)\|^2 \le \ell \langle F_i(u) - F_i(v), u - v \rangle. \quad (4)$$
This assumption is in some sense a more restrictive analogue of the Lipschitzness
of $F_i$. For convex minimization problems, $\ell$-Lipschitzness and $\ell$-cocoercivity are
equivalent. Regarding variational inequalities and saddle point problems, see
[26].
Assumption 2 (Strong monotonicity). The operator $F$ is $\mu$-strongly monotone,
i.e. for all $u, v \in \mathbb{R}^d$ we have
$$\langle F(u) - F(v), u - v \rangle \ge \mu \|u - v\|^2. \quad (5)$$
For minimization problems this property means strong convexity, and for saddle
point problems strong convexity–strong concavity.
3 Main part
For general Lipschitz variational inequalities, stochastic methods are usually
based not on SGD, but on the Stochastic Extra Gradient method [20]. But since
we consider cocoercive VIs, it is sufficient to look at SGD-like
methods for this class of problems. For example, [26] considers SGD and [5] considers SVRG.
Following this reasoning, we base our method on the original SARAH [29].
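Since this excerpt does not include the algorithm listing, the following is only a hedged sketch of the classic SARAH recursion transferred to operators, reusing the toy `F`, `F_i`, and `n` defined earlier; the step size and epoch length are illustrative assumptions and should not be read as the exact method or parameters analyzed in the paper.

```python
def sarah_vi(z0, eta=0.05, epochs=20, epoch_len=None, seed=0):
    """SARAH-style recursion for VIs (sketch): at the start of each epoch
    v = F(z) (full operator); inside the epoch
    v <- F_i(z^k) - F_i(z^{k-1}) + v, then z^{k+1} = z^k - eta * v."""
    rng = np.random.default_rng(seed)
    m = epoch_len or n
    z = z0.copy()
    for _ in range(epochs):
        z_prev = z.copy()
        v = F(z)                                  # full operator, once per epoch
        z = z - eta * v
        for _ in range(m - 1):
            i = rng.integers(n)
            v = F_i(i, z) - F_i(i, z_prev) + v    # recursive variance-reduced estimator
            z_prev = z.copy()
            z = z - eta * v
    return z
```

In the toy strongly monotone setup above, one would expect the distance to the solution to shrink geometrically over epochs, matching the linear convergence claimed in the abstract; the constants here are purely illustrative.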