SARAH-based Variance-reduced Algorithm for
Stochastic Finite-sum Cocoercive Variational
Inequalities
Aleksandr Beznosikov^{1,2}, Alexander Gasnikov^{1,3,4}
1 Moscow Institute of Physics and Technology, Dolgoprudny, Russia
2 HSE University, Moscow, Russia
3 IITP RAS, Moscow, Russia
4 Caucasus Mathematical Center, Adyghe State University, Maikop, Russia
Abstract. Variational inequalities are a broad formalism that encompasses a vast
number of applications. Motivated by applications in machine learning and beyond,
stochastic methods are of great importance. In this paper we consider the problem of
stochastic finite-sum cocoercive variational inequalities. For this class of problems, we
investigate the convergence of the method based on the SARAH variance reduction
technique. We show that for strongly monotone problems it is possible to achieve linear
convergence to a solution using this method. Experiments confirm the importance and
practical applicability of our approach.

Keywords: stochastic optimization · variational inequalities · finite-sum problems
1 Introduction
In this paper we focus on the following unconstrained variational inequality (VI)
problem:
Find $z \in \mathbb{R}^d$ such that $F(z) = 0$, \quad (1)
where $F : \mathbb{R}^d \to \mathbb{R}^d$ is some operator. This formulation is broad and encompasses
many popular classes of tasks arising in practice. The simplest, yet widely
encountered, example of a VI is the minimization problem:
$$\min_{z \in \mathbb{R}^d} f(z).$$
To represent it in the form (1), it is sufficient to take $F(z) = \nabla f(z)$. As another
popular practical example, we can consider a saddle point, or min-max, problem:
$$\min_{x \in \mathbb{R}^{d_x}} \max_{y \in \mathbb{R}^{d_y}} g(x, y).$$
Here we need to take $F(z) = [\nabla_x g(x, y), -\nabla_y g(x, y)]$.
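As a concrete illustration (not taken from the paper), the sketch below builds such an operator $F$ for a toy regularized bilinear saddle point $g(x, y) = x^\top A y + \tfrac{\lambda}{2}\|x\|^2 - \tfrac{\lambda}{2}\|y\|^2$, whose partial gradients are available in closed form; the names and the problem instance are illustrative assumptions.

```python
import numpy as np

def make_saddle_operator(A, lam):
    """Build F(z) = [grad_x g, -grad_y g] for the toy saddle function
    g(x, y) = x^T A y + (lam/2)*||x||^2 - (lam/2)*||y||^2."""
    dx, dy = A.shape

    def F(z):
        x, y = z[:dx], z[dx:]
        grad_x = A @ y + lam * x      # grad_x g(x, y)
        grad_y = A.T @ x - lam * y    # grad_y g(x, y)
        return np.concatenate([grad_x, -grad_y])

    return F

rng = np.random.default_rng(0)
F_toy = make_saddle_operator(rng.standard_normal((3, 2)), lam=0.5)
z = rng.standard_normal(5)
print(F_toy(z))   # a point z with F(z) = 0 solves the VI (1)
```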
From a machine learning perspective, it is not the deterministic formulation (1) that is
of interest, but the stochastic one. More specifically, we want to consider
the setup with the operator $F(z) = \mathbb{E}_{\xi \sim \mathcal{D}}[F_\xi(z)]$, where $\xi$ is a random variable,
$\mathcal{D}$ is some distribution, and $F_\xi : \mathbb{R}^d \to \mathbb{R}^d$ is a stochastic operator. But it is often the
case (especially in practical problems) that the distribution $\mathcal{D}$ is unknown and
we only have some samples from $\mathcal{D}$. Then one can replace the expectation with a finite-sum Monte
Carlo approximation, i.e.
$$F(z) = \frac{1}{n} \sum_{i=1}^{n} F_i(z). \quad (2)$$
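To make the finite-sum structure (2) concrete, here is a minimal sketch assuming a toy setup where each $F_i$ is the gradient of a per-sample ridge-regression loss; the data, names, and regularization are illustrative, and the later sketches in this section reuse these `F_i` and `F`.

```python
import numpy as np

# Toy finite-sum operator (illustrative): F_i(z) is the gradient of
# f_i(z) = 0.5*(a_i^T z - b_i)^2 + (mu/2)*||z||^2, and F is their average, as in (2).
rng = np.random.default_rng(1)
n, d, mu = 100, 5, 0.1
A_data = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def F_i(i, z):
    return (A_data[i] @ z - b[i]) * A_data[i] + mu * z

def F(z):
    return np.mean([F_i(i, z) for i in range(n)], axis=0)
```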
In the case of minimization problems, statements of the form (1)+(2) are also
called empirical risk minimization [37]. These types of problems arise both in
classical machine learning problems such as simple regressions and in complex,
large-scale problems such as neural networks [23]. When it comes to saddle point
problems, the so-called adversarial approach has become popular in recent times.
Here one can highlight Generative Adversarial Networks (GANs) [13] and the
adversarial training of models [40,25].
Based on the examples mentioned above, it can be noted that for operators
of the form (2), computing the full value of $F$ is a possible but not desirable
operation, since it is typically very expensive compared to computing a single
operator $F_i$. Therefore, when constructing an algorithm for the problem (1)+(2),
one wants to avoid computing (or compute very rarely) the full operator $F$. This
task can be addressed within the stochastic gradient descent (SGD) framework. Currently,
stochastic methods for minimization problems already have an extensive background
[15]. The first methods of this type were proposed back in the 1950s by [35]. For
example, in its most classic variant, SGD can be written as follows:
$$z^{k+1} = z^k - \eta v^k, \quad (3)$$
where $\eta > 0$ is a predefined step-size and $v^k = \nabla f_i(z^k)$, where $i \in [n]$ is chosen
randomly [38]. In this case, the variance of $v^k$ is the main source of slower
convergence, or of convergence only to a neighbourhood of the solution [7,31,21].
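In the VI setting, update (3) is used with the stochastic operator in place of the stochastic gradient. Below is a minimal sketch reusing the toy `F_i` and `n` defined after (2); the step size and iteration budget are illustrative assumptions.

```python
def sgd_vi(z0, eta=0.05, iters=2000, seed=0):
    """Plain SGD-type iteration (3) for the VI: z <- z - eta * F_i(z)."""
    rng = np.random.default_rng(seed)
    z = z0.copy()
    for _ in range(iters):
        i = rng.integers(n)        # pick one component operator at random
        z = z - eta * F_i(i, z)    # v^k = F_i(z^k)
    return z
```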
But for minimization problems of the finite-sum type, one can achieve stronger
theoretical and practical results compared to the method (3). This requires the
use of a variance reduction technique. Recently, many variance-reduced variants
of SGD have been proposed, including SAG/SAGA [36,10,34], SVRG [19,3,39],
MISO [27], SARAH [29,32,30,18], SPIDER [11], STORM [9], PAGE [24]. The
essence of one of the earliest and best known variance-reduced methods, SVRG,
is to use $v^k = \nabla f_i(z^k) - \nabla f_i(\tilde{z}) + \nabla f(\tilde{z})$, where $i \in [n]$ is picked at random
and the point $\tilde{z}$ is updated very rarely (hence
we do not need to compute the full gradient often). With this type of method it
is possible to achieve linear convergence to the solution. But for both convex
and non-convex smooth minimization problems, the best theoretical guarantees
of convergence are given by another variance reduction technique, SARAH (and its
modifications: SPIDER, STORM, PAGE).
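For intuition, here is a matching SVRG-style sketch for operators, again reusing the toy `F`, `F_i`, and `n` from above; the epoch length and step size are illustrative, and this is only a schematic of the estimator described in the text, not the exact methods from the cited works.

```python
def svrg_vi(z0, eta=0.05, epochs=20, epoch_len=None, seed=0):
    """SVRG-style estimator for VIs: v = F_i(z) - F_i(z_ref) + F(z_ref)."""
    rng = np.random.default_rng(seed)
    m = epoch_len or n
    z = z0.copy()
    for _ in range(epochs):
        z_ref = z.copy()
        full = F(z_ref)                 # rare full-operator evaluation
        for _ in range(m):
            i = rng.integers(n)
            v = F_i(i, z) - F_i(i, z_ref) + full
            z = z - eta * v
    return z
```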
In turn, stochastic methods are also investigated for variational inequalities
and saddle point problems [20,12,17,28,16,6,14,4,5], including methods based on
variance reduction techniques [33,8,2,1,22,4,5]. Most of these methods are based
on the SVRG approach. At the same time, SARAH-based methods have not been
explored for VIs. But as we noted earlier, these methods are the most attractive
from the theoretical point of view for minimization problems. The purpose of
this paper is to partially close this question by studying the SARAH approach for stochastic
finite-sum variational inequalities.
2 Problem setup and assumptions
Notation. We use $\langle x, y \rangle := \sum_{i=1}^{d} x_i y_i$ to denote the standard inner product of
$x, y \in \mathbb{R}^d$, where $x_i$ corresponds to the i-th component of $x$ in the standard basis
of $\mathbb{R}^d$. It induces the $\ell_2$-norm in $\mathbb{R}^d$ in the following way: $\|x\|_2 := \sqrt{\langle x, x \rangle}$.
Recall that we consider the problem (1), where the operator F has the form
(2). Additionally, we assume
Assumption 1 (Cocoercivity). Each operator $F_i$ is $\ell$-cocoercive, i.e. for all
$u, v \in \mathbb{R}^d$ we have
$$\|F_i(u) - F_i(v)\|^2 \le \ell \langle F_i(u) - F_i(v), u - v \rangle. \quad (4)$$
This assumption is in some sense a more restrictive analogue of the Lipschitzness
of $F_i$. For convex minimization problems, $\ell$-Lipschitzness and $\ell$-cocoercivity are
equivalent. Regarding variational inequalities and saddle point problems, see
[26].
Assumption 2 (Strong monotonicity). The operator $F$ is $\mu$-strongly monotone,
i.e. for all $u, v \in \mathbb{R}^d$ we have
$$\langle F(u) - F(v), u - v \rangle \ge \mu \|u - v\|^2. \quad (5)$$
For minimization problems this property means strong convexity, and for saddle
point problems strong convexity–strong concavity.
3 Main part
For general Lipschitz variational inequalities, stochastic methods are usually
based not on SGD, but on the Stochastic Extra Gradient method [20]. But since
we consider cocoercive VIs, it is sufficient to look at SGD-like
methods for this class of problems. For example, [26] considers SGD and [5] considers SVRG.
Following this reasoning, we base our method on the original SARAH [29].
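Since this excerpt does not include the algorithm listing, the following is only a hedged sketch of the classic SARAH recursion transferred to operators, reusing the toy `F`, `F_i`, and `n` defined earlier; the step size and epoch length are illustrative assumptions and should not be read as the exact method or parameters analyzed in the paper.

```python
def sarah_vi(z0, eta=0.05, epochs=20, epoch_len=None, seed=0):
    """SARAH-style recursion for VIs (sketch): at the start of each epoch
    v = F(z) (full operator); inside the epoch
    v <- F_i(z^k) - F_i(z^{k-1}) + v, then z^{k+1} = z^k - eta * v."""
    rng = np.random.default_rng(seed)
    m = epoch_len or n
    z = z0.copy()
    for _ in range(epochs):
        z_prev = z.copy()
        v = F(z)                                  # full operator, once per epoch
        z = z - eta * v
        for _ in range(m - 1):
            i = rng.integers(n)
            v = F_i(i, z) - F_i(i, z_prev) + v    # recursive variance-reduced estimator
            z_prev = z.copy()
            z = z - eta * v
    return z
```

In the toy strongly monotone setup above, one would expect the distance to the solution to shrink geometrically over epochs, matching the linear convergence claimed in the abstract; the constants here are purely illustrative.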