
several lower bounds to confirm that the convergence rate cannot be better than $O(1/\sqrt{k})$ after $k$ iterations for strongly convex and smooth objective functions.
Qu and Li (2018); Lorenzo and Scutari (2016); Nedić et al. (2017); Shi et al. (2015); Li et al. (2022); and Jiang et al. (2022) all present distributed gradient-tracking methods that employ local auxiliary variables to track the average of all agents' gradients, assuming that accurate gradient information is available. In both Li et al. (2022) and Jiang et al. (2022), each local objective function is an average of finitely many instantaneous functions. Accordingly, they combine the gradient-tracking algorithm with a stochastic averaging gradient technique (Li et al., 2022) (smooth convex optimization) or with variance-reduction techniques (Jiang et al., 2022) (smooth nonconvex optimization). At each iteration, they randomly select only one gradient of an instantaneous function to approximate the local batch gradient. In Li et al. (2022), this yields an unbiased estimate of the local gradient, whereas in Jiang et al. (2022) it is biased. Nevertheless, both references assume access to an exact gradient oracle.
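To make the mechanism concrete, the following is a minimal sketch of one synchronized gradient-tracking iteration of the kind these works build on, assuming exact local gradients and a doubly stochastic mixing matrix W; all variable names are illustrative and are not taken from the cited papers.

```python
import numpy as np

def gradient_tracking_step(X, Y, grads_prev, grad_fns, W, alpha):
    """One synchronized gradient-tracking iteration (illustrative sketch).

    X          : (n, d) array; row i is agent i's iterate x_i.
    Y          : (n, d) array; row i is agent i's tracker y_i,
                 initialized as y_i = grad f_i(x_i) at the first iteration.
    grads_prev : (n, d) array of local gradients evaluated at X.
    grad_fns   : list of n callables; grad_fns[i](x) returns grad f_i(x).
    W          : (n, n) doubly stochastic mixing matrix (applied centrally
                 here; in practice each row is a neighbor average).
    alpha      : constant step size.
    """
    X_next = W @ X - alpha * Y  # consensus step, then move along tracked direction
    grads_next = np.stack([g(x) for g, x in zip(grad_fns, X_next)])
    Y_next = W @ Y + grads_next - grads_prev  # add new local gradient, drop old one
    return X_next, Y_next, grads_next
```

With the stated initialization, the trackers preserve the invariant that the network average of the y_i equals the network average of the local gradients at the current iterates, which is exactly the "tracking the average of all agents' gradients" property.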
Tang et al. (2021) develop two algorithms for nonconvex multi-agent optimization. One is a gradient-tracking algorithm based on a noise-free $2d$-point estimator of the gradient, achieving a rate of $O(1/k)$ under smoothness assumptions and a linear rate under an additional $\nu$-gradient-dominated objective assumption. The other is based on a 2-point unbiased estimator without global gradient tracking and achieves a rate of $O(1/\sqrt{k \log k})$ under Lipschitz continuity and smoothness conditions, and $O(1/k)$ under an additional gradient-dominated function structure. Both of these gradient estimators are unbiased with respect to the exact gradient and even have vanishing variance.
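For reference, a standard form of such a 2-point estimator is the sphere-sampling scheme
$$g_i(x) = \frac{d}{2\delta}\left(f_i(x + \delta u) - f_i(x - \delta u)\right)u, \qquad u \sim \mathrm{Unif}(\mathbb{S}^{d-1}),$$
a sketch in which $\delta > 0$ is a smoothing radius; Tang et al. (2021) may differ in the sampling distribution and scaling. This estimator is unbiased for the gradient of the smoothed surrogate $f_i^{\delta}(x) = \mathbb{E}_{v}\left[f_i(x + \delta v)\right]$ (with $v$ uniform on the unit ball), and its deviation from $\nabla f_i$ vanishes as $\delta \to 0$ when $f_i$ is smooth.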
Pu and Nedić (2018); Xin et al. (2019); Pu (2020); and Lu et al. (2019) all assume access to local stochastic first-order oracles whose gradients are unbiased and have bounded variance. The first three additionally assume smooth and strongly convex local objectives, and all of them achieve a linear convergence rate under a constant step size. Pu and Nedić (2018) proposes a distributed stochastic gradient-tracking method (DSGT) and a gossip-like stochastic gradient-tracking method (GSGT) in which, at each round, each agent wakes up with a certain probability. Further, in Pu and Nedić (2018), when the step size is diminishing, the convergence rate is $O(1/k)$. Xin et al. (2019) employs a gradient-tracking algorithm for agents communicating over a strongly connected graph. Pu (2020) introduces a robust gradient-tracking method (R-Push-Pull) in the context of noisy information exchange between agents and a directed network topology. Lu et al. (2019) propose a gradient-tracking-based nonconvex stochastic decentralized (GNSD) algorithm for nonconvex optimization problems in machine learning and attain a convergence rate of $O(1/\sqrt{k})$ under a constant step size.
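In symbols, the stochastic first-order oracle shared by these four works returns, for any query point $x$ of agent $i$, a random vector $g_i(x, \xi)$ satisfying (notation ours)
$$\mathbb{E}_{\xi}\left[g_i(x,\xi)\right] = \nabla f_i(x), \qquad \mathbb{E}_{\xi}\left\|g_i(x,\xi) - \nabla f_i(x)\right\|^2 \le \sigma^2,$$
for some constant $\sigma^2 < \infty$.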
1.4 Contributions
In this paper, we consider smooth and convex local objectives, and we extend the gradient-tracking algorithm to the case where no estimate of the gradient is available. Under the realistic assumption that each agent only has access to a single noisy function value at each time, without necessarily knowing the form of this function, we propose a one-point estimator in a stochastic framework.
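As a point of comparison with the 2-point scheme above, a minimal sketch of the generic one-point construction follows; the names are illustrative, and the estimator proposed in this paper may differ in its scaling and smoothing distribution.

```python
import numpy as np

def one_point_gradient_estimate(f_noisy, x, delta, rng):
    """One-point zeroth-order gradient estimate (illustrative sketch).

    f_noisy(x) returns a single noisy function value at x; the
    functional form of the objective is never accessed.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # direction uniform on the unit sphere
    return (d / delta) * f_noisy(x + delta * u) * u
```

In expectation this recovers the gradient of a $\delta$-smoothed surrogate of the objective rather than the true gradient, and its second moment scales with the squared function value itself, which is the bias and high variance referred to next.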
Naturally, one-point estimators are biased with respect to the true gradient and suffer from high variance (Liu et al., 2020); hence, they do not match the assumptions for convergence presented in Tang et al. (2021); Pu and Nedić (2018); Xin