
Federated Learning Using Variance Reduced
Stochastic Gradient for Probabilistically Activated
Agents
Mohammadreza Rostami and Solmaz S. Kia, Senior Member, IEEE
Abstract
This paper proposes an algorithm for Federated Learning (FL) with a two-layer structure that achieves
both variance reduction and a faster convergence rate to an optimal solution in the setting where each
agent has an arbitrary probability of being selected at each iteration. In distributed machine learning, FL is a practical tool when privacy matters. However, when FL operates in an environment with irregular connections among agents (devices), reaching a trained model in an economical and timely manner can be demanding. The first layer of our algorithm corresponds to the propagation of the model parameters across agents, carried out by the server. In the second layer, each agent performs its local update using a stochastic, variance-reduced technique called Stochastic Variance Reduced Gradient (SVRG). We leverage the concept of variance reduction from stochastic optimization in the agents' local update steps to reduce the variance caused by stochastic gradient descent (SGD). We provide a convergence
bound for our algorithm, which improves the rate from $O(1/\sqrt{K})$ to $O(1/K)$ by using a constant step-size.
We demonstrate the performance of our algorithm using numerical examples.
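For concreteness, the following is a minimal sketch of the SVRG-style local update mentioned in the abstract above, written for a toy least-squares loss. The function name local_svrg_update, the step-size, and the inner-loop length are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def local_svrg_update(w, X, y, step_size=0.1, num_inner_steps=50, rng=None):
    """One SVRG pass on a toy least-squares loss 0.5*||X w - y||^2 / n."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    w_snap = w.copy()
    # Full gradient at the snapshot, computed once per pass.
    full_grad = X.T @ (X @ w_snap - y) / n
    for _ in range(num_inner_steps):
        i = rng.integers(n)
        # Per-sample gradients at the current iterate and at the snapshot.
        g_i = X[i] * (X[i] @ w - y[i])
        g_i_snap = X[i] * (X[i] @ w_snap - y[i])
        # Variance-reduced direction: unbiased, with variance shrinking near the optimum.
        v = g_i - g_i_snap + full_grad
        w = w - step_size * v
    return w
```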
I. INTRODUCTION
In recent years, with the technological advances in modern smart devices, each phone, tablet, or smart home system
generates and stores an abundance of data, which, if harvested collaboratively with other users’ data, can lead to
learning models that support many intelligent applications such as smart health and image classification [1], [2].
Traditional machine learning approaches require centralizing the training data on one machine, in the cloud, or
in a data center. However, the data collected on modern smart devices are often of a sensitive nature that discourages
users from relying on centralized solutions. Federated Learning (FL) [3], [4] has been proposed to decouple the
ability to do machine learning from the need to store the data in a centralized location. The idea of Federated
Learning is to enable smart devices to collaboratively learn a shared prediction model while keeping all the training
data on the device.
Figure 1 shows a schematic representation of an FL architecture. In FL, collaborative learning without data sharing
is accomplished by each agent receiving the current model weights from the server. Each participating agent then
separately updates the model by implementing stochastic gradient descent (SGD) [5] using its own locally collected
dataset. The participating agents then send their locally computed model weights to a server/aggregator, which
often combines the models through simple averaging, as in FedAvg [4], before sending the result back to the agents. The process
repeats until a satisfactory model is obtained. Federated learning relies heavily on communication between learner
agents (clients) and a moderating server. Engaging all the clients in the learning procedure at each time step of the
algorithm results in a huge communication cost. On the other hand, poor channel quality and intermittent connectivity
can completely derail training. For resource management, in popular original FL algorithms such as FedAvg [4], a batch of agents is selected uniformly at random at each round to receive the updated model
weights and perform local learning. FedAvg and similar FL algorithms come with convergence guarantees [6]–[9]
under the assumption that the randomly selected agents are available at each round. However, in practice, due to
factors such as energy and time constraints, agents are not available at all times. Thus, some works
have addressed this problem via device scheduling [10]–[14]. Nevertheless, agents' availability can be
a function of unforeseen factors such as communication channel quality, and thus is neither deterministic nor known
in advance.
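The server-side loop below is a hedged sketch of one such round in the spirit of FedAvg: the server broadcasts the current weights, each agent participates with its own probability p_i, and the returned local models are combined by simple averaging. The helper names (federated_round, local_update) and the plain-averaging rule are assumptions for illustration, not a verbatim statement of the algorithm analyzed in this paper.

```python
import numpy as np

def federated_round(w_global, agents, activation_probs, local_update, rng=None):
    """One round: broadcast, probabilistic local participation, simple averaging.

    agents: list of (X_i, y_i) local datasets; activation_probs: per-agent p_i.
    local_update: e.g. the local_svrg_update sketch above.
    """
    rng = np.random.default_rng() if rng is None else rng
    local_models = []
    for (X_i, y_i), p_i in zip(agents, activation_probs):
        if rng.random() < p_i:  # agent i happens to be available with probability p_i
            local_models.append(local_update(w_global.copy(), X_i, y_i))
    if not local_models:        # no agent participated this round; keep the old model
        return w_global
    return np.mean(local_models, axis=0)  # simple averaging of the returned weights
```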
The authors are with the Department of Mechanical and Aerospace Engineering, University of California Irvine, Irvine, CA
92697, {mrostam2,solmaz}@uci.edu. This work was supported by NSF, under CAREER Award ECCS-1653838.