A related work [19] considers distributed constrained optimization problems with noisy communication links. That work differs from our study in that it assumes i.i.d. weight matrices with a symmetric expected weight matrix. In a recent work [20], a two-time-scale gradient descent algorithm has been presented for (strongly) convex cost functions over a convex compact set. Assuming a fixed topology for the underlying network, uniform weighting of the local cost functions, and a specific scheme for the lossy sharing of information, it is shown that under certain conditions, the agents' states converge to the optimal solution of the problem almost surely. For averaging-based distributed optimization over random networks with possible dependence on the past, almost sure convergence to an optimal point is established in [21] under certain conditions.
In this paper, we study distributed convex optimization problems over time-varying networks with imperfect information sharing. We consider the two-time-scale gradient descent method studied in [18, 22] to solve the optimization problem. One time-scale adjusts the (imperfect) incoming information from the neighboring agents, and the other controls the gradients of the local cost functions. It is shown that with a proper choice of parameters, the proposed algorithm reaches the global optimal point for strongly convex cost functions at a rate of $\mathcal{O}(T^{-1/2})$, and achieves a convergence rate of $\mathcal{O}(T^{-1/3+\epsilon})$, for any $\epsilon > 0$, for non-convex cost functions in the $L_2$ sense. Here, we identify sufficient conditions on the step-size sequences for the almost sure convergence of the agents' states to an optimal solution for the class of convex cost functions.
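To make the two-time-scale idea concrete, the following NumPy sketch runs a generic consensus-plus-gradient iteration on hypothetical quadratic local costs with a fixed uniform weight matrix. The update form and the step-size sequences are illustrative stand-ins, not the exact scheme of [18, 22], and the (noiseless) mixing step `W @ x` is where imperfect information sharing would enter.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3
targets = rng.standard_normal((n, d))   # hypothetical costs f_i(x) = 0.5*||x - targets[i]||^2

W = np.full((n, n), 1.0 / n)            # fixed doubly stochastic weight matrix

x = rng.standard_normal((n, d))         # agents' states, one row vector per agent
for t in range(1, 20001):
    beta = 1.0 / t**0.5                 # consensus (fast) step-size sequence
    alpha = 1.0 / t                     # gradient (slow) step-size sequence
    # one time-scale mixes neighbors' information, the other descends the local gradient
    x = x + beta * (W @ x - x) - alpha * (x - targets)

# states approximately agree and approach the minimizer (the targets' average)
print(np.allclose(x, targets.mean(axis=0), atol=0.05))
```

Because the consensus step-size decays more slowly than the gradient step-size, averaging dominates asymptotically and the states cluster around the common minimizer, here the mean of the targets.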
The paper is organized as follows. We conclude this section by discussing the notations used in the paper.
We formulate the main problem and state the relevant underlying assumptions in Section 2. In Section 3, we
present our main results. To corroborate our theoretical analysis, we present simulation results in Section 4.
We discuss some preliminary results that are required to prove the main results in Section 5. The proofs of the preliminary lemmas are presented in the Appendix. We then present the proofs of the main results in Sections 6 and 7. Finally, we conclude the paper and discuss some possible directions for future work in Section 8.
Notation. We denote the set of integers $\{1, \ldots, n\}$ by $[n]$. In this paper, we consider $n$ agents that are minimizing a function in $\mathbb{R}^d$. We assume that all local objective functions act on row vectors in $\mathbb{R}^{1 \times d} = \mathbb{R}^d$, and thus we view vectors in $\mathbb{R}^d$ as row vectors. Note that the rest of the vectors, i.e., the vectors in $\mathbb{R}^{n \times 1} = \mathbb{R}^n$, are assumed to be column vectors. We denote the $L_2$ norm of a vector $x \in \mathbb{R}^d$ by $\|x\|$. For a matrix $A \in \mathbb{R}^{n \times d}$ and a strictly positive stochastic vector $r \in \mathbb{R}^n$, we define the $r$-norm of $A$ by $\|A\|_r^2 = \sum_{i=1}^{n} r_i \|A_i\|_2^2$, where $A_i$ denotes the $i$-th row of $A$. We denote the Frobenius norm of a matrix $A \in \mathbb{R}^{n \times d}$ by $\|A\|^2 = \sum_{i=1}^{n} \sum_{j=1}^{d} |A_{ij}|^2$. The difference operator $\Delta$ on a real-valued sequence $\{a(t)\}$ is defined as $\Delta a(t) := a(t+1) - a(t)$ for every $t \geq 1$.
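For concreteness, the $r$-norm and the Frobenius norm defined above can be checked numerically; the following NumPy sketch uses an arbitrary illustrative matrix and weight vector.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # A in R^{3 x 2}
r = np.array([0.2, 0.3, 0.5])                        # strictly positive stochastic vector

# r-norm: ||A||_r^2 = sum_i r_i * ||A_i||_2^2, with A_i the i-th row of A
r_norm_sq = sum(r[i] * np.linalg.norm(A[i]) ** 2 for i in range(3))

# Frobenius norm: ||A||^2 = sum_{i,j} |A_ij|^2
fro_sq = np.linalg.norm(A, "fro") ** 2

print(r_norm_sq)   # 0.2*5 + 0.3*25 + 0.5*61 = 39.0
print(fro_sq)      # 1 + 4 + 9 + 16 + 25 + 36 = 91.0
```

Note that for the uniform weights $r_i = 1/n$, the $r$-norm squared is simply the Frobenius norm squared divided by $n$.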
2 Problem Formulation
In this section, we first formulate the problem of interest. Then, we present the underlying assumptions on the information exchange, the network connectivity, the properties of the cost functions, and the time-scale sequences.
2.1 Problem Statement
Consider a set of $n \geq 2$ agents that are connected through a time-varying network. Each agent $i \in [n]$ has access to a local cost function $f_i : \mathbb{R}^d \to \mathbb{R}$. The goal of this work is to solve the optimization problem given by
\[
\min_{x_1, x_2, \ldots, x_n \in \mathbb{R}^d} \; \sum_{i=1}^{n} r_i f_i(x_i) \quad \text{subject to} \quad x_1 = \cdots = x_n,
\]
where $r = (r_1, r_2, \ldots, r_n)^T$ is a stochastic vector, i.e., $r_i \geq 0$ and $\sum_{i=1}^{n} r_i = 1$.
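As a toy instance of this formulation, one may take quadratic local costs and a uniform weight vector; the centers and functions below are illustrative only and not part of the paper's setup.

```python
import numpy as np

n, d = 3, 2
r = np.full(n, 1.0 / n)                  # uniform stochastic weight vector

# hypothetical local costs f_i(x) = 0.5 * ||x - centers[i]||^2
centers = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])

def objective(x):
    # sum_i r_i * f_i(x) evaluated at a consensus point x_1 = ... = x_n = x
    return sum(r[i] * 0.5 * np.sum((x - centers[i]) ** 2) for i in range(n))

# with uniform weights, the minimizer of this consensus problem is the centers' mean
x_star = centers.mean(axis=0)
print(x_star)              # [1. 1.]
print(objective(x_star))   # (1 + 1 + 2) / 3 = 4/3
```

The consensus constraint $x_1 = \cdots = x_n$ is what makes the problem distributed: each agent only evaluates its own $f_i$, yet all agents must agree on a common minimizer of the weighted sum.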
At each iteration $t \geq 1$, we represent the topology of the network connecting the agents by a directed graph $\mathcal{G}(t) = ([n], \mathcal{E}(t))$, where the vertex set $[n]$ represents the agents, and the edge set $\mathcal{E}(t) \subseteq [n] \times [n]$ represents the connectivity pattern of the agents at time $t$, i.e., the edge $(i, j) \in \mathcal{E}(t)$ denotes a directed edge from agent $i$ to agent $j$. We assume that at each time $t \geq 1$, each agent $i \in [n]$ can only send a message to its