where $L_j$ is the local objective function of the $j$th worker. The problem in Eq. (1) arises in numerous areas, such as distributed signal processing [19] and multi-agent optimization [36]. However, this formulation does not account for data heterogeneity [57, 40, 39, 30] among workers (i.e., the data distributions of different workers can differ substantially from one another [44]). Indeed, it has been shown that traditional federated approaches built for independent and identically distributed (IID) data, such as FedAvg [33], may perform poorly when applied to non-IID data [27]. This issue can be mitigated by learning a robust model that aims to achieve uniformly good performance over all workers, obtained by solving the following distributionally robust optimization (DRO) problem in a distributed manner:
$$\min_{w \in \mathcal{W}} \; \max_{p \in \Omega \subseteq \Delta_N} \; F(w, p) := \sum_j p_j f_j(w), \qquad (3)$$
where $p = [p_1, \cdots, p_N] \in \mathbb{R}^N$ is the adversarial distribution over the $N$ workers; the $j$th entry of this vector, $p_j$, represents the adversarial distribution value for the $j$th worker. $\Delta_N = \{p \in \mathbb{R}^N_+ : \mathbf{1}^\top p = 1\}$ is the probability simplex and $\Omega$ is a subset of $\Delta_N$.
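To make Eq. (3) concrete, the following minimal sketch (ours, not code from the paper) evaluates $F(w, p)$ and takes one projected gradient ascent step on $p$; the loss oracles `local_losses` and step size `eta` are hypothetical, and $\Omega = \Delta_N$ is assumed:

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection onto the simplex {p >= 0, 1^T p = 1}
    via the standard sort-based method."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(y) + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(y - tau, 0.0)

def dro_objective(w, p, local_losses):
    """F(w, p) = sum_j p_j * f_j(w); local_losses[j](w) evaluates f_j(w)."""
    return sum(p_j * f_j(w) for p_j, f_j in zip(p, local_losses))

def ascent_step_p(w, p, local_losses, eta=0.1):
    """The gradient of F w.r.t. p is the loss vector [f_1(w), ..., f_N(w)];
    step in that direction, then project back onto the simplex."""
    losses = np.array([f_j(w) for f_j in local_losses])
    return project_simplex(p + eta * losses)
```

With a smaller $\Omega \subset \Delta_N$, `project_simplex` would be replaced by projection onto $\Omega$, which is exactly where the choice of uncertainty set enters.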
Agnostic federated learning (AFL) [35] first introduced the distributionally robust (agnostic) loss in federated learning and provides convergence rates for (strongly) convex functions. However, AFL does not discuss the choice of $\Omega$. DRFA-Prox [16] considers $\Omega = \Delta_N$ and imposes a regularizer on the adversarial distribution to leverage the prior distribution. Nevertheless,
three key challenges have not yet been addressed by prior works. First, is it possible to construct an uncertainty framework that can not only flexibly maintain the trade-off between model robustness and performance but also effectively leverage the prior distribution? Second, how can one design asynchronous algorithms with guaranteed convergence? Compared with synchronous algorithms, the master in an asynchronous algorithm can update its parameters after receiving updates from only a small subset of workers [58, 10]. Asynchronous algorithms are particularly desirable in practice since they can relax strict data dependencies and ensure convergence even in the presence of device failures [58]. Third, is it possible to flexibly adjust the degree of robustness? Moreover, it is necessary to provide convergence guarantees when the objectives (i.e., $f_j(w_j), \forall j$) are non-convex.
To this end, we propose ASPIRE-EASE to effectively address the aforementioned challenges. First, different from existing works, the prior distribution is incorporated as a constraint in our formulation, which not only leverages the prior distribution more effectively but also guarantees feasibility for any adversarial distribution within the uncertainty set. The prior distribution, which can be obtained from side information or set to the uniform distribution [41], is necessary for constructing the uncertainty (ambiguity) set and obtaining a more robust model [16]. Specifically, we formulate the prior distribution informed distributionally robust optimization (PD-DRO) problem as:
$$\begin{aligned}
\min_{z \in \mathcal{Z},\, \{w_j \in \mathcal{W}\}} \; \max_{p \in \mathcal{P}} \quad & \sum_j p_j f_j(w_j) \qquad (4)\\
\text{s.t.} \quad & z = w_j, \quad j = 1, \cdots, N,\\
\text{var.} \quad & z, w_1, w_2, \cdots, w_N,
\end{aligned}$$
where $z \in \mathbb{R}^p$ is the global consensus variable, $w_j \in \mathbb{R}^p$ is the local variable (local model parameter) of the $j$th worker, and $\mathcal{Z} \subseteq \mathbb{R}^p$ is a nonempty closed convex set. $\mathcal{P} \subseteq \mathbb{R}^N_+$ is the uncertainty (ambiguity) set of the adversarial distribution $p$, which is constructed based on the prior distribution.
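As one illustration of a prior-informed uncertainty set (a sketch of ours, not the paper's CD-norm construction; the per-worker bounds `lower`/`upper` around the prior are hypothetical), one can take $\mathcal{P} = \{p \in \Delta_N : l_j \le p_j \le u_j\}$ and project onto it by bisection:

```python
import numpy as np

def project_capped_simplex(y, lower, upper, tol=1e-10):
    """Euclidean projection of y onto {p : lower <= p <= upper, sum(p) = 1}.
    The projection is clip(y - tau, lower, upper) for the tau that makes the
    coordinates sum to one; the sum is nonincreasing in tau, so bisect."""
    assert lower.sum() <= 1.0 <= upper.sum(), "set must be nonempty"
    lo, hi = np.min(y - upper), np.max(y - lower)  # bracket for tau
    while hi - lo > tol:
        tau = 0.5 * (lo + hi)
        if np.clip(y - tau, lower, upper).sum() > 1.0:
            lo = tau   # sum too large -> increase the shift tau
        else:
            hi = tau
    return np.clip(y - 0.5 * (lo + hi), lower, upper)

# Example: set centered on a uniform prior with +/- 50% slack per worker.
N = 4
prior = np.full(N, 1.0 / N)
p = project_capped_simplex(prior + 0.3 * np.random.randn(N),
                           lower=0.5 * prior, upper=1.5 * prior)
```

Tightening the bounds toward the prior trades robustness for average performance; loosening them recovers the full simplex.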
To solve the PD-DRO problem in an asynchronous distributed manner, we first propose Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE), which employs simple gradient projection steps for the updates of the primal and dual variables at every iteration and is thus computationally efficient. Next, the itErative Active SEt method (EASE) is employed to replace the traditional cutting plane method, improving computational efficiency and speeding up convergence. We further provide a convergence guarantee for the proposed algorithm. Furthermore, a new uncertainty set, the constrained D-norm (CD-norm), is proposed in this paper. Its advantages include: 1) it can flexibly control the degree of robustness; 2) the resulting subproblem is computationally simple; 3) it can effectively leverage the prior distribution and flexibly set the bounds for every $p_j$.
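The following schematic (ours; it compresses the description above and is not the exact ASPIRE-EASE update) illustrates a single-loop asynchronous round in which the master proceeds after hearing from only a subset of workers, taking one gradient projection step on the primal model and one on the dual distribution. The worker oracles `grad_fj`/`loss_fj` and the step sizes are assumptions, and `project_capped_simplex` is reused from the previous sketch:

```python
import numpy as np

def async_round(z, p, loss_est, ready_workers, grad_fj, loss_fj,
                lower, upper, lr_w=0.05, lr_p=0.05):
    """One schematic asynchronous round (not the exact ASPIRE-EASE update).
    The master proceeds as soon as `ready_workers` (a small subset of all N
    workers) have reported; stale workers' last loss estimates are reused."""
    grad_z = np.zeros_like(z)
    for j in ready_workers:
        grad_z += p[j] * grad_fj(j, z)   # primal gradient of sum_j p_j f_j
        loss_est[j] = loss_fj(j, z)      # refresh this worker's loss estimate
    z = z - lr_w * grad_z                # primal: gradient descent step
    p = project_capped_simplex(          # dual: gradient ascent + projection
        p + lr_p * loss_est, lower, upper)
    return z, p, loss_est
```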
Contributions. Our contributions can be summarized as follows:
1. We formulate a PD-DRO problem with a CD-norm uncertainty set. PD-DRO incorporates the prior distribution as constraints, which leverages the prior distribution more effectively and guarantees robustness. In addition, the CD-norm is developed to model the ambiguity set around the prior distribution, and it provides a flexible way to control the trade-off between model robustness and performance.
2. We develop a single-loop asynchronous algorithm, namely ASPIRE-EASE, to optimize PD-DRO in an asynchronous distributed manner. ASPIRE employs simple gradient projection steps to