
et al., 2017; Kairouz et al., 2021; Smith et al., 2018; Stich, 2018; Yu et al., 2019a;b; Khaled et al., 2020; Woodworth et al., 2020b;a; Karimireddy et al., 2020b; Haddadpour et al., 2019), which focus on the following empirical risk minimization (ERM) problem with the data set $\mathcal{S}$ distributed over different machines:
$$\min_{w\in\mathbb{R}^d} \; \frac{1}{|\mathcal{S}|}\sum_{z\in\mathcal{S}} \ell(w, z). \qquad (2)$$
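For reference (a standard fact about ERM, not a claim specific to this paper), if $\mathcal{S}$ is partitioned into local datasets $\mathcal{S}_1, \dots, \mathcal{S}_K$ held by $K$ machines, the ERM gradient decomposes into purely local gradients:
$$\nabla\Big(\frac{1}{|\mathcal{S}|}\sum_{z\in\mathcal{S}}\ell(w, z)\Big) \;=\; \sum_{k=1}^{K}\frac{|\mathcal{S}_k|}{|\mathcal{S}|}\cdot\frac{1}{|\mathcal{S}_k|}\sum_{z\in\mathcal{S}_k}\nabla\ell(w, z),$$
where each inner average is computable on machine $k$ from its own data alone. As discussed next, the DXO objective lacks exactly this property.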
The major differences between DXO and ERM are that (i) the ERM objective is decomposable over the training data, while the DXO objective is not; and (ii) the data-dependent losses in ERM are decoupled between different data points, whereas the data-dependent loss in DXO couples different training data points. These differences pose a significant challenge for DXO in the FL setting, where the training data are distributed across different machines and are prohibited from being moved to a central server. In particular, the gradient of the X-risk cannot be written as a sum of local gradients at individual machines that depend only on the local data of those machines. Instead, the gradient of DXO at each machine depends not only on the local data but also on the data in other machines. As a result, the design of communication-efficient FL algorithms for DXO is much more complicated than that for ERM. In addition, the presence of a non-linear function $f$ makes the algorithm design and analysis even more challenging than with a linear $f$. There are two levels of coupling in DXO with non-linear $f$: one level at the pairwise loss $\ell(h(w, z), h(w, z'))$ and another level at the non-linear risk $f(g(w, z, \mathcal{S}_2))$, which makes the estimation of the stochastic gradient more tricky.
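To make the cross-machine coupling concrete, the following minimal Python sketch computes the gradient of a pairwise surrogate risk; the squared surrogate loss, the linear predictor, and all names are illustrative assumptions, not the paper's formulation.

import numpy as np

# Toy illustration of why an X-risk couples data across machines.
# We use a pairwise surrogate ell(h(w,z), h(w,z')) = (h(w,z) - h(w,z') - 1)^2
# with a linear predictor h(w,z) = w @ z (both assumed for illustration).

def h(w, z):
    return w @ z

def pairwise_risk_grad(w, pos, neg):
    # Gradient of (1 / (|pos| * |neg|)) * sum over (z in pos, z' in neg) of
    # (h(w,z) - h(w,z') - 1)^2 with respect to w.
    grad = np.zeros_like(w)
    for z in pos:
        for zp in neg:
            diff = h(w, z) - h(w, zp) - 1.0
            grad += 2.0 * diff * (z - zp)   # every term mixes z and z'
    return grad / (len(pos) * len(neg))

# If the positives live on machine 1 and the negatives on machine 2, neither
# machine can form its share of the gradient from local data alone: each term
# needs a z from one machine and a z' from the other, unlike the ERM gradient.
pos_on_machine1 = np.random.randn(5, 3)
neg_on_machine2 = np.random.randn(7, 3)
print(pairwise_risk_grad(np.zeros(3), pos_on_machine1, neg_on_machine2))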
Although DXO can be solved by existing algorithms in a centralized learning setting (Hu et al., 2020; Wang & Yang, 2022), extending these algorithms to the FL setting is non-trivial. This differs from extending centralized algorithms for ERM problems to the FL setting. In the design and analysis of FL algorithms for ERM, the individual machines compute local gradients, update local models, and communicate periodically to average the models. The rationale of local FL algorithms for ERM is that, as long as the gap error between the local models and the averaged model is kept on par with the noise in the stochastic gradients by controlling the communication frequency, the convergence of local FL algorithms is not sacrificed and they enjoy the parallel speed-up of using multiple machines. However, this rationale is not sufficient for developing FL algorithms for DXO due to the challenges mentioned above.
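For concreteness, the following minimal Python sketch shows the local-update-then-average pattern described above; the quadratic toy loss and all function names are illustrative assumptions, not a specific FL implementation.

import numpy as np

def local_grad(w, z):
    # Toy decomposable loss: ell(w, z) = 0.5 * ||w - z||^2, so the gradient is w - z.
    return w - z

def local_sgd_erm(local_data, rounds=10, local_steps=5, lr=0.1, d=2):
    K = len(local_data)                       # number of machines
    w_global = np.zeros(d)
    for _ in range(rounds):
        local_models = []
        for k in range(K):                    # in FL this loop runs in parallel
            w_k = w_global.copy()             # start from the averaged model
            for _ in range(local_steps):      # local updates, no communication
                z = local_data[k][np.random.randint(len(local_data[k]))]
                w_k -= lr * local_grad(w_k, z)
            local_models.append(w_k)
        # Periodic communication: average the local models.
        w_global = np.mean(local_models, axis=0)
    return w_global

# Usage: two machines, each with its own local samples.
data = [np.random.randn(20, 2) + 1.0, np.random.randn(20, 2) - 1.0]
print(local_sgd_erm(data))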
To address these challenges, we propose two novel FL algorithms named FeDXL1 and FeDXL2 for DXO with linear and non-linear $f$, respectively. The main innovation in the algorithm design lies in an active-passive decomposition framework that decouples the gradient of the objective into two types of components, active parts and passive parts. The active parts depend on the data in local machines, and the passive parts depend on the data in other machines. We estimate the active parts using the local data and the local model, and we estimate the passive parts using information with delayed communications from other machines, computed at historical models in the previous round. In terms of analysis, the challenge is that the model used in the computation of the stochastic gradient estimator depends on the (historical) samples used for computing the passive parts at the current iteration, which is only exacerbated in the presence of a non-linear function $f$. We develop a novel analysis that allows us to transfer the error of the gradient estimator into the latency error of the passive parts and the gap error between the local models and the global model. Hence, the rationale is that as long as the latency error of the passive parts and the gap error between the local models and the global model are on par with the noise in the stochastic gradient estimator, we are able to achieve convergence and linear speed-up.
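The following minimal Python sketch illustrates the active-passive idea for the pairwise case with a linear $f$; it is an illustration under assumed names and a toy squared surrogate loss, not the authors' implementation.

import numpy as np

# Active-passive sketch for ell(h(w,z), h(w,z')) = (h(w,z) - h(w,z') - 1)^2
# with a linear predictor h(w,z) = w @ z (both assumed for illustration).

def local_gradient_step(w_local, local_pos, remote_neg_preds, lr=0.1):
    # w_local          : this machine's current local model
    # local_pos        : local samples, shape (n, d)
    # remote_neg_preds : cached predictions h(w_old, z') received from other
    #                    machines at the last communication round (passive parts)
    grad = np.zeros_like(w_local)
    for z in local_pos:
        s_active = w_local @ z                  # active part: local data, local model
        for s_passive in remote_neg_preds:      # passive part: delayed remote predictions
            grad += 2.0 * (s_active - s_passive - 1.0) * z
    grad /= len(local_pos) * len(remote_neg_preds)
    return w_local - lr * grad

# The machine holding the other half of each pair symmetrically computes its
# portion of the gradient using cached predictions of this machine's data.
w_local = np.zeros(3)
local_pos = np.random.randn(5, 3)
remote_neg_preds = np.random.randn(7)           # h(w_old, z') values from last round
print(local_gradient_step(w_local, local_pos, remote_neg_preds))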
The main contributions of this work are as follows:
• We propose two novel communication-efficient algorithms, FeDXL1 and FeDXL2, for DXO with linear and nonlinear $f$, respectively, based on federated averaging and merging. Besides communicating local models for federated averaging, the proposed algorithms only need to communicate local prediction outputs periodically for federated merging to enable the computation of the passive parts. The diagram of the proposed FeDXL algorithms is shown in Figure 1.
• We perform a novel technical analysis to prove the convergence of both algorithms. We show that both algorithms enjoy parallel speed-up in terms of the iteration complexity, and a lower-order communication complexity.
• We conduct empirical studies on two tasks, federated deep partial AUC optimization with a compositional loss and federated deep AUC optimization with a pairwise loss, and demonstrate the advantages of the proposed algorithms over several baselines.
2. Related Work
FL for ERM. The central challenge of FL is how to utilize the distributed data to learn an ML model with a light communication cost without harming data privacy (Konečný et al., 2016; McMahan et al., 2017). To reduce the communication cost, many algorithms have been proposed to skip communications (Stich, 2018; Yu et al., 2019a;b; Yang, 2013; Karimireddy et al., 2020b) or to compress the communicated statistics (Stich et al., 2018; Basu et al., 2019; Jiang & Agrawal, 2018; Wangni et al., 2018; Bernstein et al., 2018). Tight analyses have been performed in various studies (Kairouz et al., 2021; Yu et al., 2019a;b; Khaled et al.,