
FedMT: Federated Learning with Mixed-type Labels
Qiong Zhang
Institute of Statistics and Big Data
Renmin University of China
Jing Peng∗
Institute of Statistics and Big Data
Renmin University of China
Xin Zhang∗
Meta AI
Aline Talhouk
Department of Obstetrics & Gynecology
The University of British Columbia
Gang Niu
RIKEN
Xiaoxiao Li†
Department of Electrical and Computer Engineering
The University of British Columbia
February 16, 2024
Abstract
In federated learning (FL), classifiers (e.g., deep networks) are trained on datasets from multiple data centers without exchanging data across them, which improves sample efficiency. However, the conventional FL setting assumes the same labeling criterion in all participating data centers, limiting its practical utility. This limitation becomes particularly notable in domains such as disease diagnosis, where different clinical centers may adhere to different standards, making traditional FL methods unsuitable. This paper addresses this important yet under-explored FL setting, namely FL with mixed-type labels, where allowing different labeling criteria introduces differences in label spaces across centers. To address this challenge effectively and efficiently, we introduce a model-agnostic approach called FedMT, which estimates label space correspondences and projects classification scores to construct loss functions. The proposed FedMT is versatile and integrates seamlessly with various FL methods, such as FedAvg. Experimental results on benchmark and medical datasets highlight the substantial improvement in classification accuracy achieved by FedMT in the presence of mixed-type labels.
1 Introduction
Federated learning (FL) allows centers to collaboratively train a model while maintaining data locally,
avoiding the data centralization constraints imposed by regulations such as the California Consumer Privacy
Act (Legislature, 2018), the Health Insurance Portability and Accountability Act (Act, 1996), and the General Data
Protection Regulation (Voigt et al., 2018). This approach has gained popularity across various applications.
Well-established FL methods, such as FedAvg (McMahan et al., 2017), employ iterative optimization algorithms
for joint model training across centers. In each round, individual centers perform stochastic gradient descent
(SGD) for several steps before communicating their current model weights to a central server for aggregation.
In the conventional FL classification framework, a classifier is trained jointly under the assumption that all participating centers use the same labeling criterion. However, in real applications such as healthcare, the standards for disease diagnosis may differ across clinical centers due to varying levels of expertise or available technology. Different centers may adhere to distinct diagnostic and statistical manuals (Epstein and Loren, 2013; McKeown et al., 2015), making it difficult to enforce a unified labeling criterion or perform relabeling, especially when labels are based on studies that cannot be replicated. Consequently, the label spaces become disparate across centers. Furthermore, the center with the most intricate labeling criterion, which is crucial for future predictions and thus referred to as the desired label space, typically has a limited number of samples due to labeling complexity or associated costs. We consider this practical but underexplored scenario as learning from mixed-type labels and aim to answer the following important question:
∗Contributed equally.
†Corresponding to: Xiaoxiao Li (xiaoxiao.li@ece.ubc.ca).
In the context of labeling criteria varying across centers, how can we effectively incorporate the
commonly used FL pipeline (e.g., FedAvg) to jointly learn an FL model in the desired label space?
[Figure 1 appears here. Panel (a): "Different label spaces", depicting class overlap between the desired label space and the other label space, with an example correspondence row $(0.6, 0, 0.4)$. Panel (b): "Our proposed FedMT", depicting FedMT (T) with a known correspondence matrix $M$ and the FedMT (E) plug-in, which estimates $\widehat{M}$ by counting and normalizing pseudo-label/true-label co-occurrences, using the same backbone for client and server updates within FedAvg.]
Figure 1: Illustration of the problem setting and our proposed FedMT method. (a) We consider different label spaces (i.e., the desired label space $\mathcal{Y}$ with $K$ classes and the other space $\widetilde{\mathcal{Y}}$ with $J$ classes) where classes may overlap, such as $\widetilde{Y}_1$ and $Y_2$. Annotation within the desired label space is typically more challenging and resource-intensive, resulting in a scarcity of labeled samples. (b) We employ a fixed label space correspondence matrix $M$ to establish associations between label spaces, effectively linking $\widetilde{\mathcal{Y}}$ and $\mathcal{Y}$. Our method, denoted as FedMT (T), locally corrects class scores $f$ using $M$ within the FedAvg framework. In instances where the correspondence matrix $M$ is unknown, we propose a pseudo-label based method to estimate $\widehat{M}$. Subsequently, FedMT (E) incorporates $\widehat{M}$ into the loss function
to correct class scores.
Problem setting: To address this question, we consider a simplified but generalizable setting, illustrated in Fig. 1, in which two types of labeling criteria exist in FL. The label spaces are not necessarily nested; it is possible for a class in one label space to overlap with multiple classes in another space (e.g., disease diagnoses often exhibit imperfect agreement). Additionally, drawing inspiration from a healthcare scenario, we assume limited availability of labeled data (<5%) within the desired label space. One supercenter adheres to the complex labeling criterion of the desired label space, while the others use a distinct, simpler labeling criterion. The supercenter serves as the coordinating server for FL but also performs local model updates like the other clients. All centers jointly train an FL model following the standard FL training protocol, as shown in Fig. 1 (b).
Under the problem setting described above, alternative approaches to handling different label spaces include personalized FL (Collins et al., 2021). However, these methods often fail to exploit the inherent correspondence between different label spaces. Transfer learning (Yang et al., 2019), which involves pre-training a model in one space and fine-tuning it in other spaces, is another possible solution in FL, but suboptimal pre-training may lead to negative transfer (Chen et al., 2019). Other centralized strategies that require pooling all data features for similarity comparison through complex training strategies (Hu et al., 2022) increase privacy risk and are impractical for widely used FL methods like FedAvg. In light of the limitations of these methods, we aim to address two key challenges: a) simultaneously leveraging different types of labels and their correspondences without additional feature exchange, and b) learning the FL model end-to-end.
In this work, we introduce a plug-and-play method called FedMT. This versatile strategy seamlessly
integrates with various FL pipelines, such as FedAvg. Specifically, we use models with identical architectures,
with the output dimension matching the number of classes in the desired label space across all centers. To
utilize client data from other label spaces for supervision, we employ a probability projection to align the two
spaces by mapping class scores.
Contributions: Our contributions are multifaceted. First, we explore a novel and underexplored problem setting: FL under mixed-type labels. This is particularly significant in real-world applications, notably within the realm of medical care. Second, we propose FedMT, a computationally efficient and versatile FL method. Third, we provide a theoretical analysis of the generalization error of learning from mixed-type data with a projection matrix. Lastly, our approach shows better performance when predicting in the desired label space compared to other methods, as demonstrated by extensive experiments on benchmark datasets in different settings and on real medical data. Additionally, as a byproduct, we can also predict in the other label space, where we observe improved classification compared with other feasible baseline methods.
2 Related Work
Federated learning (FL) FL is an emerging learning paradigm in which distributed clients collaboratively train a model without sharing data. To aggregate model parameters, FedAvg (McMahan et al., 2017) is the most widely used approach in FL. FedAvg variants have been proposed to improve optimization (Reddi et al., 2020; Rothchild et al., 2020) and to handle non-IID data (Li et al., 2021, 2020; Karimireddy et al., 2020). Semi-supervised FL (Jeong et al., 2021; Bdair et al., 2021; Diao et al., 2022) considers scenarios where client samples are unlabeled. In contrast, our client data contains labeled samples, but from different label spaces. Additionally, FL with samples from a single class on each client (Yu et al., 2020) differs from our setting, as all samples there are labeled using the same criterion.
Learning with labels of varying granularity In the centralized setting, learning with labels of different granularity often takes the form of coarse-to-fine (C2F) label learning, which aims to learn fine-grained labels given a set of coarse-grained labels. Notably, approaches such as Chen et al. (2021a); Touvron et al. (2021); Taherkhani et al. (2019); Ristin et al. (2015) rely heavily on a known hierarchical relationship between coarse and fine labels. In contrast, our proposed method assumes neither a hierarchical structure between the two sets of labels nor knowledge of such a structure. Moreover, centralized C2F methods, including Touvron et al. (2021); Hu et al. (2022), typically require access to and communication of features under both types of labels (and consequently from different clients) to compute their losses. In our problem setting, each client possesses only one type of label. Extending these centralized C2F methods to the FL setting would incur additional communication costs and increase privacy risks.
3 Methods
In this section, we first present the mathematical formulation of our problem, then review the classical FL approach, and finally describe the proposed method.
3.1 Problem Formulation
We address a classification problem in which different labeling criteria are used at various data centers. For the sake of clarity, we consider two labeling criteria, denoted as $\mathcal{Y} = \{Y_k\}_{k=1}^{K}$ and $\widetilde{\mathcal{Y}} = \{\widetilde{Y}_j\}_{j=1}^{J}$ with $K > J$. In this scenario, let $(x, y, \widetilde{y}) \in \mathcal{X} \times \mathcal{Y} \times \widetilde{\mathcal{Y}}$, where both $y$ and $\widetilde{y}$ serve as labels for the feature $x$ under the two different labeling criteria. The key constraint is that only one of the two labels is observed at each data center.

Driven by applications in the medical field, a 'specialized center' is established, and its limited dataset originates from the desired label space. This center is referred to as the server, and its dataset is denoted as $\mathcal{D}_s = \{(x_i^s, y_i^s) : i \in [n_K]\}$, where $y_i^s \in \mathcal{Y}$. Furthermore, a total of $N$ labeled samples, characterized by the other criterion, are distributed across $C$ clients. For each $c \in [C]$, $S_c$ denotes the indices of the data in $\mathcal{D}_c = \{(x_i^c, \widetilde{y}_i^c) : i \in S_c\}$ on the $c$-th client. Here, $\widetilde{y}_i^c \in \widetilde{\mathcal{Y}}$ represents labels from the alternative label space. The corresponding unobserved label in the desired label space is denoted as $y_i^c$. Importantly, we assume disjoint datasets across all centers, i.e., $S_i \cap S_j = \emptyset$. Let $N_c = |S_c|$, so the total number of labeled client samples is $N = \sum_{c \in [C]} N_c$. Our objective is to train a global classifier in the desired label space, denoted as $f : \mathcal{X} \to \Delta^{K-1}$, using data from the different label spaces within the system.
3.2 Preliminary: Classical FL
We begin by examining the classical federated learning (FL) approach, specifically FedAvg (McMahan et al., 2017), wherein all centers adhere to the same labeling criterion. Consider the feature $x$ and label $y$ with joint density $p(x, y)$. A $K$-class classifier $f : \mathcal{X} \to \Delta^{K-1}$ models the class score as $p(y = k \mid x) = f_k(x)$, where $f_k$ denotes the $k$-th element of $f$. The label is predicted via $\widehat{y} = \arg\max_{k \in [K]} f_k(x)$.

In the classical FL setup, each client $c \in [C]$ possesses an independent and identically distributed (IID) labeled training set $\mathcal{D}_c = \{(x_i^c, y_i^c)\}_{i=1}^{N_c}$ of size $N_c$. The objective of classical FL is to enable the $C$ clients to jointly train a global classifier $f$ that generalizes well with respect to $p(x, y)$ without sharing their local data $\mathcal{D}_c$. With access to data in all centers, the overall risk is $R(f) = C^{-1} \sum_{c=1}^{C} \widehat{R}_c(f; \mathcal{D}_c)$, where
$$\widehat{R}_c(f; \mathcal{D}_c) = \frac{1}{N_c} \sum_{i=1}^{N_c} \ell_{\mathrm{CE}}\big(f(x_i^c), y_i^c\big), \tag{1}$$
the cross-entropy loss is $\ell_{\mathrm{CE}}(f(x), y) = -\sum_{k=1}^{K} \mathbb{1}(y = k) \log f_k(x)$, and $\mathbb{1}(\cdot)$ is the indicator function.

To minimize the overall loss $R(\cdot)$ without data sharing, FedAvg alternates between a few local stochastic gradient updates on each client using its own data and an averaging update on the server.
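As a concrete illustration, the following is a minimal sketch of one such FedAvg round in PyTorch-style Python. It is not the authors' implementation; the model, the data loaders, the uniform client weighting, and the hyperparameter names are illustrative assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def fedavg_round(global_model, client_loaders, local_steps=5, lr=0.01):
    """One FedAvg communication round: each client runs a few local SGD
    steps from the current global weights, then the server averages the
    resulting client weights element-wise (uniform weighting assumed)."""
    client_states = []
    for loader in client_loaders:
        local_model = copy.deepcopy(global_model)  # start from global weights
        opt = torch.optim.SGD(local_model.parameters(), lr=lr)
        for step, (x, y) in enumerate(loader):
            if step >= local_steps:
                break
            opt.zero_grad()
            F.cross_entropy(local_model(x), y).backward()
            opt.step()
        client_states.append(local_model.state_dict())
    # Server aggregation: average each weight tensor across clients.
    avg_state = copy.deepcopy(client_states[0])
    for key in avg_state:
        avg_state[key] = torch.stack(
            [s[key].float() for s in client_states]).mean(dim=0)
    global_model.load_state_dict(avg_state)
    return global_model
```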
However, when labeling criteria differ across data centers, the class scores across clients may not align. For instance, in a deep neural network, the number of neurons in the last layer may vary due to mismatches in the number of classes on different clients, which prevents the server from averaging the model weights. We therefore propose a novel approach to deal with different labeling criteria.
3.3 Proposed Method
For classification problems, the standard CE loss $\ell_{\mathrm{CE}}$ assumes that the dimension of the class scores matches the number of classes. In our scenario, we can construct the server CE loss $\widehat{R}_s(f; \mathcal{D}_s)$ as in (1) using the server data. However, the class scores satisfy $f(x^c) \in \Delta^{K-1}$, while $\widetilde{y}^c \in \widetilde{\mathcal{Y}}$ is a $J$-class label on the clients. This mismatch prevents the direct use of the conventional CE loss on clients, prompting the question of how to evaluate the risk on these clients. To leverage the communication efficiency of FedAvg, we employ identical backbones on both the server and the clients. Constructing loss functions on the clients thus becomes crucial for our classification problem with mixed-type labels. The core idea behind our proposed method is to align class scores and labels through projection. Further details of the method are provided below.
Client loss construction Each client sample has an unobserved label in the desired label space $\mathcal{Y}$. For a given image $x^c$, our objective is to find the $K$-class scores $\{P(y^c = Y_k \mid x^c)\}_{k=1}^{K}$. According to the law of total probability,
$$P(\widetilde{y}^c = \widetilde{Y}_j \mid x^c) = \sum_k P(\widetilde{y}^c = \widetilde{Y}_j \mid y^c = Y_k, x^c)\, P(y^c = Y_k \mid x^c) = \sum_k P(\widetilde{y}^c = \widetilde{Y}_j \mid y^c = Y_k)\, P(y^c = Y_k \mid x^c).$$
Here, the last equality is based on the assumption that the conditional probability is instance-independent. This equality allows the $J$-class scores to be expressed linearly in terms of the $K$-class scores. Let $M \in [0, 1]^{J \times K}$ with
$$M_{jk} = P(\widetilde{y}^c = \widetilde{Y}_j \mid y^c = Y_k). \tag{2}$$
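For intuition, here is a small hypothetical example (the numbers are illustrative, echoing the overlap sketched in Fig. 1): with $K = 3$ and $J = 2$, each column of $M$ sums to one because every desired-space class must receive some label in $\widetilde{\mathcal{Y}}$:
$$M = \begin{pmatrix} 1 & 0.6 & 0 \\ 0 & 0.4 & 1 \end{pmatrix},$$
so samples from $Y_2$ are labeled $\widetilde{Y}_1$ with probability 0.6 and $\widetilde{Y}_2$ with probability 0.4, while $Y_1$ and $Y_3$ map deterministically to $\widetilde{Y}_1$ and $\widetilde{Y}_2$, respectively.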
When $M$ is known, we naturally obtain the following local loss based on the projected class scores:
$$\widehat{R}^{p}_{c}(f; \mathcal{D}_c) = \frac{1}{N_c} \sum_{i=1}^{N_c} \ell_{\mathrm{CE}}\big(M f(x_i^c),\, \widetilde{y}_i^c\big) \tag{3}$$
for any $c \in [C]$. As each local loss involves only samples from the corresponding client and does not require coordination with other clients, we can jointly optimize across all data centers using general FL strategies, such as FedAvg (McMahan et al., 2017), which we use, or other variants such as Li et al. (2020), Li et al. (2021), and Karimireddy et al. (2020).
Moreover, it is worth emphasizing that our proposed method readily extends to scenarios where label spaces differ across clients. This extension can be achieved by integrating a client-specific correspondence matrix $M_c$ into the projection-based local loss (3).
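To make the projection concrete, below is a minimal sketch of the client loss in (3) in PyTorch-style Python, assuming the backbone outputs raw $K$-class logits that a softmax turns into probabilities; the function and argument names are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F

def projected_ce_loss(logits, coarse_labels, M):
    """Client loss of Eq. (3): project the K-class scores f(x) into the
    J-class space via the correspondence matrix M, then apply
    cross-entropy against the observed coarse labels.

    logits:        (batch, K) raw scores from the shared backbone
    coarse_labels: (batch,)  integer labels in the other label space
    M:             (J, K)    correspondence matrix; each column sums to 1
    """
    fine_probs = torch.softmax(logits, dim=1)              # f(x) in the simplex
    coarse_probs = fine_probs @ M.t()                      # rows are M f(x), shape (batch, J)
    log_probs = torch.log(coarse_probs.clamp_min(1e-12))   # clamp for numerical safety
    return F.nll_loss(log_probs, coarse_labels)
```

Note that when the columns of $M$ are valid conditional distributions, $M f(x)$ already lies in the $J$-simplex, so only a log and a negative-log-likelihood lookup are needed.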
3.4 Estimation of the correspondence matrix
The correspondence matrix $M$ plays a crucial role in our proposed method. While the method description above assumes knowledge of the projection matrix, practical application often involves direct computation of $M$ using domain knowledge. However, such domain knowledge is not available in some cases. To address this, we introduce an effective procedure for estimating $M$ from the data.
Algorithm 1 FL using FedMT (Ours)
Server Input: server model $f_s$; small server dataset $\mathcal{D}_s = \{(x^s, y^s)\}$ where $y^s \in \mathcal{Y}$
Client Input: aggregation step-size $\eta_{\mathrm{agg}}$; global communication rounds $R$; regularization parameters $\lambda_1$, $\lambda_2$; client datasets $\mathcal{D}_c = \{(x^c, \widetilde{y}^c)\}$
1: For $r = 1, \ldots, R$ rounds, run A on the server and B & C on each client iteratively.
2: procedure A. ServerModelUpdate($r$)
3:   $f_s \leftarrow f$  ▷ receive the updated model from proc. C
4:   for $\tau = 1, \ldots, t$ do
5:     $f_s \leftarrow f_s - \eta_{\mathrm{sgd}} \cdot \nabla \widehat{R}_s(f_s; \mathcal{D}_s)$
6:   send $f_s$ to proc. B
7: procedure B. ClientModelUpdate($r$)
8:   $f_c \leftarrow f_s$  ▷ receive the updated model from proc. A
9:   Generate pseudo-labels $\widehat{y}_i^c = \arg\max_k f_k^c(x_i^c)$
10:  Construct the high-confidence subset $S_c^{\mathrm{fix}}(\xi) = \{i \in S_c : \max_k f_k^c(x_i^c) > \xi\}$
11:  if $S_c^{\mathrm{fix}}(\xi) = \emptyset$ then
12:    stop and return
13:  else
14:    Construct an equal-size mixup subset $S_c^{\mathrm{mix}}(\xi)$ by sampling $|S_c^{\mathrm{fix}}(\xi)|$ indices with replacement from $S_c$
15:    for $\tau = 1, \ldots, t$ do
16:      for batches $(x_b^{\mathrm{fix}}, \widehat{y}_b^{\mathrm{fix}})$ and $(x_b^{\mathrm{mix}}, \widehat{y}_b^{\mathrm{mix}})$ with $b \in S_c^{\mathrm{fix}}, S_c^{\mathrm{mix}}$ do
17:        $\lambda_{\mathrm{mix}} \sim \mathrm{Beta}(a, a)$
18:        $x^{\mathrm{mix}} \leftarrow \lambda_{\mathrm{mix}} x_b^{\mathrm{fix}} + (1 - \lambda_{\mathrm{mix}}) x_b^{\mathrm{mix}}$
19:        $L_{\mathrm{fix}} \leftarrow \ell(f(\mathcal{A}(x_b^{\mathrm{fix}})), \widehat{y}_b^{\mathrm{fix}})$
20:        $L_{\mathrm{mix}} \leftarrow \lambda_{\mathrm{mix}}\, \ell(f(x^{\mathrm{mix}}), \widehat{y}_b^{\mathrm{fix}}) + (1 - \lambda_{\mathrm{mix}})\, \ell(f(x^{\mathrm{mix}}), \widehat{y}_b^{\mathrm{mix}})$
21:        if $M$ is not known then
22:          estimate $\widehat{M}$ via (4)
23:        Construct the local loss $\widehat{R}_c^{\mathrm{reg}}(f_c; \mathcal{D}_c)$ via (3.4)
24:        $f_c \leftarrow f_c - \eta_{\mathrm{sgd}} \nabla \widehat{R}_c^{\mathrm{reg}}(f_c; \mathcal{D}_c)$
25:  send $f_c$ to proc. C, for each $c \in [C]$
26: procedure C. ModelAgg($r$)
27:  receive the model updates from proc. B
28:  $f \leftarrow f + \eta_{\mathrm{agg}} \cdot \sum_{c \in [C]} (f_c - f)$
29:  broadcast $f$ to proc. A
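Lines 17–20 of Algorithm 1 perform a mixup-style regularization on the pseudo-labeled client data. Below is a minimal sketch of that step in PyTorch-style Python, treating the augmentation $\mathcal{A}$ and the per-sample loss $\ell$ as placeholders (here, identity augmentation and cross-entropy); none of the names come from the authors' code.

```python
import torch
import torch.nn.functional as F

def mixup_losses(model, x_fix, y_fix, x_mix, y_mix, a=1.0, augment=lambda x: x):
    """Sketch of Algorithm 1, lines 17-20: blend a high-confidence batch
    with a batch resampled from the client data, and form the two
    pseudo-label losses L_fix and L_mix."""
    lam = torch.distributions.Beta(a, a).sample().item()  # lambda_mix ~ Beta(a, a)
    x_mixed = lam * x_fix + (1.0 - lam) * x_mix           # convex input combination
    loss_fix = F.cross_entropy(model(augment(x_fix)), y_fix)
    loss_mix = (lam * F.cross_entropy(model(x_mixed), y_fix)
                + (1.0 - lam) * F.cross_entropy(model(x_mixed), y_mix))
    return loss_fix, loss_mix
```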
Recalling the definition of the correspondence matrix in (2), when both sets of labels are known for each sample, a practical estimate for $M_{jk}$ is its empirical version
$$\widetilde{M}_{jk} = \frac{\sum_{i \in S_c} \mathbb{1}(\widetilde{y}_i^c = \widetilde{Y}_j,\, y_i^c = Y_k)}{\sum_{i \in S_c} \mathbb{1}(y_i^c = Y_k)}.$$
However, a challenge arises with this estimate because the true label $y_i^c$ in the desired label space remains unknown for client data. To overcome this, we use the aggregated model to predict $y_i^c$ and treat the predicted pseudo-label as the true label for estimation. To enhance the efficiency of the estimator, we estimate the correspondence matrix using only samples with high-confidence pseudo-labels.

In particular, let $f^r$ be the aggregated model in the $r$-th communication round, and let $\widehat{y}_i^c = \arg\max_k f_k^r(x_i^c)$ denote the pseudo-label for the $i$-th sample on the $c$-th client. Additionally, define
$$S_c^{\mathrm{fix}}(\xi) = \{i \in S_c : \max_k f_k^r(x_i^c) > \xi\}$$
as the set of high-confidence samples on the $c$-th client in the $r$-th communication round. The confidence threshold $0 < \xi < 1$ is a preselected hyperparameter shared by all clients. If, for a specific client $c$, we find that $S_c^{\mathrm{fix}}(\xi) = \emptyset$, then the process halts and the client refrains from transmitting updates to the server. Otherwise, the correspondence matrix on the $c$-th client in the $r$-th communication round is defined as
$$\widehat{M}_{jk}^{c} = \frac{\sum_{i \in S_c^{\mathrm{fix}}(\xi)} \mathbb{1}(\widetilde{y}_i^c = \widetilde{Y}_j,\, \widehat{y}_i^c = Y_k)}{\sum_{i \in S_c^{\mathrm{fix}}(\xi)} \mathbb{1}(\widehat{y}_i^c = Y_k)}. \tag{4}$$
The estimated correspondence matrix also applies to the scenario where label spaces differ across clients.
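As a sketch, the estimator in (4) can be computed on a client as follows, assuming the coarse labels, pseudo-labels, and confidences are NumPy integer/float arrays; the handling of classes with no high-confidence pseudo-labels (columns left at zero) is an assumption the excerpt does not specify.

```python
import numpy as np

def estimate_M(coarse_labels, pseudo_labels, confidences, xi, J, K):
    """Eq. (4): empirical correspondence matrix on one client, using
    only samples whose pseudo-label confidence exceeds xi."""
    keep = confidences > xi                     # high-confidence subset S_c^fix(xi)
    yt, yh = coarse_labels[keep], pseudo_labels[keep]
    M_hat = np.zeros((J, K))
    for j in range(J):
        for k in range(K):
            M_hat[j, k] = np.sum((yt == j) & (yh == k))   # co-occurrence counts
    col_counts = M_hat.sum(axis=0)              # pseudo-label count per class k
    nonzero = col_counts > 0
    M_hat[:, nonzero] /= col_counts[nonzero]    # column-normalize as in (4)
    return M_hat
```

In FedMT (E), this estimate would be refreshed each communication round from the freshly aggregated model's pseudo-labels, matching the per-round definition in (4).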