client alternately performs two steps in each round: 1) optimizing the personalized
offset that transforms the local data via distribution shifts, and 2) optimizing a local
model to fit the offset local data. After client-side optimization, the FL server
aggregates both the personalized offsets and the local models from all the clients
and sends the aggregated global model and offset back to each client. During
testing, each client adds its personalized offset to each testing input before using
the global model to predict its label.
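To make the per-round procedure concrete, the following is a minimal sketch of the client-side alternation and the test-time prediction, assuming a PyTorch-style setup. The function names, optimizers, and hyperparameters are illustrative and not taken from the paper; the model is assumed to accept both an input and an offset, anticipating the double-input-channel architecture described next.

```python
import torch

def client_round(model, offset, local_loader, lr_model=0.01, lr_offset=0.01):
    """Sketch of one client round: first optimize the personalized offset
    (model weights effectively frozen, since only the offset is in the
    optimizer), then optimize the local model on the offset local data."""
    loss_fn = torch.nn.CrossEntropyLoss()

    # Step 1: optimize the personalized offset; `offset` is assumed to be a
    # tensor with requires_grad=True.
    offset_opt = torch.optim.SGD([offset], lr=lr_offset)
    for x, y in local_loader:
        offset_opt.zero_grad()
        loss = loss_fn(model(x, offset), y)
        loss.backward()
        offset_opt.step()

    # Step 2: optimize the local model on the offset local data (offset fixed).
    model_opt = torch.optim.SGD(model.parameters(), lr=lr_model)
    for x, y in local_loader:
        model_opt.zero_grad()
        loss = loss_fn(model(x, offset.detach()), y)
        loss.backward()
        model_opt.step()

    # Both the local model and the personalized offset are sent to the server
    # for aggregation.
    return model.state_dict(), offset.detach()

@torch.no_grad()
def client_predict(model, offset, x):
    """Test time: the client applies its personalized offset to each testing
    input before using the aggregated global model to predict its label."""
    return model(x, offset).argmax(dim=1)
```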
DisTrans is designed with a special network architecture, called a double-
input-channel model, to accommodate client-side offsets. This double-input-
channel model has a backbone network shared by both channels, a dense layer
accepting outputs from the two channels in parallel, and a logits layer that merges
the channel-related outputs from the dense layer. This double architecture allows the
offset to be added to a (training or testing) input in one channel but subtracted
from the input in the other. Such addition and subtraction better preserve the
information in the original training and testing data, because the original data
can be recovered from the offset data in the two channels.
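The sketch below illustrates how such a double-input-channel model could be wired up, under the same PyTorch-style assumptions as above. The layer sizes and the use of concatenation to merge the two channel outputs are assumptions for illustration, not details confirmed by the text.

```python
import torch
import torch.nn as nn

class DoubleInputChannelModel(nn.Module):
    """Illustrative double-input-channel model: a backbone shared by both
    channels, a dense layer applied to each channel's output in parallel,
    and a logits layer that merges the channel-related outputs."""

    def __init__(self, backbone, feat_dim, hidden_dim, num_classes):
        super().__init__()
        self.backbone = backbone                     # shared by both channels
        self.dense = nn.Linear(feat_dim, hidden_dim)
        self.logits = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x, offset):
        # One channel sees the input plus the offset, the other the input
        # minus the offset, so the original input is recoverable from the pair.
        f_plus = self.dense(self.backbone(x + offset))
        f_minus = self.dense(self.backbone(x - offset))
        return self.logits(torch.cat([f_plus, f_minus], dim=1))
```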
We perform an extensive evaluation of DisTrans on five different image
datasets and compare it against state-of-the-art (SOTA) methods. Our evaluation
shows that DisTrans outperforms SOTA FL methods by 1%–10% in testing
accuracy across various distributional settings of the clients' local data. Moreover,
DisTrans achieves 1%–7% higher testing accuracy than other data transformation
/ augmentation approaches, i.e., mixup [51] and AdvProp [46]. The code for
DisTrans is available at https://github.com/hyhmia/DisTrans.
2 Related Work
Existing federated learning (FL) studies focus on improving accuracy [35,50,43,39],
convergence [12,6,37,17,45,32], communication cost [24,41,22,3,23,13,34,49],
security and privacy [36,10,5,4], or other aspects [16,20,11,47]. Our work focuses
on FL accuracy.
Personalized Federated Learning.
Prior studies [43,50,39] have attempted to address personalization, i.e., making
a model better fit a client's local training data. For instance, FedAwS [50]
investigates FL problems where each local model only has access to the positive
data associated with a single class, and imposes a geometric regularizer at the
server after each round to encourage classes to spread out in the embedding
space. pFedMe [43] formulates a new bi-level optimization problem and uses
Moreau envelopes to regularize each client's loss function and to decouple
personalized model optimization from global model learning. pFedHN [39]
utilizes a hypernetwork as the global model to generate weights for each local
model. MOON [29] uses contrastive learning to maximize the agreement between
the local and global models.
Data Transformation.
Data transformation applies label-preserving transformations to images and is
a standard technique to improve model accuracy in centralized learning. Most
of the recent data transformation meth-