Addressing Heterogeneity in Federated Learning
via Distributional Transformation
Haolin Yuan1, Bo Hui1, Yuchen Yang1, Philippe Burlina1,2,
Neil Zhenqiang Gong3, and Yinzhi Cao1
1Department of Computer Science, Johns Hopkins University
{hyuan4, bo.hui, yc.yang, yinzhi.cao}@jhu.edu
2Johns Hopkins University Applied Physics Laboratory (JHU/APL)
Philippe.Burlina@jhuapl.edu
3Duke University
neil.gong@duke.edu
Abstract. Federated learning (FL) allows multiple clients to collaboratively train a deep learning model. One major challenge of FL arises when the data distribution is heterogeneous, i.e., differs from one client to another. Existing personalized FL algorithms are only applicable to narrow cases, e.g., one or two data classes per client, and therefore they do not satisfactorily address FL under varying levels of data heterogeneity. In this paper, we propose a novel framework, called DisTrans, to improve FL performance (i.e., model accuracy) via train- and test-time distributional transformations along with a double-input-channel model structure. DisTrans works by optimizing distributional offsets and models for each FL client to shift their data distribution, and aggregates these offsets at the FL server to further improve performance in the presence of distributional heterogeneity. Our evaluation on multiple benchmark datasets shows that DisTrans outperforms state-of-the-art FL methods and data augmentation methods under various settings and different degrees of client distributional heterogeneity.
1 Introduction
Federated learning (FL) [35,30,18,48] is an emerging distributed machine learning (ML) framework that enables clients to learn models together with the help of a central server. In FL, each client learns a local model that is sent to the FL server for aggregation, and the FL server subsequently returns the aggregated model to the client. This process is repeated until convergence. One emerging and unsolved FL challenge is that the data distribution at each client can be heterogeneous. For example, in FL-based skin diagnostics, the skin disease distribution at each hospital/client can vary significantly. In another use case, smartphone face verification, the data distributions collected at each mobile device can vary from one client to another. Such distributional heterogeneity often leads to suboptimal accuracy of the final FL model.
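To make the basic round structure concrete, the following is a minimal sketch of weight-averaging aggregation in the spirit of FedAvg [35]; the `train_locally` client API, the unweighted mean, and all details are illustrative assumptions rather than any specific system's implementation.

```python
import copy
import torch

def fl_round(global_model, clients):
    """One hypothetical FL round: every client fine-tunes a copy of the global
    model on its local data, and the server averages the resulting parameters."""
    local_states = []
    for client in clients:
        local_model = copy.deepcopy(global_model)
        client.train_locally(local_model)  # assumed client-side training API
        local_states.append(local_model.state_dict())

    # Server-side aggregation: element-wise mean over all client parameters.
    avg_state = {
        key: torch.stack([s[key].float() for s in local_states])
                  .mean(dim=0).to(local_states[0][key].dtype)
        for key in local_states[0]
    }
    global_model.load_state_dict(avg_state)  # broadcast back for the next round
    return global_model
```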
The first three authors contributed equally to this paper.
[Figure 1 diagram: each client trains a local double-channel model on the two offset views (1-α)X + αt and (1+α)X - αt of its training data X with offset t, uploads the local model and local offset to the global server for model aggregation (backbone, dense, and logits layers) and offset aggregation, and downloads the global model and global offset.]
Fig. 1: The pipeline of DisTrans. Each client jointly optimizes its offset and model in the local training phase, then uploads both to the central server for aggregation. The aggregated model and offset are sent back to the clients for the next round.
There are two types of approaches to learn FL models under data heterogeneity: (i) improving FL's training process and (ii) improving clients' local data. Unfortunately, neither improves FL under varied levels of data heterogeneity. On one hand, existing FL methods [35,2,14], especially personalized FL methods [28,26], learn a model (or even multiple models) using customized loss functions or model architectures based on the heterogeneity level. However, existing personalized FL algorithms are designed for highly heterogeneous distributions. FedAwS [50] can only train FL models when each client's local data contains a single positive class. The performance of pFedMe [43] and pFedHN [39] degrades, yielding 5% to 18% lower accuracy than FedAvg [35], when the data distribution lies between fully heterogeneous and homogeneous.
On the other hand, traditional centralized machine learning also relies on data transformations, i.e., data augmentation [7,8,46,52,27,9,53,31], to improve a model's performance. Such transformations can be used to pre-process all the training data or to augment the existing training set. More recently, data transformations have also been used at test time [38,40,21,15,42] to improve learning models, e.g., their adversarial robustness [38]. However, it remains unclear whether and how data transformation can improve FL, particularly under different levels of client heterogeneity. The major challenge is how to tailor transformations to each client's distinct data distribution.
In this paper, we propose the first FL distributional transformation framework, called DisTrans, to address this heterogeneity challenge by altering local data distributions via a client-specific data shift applied to both training and test/inference data. Our distributional transformation alters each client's data distribution so that the distributions become less heterogeneous and the local models can thus be better aggregated at the server. Specifically, DisTrans performs a so-called joint optimization at each client to train the local model and generate an offset that is added to the local data. That is, a DisTrans client alternately performs two steps in each round: 1) optimizing the personalized offset to transform the local data via distribution shifts, and 2) optimizing a local model to fit its offset-transformed local data. After client-side optimization, the FL server aggregates both the personalized offsets and the local models from all the clients and sends the aggregated global model and offset back to each client. During testing, each client adds its personalized offset to each testing input before using the global model to predict its label.
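As a rough sketch of this alternating client-side procedure, the code below interleaves an offset step and a model step and averages offsets at the server; the optimizers, learning rates, the plain mean used for offset aggregation, and the `model(x, offset)` interface (a double-input-channel network of the kind sketched below) are illustrative assumptions, not DisTrans's exact algorithm.

```python
import torch
import torch.nn.functional as F

def local_round(model, offset, loader, lr=1e-3, offset_lr=1e-2):
    """One client round: alternately refine the personalized offset (step 1) and
    fit the local model to the offset-transformed data (step 2).
    `offset` is a tensor with requires_grad=True, shaped like one input image."""
    opt_model = torch.optim.SGD(model.parameters(), lr=lr)
    opt_offset = torch.optim.SGD([offset], lr=offset_lr)
    for x, y in loader:
        # Step 1: update the personalized offset for the current model.
        loss = F.cross_entropy(model(x, offset), y)
        opt_offset.zero_grad(); loss.backward(); opt_offset.step()
        # Step 2: update the local model on the offset-transformed data.
        loss = F.cross_entropy(model(x, offset), y)
        opt_model.zero_grad(); loss.backward(); opt_model.step()
    return model, offset

def aggregate_offsets(offsets):
    """Server side: combine client offsets (an unweighted mean as a stand-in)."""
    return torch.stack([o.detach() for o in offsets]).mean(dim=0)
```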
DisTrans is designed with a special network architecture, called a double-input-channel model, to accommodate client-side offsets. This double-input-channel model has a backbone network shared by both channels, a dense layer accepting outputs from the two channels in parallel, and a logits layer that merges the channel-wise outputs from the dense layer. This double architecture allows the offset to be added to a (training or testing) input in one channel but subtracted from the input in the other. Such addition and subtraction better preserve the information in the original training and testing data, because the original input can be recovered from the two offset channels.
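The following PyTorch sketch illustrates one way such a double-input-channel model could look; the backbone, the mixing coefficient alpha, and all layer sizes are assumptions for illustration, not the exact architecture used in the paper.

```python
import torch
import torch.nn as nn

class DoubleChannelNet(nn.Module):
    """Sketch of a double-input-channel model: a shared backbone processes an
    offset-added and an offset-subtracted view of the input, a dense layer handles
    both channel outputs in parallel, and a logits layer merges them."""
    def __init__(self, backbone, feat_dim, num_classes, alpha=0.5):
        super().__init__()
        self.backbone = backbone              # shared by both channels, outputs flat features
        self.dense = nn.Linear(feat_dim, 256)
        self.logits = nn.Linear(2 * 256, num_classes)
        self.alpha = alpha

    def forward(self, x, offset):
        # Offset added in one channel and subtracted in the other, following the
        # (1-alpha)X + alpha*t and (1+alpha)X - alpha*t views shown in Fig. 1.
        x_plus = (1 - self.alpha) * x + self.alpha * offset
        x_minus = (1 + self.alpha) * x - self.alpha * offset
        h_plus = self.dense(self.backbone(x_plus))
        h_minus = self.dense(self.backbone(x_minus))
        # Merge the two channels' features into class logits.
        return self.logits(torch.cat([h_plus, h_minus], dim=1))
```

Note that summing the two channel inputs gives back 2X, which is why the original input remains recoverable.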
We perform an extensive evaluation of DisTrans using five different image datasets and compare it against state-of-the-art (SOTA) methods. Our evaluation shows that DisTrans outperforms SOTA FL methods across various distributional settings of the clients' local data by 1%-10% in testing accuracy. Moreover, our evaluation shows that DisTrans achieves 1%-7% higher testing accuracy than other data transformation/augmentation approaches, i.e., mixup [51] and AdvProp [46]. The code for DisTrans is available at https://github.com/hyhmia/DisTrans.
2 Related Work
Existing federated learning (FL) studies focus on improving accuracy [35,50,43,39], convergence [12,6,37,17,45,32], communication cost [24,41,22,3,23,13,34,49], security and privacy [36,10,5,4], or other aspects [16,20,11,47]. Our work focuses on FL accuracy.
Personalized Federated Learning. Prior studies [43,50,39] have attempted to address personalization, i.e., to make a model better fit a client's local training data. For instance, FedAwS [50] investigates FL problems where each local model only has access to positive data from a single class, and imposes a geometric regularizer at the server after each round to encourage classes to spread out in the embedding space. pFedMe [43] formulates a new bi-level optimization problem and uses Moreau envelopes to regularize each client's loss function and to decouple personalized model optimization from global model learning. pFedHN [39] utilizes a hypernetwork as the global model to generate the weights of each local model. MOON [29] uses contrastive learning to maximize the agreement between the local and global models.
Data Transformation. Data transformation applies label-preserving transformations to images and is a standard technique to improve model accuracy in centralized learning.
Fig. 2: Training loss with respect to the weight w on two clients' local training data, with and without offsets: (a) y = cos(wx) on local clients; (b) y = wx on FL clients. Offsets make the training loss curves of the clients more consistent and help the FL model converge.
Most of the recent data transformation methods [7,8,46,52,27,9,53,31] focus on transforming datasets during the training phase. For instance, mixup [51] transforms the training data by mixing up the features and their corresponding labels, and AdvProp [46] transforms the training data by adding adversarial examples. Transforming data at test time [38,40,21,15,42] has also received increasing attention. Basic test-time transformations apply multiple data augmentations [15,42] to a single test image and average the resulting predictions. Pérez et al. [38] aim to enhance adversarial robustness via test-time transformation. In comparison, DisTrans is the first to use test-time transformation to improve federated learning accuracy under data heterogeneity.
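For reference, the core of the mixup transformation can be sketched in a few lines; the Beta(1,1) parameter and this exact interface follow the general recipe of [51] and are not tied to the authors' implementation.

```python
import numpy as np
import torch

def mixup_batch(x, y_onehot, beta=1.0):
    """Mix random pairs of examples and their one-hot labels with a Beta-sampled weight."""
    lam = np.random.beta(beta, beta)
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mixed, y_mixed
```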
3 Motivation
DisTrans's intuition is to transform each client's training and testing data with offsets to improve FL under heterogeneous data. That is, DisTrans transforms the client-side data distribution so that the learned local models are less heterogeneous and can be better aggregated. To better illustrate this intuition, we describe two simple learning problems as motivating examples. Specifically, we show that well-optimized and selected offsets can (i) align two learning problems at different FL clients and (ii) help the aggregated model converge.
Local Non-convex Learning Problems. We consider a non-convex learning problem, i.e., $f(x) = \cos(wx)$ with $w \in \mathbb{R}$, at two local clients with heterogeneous data. The local data $x, y \in \mathbb{R}$ is generated via $y = \cos(w^{\mathrm{true}}_{\mathrm{client}_k} x) + \epsilon_{\mathrm{client}_k}$, where $x$ is drawn i.i.d. from a Gaussian distribution and $\epsilon_{\mathrm{client}_k}$ is zero-mean Gaussian noise. The offsets take the form $px + q$, where $p$ is a fixed value shared by both clients and $q$ is chosen via brute-force search. Figure 2a shows the squared training loss with and without offsets. With offsets, the difference between the two clients' training losses is reduced, making the two clients consistent.
Linear Regression Problems with an Aggregation Server. We train two local linear models, i.e., $f(x) = wx$ with model parameter $w \in \mathbb{R}^2$, aggregate the parameters at a server following FL, and then repeat these two steps.
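As a rough illustration of the first toy example, here is a small NumPy sketch; the true frequencies, noise level, search grids, and the discrepancy measure used in the brute-force search over q are assumptions chosen to mirror the description rather than the paper's exact experiment. The second (linear-regression) example would follow the same pattern, with a few local gradient steps per client plus server-side averaging of $w$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def make_client(w_true, n=200, noise=0.1):
    """Heterogeneous local data: y = cos(w_true * x) + Gaussian noise."""
    x = rng.normal(size=n)
    y = np.cos(w_true * x) + rng.normal(scale=noise, size=n)
    return x, y

def loss_curve(x, y, w_grid, p=0.0, q=0.0):
    """Squared training loss over a grid of w, on data shifted by the offset p*x + q."""
    x_off = x + p * x + q
    return np.array([np.mean((np.cos(w * x_off) - y) ** 2) for w in w_grid])

clients = [make_client(2.0), make_client(3.0)]   # two clients with different w_true
w_grid = np.linspace(0.0, 15.0, 151)
p, q_grid = 0.2, np.linspace(-1.0, 1.0, 21)      # p fixed, q found by brute force

# Precompute each client's loss curve for every candidate q, then pick the pair of
# offsets whose curves agree best, i.e., make the two clients most consistent.
curves = [{q: loss_curve(x, y, w_grid, p, q) for q in q_grid} for x, y in clients]
q1, q2 = min(itertools.product(q_grid, q_grid),
             key=lambda qs: np.sum((curves[0][qs[0]] - curves[1][qs[1]]) ** 2))
```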