RETHINKING NORMALIZATION METHODS IN FEDERATED
LEARNING
Zhixu Du
Duke University
zhixu.du@duke.edu
Jingwei Sun
Duke University
jingwei.sun@duke.edu
Ang Li
Duke University
ang.li630@duke.edu
Pin-Yu Chen
IBM Research AI
pin-yu.chen@ibm.com
Jianyi Zhang
Duke University
jianyi.zhang@duke.edu
Hai "Helen" Li
Duke University
hai.li@duke.edu
Yiran Chen
Duke University
yiran.chen@duke.edu
ABSTRACT
Federated learning (FL) is a popular distributed learning framework that reduces privacy risks
by not explicitly sharing private data. In this work, we explicitly uncover the external covariate shift
problem in FL, which is caused by the independent local training processes on different devices. We
demonstrate that external covariate shift can obliterate the contributions of some devices to the global
model. Further, we show that normalization layers are indispensable in FL, since their inherent
properties alleviate this obliteration of device contributions. However, recent works have shown that
batch normalization, a standard component in many deep neural networks, incurs an accuracy drop of
the global model in FL, and the essential reason for this failure has been poorly studied. We unveil
that external covariate shift is the key reason why batch normalization is ineffective in FL. We also
show that layer normalization is a better choice in FL, as it mitigates external covariate shift and
improves the performance of the global model. We conduct experiments on CIFAR10 under non-IID
settings. The results demonstrate that models with layer normalization converge fastest and achieve
the best or comparable accuracy across three different model architectures.
Keywords: Federated Learning, Batch normalization, Layer normalization
1 Introduction
Federated learning (FL) McMahan et al. (2017); Tang et al. (2021) is a popular distributed learning approach that
enables a large number of devices to train a shared model in a federated fashion without explicitly sharing their local
data. In order to reduce communication cost, most FL methods enable participating devices to conduct multiple steps
of training before uploading their local models to the central server for aggregation. However, multiple steps of local
training on edge devices can cause internal covariate shift Ioffe and Szegedy (2015) on local models, a
known problem in the centralized (non-FL) setting. Internal covariate shift describes the phenomenon that, during
the training of deep neural networks (DNNs), each layer's input distribution varies due to parameter changes in the
preceding layers. This issue forces the internal neurons of a given layer to adapt to varying input distributions and
hence slows down the convergence of model training.
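For concreteness, the sketch below illustrates the multi-step local training and aggregation loop described above in PyTorch. It is a minimal sketch, not the exact training procedure of this paper: the model, data loaders, uniform averaging weights, and hyperparameters are placeholder assumptions, and each loader is assumed to yield at least `local_steps` batches.

```python
import copy
import torch
import torch.nn.functional as F

def fedavg_round(global_model, device_loaders, local_steps=5, lr=0.01):
    """One FedAvg-style communication round: every device trains its own copy
    of the global model for several local steps, then the server averages the
    resulting parameters (uniform weighting, for simplicity)."""
    local_states = []
    for loader in device_loaders:
        local_model = copy.deepcopy(global_model)
        optimizer = torch.optim.SGD(local_model.parameters(), lr=lr)
        data_iter = iter(loader)
        for _ in range(local_steps):          # multiple local steps per round
            x, y = next(data_iter)
            optimizer.zero_grad()
            loss = F.cross_entropy(local_model(x), y)
            loss.backward()
            optimizer.step()
        local_states.append(local_model.state_dict())

    # Server-side aggregation: average the floating-point parameters/buffers.
    avg_state = copy.deepcopy(local_states[0])
    for key in avg_state:
        if avg_state[key].dtype.is_floating_point:
            avg_state[key] = torch.stack(
                [state[key] for state in local_states]).mean(dim=0)
    global_model.load_state_dict(avg_state)
    return global_model
```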
Figure 1: Internal and external covariate shift.
Internal covariate shift has been well studied in centralized learning scenarios, and an effective approach to mitigate
it is batch normalization. Moreover, batch normalization has many desirable properties that stabilize the training
process, which have been exploited by previous work. In FL systems, participating devices perform several batches of local training in
each communication round; thus, internal covariate shift raises a concern for local training. In FL, the updates of
model parameters vary across devices during local training. Without any constraints, the internal covariate shift differs across
devices, leading to gaps in the statistics of the same channel among different devices. We
name this phenomenon, unique to FL, external covariate shift. Due to external covariate shift, the neurons of a
given channel on one device need to adapt to the feature distribution of the same channel on other devices, which slows
down the convergence of global model training. Furthermore, external covariate shift may lead to large discrepancies in
the norms of weights and may obliterate the contributions of devices whose weights have small norms.
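The per-channel statistics gap described above can be made concrete with a small diagnostic. The snippet below is only an illustrative sketch under assumed names (the layer name "conv1", the local models, and the loaders are hypothetical); it is not a measurement procedure taken from this paper.

```python
import torch

@torch.no_grad()
def channel_stats(model, loader, layer_name):
    """Collect the per-channel mean and std of a chosen layer's output on one
    device's local data by hooking that layer's forward pass."""
    feats = []
    layer = dict(model.named_modules())[layer_name]
    handle = layer.register_forward_hook(
        lambda module, inputs, output: feats.append(output.detach()))
    for x, _ in loader:
        model(x)
    handle.remove()
    features = torch.cat(feats)                    # shape (N, C, H, W)
    return features.mean(dim=(0, 2, 3)), features.std(dim=(0, 2, 3))

# After local training, the same channel can exhibit very different statistics
# on different devices -- the external covariate shift described above, e.g.:
#   mu_1, _ = channel_stats(local_model_1, loader_1, "conv1")
#   mu_2, _ = channel_stats(local_model_2, loader_2, "conv1")
#   print((mu_1 - mu_2).abs().mean())
```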
We show in this paper that the inherent properties of normalization shed light on solving external covariate shift.
However, existing works Li et al. (2021); Hsieh et al. (2020) show that batch normalization incurs an accuracy
drop of the global model in FL. These works simply attribute the failure of batch normalization in FL to the discrepancies
of local data distributions across devices. In this work, our key observation is that the ineffectiveness of batch
normalization in FL is not only caused by data distribution discrepancies, but also results from the divergence of internal
covariate shift among different devices due to the stochastic training process. Batch normalization drops the accuracy
of the global model when applied to address external covariate shift because the feature distribution of the aggregated
global model is not predictable. Further, we show that layer normalization does not suffer from this problem and can
serve as a replacement for batch normalization in FL.
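A minimal way to act on this observation is to swap batch normalization layers for a per-sample normalization. The sketch below uses GroupNorm with a single group, which normalizes over the channel and spatial dimensions of convolutional features (i.e., layer normalization for feature maps); the small block shown is a placeholder architecture, not one of the three models evaluated in this paper.

```python
import torch.nn as nn

def bn_to_ln(module):
    """Recursively replace every BatchNorm2d with GroupNorm(num_groups=1),
    i.e. layer normalization over the (C, H, W) dimensions of a feature map.
    Unlike batch normalization, it keeps no running batch statistics that
    would have to be reconciled across devices during aggregation."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name,
                    nn.GroupNorm(num_groups=1, num_channels=child.num_features))
        else:
            bn_to_ln(child)
    return module

# Example: a small convolutional block before and after the swap.
block = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, padding=1),
                      nn.BatchNorm2d(16),
                      nn.ReLU())
block = bn_to_ln(block)
```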
The experimental results demonstrate that layer normalization effectively mitigates external covariate shift and
speeds up the convergence of global model training. In particular, layer normalization achieves the fastest convergence
and the best or comparable accuracy upon convergence across three different model architectures.
Our key contributions are summarized as follows:
• To the best of our knowledge, this is the first work to explicitly reveal external covariate shift in FL, an
important issue that affects the convergence of FL training.
• We propose a simple yet effective replacement for batch normalization in federated learning, namely layer
normalization, which effectively mitigates external covariate shift and speeds up the convergence of FL
training.
2 Preliminaries
2.1 Internal covariate shift and Activation normalization
In the training of deep neural networks, each layer's input distribution keeps changing due to updates of the parameters in
the preceding layers. Consequently, layers are forced to keep adapting to varying input distributions, leading to slow