
Figure 1: Internal and external covariate shift (centralized training with BatchNorm vs. local training and aggregation across Device 1, Device 2, ..., Device N).
Internal covariate shift has been well studied in centralized learning scenarios, and an effective approach to mitigating it is batch normalization. Furthermore, batch normalization has many desirable properties that stabilize the training process, which have been exploited by previous work. In FL systems, participating devices perform several batches of local training in each communication round; thus, internal covariate shift is also a concern for local training. In FL, the updates of model parameters vary across devices during local training. Without any constraints, the internal covariate shift therefore also varies across devices, leading to gaps in the statistics of the same channel among different devices. We name this phenomenon, which is unique to FL, external covariate shift. Due to external covariate shift, the model neurons of a given channel on one device need to adapt to the feature distribution of the same channel on other devices, which slows down the convergence of global model training. Furthermore, external covariate shift may lead to large discrepancies in the norms of weights and may obliterate the contributions from devices whose weights have small norms.
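As a minimal illustration of this gap (a sketch with purely hypothetical distributions and device counts, not taken from our experiments), the following NumPy snippet simulates how the per-channel batch statistics used by batch normalization can drift apart across two devices with different local data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: two devices observe activations of the *same*
# channel, but non-IID local data and independent local updates push their
# feature distributions apart.
device_1_acts = rng.normal(loc=0.0, scale=1.0, size=256)   # channel c on device 1
device_2_acts = rng.normal(loc=1.5, scale=2.0, size=256)   # same channel c on device 2

mu_1, var_1 = device_1_acts.mean(), device_1_acts.var()
mu_2, var_2 = device_2_acts.mean(), device_2_acts.var()

# Batch normalization on each device normalizes with its *local* statistics,
# so the same channel is aligned to a different reference on each device.
# The gap between (mu_1, var_1) and (mu_2, var_2) is the external covariate
# shift that the aggregated global model has to reconcile.
print(f"device 1: mean={mu_1:.2f}, var={var_1:.2f}")
print(f"device 2: mean={mu_2:.2f}, var={var_2:.2f}")
```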
We show in this paper that the desirable properties of normalization shed light on solving external covariate shift. However, existing works Li et al. (2021); Hsieh et al. (2020) show that batch normalization incurs an accuracy drop of the global model in FL. These works simply attribute the failure of batch normalization in FL to the discrepancies of local data distributions across devices. In this work, our key observation is that the ineffectiveness of batch normalization in FL is caused not only by the data distribution discrepancies, but also by the diverging internal covariate shift among different devices due to the stochastic training process. Batch normalization drops the accuracy of the global model when applied to solve external covariate shift because the feature distribution of the global model after aggregation is not predictable. Furthermore, we show that layer normalization does not suffer from this problem and can serve as a replacement for batch normalization in FL.
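The snippet below is a minimal PyTorch sketch of such a replacement (the conv_block helper, channel sizes, and input shape are illustrative assumptions, not the exact architectures used in our experiments). It uses GroupNorm with a single group, the common way to apply layer-style normalization to CNN feature maps, so that normalization depends only on each sample rather than on per-device batch statistics:

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int, use_layer_norm: bool = True) -> nn.Sequential:
    """Hypothetical convolutional block for local FL training.

    With use_layer_norm=True, BatchNorm2d is replaced by GroupNorm(1, out_ch),
    which normalizes over the channel and spatial dimensions of each sample
    (i.e., layer-style normalization for CNN features) and therefore does not
    depend on per-device batch statistics.
    """
    norm = nn.GroupNorm(1, out_ch) if use_layer_norm else nn.BatchNorm2d(out_ch)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        norm,
        nn.ReLU(inplace=True),
    )

# Per-sample normalization keeps the normalized statistics of a given channel
# independent of which device's mini-batch produced the activations.
x = torch.randn(8, 3, 32, 32)   # one local mini-batch
y = conv_block(3, 16)(x)
print(y.shape)                  # torch.Size([8, 16, 32, 32])
```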
The experimental results demonstrate that layer normalization effectively mitigates external covariate shift and speeds up the convergence of global model training. In particular, layer normalization achieves the fastest convergence and the best or comparable accuracy upon convergence across three different model architectures.
Our key contributions are summarized as follows:
• To the best of our knowledge, this is the first work to explicitly reveal external covariate shift in FL, an important issue that affects the convergence of FL training.
• We propose a simple yet effective replacement for batch normalization in Federated Learning, i.e., layer normalization, which effectively mitigates external covariate shift and speeds up the convergence of FL training.
2 Preliminaries
2.1 Internal covariate shift and Activation normalization
In the training of deep neural networks, each layer’s input distribution keeps changing due to updates of parameters in
the preceding layer. Consequently, layers are forced to keep adapting to the varying input distributions, leading to slow