
Personalized Federated Learning via Heterogeneous
Modular Networks
Tianchun Wang1, Wei Cheng2, Dongsheng Luo3, Wenchao Yu2, Jingchao Ni4, Liang Tong2,
Haifeng Chen2, Xiang Zhang1
1The Pennsylvania State University, 2NEC Laboratories America, 3Florida International University, 4AWS AI Labs, Amazon
{tkw5356, xzz89}@psu.edu, {weicheng, wyu, ltong, haifeng}@nec-labs.com, dluo@fiu.edu, jingchni@amazon.com
Abstract—Personalized Federated Learning (PFL), which collaboratively trains a federated model while adapting it to individual local clients under privacy constraints, has attracted much attention. Despite its popularity, it has been observed that existing PFL approaches result in sub-optimal solutions when the joint distributions of local clients diverge. To address this issue, we present Federated Modular Network (FedMN), a novel PFL approach that adaptively selects sub-modules from a module pool to assemble heterogeneous neural architectures for different clients. FedMN adopts a lightweight routing hypernetwork to model the joint distribution on each client and produce a personalized selection of module blocks for each client. To reduce the communication burden of existing FL methods, we develop an efficient way for the clients and the server to interact. We conduct extensive experiments on real-world test beds, and the results show both the effectiveness and efficiency of the proposed FedMN over the baselines.
Index Terms—Federated Learning, Personalized Models, Mod-
ular Networks
I. INTRODUCTION
Federated Learning (FL) has emerged as a promising solution that facilitates distributed collaborative learning without disclosing original training data while naturally complying with government regulations [1], [2]. In practice, data heterogeneity deteriorates the performance of the global FL model on individual clients due to the lack of solution personalization. To tackle this issue, researchers have focused on Personalized Federated Learning (PFL), which aims to make the global model fit the distributions on most of the devices [3], [4]. The vanilla PFL approaches first learn a global model and then locally adapt it to each client by fine-tuning the global parameters [5], [6]. In this case, the trained global model can be regarded as a meta-model ready for further personalization on each local client. To build a better meta-model, many efforts have been made to bridge FL and Model-Agnostic Meta-Learning (MAML) [7]–[9]. However, the global generalization error of these approaches typically does not decrease much [10], so the performance cannot be significantly improved. Another line of research focuses on jointly training a global model and a local model for each client to achieve personalization [11], [12]. This strategy does not perform well on clients whose local distributions are far from the average. Cluster-based PFL approaches [13] address this issue by grouping the clients into several clusters. The clients in a cluster share the same model, while those belonging to different clusters have different models. Unfortunately, the model trained in one cluster does not benefit from the knowledge of the clients in other clusters, which limits the capability to share knowledge and therefore results in sub-optimal solutions.
An alternative strategy is to adopt the Multi-Task Learning (MTL) framework to train a PFL model [4], [14]. However, most existing efforts do not consider the difference in conditional distributions between clients, which is an important issue when building a federated model. For example, labels sometimes reflect sentiment: some users may label a laptop as cheap while others label it as expensive. This conditional distribution heterogeneity makes the model inaccurate on clients whose $p(y|x)$ is far from the average. To address this problem, a recent work [10] assumes that the data distribution of each client is a mixture of $M$ underlying distributions and proposes a flexible framework in which each client learns a combination of $M$ shared components with different weights. It optimizes the varying conditional distributions $p_i(y|x)$ under the assumption that the marginal distribution $p_i(x) = p(x)$ is the same for all clients (Assumption 2 in [10]). This assumption, however, is restrictive. For instance, in handwriting recognition, users who write the same words might still have different stroke widths, slants, etc. In such cases, $p_i(x) \neq p_j(x)$ for clients $i$ and $j$. Other works [15], [16] assume either the marginal distribution $p_i(x)$ or the conditional distribution $p_i(y|x)$ to be the same across clients. In reality, the data on each client may deviate from being identically distributed, i.e., $P_i \neq P_j$ for clients $i$ and $j$. That is, the joint distribution $P_i(x, y)$ (which can be rewritten as $P_i(y|x)P_i(x)$ or $P_i(x|y)P_i(y)$) may differ across clients. We call this the "joint distribution heterogeneity" problem. Existing approaches [15], [16] fail to fully model the difference in joint distributions between clients because they assume one factor is the same while varying the other. Moreover, to accommodate different data distributions, a homogeneous model would have to be very large to reach the desired predictive power, so the communication costs between the server and the clients would be huge. In this case, communication becomes a key bottleneck when developing FL methods. To this end, it is desirable to design an effective PFL model that accommodates heterogeneous clients in an efficient manner.
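To make this notion precise, joint distribution heterogeneity can be stated as follows (a minimal formalization in the notation above, where the factorization follows from the chain rule): for some pair of clients $i \neq j$,

$P_i(x, y) = P_i(y|x)\,P_i(x) \;\neq\; P_j(y|x)\,P_j(x) = P_j(x, y)$,

where possibly both $P_i(x) \neq P_j(x)$ and $P_i(y|x) \neq P_j(y|x)$ hold simultaneously, in contrast to settings that fix one factor and vary only the other.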
To solve the aforementioned problems, in this paper, we propose a novel Federated Modular Network (FedMN) approach,