Personalized Federated Learning via Heterogeneous
Modular Networks
Tianchun Wang1, Wei Cheng2, Dongsheng Luo3, Wenchao Yu2, Jingchao Ni4, Liang Tong2,
Haifeng Chen2, Xiang Zhang1
1The Pennsylvania State University, 2NEC Laboratories America, 3Florida International University, 4AWS AI Labs, Amazon
{tkw5356, xzz89}@psu.edu, {weicheng, wyu, ltong, haifeng}@nec-labs.com, dluo@fiu.edu, jingchni@amazon.com
Abstract—Personalized Federated Learning (PFL), which collaboratively trains a federated model while accounting for local clients under privacy constraints, has attracted much attention. Despite its popularity, it has been observed that existing PFL approaches yield sub-optimal solutions when the joint distributions of local clients diverge. To address this issue, we present Federated Modular Network (FedMN), a novel PFL approach that adaptively selects sub-modules from a module pool to assemble heterogeneous neural architectures for different clients. FedMN adopts a lightweight routing hypernetwork to model the joint distribution on each client and to produce a personalized selection of module blocks for each client. To reduce the communication burden of existing FL, we develop an efficient way for the clients and the server to interact. We conduct extensive experiments on real-world test beds, and the results demonstrate both the effectiveness and the efficiency of the proposed FedMN over the baselines.
Index Terms—Federated Learning, Personalized Models, Modular Networks
I. INTRODUCTION
Federated Learning (FL) has emerged as a promising solution that facilitates distributed collaborative learning without disclosing original training data while naturally complying with government regulations [1], [2]. In practice, data heterogeneity deteriorates the performance of the global FL model on individual clients because the solution is not personalized. To tackle this issue, researchers have focused on Personalized Federated Learning (PFL), which aims to make the global model fit the distributions on most of the devices [3], [4]. Vanilla PFL approaches first learn a global model and then locally adapt it to each client by fine-tuning the global parameters [5], [6]. In this case, the trained global model can be regarded as a meta-model ready for further personalization on each local client. To build a better meta-model, many efforts have been made to bridge FL and Model-Agnostic Meta-Learning (MAML) [7]–[9]. However, for these approaches the global generalization error typically does not decrease much [10], so the performance cannot be improved significantly. Another line of research focuses on jointly training a global model and a local model for each client to achieve personalization [11], [12]. This strategy does not perform well on clients whose local distributions are far from the average. Cluster-based PFL approaches [13] address this issue by grouping the clients into several clusters. The clients in a cluster share the same model, while those belonging to different clusters have different models. Unfortunately, the model trained in one cluster does not benefit from the knowledge of clients in other clusters, which limits the capability to share knowledge and therefore results in a sub-optimal solution.
An alternative strategy is to adopt the Multi-Task Learning (MTL) framework to train a PFL model [4], [14]. However, most existing efforts do not consider the difference in conditional distributions between clients, which is an important problem when building a federated model. For example, labels sometimes reflect sentiment: some users may label a laptop as cheap while others label it as expensive. This conditional distribution heterogeneity makes the model inaccurate on clients whose $p(y|x)$ is far from the average. To address the problem, a recent work [10] assumes the data distribution of each client is a mixture of $M$ underlying distributions and proposes a flexible framework in which each client learns a combination of $M$ shared components with different weights. It optimizes the varying conditional distribution $p_i(y|x)$ under the assumption that the marginal distribution $p_i(x) = p(x)$ is the same for all clients (Assumption 2 in [10]). This assumption, however, is restrictive. For instance, in handwriting recognition, users who write the same words might still have different stroke widths, slants, etc. In such cases, $p_i(x) \neq p_j(x)$ for clients $i$ and $j$. Other works [15], [16] assume either the marginal distribution $p_i(x)$ or the conditional distribution $p_i(y|x)$ to be the same across clients. In reality, the data on each client may deviate from being identically distributed, i.e., $P_i \neq P_j$ for clients $i$ and $j$. That is, the joint distribution $P_i(x, y)$ (which can be rewritten as $P_i(y|x)P_i(x)$ or $P_i(x|y)P_i(y)$) may differ across clients. We call this the "joint distribution heterogeneity" problem. Existing approaches [15], [16] fail to fully model the difference in joint distributions between clients because they assume one term to be the same while varying the other. Moreover, to accommodate different data distributions, a homogeneous model would have to be large in order to attain the required prediction power, so the communication costs between the server and the clients would be huge. In this case, communication becomes a key bottleneck when developing FL methods. It is therefore desirable to design an effective PFL model that accommodates heterogeneous clients in an efficient manner.
To solve the aforementioned problems, in this paper we propose a novel Federated Modular Networks (FedMN) approach,
which personalizes heterogeneous clients efficiently. The main idea is to implicitly partition the clients into clusters by modeling their joint distributions, so that clients in the same cluster share the same architecture. Specifically, a shared module pool with layers of module blocks (e.g., MLPs or ConvNets) is maintained on the server. In each update, every client assembles a personalized model by selecting a combination of blocks from the module pool. We adopt a lightweight routing hypernetwork with differentiable routers to generate the module-block selection decision for each client. The routing hypernetwork accounts for the joint distribution $p_i(x, y)$ of client $i$ by taking the joint distribution of the local dataset as its input. A decision parameterized by the routing hypernetwork is a vector of discrete variables following Bernoulli distributions; it selects a subset of the blocks from the module pool to form an architecture for each client. Clients with similar decisions are implicitly assigned to the same cluster in each communication round. The proposed FedMN enables a client to upload only a subset of model parameters to the server, which decreases the communication burden compared to traditional FL algorithms. To sum up, our contributions are as follows: 1) We address the problem of joint distribution heterogeneity in personalized FL and propose the FedMN approach to alleviate this issue. 2) We develop an efficient way to selectively upload model parameters, which decreases the communication cost between clients and the server. 3) Extensive experiments on real-world datasets show both the effectiveness and the efficiency of the proposed FedMN compared to state-of-the-art methods.
II. METHODOLOGY
Our method adopts modular networks, which consist of a group of encoders in the first layer and multiple modular blocks in the following layers. The connection decisions between blocks are made by a routing hypernetwork.
A. Modular Networks
The modular networks first encode the data features into low-dimensional embeddings with a group of encoders, motivated by [17]–[19]. Personalized feature embeddings are then obtained by discovering and assembling a set of modular blocks in different ways for different clients. The modular networks have $L$ layers, and the $l$-th layer has $n_l$ blocks of sub-networks. The encoders in the first layer are $n_1$ independent blocks that learn feature embeddings for each client. Formally, let $x_i$ be the $i$-th sample; we obtain the feature embedding $z_i^{(j)}$ after the $j$-th encoder is applied:
$$z_i^{(j)} = \mathrm{Encoder}^{(j)}(x_i), \quad j = 1, \ldots, n_1. \qquad (1)$$
The choice of encoder networks is flexible. For example, one can adopt CNNs as encoders for image data and transformers for text data.
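For concreteness, the following is a minimal PyTorch-style sketch of the encoder layer in Eq. (1). The class name `EncoderGroup`, the MLP encoder architecture, and the dimensions are illustrative assumptions, not details fixed by the paper.

```python
import torch
import torch.nn as nn

class EncoderGroup(nn.Module):
    """First layer of the modular networks: n1 independent encoders (Eq. 1)."""

    def __init__(self, in_dim: int, emb_dim: int, n1: int):
        super().__init__()
        # Each encoder is an independent block; MLPs are used here, but CNNs or
        # transformers could be substituted depending on the data modality.
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU()) for _ in range(n1)]
        )

    def forward(self, x: torch.Tensor) -> list:
        # Returns the set {z^(1), ..., z^(n1)} for every sample in the batch x.
        return [enc(x) for enc in self.encoders]
```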
The set of feature embeddings $\{z_i^{(1)}, \ldots, z_i^{(n_1)}\}$ of data point $x_i$ produced by the encoders in the first layer is the input to the subsequent modular sub-networks, which are constructed from a subset of the modular blocks. There are $L-1$ layers of blocks in the sub-networks, and each block is independent of the others.
Fig. 1. The FedMN architecture. The modular networks consist of a group of encoders in the first layer and modular blocks in the following layers. The connection paths between blocks are determined by the routing hypernetwork. The modular networks take sample-wise inputs, whereas the routing hypernetwork takes each client's full local dataset as input.
Each modular block $j$ in layer $l$ receives a list of $n_{l-1}$ tensors of feature embeddings from the modular sub-networks in layer $l-1$. We use MLPs as the modular blocks in this paper, and each pair of blocks in successive layers may be connected or not. At most, there are $E$ possible connection paths between modular blocks, where $E = \sum_{j=1}^{L-1} n_j n_{j+1} + n_L$. To determine which paths are connected, we need to learn a decision $V_m \in \mathbb{Z}_2^E$ for client $m$. Each element $v_i^{(m)} \in V_m$ is a binary variable with values chosen from $\{0, 1\}$: $v_i^{(m)} = 1$ indicates that the path between two blocks is connected, and $0$ otherwise. Since some blocks may not have any connected path, $V_m$ also determines which subset of blocks is selected from the module pool for each client. Therefore, after obtaining $V_m$, the architecture for a client is determined.
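To illustrate how such a binary decision vector can gate the connections between blocks, below is a minimal PyTorch-style sketch of one modular layer. The class name `ModularLayer` and the choice of summing the embeddings of connected predecessor blocks are our own illustrative assumptions; the paper does not prescribe this exact aggregation.

```python
import torch
import torch.nn as nn

class ModularLayer(nn.Module):
    """One layer of modular MLP blocks whose incoming connections are gated
    by the sub-vector of the binary decision V_m that belongs to this layer."""

    def __init__(self, n_prev: int, n_blocks: int, dim: int):
        super().__init__()
        self.n_prev, self.n_blocks = n_prev, n_blocks
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(n_blocks)]
        )

    def forward(self, prev_embs: list, v: torch.Tensor) -> list:
        # v holds n_prev * n_blocks binary entries; gate[i, j] = 1 connects
        # block i of the previous layer to block j of this layer.
        gate = v.view(self.n_prev, self.n_blocks)
        outputs = []
        for j, block in enumerate(self.blocks):
            # Aggregate (here: sum) only the embeddings of connected predecessors.
            agg = sum(gate[i, j] * prev_embs[i] for i in range(self.n_prev))
            outputs.append(block(agg))
        return outputs
```

Stacking $L-1$ such layers, plus the $n_L$ gated connections from the last layer to the output, accounts for the $E = \sum_{j=1}^{L-1} n_j n_{j+1} + n_L$ binary decisions in $V_m$.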
B. The Learning Objective
With the modular networks defined, we can formally state our learning objective. Suppose there are $M$ clients, where client $m$ has a local dataset $D_m = \{(x_i, y_i)\}_{i=1}^{|D_m|}$. In the FedMN framework, after obtaining $V_m$, the architecture of the modular network for client $m$ is fixed for an epoch during local updating. Let $f_\theta$ be the model parameterized by $\theta$, which contains the parameters of both the modular networks and the routing hypernetwork. When making a prediction, we have $\hat{y}_i = f_\theta(x_i; V_m)$. The empirical risk of FedMN is then
$$\min_{\theta,\, \{V_m\}_{m=1}^{M}} \; \sum_{m=1}^{M} \frac{|D_m|}{|D|} \, \mathcal{L}_m(\theta, V_m), \qquad (2)$$
where $\mathcal{L}_m(\theta, V_m) = \frac{1}{|D_m|} \sum_{(x_i, y_i) \in D_m} \ell\big(f_\theta(x_i; V_m), y_i\big)$.
However, directly optimizing the objective in (2) is intractable because there are $2^E$ candidates for each $V_m$. Thus, we consider a relaxation by assuming that the decision for each connection path $v_i^{(m)} \in V_m$ is conditionally independent of the others. Formally, we have
$$P(V_m) = \prod_{v_i^{(m)} \in V_m} P\big(v_i^{(m)}\big). \qquad (3)$$
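The following sketch only illustrates the independence assumption in Eq. (3): the routing hypernetwork emits one Bernoulli parameter per possible connection path, and a hard decision $V_m$ is sampled element-wise. The class name, the mean-pooled dataset summary used as the hypernetwork input, and the hidden size are illustrative assumptions; the hard sample shown here is not differentiable, and the paper's differentiable routers are not reproduced in this snippet.

```python
import torch
import torch.nn as nn

class RoutingHypernetwork(nn.Module):
    """Maps a summary of a client's local dataset (X, Y) to E per-path
    connection probabilities, one Bernoulli parameter per path (Eq. 3)."""

    def __init__(self, in_dim: int, num_paths: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_paths)
        )

    def forward(self, X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
        # Summarize the local dataset; mean pooling over samples is an
        # assumption made for this sketch.
        summary = torch.cat([X, Y], dim=-1).mean(dim=0)
        return torch.sigmoid(self.net(summary))  # length-E vector of probabilities

def sample_decision(probs: torch.Tensor) -> torch.Tensor:
    # Element-wise Bernoulli sampling under the factorization P(V_m) = prod_i P(v_i).
    return torch.bernoulli(probs)
```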