Exploiting Features and Logits in Heterogeneous Federated Learning

Highlights

Exploiting Features and Logits in Heterogeneous Federated Learning

Yun-Hin Chan and Edith C.H. Ngai

• We propose Felo, an FL method for heterogeneous system environments.
It enables clients with different computational resources to select
neural network models with different sizes and architectures. In Felo,
clients exchange mid-level features and logits based on their class
labels to share knowledge without a shared public dataset.

• To fill the knowledge gaps between different client models and extract
more latent information from the mid-level features, we propose Velo, an
extension of Felo in which the server utilizes a conditional VAE to
extract additional knowledge from the mid-level features.

• Experiments show that our methods achieve the best performance compared
with state-of-the-art methods. Our methods also outperform FedAvg in a
homogeneous environment.
arXiv:2210.15527v2 [cs.LG] 8 Apr 2025
Exploiting Features and Logits in Heterogeneous Federated Learning

Yun-Hin Chan and Edith C.H. Ngai

Department of Electrical and Electronic Engineering, The University of Hong
Kong, Hong Kong, China
Abstract
Due to the rapid growth of IoT and artificial intelligence, deploying neural
networks on IoT devices is becoming increasingly crucial for edge intelligence.
Federated learning (FL) facilitates the management of edge devices to
collaboratively train a shared model while keeping the training data local
and private. However, a general assumption in FL is that all edge devices
train the same machine learning model, which may be impractical considering
diverse device capabilities. For instance, less capable devices may slow
down the updating process because they struggle to handle large models
appropriate for ordinary devices. In this paper, we propose a novel data-free
FL method, called Felo, that supports heterogeneous client models by managing
features and logits, together with its extension, called Velo, which deploys
a conditional VAE on the server. Felo averages the mid-level features and
logits from the clients at the server based on their class labels; the
averaged features and logits are then used for further training of the client
models. Unlike Felo, the server in Velo contains a conditional VAE, which is
trained on the mid-level features and generates synthetic features according
to the class labels. The clients optimize their models based on the synthetic
features and the average logits. We conduct experiments on two datasets and
show satisfactory performance of our methods compared with state-of-the-art
methods. Our code will be released on GitHub.
Keywords: Federated learning, Heterogeneity, Variational auto-encoder.
1. Introduction
Machine learning (ML) is playing an increasingly important role in our
daily lives. It is particularly challenging to deploy ML methods in IoT devices
due to their limited computation capabilities, limited network bandwidth,
and privacy concerns.

Figure 1: Illustration of the system heterogeneity problem. The clients (a
laptop, a mobile phone, a computer, and a router) are the participants in the
federated learning process. The client models of the participants differ
because of their various available resources. Therefore, the cloud server
utilizes shared knowledge from extracted features and logits, instead of
model weights, to update the client models.

Federated learning (FL) (McMahan et al., 2017) has
been proposed to train neural networks collaboratively on IoT devices (e.g.,
sensors and mobile phones) without communicating private data with each
other. The first FL algorithm, called FedAvg, coordinates the clients and a
central server to train a shared neural network without requiring private
data to be transmitted to the server or to other clients. In FedAvg, all
clients send model weights or gradients to the server after local training,
and the server then averages this information to obtain an updated model.
This updated model is sent back to the clients, which continue their local
training based on it.
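
As a concrete illustration of this aggregation step, the following is a
minimal PyTorch sketch of FedAvg's server-side averaging; the function name,
the data layout (a list of state dicts), and the sample-count weighting are
illustrative assumptions rather than the original implementation.

```python
import torch

def fedavg_aggregate(client_states, client_sizes):
    """Average client model weights, weighted by local dataset size (FedAvg).

    client_states: list of state_dicts with identical keys and tensor shapes.
    client_sizes:  list of local sample counts, one per client.
    """
    total = float(sum(client_sizes))
    avg_state = {}
    for key in client_states[0]:
        # Weighted sum of each parameter tensor across clients.
        avg_state[key] = sum(
            (n / total) * state[key].float()
            for state, n in zip(client_states, client_sizes)
        )
    return avg_state

# The server would then broadcast the averaged weights back, e.g.:
# global_model.load_state_dict(fedavg_aggregate(states, sizes))
```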
However, all the client models have to be identical in FedAvg. The server
cannot aggregate and average weights directly if the architectures of the
client models are different. Maintaining the same architecture across all
models may not be feasible, as it is difficult to assure that all clients have
the same computation capabilities, particularly in the IoT environment. Sys-
tem heterogeneity refers to a system containing devices with heterogeneous
capabilities as shown in Figure 1, which is one of the critical challenges in
FL for IoT. If the clients with different computation capabilities share the
same model architecture, the less capable clients may slow down the training
speed.
Training heterogeneous models in FL can resolve the system heterogeneity
problem as a client can select a model architecture suitable for its compu-
tation capability. We summarize recent studies in Table 1. Inspired by
knowledge distillation (KD) (Hinton et al., 2015), several studies (Li and
Wang, 2019; Sattler et al., 2021; Fang and Ye, 2022; Huang et al., 2022)
have attempted to manage heterogeneous models in FL. In these algorithms,
the clients distill knowledge from local training in the form of logits and
communicate with each other via logits rather than gradients. Logits are
the raw, unnormalized output values produced by the last layer of a neural
network, before being passed through the final activation function such as
softmax, and they represent the model's predictions for each class (see the
short sketch after this paragraph). Figure 2 illustrates the position of
logits in a neural network. In FL, FedMD (Li and Wang, 2019) incorporates
logits derived from a large publicly available dataset. The clients can
obtain data features from logits because these logits encode the confidence
scores of client models for each class or category, containing rich
information about the input data (Guo et al., 2017). MocoSFL (Li et al.,
2023) introduces a mechanism that utilizes replay memory on features to
enhance KD and MoCo (Chen et al., 2021), a contrastive learning framework,
in model-heterogeneous FL. FedAUX (Sattler et al., 2021) uses unsupervised
pre-training on unlabeled auxiliary data to initialize heterogeneous models
in distributed training. RHFL (Fang and Ye, 2022) deploys the basic knowledge
distillation method on an unlabeled public dataset and utilizes the symmetric
cross-entropy loss function to compute the weights of different clients in
the KD process. FCCL (Huang et al., 2022) constructs a cross-correlation
matrix on a global unlabeled dataset to exchange information among clients
and utilizes KD to alleviate catastrophic forgetting. HypeMeFed (Shin et al.,
2024) introduces a hyper-network and KD approach to generate weights for
heterogeneous models in federated learning. However, a significant
disadvantage of the above methods is that the server has to possess a public
dataset, which may not be feasible due to data availability and privacy
issues. It can also be difficult for the server to collect sufficient data
with the same distribution as the private datasets on the clients.
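
The sketch referenced above makes the notion of logits concrete; the toy
network, input sizes, and batch are illustrative assumptions.

```python
import torch
import torch.nn as nn

# An illustrative classifier: the final Linear layer outputs the logits.
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 10),              # raw, unnormalized class scores (logits)
)

x = torch.randn(8, 32)              # a toy batch of 8 samples
logits = model(x)                   # shape (8, 10): one score per class
probs = logits.softmax(dim=1)       # softmax turns logits into confidences
```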
Data-free knowledge distillation is a newer approach that completes the
distillation process without the training data, which makes it well suited
for FL. The basic idea is to optimize noise inputs to minimize their distance
to prior knowledge (Nayak et al., 2019); a short sketch of this idea follows
this paragraph. Some studies have also utilized data-free knowledge
distillation in FL. FedGen (Zhu et al., 2021) uses a generator to simulate
the latent features of all the clients. The simulated features are given to
the clients as prior knowledge. The client models are then updated using
their private datasets and these features, though it can be difficult to
obtain a well-trained generator. FedHe (Chan and Ngai, 2021) focuses on the
logits produced by the client training process. The server averages these
logits, which are then used in the clients' next training round. FedGKT (He
et al., 2020) also does not use a public dataset on the server, but its
inference process involves both the clients and the server: the earlier
layers of the neural network are kept on the clients, while the later layers
reside on the server. However, this approach slows down training and requires
additional communication bandwidth during inference.
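
As a minimal sketch of the data-free distillation idea above, noise inputs
can be optimized so that a frozen teacher assigns them chosen labels; the
teacher architecture, step count, and learning rate here are illustrative
assumptions, not any specific method's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def synthesize_inputs(teacher, target_labels, in_dim=32, steps=200, lr=0.1):
    """Optimize noise inputs until a frozen teacher predicts target_labels."""
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)      # freeze the teacher; only x is trained
    x = torch.randn(len(target_labels), in_dim, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Push the teacher's logits on x toward the desired classes.
        loss = F.cross_entropy(teacher(x), target_labels)
        loss.backward()
        opt.step()
    return x.detach()

# Example: one synthetic input per class for a toy 10-class teacher.
teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
fake_batch = synthesize_inputs(teacher, torch.arange(10))
```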
Apart from the methods based on KD, some methods utilize sub-models to
handle system heterogeneity in FL. The clients deploy sub-models derived from
the largest model and train them on their local datasets (see the sketch
after this paragraph). HeteroFL (Diao et al., 2021) derives local models of
different sizes from one large neural network model. SlimFL (Baek et al.,
2022) incorporates width-adjustable slimmable neural network (SNN)
architectures into FL, which can tune the widths of local neural networks.
FjORD (Horvath et al., 2021) tailors model widths to clients' capabilities by
leveraging Ordered Dropout and a self-distillation methodology. ScaleFL
(Ilhan et al., 2023) trains the split client models with cross-entropy and
KL-divergence losses. FedASA (Deng et al., 2024) uses adaptive model
aggregation to handle non-IID data and system heterogeneity. InCoFL (Chan et
al., 2024) identifies the gaps in heterogeneous FL and introduces three
splitting methods based on convex optimization to address the gradient
divergence problem. However, the size of each local model is restricted by
the largest neural network model, which means the architectures must be the
same except for the number of parameters in each layer.
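
The sketch referenced above illustrates this width restriction: a thinner
client model is carved from the largest model by keeping the leading hidden
units of each layer (a HeteroFL-style slicing convention); the two-layer MLP
and its sizes are illustrative assumptions.

```python
import torch.nn as nn

def width_submodel(full, ratio, in_dim=32, hidden=64, out_dim=10):
    """Derive a thinner sub-model from the largest two-layer MLP by keeping
    the leading fraction `ratio` of its hidden units."""
    h = max(1, int(hidden * ratio))
    sub = nn.Sequential(nn.Linear(in_dim, h), nn.ReLU(),
                        nn.Linear(h, out_dim))
    state = full.state_dict()
    sub.load_state_dict({
        "0.weight": state["0.weight"][:h, :],  # first h hidden units
        "0.bias":   state["0.bias"][:h],
        "2.weight": state["2.weight"][:, :h],  # last layer reads only h inputs
        "2.bias":   state["2.bias"],           # output layer keeps all classes
    })
    return sub

# A weak client trains a half-width slice of the shared largest model:
full_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
half_model = width_submodel(full_model, ratio=0.5)
```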
To support system heterogeneity and avoid the problems mentioned above,
we propose a novel data-free method called Felo and its extension called
Velo, which neither require a public dataset nor rely on sub-models. The
relations among the mid-level features, the logits, and the architecture of a
client model are shown in Figure 2. Felo uses the mid-level Features and
logits from the client training processes as the exchanged knowledge. At the
beginning of Felo, the clients train their models on their private data and
collect the mid-level features and logits from the data according to their
class labels. These mid-level features and logits are then transmitted to the
server, which aggregates this information according to the class labels.
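
A minimal sketch of this server-side aggregation step follows; the data
layout (tuples of feature, logit, and label gathered from all clients) and
the assumption that mid-level features share a common dimensionality at the
split point are ours, for illustration only.

```python
import torch
from collections import defaultdict

def aggregate_by_label(uploads):
    """Average mid-level features and logits per class label.

    uploads: list of (feature, logit, label) tuples collected from the
    clients; feature and logit are 1-D tensors of common sizes, label an int.
    """
    feats, logits = defaultdict(list), defaultdict(list)
    for f, z, y in uploads:
        feats[y].append(f)
        logits[y].append(z)
    avg_feats = {y: torch.stack(v).mean(dim=0) for y, v in feats.items()}
    avg_logits = {y: torch.stack(v).mean(dim=0) for y, v in logits.items()}
    # Both dictionaries are sent back to the clients for further training.
    return avg_feats, avg_logits
```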
Finally, the server sends these aggregated features and logits back to the
clients, which use them to further train the client models. The server also aggregates