Exploiting Features and Logits in Heterogeneous Federated Learning

Highlights

Exploiting Features and Logits in Heterogeneous Federated Learning

Yun-Hin Chan and Edith C.H. Ngai

• We propose Felo, an FL method for heterogeneous system environments.
It enables clients with different computational resources to select
neural network models with different sizes and architectures. In Felo,
clients exchange mid-level features and logits based on their class
labels to share knowledge without a shared public dataset.

• To fill the knowledge gaps between different client models and extract
more latent information from the mid-level features, we propose Velo, an
extension of Felo in which the server utilizes a conditional VAE to
extract additional knowledge from the mid-level features.

• Experiments show that our methods achieve the best performance compared
with state-of-the-art methods. Our methods also outperform FedAvg in a
homogeneous environment.
arXiv:2210.15527v2 [cs.LG] 8 Apr 2025
Exploiting Features and Logits in Heterogeneous Federated Learning

Yun-Hin Chan and Edith C.H. Ngai

Department of Electrical and Electronic Engineering, The University of Hong
Kong, Hong Kong, China
Abstract
Due to the rapid growth of IoT and artificial intelligence, deploying neural
networks on IoT devices is becoming increasingly crucial for edge intelligence.
Federated learning (FL) facilitates the management of edge devices to
collaboratively train a shared model while keeping the training data local
and private. However, a general assumption in FL is that all edge devices
train the same machine learning model, which may be impractical considering
diverse device capabilities. For instance, less capable devices may slow
down the updating process because they struggle to handle large models
appropriate for ordinary devices. In this paper, we propose a novel data-free
FL method, called Felo, that supports heterogeneous client models by managing
features and logits, together with its extension, called Velo, which deploys
a conditional VAE on the server. Felo averages the mid-level features and
logits from the clients at the server based on their class labels; the
averaged features and logits are then used for further training of the client
models. Unlike Felo, the server in Velo contains a conditional VAE, which is
trained on the mid-level features and generates synthetic features according
to the class labels. The clients optimize their models based on the synthetic
features and the average logits. We conduct experiments on two datasets and
show satisfactory performance of our methods compared with state-of-the-art
methods. Our code will be released on GitHub.
Keywords: Federated learning, Heterogeneity, Variational auto-encoder.
1. Introduction
Machine learning (ML) is playing an increasingly important role in our
daily lives. It is particularly challenging to deploy ML methods in IoT devices
due to their limited computation capabilities, limited network bandwidth,
and privacy concerns.

Figure 1: Illustration of the system heterogeneity problem. The clients (a
laptop, a mobile phone, a computer, and a router) are the participants in the
federated learning process. The client models of the participants differ
because of their various available resources. Therefore, the cloud server
utilizes shared knowledge from extracted features and logits, instead of
model weights, to update the client models.

Federated learning (FL) (McMahan et al., 2017) has
been proposed to train neural networks collaboratively on IoT devices (e.g.,
sensors and mobile phones) without communicating private data with each
other. The first FL algorithm, called FedAvg, coordinates the clients and a
central server to train a shared neural network without requiring private
data to be transmitted to the server or to other clients. In FedAvg, all
clients send model weights or gradients to the server after local training,
and the server then averages this information to obtain an updated model.
This updated model is sent back to the clients, which continue their local
training based on it.
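
As a concrete illustration of this aggregation step, the following is a
minimal PyTorch sketch of FedAvg's server-side averaging; the function name,
the data layout (a list of state dicts), and the sample-count weighting are
illustrative assumptions rather than the original implementation.

```python
import torch

def fedavg_aggregate(client_states, client_sizes):
    """Average client model weights, weighted by local dataset size (FedAvg).

    client_states: list of state_dicts with identical keys and tensor shapes.
    client_sizes:  list of local sample counts, one per client.
    """
    total = float(sum(client_sizes))
    avg_state = {}
    for key in client_states[0]:
        # Weighted sum of each parameter tensor across clients.
        avg_state[key] = sum(
            (n / total) * state[key].float()
            for state, n in zip(client_states, client_sizes)
        )
    return avg_state

# The server would then broadcast the averaged weights back, e.g.:
# global_model.load_state_dict(fedavg_aggregate(states, sizes))
```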
However, all the client models have to be identical in FedAvg. The server
cannot aggregate and average weights directly if the architectures of the
client models are different. Maintaining the same architecture across all
models may not be feasible, as it is difficult to assure that all clients have
the same computation capabilities, particularly in the IoT environment. Sys-
tem heterogeneity refers to a system containing devices with heterogeneous
capabilities as shown in Figure 1, which is one of the critical challenges in
FL for IoT. If the clients with different computation capabilities share the
same model architecture, the less capable clients may slow down the training
speed.
Training heterogeneous models in FL can resolve the system heterogeneity
problem as a client can select a model architecture suitable for its compu-
tation capability. We summarize recent studies in Table 1. Inspired by
knowledge distillation (KD) (Hinton et al., 2015), several studies (Li and
Wang, 2019; Sattler et al., 2021; Fang and Ye, 2022; Huang et al., 2022)
have attempted to manage heterogeneous models in FL. In these algorithms,
the clients distill knowledge from local training in the form of logits and
communicate with each other via logits rather than gradients. Logits are
the raw, unnormalized output values produced by the last layer of a neural
network, before being passed through the final activation function such as
softmax, and they represent the model's predictions for each class (see the
short sketch after this paragraph). Figure 2 illustrates the position of
logits in a neural network. In FL, FedMD (Li and Wang, 2019) incorporates
logits derived from a large publicly available dataset. The clients can
obtain data features from logits because these logits encode the confidence
scores of client models for each class or category, containing rich
information about the input data (Guo et al., 2017). MocoSFL (Li et al.,
2023) introduces a mechanism that utilizes replay memory on features to
enhance KD and MoCo (Chen et al., 2021), a contrastive learning framework,
in model-heterogeneous FL. FedAUX (Sattler et al., 2021) uses unsupervised
pre-training on unlabeled auxiliary data to initialize heterogeneous models
in distributed training. RHFL (Fang and Ye, 2022) deploys the basic knowledge
distillation method on an unlabeled public dataset and utilizes the symmetric
cross-entropy loss function to compute the weights of different clients in
the KD process. FCCL (Huang et al., 2022) constructs a cross-correlation
matrix on a global unlabeled dataset to exchange information among clients
and utilizes KD to alleviate catastrophic forgetting. HypeMeFed (Shin et al.,
2024) introduces a hyper-network and KD approach to generate weights for
heterogeneous models in federated learning. However, a significant
disadvantage of the above methods is that the server has to possess a public
dataset, which may not be feasible due to data availability and privacy
issues. It can also be difficult for the server to collect sufficient data
with the same distribution as the private datasets on the clients.
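
The sketch referenced above makes the notion of logits concrete; the toy
network, input sizes, and batch are illustrative assumptions.

```python
import torch
import torch.nn as nn

# An illustrative classifier: the final Linear layer outputs the logits.
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 10),              # raw, unnormalized class scores (logits)
)

x = torch.randn(8, 32)              # a toy batch of 8 samples
logits = model(x)                   # shape (8, 10): one score per class
probs = logits.softmax(dim=1)       # softmax turns logits into confidences
```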
Data-free knowledge distillation is a newer approach that completes the
distillation process without the training data, which makes it well suited
for FL. The basic idea is to optimize noise inputs to minimize their distance
to prior knowledge (Nayak et al., 2019); a short sketch of this idea follows
this paragraph. Some studies have also utilized data-free knowledge
distillation in FL. FedGen (Zhu et al., 2021) uses a generator to simulate
the latent features of all the clients. The simulated features are given to
the clients as prior knowledge. The client models are then updated using
their private datasets and these features, though it can be difficult to
obtain a well-trained generator. FedHe (Chan and Ngai, 2021) focuses on the
logits produced by the client training process. The server averages these
logits, which are then used in the clients' next training round. FedGKT (He
et al., 2020) also does not use a public dataset on the server, but its
inference process involves both the clients and the server: the earlier
layers of the neural network are kept on the clients, while the later layers
reside on the server. However, this approach slows down training and requires
additional communication bandwidth during inference.
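
As a minimal sketch of the data-free distillation idea above, noise inputs
can be optimized so that a frozen teacher assigns them chosen labels; the
teacher architecture, step count, and learning rate here are illustrative
assumptions, not any specific method's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def synthesize_inputs(teacher, target_labels, in_dim=32, steps=200, lr=0.1):
    """Optimize noise inputs until a frozen teacher predicts target_labels."""
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)      # freeze the teacher; only x is trained
    x = torch.randn(len(target_labels), in_dim, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Push the teacher's logits on x toward the desired classes.
        loss = F.cross_entropy(teacher(x), target_labels)
        loss.backward()
        opt.step()
    return x.detach()

# Example: one synthetic input per class for a toy 10-class teacher.
teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
fake_batch = synthesize_inputs(teacher, torch.arange(10))
```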
Apart from the methods based on KD, some methods utilize sub-models to
handle system heterogeneity in FL. The clients deploy sub-models derived from
the largest model and train them on their local datasets (see the sketch
after this paragraph). HeteroFL (Diao et al., 2021) derives local models of
different sizes from one large neural network model. SlimFL (Baek et al.,
2022) incorporates width-adjustable slimmable neural network (SNN)
architectures into FL, which can tune the widths of local neural networks.
FjORD (Horvath et al., 2021) tailors model widths to clients' capabilities by
leveraging Ordered Dropout and a self-distillation methodology. ScaleFL
(Ilhan et al., 2023) trains the split client models with cross-entropy and
KL-divergence losses. FedASA (Deng et al., 2024) uses adaptive model
aggregation to handle non-IID data and system heterogeneity. InCoFL (Chan et
al., 2024) identifies the gaps in heterogeneous FL and introduces three
splitting methods based on convex optimization to address the gradient
divergence problem. However, the size of each local model is restricted by
the largest neural network model, which means the architectures must be the
same except for the number of parameters in each layer.
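
The sketch referenced above illustrates this width restriction: a thinner
client model is carved from the largest model by keeping the leading hidden
units of each layer (a HeteroFL-style slicing convention); the two-layer MLP
and its sizes are illustrative assumptions.

```python
import torch.nn as nn

def width_submodel(full, ratio, in_dim=32, hidden=64, out_dim=10):
    """Derive a thinner sub-model from the largest two-layer MLP by keeping
    the leading fraction `ratio` of its hidden units."""
    h = max(1, int(hidden * ratio))
    sub = nn.Sequential(nn.Linear(in_dim, h), nn.ReLU(),
                        nn.Linear(h, out_dim))
    state = full.state_dict()
    sub.load_state_dict({
        "0.weight": state["0.weight"][:h, :],  # first h hidden units
        "0.bias":   state["0.bias"][:h],
        "2.weight": state["2.weight"][:, :h],  # last layer reads only h inputs
        "2.bias":   state["2.bias"],           # output layer keeps all classes
    })
    return sub

# A weak client trains a half-width slice of the shared largest model:
full_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
half_model = width_submodel(full_model, ratio=0.5)
```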
To support system heterogeneity and avoid the problems mentioned above,
we propose a novel data-free method called Felo and its extension called
Velo, which neither require a public dataset nor rely on sub-models. The
relations among the mid-level features, the logits, and the architecture of a
client model are shown in Figure 2. Felo uses the mid-level Features and
logits from the client training processes as the exchanged knowledge. At the
beginning of Felo, the clients train their models on their private data and
collect the mid-level features and logits from the data according to their
class labels. These mid-level features and logits are then transmitted to the
server, which aggregates this information according to the class labels.
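
A minimal sketch of this server-side aggregation step follows; the data
layout (tuples of feature, logit, and label gathered from all clients) and
the assumption that mid-level features share a common dimensionality at the
split point are ours, for illustration only.

```python
import torch
from collections import defaultdict

def aggregate_by_label(uploads):
    """Average mid-level features and logits per class label.

    uploads: list of (feature, logit, label) tuples collected from the
    clients; feature and logit are 1-D tensors of common sizes, label an int.
    """
    feats, logits = defaultdict(list), defaultdict(list)
    for f, z, y in uploads:
        feats[y].append(f)
        logits[y].append(z)
    avg_feats = {y: torch.stack(v).mean(dim=0) for y, v in feats.items()}
    avg_logits = {y: torch.stack(v).mean(dim=0) for y, v in logits.items()}
    # Both dictionaries are sent back to the clients for further training.
    return avg_feats, avg_logits
```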
Finally, the server sends these aggregated features and logits back to the
clients, which use them to further train the client models. The server also aggregates