NVIDIA FLARE:

Federated Learning from Simulation to Real-World

Holger R. Roth Yan Cheng Yuhong Wen Isaac Yang Ziyue Xu Yuan-Ting Hsieh

Kristopher Kersten Ahmed Harouni Can Zhao Kevin Lu Zhihong Zhang Wenqi Li

Andriy Myronenko Dong Yang Sean Yang Nicola Rieke Abood Quraini Chester Chen

Daguang Xu Nic Ma Prerna Dogra Mona Flores Andrew Feng

NVIDIA Corporation*

Shanghai, China

Munich, Germany

Bethesda, Santa Clara, USA

Abstract

Federated learning (FL) enables building robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without centralizing the data. We created NVIDIA FLARE as an open-source software development kit (SDK) to make it easier for data scientists to use FL in their research and real-world applications. The SDK includes solutions for state-of-the-art FL algorithms and federated machine learning approaches, which facilitate building workflows for distributed learning across enterprises and enable platform developers to create a secure, privacy-preserving offering for multiparty collaboration utilizing homomorphic encryption or differential privacy. The SDK is a lightweight, flexible, and scalable Python package. It allows researchers to apply their data science workflows, written with any training library (PyTorch, TensorFlow, XGBoost, or even NumPy), in real-world FL settings. This paper introduces the key design principles of NVFlare and illustrates some use cases (e.g., COVID analysis) with customizable FL workflows that implement different privacy-preserving algorithms.

1 Introduction

Federated learning (FL) has become a reality for many real-world applications [ ]. It enables multinational collaborations on a global scale to build more robust and generalizable machine learning and AI models. In this paper, we introduce NVIDIA FLARE (NVFlare), an open-source software development kit (SDK) that makes it easier for data scientists to collaborate to develop more generalizable and robust AI models by sharing model weights rather than private data. While FL is attractive in many industries, it is particularly beneficial for healthcare applications where patient data needs to be protected. For example, FL has been used for predicting clinical outcomes in patients with COVID-19 [ ] or to segment brain lesions in magnetic resonance imaging [ ]. NVFlare is not limited to applications in healthcare and is designed to allow cross-silo FL [ ] across enterprises for different industries and researchers.

*Contact: {hroth,yanc,chesterc,daguangx,pdogra,andyf}@nvidia.com

1 Code is available at https://github.com/NVIDIA/NVFlare.

arXiv:2210.13291v3 [cs.LG] 28 Apr 2023

In recent years, several efforts (both open-source and commercial) have been made to bring FL technology into the healthcare sector and other industries, like TensorFlow Federated [ ], PySyft [ ], FedML [ ], FATE [ ], Flower [ ], OpenFL [ ], Fed-BioMed [ ], IBM Federated Learning [ ], HP Swarm Learning [ ], FederatedScope [ ], FLUTE [ ], and more. Some focus on simulated FL settings for researchers, while others prioritize production settings. NVFlare aims to be useful for both scenarios: 1) for researchers by providing efficient and extensible simulation tools and 2) by providing an easy path to transfer research into real-world production settings, supporting high availability and server failover, and by providing additional productivity tools such as multi-tasking and admin commands.

2 NVIDIA FLARE Overview

NVIDIA FLARE, or NVFlare for short, stands for “NVIDIA Federated Learning Application Runtime Environment”. The SDK enables researchers and data scientists to adapt their machine learning and deep learning workflows to a federated paradigm. It enables platform developers to build a secure, privacy-preserving offering for distributed multiparty collaboration.

NVFlare is a lightweight, flexible, and scalable FL framework implemented in Python that is agnostic to the underlying training library. Developers can bring their own data science workflows implemented in PyTorch, TensorFlow, or even in pure NumPy, and apply them in a federated setting. A typical FL workflow, such as the popular federated averaging (FedAvg) algorithm [ ], can be implemented in NVFlare using the following main steps. Starting from an initial global model, each FL client trains the model on its local data for some number of local iterations and sends model updates to the server for aggregation. The server then uses the aggregated updates to update the global model for the next round of training. This process is iterated many times until the model converges.
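The FedAvg loop described above can be illustrated with a minimal, framework-free sketch. It does not use the NVFlare API; the function names (`local_train`, `fed_avg`, `make_client_data`), the toy linear model, and the synthetic client data are all assumptions made for illustration. Each simulated client runs a few epochs of gradient descent from the current global model, and the server averages the returned models weighted by local dataset size.

```python
import random


def local_train(weights, data, lr=0.2, epochs=5):
    """Client side: a few epochs of gradient descent on y ~ w*x + b (toy model)."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def fed_avg(client_weights, client_sizes):
    """Server side: average client models, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return tuple(
        sum(cw[i] * n for cw, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    )


def make_client_data(rng, n, x_lo, x_hi):
    """Hypothetical local dataset drawn from y = 2x + 1 with small noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(x_lo, x_hi)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.01)))
    return data


rng = random.Random(0)
# Three clients whose inputs cover different ranges (non-identical local data).
clients = [
    make_client_data(rng, 20, -1.0, 0.0),
    make_client_data(rng, 40, 0.0, 1.0),
    make_client_data(rng, 30, -0.5, 0.5),
]

global_model = (0.0, 0.0)  # initial global model (w, b)
for round_idx in range(100):
    # Each client starts from the current global model and trains locally.
    updates = [local_train(global_model, data) for data in clients]
    # The server aggregates the client models into the next global model.
    global_model = fed_avg(updates, [len(d) for d in clients])

print("global model after training:", global_model)
```

Only model parameters cross the client boundary in this sketch; the `(x, y)` pairs never leave the client that generated them, which is the property the text emphasizes.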

Though used heavily for federated deep learning, NVFlare is a generic approach for supporting collaborative computing across multiple clients. NVFlare provides the Controller programming API for researchers to create workflows for coordinating clients for collaboration. FedAvg is one such workflow. Another example is cyclic weight transfer [ ]. The central concept of collaboration is the notion of “task”. An FL controller assigns tasks (e.g., deep-learning training with model weights) to one or more FL clients and processes results returned from clients (e.g., model weight updates). The controller may assign additional tasks to clients based on the processed results and other factors (e.g., a pre-configured number of training rounds). This task-based interaction continues until the objectives of the study are achieved. The API supports typical controller-client interaction patterns like broadcasting a task to multiple clients, sending a task to one or more specified clients, or relaying a task to multiple clients sequentially. Each interaction pattern has two flavors: wait (block until client results are received) or no-wait.

Figure 1: NVFlare job execution. The Controller is a Python object that controls or coordinates the Workers to get a job done. The controller is run on the FL server. A Worker is capable of performing tasks. Workers run on FL clients.
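To make the broadcast, send, and relay patterns concrete, here is a toy, single-process sketch. `ToyController`, its method signatures, and the `make_worker` helper are hypothetical and are not the NVFlare Controller API; they only borrow the pattern names. Only the blocking ("wait") flavor is modeled, and clients are called one after another rather than over a network.

```python
from typing import Any, Callable, Dict, List, Optional

Task = Dict[str, Any]
Worker = Callable[[Task], Task]  # client-side executor: task in, result out


class ToyController:
    """Hypothetical stand-in for a server-side controller (not the NVFlare API)."""

    def __init__(self, workers: Dict[str, Worker]):
        self.workers = workers

    def broadcast_and_wait(self, task: Task,
                           targets: Optional[List[str]] = None) -> Dict[str, Task]:
        # Every target receives its own copy of the same task;
        # results are collected per client.
        names = targets or list(self.workers)
        return {name: self.workers[name](dict(task)) for name in names}

    def send_and_wait(self, task: Task, target: str) -> Task:
        # The task goes to exactly one specified client.
        return self.workers[target](dict(task))

    def relay_and_wait(self, task: Task, order: List[str]) -> Task:
        # The result of one client becomes the task of the next client.
        current = dict(task)
        for name in order:
            current = self.workers[name](current)
        return current


def make_worker(increment: int) -> Worker:
    """Hypothetical client task: add a client-specific increment to the value."""
    def worker(task: Task) -> Task:
        return {"name": task["name"], "value": task["value"] + increment}
    return worker


controller = ToyController({
    "site-1": make_worker(1),
    "site-2": make_worker(10),
    "site-3": make_worker(100),
})

# Broadcast: each client works independently from the same starting value.
broadcast_results = controller.broadcast_and_wait({"name": "train", "value": 0})

# Relay: the value is passed from site to site, accumulating each contribution.
relay_result = controller.relay_and_wait(
    {"name": "train", "value": 0}, ["site-1", "site-2", "site-3"]
)
```

In this sketch the broadcast returns one independent result per client, while the relay returns a single result that has passed through all three clients in order, which mirrors the difference between FedAvg-style aggregation and cyclic weight transfer.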

A workflow developer can use these interaction patterns to create innovative workflows. For example, the ScatterAndGather controller (typically used for FedAvg-like algorithms) is implemented with the broadcast_and_wait pattern, and the CyclicController is implemented with the relay_and_wait pattern. The controller API allows the researcher to focus on the control logic without needing to deal with underlying communication issues. Figure 1 shows the principle. Each FL client acts as a worker that simply executes tasks assigned to it (e.g., model training) and returns execution results to the controller. At each task interaction, there can be optional filters that process the task data or results before passing them to the Controller (on the server side) or the task executor (on the client side). The filter mechanism can be used for data privacy protection (e.g., homomorphic encryption/decryption or differential privacy) without having to alter the training algorithms.
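The filter idea can be sketched as a chain of functions applied to a client's result before it leaves the client. The names below (`clip_norm`, `add_gaussian_noise`, `run_filters`, `local_training`) are hypothetical and do not correspond to NVFlare's filter classes. The clipping-plus-noise combination is only in the spirit of differential privacy; the noise scale here is an arbitrary assumption and is not calibrated to any privacy budget.

```python
import random
from typing import Callable, List, Optional, Sequence

Update = List[float]
Filter = Callable[[Update], Update]


def clip_norm(max_norm: float) -> Filter:
    """Filter that rescales an update so its L2 norm is at most max_norm."""
    def apply(update: Update) -> Update:
        norm = sum(v * v for v in update) ** 0.5
        scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
        return [v * scale for v in update]
    return apply


def add_gaussian_noise(sigma: float, seed: Optional[int] = None) -> Filter:
    """Filter that adds zero-mean Gaussian noise to every entry of an update."""
    rng = random.Random(seed)
    def apply(update: Update) -> Update:
        return [v + rng.gauss(0.0, sigma) for v in update]
    return apply


def run_filters(update: Update, filters: Sequence[Filter]) -> Update:
    """Apply the configured filters in order to a task result."""
    for f in filters:
        update = f(update)
    return update


def local_training(weights: Update) -> Update:
    """Hypothetical training step; it has no knowledge of the filters."""
    return [w + 3.0 for w in weights]


raw_update = local_training([0.0, 1.0])   # the unfiltered result
clipped = clip_norm(1.0)(raw_update)      # the same result after clipping only
outgoing = run_filters(
    raw_update,
    [clip_norm(1.0), add_gaussian_noise(sigma=0.01, seed=0)],
)  # what would actually be sent to the server in this sketch
```

The point of the design is visible in `local_training`: it is unchanged whether zero, one, or several filters are configured, so privacy processing can be added or swapped without touching the training code.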

Key Components

NVFlare is built on a componentized architecture that allows FL workloads to move from research and simulation to real-world production deployment. Some of the key components of this SDK include:

• FL Simulator for rapid development and prototyping.
• NVFlare Dashboard for simplified project management, secure provisioning, deployment, and orchestration.
• Reference FL algorithms (e.g., FedAvg, FedProx, SCAFFOLD) and workflows, like scatter and gather, cyclic, etc.
• Privacy preservation with differential privacy, homomorphic encryption, and more.
• Specification-based API for extensibility, allowing customization with pluggable components.
• Tight integration with other learning frameworks like MONAI [3], XGBoost [5], and more.

High-Level Architecture

NVFlare is designed with the idea that less is more, using a specification-based design principle to focus on what is essential. This lets others build what they need for real-world applications by following clear API definitions. FL is an open-ended space. The API-based design allows others to bring their implementations and solutions for various components. Controllers, task executors, and filters are just examples of such extensible components. NVFlare provides an end-to-end operation environment for different personas. It provides a comprehensive provisioning system that creates security credentials for secure communications to enable the easy and secure deployment of FL applications in the real world. It also provides an FL Simulator for running proof-of-concept studies locally. In production mode, the researcher conducts an FL study by submitting jobs with admin commands, either from notebooks or from the NVFlare Console, an interactive command tool. NVFlare provides many commands for system operation and job management. With these commands, one can start and stop a specific client or the entire system, submit new jobs, check the status of jobs, create a job by cloning from an existing one, and much more.

With NVFlare’s component-based design, a job is just a configuration of components needed for the study. For the control logic, the job specifies the controller component to be used and any components required by the controller.
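The idea that a job is a configuration of components can be illustrated with a small sketch. The configuration layout, the `REGISTRY`, the class names, and the `_id` convention for referencing components are all assumptions made for this example; they are not NVFlare's actual job format. The sketch only shows the general mechanism: the job names a controller and the components it needs, and a loader instantiates them.

```python
from typing import Any, Dict, List


class MeanAggregator:
    """Hypothetical component: averages a list of client values."""

    def aggregate(self, values: List[float]) -> float:
        return sum(values) / len(values)


class ToyScatterGather:
    """Hypothetical controller that depends on an aggregator component."""

    def __init__(self, num_rounds: int, aggregator: MeanAggregator):
        self.num_rounds = num_rounds
        self.aggregator = aggregator


# Maps type names used in a job configuration to Python classes (illustrative).
REGISTRY = {
    "MeanAggregator": MeanAggregator,
    "ToyScatterGather": ToyScatterGather,
}

# A job is only data: which controller to use and which components it requires.
job: Dict[str, Any] = {
    "name": "demo_job",
    "components": [
        {"id": "aggregator", "type": "MeanAggregator", "args": {}},
    ],
    "controller": {
        "type": "ToyScatterGather",
        "args": {"num_rounds": 3, "aggregator_id": "aggregator"},
    },
}


def build_controller(job_config: Dict[str, Any]) -> Any:
    """Instantiate the job's components, then the controller that uses them."""
    components = {
        c["id"]: REGISTRY[c["type"]](**c.get("args", {}))
        for c in job_config["components"]
    }
    ctrl_cfg = job_config["controller"]
    args = dict(ctrl_cfg.get("args", {}))
    # Assumed convention: an argument ending in "_id" refers to a component id.
    for key in [k for k in args if k.endswith("_id")]:
        args[key[:-3]] = components[args.pop(key)]
    return REGISTRY[ctrl_cfg["type"]](**args)


controller = build_controller(job)
```

Swapping the aggregation strategy in this sketch would mean registering another class and changing one `type` string in the job, with no change to the controller code.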

3 System Concepts

An NVFlare system is a typical client-server communication system that comprises one or more FL servers, one or more FL clients, and one or more admin clients. The FL servers open two ports for communication with FL clients and admin clients. FL clients and admin clients connect to the opened ports. FL clients and admin clients do not open any ports and do not directly communicate with each other. The following is an overview of the key concepts and objects available in NVFlare and the information that can be passed between them.
