NVIDIA FLARE:
Federated Learning from Simulation to Real-World
Holger R. Roth Yan Cheng Yuhong Wen Isaac Yang Ziyue Xu Yuan-Ting Hsieh
Kristopher Kersten Ahmed Harouni Can Zhao Kevin Lu Zhihong Zhang Wenqi Li
Andriy Myronenko Dong Yang Sean Yang Nicola Rieke Abood Quraini Chester Chen
Daguang Xu Nic Ma Prerna Dogra Mona Flores Andrew Feng
NVIDIA Corporation*
Shanghai, China
Munich, Germany
Bethesda, Santa Clara, USA
Abstract
Federated learning (FL) enables building robust and generalizable AI models by leveraging diverse datasets
from multiple collaborators without centralizing the data. We created NVIDIA FLARE¹ as an open-source
software development kit (SDK) to make it easier for data scientists to use FL in their research and real-world
applications. The SDK includes solutions for state-of-the-art FL algorithms and federated machine
learning approaches, which facilitate building workflows for distributed learning across enterprises and
enable platform developers to create a secure, privacy-preserving offering for multiparty collaboration
utilizing homomorphic encryption or differential privacy. The SDK is a lightweight, flexible, and scalable
Python package. It allows researchers to apply their data science workflows with any training library
(PyTorch, TensorFlow, XGBoost, or even NumPy) in real-world FL settings. This paper introduces the key
design principles of NVFlare and illustrates some use cases (e.g., COVID analysis) with customizable FL
workflows that implement different privacy-preserving algorithms.
1 Introduction
Federated learning (FL) has become a reality for many real-world applications [31]. It enables multinational
collaborations on a global scale to build more robust and generalizable machine learning and AI models. In this
paper, we introduce NVIDIA FLARE (NVFlare), an open-source software development kit (SDK) that makes
it easier for data scientists to collaborate to develop more generalizable and robust AI models by sharing model
weights rather than private data. While FL is attractive in many industries, it is particularly beneficial for healthcare
applications where patient data needs to be protected. For example, FL has been used for predicting clinical
outcomes in patients with COVID-19 [6] or to segment brain lesions in magnetic resonance imaging [35, 34].
NVFlare is not limited to applications in healthcare and is designed to allow cross-silo FL [15] across enterprises
for different industries and researchers.
*Contact: {hroth,yanc,chesterc,daguangx,pdogra,andyf}@nvidia.com
¹ Code is available at https://github.com/NVIDIA/NVFlare.
arXiv:2210.13291v3 [cs.LG] 28 Apr 2023
In recent years, several efforts (both open-source and commercial) have been made to bring FL technology
into the healthcare sector and other industries, like TensorFlow Federated [1], PySyft [44], FedML [11], FATE [23],
Flower [2], OpenFL [30], Fed-BioMed [36], IBM Federated Learning [24], HP Swarm Learning [38],
FederatedScope [40], FLUTE [7], and more. Some focus on simulated FL settings for researchers, while others prioritize
production settings. NVFlare aims to be useful for both scenarios: 1) for researchers, by providing efficient and
extensible simulation tools, and 2) for production use, by providing an easy path to transfer research into real-world
settings, with support for high availability and server failover and additional productivity tools such as multi-tasking
and admin commands.
2 NVIDIA FLARE Overview
NVIDIA FLARE, or NVFlare for short, stands for “NVIDIA Federated Learning Application Runtime Environment”.
The SDK enables researchers and data scientists to adapt their machine learning and deep learning workflows to
a federated paradigm. It enables platform developers to build a secure, privacy-preserving offering for distributed
multiparty collaboration.
NVFlare is a lightweight, flexible, and scalable FL framework implemented in Python that is agnostic to the
underlying training library. Developers can bring their own data science workflows implemented in PyTorch,
TensorFlow, or even in pure NumPy, and apply them in a federated setting. A typical FL workflow such as the
popular federated averaging (FedAvg) algorithm [25] can be implemented in NVFlare using the following main
steps. Starting from an initial global model, each FL client trains the model on its local data for a number of local
iterations and sends model updates to the server for aggregation. The server then uses the aggregated updates to
update the global model for the next round of training. This process is repeated until the model converges.
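The FedAvg steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions and does not use the NVFlare API: the function names (`local_train`, `aggregate`, `fedavg`), the linear least-squares model, the synthetic client data, and all hyperparameters are hypothetical choices made only for this example. Client results are averaged with weights proportional to the local dataset size.

```python
import random


def local_train(weights, data, lr=0.1, steps=5):
    """Hypothetical local update: a few gradient steps of a linear least-squares fit."""
    w = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def aggregate(results):
    """Server step: average client weights, weighted by local sample count."""
    total = sum(n for _, n in results)
    dim = len(results[0][0])
    return [sum(w[i] * n for w, n in results) / total for i in range(dim)]


def fedavg(global_w, clients, rounds=20):
    """Repeat: clients train from the global model, then the server aggregates."""
    for _ in range(rounds):
        results = [(local_train(global_w, data), len(data)) for data in clients]
        global_w = aggregate(results)
    return global_w


# Synthetic, noise-free data for three clients of different sizes (assumption).
random.seed(0)
TRUE_W = [2.0, -1.0]
clients = []
for n in (50, 80, 120):
    xs = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
    clients.append([(x, sum(t * xi for t, xi in zip(TRUE_W, x))) for x in xs])

final_w = fedavg([0.0, 0.0], clients)
```

In this toy setting every client shares the same underlying model, so the averaged weights approach `TRUE_W`; with heterogeneous client data the behavior of plain averaging can differ.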
Though used heavily for federated deep learning, NVFlare is a generic approach for supporting collaborative
computing across multiple clients. NVFlare provides the Controller programming API for researchers to create
workflows for coordinating clients for collaboration. FedAvg is one such workflow. Another example is cyclic
weight transfer [4]. The central concept of collaboration is the notion of “task”. An FL controller assigns tasks
(e.g., deep-learning training with model weights) to one or more FL clients and processes results returned from
clients (e.g., model weight updates). The controller may assign additional tasks to clients based on the processed
results and other factors (e.g., a pre-configured number of training rounds). This task-based interaction continues
until the objectives of the study are achieved. The API supports typical controller-client interaction patterns like
broadcasting a task to multiple clients, sending a task to one or more specified clients, or relaying a task to multiple
clients sequentially. Each interaction pattern has two flavors: wait (block until client results are received) or no-wait.
A workflow developer can use these interaction patterns to create innovative workflows. For example, the
ScatterAndGather controller (typically used for FedAvg-like algorithms) is implemented with the broadcast_and_wait
pattern, and the CyclicController is implemented with the relay_and_wait pattern. The controller API allows the
researcher to focus on the control logic without needing to deal with underlying communication issues. Figure 1
shows the principle. Each FL client acts as a worker that simply executes tasks assigned to it (e.g., model training)
and returns execution results to the controller. At each task interaction, there can be optional filters that process
the task data or results before passing them to the Controller (on the server side) or task executor (client side). The
filter mechanism can be used for data privacy protection (e.g., homomorphic encryption/decryption or differential
privacy) without having to alter the training algorithms.
Figure 1: NVFlare job execution. The Controller is a Python object that controls or coordinates the Workers to get a
job done. The controller is run on the FL server. A Worker is capable of performing tasks. Workers run on FL clients.
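To make the difference between the broadcast and relay patterns concrete, here is a small in-process sketch. It is not the NVFlare Controller API: `ToyController`, `ToyClient`, `add_one`, and `clip_filter` are hypothetical names, the method names are borrowed from the pattern names in the text only for readability, and real communication, timeouts, and the no-wait flavors are left out. The filter shown is a simple value clip standing in for a privacy filter; it is applied on the client side only, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Filter = Callable[[dict], dict]


@dataclass
class ToyClient:
    """Hypothetical worker: runs an executor, with optional filters around it."""
    name: str
    executor: Callable[[str, dict], dict]
    task_filters: List[Filter] = field(default_factory=list)
    result_filters: List[Filter] = field(default_factory=list)

    def run(self, task_name: str, data: dict) -> dict:
        for f in self.task_filters:      # filter task data before execution
            data = f(data)
        result = self.executor(task_name, data)
        for f in self.result_filters:    # filter results before they leave the client
            result = f(result)
        return result


class ToyController:
    """Hypothetical controller that coordinates clients through tasks."""

    def __init__(self, clients: List[ToyClient]):
        self.clients = clients

    def broadcast_and_wait(self, task_name: str, data: dict) -> Dict[str, dict]:
        # Same task data goes to every client; collect all results.
        return {c.name: c.run(task_name, dict(data)) for c in self.clients}

    def relay_and_wait(self, task_name: str, data: dict) -> dict:
        # Each client's result becomes the next client's task data.
        for c in self.clients:
            data = c.run(task_name, dict(data))
        return data


def add_one(task_name: str, data: dict) -> dict:
    """Stand-in for local training: increments a single number."""
    return {"value": data["value"] + 1.0}


def clip_filter(limit: float) -> Filter:
    """Stand-in privacy filter: clips the reported value to [-limit, limit]."""
    def apply(result: dict) -> dict:
        return {"value": max(-limit, min(limit, result["value"]))}
    return apply


plain = ToyController([ToyClient(f"site-{i}", add_one) for i in (1, 2, 3)])
broadcast = plain.broadcast_and_wait("train", {"value": 0.0})
relayed = plain.relay_and_wait("train", {"value": 0.0})

clipped = ToyController(
    [ToyClient(f"site-{i}", add_one, result_filters=[clip_filter(0.5)]) for i in (1, 2, 3)]
)
clipped_broadcast = clipped.broadcast_and_wait("train", {"value": 0.0})
```

Note that `add_one` is unchanged between the plain and clipped runs: the filter is attached to the client, which mirrors the point that privacy processing can be added without altering the training code.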
Key Components
NVFlare is built on a componentized architecture that allows FL workloads to move from
research and simulation to real-world production deployment. Some of the key components of this SDK include:
• FL Simulator for rapid development and prototyping.
• NVFlare Dashboard for simplified project management, secure provisioning, deployment, and orchestration.
• Reference FL algorithms (e.g., FedAvg, FedProx, SCAFFOLD) and workflows, like scatter and gather, cyclic, etc.
• Privacy preservation with differential privacy, homomorphic encryption, and more.
• Specification-based API for extensibility, allowing customization with pluggable components.
• Tight integration with other learning frameworks like MONAI [3], XGBoost [5], and more.
High-Level Architecture
NVFlare is designed with the idea that less is more, using a specification-based design
principle to focus on what is essential. Users can build what they need for real-world
applications by following clear API definitions. FL is an open-ended space. The API-based design allows others
to bring their implementations and solutions for various components. Controllers, task executors, and filters
are just examples of such extensible components. NVFlare provides an end-to-end operation environment for
different personas. It provides a comprehensive provisioning system that creates security credentials for secure
communications to enable the easy and secure deployment of FL applications in the real world. It also provides an
FL Simulator for running proof-of-concept studies locally. In production mode, the researcher conducts an FL study
by submitting jobs with admin commands, issued either from notebooks or from the NVFlare Console, an interactive
command tool. NVFlare provides many commands for system operation and job management. With these commands, one
can start and stop a specific client or the entire system, submit new jobs, check the status of jobs, create a job by
cloning from an existing one, and much more.
With NVFlare’s component-based design, a job is just a configuration of components needed for the study. For
the control logic, the job specifies the controller component to be used and any components required by the controller.
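The idea that a job is a configuration of components can be illustrated with a small sketch. The layout below is hypothetical and is not NVFlare's actual job format: the keys (`components`, `controller`, `type`, `args`), the `@id` reference convention, and the classes `WeightedAverage` and `RoundBasedController` are assumptions made for this example only. The point is that the job names a controller and the components it requires, and a loader wires them together.

```python
class WeightedAverage:
    """Hypothetical aggregator component."""

    def __init__(self, expected_clients):
        self.expected_clients = expected_clients


class RoundBasedController:
    """Hypothetical controller that depends on an aggregator component."""

    def __init__(self, num_rounds, aggregator):
        self.num_rounds = num_rounds
        self.aggregator = aggregator


# Maps type names used in the job description to Python classes (assumption).
REGISTRY = {
    "WeightedAverage": WeightedAverage,
    "RoundBasedController": RoundBasedController,
}

# Illustrative job description; a string starting with "@" refers to a component id.
JOB = {
    "components": [
        {"id": "aggregator", "type": "WeightedAverage", "args": {"expected_clients": 2}},
    ],
    "controller": {
        "type": "RoundBasedController",
        "args": {"num_rounds": 5, "aggregator": "@aggregator"},
    },
}


def build_job(job):
    """Instantiate the listed components, then the controller that uses them."""
    components = {
        spec["id"]: REGISTRY[spec["type"]](**spec["args"]) for spec in job["components"]
    }
    ctrl = job["controller"]
    args = {
        key: components[value[1:]] if isinstance(value, str) and value.startswith("@") else value
        for key, value in ctrl["args"].items()
    }
    return REGISTRY[ctrl["type"]](**args)


controller = build_job(JOB)
```

Swapping the aggregator or the controller then only requires changing the job description, not the code that runs it.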
3 System Concepts
An NVFlare system is a typical client-server communication system that comprises one or more FL servers, one
or more FL clients, and one or more admin clients. The FL servers open two ports for communication with FL
clients and admin clients. FL clients and admin clients connect to the opened ports. FL clients and admin clients
do not open any ports and do not directly communicate with each other. The following is an overview of the key
concepts and objects available in NVFlare and the information that can be passed between them.