Benchmarking GPU and TPU Performance with
Graph Neural Networks
Xiangyang Ju,1 Yunsong Wang,2 Daniel Murnane,3 Nicholas Choma,3 Steven Farrell2
and Paolo Calafiura3
1Physics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
2NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
3Scientific Data Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
E-mail: xju@lbl.gov, yunsongwang@lbl.gov, dtmurnane@lbl.gov,
njchoma@lbl.gov, sfarrell@lbl.gov, pcalafiura@lbl.gov
Abstract: Many artificial intelligence (AI) devices have been developed to accelerate the
training and inference of neural network models. The most common ones are the Graphics
Processing Unit (GPU) and Tensor Processing Unit (TPU). They are highly optimized for
dense data representations. However, sparse representations such as graphs are prevalent in
many domains, including science. It is therefore important to characterize the performance
of available AI accelerators on sparse data. This work analyzes and compares GPU
and TPU performance when training a Graph Neural Network (GNN) developed to solve a
real-life pattern recognition problem. Characterizing this new class of models acting on
sparse data may prove helpful in optimizing the design of deep learning libraries and
future AI accelerators.
arXiv:2210.12247v1 [cs.LG] 21 Oct 2022
Contents
1 Introduction 1
2 Graph Neural Network Benchmark 2
3 Hardware and Software 3
4 Comparison of TPU and GPU 4
5 Profiling and Roofline analysis 5
6 Conclusions 8
1 Introduction
Modern machine learning (ML) plays a critical role in numerous domains, including
computer vision, language processing, and speech and image recognition. Much of this
success is driven by three factors: novel deep learning models, large-scale datasets, and
massive computing power. ML models are getting wider and deeper, now reaching a trillion
trainable parameters [1]. To keep up with the demands of deep learning at the end of Moore's Law,
novel specialized computing devices, collectively known as AI accelerators, are necessary.
The most popular AI accelerator is the Graphics Processing Unit (GPU). GPUs are
optimized for massively parallel execution of simple code blocks, and well-suited for linear
algebra. Similarly, the Tensor Processing Unit (TPU) is optimized for matrix operations [2].
Both GPU and TPU are optimized to operate on dense matrices.
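The dense-versus-sparse distinction can be made concrete with a toy example. The same graph aggregation can be written either as a dense adjacency-matrix multiplication (the regular, matmul-shaped workload GPUs and TPUs excel at) or as a gather/scatter over an edge list (the irregular memory-access pattern typical of GNN workloads). The sketch below, using NumPy for illustration only (the benchmark itself is not NumPy-based), shows both formulations producing identical results:

```python
import numpy as np

# Toy graph: 4 nodes with 2 features each, 4 directed edges as an edge list.
edge_src = np.array([0, 1, 2, 3])
edge_dst = np.array([1, 2, 3, 0])
x = np.arange(8.0).reshape(4, 2)  # node feature matrix

# Dense formulation: materialize the full adjacency matrix and aggregate
# neighbor features with a matmul -- the operation accelerators optimize for.
A = np.zeros((4, 4))
A[edge_dst, edge_src] = 1.0
dense_out = A @ x

# Sparse formulation: gather features along edges, then scatter-add into the
# destination nodes -- irregular, data-dependent memory access, no matmul.
sparse_out = np.zeros_like(x)
np.add.at(sparse_out, edge_dst, x[edge_src])

assert np.allclose(dense_out, sparse_out)  # same aggregation, different cost profile
```

For realistic graphs the adjacency matrix is overwhelmingly zero, so the dense form wastes compute and memory, while the sparse form trades matmul throughput for scatter/gather bandwidth; this tension is what the benchmark probes.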
MLPerf [3] is a machine learning benchmark suite that has gained industry-wide sup-
port and recognition. It includes computer vision, language processing, recommendation
systems, and gaming applications. Other ML benchmarks have been proposed, such as
ParaDNN [4], which focuses on fully connected, convolutional, and recurrent neural
networks. These benchmarks, while essential for fair comparisons among different archi-
tectures, do not capture the performance characteristics of models (such as Graph Neural
Networks) that operate on sparse and irregular datasets. Sparse datasets are common
in science applications such as molecular dynamics, genomics, and High Energy Physics
(HEP). Graph representations and GNNs have seen rapidly growing adoption in these
science domains; Ref. [5] reviews GNN applications in HEP. This work uses as a benchmark
a GNN model that solves a combinatorially hard pattern recognition problem on Large
Hadron Collider data.
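To give a flavor of this kind of pattern recognition task (the actual benchmark model is described in Section 2): detector hits become graph nodes, candidate hit pairs become edges, and the model scores each edge as belonging to the same particle track or not. The sketch below is a hypothetical, heavily simplified stand-in, not the authors' architecture; the edge-scoring function and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5 detector hits with 2 features each; 4 candidate
# edges (hit pairs) to classify as same-track (score near 1) or not.
hits = rng.normal(size=(5, 2))
edge_src = np.array([0, 1, 2, 0])
edge_dst = np.array([1, 2, 3, 4])

def edge_scores(h, src, dst, w):
    """Score each candidate edge from the concatenated endpoint features."""
    pair = np.concatenate([h[src], h[dst]], axis=1)  # (n_edges, 4)
    return 1.0 / (1.0 + np.exp(-pair @ w))           # sigmoid, one score per edge

w = rng.normal(size=4)  # stand-in for learned weights
scores = edge_scores(hits, edge_src, edge_dst, w)
print(scores.shape)  # one score per candidate edge: (4,)
```

Thresholding such edge scores partitions the hit graph into track candidates; the combinatorial hardness comes from the number of candidate edges growing rapidly with detector occupancy.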
This paper is organized as follows. Section 2 describes the benchmark GNN model
and the dataset used for training it. Section 3 introduces the hardware platforms used for
this study. Section 4 provides a thorough comparison of TPU and GPU. The performance
analysis is described in Section 5. Outlook and conclusions are in Section 6.