Scalable Coherent Optical Crossbar Architecture using PCM for AI Acceleration Dan Sturm

Scalable Coherent Optical Crossbar Architecture
using PCM for AI Acceleration
Dan Sturm
Electrical and Computer Engineering
University of Washington
Seattle, USA
dansturm@uw.edu
Sajjad Moazeni
Electrical and Computer Engineering
University of Washington
Seattle, USA
smoazeni@uw.edu
Abstract—Optical computing has been recently proposed as
a new compute paradigm to meet the demands of future
AI/ML workloads in datacenters and supercomputers. However,
proposed implementations so far suffer from a lack of scalability,
large footprints, high power consumption, and incomplete
system-level architectures, which prevents their integration into existing
datacenter architectures for real-world applications. In this work,
we present a truly scalable optical AI accelerator based on a
crossbar architecture. We consider all major roadblocks
and address them in this design. Weights are stored on
chip using phase change material (PCM) that can be mono-
lithically integrated in silicon photonic processes. All electro-
optical components and circuit blocks are modeled based on
measured performance metrics in a 45nm monolithic silicon
photonic process, which can be co-packaged with advanced
CPU/GPUs and HBM memories. We also present a system-level
modeling and analysis of our chip’s performance for ResNet-50
v1.5, considering all critical parameters, including memory
size, array size, photonic losses, and energy consumption of
peripheral electronics. Both on-chip SRAM and off-chip DRAM
energy overheads have been considered in this modeling. We
additionally address how using a dual-core crossbar design can
eliminate programming time overhead at practical SRAM block
sizes and batch sizes. Our results show that the proposed 128 × 128
architecture can achieve inferences per second (IPS) similar to an
Nvidia A100 GPU at 15.4× lower power and 7.24× lower area.
Index Terms—Optical Neural Networks, AI Accelerator, Cross-
bar, Phase Change Material, System-level Optimization
I. INTRODUCTION
Recent advancements in artificial intelligence (AI) and ma-
chine learning (ML) have been challenging our conventional
computing paradigms by demanding enormous computing
power at a dramatically faster pace than Moore’s law [1]. We
can compare the performance of today’s AI/ML processors
along two key axes: compute power, in tera-operations
per second (TOPS), and energy efficiency (TOPS/W),
as illustrated in Fig. 1. Despite the promising success of neuro-
morphic and analog-based computing in electrical domains for
low TOPS applications (edge-computing), these approaches
cannot satisfy the requirements of datacenters and supercomputers.
Due to fundamental bandwidth limitations, they cannot
achieve high throughput. Optical neural networks (ONNs)
can potentially overcome this barrier by providing tens of
GHz bandwidths and ultra-low losses of photonic integrated
circuits [2], [3]. However, realizing a practical ONN-based
AI accelerator requires a holistic system design that considers
devices, circuits, chip architectures, and algorithms.
Fig. 1: Comparison of state-of-the-art AI/ML processors.
In this paper, we present a novel architecture for an ONN
accelerator in which the multiply-and-accumulate (MAC) op-
eration is performed on a coherent photonic crossbar with
programmable phase-change materials (PCM). PCM enables
low power photonic computing by storing the weights on-chip
in a nonvolatile fashion. This design provides, for the first time, a compact and
scalable solution that relies only minimally
on thermo-optic phase shifters. We have considered all
critical circuit blocks and parameters, including memory size,
array size, photonic losses, and energy consumption of periph-
eral electronics including analog-to-digital converters (ADCs),
digital-to-analog converters (DACs), serializers, and clocking.
We develop a custom simulation framework based on existing
cycle-accurate simulation tools to model the compute cycles,
programming cycles, and DRAM accesses for a given neural
network running on a specific set of parameters (including size
of SRAM, array, and batch).
The presented work focuses on inference of convolutional neural
networks (CNNs) such as ResNet-50 v1.5, which is used as a
benchmark to compare the performance of our proposed accelerator
with the state of the art. Additionally, while precision and process
variation are major factors in all analog-based computers, here
we assume INT6 precision for all components, as it has
been shown to be sufficient for neural networks with high
accuracy [4], [5].
arXiv:2210.10851v1 [cs.AR] 19 Oct 2022
This paper is organized as follows: We briefly describe
related work in Section II. In Section III, we explain the
principles of performing the MAC operation in this work.
We present an overview of the overall chip architecture and CNN
operation in Section IV, and Section V explains our custom
simulation methodology. Finally, we present the results of our
fully optimized design in Section VI, and compare them with
the state of the art in Section VIII.
II. RELATED WORK
Researchers have recently proposed a variety of methods
to realize an ONN. Most of these works focus only on
the physics and devices rather than providing a system-level
solution and analysis. In this discussion, we only
consider integrated solutions, as free-space solutions [6] lack
the reconfigurability that is an essential part of any “computer”
and compatibility with mainstream CMOS technology. Fur-
thermore, we note that a suitable application space for ONNs
can be datacenters as opposed to edge computing according to
Fig. 1. Below we discuss the most promising solutions so far
from the perspective of three critical factors that we believe
have been addressed in this work:
(1) Scalability: While on-chip photonics provide high band-
widths, their footprints are fundamentally orders of magnitude
larger than advanced nm-scale CMOS. With only one or
two routing layers, building an ONN processor with large
dimensions has remained elusive. This will become even
more challenging considering the need for compact ADCs and
DACs. Mach-Zehnder interferometer (MZI)-based coherent
architectures such as [2] have large chip areas, and large-scale
realizations end up exceeding a few cm². Non-coherent PCM-based
crossbars have also been proposed [7]; however, they
require many wavelengths for large matrix operations, which
is impractical. Time-multiplexed coherent arrays [8], [9] also
require 2D arrays of free-space detectors and yet only perform
vector-vector multiplication in one clock cycle.
(2) Monolithic Integration: Since any practical computing
system will eventually require high-density CMOS electronics
for I/O and memory, we argue that electronics and photonics
should be monolithically integrated on a single chip using
processes such as GF 45CLO [10]. While 3D integration
is typically proposed as an alternative, existing advanced
3D integration technologies (micro-bumps at 55µm pitch [])
cannot provide the density needed for optical computing applications.
(3) Full Architecture-level Modeling and Optimization:
Practical AI accelerators, including optical processors, should
be modeled at the system level. This has been previously
discussed in [11]; however, accessing DRAM through a PCIe
switch incurs large energy and latency overheads. We
elaborate on this aspect here, and model the system with co-packaged
high-bandwidth memory (HBM), similar to state-of-the-art
AI accelerators. In addition, we discuss impacts and
trade-offs between the programming time, batch size, and
multiple cores in this work.
Fig. 2: Photonic crossbar array with peripheral electronics for
transmitter and receiver
III. PROPOSED CROSSBAR DESIGN FOR OPTICAL MAC
The overall proposed crossbar ONN is shown in Fig. 2.
Below, we describe two key components of this design:
A. Analog-based Optical MAC Core
The crossbar design is an N × M array of PCM-based unit
cells. Details of these unit cells and how the array performs
the MAC operation are briefly described below.
1) PCM-Based Unit Cell: Each unit cell multiplies an input
electric field (E-field) by the weight programmed into the PCM
section, and adds this product to the electric field arriving
through its column. To do so, a portion of the light in
the row waveguide (the E-field into each row is denoted by $|E_{in,i}|$
in Fig. 3) is coupled into a bent waveguide via a
directional coupler (DC) with a cross-coupling ratio of $k_{in,j}$
(the input coupling strength is column-dependent). The portion
of the E-field that does not couple into the unit cell passes
through a multi-mode interference (MMI) waveguide crossing
junction and enters the next column of unit cells to the right.
Each bent waveguide has a µm-long section covered with
PCM. Individual PCM cells can be programmed electrically to
be in the amorphous state, the crystalline state, or somewhere
in between, in a non-volatile fashion [7], [8]. Programming
energy is estimated to be around 100pJ [7], [8]. The state
of the PCM changes the absorption coefficient, and hence
the amplitude of the E-field. Consequently, if the PCM’s
programmed transmission in the E-field domain is $w_{i,j}$, the
E-field at the end of each bent waveguide will be
$|E_{in,i}| \times k_{in,j} \times w_{i,j}$, which we refer to as $E_{p,(i,j)}$.
This product is coupled into a column waveguide via another DC (with
coupling ratio $k_{out,i}$) and travels down to the next row,
where it is coherently summed with that row's
product in the DC region of the same column.
The output coupling strength is row-dependent.
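The unit-cell behavior described above can be sketched numerically. The following is a minimal, idealized sketch and not the authors' model: it assumes all fields arrive in phase, and it ignores waveguide and crossing losses, the depletion of the row field as it passes successive columns, and the through-transmission of the column couplers. The function names are hypothetical. The second helper assumes that the roughly 100 pJ programming energy applies per PCM cell, under which reprogramming every cell of a 128 × 128 array costs about 1.64 µJ.

```python
def crossbar_mac(e_in, k_in, k_out, w):
    """Idealized column outputs of an N x M coherent crossbar.

    e_in  : N row input field amplitudes |E_in,i|
    k_in  : M input coupling ratios k_in,j (column-dependent)
    k_out : N output coupling ratios k_out,i (row-dependent)
    w     : N x M PCM field transmissions w_i,j, each in [0, 1]

    Assumption (not from the paper): perfectly in-phase, lossless
    summation, so each column output is a plain weighted sum.
    """
    n, m = len(e_in), len(k_in)
    outputs = []
    for j in range(m):
        total = 0.0
        for i in range(n):
            # E_p,(i,j) = |E_in,i| * k_in,j * w_i,j, as defined in the text
            e_p = e_in[i] * k_in[j] * w[i][j]
            # Each product is coupled into column j with ratio k_out,i
            total += k_out[i] * e_p
        outputs.append(total)
    return outputs


def array_programming_energy_joules(rows, cols, energy_per_cell_j=100e-12):
    """Energy to program every PCM cell once.

    Assumption: the ~100 pJ estimate quoted in the text is per cell.
    """
    return rows * cols * energy_per_cell_j


if __name__ == "__main__":
    e_in = [1.0, 0.5]                         # two rows
    k_in = [0.5, 0.5, 0.5]                    # three columns
    k_out = [0.5, 0.5]
    w = [[1.0, 0.0, 0.5],
         [0.0, 1.0, 0.5]]
    print(crossbar_mac(e_in, k_in, k_out, w))
    print(array_programming_energy_joules(128, 128))
```

In this simplified picture each column output is proportional to the dot product of the input vector with one column of the weight matrix, so a single pass through the array performs a full vector-matrix multiplication.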