Scalable Coherent Optical Crossbar Architecture using PCM for AI Acceleration Dan Sturm

Scalable Coherent Optical Crossbar Architecture

using PCM for AI Acceleration

Dan Sturm

Electrical and Computer Engineering

University of Washington

Seattle, USA

dansturm@uw.edu

Sajjad Moazeni

Electrical and Computer Engineering

University of Washington

Seattle, USA

smoazeni@uw.edu

Abstract—Optical computing has recently been proposed as a new compute paradigm to meet the demands of future AI/ML workloads in datacenters and supercomputers. However, implementations proposed so far suffer from a lack of scalability, large footprints, high power consumption, and incomplete system-level architectures, which prevents their integration into existing datacenter architectures for real-world applications. In this work, we present a truly scalable optical AI accelerator based on a crossbar architecture. We have considered all major roadblocks and address them in this design. Weights are stored on chip using phase change material (PCM) that can be monolithically integrated in silicon photonic processes. All electro-optical components and circuit blocks are modeled based on measured performance metrics in a 45nm monolithic silicon photonic process, which can be co-packaged with advanced CPU/GPUs and HBM memories. We also present system-level modeling and analysis of our chip's performance for ResNet-50 v1.5, considering all critical parameters, including memory size, array size, photonic losses, and energy consumption of peripheral electronics. Both on-chip SRAM and off-chip DRAM energy overheads are considered in this modeling. We additionally address how a dual-core crossbar design can eliminate programming time overhead at practical SRAM block sizes and batch sizes. Our results show that the proposed architecture with a 128 × 128 array can achieve inferences per second (IPS) similar to an Nvidia A100 GPU at 15.4× lower power and 7.24× lower area.

Index Terms—Optical Neural Networks, AI Accelerator, Crossbar, Phase Change Material, System-level Optimization

I. INTRODUCTION

Recent advancements in artificial intelligence (AI) and machine learning (ML) have been challenging our conventional computing paradigms by demanding enormous computing power at a dramatically faster pace than Moore's law [1]. We can compare the performance of today's AI/ML processors along two key axes: compute throughput, in tera-operations per second (TOPS), and energy efficiency (TOPS/W), as illustrated in Fig. 1. Despite the promising success of neuromorphic and analog computing in the electrical domain for low-TOPS applications (edge computing), these approaches cannot satisfy the requirements of datacenters and supercomputers: due to fundamental bandwidth limitations, they cannot achieve high throughput. Optical neural networks (ONNs) can potentially overcome this barrier by exploiting the tens-of-GHz bandwidths and ultra-low losses of photonic integrated circuits [2], [3]. However, realizing a practical ONN-based AI accelerator requires a holistic system design that considers devices, circuits, chip architectures, and algorithms.

Fig. 1: Comparison of state-of-the-art AI/ML processors.

In this paper, we present a novel architecture for an ONN accelerator in which the multiply-and-accumulate (MAC) operation is performed on a coherent photonic crossbar with programmable phase-change materials (PCM). PCM enables low-power photonic computing by storing the weights on-chip in a nonvolatile fashion. This design provides, for the first time, a compact and scalable solution that relies only minimally on thermo-optic phase shifters. We have considered all critical circuit blocks and parameters, including memory size, array size, photonic losses, and the energy consumption of peripheral electronics, including analog-to-digital converters (ADCs), digital-to-analog converters (DACs), serializers, and clocking. We develop a custom simulation framework based on existing cycle-accurate simulation tools to model the compute cycles, programming cycles, and DRAM accesses for a given neural network running with a specific set of parameters (including SRAM size, array size, and batch size).

arXiv:2210.10851v1 [cs.AR] 19 Oct 2022

The presented work focuses on inference of convolutional neural networks (CNNs) such as ResNet-50 v1.5, which is used as a benchmark to compare the performance of our proposed accelerator with the state of the art. Additionally, while precision and process variation are major factors in all analog computers, here we assume INT6 precision for all components, as it has been shown to be sufficient for neural networks with high accuracy [4], [5].
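As a rough illustration of what 6-bit precision means for a weight or activation, the sketch below maps a real value onto one of 64 integer codes and back. The symmetric signed mapping, the function names, and the `max_abs` full-scale parameter are assumptions made for this example; the text above does not specify how values are mapped onto INT6 codes.

```python
def quantize_int6(x: float, max_abs: float) -> int:
    """Map a real value to a signed 6-bit code in [-32, 31].

    Assumption: symmetric uniform quantization where +/-max_abs maps to +/-31.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    code = round(x * 31 / max_abs)
    # Clamp values outside the representable range.
    return max(-32, min(31, code))


def dequantize_int6(code: int, max_abs: float) -> float:
    """Recover the real value represented by a 6-bit code."""
    return code * max_abs / 31


if __name__ == "__main__":
    full_scale = 1.0
    for value in (0.0, 0.3, 1.0, 2.0):
        code = quantize_int6(value, full_scale)
        print(value, code, dequantize_int6(code, full_scale))
```

With this mapping the quantization step is `max_abs / 31`, so the worst-case rounding error for an in-range value is half of that step.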

This paper is organized as follows. We briefly describe related work in Section II. In Section III, we explain the principles of performing the MAC operation in this work. We present an overview of the overall chip architecture and CNN operation in Section IV, and Section V explains our custom simulation methodology. Finally, we present the results of our fully optimized design in Section VI and compare them with the state of the art in Section VIII.

II. RELATED WORK

Researchers have recently proposed a variety of methods to realize an ONN. Most of these works focus only on the physics and devices rather than providing a system-level solution and analysis. In this discussion, we consider only integrated solutions, as free-space solutions [6] lack both the reconfigurability that is an essential part of any "computer" and compatibility with mainstream CMOS technology. Furthermore, we note that, according to Fig. 1, datacenters are a more suitable application space for ONNs than edge computing. Below, we discuss the most promising solutions so far from the perspective of three critical factors that we believe have been addressed in this work:

(1) Scalability: While on-chip photonics provide high bandwidths, their footprints are fundamentally orders of magnitude larger than those of advanced nm-scale CMOS. With only one or two routing layers available, building an ONN processor with large dimensions has remained elusive. This becomes even more challenging considering the need for compact ADCs and DACs. Mach-Zehnder interferometer (MZI)-based coherent architectures such as [2] have large chip areas, and large-scale realizations end up exceeding a few cm². Non-coherent PCM-based crossbars have also been proposed [7]; however, they require many wavelengths for large matrix operations, which is impractical. Time-multiplexed coherent arrays [8], [9] also require 2D arrays of free-space detectors and yet perform only a vector-vector multiplication in one clock cycle.

(2) Monolithic Integration: Since any practical computing system will eventually require high-density CMOS electronics for I/O and memory, we argue that electronics and photonics should be monolithically integrated on a single chip using processes such as GF 45CLO [10]. While 3D integration is typically proposed as an alternative, existing advanced 3D integration technologies (micro-bumps at 55µm pitch []) cannot provide the density required for optical computing applications.

(3) Full Architecture-level Modeling and Optimization: Practical AI accelerators, including optical processors, should be modeled at the system level. This has been previously discussed in [11]; however, accessing DRAM through a PCIe switch incurs large energy and latency overheads. We elaborate on this aspect here and model the system with co-packaged high-bandwidth memory (HBM), similar to state-of-the-art AI accelerators. In addition, we discuss the impacts of, and trade-offs between, programming time, batch size, and multiple cores in this work.
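The interplay between programming time, batch size, and multiple cores can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not the simulation framework used in this work: it assumes the weights are loaded as a sequence of tiles, that each tile is reused for the whole batch before the next one is needed, and that a second core can be programmed with the next tile while the first one computes. The function names and all numbers are hypothetical.

```python
from typing import Sequence


def single_core_time(t_prog: Sequence[int], t_comp: Sequence[int], batch: int) -> int:
    """One core: each tile is programmed, then used for the whole batch."""
    return sum(p + batch * c for p, c in zip(t_prog, t_comp))


def dual_core_time(t_prog: Sequence[int], t_comp: Sequence[int], batch: int) -> int:
    """Two cores: tile k+1 is programmed while tile k is computing.

    Assumption: programming and computing overlap perfectly, so each step
    costs the longer of the two. Only the first programming is exposed.
    """
    total = t_prog[0]
    for k in range(len(t_comp)):
        compute = batch * t_comp[k]
        next_prog = t_prog[k + 1] if k + 1 < len(t_prog) else 0
        total += max(compute, next_prog)
    return total


if __name__ == "__main__":
    # Hypothetical per-tile programming and per-input compute times (arbitrary units).
    prog = [10, 10, 10]
    comp = [1, 1, 1]
    for batch in (4, 16):
        print(batch, single_core_time(prog, comp, batch), dual_core_time(prog, comp, batch))
```

In this model, once `batch * t_comp` is at least as long as the programming time of the next tile, the only exposed programming cost is the first tile, which is the sense in which a second core hides programming overhead at a large enough batch size.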

Fig. 2: Photonic crossbar array with peripheral electronics for transmitter and receiver.

III. PROPOSED CROSSBAR DESIGN FOR OPTICAL MAC

The overall proposed crossbar ONN is shown in Fig. 2.

Below, we describe two key components of this design:

A. Analog-based Optical MAC Core

The crossbar design is an N × M array of PCM-based unit cells. The details of these unit cells and how the array performs the MAC operation are briefly described below.

1) PCM-Based Unit Cell: Each unit cell multiplies an input electric field (E-field) by the weight programmed into the PCM section and adds this product to an externally input electric field traveling through each column. To do so, a portion of the light in the row waveguide (the E-field into each row is denoted by $|E_{\mathrm{in},i}|$ in Fig. 3) is coupled into a bent waveguide via a directional coupler (DC) with a cross-coupling ratio of $k_{\mathrm{in},j}$ (the input coupling strength is column-dependent). The portion of the E-field that does not couple into the unit cell passes through a multi-mode interference (MMI) waveguide crossing and enters the next column of unit cells to the right. Each bent waveguide has a µm-long section covered with PCM. Individual PCM cells can be programmed electrically into the amorphous state, the crystalline state, or an intermediate state, in a non-volatile fashion [7], [8]. The programming energy is estimated to be around 100pJ [7], [8]. The state of the PCM changes its absorption coefficient and hence the amplitude of the E-field. Consequently, if the PCM's programmed transmission in the E-field domain is $w_{i,j}$, the E-field at the end of each bent waveguide is $|E_{\mathrm{in},i}| \times k_{\mathrm{in},j} \times w_{i,j}$, which we refer to as $E_{p,(i,j)}$. This product is coupled into a column waveguide via another DC (with coupling ratio $k_{\mathrm{out},i}$) and travels down to the next row, where it is coherently summed with that row's product in the DC region of each column. The output coupling strength is row-dependent.
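The per-cell product described above can be written out numerically. The sketch below computes E_p,(i,j) = |E_in,i| × k_in,j × w_i,j for every unit cell and then adds the products down each column. The column sum is an idealization introduced for this example: it assumes all contributions arrive in phase with no loss and does not model the row-dependent output coupling k_out,i, whose exact form is not given in this excerpt. Function names and values are hypothetical.

```python
from typing import List, Sequence


def unit_cell_products(
    e_in: Sequence[float], k_in: Sequence[float], w: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Return E_p[i][j] = |E_in,i| * k_in,j * w[i][j] for an N x M array."""
    n, m = len(e_in), len(k_in)
    if len(w) != n or any(len(row) != m for row in w):
        raise ValueError("w must have one row per input and one column per k_in")
    return [[e_in[i] * k_in[j] * w[i][j] for j in range(m)] for i in range(n)]


def column_sums(e_p: Sequence[Sequence[float]]) -> List[float]:
    """Idealized accumulation: add the products down each column.

    Assumption: in-phase, lossless summation; k_out,i is not modeled.
    """
    return [sum(column) for column in zip(*e_p)]


if __name__ == "__main__":
    e_in = [1.0, 0.5]               # row input field amplitudes
    k_in = [0.5, 0.5]               # column-dependent input coupling
    w = [[1.0, 0.5], [0.0, 1.0]]    # PCM transmissions in the E-field domain
    e_p = unit_cell_products(e_in, k_in, w)
    print(e_p)
    print(column_sums(e_p))
```

Each column sum is the dot product of the input vector with one column of the weight matrix, scaled by that column's input coupling, which is why a single pass through the array yields a vector-matrix product.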
