
been shown to be sufficient for neural networks with high
accuracy [4], [5].
This paper is organized as follows: We briefly describe
related work in Section II. In Section III, we explain the
principles of performing the MAC operation in this work.
We present an overview of the overall chip architecture and CNN
operation in Section IV, and Section V explains our custom
simulation methodology. Finally, we present the results of our
fully optimized design in Section VI, and compare them with
the state of the art in Section VIII.
II. RELATED WORK
Researchers have recently proposed a variety of methods
to realize an ONN. Most of these works focus only on
the physics and devices rather than providing a system-level
solution and analysis. In this discussion, we only
consider integrated solutions, as free-space solutions [6] lack
the reconfigurability that is an essential part of any “computer”
and compatibility with mainstream CMOS technology. Fur-
thermore, we note that a suitable application space for ONNs
can be datacenters as opposed to edge computing according to
Fig. 1. Below we discuss the most promising solutions so far
from the perspective of three critical factors that we believe
have been addressed in this work:
(1) Scalability: While on-chip photonics provide high band-
widths, their footprints are fundamentally orders of magnitude
larger than advanced nm-scale CMOS. With only one or
two routing layers, building an ONN processor with big
dimensions has remained elusive. This will become even
more challenging considering the need for compact ADCs and
DACs. Mach-Zehnder Interferometer (MZI)-based coherent
architectures such as [2] have large chip areas, and large-scale
realizations end up exceeding a few cm². Non-coherent PCM-based
crossbars have also been proposed [7]; however, they
require many wavelengths for large matrix operations, which
is impractical. Time-multiplexed coherent arrays [8], [9] also
require 2D arrays of free-space detectors and yet only perform
vector-vector multiplication in one clock cycle.
(2) Monolithic Integration: Since any practical computing
system will eventually require high-density CMOS electronics
for I/O and memory, we argue that electronics and photonics
should be monolithically integrated on a single chip using
processes such as GF 45CLO [10]. While 3D integration
is typically proposed as an alternative, existing advanced
3D integration technologies (micro-bumps at 55 µm pitch [])
cannot provide sufficient density for optical computing applications.
(3) Full Architecture-level Modeling and Optimization:
Practical AI accelerators, including optical processors, should
be modeled at the system level. This has been previously
discussed in [11]; however, accessing DRAM through a PCIe
switch incurs large energy and latency overheads. We
elaborate on this aspect here and model the system with
co-packaged high-bandwidth memory (HBM), similar to
state-of-the-art AI accelerators. In addition, we discuss impacts and
trade-offs between the programming time, batch size, and
multiple cores in this work.
Fig. 2: Photonic crossbar array with peripheral electronics for
transmitter and receiver
III. PROPOSED CROSSBAR DESIGN FOR OPTICAL MAC
The overall proposed crossbar ONN is shown in Fig. 2.
Below, we describe two key components of this design:
A. Analog-based Optical MAC Core
The crossbar design is an N×M array of PCM-based unit
cells. Details of these unit cells and how the array performs
the MAC operation are briefly described below.
1) PCM-Based Unit Cell: Each unit cell multiplies an input
electric field (E-field) by the weight programmed into the PCM
section, and adds this product to an externally supplied electric
field through each column. To do so, a portion of the light from
the row waveguide (E-field into each row is denoted by |Ein,i|
in Fig. 3) is coupled into a bent waveguide via a
directional coupler (DC) with a cross-coupling ratio of kin,j
(input coupling strength is column-dependent). The portion
of the E-field that does not couple into the unit cell passes
through a multi-mode interference (MMI) waveguide crossing
junction and enters the next column of unit cells to the right.
Each bent waveguide has a µm-long section covered with
PCM. Individual PCM cells can be programmed electrically to
be either in the amorphous or crystalline state, or somewhere
in between, in a non-volatile fashion [7], [8]. Programming
energy is estimated to be around 100pJ [7], [8]. The state
of the PCM changes its absorption coefficient, and hence
the amplitude of the E-field. Consequently, if the PCM’s
programmed transmission in the E-field domain is wi,j, the
E-field at the end of each bent waveguide will be |Ein,i| ×
kin,j × wi,j, which we refer to as Ep,(i,j). This product is
coupled into a column waveguide via another DC (with
coupling ratio kout,i) and travels down to the next row,
where coherent summation with the next row's products
occurs through the DC region in each column.
The output coupling strength is row-dependent.
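
As a rough numerical illustration of the unit-cell relation above, the following Python sketch evaluates Ep,(i,j) = |Ein,i| × kin,j × wi,j for a small array and then forms an idealized per-column sum. The function names, the example values, and the summation model are assumptions made for illustration only: the sum treats every product as adding in phase, weighted by kout,i, and ignores through-port attenuation in the column waveguide and all propagation loss, none of which is specified in the text above.

```python
def unit_cell_products(e_in, k_in, w):
    """Return E_p[i][j] = |E_in[i]| * k_in[j] * w[i][j] for every unit cell.

    e_in : row input field amplitudes (length N)
    k_in : column-dependent input coupling ratios (length M)
    w    : N x M PCM transmissions in the E-field domain
    """
    return [
        [abs(e_in[i]) * k_in[j] * w[i][j] for j in range(len(k_in))]
        for i in range(len(e_in))
    ]


def column_outputs(e_p, k_out):
    """Idealized coherent sum down each column.

    Assumption (not from the paper): products add in phase, each weighted
    by its row-dependent output coupling k_out[i]; through-port attenuation
    and waveguide loss are ignored.
    """
    n_rows, n_cols = len(e_p), len(e_p[0])
    return [
        sum(k_out[i] * e_p[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]


# Hypothetical 2 x 3 example (values chosen only for easy arithmetic).
e_in = [1.0, 0.5]
k_in = [0.5, 0.5, 0.5]
k_out = [0.5, 0.5]
w = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 1.0]]

e_p = unit_cell_products(e_in, k_in, w)
outputs = column_outputs(e_p, k_out)
print(e_p)      # [[0.5, 0.25, 0.0], [0.125, 0.25, 0.25]]
print(outputs)  # [0.3125, 0.25, 0.125]
```

With uniform coupling ratios, each column output is proportional to the dot product of the input amplitudes with that column of weights, which is the MAC operation the array is meant to perform.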