AnalogVNN: A Fully Modular Framework for Modeling and
Optimizing Photonic Neural Networks
Vivswan Shah and Nathan Youngblood
Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15261
Corresponding author: vivswanshah@pitt.edu and nathan.youngblood@pitt.edu
Abstract: In this paper, we present AnalogVNN, a simulation framework built on PyTorch that
can simulate the effects of optoelectronic noise, limited precision, and signal normalization present
in photonic neural network accelerators. We use this framework to train and optimize linear and
convolutional neural networks with up to 9 layers and ~1.7 million parameters, while gaining
insights into how normalization, activation function, reduced precision, and noise influence
accuracy in analog photonic neural networks. By following the same layer structure design present
in PyTorch, the AnalogVNN framework allows users to convert most digital neural network
models to their analog counterparts with just a few lines of code, taking full advantage of the open-
source optimization, deep learning, and GPU acceleration libraries available through PyTorch.
I. Introduction
In the past decade, there has been exponential growth in the size, complexity, efficiency,
and robustness of deep neural networks (DNNs) [1], [2]. However, the computing resources needed to fuel
this continuous advancement in machine learning have also grown exponentially, much faster
than the performance and efficiency improvements of the hardware used to train networks [2] or
perform inference [3]. DNN computation consists of approximately 90% linear
operations (matrix-vector multiplications or convolutions) and 10% simple nonlinear operations
(e.g., sigmoid, ReLU, tanh) [4]. These linear operations can be processed with high efficiency
and low latency using analog computational schemes by leveraging parallelized circuits in the
analog domain. For example, a single READ operation on a memristor array can perform an entire
matrix-vector multiplication in a single clock cycle [5]–[7]. So, by performing the computation in
the analog domain, we can significantly reduce the training and inference times of neural networks
[8]. One particularly attractive approach is photonic analog computing which promises ultra-low
latency, high energy efficiency, and unparalleled data throughput [9]–[14]. This is possible due to
the enhanced modulation speeds of optical waveguides and fibers over their electronic counterparts,
which suffer from resistive and capacitive losses [15]. Thus, by using photonics, the large-scale
linear operations of neural networks can be performed efficiently at high modulation speeds and
ultra-low latencies, potentially leading to networks with high throughput and real-time processing
[16].
Translating neural network models directly from the digital domain to any other domain
(the photonic analog domain in our case) without any changes to its structure or hyperparameters
will cause a significant reduction in the accuracy and generalizability of the model, since neural
networks specialize to the environment in which they were trained. Simply
transplanting the weights or structure of a network into a new environment with reduced precision or
increased noise is as problematic as its biological counterpart, a head transplant. By
carefully designing neural networks based on their computing environment (digital, analog, or
physical), a significant improvement in the model’s generalizability, accuracy, and ability to learn
the crucial features of the dataset can be seen [17]. For analog DNN accelerators, this can be done
by implementing models on-chip for different hyperparameter combinations and testing each of
them, but this is time-consuming and costly. To overcome this problem, we can simulate
computation in the analog domain and test against various hyperparameters virtually. It has already
been shown that even a crude simulation of the target domain can overcome much of the accuracy
and generalizability problems [17], [18].
We have developed the Analog Virtual Neural Network (AnalogVNN) framework [19] to
do exactly this. Distinct from other approaches which typically focus on modeling the physical
response of the analog hardware in question [8], [17], [18], [20]–[22], we have chosen to abstract
the physical properties of the analog hardware and instead model the effects of an ensemble of analog
computing elements at a higher level (i.e., normalization, limited precision, stochastic rounding,
and additive noise). This approach greatly simplifies the translation of digital neural network
models to the analog domain, while minimizing the additional computational overhead required to
model analog hardware, as illustrated in Figure 1a. We have built AnalogVNN on PyTorch [23] to
easily simulate the effects of optoelectronic noise, limited precision, and signal normalization
present in all photonic analog hardware. While we have designed the AnalogVNN framework with
photonic hardware in mind (e.g., coherent [11], [24], electro-absorptive [25], phase-change [26],
microring resonator [27], and dispersive fiber-based architectures [12] as illustrated in Figure 1b),
the generality of our approach allows all researchers to easily extend our work to other analog
neural networks, such as those based on electronic, magnetic, or spintronic hardware [8], [9], [28]–[30].
The repository for AnalogVNN is available at https://analogvnn.github.io
Sample code: https://analogvnn.github.io/sample_code
II. The Analog Virtual Neural Network (AnalogVNN) Methodology
The photonic analog domain differs from the digital domain in two major ways. First, one
must account for continuous variability due to noise added by physical processes, and second,
the precision is typically limited by photon shot noise to 8 bits or less for the optical powers and
modulation speeds of interest [31]. In the case of photonic weights, physical processes such as
thermal drift in microring resonators or stochastic effects in the programming of phase-change
photonic memory introduce stochastic noise into the weight matrix. Photonic analog inputs, on the
other hand, have been limited to even lower precision in practice (e.g., 4-bit for PAM-16
modulators). As high-speed optical modulators have primarily targeted telecommunication
applications, they are typically designed to generate a limited set of optical amplitudes which
minimize bit error rates from optoelectronic noise and timing jitter. These characteristics can be
abstracted and simulated by adding intermediate layers which intentionally introduce noise and
reduce precision (using Noise and Reduce Precision layers respectively) to a digital model for the
linear analog system (Figure 1c-d). In this way, digital models can efficiently imitate the analog
environment, making it possible to explore and optimize analog hyperparameters more
effectively, and we can begin to
identify hyperparameters that improve the model’s generalizability, accuracy, learning rate, and
ability to learn the crucial features of the dataset [32]–[36].
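To make this concrete, the sketch below shows how such intermediate layers can be written as ordinary PyTorch modules. This is a minimal illustration of the approach rather than AnalogVNN's actual classes, and the precision and noise values are arbitrary:

```python
import torch
import torch.nn as nn

class ReducePrecision(nn.Module):
    """Round a signal to the nearest multiple of 1/levels."""
    def __init__(self, levels: int = 16):
        super().__init__()
        self.levels = levels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.round(x * self.levels) / self.levels

class GaussianNoise(nn.Module):
    """Add zero-mean Gaussian noise, modeling aggregate optoelectronic noise."""
    def __init__(self, std: float = 0.02):
        super().__init__()
        self.std = std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + torch.randn_like(x) * self.std

# A digital linear layer preceded by the analog input effects of Figure 1d.
analog_linear = nn.Sequential(
    ReducePrecision(levels=16),
    GaussianNoise(std=0.02),
    nn.Linear(64, 64),
)
```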
Figure 1: Overview of the AnalogVNN framework. a) Trade-off between the computational burden and accuracy of models used to simulate analog AI hardware. AnalogVNN bridges the gap between fully digital and fully analog frameworks. b) Conceptual illustration of the three main abstraction layers within AnalogVNN: optical modulation (input vector), trainable photonic weights (matrix operation), and optical detection (output vector). The physical implementation of the photonic hardware is abstracted to normalization, noise, and reduced precision layers, allowing our framework to be applied to various photonic architectures. c) Overview of a 3-layer linear model with three Analog Linear and Activation layers. The inner workings of an Analog Linear layer are shown in the red dotted box with its corresponding on-chip photonic implementation. Blue and green arrows represent the forward and backward passes, respectively. d) Illustration of the optoelectronic analog effects modeled during a forward pass through a linear analog layer.
Contributions
Simulating an analog neural network with PyTorch or TensorFlow requires additional
features which are not present in current frameworks. The first
requirement is the ability to create parameterized weights, biases, and layers without affecting the
gradient flow graph of the network. For example, one should be able to add normalization, reduced
precision, and noise to weights, biases, or between layers and still retain the ability to compute gradients as
if these additional layers were not present. Second, training analog network models may require a
new analog optimizer to train efficiently. These optimizers can be created by combining the
properties of the analog domain with well-established digital optimizers (e.g., Adam,
SGD). Nandakumar et al. 2020 [8] show an example of this in which a new Reduce Precision
Optimizer was used to train an on-chip network faster and more efficiently. Third, as stated earlier,
by following the same layer structure design present in PyTorch, AnalogVNN allows users to
convert most digital neural network models to their analog counterparts with just a few lines of
code. When comparing PyTorch sample code [37] from its tutorial to AnalogVNN sample code
[38], the only AnalogVNN-specific differences are found in the add layer function
(which adds Reduce Precision, Noise, and Normalization layers) and PseudoParameter (which
converts digital parameters into analog parameters), adding only 12 new lines of code unique
to AnalogVNN. Because we have built AnalogVNN with PyTorch [23] modularity and
compatibility in mind, users have same-day (i.e., zero-day) access to new PyTorch features as they are released.
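As a rough illustration of what those few lines look like, the sketch below reuses the ReducePrecision and GaussianNoise modules defined earlier. The helper add_analog_layers is a hypothetical stand-in for AnalogVNN's add layer function, not its actual signature, and PyTorch's built-in parametrization utility is used here only to approximate what PseudoParameter does (re-applying the analog transformation whenever a weight is read, while the optimizer updates the underlying digital parameter):

```python
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

def add_analog_layers(layer: nn.Module) -> nn.Module:
    # Hypothetical helper: precede a digital layer with analog-effect layers.
    return nn.Sequential(ReducePrecision(levels=16), GaussianNoise(std=0.02), layer)

model = nn.Sequential(
    nn.Flatten(),
    add_analog_layers(nn.Linear(784, 256)),
    nn.ReLU(),
    add_analog_layers(nn.Linear(256, 10)),
)

# Approximate PseudoParameter: apply the noise model to the weights themselves
# each time they are accessed, without changing the optimizer or training loop.
for module in model.modules():
    if isinstance(module, nn.Linear):
        parametrize.register_parametrization(module, "weight", GaussianNoise(std=0.02))
```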
This combination of features and options provides a robust and customizable environment
for researchers to design, simulate, and test arbitrary photonic or analog neural network models
based on other similar hardware. We first use AnalogVNN to design and optimize hyperparameters
in small 3- to 6-layer photonic image classification models. We then show the generality of our
conclusions from these smaller models by optimizing the larger and more complex 9-layer CIFAR-
10 convolutional neural network (CNN) model by P. Kaur for the photonic analog domain [8],
[39]. The main features of AnalogVNN used in this paper are the ability to control the
gradient flow graph (e.g., skipping the noise and reduce precision layers during backpropagation,
see Figure 1c) and the introduction of noise, reduced precision, and normalization to model
parameters (e.g., the weights and biases of the network), as shown in Figure 1d. Finally, after testing
and training, the final analog neural network with optimized hyperparameters can be transferred
to a photonic chip for on-chip optimization.
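This gradient flow control is essentially a straight-through estimator: the layer acts on the forward pass but is transparent to backpropagation. A minimal sketch of how such a layer can be built with torch.autograd.Function is shown below (our own illustration, not AnalogVNN's internal code):

```python
import torch

class RoundStraightThrough(torch.autograd.Function):
    """Reduce precision on the forward pass; pass gradients through unchanged."""

    @staticmethod
    def forward(ctx, x, levels):
        return torch.round(x * levels) / levels

    @staticmethod
    def backward(ctx, grad_output):
        # Identity gradient for x, no gradient for the integer levels argument.
        return grad_output, None

x = torch.linspace(-1.0, 1.0, 5, requires_grad=True)
y = RoundStraightThrough.apply(x, 4)
y.sum().backward()
print(x.grad)  # tensor of ones: backpropagation skipped the rounding step
```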
III. Analog Layer Design Approach
To design a layer that can simulate an analog system such as a photonic processor, we look
at the major factors which make a photonic network different from a digital network. Namely,
photonic and digital networks differ in their inputs, outputs, and storage units (weights and biases).
In digital systems, inputs can have very high precision (2¹⁶ discrete levels for the 16-bit floating-point
formats used by PyTorch and TensorFlow [23], [40], [41]) and very low noise (negligible noise due to
digital operation), while in photonic systems inputs are generated by optical sources (typically
digital operation), while in photonic systems inputs are generated by optical sources (typically
lasers) and are limited by the physical output characteristics of these optical sources. Relative
intensity noise (RIN) and modulation frequency place a fundamental upper bound on the precision
of the optical inputs. For example, to encode an 8-bit analog signal with a low-noise laser source
(RIN = −165 dB/Hz), the modulation frequency will be limited to around 4 GHz or less regardless
of the optical power [31]. In addition to relative intensity noise, other noise sources such as optical
shot noise from the laser, thermal noise from the analog driving circuitry, and timing jitter, can
contaminate the analog signal and limit the maximum precision of the inputs [42]. So, while digital
inputs can take on a very large range of values due to their high precision, this is not true for analog
systems. Analog inputs also typically operate on evenly spaced divisions of a relative
scale, rather than the many-bit binary representation of absolute values used in digital
systems. For example, the maximum power of the laser is typically mapped to encode a normalized
value between 0 and 1, while the phase of the optical input can be used to encode negative values
in coherent architectures. Additionally, analog systems driven by digital inputs are made pseudo-
analog to encode discrete digital values in the presence of noise (e.g., as in PAM-4, PAM-16, or
64-QAM modulation schemes). To virtualize these effects illustrated in Figure 2a, we first
normalize the input signals, quantize them to a given precision, and then add noise. It is important
for noise to be added after digitization since it would otherwise be removed by the digitization
process. Therefore, the differences between inputs in the digital and photonic domains can be
simulated using: 1) Normalization, 2) Reduce Precision, and 3) Noise layers to represent the analog
response to digital inputs (see Figure 2b). Details on the mathematical implementation of each of
these layers can be found in the Appendix.
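A minimal sketch of this ordering follows, assuming a simple min-max normalization and stochastic rounding as one possible Reduce Precision variant (the exact equations used by AnalogVNN are given in the Appendix):

```python
import torch

def encode_analog_input(x: torch.Tensor, levels: int = 15, std: float = 0.02) -> torch.Tensor:
    """Normalization -> Reduce Precision -> Noise, in that order (Figure 2b)."""
    # 1) Normalize the signal into the modulator's [0, 1] operating range.
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)
    # 2) Stochastically round to one of (levels + 1) evenly spaced values:
    #    round up with probability equal to the fractional remainder, which
    #    keeps the quantization unbiased on average.
    scaled = x * levels
    lower = torch.floor(scaled)
    x = (lower + (torch.rand_like(scaled) < (scaled - lower)).float()) / levels
    # 3) Add noise last; noise injected before quantization would simply be
    #    rounded away again.
    return x + torch.randn_like(x) * std
```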
Figure 2: Modeling approach for analog input and output layers. a) Example optoelectronic circuit schematic and b) AnalogVNN implementation for encoding analog inputs to the optical processing unit. Blue, black, and red circuit connections represent digital, electrical, and optical interconnects, respectively. Various sources of noise can be modeled by adding multiple noise layers in series. c) Typical electrical circuit diagram and d) AnalogVNN implementation used to convert processed optical signals back to the digital domain. During the hyperparameter exploration in the following sections, a single Gaussian noise layer was used in both the optical input and electrical output layers for simplicity.