AnalogVNN: A Fully Modular Framework for Modeling and
Optimizing Photonic Neural Networks
Vivswan Shah and Nathan Youngblood
Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15261
Corresponding author: vivswanshah@pitt.edu and nathan.youngblood@pitt.edu
Abstract: In this paper, we present AnalogVNN, a simulation framework built on PyTorch that
can simulate the effects of optoelectronic noise, limited precision, and signal normalization present
in photonic neural network accelerators. We use this framework to train and optimize linear and
convolutional neural networks with up to 9 layers and ~1.7 million parameters, while gaining
insights into how normalization, activation function, reduced precision, and noise influence
accuracy in analog photonic neural networks. By following the same layer structure design present
in PyTorch, the AnalogVNN framework allows users to convert most digital neural network
models to their analog counterparts with just a few lines of code, taking full advantage of the open-
source optimization, deep learning, and GPU acceleration libraries available through PyTorch.
I. Introduction
In the past decade, there has been exponential growth in the size, complexity, efficiency,
and robustness of deep neural networks (DNNs) [1], [2]. However, the computing resources needed to fuel
this continuous advancement in machine learning have also grown exponentially, much faster
than the performance and efficiency improvements of the hardware used to train networks [2] or
perform inference [3]. DNN computation consists of approximately 90% linear
operations (matrix-vector multiplications or convolutions) and 10% simple nonlinear operations
(e.g., sigmoid, ReLU, tanh) [4]. These linear operations can be processed with high efficiency
and low latency using analog computational schemes by leveraging parallelized circuits in the
analog domain. For example, a single READ operation on a memristor array can perform an entire
matrix-vector multiplication in a single clock cycle [5]–[7]. So, by performing the computation in
the analog domain, we can significantly reduce the training and inference times of neural networks
[8]. One particularly attractive approach is photonic analog computing which promises ultra-low
latency, high energy efficiency, and unparalleled data throughput [9]–[14]. This is possible due to
the enhanced modulation speeds of optical waveguides and fibers over their electronic counterparts,
which suffer from resistive and capacitive losses [15]. Thus, by using photonics, the large-scale
linear operations of neural networks can be performed efficiently at high modulation speeds and
ultra-low latencies, potentially leading to networks with high throughput and real-time processing
[16].
Translating neural network models directly from the digital domain to any other domain
(the photonic analog domain in our case) without any changes to its structure or hyperparameters
will cause a significant reduction in the accuracy and generalizability of the model, since neural
networks specialize to the environment in which they were trained. Simply
transplanting the weights or structure of a network into a new environment with reduced precision or
increased noise is as problematic as its biological counterpart, a head transplant. By
carefully designing neural networks based on their computing environment (digital, analog, or
physical), a significant improvement in the model’s generalizability, accuracy, and ability to learn
the crucial features of the dataset can be seen [17]. For analog DNN accelerators, this can be done
by implementing models on-chip for different hyperparameter combinations and testing each of
them, but this is time-consuming and costly. To overcome this problem, we can simulate
computation in the analog domain and test against various hyperparameters virtually. It has already
been shown that even a crude simulation of the target domain can overcome much of the accuracy
and generalizability problems [17], [18].
We have developed the Analog Virtual Neural Network (AnalogVNN) framework [19] to
do exactly this. Distinct from other approaches which typically focus on modeling the physical
response of the analog hardware in question [8], [17], [18], [20]–[22], we have chosen to abstract
the physical properties of the analog hardware and instead model the effects of an ensemble of analog
computing elements at a higher level (i.e., normalization, limited precision, stochastic rounding,
and additive noise). This approach greatly simplifies the translation of digital neural network
models to the analog domain, while minimizing the additional computational overhead required to
model analog hardware, as illustrated in Figure 1a. We have built AnalogVNN on PyTorch [23] to
easily simulate the effects of optoelectronic noise, limited precision, and signal normalization
present in all photonic analog hardware. While we have designed the AnalogVNN framework with
photonic hardware in mind (e.g., coherent [11], [24], electro-absorptive [25], phase-change [26],
microring resonator [27], and dispersive fiber-based architectures [12] as illustrated in Figure 1b),
the generality of our approach allows all researchers to easily extend our work to other analog
neural networks, such as those based on electronic, magnetic, or spintronic hardware [8], [9], [28]–[30].
The repository for AnalogVNN is available at https://analogvnn.github.io
Sample code: https://analogvnn.github.io/sample_code
II. The Analog Virtual Neural Network (AnalogVNN) Methodology
The photonic analog domain differs from the digital domain in two major ways. First, one
must account for continuous variability due to noise added by physical processes, and second,
the precision is typically limited by photon shot noise to 8 bits or less for the optical powers and
modulation speeds of interest [31]. In the case of photonic weights, physical processes such as
thermal drift in microring resonators or stochastic effects in the programming of phase-change
photonic memory introduce stochastic noise into the weight matrix. Photonic analog inputs, on the
other hand, have been limited to even lower precision in practice (e.g., 4-bit for PAM-16
modulators). As high-speed optical modulators have primarily targeted telecommunication
applications, they are typically designed to generate a limited set of optical amplitudes which
minimize bit error rates from optoelectronic noise and timing jitter. These characteristics can be
abstracted and simulated by adding intermediate layers which intentionally introduce noise and
reduce precision (using Noise and Reduce Precision layers respectively) to a digital model for the
linear analog system (Figure 1c-d). In this way, digital models can efficiently imitate the analog
environment, making it possible to explore and optimize analog hyperparameters more
effectively, and we can begin to
identify hyperparameters that improve the model’s generalizability, accuracy, learning rate, and
ability to learn the crucial features of the dataset [32]–[36].
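To make this concrete, the sketch below shows how such intermediate layers can be written as ordinary PyTorch modules. This is a minimal illustration of the approach rather than AnalogVNN's actual classes, and the precision and noise values are arbitrary:

```python
import torch
import torch.nn as nn

class ReducePrecision(nn.Module):
    """Round a signal to the nearest multiple of 1/levels."""
    def __init__(self, levels: int = 16):
        super().__init__()
        self.levels = levels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.round(x * self.levels) / self.levels

class GaussianNoise(nn.Module):
    """Add zero-mean Gaussian noise, modeling aggregate optoelectronic noise."""
    def __init__(self, std: float = 0.02):
        super().__init__()
        self.std = std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + torch.randn_like(x) * self.std

# A digital linear layer preceded by the analog input effects of Figure 1d.
analog_linear = nn.Sequential(
    ReducePrecision(levels=16),
    GaussianNoise(std=0.02),
    nn.Linear(64, 64),
)
```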
Figure 1: Overview of the AnalogVNN framework. a) Trade-off between the computational burden and accuracy of models used to simulate analog AI hardware. AnalogVNN bridges the gap between fully digital and fully analog frameworks. b) Conceptual illustration of the three main abstraction layers within AnalogVNN: optical modulation (input vector), trainable photonic weights (matrix operation), and optical detection (output vector). The physical implementation of the photonic hardware is abstracted to normalization, noise, and reduced precision layers, allowing our framework to be applied to various photonic architectures. c) Overview of a 3-layer linear model with three Analog Linear and Activation layers. The inner workings of an Analog Linear layer are shown in the red dotted box with its corresponding on-chip photonic implementation. Blue and green arrows represent the forward and backward passes, respectively. d) Illustration of the optoelectronic analog effects modeled during a forward pass through a linear analog layer.
Contributions
Simulating an analog neural network with PyTorch or TensorFlow requires additional
features which are not present in current frameworks. The first
requirement is the ability to create parameterized weights, biases, and layers without affecting the
gradient flow graph of the network. For example, one should be able to add normalization, reduced
precision, and noise to weights, biases, or between layers and still retain the ability to compute gradients as
if these additional layers were not present. Second, training analog network models may require a
new analog optimizer to train efficiently. These optimizers can be created by combining the
properties of the analog domain with well-established digital optimizers (e.g., Adam,
SGD). Nandakumar et al. 2020 [8] show an example of this in which a new Reduce Precision
Optimizer was used to train an on-chip network faster and more efficiently. Third, as stated earlier,
by following the same layer structure design present in PyTorch, AnalogVNN allows users to
convert most digital neural network models to their analog counterparts with just a few lines of
code. When comparing PyTorch sample code [37] from its tutorial to AnalogVNN sample code
[38], the only AnalogVNN-specific differences are found in the add layer function
(which adds Reduce Precision, Noise, and Normalization layers) and PseudoParameter (which
converts digital parameters into analog parameters), adding only 12 new lines of code unique
to AnalogVNN. Because we have built AnalogVNN with PyTorch [23] modularity and
compatibility in mind, users have same-day (i.e., zero-day) access to new PyTorch features as they are released.
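As a rough illustration of what those few lines look like, the sketch below reuses the ReducePrecision and GaussianNoise modules defined earlier. The helper add_analog_layers is a hypothetical stand-in for AnalogVNN's add layer function, not its actual signature, and PyTorch's built-in parametrization utility is used here only to approximate what PseudoParameter does (re-applying the analog transformation whenever a weight is read, while the optimizer updates the underlying digital parameter):

```python
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

def add_analog_layers(layer: nn.Module) -> nn.Module:
    # Hypothetical helper: precede a digital layer with analog-effect layers.
    return nn.Sequential(ReducePrecision(levels=16), GaussianNoise(std=0.02), layer)

model = nn.Sequential(
    nn.Flatten(),
    add_analog_layers(nn.Linear(784, 256)),
    nn.ReLU(),
    add_analog_layers(nn.Linear(256, 10)),
)

# Approximate PseudoParameter: apply the noise model to the weights themselves
# each time they are accessed, without changing the optimizer or training loop.
for module in model.modules():
    if isinstance(module, nn.Linear):
        parametrize.register_parametrization(module, "weight", GaussianNoise(std=0.02))
```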
This combination of features and options provides a robust and customizable environment
for researchers to design, simulate, and test arbitrary photonic or analog neural network models
based on other similar hardware. We first use AnalogVNN to design and optimize hyperparameters
in small 3- to 6-layer photonic image classification models. We then show the generality of our
conclusions from these smaller models by optimizing the larger and more complex 9-layer CIFAR-
10 convolutional neural network (CNN) model by P. Kaur for the photonic analog domain [8],
[39]. The main features of AnalogVNN used in this paper are the ability to control the
gradient flow graph (e.g., skipping the noise and reduce precision layers during backpropagation,
see Figure 1c) and the introduction of noise, reduced precision, and normalization to model
parameters (e.g., the weights and biases of the network), as shown in Figure 1d. Finally, after testing
and training, the final analog neural network with optimized hyperparameters can be transferred
to a photonic chip for on-chip optimization.
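This gradient flow control is essentially a straight-through estimator: the layer acts on the forward pass but is transparent to backpropagation. A minimal sketch of how such a layer can be built with torch.autograd.Function is shown below (our own illustration, not AnalogVNN's internal code):

```python
import torch

class RoundStraightThrough(torch.autograd.Function):
    """Reduce precision on the forward pass; pass gradients through unchanged."""

    @staticmethod
    def forward(ctx, x, levels):
        return torch.round(x * levels) / levels

    @staticmethod
    def backward(ctx, grad_output):
        # Identity gradient for x, no gradient for the integer levels argument.
        return grad_output, None

x = torch.linspace(-1.0, 1.0, 5, requires_grad=True)
y = RoundStraightThrough.apply(x, 4)
y.sum().backward()
print(x.grad)  # tensor of ones: backpropagation skipped the rounding step
```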
III. Analog Layer Design Approach
To design a layer that can simulate an analog system such as a photonic processor, we look
at the major factors which make a photonic network different from a digital network. Namely,
photonic and digital networks differ in their inputs, outputs, and storage units (weights and biases).
In digital systems, inputs can have very high precision (2¹⁶ discrete levels for the 16-bit floating-point
formats used by PyTorch and TensorFlow [23], [40], [41]) and very low noise (negligible noise due to
digital operation), while in photonic systems inputs are generated by optical sources (typically
digital operation), while in photonic systems inputs are generated by optical sources (typically
lasers) and are limited by the physical output characteristics of these optical sources. Relative
intensity noise (RIN) and modulation frequency place a fundamental upper bound on the precision
of the optical inputs. For example, to encode an 8-bit analog signal with a low-noise laser source
(RIN = −165 dB/Hz), the modulation frequency will be limited to around 4 GHz or less regardless
of the optical power [31]. In addition to relative intensity noise, other noise sources such as optical
shot noise from the laser, thermal noise from the analog driving circuitry, and timing jitter, can
contaminate the analog signal and limit the maximum precision of the inputs [42]. So, while digital
inputs can take on a very large range of values due to their high precision, this is not true for analog
systems. Analog inputs also typically operate on evenly spaced divisions of a relative
scale, rather than the many-bit binary representation of absolute values used in digital
systems. For example, the maximum power of the laser is typically mapped to encode a normalized
value between 0 and 1, while the phase of the optical input can be used to encode negative values
in coherent architectures. Additionally, analog systems driven by digital inputs are made pseudo-
analog to encode discrete digital values in the presence of noise (e.g., as in PAM-4, PAM-16, or
64-QAM modulation schemes). To virtualize these effects illustrated in Figure 2a, we first
normalize the input signals, quantize them to a given precision, and then add noise. It is important
for noise to be added after digitization since it would otherwise be removed by the digitization
process. Therefore, the differences between inputs in the digital and photonic domains can be
simulated using: 1) Normalization, 2) Reduce Precision, and 3) Noise layers to represent the analog
response to digital inputs (see Figure 2b). Details on the mathematical implementation of each of
these layers can be found in the Appendix.
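A minimal sketch of this ordering follows, assuming a simple min-max normalization and stochastic rounding as one possible Reduce Precision variant (the exact equations used by AnalogVNN are given in the Appendix):

```python
import torch

def encode_analog_input(x: torch.Tensor, levels: int = 15, std: float = 0.02) -> torch.Tensor:
    """Normalization -> Reduce Precision -> Noise, in that order (Figure 2b)."""
    # 1) Normalize the signal into the modulator's [0, 1] operating range.
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)
    # 2) Stochastically round to one of (levels + 1) evenly spaced values:
    #    round up with probability equal to the fractional remainder, which
    #    keeps the quantization unbiased on average.
    scaled = x * levels
    lower = torch.floor(scaled)
    x = (lower + (torch.rand_like(scaled) < (scaled - lower)).float()) / levels
    # 3) Add noise last; noise injected before quantization would simply be
    #    rounded away again.
    return x + torch.randn_like(x) * std
```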
Figure 2: Modeling approach for analog input and output layers. a) Example optoelectronic circuit schematic and b) AnalogVNN implementation for encoding analog inputs to the optical processing unit. Blue, black, and red circuit connections represent digital, electrical, and optical interconnects, respectively. Various sources of noise can be modeled by adding multiple noise layers in series. c) Typical electrical circuit diagram and d) AnalogVNN implementation used to convert processed optical signals back to the digital domain. During the hyperparameter exploration in the following sections, a single Gaussian noise layer was used in both the optical input and electrical output layers for simplicity.