Accelerating the training of single-layer binary
neural networks using the HHL quantum algorithm
Sonia Lopez Alarcon, Cory Merkel, Martin Hoffnagle, Sabrina Ly
Department of Computer Engineering
Rochester Institute of Technology
Rochester NY, USA
{slaeec, cemeec, mah5414, sjl2178}@rit.edu
Alejandro Pozas-Kerstjens
Institute of Mathematical Sciences
(CSIC-UCM-UC3M-UAM)
Madrid, Spain
physics@alexpozas.com
Abstract—Binary Neural Networks are a promising technique
for implementing efficient deep models with reduced storage and
computational requirements. The training of these networks, however, is still a
compute-intensive problem that grows drastically with the
layer size and the input data. At the core of this calculation is the
linear regression problem. The Harrow-Hassidim-Lloyd (HHL)
quantum algorithm has gained relevance thanks to its promise
of providing a quantum state containing the solution of a linear
system of equations. The solution is encoded in superposition at
the output of a quantum circuit. Although this seems to provide
the answer to the linear regression problem for the training of
neural networks, it also comes with multiple, difficult-to-avoid
hurdles. This paper shows, however, that useful information can
be extracted from the quantum-mechanical implementation of
HHL, and used to reduce the complexity of finding the solution
on the classical side.
Index Terms—Quantum Computing, Binary Neural Networks,
Quantum Machine Learning
I. INTRODUCTION AND MOTIVATION
Machine learning, and in particular deep learning and
neural networks, is almost ubiquitous in today's world, from
finance to healthcare. The growth of data and the size of the
models, however, place increasing pressure on the computational
resources that support them, which constantly fall
short of growing expectations and demands.
Although still in very early stages, Quantum Computing
holds some promise of mitigating this issue, even if only as
a hybrid model that will partially solve the problem when
combined with classical approaches. Current trends in this
direction include the Quantum Approximate Optimization
Algorithm, QAOA [1]. An optimization problem is at the
core of any neural network training process, with the goal
of finding the weights to be assigned to the nodes of the
network. QAOA attempts to solve this optimization problem
by implementing a parametrizable quantum circuit whose
parameters are adjusted to minimize a predefined cost function.
It is possible to implement this approach within the Noisy
Intermediate-Scale Quantum computing era [2], which unfortunately
only provides approximate solutions for, at the moment, very small
models. However, the fact that current quantum computing devices
are physical systems subject to noise can also be exploited for
aiding the training of classical machine learning models [3].

This work is partially supported by the Spanish Ministry of Science and
Innovation MCIN/AEI/10.13039/501100011033 (“Severo Ochoa Programme
for Centres of Excellence in R&D” CEX2019-000904-S and grant PID2020-113523GB-I00),
the Spanish Ministry of Economic Affairs and Digital Transformation
(project QUANTUM ENIA, as part of the Recovery, Transformation
and Resilience Plan, funded by EU program NextGenerationEU), Comunidad
de Madrid (QUITEMAD-CM P2018/TCS-4342), and the CSIC Quantum
Technologies Platform PTI-001.
On the other hand, the optimization problem associated with
the training of single-layer neural networks can be approx-
imated as a linear regression problem. The HHL quantum
algorithm [4, 5] was proposed as a potential solution to linear
systems of equations. However, plainly using HHL to solve
the training of NNs is infeasible for a number of reasons
[6], even if the technology progresses to provide sufficient
resources and low noise levels.
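For concreteness, the following NumPy sketch shows the classical linear-regression formulation that HHL would be asked to address for a single layer. The synthetic data, the problem sizes and the use of the normal equations are illustrative assumptions and are not taken from the paper's experiments.

```python
# Classical reference sketch (illustrative only): single-layer training posed as
# linear regression, i.e. the linear system an HHL-based approach would target.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 8))   # 64 hypothetical training samples, 8 input features
y = rng.standard_normal(64)        # hypothetical target activations

# Normal equations: (X^T X) w = X^T y  -- the system a linear-system solver addresses.
A = X.T @ X
b = X.T @ y
w = np.linalg.solve(A, b)          # classical dense solve, roughly O(n^3)
print(w)
```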
One of the hurdles is that HHL provides, at the output of the
quantum circuit, a quantum state that encodes the solution of
the linear system of equations in superposition. Extracting the
value of each of the components of the solution vector requires,
in the best-case scenario (i.e., the solution lying in $\mathbb{R}^+$),
performing many measurements in the computational
basis, which allegedly ruins the potential quantum advantage.
The number of measurements required (computed as the
number of runs of the protocol times the number of qubits
measured at each run) to accurately extract the amplitudes
of a superposition state with coefficients in $\mathbb{R}^+$ is generally
assumed to be on the order of $n$, where $n$ is the dimension of
the quantum system or, in the particular case of neural network
training, the number of weights. This, however, depends on the
accuracy that is expected of the solution and on the shape of the
outcome probability distribution. If the solution vector is not in
$\mathbb{R}^+$, the situation is even worse, requiring quantum
tomography to extract the amplitudes of the quantum states
in superposition. This problem is discussed in more depth in
Section IV.
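To make the measurement-cost argument concrete, the sketch below estimates the amplitudes of a real, nonnegative state from repeated computational-basis measurements. It is an illustration of the sampling argument only, not the procedure used in this work; the state, its dimension and the shot count are arbitrary assumptions.

```python
# Estimating amplitudes of a real, nonnegative state by sampling measurement outcomes.
import numpy as np

rng = np.random.default_rng(1)
n = 8                                    # dimension of the (hypothetical) solution register
psi = np.abs(rng.standard_normal(n))
psi /= np.linalg.norm(psi)               # amplitudes in R+, as assumed in the text

shots = 4096
counts = rng.multinomial(shots, psi**2)  # outcomes sampled with Born probabilities
estimate = np.sqrt(counts / shots)       # valid only because the amplitudes are nonnegative

print("max amplitude error:", np.max(np.abs(estimate - psi)))
```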
It is possible, however, to apply operations to the solution
in superposition. One such operation is the SWAP test
[7], which can be used to extract the distance between two
quantum states, encoding it in the amplitudes of a single
qubit, as will be shown in Section III. In a similar context
[5], the SWAP test has been used in the past to verify the
correctness of proof-of-concept implementations of HHL.
When the implementation is correct, the distance to the
foreknown solution state is equal to zero. This is, however,
not useful unless one is able to calculate the solution
classically beforehand, and this solution is implementable
as a quantum superposition.

Fig. 1: The proposed quantum implementation allows the SWAP test
to be used to calculate the distance between the weights state (the state
containing the solution for the optimal weights vector, $|W\rangle$) and some
other selected test state $|t_W\rangle$. The search space is then reduced from
the hyperspace of all possible solutions to the surface of the hypersphere
centered at $|t_W\rangle$ with radius $d$.
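As a minimal, self-contained illustration of how the SWAP test encodes a distance in a single qubit, the sketch below works directly with statevectors in NumPy rather than with an actual circuit; the example states and the shot count are assumptions made purely for the example.

```python
# SWAP-test sketch: for normalized states |a>, |b>, the ancilla reads 0 with
# probability P0 = 1/2 + |<a|b>|^2 / 2; for real states with nonnegative overlap,
# the Euclidean distance follows as d^2 = 2 - 2<a|b>.
import numpy as np

rng = np.random.default_rng(2)
a = np.abs(rng.standard_normal(4)); a /= np.linalg.norm(a)
b = np.abs(rng.standard_normal(4)); b /= np.linalg.norm(b)

p0 = 0.5 + 0.5 * np.dot(a, b) ** 2                        # ancilla |0> probability

shots = 8192
zeros = rng.binomial(shots, p0)                           # repeated ancilla measurements
overlap_est = np.sqrt(max(0.0, 2 * zeros / shots - 1))    # |<a|b>| recovered from P0
d_est = np.sqrt(max(0.0, 2 - 2 * overlap_est))            # distance (overlap assumed >= 0)

print("true distance:", np.linalg.norm(a - b), " estimated:", d_est)
```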
This paper proposes that, instead of being used to compare
the circuit’s preparation with the solution (which implies
knowing the solution in advance), the SWAP test be performed
against several fixed “reference” states, and that the information
obtained be used to reduce the computational complexity of the
classical search for the optimal set of weights in the training
process of binary neural networks (BNNs).
BNNs are promising as a possible computationally efficient
implementation of neural networks. BNNs are the specific
realization of quantized NNs in which the weights are
limited to one of two values, 1 or -1, or, for the implementation
targeted in this paper, 0 or 1. BNNs are also a
promising candidate for the application of HHL in combination
with the SWAP test to extract information, for two reasons:
(1) the solution is promised to be in $\mathbb{R}^+$, and (2) the
vector to compare against is easy to implement as a quantum
superposition state.
Figure 1 illustrates this approach. As this paper will show,
given a “reference” state, the SWAP test can provide the
distance from the solution to this state and hence reduce
the classical search for the solution to the hypersphere at
distance $d$ from the reference state.
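The sketch below illustrates this pruning of the classical search. It is a toy example only: the dimension, the choice of a uniform-superposition reference state and the stand-in "secret" solution are all assumptions made for illustration.

```python
# If the distance d from a fixed reference state |t_W> to the (normalized) solution
# state |W> is known, the classical search over binary weight vectors can be
# restricted to the candidates lying on the corresponding hypersphere.
import itertools
import numpy as np

N = 6                                               # number of binary weights
reference = np.ones(N) / np.sqrt(N)                 # uniform superposition as |t_W>

secret = np.array([1, 0, 1, 1, 0, 0], dtype=float)  # stand-in for the unknown solution
d = np.linalg.norm(secret / np.linalg.norm(secret) - reference)

candidates = []
for bits in itertools.product([0, 1], repeat=N):
    w = np.array(bits, dtype=float)
    if not w.any():
        continue                                    # all-zero vector cannot be normalized
    dist = np.linalg.norm(w / np.linalg.norm(w) - reference)
    if abs(dist - d) < 1e-9:
        candidates.append(bits)

print(f"candidates on the hypersphere: {len(candidates)} out of {2**N - 1}")
```

In this toy setting a single uniform reference only narrows the search to candidates with the same Hamming weight as the solution, which is consistent with the proposal of measuring against several reference states.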
II. BACKGROUND AND RELATED WORK
A. Binary Neural Networks
Machine learning continues to make rapid advances in its
ability to accurately model complex relationships in high-
dimensional data. A prime example is the application of deep
artificial neural networks (ANNs) for classification of objects
in images. From automatic mail sorting to self-driving cars,
this capability is a key enabler to automate some of the most
critical aspects of human decision making. However, the size
and complexity of ANNs performing these operations appear
to grow exponentially with linear gains in their performance
[8]. This has led to top-performing models with billions of
parameters that require massive computational resources to
train and evaluate. Consequently, there is a push to reduce the
size of ANN models to enable their deployment on resource-
constrained hardware such as edge devices. While several
methods exist, one of the most popular approaches is to reduce
the precision of ANN computations and parameters. In the
extreme case of binary neural networks (BNN), the ANN
weights and activation functions are reduced from 32- or 64-
bit floating point values to 1-bit representations (e.g. -1 and 1,
or 0 and 1).
BNNs are usually trained using quantization-aware meth-
ods, where a full-precision model is quantized during the
forward pass and then updated using backpropagation with
approximations for the gradient of the quantization operator.
Given an ANN model $\Pi$ and a loss function $L$, a weight
connecting neuron $i$ in layer $l-1$ and neuron $j$ in layer $l$
gets updated by gradient descent:

$$\Delta w^{l}_{i,j} = -\alpha\,\frac{\partial L\{q(\Pi)\}}{\partial w^{l}_{i,j}} \qquad (1)$$

where $q$ is the quantization function and $\alpha$ is the learning
rate. The quantization function can take several forms. The
simplest one is rounding the corresponding value (weights
or activations) to the nearest valid quantization level.
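The sketch below illustrates this quantization-aware scheme: the forward pass uses binarized weights and the gradient of the quantization operator is approximated with a straight-through estimator. The toy regression data, the {0, 1} rounding threshold, the learning rate and the squared-error loss are assumptions for illustration, not the paper's setup.

```python
# Quantization-aware training sketch with a straight-through gradient estimator.
import numpy as np

def quantize(w):
    """Round each weight to the nearest valid level, here {0, 1}."""
    return (w >= 0.5).astype(float)

rng = np.random.default_rng(3)
X = rng.standard_normal((32, 4))
target = X @ np.array([1.0, 0.0, 1.0, 1.0])   # data generated by a binary weight vector

w = rng.uniform(0, 1, size=4)                 # full-precision "shadow" weights
alpha = 0.01                                  # learning rate

for _ in range(200):
    wq = quantize(w)                          # forward pass uses the binary weights
    err = X @ wq - target
    grad = X.T @ err / len(X)                 # gradient w.r.t. wq; the straight-through
    w -= alpha * grad                         #   estimator applies it to w unchanged
    w = np.clip(w, 0.0, 1.0)

print("learned binary weights:", quantize(w))
```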
During training, the forward and backpropagation steps require
matrix-vector multiplications at each layer. Here, we assume
that the computation of the binary activation function, which
requires only a scalar comparison operation, is much less
costly than the matrix-vector product. Thus, the time complexity
approximately scales as $O\left(\sum_{l=1}^{L} N_l N_{l-1}\right)$, where $L$ is
the total number of layers and $N_l$ is the number of neurons
in layer $l$. This can be bounded from above as $O(k L N_{\max}^{2})$,
where $N_{\max}$ is the maximum layer size and $k$ is the number
of training iterations. Generally the complexity of $k$ is $O(1/\epsilon)$,
where $\epsilon$ is the loss level. If information about the solution is
known a priori, then the gradient descent can be constrained,
possibly leading to a smaller $k$. For example, if the Euclidean
distance $d$ to the solution $w$ from a test point $c$ is known,
then the search space becomes restricted to the hypersphere
centered at $c$. This is demonstrated in Figure 2. Note that
the intersection of the hypersphere with the vertices of the
hypercube forms a hyperplane of the form¹

$$c \cdot w = N - \frac{d^{2}}{2} \qquad (2)$$
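A quick numerical check of this kind of hyperplane relation, under the assumption (made here purely for illustration) that the binary weights are encoded as ±1 vertices of the hypercube:

```python
# For c, w in {-1, +1}^N, ||c - w||^2 = 2N - 2 c.w, hence c.w = N - d^2 / 2,
# so fixing the distance d to a reference vertex c restricts the candidates
# to a hyperplane. The reference vertex below is a hypothetical example.
import itertools
import numpy as np

N = 5
c = np.array([1, -1, 1, 1, -1], dtype=float)

for bits in itertools.product([-1, 1], repeat=N):
    w = np.array(bits, dtype=float)
    d2 = np.sum((c - w) ** 2)                 # squared Euclidean distance to the reference
    assert np.isclose(c @ w, N - d2 / 2)      # hyperplane relation holds exactly

print("c.w = N - d^2/2 holds for all", 2**N, "vertices")
```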
Later, we will show that restricting the search space of an
$N$-dimensional linear least-squares problem to an $(N-1)$-dimensional
hyperplane, and within the boundaries of the

¹Respecting the ML and QC notations, $w$ and $c$ correspond to $|W\rangle$ and
$|t_W\rangle$, respectively, in Figure 1 and other sections of this paper.