Accelerating the training of single-layer binary
neural networks using the HHL quantum algorithm
Sonia Lopez Alarcon, Cory Merkel, Martin Hoffnagle, Sabrina Ly
Department of Computer Engineering
Rochester Institute of Technology
Rochester NY, USA
{slaeec, cemeec, mah5414, sjl2178}@rit.edu
Alejandro Pozas-Kerstjens
Institute of Mathematical Sciences
(CSIC-UCM-UC3M-UAM)
Madrid, Spain
physics@alexpozas.com
Abstract—Binary Neural Networks are a promising technique
for implementing efficient deep models with reduced storage and
computational requirements. The training of these networks, however, is still a
compute-intensive problem that grows drastically with the
layer size and the input data. At the core of this calculation is the
linear regression problem. The Harrow-Hassidim-Lloyd (HHL)
quantum algorithm has gained relevance thanks to its promise
of providing a quantum state containing the solution of a linear
system of equations. The solution is encoded in superposition at
the output of a quantum circuit. Although this seems to provide
the answer to the linear regression problem for the training of
neural networks, it also comes with multiple, difficult-to-avoid
hurdles. This paper shows, however, that useful information can
be extracted from the quantum-mechanical implementation of
HHL, and used to reduce the complexity of finding the solution
on the classical side.
Index Terms—Quantum Computing, Binary Neural Networks,
Quantum Machine Learning
I. INTRODUCTION AND MOTIVATION
Machine learning, and in particular deep learning and
neural networks, is almost ubiquitous in today's world, from
finance to healthcare. The growth of data and the size of the
models, however, place increasing pressure on the computational
resources that support them, which constantly fall
short of growing expectations and demands.
Although still in very early stages, Quantum Computing
holds some promise of mitigating this issue, even if only as
a hybrid model that will partially solve the problem when
combined with classical approaches. Current trends in this
direction include the Quantum Approximate Optimization
Algorithm, QAOA [1]. An optimization problem is at the
core of any neural network training process, with the goal
of finding the weights to be assigned to the nodes of the
network. QAOA attempts to solve this optimization problem
by implementing a parametrizable quantum circuit whose
parameters are adjusted to minimize a predefined cost function.
It is possible to implement this approach within the Noisy
Intermediate-Scale Quantum computing era [2], which unfortunately
only provides approximate solutions for, at the moment, very small
models. However, the fact that current quantum computing devices
are physical systems subject to noise can also be exploited for
aiding the training of classical machine learning models [3].

This work is partially supported by the Spanish Ministry of Science and
Innovation MCIN/AEI/10.13039/501100011033 (“Severo Ochoa Programme
for Centres of Excellence in R&D” CEX2019-000904-S and grant PID2020-113523GB-I00),
the Spanish Ministry of Economic Affairs and Digital Transformation
(project QUANTUM ENIA, as part of the Recovery, Transformation
and Resilience Plan, funded by EU program NextGenerationEU), Comunidad
de Madrid (QUITEMAD-CM P2018/TCS-4342), and the CSIC Quantum
Technologies Platform PTI-001.
On the other hand, the optimization problem associated with
the training of single-layer neural networks can be approx-
imated as a linear regression problem. The HHL quantum
algorithm [4, 5] was proposed as a potential solution to linear
systems of equations. However, plainly using HHL to solve
the training of NNs is infeasible for a number of reasons
[6], even if the technology progresses to provide sufficient
resources and low noise levels.
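For concreteness, the following NumPy sketch shows the classical linear-regression formulation that HHL would be asked to address for a single layer. The synthetic data, the problem sizes and the use of the normal equations are illustrative assumptions and are not taken from the paper's experiments.

```python
# Classical reference sketch (illustrative only): single-layer training posed as
# linear regression, i.e. the linear system an HHL-based approach would target.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 8))   # 64 hypothetical training samples, 8 input features
y = rng.standard_normal(64)        # hypothetical target activations

# Normal equations: (X^T X) w = X^T y  -- the system a linear-system solver addresses.
A = X.T @ X
b = X.T @ y
w = np.linalg.solve(A, b)          # classical dense solve, roughly O(n^3)
print(w)
```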
One of the hurdles is that HHL provides, at the output of the
quantum circuit, a quantum state that encodes the solution of
the linear system of equations in superposition. Extracting the
value of each of the components of the solution vector requires,
in the best-case scenario (i.e., the solution lying in $\mathbb{R}^+$),
performing many measurements in the computational
basis, which allegedly ruins the potential quantum advantage.
The number of measurements required (computed as the
number of runs of the protocol times the number of qubits
measured at each run) to accurately extract the amplitudes
of a superposition state with coefficients in $\mathbb{R}^+$ is generally
assumed to be on the order of $n$, where $n$ is the dimension of
the quantum system or, in the particular case of neural network
training, the number of weights. This, however, depends on the
accuracy that is expected of the solution and on the shape of the
outcome probability distribution. If the solution vector is not in
$\mathbb{R}^+$, the situation is even worse, requiring quantum
tomography to extract the amplitudes of the quantum states
in superposition. This problem is discussed in more depth in
Section IV.
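To make the measurement-cost argument concrete, the sketch below estimates the amplitudes of a real, nonnegative state from repeated computational-basis measurements. It is an illustration of the sampling argument only, not the procedure used in this work; the state, its dimension and the shot count are arbitrary assumptions.

```python
# Estimating amplitudes of a real, nonnegative state by sampling measurement outcomes.
import numpy as np

rng = np.random.default_rng(1)
n = 8                                    # dimension of the (hypothetical) solution register
psi = np.abs(rng.standard_normal(n))
psi /= np.linalg.norm(psi)               # amplitudes in R+, as assumed in the text

shots = 4096
counts = rng.multinomial(shots, psi**2)  # outcomes sampled with Born probabilities
estimate = np.sqrt(counts / shots)       # valid only because the amplitudes are nonnegative

print("max amplitude error:", np.max(np.abs(estimate - psi)))
```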
It is possible, however, to apply operations to the solution
in superposition. One such operation is the SWAP test
[7], which can be used to extract the distance between two
quantum states, encoding it in the amplitudes of a single
qubit, as will be shown in Section III. In a similar context
[5], the SWAP test has been used in the past to verify the
correctness of proof-of-concept implementations of HHL.
When the implementation is correct, the distance to the
foreknown solution state is equal to zero. This is, however,
not useful unless one is able to calculate the solution
classically beforehand, and this solution is implementable
as a quantum superposition.

Fig. 1: The proposed quantum implementation allows the SWAP test
to be used to calculate the distance between the weights state (the state
containing the solution for the optimal weights vector, $|W\rangle$) and some
other selected test state $|t_W\rangle$. The search space is then reduced from
the hyperspace of all possible solutions to the surface of the hypersphere
centered at $|t_W\rangle$ with radius $d$.
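As a minimal, self-contained illustration of how the SWAP test encodes a distance in a single qubit, the sketch below works directly with statevectors in NumPy rather than with an actual circuit; the example states and the shot count are assumptions made purely for the example.

```python
# SWAP-test sketch: for normalized states |a>, |b>, the ancilla reads 0 with
# probability P0 = 1/2 + |<a|b>|^2 / 2; for real states with nonnegative overlap,
# the Euclidean distance follows as d^2 = 2 - 2<a|b>.
import numpy as np

rng = np.random.default_rng(2)
a = np.abs(rng.standard_normal(4)); a /= np.linalg.norm(a)
b = np.abs(rng.standard_normal(4)); b /= np.linalg.norm(b)

p0 = 0.5 + 0.5 * np.dot(a, b) ** 2                        # ancilla |0> probability

shots = 8192
zeros = rng.binomial(shots, p0)                           # repeated ancilla measurements
overlap_est = np.sqrt(max(0.0, 2 * zeros / shots - 1))    # |<a|b>| recovered from P0
d_est = np.sqrt(max(0.0, 2 - 2 * overlap_est))            # distance (overlap assumed >= 0)

print("true distance:", np.linalg.norm(a - b), " estimated:", d_est)
```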
This paper proposes that, instead of being used to compare
the circuit’s preparation with the solution (which implies
knowing the solution in advance), the SWAP test be performed
against several fixed “reference” states, and that the information
obtained be used to reduce the computational complexity of the
classical search for the optimal set of weights in the training
process of binary neural networks (BNNs).
BNNs are promising as a possible computationally efficient
implementation of neural networks. BNNs are the specific
realization of quantized NNs in which the weights are
limited to one of two values, 1 or -1, or, for the implementation
targeted in this paper, 0 or 1. BNNs are also a
promising candidate for the application of HHL in combination
with the SWAP test to extract information, for two reasons:
(1) the solution is promised to be in $\mathbb{R}^+$, and (2) the
vector to compare against is easy to implement as a quantum
superposition state.
Figure 1 illustrates this approach. As this paper will show,
given a “reference” state, the SWAP test can provide the
distance from the solution to this state and hence reduce
the classical search for the solution to the hypersphere at
distance $d$ from the reference state.
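The sketch below illustrates this pruning of the classical search. It is a toy example only: the dimension, the choice of a uniform-superposition reference state and the stand-in "secret" solution are all assumptions made for illustration.

```python
# If the distance d from a fixed reference state |t_W> to the (normalized) solution
# state |W> is known, the classical search over binary weight vectors can be
# restricted to the candidates lying on the corresponding hypersphere.
import itertools
import numpy as np

N = 6                                               # number of binary weights
reference = np.ones(N) / np.sqrt(N)                 # uniform superposition as |t_W>

secret = np.array([1, 0, 1, 1, 0, 0], dtype=float)  # stand-in for the unknown solution
d = np.linalg.norm(secret / np.linalg.norm(secret) - reference)

candidates = []
for bits in itertools.product([0, 1], repeat=N):
    w = np.array(bits, dtype=float)
    if not w.any():
        continue                                    # all-zero vector cannot be normalized
    dist = np.linalg.norm(w / np.linalg.norm(w) - reference)
    if abs(dist - d) < 1e-9:
        candidates.append(bits)

print(f"candidates on the hypersphere: {len(candidates)} out of {2**N - 1}")
```

In this toy setting a single uniform reference only narrows the search to candidates with the same Hamming weight as the solution, which is consistent with the proposal of measuring against several reference states.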
II. BACKGROUND AND RELATED WORK
A. Binary Neural Networks
Machine learning continues to make rapid advances in its
ability to accurately model complex relationships in high-
dimensional data. A prime example is the application of deep
artificial neural networks (ANNs) for classification of objects
in images. From automatic mail sorting to self-driving cars,
this capability is a key enabler to automate some of the most
critical aspects of human decision making. However, the size
and complexity of ANNs performing these operations appear
to grow exponentially with linear gains in their performance
[8]. This has led to top-performing models with billions of
parameters that require massive computational resources to
train and evaluate. Consequently, there is a push to reduce the
size of ANN models to enable their deployment on resource-
constrained hardware such as edge devices. While several
methods exist, one of the most popular approaches is to reduce
the precision of ANN computations and parameters. In the
extreme case of binary neural networks (BNN), the ANN
weights and activation functions are reduced from 32- or 64-
bit floating point values to 1-bit representations (e.g. -1 and 1,
or 0 and 1).
BNNs are usually trained using quantization-aware meth-
ods, where a full-precision model is quantized during the
forward pass and then updated using backpropagation with
approximations for the gradient of the quantization operator.
Given an ANN model $\Pi$ and a loss function $L$, a weight
connecting neuron $i$ in layer $l-1$ and neuron $j$ in layer $l$
gets updated by gradient descent:

$$\Delta w^{l}_{i,j} = -\alpha\,\frac{\partial L\{q(\Pi)\}}{\partial w^{l}_{i,j}} \qquad (1)$$

where $q$ is the quantization function and $\alpha$ is the learning
rate. The quantization function can take several forms. The
simplest one is rounding the corresponding value (weights
or activations) to the nearest valid quantization level.
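The sketch below illustrates this quantization-aware scheme: the forward pass uses binarized weights and the gradient of the quantization operator is approximated with a straight-through estimator. The toy regression data, the {0, 1} rounding threshold, the learning rate and the squared-error loss are assumptions for illustration, not the paper's setup.

```python
# Quantization-aware training sketch with a straight-through gradient estimator.
import numpy as np

def quantize(w):
    """Round each weight to the nearest valid level, here {0, 1}."""
    return (w >= 0.5).astype(float)

rng = np.random.default_rng(3)
X = rng.standard_normal((32, 4))
target = X @ np.array([1.0, 0.0, 1.0, 1.0])   # data generated by a binary weight vector

w = rng.uniform(0, 1, size=4)                 # full-precision "shadow" weights
alpha = 0.01                                  # learning rate

for _ in range(200):
    wq = quantize(w)                          # forward pass uses the binary weights
    err = X @ wq - target
    grad = X.T @ err / len(X)                 # gradient w.r.t. wq; the straight-through
    w -= alpha * grad                         #   estimator applies it to w unchanged
    w = np.clip(w, 0.0, 1.0)

print("learned binary weights:", quantize(w))
```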
During training, the forward and backpropagation steps require
matrix-vector multiplications at each layer. Here, we assume
that the computation of the binary activation function, which
requires only a scalar comparison operation, is much less
costly than the matrix-vector product. Thus, the time complexity
approximately scales as $O\left(\sum_{l=1}^{L} N_l N_{l-1}\right)$, where $L$ is
the total number of layers and $N_l$ is the number of neurons
in layer $l$. This can be bounded from above as $O(k L N_{\max}^{2})$,
where $N_{\max}$ is the maximum layer size and $k$ is the number
of training iterations. Generally the complexity of $k$ is $O(1/\epsilon)$,
where $\epsilon$ is the loss level. If information about the solution is
known a priori, then the gradient descent can be constrained,
possibly leading to a smaller $k$. For example, if the Euclidean
distance $d$ to the solution $w$ from a test point $c$ is known,
then the search space becomes restricted to the hypersphere
centered at $c$. This is demonstrated in Figure 2. Note that
the intersection of the hypersphere with the vertices of the
hypercube forms a hyperplane of the form¹

$$c \cdot w = N - \frac{d^{2}}{2} \qquad (2)$$
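A quick numerical check of this kind of hyperplane relation, under the assumption (made here purely for illustration) that the binary weights are encoded as ±1 vertices of the hypercube:

```python
# For c, w in {-1, +1}^N, ||c - w||^2 = 2N - 2 c.w, hence c.w = N - d^2 / 2,
# so fixing the distance d to a reference vertex c restricts the candidates
# to a hyperplane. The reference vertex below is a hypothetical example.
import itertools
import numpy as np

N = 5
c = np.array([1, -1, 1, 1, -1], dtype=float)

for bits in itertools.product([-1, 1], repeat=N):
    w = np.array(bits, dtype=float)
    d2 = np.sum((c - w) ** 2)                 # squared Euclidean distance to the reference
    assert np.isclose(c @ w, N - d2 / 2)      # hyperplane relation holds exactly

print("c.w = N - d^2/2 holds for all", 2**N, "vertices")
```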
Later, we will show that restricting the search space of an
$N$-dimensional linear least-squares problem to an $(N-1)$-dimensional
hyperplane, and within the boundaries of the

¹Respecting the ML and QC notations, $w$ and $c$ correspond to $|W\rangle$ and
$|t_W\rangle$, respectively, in Figure 1 and other sections of this paper.