Preprint. Under review.
QUARK: A GRADIENT-FREE QUANTUM LEARNING
FRAMEWORK FOR CLASSIFICATION TASKS
Zhihao Zhang
Carnegie Mellon University
zhihaoz3@cs.cmu.edu
Zhuoming Chen
Tsinghua University
chen-zm19@mails.tsinghua.edu.cn
Heyang Huang
Microsoft
rahuang@microsoft.com
Zhihao Jia
Carnegie Mellon University
zhihao@cmu.edu
ABSTRACT
As more practical and scalable quantum computers emerge, much attention has been focused on realizing quantum supremacy in machine learning. Existing quantum ML methods either (1) embed a classical model into a target Hamiltonian to enable quantum optimization or (2) represent a quantum model using variational quantum circuits and apply classical gradient-based optimization. The former method leverages the power of quantum optimization but only supports simple ML models, while the latter provides flexibility in model design but relies on gradient calculation, resulting in barren plateaus (i.e., gradient vanishing) and frequent classical-quantum interactions. To address the limitations of existing quantum ML methods, we introduce Quark, a gradient-free quantum learning framework that optimizes quantum ML models using quantum optimization. Quark does not rely on gradient computation and therefore avoids barren plateaus and frequent classical-quantum interactions. In addition, Quark can support more general ML models than prior quantum ML methods and achieves a dataset-size-independent optimization complexity. Theoretically, we prove that Quark can outperform classical gradient-based methods by reducing model query complexity for highly non-convex problems; empirically, evaluations on the Edge Detection and Tiny-MNIST tasks show that Quark can support complex ML models and significantly reduce the number of measurements needed for discovering near-optimal weights for these tasks.
1 INTRODUCTION
Quantum computing provides a new computational paradigm to achieve exponential speedups over classical counterparts for various tasks, such as cryptography (Shor, 1994), scientific simulation (Tazhigulov et al., 2022), and data analytics (Arute et al., 2019). A key advantage of quantum computing is its ability to entangle multiple quantum bits, called qubits, allowing n qubits to encode a 2^n-dimensional vector, while encoding this vector in classical computing requires 2^n bits.
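To make the exponential gap concrete, here is a minimal NumPy sketch (ours, not from the paper): the statevector of an n-qubit register holds 2^n complex amplitudes, which is exactly what a classical machine must store to represent it.

```python
import numpy as np

# Minimal sketch (ours, not from the paper): the statevector of an n-qubit
# register holds 2^n complex amplitudes, which is what lets n qubits encode
# a 2^n-dimensional vector.
n = 10
zero = np.array([1.0, 0.0])          # single-qubit basis state |0>
state = zero
for _ in range(n - 1):               # |0...0> = |0> (x) ... (x) |0>
    state = np.kron(state, zero)
print(state.shape)                   # (1024,), i.e., 2**n amplitudes
```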
Inspired by this potential, recent work (Jaderberg et al., 2022; Macaluso et al., 2020b; Torta et al., 2021; Kapoor et al., 2016; Bauer et al., 2020; Farhi & Neven, 2018a; Schuld et al., 2014; Cong et al., 2019b) has focused on realizing quantum speedups over classical algorithms in the field of supervised learning. Existing quantum ML work can be divided into two categories: classical model with quantum optimization (CMQO) and quantum model with classical optimization (QMCO). First, CMQO methods embed a classical ML model jointly with the optimization problem into a target Hamiltonian and optimize the model using quantum adiabatic evolution (QAE) (Finnila et al., 1994) or the quantum approximate optimization algorithm (QAOA) (Farhi et al., 2014; Torta et al., 2021). As the transition between a classical model and the target Hamiltonian only applies to low-order polynomial activations (see Figure 2), CMQO methods do not support ML models with non-linear activations that cannot be represented as low-order polynomials (e.g., ReLU).
*These authors contributed equally to this work
[Figure 1 diagram: a quantum circuit over weight register R_W, data registers R_D, and output register R_O; boxes include Hadamard gates H, data-preparation unitary U_D, model unitary U_M, amplitude-amplification unitaries U_s and U_Ψ0 (combined as U_comb), and measurements M; K parallel datasets feed the circuit, and accompanying probability plots show the weight distribution sharpening.]

Figure 1: An overview of the Quark optimization framework. Each horizontal line indicates a qubit, and each box on these lines represents one or multiple quantum gates applied on these qubits. Quark's optimization pipeline includes three stages: (1) Ψ_0 preparation, which initializes R_W, R_D, R_O and performs the model's forward processing U_M; (2) amplitude amplification using a Grover-based algorithm; and (3) weight measurement. Quark uses K-parallel datasets (KPD) to maximize the probability of observing highly accurate weights in each measurement. Darker bars in the probability plots denote weights with higher accuracies.
Second, QMCO methods optimize variational quantum models¹ by iteratively performing gradient descent using classical optimizers. QMCO methods are fundamentally limited by barren plateaus (i.e., gradient vanishing (McClean et al., 2018)) and the high cost of frequent quantum-classical interactions.
To address the limitations of existing quantum ML methods, we introduce Quark, a gradient-free quantum learning framework for classification tasks that optimizes quantum models with quantum optimization (QMQO). Figure 1 shows an overview of Quark. A key idea behind Quark is entangling the weight² of an ML model (i.e., the R_W register in Figure 1) and the encoded dataset (i.e., the R_D register in Figure 1) in a quantum state, where model weights that achieve optimal classification accuracy on the training dataset can be observed with the highest probabilities in a measurement. Therefore, users can obtain highly accurate model weights by directly measuring the updated R_W weight register. To maximize the probability of observing optimal weights, we introduce two key techniques.
Amplitude amplification. Quark uses a Grover-based mechanism to iteratively update the probability distribution of weights based on their training accuracies. As a result, the probability of observing weights with higher accuracy increases after each Grover iteration, as shown in Figure 1.
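The following toy statevector simulation (a sketch of standard Grover-style amplitude amplification, not Quark's actual circuit; the marked indices and register size are hypothetical) illustrates how the oracle-plus-diffusion iteration boosts the measurement probability of "good" weight states:

```python
import numpy as np

# Toy simulation (ours, not Quark's circuit) of Grover-style amplitude
# amplification over a hypothetical 4-qubit weight register: "good"
# (high-accuracy) weights are phase-flipped by an oracle, then all
# amplitudes are reflected about their mean.
N = 16                                   # 2^4 candidate weights
good = [3, 7]                            # hypothetical high-accuracy weights
state = np.full(N, 1 / np.sqrt(N))       # uniform superposition over weights

for _ in range(2):                       # two Grover iterations
    state[good] *= -1                    # oracle: flip phase of good states
    state = 2 * state.mean() - state     # diffusion: reflect about the mean

print((state ** 2)[good].sum())          # ~0.945: good weights dominate
```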
K-parallel datasets (KPD). Applying amplitude amplification on one dataset results in a linear amplification scenario where the measurement probability of each weight is proportional to its training accuracy J(w_i). We further introduce K-parallel datasets, a technique to enable exponential amplification. Specifically, by entangling k identical training datasets with the model weights in parallel (using k copies of R_D, as shown in Figure 1), the probability of observing weight w_i in a measurement becomes proportional to J(w_i)^k. Therefore, as k increases, the optimized probability distribution of weights gradually converges to the optimal weights.
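A quick numeric illustration of this exponential amplification (with made-up accuracies J(w_i)) shows the measurement distribution P(w_i) ∝ J(w_i)^k concentrating on the best weight as k grows:

```python
import numpy as np

# Numeric illustration (assumed toy accuracies) of the KPD effect: with k
# entangled dataset copies, weight w_i is measured with probability
# proportional to J(w_i)^k, so the distribution sharpens as k grows.
J = np.array([0.5, 0.7, 0.9, 1.0])       # hypothetical training accuracies
for k in [1, 2, 4, 8]:
    p = J ** k / np.sum(J ** k)          # normalized measurement probabilities
    print(k, np.round(p, 3))             # mass concentrates on J = 1.0
```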
Compared with CMQO methods, Quark provides more flexibility in model design by composing models directly on quantum circuits and therefore supports a broader range of ML models. Compared with QMCO methods, Quark does not require gradient calculation and therefore does not suffer from barren plateaus. Quark avoids frequent classical-quantum interactions by realizing both model design and optimization entirely on the quantum device. Besides, by using basis encoding for the training dataset, Quark supports non-linear operations (e.g., ReLU) in its model architecture, and its optimization complexity is independent of the training dataset size.
¹ They are also known as variational quantum circuit (VQC)-based models in the quantum literature.
² Throughout the paper, we use the term weight to refer to the set of all trainable parameters of a model.
[Figure 2 diagram: three pipelines. CMQO (classical model, quantum optimizer): Data → Classical Model → Hamiltonian Embedding → quantum-annealing-based optimizer; limited to low-order polynomial activations. QMCO (quantum model, classical optimizer): Data → Amplitude Encoding → VQC Model → gradient-based optimizer with update W = W − η∇J(W); limited to linear operations and susceptible to barren plateaus. Quark (quantum model, quantum optimizer): Data → Basis Encoding → Quark Model → Grover-based optimizer; immune to barren plateaus and supports diverse ML models.]

Figure 2: Comparison between CMQO, QMCO, and Quark (QMQO).
Theoretically, we compare model query complexity³ between Quark and gradient-based methods on a balanced C-way classification task, and prove that Quark can outperform gradient-based methods by reducing model query complexity for highly non-convex problems. In addition, we prove that using K-parallel datasets can further reduce model query complexity under certain circumstances. Simulations on two tasks (i.e., Edge Detection and Tiny-MNIST) show that Quark supports complex ML models, which can include quantum convolution, pooling, fully connected, and ReLU layers. In addition, Quark can significantly reduce the number of measurements needed for discovering a near-optimal weight by applying amplitude amplification and KPD.

³ The number of times the model's forward pass is called.
Contributions. This paper makes the following contributions:
• We propose Quark, a gradient-free quantum learning framework for classification tasks that optimizes quantum ML models with quantum optimization. Quark avoids barren plateaus and frequent classical-quantum interactions, supports more general models than prior quantum ML frameworks, and achieves a dataset-size-independent optimization complexity.
• Theoretically, we prove that Quark can outperform gradient-based methods by reducing model query complexity for highly non-convex problems and that using KPD can further reduce model query complexity.
• Empirically, we show that Quark can support complex ML models and significantly reduce the number of measurements needed for discovering a near-optimal weight for the Edge Detection and Tiny-MNIST tasks.
2 RELATED WORK
Figure 2 compares Quark with existing quantum ML approaches.
2.1 CLASSICAL MODEL WITH QUANTUM OPTIMIZATION
Existing CMQO methods aim at solving classical ML problems with quantum optimization techniques by leveraging the advantage of quantum parallelism (Nielsen & Chuang, 2002). Building on the well-established algorithmic foundation of quantum annealing (Finnila et al., 1994; Kadowaki & Nishimori, 1998; Brooke et al., 1999; Santoro et al., 2002; Santoro & Tosatti, 2006) and adiabatic quantum computing (Farhi et al., 2001; Albash & Lidar, 2018), prior work (Denil & De Freitas, 2011; Dumoulin et al., 2014; Adachi & Henderson, 2015) attempts a quantum restricted Boltzmann machine (RBM) by formulating the RBM as an Ising model (Cipra, 1987). Inspired by the quantum approximate optimization algorithm (QAOA) (Farhi et al., 2014), Torta et al. (2021) embed a single binary perceptron layer into a target Hamiltonian to search for optimal weights. However, CMQO methods are limited by the locality restriction of the target Hamiltonian and can only embed models with low-order polynomial activations (e.g., square). This limitation prevents CMQO methods from supporting practical deep learning architectures, which generally contain non-polynomial activations such as ReLU and sigmoid.
Similar to Quark, Kapoor et al. (2016) also use Grover's algorithm, in their case to find a hyperplane that perfectly separates the training dataset. However, this method only applies to an idealistic setup where a hyperplane with perfect classification exists in the search space. Besides, the method cannot adapt to generic model architectures other than single-layer perceptrons.
2.2 QUANTUM MODEL WITH CLASSICAL OPTIMIZATION
Motivated by the recent advances in variational quantum algorithms (VQAs) (Cerezo et al., 2021), QMCO methods use variational quantum circuits (VQC) (Benedetti et al., 2019) to represent the trainable parameters of an ML model. Havlíček et al. (2019) and Schuld & Killoran (2019) use VQC as a variational feature map to reproduce linear support vector machines (SVM) and kernel methods on quantum circuits, which can outperform classical counterparts under certain circumstances (Liu et al., 2021).
Besides conventional ML methods, recent work has also explored the feasibility of classical neural networks on quantum circuits (Massoli et al., 2022). Farhi & Neven (2018b), Macaluso et al. (2020a), and Killoran et al. (2019) use VQC as building blocks for their quantum perceptron models with a classical gradient-based optimizer. The quantum dissipative neural network (QDNN) (Beer et al., 2020) and the quantum convolutional neural network (QCNN) (Cong et al., 2019a), on the other hand, move a step towards more complicated neural architectures. QDNN enlarges its model space by applying unitary operators on both the input and output qubits, while QCNN uses a measurement-controlled operation to enable non-linear operations. However, McClean et al. (2018) show that the barren plateau phenomenon commonly exists in VQC-based methods, where gradients vanish exponentially with the model size. Though Beer et al. (2020) claim to design a VQC model immune to barren plateaus, Sharma et al. (2022) contradict this claim with an analytical proof. Though Du et al. (2021) use Grover's algorithm as part of their method, they still require VQC building blocks that rely on gradient updates.
Besides, due to amplitude-based data encoding, VQC-based methods in general suffer from a linear dependency on the dataset size in terms of model query complexity during training. Another drawback of amplitude encoding is that, due to the unitary constraint of quantum transformations, non-linear operations are hard to implement for VQC-based methods. In contrast, our method uses basis encoding, which concerns only qubit state transformations rather than amplitude transformations, enabling more efficient model query complexity and more general non-linear transformations.
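The contrast can be sketched classically (our illustration; the vector x is made up): basis encoding maps a binary feature vector to the index of a single computational basis state, whereas amplitude encoding stores its normalized entries as amplitudes.

```python
import numpy as np

# Sketch (our illustration, with made-up numbers) contrasting the two data
# encodings discussed above for a binarized feature vector x in {0,1}^d.
x = np.array([1, 0, 1, 1])                     # d = 4 binary features

# Basis encoding: x selects a single computational basis state |x> of d
# qubits; the model then acts on qubit states, not on amplitudes.
idx = int("".join(map(str, x)), 2)             # |1011> -> basis index 11
basis_state = np.zeros(2 ** len(x))
basis_state[idx] = 1.0

# Amplitude encoding: the normalized entries of x become amplitudes of a
# log2(d)-qubit state; unitarity of quantum gates then makes non-linear
# operations on these amplitudes hard to realize.
amp_state = x / np.linalg.norm(x)

print(idx, amp_state)                          # 11 [0.577 0. 0.577 0.577]
```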
3 PRELIMINARIES
3.1 NOTATIONS
Let D = {(x_i, y_i)}_{i=1}^{N} denote a training dataset, where x_i ∈ {0,1}^{d_x} is the binarized feature vector associated with the i-th sample, and y_i ∈ {0,1}^{d_y} is its label. Let ŷ = f(w, x̂) : {0,1}^{d_w} × {0,1}^{d_x} → {0,1}^{d_y} denote our model parameterized by w ∈ {0,1}^{d_w}. Given an objective l(ŷ, y), our goal is to find a near-optimal w that minimizes/maximizes the overall objective J(w) = (1/N) Σ_{(x_i, y_i) ∈ D} l(f(w, x_i), y_i). We focus on classification tasks and use the objective l(ŷ, y) = 1[ŷ = y]. We use |·| to denote the cardinality of a set and the absolute value of a scalar, and use ‖·‖_2 and ‖·‖_∞ to denote the L2 norm and infinity norm of a vector. Finally, we use ⊗ to denote the tensor product, and use ¬, ⊕, ∧, and ∨ to denote NEGATE, XOR, AND, and OR in logical expressions, respectively.
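For reference, a minimal classical sketch of this objective (ours; the toy model f and data are hypothetical) computes J(w) as the fraction of correctly classified samples:

```python
import numpy as np

# Minimal classical reference (our sketch) for the objective above: with
# l(y_hat, y) = 1[y_hat = y], J(w) is just the training accuracy of w.
def J(w, f, X, Y):
    preds = np.array([f(w, x) for x in X])
    return np.mean(np.all(preds == Y, axis=-1))

# Hypothetical toy model: f(w, x) outputs the parity (XOR-reduction) of
# the bitwise AND of w and x, a {0,1}-valued model on binary inputs.
f = lambda w, x: np.array([np.bitwise_xor.reduce(w & x)])
X = np.array([[0, 1], [1, 0], [1, 1]])        # three binary samples
Y = np.array([[1], [0], [1]])                 # their labels
print(J(np.array([0, 1]), f, X, Y))           # 1.0: w = (0, 1) fits all
```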
3.2 QUANTUM BASICS
A bit in the quantum regime, called a qubit, is represented by a superposition of |0⟩ and |1⟩, formally defined as |z⟩ = α|0⟩ + e^{iϕ}β|1⟩, where α and β are the amplitudes and e^{iϕ} is the relative phase. Furthermore, normalization enforces ⟨z|z⟩ = 1, where ⟨z| is the conjugate transpose of |z⟩. In an n-qubit system, a quantum state is represented as a superposition of 2^n basis states. For computational simplicity, we will be using the basis states spanned by {|0⟩, |1⟩}^{⊗n} and denote the uniform superposition of an n-qubit state as |Ψ⟩ = Σ_{i=0}^{2^n − 1} (1/√(2^n)) |i⟩, where |i⟩ is the corresponding computational basis state. We thus use |w_i⟩ ∈ {|0⟩, |1⟩}^{⊗ d_w} to denote a weight basis state, and |x_j, y_j⟩ for the entangled data basis state, where |x_j⟩ ∈ {|0⟩, |1⟩}^{⊗ d_x} is the feature and |y_j⟩ ∈ {|0⟩, |1⟩}^{⊗ d_y} the label.
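As a sanity check (our sketch, not the paper's), applying a Hadamard gate to each of n qubits in |0...0⟩ produces exactly this uniform superposition, with amplitude 1/√(2^n) on every basis state:

```python
import numpy as np

# Sanity check (our sketch) of the uniform superposition above: H applied
# to each of n qubits puts amplitude 1/sqrt(2^n) on all 2^n basis states.
n = 3
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # Hadamard gate
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)                                # H tensored n times

zero_state = np.zeros(2 ** n)
zero_state[0] = 1.0                                    # |000>
psi = Hn @ zero_state
print(np.allclose(psi, 1 / np.sqrt(2 ** n)))           # True
print(np.abs(psi) ** 2)                                # eight 0.125 entries
```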