Preprint. Under review.
QUARK: A GRADIENT-FREE QUANTUM LEARNING
FRAMEWORK FOR CLASSIFICATION TASKS
Zhihao Zhang
Carnegie Mellon University
zhihaoz3@cs.cmu.edu
Zhuoming Chen
Tsinghua University
chen-zm19@mails.tsinghua.edu.cn
Heyang Huang
Microsoft
rahuang@microsoft.com
Zhihao Jia
Carnegie Mellon University
zhihao@cmu.edu
ABSTRACT
As more practical and scalable quantum computers emerge, much attention has been focused on realizing quantum supremacy in machine learning. Existing quantum ML methods either (1) embed a classical model into a target Hamiltonian to enable quantum optimization or (2) represent a quantum model using variational quantum circuits and apply classical gradient-based optimization. The former method leverages the power of quantum optimization but only supports simple ML models, while the latter provides flexibility in model design but relies on gradient calculation, resulting in barren plateaus (i.e., gradient vanishing) and frequent classical-quantum interactions. To address the limitations of existing quantum ML methods, we introduce Quark, a gradient-free quantum learning framework that optimizes quantum ML models using quantum optimization. Quark does not rely on gradient computation and therefore avoids barren plateaus and frequent classical-quantum interactions. In addition, Quark can support more general ML models than prior quantum ML methods and achieves a dataset-size-independent optimization complexity. Theoretically, we prove that Quark can outperform classical gradient-based methods by reducing model query complexity for highly non-convex problems; empirically, evaluations on the Edge Detection and Tiny-MNIST tasks show that Quark can support complex ML models and significantly reduce the number of measurements needed for discovering near-optimal weights for these tasks.
1 INTRODUCTION
Quantum computing provides a new computational paradigm to achieve exponential speedups over classical counterparts for various tasks, such as cryptography (Shor, 1994), scientific simulation (Tazhigulov et al., 2022), and data analytics (Arute et al., 2019). A key advantage of quantum computing is its ability to entangle multiple quantum bits, called qubits, allowing n qubits to encode a 2^n-dimensional vector, while encoding this vector in classical computing requires 2^n bits.
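To make the exponential gap concrete, here is a minimal NumPy sketch (ours, not from the paper): the statevector of an n-qubit register holds 2^n complex amplitudes, which is exactly what a classical machine must store to represent it.

```python
import numpy as np

# Minimal sketch (ours, not from the paper): the statevector of an n-qubit
# register holds 2^n complex amplitudes, which is what lets n qubits encode
# a 2^n-dimensional vector.
n = 10
zero = np.array([1.0, 0.0])          # single-qubit basis state |0>
state = zero
for _ in range(n - 1):               # |0...0> = |0> (x) ... (x) |0>
    state = np.kron(state, zero)
print(state.shape)                   # (1024,), i.e., 2**n amplitudes
```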
Inspired by this potential, recent work (Jaderberg et al., 2022; Macaluso et al., 2020b; Torta et al., 2021; Kapoor et al., 2016; Bauer et al., 2020; Farhi & Neven, 2018a; Schuld et al., 2014; Cong et al., 2019b) has focused on realizing quantum speedups over classical algorithms in the field of supervised learning. Existing quantum ML work can be divided into two categories: classical model with quantum optimization (CMQO) and quantum model with classical optimization (QMCO). First, CMQO methods embed a classical ML model jointly with the optimization problem into a target Hamiltonian and optimize the model using quantum adiabatic evolution (QAE) (Finnila et al., 1994) or the quantum approximate optimization algorithm (QAOA) (Farhi et al., 2014; Torta et al., 2021). As the transition between a classical model and the target Hamiltonian only applies to low-order polynomial activations (see Figure 2), CMQO methods do not support ML models with non-linear activations that cannot be represented as low-order polynomials (e.g., ReLU).
*These authors contributed equally to this work
[Figure 1 diagram: a quantum circuit over weight register R_W, data registers R_D, and output register R_O; boxes include Hadamard gates H, data-preparation unitary U_D, model unitary U_M, amplitude-amplification unitaries U_s and U_Ψ0 (combined as U_comb), and measurements M; K parallel datasets feed the circuit, and accompanying probability plots show the weight distribution sharpening.]

Figure 1: An overview of the Quark optimization framework. Each horizontal line indicates a qubit, and each box on these lines represents one or multiple quantum gates applied on these qubits. Quark's optimization pipeline includes three stages: (1) Ψ_0 preparation, which initializes R_W, R_D, R_O and performs the model's forward processing U_M; (2) amplitude amplification using a Grover-based algorithm; and (3) weight measurement. Quark uses K-parallel datasets (KPD) to maximize the probability of observing highly accurate weights in each measurement. Darker bars in the probability plots denote weights with higher accuracies.
Second, QMCO methods optimize variational quantum models¹ by iteratively performing gradient descent using classical optimizers. QMCO methods are fundamentally limited by barren plateaus (i.e., gradient vanishing (McClean et al., 2018)) and the high cost of frequent quantum-classical interactions.
To address the limitations of existing quantum ML methods, we introduce Quark, a gradient-free quantum learning framework for classification tasks that optimizes quantum models with quantum optimization (QMQO). Figure 1 shows an overview of Quark. A key idea behind Quark is entangling the weight² of an ML model (i.e., the R_W register in Figure 1) and the encoded dataset (i.e., the R_D register in Figure 1) in a quantum state, where model weights that achieve optimal classification accuracy on the training dataset can be observed with the highest probabilities in a measurement. Therefore, users can obtain highly accurate model weights by directly measuring the updated R_W weight register. To maximize the probability of observing optimal weights, we introduce two key techniques.
Amplitude amplification. Quark uses a Grover-based mechanism to iteratively update the probability distribution of weights based on their training accuracies. As a result, the probability of observing weights with higher accuracy increases after each Grover iteration, as shown in Figure 1.
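The following toy statevector simulation (a sketch of standard Grover-style amplitude amplification, not Quark's actual circuit; the marked indices and register size are hypothetical) illustrates how the oracle-plus-diffusion iteration boosts the measurement probability of "good" weight states:

```python
import numpy as np

# Toy simulation (ours, not Quark's circuit) of Grover-style amplitude
# amplification over a hypothetical 4-qubit weight register: "good"
# (high-accuracy) weights are phase-flipped by an oracle, then all
# amplitudes are reflected about their mean.
N = 16                                   # 2^4 candidate weights
good = [3, 7]                            # hypothetical high-accuracy weights
state = np.full(N, 1 / np.sqrt(N))       # uniform superposition over weights

for _ in range(2):                       # two Grover iterations
    state[good] *= -1                    # oracle: flip phase of good states
    state = 2 * state.mean() - state     # diffusion: reflect about the mean

print((state ** 2)[good].sum())          # ~0.945: good weights dominate
```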
K-parallel datasets (KPD). Applying amplitude amplification on one dataset results in a linear amplification scenario where the measurement probability of each weight is proportional to its training accuracy J(w_i). We further introduce K-parallel datasets, a technique to enable exponential amplification. Specifically, by entangling k identical training datasets with the model weights in parallel (using k copies of R_D, as shown in Figure 1), the probability of observing weight w_i in a measurement becomes proportional to J(w_i)^k. Therefore, as k increases, the optimized probability distribution of weights gradually converges to the optimal weights.
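A quick numeric illustration of this exponential amplification (with made-up accuracies J(w_i)) shows the measurement distribution P(w_i) ∝ J(w_i)^k concentrating on the best weight as k grows:

```python
import numpy as np

# Numeric illustration (assumed toy accuracies) of the KPD effect: with k
# entangled dataset copies, weight w_i is measured with probability
# proportional to J(w_i)^k, so the distribution sharpens as k grows.
J = np.array([0.5, 0.7, 0.9, 1.0])       # hypothetical training accuracies
for k in [1, 2, 4, 8]:
    p = J ** k / np.sum(J ** k)          # normalized measurement probabilities
    print(k, np.round(p, 3))             # mass concentrates on J = 1.0
```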
Compared with CMQO methods, Quark provides more flexibility in model design by composing models directly on quantum circuits and therefore supports a broader range of ML models. Compared with QMCO methods, Quark does not require gradient calculation and therefore does not suffer from barren plateaus. Quark avoids frequent classical-quantum interactions by realizing both model design and optimization entirely on the quantum device. Besides, by using basis encoding for the training dataset, Quark supports non-linear operations (e.g., ReLU) in its model architecture, and its optimization complexity is independent of the training dataset size.
¹ They are also known as variational quantum circuit (VQC)-based models in the quantum literature.
² Throughout the paper, we use the term weight to refer to the set of all trainable parameters of a model.
[Figure 2 diagram: three pipelines. CMQO (classical model, quantum optimizer): Data → Classical Model → Hamiltonian Embedding → quantum-annealing-based optimizer; limited to low-order polynomial activations. QMCO (quantum model, classical optimizer): Data → Amplitude Encoding → VQC Model → gradient-based optimizer with update W = W − η∇J(W); limited to linear operations and susceptible to barren plateaus. Quark (quantum model, quantum optimizer): Data → Basis Encoding → Quark Model → Grover-based optimizer; immune to barren plateaus and supports diverse ML models.]

Figure 2: Comparison between CMQO, QMCO, and Quark (QMQO).
Theoretically, we compare model query complexity³ between Quark and gradient-based methods on a balanced C-way classification task, and prove that Quark can outperform gradient-based methods by reducing model query complexity for highly non-convex problems. In addition, we prove that using K-parallel datasets can further reduce model query complexity under certain circumstances. Simulations on two tasks (i.e., Edge Detection and Tiny-MNIST) show that Quark supports complex ML models, which can include quantum convolution, pooling, fully connected, and ReLU layers. In addition, Quark can significantly reduce the number of measurements needed for discovering a near-optimal weight by applying amplitude amplification and KPD.

³ The number of times the model's forward pass is called.
Contributions. This paper makes the following contributions:
• We propose Quark, a gradient-free quantum learning framework for classification tasks that optimizes quantum ML models with quantum optimization. Quark avoids barren plateaus and frequent classical-quantum interactions, supports more general models than prior quantum ML frameworks, and achieves a dataset-size-independent optimization complexity.
• Theoretically, we prove that Quark can outperform gradient-based methods by reducing model query complexity for highly non-convex problems and that using KPD can further reduce model query complexity.
• Empirically, we show that Quark can support complex ML models and significantly reduce the number of measurements needed for discovering a near-optimal weight for the Edge Detection and Tiny-MNIST tasks.
2 RELATED WORK
Figure 2 compares Quark with existing quantum ML approaches.
2.1 CLASSICAL MODEL WITH QUANTUM OPTIMIZATION
Existing CMQO methods aim at solving classical ML problems with quantum optimization techniques by leveraging the advantage of quantum parallelism (Nielsen & Chuang, 2002). Building on the well-established algorithmic foundation of quantum annealing (Finnila et al., 1994; Kadowaki & Nishimori, 1998; Brooke et al., 1999; Santoro et al., 2002; Santoro & Tosatti, 2006) and adiabatic quantum computing (Farhi et al., 2001; Albash & Lidar, 2018), prior work (Denil & De Freitas, 2011; Dumoulin et al., 2014; Adachi & Henderson, 2015) attempts a quantum restricted Boltzmann machine (RBM) by formulating the RBM as an Ising model (Cipra, 1987). Inspired by the quantum approximate optimization algorithm (QAOA) (Farhi et al., 2014), Torta et al. (2021) embed a single binary perceptron layer into a target Hamiltonian to search for optimal weights. However, CMQO methods are limited by the locality restriction of the target Hamiltonian and can only embed models with low-order polynomial activations (e.g., square). This limitation prevents CMQO methods from supporting practical deep learning architectures, which generally contain non-polynomial activations such as ReLU and sigmoid.
Similar to Quark, Kapoor et al. (2016) also use Grover's algorithm, in their case to find a hyperplane that perfectly separates the training dataset. However, this method only applies to an idealistic setup where a hyperplane with perfect classification exists in the search space. Besides, the method cannot adapt to generic model architectures other than single-layer perceptrons.
2.2 QUANTUM MODEL WITH CLASSICAL OPTIMIZATION
Motivated by the recent advances in variational quantum algorithms (VQAs) (Cerezo et al., 2021), QMCO methods use variational quantum circuits (VQC) (Benedetti et al., 2019) to represent the trainable parameters of an ML model. Havlíček et al. (2019) and Schuld & Killoran (2019) use VQC as a variational feature map to reproduce linear support vector machines (SVM) and kernel methods on quantum circuits, which can outperform classical counterparts under certain circumstances (Liu et al., 2021).
Besides conventional ML methods, recent work has also explored the feasibility of classical neural networks on quantum circuits (Massoli et al., 2022). Farhi & Neven (2018b), Macaluso et al. (2020a), and Killoran et al. (2019) use VQC as building blocks for their quantum perceptron models with a classical gradient-based optimizer. The quantum dissipative neural network (QDNN) (Beer et al., 2020) and the quantum convolutional neural network (QCNN) (Cong et al., 2019a), on the other hand, move a step towards more complicated neural architectures. QDNN enlarges its model space by applying unitary operators on both the input and output qubits, while QCNN uses a measurement-controlled operation to enable non-linear operations. However, McClean et al. (2018) show that the barren plateau phenomenon commonly exists in VQC-based methods, where gradients vanish exponentially with the model size. Though Beer et al. (2020) claim to design a VQC model immune to barren plateaus, Sharma et al. (2022) contradict this claim with an analytical proof. Though Du et al. (2021) use Grover's algorithm as part of their method, they still require VQC building blocks that rely on gradient updates.
Besides, due to amplitude-based data encoding, VQC-based methods in general suffer from a linear dependency on the dataset size in terms of model query complexity during training. Another drawback of amplitude encoding is that, due to the unitary constraint of quantum transformations, non-linear operations are hard to implement for VQC-based methods. In contrast, our method uses basis encoding, which concerns only qubit state transformations rather than amplitude transformations, enabling more efficient model query complexity and more general non-linear transformations.
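The contrast can be sketched classically (our illustration; the vector x is made up): basis encoding maps a binary feature vector to the index of a single computational basis state, whereas amplitude encoding stores its normalized entries as amplitudes.

```python
import numpy as np

# Sketch (our illustration, with made-up numbers) contrasting the two data
# encodings discussed above for a binarized feature vector x in {0,1}^d.
x = np.array([1, 0, 1, 1])                     # d = 4 binary features

# Basis encoding: x selects a single computational basis state |x> of d
# qubits; the model then acts on qubit states, not on amplitudes.
idx = int("".join(map(str, x)), 2)             # |1011> -> basis index 11
basis_state = np.zeros(2 ** len(x))
basis_state[idx] = 1.0

# Amplitude encoding: the normalized entries of x become amplitudes of a
# log2(d)-qubit state; unitarity of quantum gates then makes non-linear
# operations on these amplitudes hard to realize.
amp_state = x / np.linalg.norm(x)

print(idx, amp_state)                          # 11 [0.577 0. 0.577 0.577]
```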
3 PRELIMINARIES
3.1 NOTATIONS
Let D = {(x_i, y_i)}_{i=1}^{N} denote a training dataset, where x_i ∈ {0,1}^{d_x} is the binarized feature vector associated with the i-th sample, and y_i ∈ {0,1}^{d_y} is its label. Let ŷ = f(w, x̂) : {0,1}^{d_w} × {0,1}^{d_x} → {0,1}^{d_y} denote our model parameterized by w ∈ {0,1}^{d_w}. Given an objective l(ŷ, y), our goal is to find a near-optimal w that minimizes/maximizes the overall objective J(w) = (1/N) Σ_{(x_i, y_i) ∈ D} l(f(w, x_i), y_i). We focus on classification tasks and use the objective l(ŷ, y) = 1[ŷ = y]. We use |·| to denote the cardinality of a set and the absolute value of a scalar, and use ‖·‖_2 and ‖·‖_∞ to denote the L2 norm and infinity norm of a vector. Finally, we use ⊗ to denote the tensor product, and use ¬, ⊕, ∧, and ∨ to denote NEGATE, XOR, AND, and OR in logical expressions, respectively.
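For reference, a minimal classical sketch of this objective (ours; the toy model f and data are hypothetical) computes J(w) as the fraction of correctly classified samples:

```python
import numpy as np

# Minimal classical reference (our sketch) for the objective above: with
# l(y_hat, y) = 1[y_hat = y], J(w) is just the training accuracy of w.
def J(w, f, X, Y):
    preds = np.array([f(w, x) for x in X])
    return np.mean(np.all(preds == Y, axis=-1))

# Hypothetical toy model: f(w, x) outputs the parity (XOR-reduction) of
# the bitwise AND of w and x, a {0,1}-valued model on binary inputs.
f = lambda w, x: np.array([np.bitwise_xor.reduce(w & x)])
X = np.array([[0, 1], [1, 0], [1, 1]])        # three binary samples
Y = np.array([[1], [0], [1]])                 # their labels
print(J(np.array([0, 1]), f, X, Y))           # 1.0: w = (0, 1) fits all
```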
3.2 QUANTUM BASICS
A bit in the quantum regime, called a qubit, is represented by a superposition of |0⟩ and |1⟩, formally defined as |z⟩ = α|0⟩ + e^{iϕ}β|1⟩, where α and β are the amplitudes and e^{iϕ} is the relative phase. Furthermore, normalization enforces ⟨z|z⟩ = 1, where ⟨z| is the conjugate transpose of |z⟩. In an n-qubit system, a quantum state is represented as a superposition of 2^n basis states. For computational simplicity, we will be using the basis states spanned by {|0⟩, |1⟩}^{⊗n} and denote the uniform superposition of an n-qubit state as |Ψ⟩ = Σ_{i=0}^{2^n − 1} (1/√(2^n)) |i⟩, where |i⟩ is the corresponding computational basis state. We thus use |w_i⟩ ∈ {|0⟩, |1⟩}^{⊗ d_w} to denote a weight basis state, and |x_j, y_j⟩ for the entangled data basis state, where |x_j⟩ ∈ {|0⟩, |1⟩}^{⊗ d_x} is the feature and |y_j⟩ ∈ {|0⟩, |1⟩}^{⊗ d_y} the label.
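As a sanity check (our sketch, not the paper's), applying a Hadamard gate to each of n qubits in |0...0⟩ produces exactly this uniform superposition, with amplitude 1/√(2^n) on every basis state:

```python
import numpy as np

# Sanity check (our sketch) of the uniform superposition above: H applied
# to each of n qubits puts amplitude 1/sqrt(2^n) on all 2^n basis states.
n = 3
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # Hadamard gate
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)                                # H tensored n times

zero_state = np.zeros(2 ** n)
zero_state[0] = 1.0                                    # |000>
psi = Hn @ zero_state
print(np.allclose(psi, 1 / np.sqrt(2 ** n)))           # True
print(np.abs(psi) ** 2)                                # eight 0.125 entries
```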