
Preprint. Under review.
models with low-order polynomial activations (e.g., square). This limitation prevents CMQO meth-
ods from supporting practical deep learning architectures, which generally contain non-polynomial
activations such as ReLU and sigmoid.
Similar to Quark, Kapoor et al. (2016) also uses Grover’s algorithm to find a hyperplane that can
perfectly separate the training dataset. However, this method only applies to an idealistic setup
where a hyperplane with perfect classification exists in its search space. Besides, the method cannot
adapt to generic model architectures other than single-layer perceptrons.
2.2 QUANTUM MODEL WITH CLASSICAL OPTIMIZATION
Motivated by the recent advances in variational quantum algorithms (VQAs) (Cerezo et al.,2021),
QMCO methods use variational quantum circuits (VQC) (Benedetti et al.,2019) to represent the
trainable parameters of an ML model. Havl´
ıˇ
cek et al. (2019); Schuld & Killoran (2019) use VQC
as a variational feature map to reproduce linear support vector machines (SVM) and kernel methods
on quantum circuits, which can outperform classical counterparts under certain circumstances (Liu
et al.,2021).
Besides conventional ML methods, recent work has also explored the feasibility of classical neu-
ral networks on quantum circuits (Massoli et al.,2022). Farhi & Neven (2018b); Macaluso et al.
(2020a); Killoran et al. (2019) use VQC as building blocks for their quantum perceptron models
with a classical gradient-based optimizer. Quantum dissipative neural network (Beer et al.,2020)
(QDNN) and quantum convolutional neural network (Cong et al.,2019a) (QCNN), on the other
hand, move a step forward towards more complicated neural architectures. QDNN enlarges its
model space by applying unitary operators on both the input and output qubits, while QCNN uses a
measurement-controlled operation to enable non-linear operations. However, McClean et al. (2018)
shows that the barren plateau phenomenon commonly exists in VQC-based methods, where gradi-
ents vanish exponentially with the model size. Though Beer et al. (2020) claims to design a VQC
model immune to barren plateau, Sharma et al. (2022) contradicts such claim with analytical proof.
Though Du et al. (2021) uses a Grover algorithm as part of their method, they still require VQC as
their model building blocks that require gradients update.
Besides, due to amplitude-based data encoding, VQC-based methods in general suffer from a linear
dependency with respect to dataset size in terms of model query complexity during training. Another
drawback for amplitude encoding is that due to the unitary constraint of quantum transformations,
non-linear operations are hard to implement for VQC-based methods. In contrast, our method uses
basis encoding that concerns only qubits state transformation rather than amplitude transformation,
which enables more efficient model query complexity and more general non-linear transformations.
3PRELIMINARIES
3.1 NOTATIONS
Let D={(xi, yi)}i∈Ndenote a training dataset, where xi∈ {0,1}dxis the binarized fea-
ture vector associated with the i-th sample, and yi∈ {0,1}dyis its label. Let ˆy=f(w, ˆx) :
{0,1}dw× {0,1}dx→ {0,1}dydenote our model parameterized by w∈ {0,1}dw. Given an
objective l(ˆy, y), our goal is to find a near-optimal w∗that minimizes/maximizes the overall objec-
tive J(w) = 1
NP(xi,yi)∈D l(f(w, xi), yi). We focus on classification tasks and use the objective
l(ˆy, y) = (ˆy=y). We use | · | to denote the cardinality of a set and absolute value of a scalar,
and use k · k2and k · k∞to denote the L2-norm and infinity norm of a vector. Finally, we use ⊗to
denote tensor product, and use ¬,⊕,∧, and ∨to denote NEGATE, XOR, AND, and OR in logical
expressions, respectively.
3.2 QUANTUM BASICS
A bit in the quantum regime, called a qubit, is represented by a super-position of |0iand |1i, which
is formally defined as |zi=α|0i+eiφβ|1i, where αand βare the amplitudes, and eiφ is the
relative phase. Furthermore, the rule also enforces hz|zi= 1 where hz|is the conjugate transpose
of |zi. In an n-qubit system, a quantum state is represented as a superposition of 2nbasis states.
For computational simplicity, we will be using the basis states that are spanned by {|0i,|1i}nand
denote the superposition of a n-qubits state as |Ψi=P2n−1
i=0 1
√2n|iiwhere |iiis the corresponding
computational basis. We thus use |wii ∈ {|0i,|1i}dwto denote weight basis state, |xj, yjifor the
entangled data basis state where |xji ∈ {|0i,|1i}dxis the feature and |yji ∈ {|0i,|1i}dythe label.
4