algorithms based on the quantum OC-SVM and quantum PCA that could achieve exponential speedup were proposed [5]. However, these quantum algorithms require expensive subroutines, such as the quantum linear solver [38] and matrix exponentiation [4], which are not suitable for Noisy Intermediate-Scale Quantum (NISQ) computing [39]. In contrast, training a shallow-depth PQC with a classical optimizer is regarded as a promising approach for near-term quantum machine learning [40]. This work takes the NISQ-friendly approach of constructing a variational quantum algorithm for one-class classification with classical data and verifies whether a quantum advantage can be attained.
Numerical experiments are performed on the handwritten digits and Fashion-MNIST datasets with the open-source Python API Qibo [41] for quantum circuit simulation. The performance of VQOCC is evaluated via the area under the receiver operating characteristic (ROC) curve (AUC) and compared to classical methods, including the OC-SVM, kernel PCA, and a deep convolutional autoencoder (DCAE). We benchmark the performance of VQOCC with various structures of the quantum autoencoder. The structure of the QAE is determined by the choice of data encoding, the number of PQC layers, and the size of the latent feature space. Overall, VQOCC achieves performance comparable to the classical methods, even though its number of model parameters grows only logarithmically with the data feature size. Notably, VQOCC outperforms DCAE under similar training conditions.
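For concreteness, the following minimal sketch shows how an AUC of this kind can be computed with scikit-learn; the anomaly scores and labels are randomly generated placeholders, not results from the experiments reported here.

```python
# Minimal sketch: computing the AUC of a ROC curve from anomaly scores.
# The labels and scores below are illustrative placeholders only.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = np.concatenate([np.zeros(80), np.ones(20)])   # 0 = normal, 1 = anomalous
scores = np.concatenate([rng.normal(0.2, 0.1, 80),     # scores for normal samples
                         rng.normal(0.6, 0.2, 20)])    # scores for anomalous samples

print(f"AUC = {roc_auc_score(y_true, scores):.3f}")
```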
The remainder of the paper is organized as follows. Section II describes the one-class classification problem and reviews some of the well-known approaches to it. Section III explains the quantum autoencoder, which is the basis of the quantum one-class classifier proposed in this work. Section IV explains the application of the quantum autoencoder to one-class classification and the construction of different models by modifying the ansatz (i.e. the structure of the PQC) and the cost function. Numerical experiments performed using scikit-learn and Qibo with the handwritten digits and Fashion-MNIST datasets are described in Sec. V. This section also compares the AUC of the ROC curves of our algorithm with those of a one-class SVM, a kernel PCA, and a deep convolutional autoencoder. Section VI provides conclusions and suggestions for future work.
II. ONE-CLASS CLASSIFICATION
Assigning input data to one of a given set of classes is a canonical problem in pattern recognition and can be formally described as a classification problem. Classification aims to predict the class label of an unseen (test) datum $\tilde{\mathbf{x}} \in \mathbb{R}^N$, given a labelled (training) dataset
$$\mathcal{D} = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_M, y_M)\} \subset \mathbb{R}^N \times \mathbb{Z}_l,$$
where $l$ is the number of classes. One-class classification is a special case of the aforementioned problem in which $l = 1$ [22, 42]. In this case, the training dataset is $\mathcal{D} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M\}$, which is treated as the normal class, and the goal is to identify whether a test datum $\tilde{\mathbf{x}}$ belongs to the normal class or not. Since anomalous data is not used in training, this setting is known as semi-supervised learning. It is also possible to perform one-class classification with unsupervised methods on an unlabelled dataset, under the assumption that most of the test dataset is composed of normal data [21, 22].
Given a training dataset $\mathcal{D}$ of the normal class, a one-class classification algorithm yields a decision function $f(\mathbf{x}; \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M)$, which expresses how far the input data is from the training dataset. If the decision function $f(\tilde{\mathbf{x}}; \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M)$ takes a value smaller than a certain threshold $C_{\mathrm{th}}$ (i.e. $f(\tilde{\mathbf{x}}) < C_{\mathrm{th}}$), then $\tilde{\mathbf{x}}$ is classified as normal. Otherwise, if $f(\tilde{\mathbf{x}}) > C_{\mathrm{th}}$, the test datum is classified as anomalous. If $f(\tilde{\mathbf{x}}) = C_{\mathrm{th}}$, the decision can be made at random.
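The thresholding rule above can be summarized in a few lines of Python; in this sketch, `scores` stands for the values of a generic decision function on test data and `C_th` is an assumed threshold.

```python
# Minimal sketch of the thresholding rule: f(x) < C_th -> normal,
# f(x) > C_th -> anomalous, and ties are broken at random.
import numpy as np

def classify(scores, C_th, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    labels = np.where(scores < C_th, "normal", "anomalous")
    ties = scores == C_th
    labels[ties] = rng.choice(["normal", "anomalous"], size=int(ties.sum()))
    return labels

print(classify([0.1, 0.9, 0.5], C_th=0.5))
```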
Two well-known statistical approaches for addressing one-class classification problems are principal component analysis (PCA) [17] and the support vector machine (SVM) [18–20]. PCA is a dimensionality reduction technique that projects the data $\mathbf{x}_i$ onto a lower dimensional subspace such that the projections have the largest variances. The projected space provides the reconstructed data $\hat{\mathbf{x}}_i$. The lower dimensional subspace is determined so as to minimize the reconstruction error $\sum_i \|\mathbf{x}_i - \hat{\mathbf{x}}_i\|^2$. Once the lower dimensional subspace is chosen, the reconstruction error $f(\mathbf{x}) = \|\mathbf{x} - \hat{\mathbf{x}}\|^2$ can serve as a decision function for one-class classification, since it will be small for normal data and large for anomalous data. The kernel trick can be utilized in PCA to include non-linearity [43].
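As a concrete illustration, the sketch below uses scikit-learn's PCA to compute the reconstruction error $\|\mathbf{x} - \hat{\mathbf{x}}\|^2$ as a decision function; the training and test arrays are hypothetical placeholders, and a kernel PCA variant would follow the same pattern.

```python
# Minimal sketch: PCA reconstruction error as a one-class decision function.
# X_train (normal data only) and X_test are hypothetical placeholder arrays.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 16))                  # placeholder normal data
X_test = rng.normal(size=(10, 16))                    # placeholder test data

pca = PCA(n_components=4).fit(X_train)
X_hat = pca.inverse_transform(pca.transform(X_test))  # reconstructed test data
f = np.sum((X_test - X_hat) ** 2, axis=1)             # f(x) = ||x - x_hat||^2
is_anomalous = f > np.quantile(f, 0.9)                # compare against an assumed threshold
```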
The support vector machine is a supervised learning model that aims to find a hyperplane separating two classes of training data with the maximum margin; thus it is commonly used for binary classification. The SVM can be modified for one-class classification by finding a maximum-margin hyperplane that separates the normal data from the origin. This is known as the one-class SVM (OC-SVM) [18, 19]. The decision function of the OC-SVM is
$$f(\mathbf{x}) = \langle \mathbf{w}, \Phi(\mathbf{x}) \rangle - b, \qquad (1)$$
where $\mathbf{w}$ and $b$ describe the hyperplane and $\Phi$ is the feature map. If the decision function is positive (negative), the corresponding test data is classified as normal (anomalous).
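A corresponding sketch with scikit-learn's OneClassSVM is given below; the sign of `decision_function` plays the role of Eq. (1), with the RBF kernel providing the feature map $\Phi$ implicitly, and all data arrays are hypothetical placeholders.

```python
# Minimal sketch: one-class SVM trained on normal data only.
# X_train and X_test are hypothetical placeholder arrays.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 16))   # placeholder normal data
X_test = rng.normal(size=(10, 16))     # placeholder test data

ocsvm = OneClassSVM(kernel="rbf", nu=0.1).fit(X_train)
f = ocsvm.decision_function(X_test)    # corresponds to <w, Phi(x)> - b in Eq. (1)
labels = np.where(f > 0, "normal", "anomalous")
```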
Alternatively, the SVM can be modified for one-class classification by finding the smallest hypersphere that encapsulates the normal data. This is known as the support vector data description (SVDD) [20]. After finding the optimal hypersphere, data located outside of the hypersphere is classified as anomalous. In this case, the decision function can be expressed as
$$f(\mathbf{x}) = \|\Phi(\mathbf{x}) - \mathbf{a}\|^2 - R, \qquad (2)$$
where $\mathbf{a}$ is the center of the hypersphere and $R$ is the radius of the hypersphere. Note that when the data is normalized