
QUANTUM DEEP RECURRENT REINFORCEMENT LEARNING
Samuel Yen-Chi Chen
Wells Fargo
ABSTRACT
Recent advances in quantum computing (QC) and machine
learning (ML) have drawn significant attention to the
development of quantum machine learning (QML). Reinforcement
learning (RL) is an ML paradigm that can be used to solve
complex sequential decision-making problems. Classical RL
has been shown to be capable of solving various challenging
tasks. However, RL algorithms in the quantum world are still
in their infancy. One challenge yet to be solved is how to
train quantum RL (QRL) agents in partially observable
environments. In this paper, we approach this challenge by
building QRL agents with quantum recurrent neural networks
(QRNN). Specifically, we choose the quantum long short-term
memory (QLSTM) as the core of the QRL agent and train the
whole model with deep Q-learning. We demonstrate via
numerical simulations that the QLSTM-DRQN can solve standard
benchmarks such as Cart-Pole with more stable and higher
average scores than a classical DRQN with a similar
architecture and number of model parameters.
Index Terms—
Quantum machine learning, Reinforcement learning,
Recurrent neural networks, Long short-term memory
1. INTRODUCTION
Quantum computing (QC) promises superior performance over
classical computers on certain hard computational tasks [1].
However, existing quantum computers are not error-corrected,
making the implementation of deep quantum circuits extremely
difficult. These so-called noisy intermediate-scale quantum
(NISQ) devices [2] require specially designed quantum circuit
architectures so that quantum advantages can be harnessed.
Recently, a hybrid quantum-classical computing framework [3]
that leverages both classical and quantum computing has been
proposed. Under this paradigm, computational tasks that are
expected to have quantum advantages are carried out on a
quantum computer, while other tasks such as gradient
calculations remain on classical computers.
These algorithms are usually called variational quantum
algorithms and have been successful in certain ML tasks.
Reinforcement learning (RL) is a sub-field of ML dealing with
sequential decision-making tasks. RL based on deep neural
networks has achieved tremendous success in complex tasks,
with human-level [4] or super-human performance [5]. However,
quantum RL is an emerging subject with many issues and
challenges not yet investigated. For example, existing
quantum RL methods focus on various variational quantum
circuits (VQCs) without recurrent structures. Yet recurrent
connections are crucial components in classical ML for
keeping a memory of past time steps. The potential of such
architectures, to the best of our knowledge, has not yet been
studied in quantum RL. In this work, we propose quantum deep
recurrent Q-learning, which uses quantum recurrent neural
networks (QRNN) as the value function approximator.
Specifically, we apply the quantum long short-term memory
(QLSTM) as the core of the QRL agent. The scheme is
illustrated in Figure 1. Our numerical simulations show that
the proposed framework can reach performance comparable to or
better than its classical LSTM counterpart when the model
sizes are similar and the training settings are the same.
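The hybrid quantum-classical paradigm described above can be illustrated with a minimal sketch. It simulates a one-qubit circuit in plain Python (the "quantum" part) and updates its single parameter with classical gradient descent. The parameter-shift rule used for the gradient is one common choice and is an assumption here; the text only states that gradient calculations remain on classical computers. The function names, learning rate, and step count are hypothetical and chosen only for illustration.

```python
import math


def expectation_z(theta: float) -> float:
    """Simulated quantum part: apply RY(theta) to |0> and return <Z>.

    RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>, so
    <Z> = cos^2(theta/2) - sin^2(theta/2) = cos(theta).
    """
    amp0 = math.cos(theta / 2.0)
    amp1 = math.sin(theta / 2.0)
    return amp0 ** 2 - amp1 ** 2


def parameter_shift_grad(theta: float) -> float:
    """Classical part: gradient of <Z> from two shifted circuit evaluations.

    Assumption: the parameter-shift rule, which is exact for this gate.
    """
    shift = math.pi / 2.0
    return 0.5 * (expectation_z(theta + shift) - expectation_z(theta - shift))


def train(theta: float = 0.3, lr: float = 0.2, steps: int = 200) -> float:
    """Gradient descent on <Z>; its minimum value is -1, reached at theta = pi."""
    for _ in range(steps):
        theta -= lr * parameter_shift_grad(theta)
    return theta


theta_opt = train()
final_value = expectation_z(theta_opt)
```

In this toy setting the circuit output plays the role of a loss; in a VQC-based model the same division of labor applies, with the circuit evaluated on quantum hardware or a simulator and the parameter update performed classically.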
[Figure 1 diagram: Quantum Recurrent RL. Labels: Environment;
Hybrid RL Agent (Quantum RNN, Classical Computer); action a_t;
state s_t; reward r_t; output from the Quantum RNN; deep
Q-learning algorithm; update parameters θ.]
Fig. 1. The hybrid quantum-classical deep recurrent Q-learning.
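The interaction pattern of Fig. 1 can be sketched as follows. This is a purely classical stand-in, not the QLSTM: a tiny tanh recurrent cell (hypothetical class TinyRecurrentQ with random, untrained weights) maps an observation and a hidden state to Q-values, and the one-step Q-learning target is formed from the next Q-values. The sketch is meant only to show why a recurrent value function can help under partial observability: the same observation yields different Q-values depending on what was seen before. All names, dimensions, and the discount factor are assumptions made for illustration.

```python
import math
import random


class TinyRecurrentQ:
    """Classical stand-in for a recurrent Q-function (hypothetical; not a QLSTM)."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.w_in = matrix(hidden_dim, obs_dim)      # observation -> hidden
        self.w_rec = matrix(hidden_dim, hidden_dim)  # hidden -> hidden (memory)
        self.w_out = matrix(n_actions, hidden_dim)   # hidden -> Q-values

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def step(self, obs, hidden):
        """Consume one observation; return (Q-values, new hidden state)."""
        new_hidden = []
        for i in range(self.hidden_dim):
            pre = sum(w * o for w, o in zip(self.w_in[i], obs))
            pre += sum(w * h for w, h in zip(self.w_rec[i], hidden))
            new_hidden.append(math.tanh(pre))
        q_values = [sum(w * h for w, h in zip(row, new_hidden)) for row in self.w_out]
        return q_values, new_hidden


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'), or r at episode end."""
    return reward if done else reward + gamma * max(next_q_values)


def greedy_action(q_values):
    """Index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


agent = TinyRecurrentQ(obs_dim=2, hidden_dim=3, n_actions=2)
hidden = agent.initial_state()
same_obs = [1.0, -1.0]

# Feed the same observation twice; the hidden state carries the history.
q_first, hidden = agent.step(same_obs, hidden)
q_second, hidden = agent.step(same_obs, hidden)

action = greedy_action(q_second)
target = td_target(reward=1.0, next_q_values=q_second, done=False)
```

In the proposed framework the recurrent cell would be replaced by a quantum RNN, and the target would drive the classical parameter update shown in the figure; no training step is included in this sketch.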
2. RELATED WORK
Quantum reinforcement learning (QRL) can be traced back to
[6]. However, that framework requires the environment to be
quantum, a condition that may not be satisfied in most
real-world cases. Here we focus on recent developments in
VQC-based QRL dealing with classical environments. The
first VQC-based QRL [7], which is the quantum version of
arXiv:2210.14876v1 [quant-ph] 26 Oct 2022