considerations. Moreover, deep learning based policies lack
the interpretability and reliability of model-based methods, and
do not incorporate established models, which are central to pairs
trading. In addition, these methods tend to have a long training
time and require large volumes of data for training, which can
constitute a limiting factor in high-frequency trading. This
motivates designing trading techniques that simultaneously
benefit from the approximate modeling adopted by classical
trading schemes and from the abstractness and capabilities of
data-driven deep learning methods.
In this work, we propose KalmanNet-aided Bollinger bands
Pairs Trading (KBPT), a pairs trading algorithm that combines
SS model-based trading policies with deep learning tools,
based on model-based deep learning methodology [33]–[35].
KBPT is derived by proposing a novel SS model representation
for pairs trading obtained from assuming partial
co-integration [17] combined with an autoregressive prior
imposed on the spread. As opposed to previous SS model-
based trading policies that utilize, e.g., KF with BB for setting
the position, thus implicitly assuming that the SS model is
Gaussian and accurate, we design our policy specifically to
cope with the approximate nature of the SS model and its
expected non-Gaussianity. This is achieved by having KBPT
preserve the flow of KF-BB trading, retaining its structured
modeling and interpretability, while augmenting the KF with
a trainable RNN following the recently proposed KalmanNet
[36]. The resulting neural augmentation, in which the
specific computation of the KF that depends on the underlying
stochasticity is learned, leverages data to track the spread in
partially known and non-Gaussian SS models.
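As a hedged illustration of the KF computation referred to above, the following sketch implements one predict/update step of a scalar KF. The random-walk state model, the function name `kf_step`, and the noise variances `q` and `r` are illustrative assumptions and not the SS model proposed in this work. The Kalman gain is the quantity that depends on the noise statistics; in KalmanNet [36], it is this gain computation that is replaced by a trainable RNN.

```python
def kf_step(x_prev, p_prev, y, q=1e-4, r=1e-2):
    """One predict/update step of a scalar Kalman filter.

    Assumed (illustrative) model:
        x_t = x_{t-1} + w_t,  w_t ~ N(0, q)   (random-walk state)
        y_t = x_t + v_t,      v_t ~ N(0, r)   (noisy observation)
    """
    # Predict: propagate the state estimate and its error variance.
    x_pred = x_prev
    p_pred = p_prev + q

    # Innovation: mismatch between the observation and its prediction.
    innovation = y - x_pred
    s = p_pred + r

    # Kalman gain: the only step that needs the noise variances q and r.
    gain = p_pred / s

    # Update: correct the prediction using the weighted innovation.
    x_new = x_pred + gain * innovation
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, gain


# Usage: filter a short sequence of noisy spread observations.
x, p = 0.0, 1.0
for y in [0.9, 1.1, 1.0, 0.95]:
    x, p, k = kf_step(x, p, y)
```

When `q` and `r` are misspecified, or the noise is not Gaussian, the gain computed this way is no longer optimal, which is the situation the learned gain is meant to address.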
We propose a dedicated training scheme for KBPT that
learns the pairs trading policy from sequences of past asset
pairs. The learning method is based on a two-stage procedure,
where we first train KalmanNet separately from the trading
task as a form of pretraining. There, we overcome the fact
that there is no ground-truth spread value by leveraging the
interpretable architecture of KalmanNet, and particularly its
internal prediction of the next observation which follows from
the KF flow, for unsupervised learning [37]. Then, we train
the overall trading policy, combining the neural-augmented
KalmanNet with a customized differentiable BB mapping,
such that the tracking algorithm learns to produce
features that are most useful in the sense of maximizing
the PNL rather than accurately tracking the prices. In doing so,
we gain the ability to cope with modeling mismatch, as
the resulting architecture converts the model-based trading
algorithm into a trainable discriminative model [38] that is
trained end-to-end to maximize the PNL as a cumulative
reward.
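To illustrate why a differentiable mapping is needed for end-to-end training, the sketch below contrasts the hard step function U(·) with a smooth sigmoid surrogate. The surrogate, the function names, and the `temperature` parameter are assumptions made for illustration; the specific differentiable BB mapping used by KBPT is not given in this section and may differ.

```python
import math


def hard_step(t):
    """Step function U(t): 1 for t > 0 and 0 otherwise.

    Its derivative is zero wherever it exists, so no gradient can pass through it.
    """
    return 1.0 if t > 0 else 0.0


def soft_step(t, temperature=0.1):
    """Sigmoid surrogate of U(t); approaches U(t) as temperature -> 0 (for t != 0).

    Written in a numerically stable form to avoid overflow in exp().
    """
    x = t / temperature
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_step_grad(t, temperature=0.1):
    """Derivative of soft_step with respect to t; nonzero for every finite t."""
    s = soft_step(t, temperature)
    return s * (1.0 - s) / temperature


# Usage: a threshold-crossing test on a spread that stays differentiable.
spread, upper_band = 1.3, 1.0
crossing_score = soft_step(spread - upper_band)
```

A smaller `temperature` makes the surrogate closer to the hard decision but concentrates the gradient near the threshold, so its value trades off fidelity against ease of training.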
Our empirical study compares KBPT with both model-based
trading and with deep RL-based policies for various asset
pairs. There, we demonstrate the individual gains of each of
the ingredients of KBPT, including the usefulness of the extended
SS model underlying KBPT, as well as the superiority
of the proposed hybrid algorithm in systematically achieving
higher PNL compared with all considered benchmarks. This
work extends the preliminary findings reported in [1] by
proposing the new partially co-integrated SS model,
incorporating a dedicated accumulated reward loss and
the two-stage training method, and providing extensive
discussion, derivation, and experimental evaluation.
The rest of this paper is organized as follows: Section II
covers preliminaries in model-based trading and formulates the
problem; Section III describes the different SS models in pairs
trading and presents our proposed model; Section IV details
our proposed hybrid KBPT policy along with its learning
procedure; Section V presents the empirical study of KBPT,
contrasting it with both model-based and data-driven policies;
and Section VI provides concluding remarks.
Throughout this paper we use boldface lowercase letters for
vectors, e.g., $\mathbf{x}$, and boldface uppercase letters for matrices,
e.g., $\mathbf{X}$. We denote the step function by $U(\cdot)$, with $U(t) = 1$
for $t > 0$ and $U(t) = 0$ for $t \le 0$, while $E\{\cdot\}$ denotes
stochastic expectation. We use the term stationary process
to refer to a stochastic process that is stationary in the wide
sense. For consistency, the prices of all assets are given in USD.
II. PRELIMINARIES AND PROBLEM FORMULATION
In this section we formulate the considered model for pairs
trading. To that aim, we first review necessary preliminaries
in quantitative trading in Subsection II-A, and recall the BB
policy in Subsection II-B. These preliminaries are then used
to formulate the problem in Subsection II-C.
A. Trading Formulation
Trading strategies determine investment policies based on
monitoring financial assets. Accordingly, trading strategies
can generally be divided into two stages: (i) tracking, which
maps the assets into financial indicators; and (ii) the trading
policy, which is based on these indicators [7].
1) Tracking: A crucial part of any trading scheme is
constantly evaluating and analyzing the financial markets,
individual securities, or sectors. Information such as price
movements, volatility, liquidity, volume, momentum, and market
breadth is valuable for making informed trading decisions.
Using this financial data, one can derive financial
indicators that give the trader insight into potential
entry and exit points, help assess risks, and ultimately optimize
the investment strategy. Quantitative financial indicators can
include technical indicators (e.g., moving averages, relative
strength index) [39], fundamental indicators (e.g., earnings per
share, price-to-earnings ratio), or macroeconomic indicators
(e.g., GDP growth rate, inflation rate) [40].
To formulate this mathematically, we use $d_t$ to denote the
financial information (e.g., asset prices) at time $t > 0$. A
financial tracker, denoted $\phi$, is a mapping of all the financial
data accumulated until time $t$ into financial indicators $z_t$, i.e.,
\[ \phi : \{d_\tau\}_{\tau \le t} \mapsto z_t. \tag{1} \]
The financial indicator should provide sufficient information
for the policy to dictate the current decision, as detailed next.
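As a simple instance of the tracker mapping in (1), the sketch below maps the price history accumulated up to the current time into two indicators: the mean and standard deviation of the most recent prices. The function name `track`, the `window` parameter, and the choice of indicators are assumptions made for illustration and are not the tracker proposed in this work.

```python
from statistics import fmean, pstdev


def track(history, window=20):
    """Map all data observed so far, {d_tau} for tau <= t, to indicators z_t.

    history: list of prices ordered in time, with the latest price last.
    window:  number of most recent samples used to form the indicators.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    if window < 1:
        raise ValueError("window must be a positive integer")

    # Only the last `window` samples are used; earlier data is ignored.
    recent = history[-window:]

    # z_t: rolling mean and (population) standard deviation.
    return {"mean": fmean(recent), "std": pstdev(recent)}


# Usage: indicators available to the policy at the current time step.
prices = [100.0, 101.5, 99.8, 100.7, 102.1]
z_t = track(prices, window=3)
```

The tracker is causal by construction: it only reads samples up to the current time, which matches the domain of the mapping in (1).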
2) Policy: The policy component of a trading scheme,
denoted by π, refers to the rules, guidelines, and principles
that govern the decision-making process and the execution of
trades. The policy component in general may encompass both