Neural Augmented Kalman Filtering with Bollinger
Bands for Pairs Trading
Amit Milstein, Haoran Deng, Guy Revach, Hai Morgenstern, and Nir Shlezinger
Abstract—Pairs trading is a family of trading techniques that
determine their policies based on monitoring the relationships
between pairs of assets. A common pairs trading approach relies
on describing the pair-wise relationship as a linear State Space
(SS) model with Gaussian noise. This representation facilitates
extracting financial indicators with low complexity and latency
using a Kalman Filter (KF); these indicators are then processed using
classic policies such as Bollinger Bands (BB). However, such
SS models are inherently approximate and mismatched, often
degrading the revenue. In this work, we propose KalmanNet-aided
Bollinger bands Pairs Trading (KBPT), a deep learning
aided policy that augments the operation of KF-aided BB trading.
KBPT is designed by formulating an extended SS model for pairs
trading that approximates their relationship as holding partial
co-integration. This SS model is utilized by a trading policy that
augments KF-BB trading with a dedicated neural network based
on the KalmanNet architecture. The resulting KBPT is trained in
a two-stage manner which first tunes the tracking algorithm in an
unsupervised manner independently of the trading task, followed
by its adaptation to track the financial indicators to maximize
revenue while approximating BB with a differentiable mapping.
KBPT thus leverages data to overcome the approximated nature
of the SS model, converting the KF-BB policy into a trainable
model. We empirically demonstrate that our proposed KBPT
systematically yields improved revenue compared with model-based
and data-driven benchmarks over various assets.
I. INTRODUCTION
Quantitative methods constitute the fundamental mathe-
matical framework for analysis and prediction in financial
markets [2], [3]. A common type of quantitative methods
is algorithmic trading [4], which deals with decision-making
carried out by an agent (i.e., a trader) for the purpose of
maximizing a cumulative reward, most commonly achieving a
high Profit and Loss (PNL) balance in the market. Quantitative
trading schemes are typically comprised of two main stages:
the agent first tracks a stochastic process that describes the
prices of the assets of interest in order to extract useful trading
indicators. Then, these financial indicators are used as a basis
for decision making by setting a trading policy [5]–[7].
Parts of this work were accepted for presentation in the 2023 IEEE International
Conference on Acoustics, Speech, and Signal Processing (ICASSP)
as the paper [1]. A. Milstein and N. Shlezinger are with the School of ECE,
Ben-Gurion University of the Negev, Israel (e-mail: amitmils@post.bgu.ac.il;
nirshl@bgu.ac.il). H. Deng and G. Revach are with the Institute for Signal
and Information Processing, D-ITET, ETH Zürich, Switzerland (e-mail:
haodeng@student.ethz.ch; grevach@ethz.ch). H. Morgenstern is unaffiliated
(e-mail: hai.morgenstern@gmail.com).
Quantitative trading requires a decision making mechanism
given application time constraints, i.e., a trading policy that
outputs a position based on the trading indicators. Such
policies are typically based on indicators obtained as statistical
predictions of an asset price [8]. A popular classical policy is
the Bollinger Bands (BB) [9], which is based on the intuition
that if the price is much less than its mean, it will rise back
to its normal level, and thus one should long this asset. Because
this method is not linear, it hedges the risk by
constraining the investment.
Classical trading schemes such as BB work well for single
stationary (and specifically, mean-reverting) processes [10]. It
is therefore desirable to look for stationary assets, though
some schemes only look for the weaker condition of mean
reversion, e.g., using the Ornstein–Uhlenbeck formula [11].
Accordingly, algorithmic tracking of financial processes is
typically based on imposing a model on their temporal
evolution [12]. A common approach imposes a simple linear
stochastic stationary model [13], often based on autoregressive
and moving average models [14]. While assets are rarely
stationary in real markets, their differences and spread (i.e.,
linear combination) are in some cases faithfully captured as
being stationary, and thus such techniques are commonly
adopted in pairs trading [4], [15]. The spread evolution and
its relationship with the asset pair are often described using
a State Space (SS) model [16]–[18], enabling tracking with
a Kalman Filter (KF) [14, Ch. 10]. A core challenge with
combining financial policies with algorithmic tracking based
on such statistical models is that they typically require strong
assumptions and prior financial knowledge. For instance, to
utilize the KF for spread tracking, one has to faithfully
capture the pairs trading as a linear Gaussian SS model. Such
models often fail to capture complicated patterns of real world
financial assets, which in turn leads to poor trading policies.
To overcome the drawbacks of classic model-based meth-
ods, recent years have witnessed a growing interest in the use
of model-agnostic deep learning. Deep learning systems are
used to capture the time evolution of financial assets [19],
extract features for trading [20], and determine trading policies
[21]; see the survey in [22]. Common deep learning architectures
for financial modelling and prediction include recurrent
neural networks (RNNs) [23], auto-encoders [24], anomaly
detection [25] and attention models [26], [27]. Reinforcement
Learning (RL) is considered for training deep trading policies
[21], [28]–[32] to maximize the reward in an end-to-end
fashion. In order to generate various inputs, it was proposed
to use deep learning based natural language processing to
analyze social media and news for trading [20], [23]. Despite
their growing popularity, deep learning based quantitative
methods are subject to several drawbacks. They are based
on highly parameterized black boxes, giving rise to latency
considerations. Moreover, deep learning based policies lack
the interpretability and reliability of model-based methods, and
do not incorporate established models, which are core to pairs
trading. In addition, these methods tend to have a long training
time and require large volumes of data for training, which can
constitute a limiting factor in high-frequency trading. This
motivates designing trading techniques that simultaneously
benefit from the approximated modelling adopted by classical
trading schemes alongside the abstractness and capabilities of
data-driven deep learning methods.
arXiv:2210.15448v2 [q-fin.TR] 1 Sep 2023
In this work, we propose KalmanNet-aided Bollinger bands
Pairs Trading (KBPT), a pairs trading algorithm that combines
SS model-based trading policies with deep learning tools,
based on model-based deep learning methodology [33]–[35].
KBPT is derived by proposing a novel SS model repre-
sentation for pairs trading obtained from assuming partial
co-integration [17] combined with an autoregressive prior
imposed on the spread. As opposed to previous SS model-
based trading policies that utilize, e.g., KF with BB for setting
the position, thus implicitly assuming that the SS model is
Gaussian and accurate, we design our policy to particularly
cope with the approximated nature of the SS model and its
expected non-Gaussianity. This is achieved by having KBPT
preserve the flow of KF-BB trading, retaining its structured
modeling and interpretability, while augmenting the KF with
a trainable RNN following the recently proposed KalmanNet
[36]. The resulting neural augmentation, in which the
specific computation of the KF that depends on the underlying
stochasticity is learned, leverages data to track the spread in
partially known and non-Gaussian SS models.
We propose a dedicated training scheme for KBPT that
learns the pairs trading policy from sequences of past asset
pairs. The learning method is based on a two-stage procedure,
where we first train KalmanNet separately from the trading
task as a form of pretraining. There, we overcome the fact
that there is no ground-truth spread value by leveraging the
interpretable architecture of KalmanNet, and particularly its
internal prediction of the next observation which follows from
the KF flow, for unsupervised learning [37]. Then, we train
the overall trading policy, combining the neural augmented
KalmanNet with a customized BB mapping that is differ-
entiable, such that the tracking algorithm learns to produce
features that are most useful in the sense of maximizing
the PNL rather than accurately tracking the prices. By that,
we gain the ability to cope with modeling mismatch, as
the resulting architecture converts the model-based trading
algorithm into a trainable discriminative model [38] that is
trained end-to-end to maximize the PNL as a cumulative
reward.
Our empirical study compares KBPT with both model-based
trading and with deep RL-based policies for various asset
pairs. There, we demonstrate the individual gains of each of
the ingredients of KBPT, including the usefulness of the extended
SS model underlying KBPT, as well as the superiority
of the proposed hybrid algorithm in systematically achieving
higher PNL compared with all considered benchmarks. Our
work extends the preliminary findings reported in [1]
through the proposal of the new partially co-integrated SS model,
the incorporation of a dedicated accumulated reward loss and
the two-stage training method, as well as through the extensive
discussion, derivation, and experimental evaluations.
The rest of this paper is organized as follows: Section II
covers preliminaries in model-based trading and formulates the
problem; Section III describes the different SS models in pairs
trading and presents our proposed model; Section IV details
our proposed hybrid KBPT policy along with its learning
procedure; Section V presents the empirical study of KBPT,
contrasting it with both model-based and data-driven policies;
while Section VI provides concluding remarks.
Throughout this paper we use boldface lower-case letters for
vectors, e.g., $\mathbf{x}$, and boldface upper-case letters for matrices,
e.g., $\mathbf{X}$. We denote the step function as $U(\cdot)$, with $U(t) = 1$
for $t > 0$ and $U(t) = 0$ for $t \le 0$, while $E\{\cdot\}$ is the notation
for stochastic expectation. We use the term stationary process
to refer to a stochastic process that is stationary in the wide
sense. For consistency, the prices of all assets are given in USD.
II. PRELIMINARIES AND PROBLEM FORMULATION
In this section we formulate the considered model for pairs
trading. To that aim, we first review necessary preliminaries
in quantitative trading in Subsection II-A, and recall the BB
policy in Subsection II-B. These preliminaries are then used
to formulate the problem in Subsection II-C.
A. Trading Formulation
Trading strategies refer to the determining of investment
policies based on the monitoring of financial assets. Accord-
ingly, trading strategies can be generally divided into two
stages: (i) tracking of the assets into financial indicators; and
(ii) the trading policy that is based on these indicators [7].
1) Tracking: A crucial part of any trading scheme is
constantly evaluating and analyzing the financial markets, indi-
vidual securities, or sectors. Information such as price move-
ments, volatility, liquidity, volume, momentum, and market
breadth is valuable for making informed decisions in the trad-
ing market. Using this financial data, one can derive financial
indicators which enable the trader to get insight on potential
entry and exit points, assess risks, and ultimately optimize
the investment strategy. Quantitative financial indicators can
include technical indicators (e.g., moving averages, relative
strength index) [39], fundamental indicators (e.g., earnings per
share, price-to-earnings ratio), or macroeconomic indicators
(e.g., GDP growth rate, inflation rate) [40].
To formulate this mathematically, we use $d_t$ to denote the
financial information (e.g., asset prices) at time $t > 0$. A
financial tracker, denoted $\varphi$, is a mapping of all the financial
data accumulated until time $t$ into financial indicators $z_t$, i.e.,
$$\varphi : \{d_\tau\}_{\tau \le t} \mapsto z_t. \tag{1}$$
The financial indicator should provide sufficient information
for the policy to dictate the current decision, as detailed next.
2) Policy: The policy component of a trading scheme,
denoted by π, refers to the rules, guidelines, and principles
that govern the decision-making process and the execution of
trades. The policy component in general may encompass both
quantitative and qualitative aspects: Quantitative aspects can
involve specific parameters, thresholds, or algorithms based on
financial indicators or other mathematical models. Qualitative
aspects consider factors such as market conditions, investor
sentiment, news events, or expert judgment.
The policy is the last step of the trading scheme and it
outputs the recommended actions for the trader to take in
order to optimize profits. We refer to the return of each trade
transaction as the reward. In quantitative trading, the action at
time $t$, denoted $p_t$, is determined using a trading policy $\pi$
based on the current indicator $z_t$ as well as past actions and
indicators, namely,
$$\pi : \{z_\tau\}_{\tau \le t}, \{p_\tau\}_{\tau < t} \mapsto p_t. \tag{2}$$
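The two-stage structure in (1) and (2) can be sketched in Python as follows. This is only an illustrative sketch: the names `run_scheme`, `deviation_from_mean`, and `contrarian` are hypothetical and are not taken from the paper, and the action is simplified here to a single integer.

```python
from typing import Callable, List, Sequence, Tuple

# phi: maps all data up to time t to the indicator z_t, as in (1).
Tracker = Callable[[Sequence[float]], float]
# pi: maps indicators up to t and actions before t to the action p_t, as in (2).
Policy = Callable[[Sequence[float], Sequence[int]], int]


def run_scheme(data: Sequence[float], tracker: Tracker,
               policy: Policy) -> Tuple[List[float], List[int]]:
    """Apply the tracker and then the policy causally at every time step."""
    indicators: List[float] = []
    actions: List[int] = []
    for t in range(len(data)):
        z_t = tracker(data[: t + 1])        # uses only d_tau with tau <= t
        indicators.append(z_t)
        p_t = policy(indicators, actions)   # actions holds p_tau with tau < t
        actions.append(p_t)
    return indicators, actions


# Hypothetical toy components, for illustration only.
def deviation_from_mean(history: Sequence[float]) -> float:
    """Toy tracker: distance of the latest sample from the running mean."""
    return history[-1] - sum(history) / len(history)


def contrarian(indicators: Sequence[float], past_actions: Sequence[int]) -> int:
    """Toy policy: short above the mean, long below it, otherwise hold."""
    z_t = indicators[-1]
    return -1 if z_t > 0 else (1 if z_t < 0 else 0)


indicators, actions = run_scheme([10.0, 12.0, 8.0], deviation_from_mean, contrarian)
```

The loop makes the causality constraint explicit: the tracker never sees future data, and the policy never sees its own current action.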
We henceforth focus on settings where
A1 The information $d_t$ represents the price of an asset.
A2 The actions correspond to long/short decisions on $d_t$, i.e.,
holding positive or negative quantities, respectively.
The action space in A2 indicates that $p_t$ encapsulates open and
close decisions. We formulate this by writing $p_t = [\mathrm{op}_t, \mathrm{cp}_t]$,
where $\mathrm{op}_t \in \{-1, 0, 1\}$ is the open position policy that signals
whether to short, hold, or long the asset, respectively; and
$\mathrm{cp}_t \in \{0, 1\}$ is the close position policy, which gets the value 1
when an existing open position (e.g., from time $t - 1$) needs
to be closed. Otherwise, if a position needs to remain open or
there is no open position, it gets the value 0. The order in
which positions are taken involves first checking whether the closing
criterion is met, and then checking whether to open a position. We
say that $p_t$ is an active position if $\mathrm{op}_t = \pm 1$.
3) Reward: Under A1-A2, one can mathematically formulate
the reward accumulated for an active position. To that
aim, let $t_i^{o}$ be the time the $i$th active position is taken and $t_i^{c}$
the time it is closed. Accordingly, the reward obtained for the
$i$th activity of policy $\pi$ with financial tracker $\varphi$, denoted by
$r_i^{\varphi,\pi}$, is computed based on the difference in the asset price over
the activity period and whether it was long or short via
$$r_i^{\varphi,\pi} = \mathrm{op}_{t_i^{o}} \cdot \left(d_{t_i^{c}} - d_{t_i^{o}}\right). \tag{3}$$
The reward in (3) can be positive or negative, i.e., profit or
loss, respectively.
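As a small worked illustration of (3), the following Python sketch computes the reward of each active position from its open time, close time, and direction. The function name `position_rewards` and the tuple layout are assumptions made for this example, and the prices are invented toy values.

```python
from typing import List, Sequence, Tuple

# One active position: (open time t_i^o, close time t_i^c, op in {-1, +1}).
ActivePosition = Tuple[int, int, int]


def position_rewards(prices: Sequence[float],
                     positions: Sequence[ActivePosition]) -> List[float]:
    """Reward of each active position following (3)."""
    rewards = []
    for t_open, t_close, op in positions:
        rewards.append(op * (prices[t_close] - prices[t_open]))
    return rewards


prices = [100.0, 103.0, 101.0, 98.0]
positions = [(0, 1, 1),    # long from t=0 to t=1: price rises by 3
             (1, 3, -1)]   # short from t=1 to t=3: price falls by 5
rewards = position_rewards(prices, positions)
pnl = sum(rewards)
```

Both positions are profitable here: the long position gains from the price increase, and the short position gains from the price decrease.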
B. Bollinger Bands Trading Policy
A popular trading policy is based on BB, which is a simple
and fundamental technique employed in a variety of trading
schemes [9]. BB consists of 3 bands plotted around the asset’s
price – upper, middle, and lower – as illustrated in Fig. 1.
The middle band is a simple Moving Average (MA), whose
window size varies per application (in Fig. 1 we used a
window of 20 samples). The top and bottom bands are plotted
around the middle band where the distance can be based on
the Standard Deviation (STD) of the MA. These are typically
set at ±1 STD around the MA, though the setting may
vary depending on the application. Alternatively, one may use
confidence intervals for forming such bands.
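The band construction described above can be sketched as follows. This is a minimal sketch under stated assumptions: the band width is taken as the population STD of the prices inside the same window as the MA, entries before a full window is available are left as `None`, and the name `bollinger_bands` and its parameters are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Optional, Sequence, Tuple

Band = List[Optional[float]]


def bollinger_bands(prices: Sequence[float], window: int = 20,
                    num_std: float = 1.0) -> Tuple[Band, Band, Band]:
    """Return (lower, middle, upper) bands; None until the window is full."""
    lower: Band = []
    middle: Band = []
    upper: Band = []
    for t in range(len(prices)):
        if t + 1 < window:
            # Not enough samples yet to form a full window.
            lower.append(None)
            middle.append(None)
            upper.append(None)
            continue
        recent = prices[t - window + 1: t + 1]
        ma = fmean(recent)                    # middle band: simple MA
        width = num_std * pstdev(recent)      # assumed: STD over the same window
        lower.append(ma - width)
        middle.append(ma)
        upper.append(ma + width)
    return lower, middle, upper


lower, middle, upper = bollinger_bands([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

Changing `num_std` widens or narrows the bands, which corresponds to the application-dependent setting mentioned in the text.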
Fig. 1. Asset price with Bollinger Bands illustration.
Using these bands, one can build a trading strategy. A
natural approach to do so is applicable when $d_\tau$ is a stationary
price series. In this case, one can construct a financial tracker
using the empirical z-score, i.e.,
$$z_t = \varphi\left(\{d_\tau\}_{\tau \le t}\right) = \frac{d_t - \mu_t}{\sigma_t}, \tag{4}$$
where $\mu_t$ and $\sigma_t$ are the empirical mean and standard deviation
of $d_t$, respectively, estimated from $\{d_\tau\}_{\tau \le t}$.
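A minimal Python sketch of the z-score tracker in (4) is given below. It assumes that the mean and STD are estimated over the full history with the population STD, and that a zero STD maps to an indicator of zero; both choices, as well as the name `z_score_tracker`, are assumptions of this example.

```python
from statistics import fmean, pstdev
from typing import Sequence


def z_score_tracker(history: Sequence[float]) -> float:
    """Empirical z-score of the latest price d_t, as in (4)."""
    mu_t = fmean(history)        # empirical mean of {d_tau}, tau <= t
    sigma_t = pstdev(history)    # empirical (population) STD, an assumption
    if sigma_t == 0.0:
        return 0.0               # assumed convention for a constant history
    return (history[-1] - mu_t) / sigma_t


# Mean is 11 and STD is sqrt(3), so the z-score is 3 / sqrt(3), about 1.732.
z = z_score_tracker([10.0, 10.0, 10.0, 14.0])
```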
The BB policy is obtained by examining in which band $z_t$
lies. In particular, if an open position is not currently being
held, a short position is taken if the asset is being overbought,
i.e., $z_t > 1$, and a long position if it is being oversold, i.e.,
$z_t < -1$. To formulate this mathematically, we say that an
open position is held at time $t$ if the last open position time,
denoted
$$\tau_{\mathrm{op},t} \triangleq \max_{\tau < t \,:\, \mathrm{op}_\tau = \pm 1} \tau, \tag{5}$$
is not smaller than the last close position time
$$\tau_{\mathrm{cp},t} \triangleq \max_{\tau \le t \,:\, \mathrm{cp}_\tau = 1} \tau. \tag{6}$$
The open position policy is thus determined as
$$\mathrm{op}_t = \left(U(-1 - z_t) - U(z_t - 1)\right) \cdot U\left(\tau_{\mathrm{cp},t} - \tau_{\mathrm{op},t}\right). \tag{7}$$
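The open position rule in (7) can be written directly in code. In the sketch below, the names `step` and `bb_open_position` are hypothetical, and the last open and close times from (5) and (6) are passed in as plain integers; how they are initialized before any position has been taken is left to the caller.

```python
def step(x: float) -> int:
    """Step function U: 1 for x > 0 and 0 otherwise."""
    return 1 if x > 0 else 0


def bb_open_position(z_t: float, tau_cp: int, tau_op: int) -> int:
    """Open position op_t in {-1, 0, 1} following (7)."""
    # +1 (long) when z_t < -1, -1 (short) when z_t > 1, and 0 in between.
    signal = step(-1 - z_t) - step(z_t - 1)
    # 1 only if the last close is strictly later than the last open,
    # i.e., no open position is currently held.
    no_open_position = step(tau_cp - tau_op)
    return signal * no_open_position


short_signal = bb_open_position(1.5, tau_cp=3, tau_op=1)    # overbought, flat
long_signal = bb_open_position(-2.0, tau_cp=3, tau_op=1)    # oversold, flat
inside_band = bb_open_position(0.3, tau_cp=3, tau_op=1)     # within the bands
already_held = bb_open_position(-2.0, tau_cp=1, tau_op=3)   # position still open
```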
The reward in (3) is formulated for each active position,
and not for each time instance. In some settings, e.g., when
designing trading strategies using RL [31], [41], [42], one
is often interested in obtaining instantaneous rewards. This
is achieved by closing a position after a single time step (though
it can then be re-opened and treated as a new active position,
yielding an additional transaction cost, i.e., friction [18], which
we omit for simplicity). Such an operation results in
$$\mathrm{cp}_t = U\left(|\mathrm{op}_{t-1}|\right). \tag{8}$$
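To illustrate the single-step closing rule in (8), the following sketch runs a BB-style policy over a short toy sequence and records the instantaneous reward at each step. It assumes that the indicators $z_t$ are already available, that transaction costs are omitted as in the text, and that the function name `instantaneous_rewards` and the toy data are invented for this example.

```python
from typing import List, Sequence


def step(x: float) -> int:
    """Step function U: 1 for x > 0 and 0 otherwise."""
    return 1 if x > 0 else 0


def instantaneous_rewards(prices: Sequence[float],
                          z_scores: Sequence[float]) -> List[float]:
    """Per-step rewards when every position is closed after one step."""
    rewards: List[float] = []
    op_prev = 0  # no position before the first time step
    for t in range(len(prices)):
        cp_t = step(abs(op_prev))  # close rule (8)
        # A closed position earns the one-step price change, signed by op.
        reward_t = op_prev * (prices[t] - prices[t - 1]) if cp_t else 0.0
        rewards.append(reward_t)
        # Any position was just closed, so the open decision reduces to
        # the threshold test on z_t.
        op_prev = step(-1 - z_scores[t]) - step(z_scores[t] - 1)
    return rewards


prices = [100.0, 98.0, 101.0, 103.0, 102.0]
z_scores = [0.0, -1.5, 0.2, 1.4, 0.0]
rewards = instantaneous_rewards(prices, z_scores)
```

In this toy run a long position opened at the second step and a short position opened at the fourth step are each closed one step later, so every reward is attributed to a single time instance.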
Alternatively, one can determine the close position based on
the indicator, allowing a cumulative reward where a position
can be held over multiple time steps. In this case, the closing
of a position is a function of the indicator $z_t$. For instance, one
can decide to close a currently open position if $z_t$ has crossed