Optimal Stopping with Trees: The Details
Sigurd Assing s.assing@warwick.ac.uk
Department of Statistics
University of Warwick
Coventry, CV4 7AL, UK
Xin Zhi xin.zhi@warwick.ac.uk
Department of Statistics
University of Warwick
Coventry, CV4 7AL, UK
Abstract
The purpose of this paper is two-fold: first, to review a recent method, introduced by S. Becker, P. Cheridito, and A. Jentzen, for solving high-dimensional optimal stopping problems using deep neural networks; second, to propose an alternative algorithm which replaces the neural networks by CART-trees and thereby allows for more interpretable estimated stopping rules. In particular, we compare the performance of the two algorithms on the Bermudan max-call benchmark example, concluding that the Bermudan max-call may not be suitable to serve as a benchmark example for high-dimensional optimal stopping problems. We also show how our algorithm can be used to plot stopping boundaries.
Keywords: high-dimensional optimal stopping, sequential data, CART-trees, American
Option, Bermudan Option, deep learning
1. Describing the Problem
This paper builds upon the work presented in [1] whose authors used neural networks to
directly approximate optimal stopping rules—see the paper and references therein for the
motivation of their approach.
Our interest in this approach comes from applying optimal stopping in the context of
fast changing sequential data. Before explaining what we mean by fast changing, let us first
describe our framework of optimal stopping.
Assume that ordered data,
$$x_1,\dots,x_{M-1},\,x_M,\,x_{M+1},\dots,x_{M+N},$$
is given (or will sequentially be given) of which, initially, only $x_1,\dots,x_{M-1}$ are accessible. Then, a process is started which makes $x_{M+n}$, $n = 0,1,\dots,N$, available to a decision maker, one after the other, for checking a reward which is associated with each of these
data points. The reward can only be taken while being checked. If the decision maker takes
the reward, the process stops and no further reward can be checked. If they do not take it,
the process continues. Eventually, if none of the rewards associated with $x_M,\dots,x_{M+N-1}$ is taken, the process would also stop and the decision maker would have to take the reward associated with the only remaining data point $x_{M+N}$.
Of course, the decision maker’s ultimate goal, which is to stop the process by taking the
highest reward, can never be achieved for sure. Instead of going for the highest reward, an
objective has to be formulated which takes into account the uncertainty of missing out on
higher rewards because the process of checking them was stopped.
But how can this uncertainty be taken into account? When the process starts, that is, when checking the reward associated with $x_M$, the only information available is given by $x_1,\dots,x_M$, which could be used to infer possible values of $x_{M+1},\dots,x_{M+N}$ before checking the rewards associated with these data points. This inference would only be relevant for decision making if the inferred values can be associated with rewards, too. Therefore, it is usually assumed that rewards can be determined by a family of functions, $u(n,\cdot)$, $n = 0,\dots,N$, with each function being defined on a space which is big enough to contain all values the data under consideration could take. In this paper that space is going to be $\mathbb{R}^D$, for some possibly very large $D$. Adding the parameter $n$ to these functions $u(n,\cdot)$ takes into account that the nature of associated rewards might change when sequentially going through the data.
Now, for us, inferring possible values of $x_{M+1},\dots,x_{M+N}$ means generating such values using a generative model which was trained on $x_1,\dots,x_M$. This way, we could generate many samples,
$$x_M,\,x^{(k)}_{M+1},\dots,x^{(k)}_{M+N},\quad k = 1,\dots,K,$$
of a probability distribution which is hopefully close to the data generating process behind the yet-unobserved true values $x_{M+1},\dots,x_{M+N}$. Note that each sample can be considered a path starting from the same data point $x_M$ which, for notational purposes, is occasionally also denoted by $x^{(k)}_M$. Since $K$ is supposed to be large, in what follows, we would identify this ensemble of samples with the distribution provided by the generative model. The sample mean would therefore play the role of the expectation operator.
Then, for each $k$, we would be able to find
$$\hat\nu^{(k)} \quad\text{such that}\quad u\bigl(\hat\nu^{(k)},\,x^{(k)}_{M+\hat\nu^{(k)}}\bigr) = \max_{0\le n\le N} u\bigl(n,\,x^{(k)}_{M+n}\bigr),$$
leading to
$$V^{\max}_{x_M} = \frac{1}{K}\sum_{k=1}^{K} u\bigl(\hat\nu^{(k)},\,x^{(k)}_{M+\hat\nu^{(k)}}\bigr),$$
which would be the expected maximal reward, conditional on what we were able to learn from $x_1,\dots,x_M$ about the yet-unobserved true values $x_{M+1},\dots,x_{M+N}$.
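For concreteness, here is a minimal sketch of how $\hat\nu^{(k)}$ and $V^{\max}_{x_M}$ could be computed from a simulated ensemble; the array layout and the names `paths` and `u` are our own illustrative assumptions, not notation from [1].

```python
import numpy as np

def v_max(paths, u):
    """paths: array of shape (K, N+1, D) holding x_M, x^(k)_{M+1}, ..., x^(k)_{M+N};
    u: reward function u(n, x) with n a time index and x a point in R^D."""
    K, N_plus_1, _ = paths.shape
    # Reward of every sample path at every time step, shape (K, N+1).
    rewards = np.array([[u(n, paths[k, n]) for n in range(N_plus_1)]
                        for k in range(K)])
    nu_hat = rewards.argmax(axis=1)                  # hindsight-optimal time per path
    return rewards[np.arange(K), nu_hat].mean(), nu_hat
```

For instance, `v_max(paths, lambda n, x: x.max())` would evaluate a max-call-type reward $\max_i x_i$ at every step of every path before averaging the per-path maxima.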
At this point, we should ask whether $V^{\max}_{x_M}$ would be a better objective for the decision maker than the ill-posed highest reward. Unfortunately, it is not. By general theory of optimal stopping¹, the optimal stopping rule with respect to the objective $V^{\max}_{x_M}$ would still be to stop by taking the highest reward (as is almost clear from how $\hat\nu^{(k)}$ was defined). So, even in an ideal world of no modelling error², it would not be possible to achieve the expected maximum reward because this would still require knowing all rewards associated with the yet-unobserved data points $x_{M+1},\dots,x_{M+N}$ when just checking $x_M$.
1. In what follows, our main reference for the general theory of optimal stopping is [11].
2. The generative model might not capture the full reality.
But what type of expected reward could be achieved? Observe that the objective $V^{\max}_{x_M}$ can also be written as
$$\sup_{\{\nu^{(k)}\}_{k=1}^{K}}\;\frac{1}{K}\sum_{k=1}^{K} u\bigl(\nu^{(k)},\,x^{(k)}_{M+\nu^{(k)}}\bigr),$$
where the supremum is over all stopping-ensembles $\{\nu^{(k)}\}_{k=1}^{K}$ the members of which are functions $\nu^{(k)}\colon (\mathbb{R}^D)^{N+1}\to\{0,\dots,N\}$. It turns out that, if one puts a constraint on the members of possible stopping-ensembles, it would be possible to achieve the maximal expected reward subject to this constraint.
Define
$$V^{0}_{x_M} = \sup_{\{\nu^{(k)}_0\}_{k=1}^{K}}\;\frac{1}{K}\sum_{k=1}^{K} u\bigl(\nu^{(k)}_0,\,x^{(k)}_{M+\nu^{(k)}_0}\bigr),$$
where the supremum is over all stopping-ensembles $\{\nu^{(k)}_0\}_{k=1}^{K}$ the members of which can be represented³ by
$$\nu^{(k)}_0 = \sum_{n=1}^{N} n\times f_{0,n}\bigl(x_M,\,x^{(k)}_{M+1},\dots,x^{(k)}_{M+n}\bigr),$$
using functions $f_{0,n}\colon (\mathbb{R}^D)^{n+1}\to\{0,1\}$ which do not depend on $k$.
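To make the representation concrete, the following sketch evaluates one ensemble member under the common "stop at the first $n$ whose decision function fires" convention, which is one of the non-unique representations mentioned in footnote 3; the names `path` and `decision_fns` are ours.

```python
def nu_0(path, decision_fns):
    """path: sequence of points x_M, x^(k)_{M+1}, ..., x^(k)_{M+N} of one sample;
    decision_fns: functions f_{0,n}, n = 1, ..., N, each mapping the history
    (x_M, ..., x^(k)_{M+n}) to 0 or 1, independently of k."""
    N = len(decision_fns)
    for n in range(1, N + 1):
        # f_{0,n} only sees the data available up to step n (no look-ahead).
        if decision_fns[n - 1](path[:n + 1]) == 1:
            return n                 # contributes the term n * f_{0,n}(...) of the sum
    return 0                         # no decision function fired: nu^(k)_0 = 0
```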
Usually, $V^{0}_{x_M}$ would be smaller than $V^{\max}_{x_M}$, but it is as large as it can be subject to only using the information available at the point of stopping. Furthermore, general theory of optimal stopping tells us that the supremum used to define $V^{0}_{x_M}$ is attained at an optimal stopping-ensemble which can be represented by a collection of functions $f^{\star}_{0,n}$, $n = 1,\dots,N$, such that
$$\sum_{n=1}^{N} f^{\star}_{0,n}\bigl(x_M,\,x^{(k)}_{M+1},\dots,x^{(k)}_{M+n}\bigr) = \mathbf{1}_{V^{0}_{x_M} > u(0,x_M)}.$$
Thus, subject to modelling error, when checking the reward associated with $x_M$, the decision

do not take this reward iff it is smaller than $V^{0}_{x_M}$ (⋆)

would be optimal for achieving the objective given by $V^{0}_{x_M}$ without the need to know any of the rewards associated with yet-unobserved data points. And if this decision leads to checking the reward associated with $x_{M+1}$, then the decision maker could
• re-train their generative model using $x_1,\dots,x_M,x_{M+1}$;
• based on a new⁴ ensemble of samples
$$x_{M+1},\,x^{(k)}_{M+2},\dots,x^{(k)}_{M+N},\quad k = 1,\dots,K,$$
generated by the re-trained generative model, work out
$$V^{1}_{x_{M+1}} = \sup_{\{\nu^{(k)}_1\}_{k=1}^{K}}\;\frac{1}{K}\sum_{k=1}^{K} u\bigl(1+\nu^{(k)}_1,\,x^{(k)}_{M+1+\nu^{(k)}_1}\bigr)
taking the supremum over all stopping-ensembles $\{\nu^{(k)}_1\}_{k=1}^{K}$ the members of which can be represented by
$$\nu^{(k)}_1 = \sum_{n=1}^{N-1} n\times f_{1,n}\bigl(x_{M+1},\,x^{(k)}_{M+2},\dots,x^{(k)}_{M+1+n}\bigr);$$
• make the decision whether to take the reward $u(1,x_{M+1})$ by comparing it to $V^{1}_{x_{M+1}}$;
• thus move on to checking the next reward or not, and so on, till stopping.

3. These representations are not unique.
4. The observed $x_{M+1}$ at the beginning of the path $x_{M+1},x^{(k)}_{M+2},\dots,x^{(k)}_{M+N}$ is supposed to indicate that these samples are different from the samples generated for $V^{0}_{x_M}$, slightly abusing notation.
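Putting these bullet-points together, a hedged sketch of the whole check-decide-re-train loop might look as follows; `train_model`, `generate_paths`, and `estimate_v` are hypothetical callables standing in for the re-training, sampling, and optimisation steps, which the text deliberately leaves abstract.

```python
def run_sequential_stopping(history, stream, u, N, K,
                            train_model, generate_paths, estimate_v):
    """history: observed x_1, ..., x_{M-1}; stream: yields x_M, ..., x_{M+N};
    u: reward function u(n, x); the three callables are described above."""
    for m, x in enumerate(stream):            # m = 0 is the check of x_M
        history.append(x)                     # x is now an observed data point
        if m == N:                            # only x_{M+N} left: must take it
            return m, u(m, x)
        model = train_model(history)          # re-train on all data so far
        paths = generate_paths(model, x, N - m, K)        # K paths from x onwards
        v_m = estimate_v(paths, lambda n, y: u(m + n, y)) # objective V^m
        if u(m, x) >= v_m:                    # decision (*): take the reward
            return m, u(m, x)
```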
Remark 1.1. When the rewards associated with $x_{M+n}$, $n = 0,\dots,N$, become sequentially available to the decision maker, if the uncertainty of missing out on higher rewards because the process of checking them was stopped is taken into account by sequentially training a generative model for predicting rewards associated with yet-unobserved data points, then $V^{0}_{x_M}, V^{1}_{x_{M+1}},\dots,V^{N}_{x_{M+N}}$ would be the largest objectives which can be achieved by decisions solely based on rewards associated with observed data points and expectations of future rewards.
Therefore, in the context of machine learning, optimal stopping could come with two expensive tasks: first, training and re-training a generative model; second, working out the objectives $V^{0}_{x_M}, V^{1}_{x_{M+1}},\dots,V^{N}_{x_{M+N}}$ in a reliable way.
Re-training the model might be necessary because the data generating process behind $x_1,\dots,x_{M+N}$ has been highly inhomogeneous in the sense that the values inferred for $x_{M+n+1},\dots,x_{M+N}$ after the initial training of the generative model on $x_1,\dots,x_M$ could follow a distribution very different from the corresponding predictive distribution of the generative model after being trained on $x_1,\dots,x_{M+n}$, when further $n\ge 1$ previously unobserved data points have been made available to the decision maker. Since it is not known at which $n$ the initial training would be out of touch in the above sense, one would have to re-train the generative model sequentially, maybe not at step-size one as we suggested, but definitely at a certain frequency.
If the data generating process behind the given sequential data is likely to be inhomo-
geneous as described above, we give these data the attribute fast changing.
Assume that the fast changing nature of the data requires the generative model to be re-trained every $N_0 \le N$ steps. Then, omitting the first of the bullet-points above Remark 1.1, working out the objectives $V^{0}_{x_M}, V^{1}_{x_{M+1}},\dots,V^{N_0-1}_{x_{M+N_0-1}}$ would be based on $N_0$ ensembles of sample paths starting from $x_M,x_{M+1},\dots,x_{M+N_0-1}$, respectively, each of them generated by one and the same generative model.
First, generating a full ensemble could come with a significant computational cost; second, working out practically relevant objectives is usually computationally expensive, in particular when the dimension of the data is large. So, at least for any period of $N_0$ steps between re-training the generative model, one would want to estimate decisions for each step right at the beginning of the period using just one ensemble of sample paths.
That is, when the process of checking rewards starts, one would want to estimate a stopping rule associated with the optimal stopping-ensemble at which $V^{0}_{x_M}$ is attained, and the decisions given by this estimate would be applied without further adjustments when checking the next $N_0 - 1$ rewards. Only then would a new cycle start, including re-training and re-estimation.
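As an illustration of this cheaper regime, assuming hypothetical `fitted_rules` (one decision function per step, all estimated at the start of the period from a single ensemble, for instance by the tree method proposed below) and, for simplicity, rules that only need the current data point, the online phase reduces to:

```python
def apply_precomputed_rules(stream, u, fitted_rules, N0):
    """stream: yields the next N0 data points; fitted_rules[m] maps a data
    point to 1 (stop) or 0 (continue), with no re-training in between."""
    for m, x in enumerate(stream):
        if fitted_rules[m](x) == 1:
            return m, u(m, x)        # reward taken during this period
        if m == N0 - 1:
            return None              # period over: re-train and re-estimate
```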
We therefore conclude that, for our purpose, the problem is to find reliable estimates of optimal stopping rules at low computational cost rather than to work out the objectives themselves. When searching for possible solutions, we came across several recent papers related to this problem.
Most notable, we think, is the approach taken in [1], and the follow-up papers [2, 3], where stopping rules are represented by functions similar to our functions $f_{0,n}$, $n = 1,\dots,N$, and these functions are estimated by training deep neural networks.
In [4], though, stopping rules are modelled by binary decision trees. However, treating a whole stopping rule as a single decision tree comes with high computational cost, and the authors indeed prove that estimating such trees is, in general, NP-hard. Nevertheless, an advantage of using decision trees is that the estimated stopping rules become interpretable.
In [10], the authors translate a cost-awareness problem in computing into an optimal
stopping problem and then, similar to [4], model the corresponding stopping rule by a
decision tree.
In [5], the authors closely follow [1] but estimate the functions $f_{0,n}$, $n = 1,\dots,N$, by successfully applying deep Q-learning.
In [7], the authors improve on the method given in [1] using randomized neural networks
though they also discuss the possibility of applying reinforcement learning in this context.
When experimenting with the algorithm suggested in [1], we noticed that, when implementing it on a standard desktop computer, training the neural networks took particularly long. Furthermore, some interpretability of the estimated optimal stopping rules would be desirable in certain applications; see [10] for example.
We therefore came up with the idea of using CART-trees instead of neural networks for estimating $f_{0,n}$, $n = 1,\dots,N$. This way, we gain interpretability, on the one hand, and we lower computational cost, on the other. The crucial idea, though, is to estimate each $f_{0,n}$ separately, and this crucial idea goes back to [1], originally, as far as we know. Instead of modelling the whole stopping rule by a decision tree, as in [4], our model is built on a forest of trees, which again leads to significantly lower computational cost.
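The following sketch shows how such a forest could be fitted by backward induction, with one scikit-learn `DecisionTreeClassifier` per step in the role of $f_{0,n}$; it is our own minimal reading of the idea, not the paper's Algorithm 1, and it assumes, for simplicity, Markovian data so that each tree only sees the current data point $x^{(k)}_{M+n}$ instead of the whole history.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_stopping_trees(paths, rewards, max_depth=4):
    """paths: ensemble of shape (K, N+1, D); rewards: shape (K, N+1),
    where rewards[k, n] = u(n, x^(k)_{M+n})."""
    K, N_plus_1, _ = paths.shape
    N = N_plus_1 - 1
    value = rewards[:, N].copy()          # at n = N the last reward must be taken
    trees = [None] * (N + 1)
    for n in range(N - 1, 0, -1):         # backward induction over n = N-1, ..., 1
        # Label 1 if stopping now (empirically) beats the value of continuing.
        labels = (rewards[:, n] >= value).astype(int)
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(paths[:, n], labels)
        stop = tree.predict(paths[:, n]).astype(bool)
        value = np.where(stop, rewards[:, n], value)   # value under the fitted rule
        trees[n] = tree                   # trees[n] plays the role of f_{0,n}
    # value.mean() estimates V^0_{x_M}, to be compared with u(0, x_M) as in (*).
    return trees, value.mean()
```

Each fitted tree can then be inspected directly, for instance with `sklearn.tree.plot_tree`, which is the kind of interpretability, and the basis for plotting stopping boundaries, referred to above.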
Using CART-trees comes with disadvantages, too, and one of them is the high variance
associated with any tree-method. Applying additional variance reduction techniques slows
down our algorithm, but it remains reasonably faster than the algorithm suggested in [1],
mainly because training needs less data, which is particularly important in the context of
fast changing sequential data.
Another problem is dimension: our algorithm becomes inaccurate in certain examples
involving higher dimensional data. At the moment we do not know whether this problem
is intrinsically tree-related or not. There is work in progress to answer this question.
The rest of this paper is organised as follows: Section 2 motivates our main algorithm, that is, Algorithm 1, but also gives the theoretical background of CART-trees in this context, while the final Section 3 deals with applications.