checking the next N0−1 rewards. Only then would a new cycle start, including re-training
and re-estimation.
We therefore conclude that, for our purpose, the problem is less about working out the
objectives themselves and more about finding reliable estimates of optimal stopping rules
at low computational cost. While searching for possible solutions, we came across several
recent papers related to this problem.
Most notable, in our view, is the approach taken in [1], and its follow-up papers [2,3], where
stopping rules are represented by functions similar to our functions f0,n, n = 1, . . . , N, and
these functions are estimated by training deep neural networks.
In [4], by contrast, stopping rules are modelled by binary decision trees. However, treating a
whole stopping rule as a single decision tree comes at high computational cost; indeed, the
authors prove that estimating such trees is NP-hard in general. An advantage of using
decision trees, nevertheless, is that the estimated stopping rules become interpretable.
In [10], the authors translate a cost-awareness problem in computing into an optimal
stopping problem and then, similar to [4], model the corresponding stopping rule by a
decision tree.
In [5], the authors closely follow [1] but estimate the functions f0,n, n = 1, . . . , N, by
successfully applying deep Q-learning.
In [7], the authors improve on the method of [1] using randomized neural networks, and
they also discuss the possibility of applying reinforcement learning in this context.
When experimenting with the algorithm suggested in [1], we noticed that, on a standard
desktop computer, training the neural networks took particularly long. Furthermore, some
interpretability of the estimated optimal stopping rules would be desirable in certain
applications; see [10] for an example.
We therefore came up with the idea of using CART-trees instead of neural networks for
estimating f0,n, n = 1, . . . , N. This way we gain interpretability on the one hand and
lower computational cost on the other. The crucial idea, though, is to estimate each f0,n
separately, and this idea goes back, as far as we know, to [1]. Instead of modelling the
whole stopping rule by a single decision tree, as in [4], our model is built on a forest
of trees, which again leads to significantly lower computational cost.
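To make the idea of one separately estimated tree per time step concrete, the following is a minimal sketch, not the algorithm of this paper: it prices a toy Bermudan put in a Longstaff–Schwartz style, fitting one small CART regression tree per exercise date to the continuation value and stopping whenever the immediate reward exceeds that estimate. The market parameters, tree depth, and leaf size below are our own illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical toy setting: Bermudan put on geometric Brownian motion.
M, N = 20000, 10                     # simulated paths, exercise dates
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
dt = T / N
disc = np.exp(-r * dt)

# Simulate price paths S[:, 0..N].
Z = rng.standard_normal((M, N))
S = np.empty((M, N + 1))
S[:, 0] = S0
for n in range(N):
    S[:, n + 1] = S[:, n] * np.exp((r - 0.5 * sigma**2) * dt
                                   + sigma * np.sqrt(dt) * Z[:, n])

def payoff(s):
    return np.maximum(K - s, 0.0)

# Backward induction: one small CART-tree per date n estimates the
# continuation value; a path stops when its immediate reward beats it.
cash = payoff(S[:, N])               # value if never stopped earlier
for n in range(N - 1, 0, -1):
    cash *= disc                     # discount one step back
    imm = payoff(S[:, n])
    itm = imm > 0                    # fit only where stopping is sensible
    tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=200)
    tree.fit(S[itm, n].reshape(-1, 1), cash[itm])
    cont = tree.predict(S[itm, n].reshape(-1, 1))
    stop_idx = np.where(itm)[0][imm[itm] > cont]
    cash[stop_idx] = imm[stop_idx]   # exercise on those paths

price = disc * cash.mean()
print(round(price, 2))
```

Because each date gets its own shallow tree, the fitted trees can be inspected step by step, which is exactly the kind of interpretability a single monolithic stopping-rule tree would make expensive to obtain.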
Using CART-trees also comes with disadvantages, one of which is the high variance
associated with any tree method. Applying additional variance-reduction techniques slows
down our algorithm, but it remains considerably faster than the algorithm suggested in [1],
mainly because training requires less data, which is particularly important in the context
of fast-changing sequential data.
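One standard way to reduce the variance of a single CART-tree is to average many trees fitted on bootstrap resamples (bagging). The short sketch below, on synthetic data of our own choosing, illustrates the effect rather than any specific technique used in this paper: the bagged ensemble typically tracks a noisy regression target with lower test error than one tree of the same depth, at the price of extra training time.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(1)

# Illustrative noisy regression problem (our own toy data).
X = rng.uniform(-2, 2, size=(2000, 1))
y = np.sin(3 * X[:, 0]) + 0.3 * rng.standard_normal(2000)

xg = np.linspace(-2, 2, 200).reshape(-1, 1)
truth = np.sin(3 * xg[:, 0])

# One tree versus a bagged forest of trees of the same depth.
single = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X, y)
forest = BaggingRegressor(DecisionTreeRegressor(max_depth=6),
                          n_estimators=50, random_state=0).fit(X, y)

mse_single = float(np.mean((single.predict(xg) - truth) ** 2))
mse_forest = float(np.mean((forest.predict(xg) - truth) ** 2))
print(mse_single, mse_forest)
```

The trade-off mentioned above is visible here as well: the forest fits 50 trees instead of one, so the variance reduction is paid for in training time.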
Another problem is dimension: our algorithm becomes inaccurate in certain examples
involving higher-dimensional data. At present we do not know whether this problem is
intrinsic to tree methods; work to answer this question is in progress.
The rest of this paper is organised as follows: Section 2 motivates our main algorithm,
that is Algorithm 1, but also gives the theoretical background of CART-trees in this
context, while the final Section 3 deals with applications.