checking the next N0−1 rewards. Only then would a new cycle start, including re-training
and re-estimation.
We therefore conclude that, for our purpose, the problem is less about working out the
objectives themselves and more about finding reliable estimates of optimal stopping rules
at low computational cost. While searching for possible solutions, we came across several
recent papers related to this problem.
Most notable, in our view, is the approach taken in [1], and its follow-up papers [2,3], where
stopping rules are represented by functions similar to our functions f0,n, n = 1, . . . , N, and
these functions are estimated by training deep neural networks.
In [4], by contrast, stopping rules are modelled by binary decision trees. However, treating a
whole stopping rule as a single decision tree comes at high computational cost; indeed, the
authors prove that estimating such trees is NP-hard in general. An advantage of using
decision trees, nevertheless, is that the estimated stopping rules become interpretable.
In [10], the authors translate a cost-awareness problem in computing into an optimal
stopping problem and then, similar to [4], model the corresponding stopping rule by a
decision tree.
In [5], the authors closely follow [1] but estimate the functions f0,n, n = 1, . . . , N, by
successfully applying deep Q-learning.
In [7], the authors improve on the method of [1] using randomized neural networks, and
they also discuss the possibility of applying reinforcement learning in this context.
When experimenting with the algorithm suggested in [1], we noticed that, on a standard
desktop computer, training the neural networks took particularly long. Furthermore, some
interpretability of the estimated optimal stopping rules would be desirable in certain
applications; see [10] for an example.
We therefore came up with the idea of using CART-trees instead of neural networks for
estimating f0,n, n = 1, . . . , N. This way we gain interpretability on the one hand and
lower computational cost on the other. The crucial idea, though, is to estimate each f0,n
separately, and this idea goes back, as far as we know, to [1]. Instead of modelling the
whole stopping rule by a single decision tree, as in [4], our model is built on a forest
of trees, which again leads to significantly lower computational cost.
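To make the idea of one separately estimated tree per time step concrete, the following is a minimal sketch, not the algorithm of this paper: it prices a toy Bermudan put in a Longstaff–Schwartz style, fitting one small CART regression tree per exercise date to the continuation value and stopping whenever the immediate reward exceeds that estimate. The market parameters, tree depth, and leaf size below are our own illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical toy setting: Bermudan put on geometric Brownian motion.
M, N = 20000, 10                     # simulated paths, exercise dates
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
dt = T / N
disc = np.exp(-r * dt)

# Simulate price paths S[:, 0..N].
Z = rng.standard_normal((M, N))
S = np.empty((M, N + 1))
S[:, 0] = S0
for n in range(N):
    S[:, n + 1] = S[:, n] * np.exp((r - 0.5 * sigma**2) * dt
                                   + sigma * np.sqrt(dt) * Z[:, n])

def payoff(s):
    return np.maximum(K - s, 0.0)

# Backward induction: one small CART-tree per date n estimates the
# continuation value; a path stops when its immediate reward beats it.
cash = payoff(S[:, N])               # value if never stopped earlier
for n in range(N - 1, 0, -1):
    cash *= disc                     # discount one step back
    imm = payoff(S[:, n])
    itm = imm > 0                    # fit only where stopping is sensible
    tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=200)
    tree.fit(S[itm, n].reshape(-1, 1), cash[itm])
    cont = tree.predict(S[itm, n].reshape(-1, 1))
    stop_idx = np.where(itm)[0][imm[itm] > cont]
    cash[stop_idx] = imm[stop_idx]   # exercise on those paths

price = disc * cash.mean()
print(round(price, 2))
```

Because each date gets its own shallow tree, the fitted trees can be inspected step by step, which is exactly the kind of interpretability a single monolithic stopping-rule tree would make expensive to obtain.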
Using CART-trees also comes with disadvantages, one of which is the high variance
associated with any tree method. Applying additional variance-reduction techniques slows
down our algorithm, but it remains considerably faster than the algorithm suggested in [1],
mainly because training requires less data, which is particularly important in the context
of fast-changing sequential data.
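One standard way to reduce the variance of a single CART-tree is to average many trees fitted on bootstrap resamples (bagging). The short sketch below, on synthetic data of our own choosing, illustrates the effect rather than any specific technique used in this paper: the bagged ensemble typically tracks a noisy regression target with lower test error than one tree of the same depth, at the price of extra training time.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(1)

# Illustrative noisy regression problem (our own toy data).
X = rng.uniform(-2, 2, size=(2000, 1))
y = np.sin(3 * X[:, 0]) + 0.3 * rng.standard_normal(2000)

xg = np.linspace(-2, 2, 200).reshape(-1, 1)
truth = np.sin(3 * xg[:, 0])

# One tree versus a bagged forest of trees of the same depth.
single = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X, y)
forest = BaggingRegressor(DecisionTreeRegressor(max_depth=6),
                          n_estimators=50, random_state=0).fit(X, y)

mse_single = float(np.mean((single.predict(xg) - truth) ** 2))
mse_forest = float(np.mean((forest.predict(xg) - truth) ** 2))
print(mse_single, mse_forest)
```

The trade-off mentioned above is visible here as well: the forest fits 50 trees instead of one, so the variance reduction is paid for in training time.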
Another problem is dimension: our algorithm becomes inaccurate in certain examples
involving higher-dimensional data. At present we do not know whether this problem is
intrinsic to tree methods; work to answer this question is in progress.
The rest of this paper is organised as follows: Section 2 motivates our main algorithm,
that is Algorithm 1, but also gives the theoretical background of CART-trees in this
context, while the final Section 3 deals with applications.