A PARAMETRIC APPROACH TO THE ESTIMATION OF CONVEX RISK FUNCTIONALS BASED ON WASSERSTEIN DISTANCE MAX NENDEL AND ALESSANDRO SGARABOTTOLO

Abstract. In this paper, we explore a static setting for the assessment of risk in the
context of mathematical finance and actuarial science that takes into account model
uncertainty in the distribution of a possibly infinite-dimensional risk factor. We study
convex risk functionals that incorporate a safety margin with respect to nonparametric
uncertainty by penalizing perturbations from a given baseline model using Wasserstein
distance. We investigate to what extent this form of probabilistic imprecision can be
approximated by restricting to a parametric family of models. The particular form of
the parametrization allows us to develop numerical methods based on neural networks,
which give both the value of the risk functional and the worst-case perturbation of the
reference measure. Moreover, we consider additional constraints on the perturbations,
namely, mean and martingale constraints. We show that, in both cases, under suitable
conditions on the loss function, it is still possible to estimate the risk functional by
passing to a parametric family of perturbed models, which again allows for numerical
approximations via neural networks.
Key words: Risk measure, model uncertainty, Wasserstein distance, martingale optimal
transport, parametric estimation, neural network, measurable direction of steepest ascent
AMS 2020 Subject Classification: Primary 62G05; 90C31; Secondary 41A60; 68T07;
91G70
1. Introduction
In this article, we study a class of convex risk functionals which arises naturally in the
context of mathematical finance and actuarial science, when dealing with expected values
for a risk factor whose distribution is not perfectly known. Given a random variable $Y$ on
a probability space $(\Omega,\mathcal{F},\mathbb{P})$ taking values in a separable Hilbert space $H$ endowed with
its Borel σ-algebra $\mathcal{B}(H)$, one is usually interested in expressions of the form
\[ \mathbb{E}_{\mathbb{P}}\big[f(Y)\big] = \int_H f(y)\,\mu(dy), \]
where $f\colon H\to\mathbb{R}$ is a, say, continuous loss or payoff function and $\mu=\mathbb{P}\circ Y^{-1}$ is the
distribution of $Y$, i.e., a probability measure on $\mathcal{B}(H)$.
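As a minimal illustration of the quantity above, the following sketch estimates $\int_H f(y)\,\mu(dy)$ by a plain Monte Carlo average. The choice $H=\mathbb{R}^2$, the standard normal reference distribution, the call-type payoff, and the names `expected_loss` and `call_on_sum` are assumptions made for this example only and do not come from the paper.

```python
import numpy as np

def expected_loss(f, samples):
    """Monte Carlo estimate of E_P[f(Y)] from i.i.d. samples of Y (one sample per row)."""
    values = np.array([f(y) for y in samples], dtype=float)
    return values.mean()

# Hypothetical setup: Y is standard normal in R^2, f is a call-type payoff on the sum.
rng = np.random.default_rng(0)
samples = rng.standard_normal((10_000, 2))
call_on_sum = lambda y: max(y.sum() - 0.5, 0.0)

estimate = expected_loss(call_on_sum, samples)
print(f"estimated expected payoff: {estimate:.4f}")
```

The estimate is only as good as the sampled distribution, which is exactly the point where model uncertainty enters.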
Obtaining precise knowledge of the distribution $\mu$ using estimation procedures is therefore
of central importance in applications, and lies at the heart of statistics. In practice,
however, one often has to deal with statistical imperfections, e.g., a lack of data or
information about the dependence structure between single coordinates of $Y$, leading to a
so-called model calibration error or model specification error, respectively. Therefore, in
many situations, precise knowledge of the underlying distribution $\mu$ of $Y$ may not be at
hand, and only a rough estimate or an expert opinion suggesting a particular form of
Date: August 13, 2024.
The authors thank Daniel Bartl, Jonas Blessing, Stephan Eckstein, Michael Kupper, and Riccardo
Mattivi for valuable comments and discussions related to this work. Financial support through the Deutsche
Forschungsgemeinschaft (DFG, German Research Foundation) – SFB 1283/2 2021 – 317210226 is gratefully
acknowledged.
arXiv:2210.14340v2 [q-fin.RM] 12 Aug 2024
reference distribution $\mu$ may be available. This is a special instance of model uncertainty
appearing, for example, in the context of catastrophic risk in reinsurance or default risk
within large credit portfolios in banking.
In the economic literature, model uncertainty is also referred to as Knightian uncertainty,
and a standard way to deal with it is to look at worst case losses among a set of plausible
probability distributions. In our study, we follow this approach, and estimate worst case
losses over the set $\mathcal{P}_p$ of all Borel probability measures on $H$ with finite moment of order
$p\in(1,\infty)$, weighting the different measures via a penalization term depending on the
$p$-Wasserstein distance from a reference model $\mu$ (which is assumed to have finite moment
of order $p$ as well). This leads to an expression of the form
\[ \mathcal{I}f := \sup_{\nu\in\mathcal{P}_p}\bigg(\int_H f(z)\,\nu(dz) - \varphi\big(\mathcal{W}_p(\mu,\nu)\big)\bigg). \tag{1.1} \]
Here, the penalty function $\varphi\colon[0,\infty)\to[0,\infty]$, which is assumed to be nondecreasing with
$\varphi(0)=0$, reflects a degree of confidence in the reference measure, which could, for example,
be related to the availability of data for the estimation of $\mu$. The value $\varphi(a)=\infty$ for some
$a>0$ corresponds to a rejection of every model $\nu$ with Wasserstein distance $\mathcal{W}_p(\mu,\nu)\ge a$,
and the limit case $\varphi=\infty\cdot\mathbf{1}_{(0,\infty)}$ resembles perfect confidence in the measure $\mu$.
Functionals of the form (1.1) belong to the class of convex risk measures and, under
suitable conditions on the penalty function φ, to the class of coherent risk measures, cf.
[2] and [18]. Moreover, they are widely studied in the context of distributionally robust
optimization problems, see, for example, [5,20,26,31,33,34], where the authors usually
consider an additional optimization procedure, leading to an inf-sup-formulation.
A standard approach to tackle the infinite-dimensional optimization related to (1.1) is to
look for a suitable dual formulation, for example, by transforming the primal problem into
a superhedging problem. For example, in [5], the authors transform a class of robust opti-
mized certainty equivalents (OCEs) into a one-dimensional optimization that leads to an
explicit correction term. In general, however, this approach leads to a nested optimization
problem, which can be numerically challenging.
We therefore look at this problem from a similar yet different angle, and aim to identify
a parametric version of the functional (1.1) together with suitable optimizing directions.
This idea is loosely related to the paradigm of looking for extreme points in Wasserstein
balls; a topic that has been explored in detail in the case where the reference measure is an
empirical distribution (uniform over the samples) or, more generally, a convex combination
of Dirac measures. In [33], it is shown that extreme points of Wasserstein balls centered
at a measure supported on at most $n$ points are supported on at most $n+3$ points. The
paper [29] refines this result, showing that these extreme distributions are in fact supported
on $n+2$ points. Finally, in [26], the authors show that, under stronger assumptions on
the loss function, the infinite-dimensional optimization problem can be solved via a convex
shifting of the support points and that the optimizing distribution is supported on the same
number of points as the reference measure. In the case where $\mu$ is a convex combination
of Dirac measures, we get a similar result, but in a different fashion, cf. Section 3.1.
The key idea of our approach is to look for a parametric version of the functional (1.1)
in terms of a first order approximation as the level of uncertainty related to the penalty
function $\varphi$ tends to zero. More precisely, we introduce a scaling parameter $h>0$, and
substitute $\varphi$ with the rescaled version $\varphi_h := h\varphi(\cdot/h)$, which allows us to control the level of
uncertainty in terms of $h$. We then consider the operator $\mathcal{I}(h)$, given by
\[ \mathcal{I}(h)f := \sup_{\nu\in\mathcal{P}_p}\bigg(\int_H f(z)\,\nu(dz) - \varphi_h\big(\mathcal{W}_p(\mu,\nu)\big)\bigg) \tag{1.2} \]
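for a suitable class of continuous payoff functions $f\colon H\to\mathbb{R}$, and look for a parametric
version $\mathcal{I}_\Theta(h)$ that asymptotically coincides with $\mathcal{I}(h)$ up to a first order error as $h$ tends
to zero. To that end, we consider a parameter set $\Theta$ consisting of vector fields $\theta\colon H\to H$,
which are $p$-integrable with respect to the reference measure $\mu$. For each parameter $\theta\in\Theta$,
we consider a probability measure $\mu_\theta$, which is a shifted version of $\mu$, i.e.,
\[ \int_H f(z)\,\mu_\theta(dz) = \int_H f\big(y+\theta(y)\big)\,\mu(dy), \tag{1.3} \]
and define the parametric version $\mathcal{I}_\Theta(h)$ of $\mathcal{I}(h)$, for $h>0$, by
\[ \mathcal{I}_\Theta(h)f := \sup_{\theta\in\Theta}\bigg(\int_H f(z)\,\mu_\theta(dz) - \varphi_h\big(\|\theta\|_{L^p(\mu;H)}\big)\bigg). \]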
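To make the parametric objective concrete, the following sketch evaluates the expression inside the supremum for a single vector field $\theta$ when $\mu$ is replaced by an empirical measure. It rests on assumptions that are not part of the paper: $H=\mathbb{R}^d$, the hypothetical penalty $\varphi(a)=a^p/p$, the rescaling $\varphi_h = h\varphi(\cdot/h)$, and the function name `parametric_objective`.

```python
import numpy as np

def parametric_objective(f, theta, samples, h, p=2.0, phi=None):
    """Evaluate the integral of f(y + theta(y)) minus phi_h(||theta||_{L^p(mu;H)}).

    Assumptions (illustrative only): mu is the empirical measure of `samples`
    (shape (n, d)), phi(a) = a**p / p unless `phi` is given, and
    phi_h(a) = h * phi(a / h).
    """
    if phi is None:
        phi = lambda a: a**p / p
    shifts = theta(samples)                        # theta(y_i), shape (n, d)
    shifted_mean = np.mean(f(samples + shifts))    # integral of f w.r.t. mu_theta, cf. (1.3)
    norms = np.linalg.norm(shifts, axis=1)         # ||theta(y_i)|| in H = R^d
    lp_norm = np.mean(norms**p) ** (1.0 / p)       # ||theta||_{L^p(mu;H)}
    return shifted_mean - h * phi(lp_norm / h)

# Hypothetical example: linear payoff on R and a constant shift of size h.
rng = np.random.default_rng(0)
samples = rng.standard_normal((1_000, 1))
f = lambda y: y[:, 0]
h = 0.1
value = parametric_objective(f, lambda y: np.full_like(y, h), samples, h)
baseline = parametric_objective(f, lambda y: np.zeros_like(y), samples, h)
```

In this toy case the constant shift by $h$ raises the objective by $h/2$ over the unperturbed value, since the shift gains $h$ and the assumed quadratic penalty costs $h/2$. Taking the supremum over a family of vector fields $\theta$ is the actual optimization and is not attempted here.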
In Theorem 2.7, we provide conditions on the parameter set $\Theta$ and the function $f$, ensuring
that
\[ \lim_{h\downarrow 0}\frac{\mathcal{I}(h)f - \mathcal{I}_\Theta(h)f}{h} = 0. \]
In Theorem 2.5, we compute a safety margin for asymptotically small $h>0$, and show that
an asymptotically optimal parameter $\theta$ can be found by looking at directions of steepest
ascent for $f$.
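For intuition on why directions of steepest ascent appear, consider the following heuristic sketch. It assumes $H=\mathbb{R}$, $p=2$, a smooth $f$, the hypothetical penalty $\varphi(a)=a^2/2$, and the rescaling $\varphi_h=h\varphi(\cdot/h)$; none of these choices is taken from the theorems. Under these assumptions, a first order Taylor expansion of $f(y+\theta(y))$ suggests the choice $\theta=h\nabla f$, with an expected gain of roughly $\frac{h}{2}\|\nabla f\|^2_{L^2(\mu)}$ over the unperturbed value.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.standard_normal(100_000)   # samples of a hypothetical reference measure on R
h = 0.01                           # small level of uncertainty

f = np.sin
grad_f = np.cos                    # gradient of f, used as the ascent direction

def objective(theta_values):
    """Mean of f(y + theta(y)) minus h * phi(||theta||_{L^2} / h), phi(a) = a^2 / 2 (assumed)."""
    l2_norm = np.sqrt(np.mean(theta_values**2))
    return np.mean(f(y + theta_values)) - h * 0.5 * (l2_norm / h) ** 2

baseline = objective(np.zeros_like(y))            # unperturbed expected value
gain = objective(h * grad_f(y)) - baseline        # improvement from theta = h * grad f
first_order = 0.5 * h * np.mean(grad_f(y) ** 2)   # (h / 2) * ||grad f||^2 in L^2(mu)

print(f"gain: {gain:.6e}, first-order prediction: {first_order:.6e}")
```

The two printed numbers should agree up to terms of higher order in $h$. This is a numerical sanity check of the heuristic, not a substitute for the precise statements in Theorem 2.5.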
A crucial step in the direction of studying the asymptotic behaviour of optimization
problems of the form (1.2) was done in [3], see also [7] for its extension to a multi-period
setting using adapted Wasserstein distance and [6,19] for dynamic versions without an
additional optimization. In [3], the authors study sensitivities for various forms of robust
optimization problems over p-Wasserstein balls employing an elegant computational ap-
proach. In the proof of Theorem 2.5, we build on these methods, allowing for more general
penalty functions φand, at the same time, using weaker differentiability assumptions on
f. In particular, we can drop the assumption of differentiability (µ-a.e.), introducing the
concept of a measurable direction of steepest ascent as a generalization of the gradient, cf.
Definition 2.3. This way, we can overcome differentiability issues related to the function
$f$ for reference measures $\mu$, which are not regular. Most functions of interest in financial
applications, e.g., call options, have a measurable direction of steepest ascent, so that our
results can be applied without any restrictions on the reference measure µ, cf. Remark 2.6
b) for a more thorough discussion.
Following the lead of [3], in Section 2.2, we study an additional mean constraint in the
optimization (1.2). This constraint enters naturally when dealing with risk-neutral pricing,
where the mean of the underlying is assumed to be known (e.g., by standard no-arbitrage
arguments). Thanks to our parametric description of the risk functional, cf. Theorem 2.8,
we can show that uncertainty in the return of a financial position can be replicated by
means of self-financing portfolios of call options or digital options, cf. Section 4.5.
In Section 2.3, we impose a so-called martingale constraint on the functional (1.2). That
is, we restrict the optimization to a set of probability measures, which are given in terms
of a martingale perturbation of the reference measure or, in different words, measures that
admit a martingale coupling with µ. This setup is closely intertwined with the topics of
martingale optimal transport (MOT) and model-free pricing in mathematical finance. For
example, in [8], the authors study robust superhedging problems based on MOT, whereas
[9] provides a precise description and fine properties of the optimal transport plan under a
martingale constraint. We also refer to [28] for a class of robust estimators for superhedging
prices based on martingale measures which are, up to a small perturbation in Wasserstein
distance, equivalent to an empirical distribution.
In Theorem 2.9, we provide an asymptotic parametrization of the constrained version of
(1.2) through a suitable randomization of the reference measure. Given a direction $\theta\in\Theta$,
we include an additional coin flip in (1.3), which is independent of $\mu$ and determines the
sign of $\theta$. This leads to a parametric description in terms of measures $\mu^{\mathrm{Mart}}_\theta\in\mathcal{P}_p$, given
by
\[ \int_H f(z)\,\mu^{\mathrm{Mart}}_\theta(dz) = \int_H\int_{\mathbb{R}} f\big(y+s\theta(y)\big)\,B_{\mathrm{sym}}(ds)\,\mu(dy), \tag{1.4} \]
where $B_{\mathrm{sym}}$ is the symmetric Bernoulli distribution on $\mathbb{R}$ with equal probabilities, i.e.,
\[ B_{\mathrm{sym}}\{-1\} = B_{\mathrm{sym}}\{1\} = \tfrac{1}{2}. \]
In principle, $B_{\mathrm{sym}}$ could be replaced by any distribution on $\mathbb{R}$ with mean zero. However,
the symmetric Bernoulli distribution has the identifying property that, apart from having
mean zero, $\int_{\mathbb{R}}|s|^\alpha\,B_{\mathrm{sym}}(ds)=1$ for all $\alpha\in(0,\infty)$; a property that is of central importance
for the asymptotic parametrization in this framework.
Apart from this, the symmetric Bernoulli distribution is fundamentally connected to the
Brownian motion via Donsker’s theorem. Having in mind that, in a Brownian filtration,
all martingales can be represented as stochastic integrals with respect to the Brownian
motion, our parametrization via measures of the form (1.4) can be seen as a microscopic
version of a martingale representation theorem in a completely different and model-free
setting. Moreover, the measure $\mu^{\mathrm{Mart}}_\theta$ can be represented via the explicit formula
\[ \int_H f(z)\,\mu^{\mathrm{Mart}}_\theta(dz) = \int_H \frac{f\big(y+\theta(y)\big)+f\big(y-\theta(y)\big)}{2}\,\mu(dy), \tag{1.5} \]
which, after subtracting $\int_H f(y)\,\mu(dy)$, leads to a finite difference approximation of the
second derivative of $f$ in the direction $\theta$ using central differences. This links $\mu^{\mathrm{Mart}}_\theta$ also
from an analytic perspective to a Brownian motion through its infinitesimal generator.
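The two properties just described (preservation of the mean and the link to a central second difference) can be checked numerically. The sketch below assumes $H=\mathbb{R}$, an empirical reference measure, a constant direction $\theta\equiv\varepsilon$, and the test function $f=\exp$; these choices and the name `martingale_perturbed_mean` are illustrative assumptions.

```python
import numpy as np

def martingale_perturbed_mean(f, theta, samples):
    """Integral of f w.r.t. the coin-flip perturbation of an empirical measure,
    computed as the average of (f(y + theta(y)) + f(y - theta(y))) / 2."""
    shifts = theta(samples)
    return np.mean(0.5 * (f(samples + shifts) + f(samples - shifts)))

rng = np.random.default_rng(1)
y = rng.standard_normal(50_000)

eps = 0.1
theta = lambda s: eps * np.ones_like(s)    # hypothetical constant direction

# The identity is integrated unchanged, so the mean of the reference measure is preserved.
mean_gap = martingale_perturbed_mean(lambda z: z, theta, y) - y.mean()

# Subtracting the unperturbed integral gives a central second difference:
# for f = exp, it should be close to (eps^2 / 2) times the mean of f''(y) = exp(y).
second_diff = martingale_perturbed_mean(np.exp, theta, y) - np.mean(np.exp(y))
ratio = second_diff / (0.5 * eps**2 * np.mean(np.exp(y)))

print(f"mean gap: {mean_gap:.2e}, ratio to second-derivative term: {ratio:.5f}")
```

For this particular $f$ the ratio equals $2(\cosh\varepsilon-1)/\varepsilon^2$, which tends to one as $\varepsilon$ tends to zero.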
In view of numerical approximations, our construction of parametric versions of convex
risk functionals of the form (1.2) leads to a description of asymptotically relevant models for
the optimization in terms of Monge transports (p-integrable vector fields), which, in turn,
suggests a numerical investigation of the risk functional in the spirit of [16], see also [17].
In Section 3.2, we develop a numerical scheme based on neural networks. Previous works
on this topic provide approximations from above based on duality results, whereas our
approach leads to an approximation from below, based on the restriction to a parametric
family of models. As a byproduct, we also approximate the optimizing measure through the
Monge transport generating it. Observing that, for small values of uncertainty, the optimal
transport plan depends on the reference measure only through a multiplicative factor, we
draw a connection to transfer learning, and discuss the robustness of our approach with
respect to deviations in the reference measure.
In machine learning, transfer learning usually consists of taking a neural network trained
on a previous dataset and training only its last layer on a new dataset, which is often
significantly smaller than the first one, see [13] for references to seminal works on this
topic. The idea is that if the two datasets share some common features, the part of the
network that is inherited from the first training is able to extract most of these features,
while the training of the last layer learns from specific characteristics of the new dataset.
In our framework, we can explicitly distinguish between the common feature that can
be extracted from the first training, namely the optimizing vector field, and the feature
that is specific to each reference measure, i.e., the rescaling factor. In particular, once the
approximation is performed on one measure, we can transfer it to a different measure at
the price of a one-dimensional optimization, see Section 4.4 for further details.
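The transfer step described above can be sketched as a one-dimensional search over a scaling factor. The code assumes that a direction `theta0` is already available from a previous approximation and only rescales it for samples of a new reference measure. The penalty $\varphi(a)=a^p/p$, the rescaling $\varphi_h=h\varphi(\cdot/h)$, the grid search, and all names are assumptions for this illustration and are not taken from Section 4.4.

```python
import numpy as np

def rescale_direction(f, theta0, new_samples, h, p=2.0, grid=None):
    """Grid search over lam >= 0 for the objective
    mean f(y + lam * theta0(y)) - h * phi(lam * ||theta0||_{L^p} / h),
    with phi(a) = a**p / p (assumed) and H = R."""
    if grid is None:
        grid = np.linspace(0.0, 5.0 * h, 501)
    direction = theta0(new_samples)                       # fixed direction on the new samples
    norm = np.mean(np.abs(direction) ** p) ** (1.0 / p)   # its L^p norm under the new measure
    phi = lambda a: a**p / p
    values = [
        np.mean(f(new_samples + lam * direction)) - h * phi(lam * norm / h)
        for lam in grid
    ]
    best = int(np.argmax(values))
    return grid[best], values[best]

# Hypothetical example: linear payoff, constant direction, new reference measure N(1, 4).
rng = np.random.default_rng(3)
new_samples = rng.normal(1.0, 2.0, 5_000)
h = 0.1
scale, value = rescale_direction(lambda z: z, lambda s: np.ones_like(s), new_samples, h)
```

In this toy case the objective is $\lambda\mapsto \text{mean} + \lambda - \lambda^2/(2h)$, so the search returns a scale close to $h$. A grid is used only for transparency; any scalar optimizer could take its place.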
The paper is organized as follows. In Section 2, we introduce the setup and state the
main results on parametric versions of risk functionals of the form (1.2). Section 3 provides
numerical methods for the approximation of the parametric risk functional $\mathcal{I}_\Theta(h)$. We
distinguish between two situations leading to different numerical schemes. One is based on
a reduction to a finite-dimensional optimization, cf. Section 3.1, while the other uses an
approximation via neural networks, cf. Section 3.2. In Section 4, we discuss applications
of both our theoretical and numerical findings by means of examples from finance and
insurance. Section 5 contains the proofs of the main theorems and, in Appendix A, we
provide a simple approximation result that helps to reduce the set of parameters $\Theta$.
2. Setup and main results
In this section, we introduce the class of convex risk functionals that (together with ad-
ditional constraints) form the center of our study, and we state our main results concerning
their parametric estimation.
Throughout, let $p\in(1,\infty)$, $q:=\frac{p}{p-1}\in(1,\infty)$ be the conjugate exponent of $p$, and
$(H,\langle\cdot,\cdot\rangle)$ be a separable Hilbert space. As usual, we endow $H$ with its canonical norm
$\|\cdot\|:=\sqrt{\langle\cdot,\cdot\rangle}$, and identify the topological dual space of $H$ with $H$ itself. We denote
the set of all probability measures $\nu$ on the Borel σ-algebra $\mathcal{B}(H)$ of $H$ with
\[ |\nu|_p := \bigg(\int_H \|y\|^p\,\nu(dy)\bigg)^{1/p} < \infty \]
by $\mathcal{P}_p=\mathcal{P}_p(H)$.
In the following, we consider a fixed probability measure $\mu\in\mathcal{P}_p$, which we will refer
to as the reference measure or baseline model, and a penalty function $\varphi\colon[0,\infty)\to[0,\infty]$,
which is assumed to be nondecreasing with $\varphi(0)=0$.
For $\nu\in\mathcal{P}_p$, we denote the $p$-Wasserstein distance between the reference measure $\mu$ and
$\nu$ by
\[ \mathcal{W}_p(\mu,\nu) = \inf_{\pi\in\mathrm{Cpl}(\mu,\nu)}\bigg(\int_{H\times H}\|y-z\|^p\,\pi(dy,dz)\bigg)^{1/p}, \tag{2.1} \]
where $\mathrm{Cpl}(\mu,\nu)$ is the set of all couplings between $\mu$ and $\nu$, i.e., the set of probability
measures on the Borel σ-algebra $\mathcal{B}(H\times H)$ of the product space $H\times H$ with first and
second marginal $\mu$ and $\nu$, respectively. We refer to [1] and [32] for a detailed discussion on
Wasserstein distances and, more generally, the topic of optimal transport. Here, we only
recall that, for all $\nu\in\mathcal{P}_p$, there exists an optimal coupling $\pi^*\in\mathrm{Cpl}(\mu,\nu)$ that attains the
infimum in (2.1), i.e.,
\[ \mathcal{W}_p(\mu,\nu) = \bigg(\int_{H\times H}\|y-z\|^p\,\pi^*(dy,dz)\bigg)^{1/p}. \]
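In the special case $H=\mathbb{R}$ with two empirical measures carrying the same number of equally weighted atoms, an optimal coupling in (2.1) is obtained by matching the sorted samples, so the distance can be computed without solving an optimization problem. The following sketch covers only this special case; the function name `wasserstein_1d` is an assumption for the example.

```python
import numpy as np

def wasserstein_1d(x, y, p=2.0):
    """p-Wasserstein distance between two empirical measures on R with the same
    number of equally weighted atoms, using the monotone (sorted) coupling."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("both samples must contain the same number of points")
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

# Shifting every atom by 1 gives distance 1 for every p >= 1.
shift_distance = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])

# The order in which the atoms are listed does not matter.
same_measure = wasserstein_1d([0.0, 10.0], [10.0, 0.0])
```

For general Hilbert spaces or unequal weights, the infimum in (2.1) has to be computed by an optimal transport solver instead.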
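For $\varrho\in[1,\infty)$, we denote the space of all ($\mu$-equivalence classes of) measurable functions
$f\colon H\to\mathbb{R}$ with
\[ \|f\|_{L^\varrho(\mu)} := \bigg(\int_H |f(y)|^\varrho\,\mu(dy)\bigg)^{1/\varrho} < \infty \]
by $L^\varrho(\mu)$. In a similar fashion, $L^\varrho(\mu;H)$ denotes the space of all ($\mu$-equivalence classes of)
measurable functions $g\colon H\to H$ with
\[ \|g\|_{L^\varrho(\mu;H)} := \bigg(\int_H \|g(y)\|^\varrho\,\mu(dy)\bigg)^{1/\varrho} < \infty. \]
Recall that, by assumption, $H$ is a separable Hilbert space, so that the image $g(H)\subset H$
of $g$ is automatically separable.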