

A PARAMETRIC APPROACH TO THE ESTIMATION OF CONVEX

RISK FUNCTIONALS BASED ON WASSERSTEIN DISTANCE

MAX NENDEL AND ALESSANDRO SGARABOTTOLO

Abstract. In this paper, we explore a static setting for the assessment of risk in the context of mathematical finance and actuarial science that takes into account model uncertainty in the distribution of a possibly infinite-dimensional risk factor. We study convex risk functionals that incorporate a safety margin with respect to nonparametric uncertainty by penalizing perturbations from a given baseline model using Wasserstein distance. We investigate to what extent this form of probabilistic imprecision can be approximated by restricting to a parametric family of models. The particular form of the parametrization allows us to develop numerical methods based on neural networks, which give both the value of the risk functional and the worst-case perturbation of the reference measure. Moreover, we consider additional constraints on the perturbations, namely, mean and martingale constraints. We show that, in both cases, under suitable conditions on the loss function, it is still possible to estimate the risk functional by passing to a parametric family of perturbed models, which again allows for numerical approximations via neural networks.

Key words: Risk measure, model uncertainty, Wasserstein distance, martingale optimal transport, parametric estimation, neural network, measurable direction of steepest ascent

AMS 2020 Subject Classification: Primary 62G05; 90C31; Secondary 41A60; 68T07; 91G70

1. Introduction

In this article, we study a class of convex risk functionals which arises naturally in the context of mathematical finance and actuarial science when dealing with expected values for a risk factor whose distribution is not perfectly known. Given a random variable $Y$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ taking values in a separable Hilbert space $H$ endowed with its Borel $\sigma$-algebra $\mathcal{B}(H)$, one is usually interested in expressions of the form
\[
\mathbb{E}_{\mathbb{P}}\big[f(Y)\big] = \int_H f(y)\, \mu(dy),
\]
where $f\colon H \to \mathbb{R}$ is a, say, continuous loss or payoff function and $\mu = \mathbb{P} \circ Y^{-1}$ is the distribution of $Y$, i.e., a probability measure on $\mathcal{B}(H)$.

Obtaining precise knowledge of the distribution $\mu$ using estimation procedures is therefore of central importance in applications and lies at the heart of statistics. In practice, however, one often has to deal with statistical imperfections, e.g., a lack of data or information about the dependence structure between single coordinates of $Y$, leading to a so-called model calibration error or model specification error, respectively. Therefore, in many situations, precise knowledge of the underlying distribution $\mu$ of $Y$ may not be at hand, and only a rough estimate or an expert opinion suggesting a particular form of reference distribution $\mu$ may be available. This is a special instance of model uncertainty appearing, for example, in the context of catastrophic risk in reinsurance or default risk within large credit portfolios in banking.

Date: August 13, 2024.

The authors thank Daniel Bartl, Jonas Blessing, Stephan Eckstein, Michael Kupper, and Riccardo Mattivi for valuable comments and discussions related to this work. Financial support through the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – SFB 1283/2 2021 – 317210226 is gratefully acknowledged.

arXiv:2210.14340v2 [q-fin.RM] 12 Aug 2024

In the economic literature, model uncertainty is also referred to as Knightian uncertainty, and a standard way to deal with it is to look at worst-case losses among a set of plausible probability distributions. In our study, we follow this approach and estimate worst-case losses over the set $\mathcal{P}_p$ of all Borel probability measures on $H$ with finite moment of order $p \in (1,\infty)$, weighting the different measures via a penalization term depending on the $p$-Wasserstein distance from a reference model $\mu$ (which is assumed to have finite moment of order $p$ as well). This leads to an expression of the form
\[
\mathcal{I} f := \sup_{\nu \in \mathcal{P}_p} \bigg( \int_H f(z)\, \nu(dz) - \varphi\big(\mathcal{W}_p(\mu,\nu)\big) \bigg). \tag{1.1}
\]

Here, the penalty function $\varphi\colon [0,\infty) \to [0,\infty]$, which is assumed to be nondecreasing with $\varphi(0) = 0$, reflects a degree of confidence in the reference measure, which could, for example, be related to the availability of data for the estimation of $\mu$. The value $\varphi(a) = \infty$ for some $a > 0$ corresponds to a rejection of every model $\nu$ with Wasserstein distance $\mathcal{W}_p(\mu,\nu) \geq a$, and the limit case $\varphi = \infty \cdot \mathbf{1}_{(0,\infty)}$ corresponds to perfect confidence in the measure $\mu$.
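As a brief illustration of how the penalty encodes confidence, consider the following worked instance. The radius $\delta \geq 0$ is a hypothetical parameter introduced only for this example, and $f$ is assumed to be $\mu$-integrable. Choosing an indicator-type penalty turns the penalized supremum into a supremum over a Wasserstein ball, since models outside the ball receive the value $-\infty$:

```latex
\[
\varphi = \infty \cdot \mathbf{1}_{(\delta,\infty)}
\quad\Longrightarrow\quad
\mathcal{I} f
= \sup_{\nu \in \mathcal{P}_p} \bigg( \int_H f(z)\, \nu(dz) - \varphi\big(\mathcal{W}_p(\mu,\nu)\big) \bigg)
= \sup_{\substack{\nu \in \mathcal{P}_p \\ \mathcal{W}_p(\mu,\nu) \leq \delta}} \int_H f(z)\, \nu(dz).
\]
% For \delta = 0, only \nu = \mu satisfies the constraint, so that
\[
\mathcal{I} f = \int_H f(y)\, \mu(dy).
\]
```

The case $\delta = 0$ is exactly the limit case of perfect confidence mentioned above.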

Functionals of the form (1.1) belong to the class of convex risk measures and, under

suitable conditions on the penalty function φ, to the class of coherent risk measures, cf.

[2] and [18]. Moreover, they are widely studied in the context of distributionally robust

optimization problems, see, for example, [5,20,26,31,33,34], where the authors usually

consider an additional optimization procedure, leading to an inf-sup-formulation.

A standard approach to tackle the infinite-dimensional optimization related to (1.1) is to look for a suitable dual formulation, for example, by transforming the primal problem into a superhedging problem. For example, in [5], the authors transform a class of robust optimized certainty equivalents (OCEs) into a one-dimensional optimization that leads to an explicit correction term. In general, however, this approach leads to a nested optimization problem, which can be numerically challenging.

We therefore look at this problem from a similar yet different angle and aim to identify a parametric version of the functional (1.1) together with suitable optimizing directions. This idea is merely related to the paradigm of looking for extreme points in Wasserstein balls; a topic that has been explored in detail in the case where the reference measure is an empirical distribution (uniform over the samples) or, more generally, a convex combination of Dirac measures. In [33], it is shown that extreme points of Wasserstein balls centered in a measure supported on at most $n$ points are supported on at most $n + 3$ points. The paper [29] refines this result, showing that these extreme distributions are in fact supported on $n + 2$ points. Finally, in [26], the authors show that, under stronger assumptions on the loss function, the infinite-dimensional optimization problem can be solved via a convex shifting of the support points and that the optimizing distribution is supported on the same number of points as the reference measure. In the case where $\mu$ is a convex combination of Dirac measures, we get a similar result, but in a different fashion, cf. Section 3.1.

The key idea of our approach is to look for a parametric version of the functional (1.1) in terms of a first-order approximation as the level of uncertainty related to the penalty function $\varphi$ tends to zero. More precisely, we introduce a scaling parameter $h > 0$ and substitute $\varphi$ with the rescaled version $\varphi_h := h\varphi(\cdot/h)$, which allows us to control the level of uncertainty in terms of $h$. We then consider the operator $\mathcal{I}(h)$, given by
\[
\mathcal{I}(h) f := \sup_{\nu \in \mathcal{P}_p} \bigg( \int_H f(z)\, \nu(dz) - \varphi_h\big(\mathcal{W}_p(\mu,\nu)\big) \bigg) \tag{1.2}
\]
for a suitable class of continuous payoff functions $f\colon H \to \mathbb{R}$, and look for a parametric version $\mathcal{I}_\Theta(h)$ that asymptotically coincides with $\mathcal{I}(h)$ up to a first-order error as $h$ tends to zero. To that end, we consider a parameter set $\Theta$ consisting of vector fields $\theta\colon H \to H$ which are $p$-integrable with respect to the reference measure $\mu$. For each parameter $\theta \in \Theta$, we consider a probability measure $\mu_\theta$, which is a shifted version of $\mu$, i.e.,
\[
\int_H f(z)\, \mu_\theta(dz) = \int_H f\big(y + \theta(y)\big)\, \mu(dy), \tag{1.3}
\]
and define the parametric version $\mathcal{I}_\Theta(h)$ of $\mathcal{I}(h)$, for $h > 0$, by
\[
\mathcal{I}_\Theta(h) f := \sup_{\theta \in \Theta} \bigg( \int_H f(z)\, \mu_\theta(dz) - \varphi_h\big(\|\theta\|_{L^p(\mu;H)}\big) \bigg).
\]
In Theorem 2.7, we provide conditions on the parameter set $\Theta$ and the function $f$ ensuring that
\[
\lim_{h \downarrow 0} \frac{\mathcal{I}(h) f - \mathcal{I}_\Theta(h) f}{h} = 0.
\]
In Theorem 2.5, we compute a safety margin for asymptotically small $h > 0$ and show that an asymptotically optimal parameter $\theta$ can be found by looking at directions of steepest ascent for $f$.
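To make the parametric functional more concrete, the following is a minimal sketch and not the numerical scheme of Section 3. It assumes $H = \mathbb{R}$, replaces $\mu$ by an empirical sample, and restricts the parameter set to the one-parameter family $\{t\theta_0 : t \geq 0\}$ for a fixed direction $\theta_0$, so it only yields a lower bound for the supremum over a richer set $\Theta$. The function names, the quadratic default penalty, the grid, and the sample model are all assumptions made for illustration.

```python
import numpy as np

def shifted_expectation(f, theta, samples):
    """Monte Carlo version of (1.3): average of f(y + theta(y)) over samples of mu."""
    return np.mean(f(samples + theta(samples)))

def parametric_risk(f, theta0, samples, h, p=2.0,
                    phi=lambda x: 0.5 * x**2, t_grid=None):
    """Grid search for sup_t { int f d(mu_{t*theta0}) - phi_h(||t*theta0||_{L^p(mu)}) }.

    Only the family {t * theta0 : t >= 0} is searched (an assumption of this
    sketch), so the result is a lower bound for a richer parameter set.
    """
    if t_grid is None:
        t_grid = np.linspace(0.0, 5.0 * h, 2001)
    # L^p(mu)-norm of the direction theta0, estimated on the samples (scalar case)
    norm0 = np.mean(np.abs(theta0(samples)) ** p) ** (1.0 / p)
    values = []
    for t in t_grid:
        gain = shifted_expectation(f, lambda y: t * theta0(y), samples)
        penalty = h * phi(t * norm0 / h)  # rescaled penalty phi_h(x) = h * phi(x / h)
        values.append(gain - penalty)
    best = int(np.argmax(values))
    return values[best], t_grid[best]

# Hypothetical usage: call payoff with strike 1 under a Gaussian baseline model.
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=0.2, size=10_000)
call = lambda y: np.maximum(y - 1.0, 0.0)
value, t_star = parametric_risk(call, lambda y: np.ones_like(y), samples, h=0.05)
print(value, t_star)
```

As a sanity check, for $f(y) = y$, $\theta_0 \equiv 1$, $p = 2$, and $\varphi(x) = x^2/2$, the objective is $t \mapsto \int y\,\mu(dy) + t - t^2/(2h)$, which is maximized at $t = h$ with value $\int y\,\mu(dy) + h/2$.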

A crucial step towards studying the asymptotic behaviour of optimization problems of the form (1.2) was taken in [3], see also [7] for its extension to a multi-period setting using adapted Wasserstein distance and [6,19] for dynamic versions without an additional optimization. In [3], the authors study sensitivities for various forms of robust optimization problems over $p$-Wasserstein balls employing an elegant computational approach. In the proof of Theorem 2.5, we build on these methods, allowing for more general penalty functions $\varphi$ and, at the same time, using weaker differentiability assumptions on $f$. In particular, we can drop the assumption of differentiability ($\mu$-a.e.), introducing the concept of a measurable direction of steepest ascent as a generalization of the gradient, cf. Definition 2.3. This way, we can overcome differentiability issues related to the function $f$ for reference measures $\mu$ that are not regular. Most functions of interest in financial applications, e.g., call options, have a measurable direction of steepest ascent, so that our results can be applied without any restrictions on the reference measure $\mu$, cf. Remark 2.6 b) for a more thorough discussion.

Following the lead of [3], in Section 2.2, we study an additional mean constraint in the optimization (1.2). This constraint enters naturally when dealing with risk-neutral pricing, where the mean of the underlying is assumed to be known (e.g., by standard no-arbitrage arguments). Thanks to our parametric description of the risk functional, cf. Theorem 2.8, we can show that uncertainty in the return of a financial position can be replicated by means of self-financing portfolios of call options or digital options, cf. Section 4.5.

In Section 2.3, we impose a so-called martingale constraint on the functional (1.2). That is, we restrict the optimization to a set of probability measures which are given in terms of a martingale perturbation of the reference measure or, in different words, measures that admit a martingale coupling with $\mu$. This setup is closely intertwined with the topics of martingale optimal transport (MOT) and model-free pricing in mathematical finance. For example, in [8], the authors study robust superhedging problems based on MOT, whereas [9] provides a precise description and fine properties of the optimal transport plan under a martingale constraint. We also refer to [28] for a class of robust estimators for superhedging prices based on martingale measures which are, up to a small perturbation in Wasserstein distance, equivalent to an empirical distribution.


In Theorem 2.9, we provide an asymptotic parametrization of the constrained version of (1.2) through a suitable randomization of the reference measure. Given a direction $\theta \in \Theta$, we include an additional coin flip in (1.3), which is independent of $\mu$ and determines the sign of $\theta$. This leads to a parametric description in terms of measures $\mu^{\mathrm{Mart}}_\theta \in \mathcal{P}_p$, given by
\[
\int_H f(z)\, \mu^{\mathrm{Mart}}_\theta(dz) = \int_H \int_{\mathbb{R}} f\big(y + s\theta(y)\big)\, \mathcal{B}_{\mathrm{sym}}(ds)\, \mu(dy), \tag{1.4}
\]
where $\mathcal{B}_{\mathrm{sym}}$ is the symmetric Bernoulli distribution on $\mathbb{R}$ with equal probabilities, i.e.,
\[
\mathcal{B}_{\mathrm{sym}}\big(\{-1\}\big) = \mathcal{B}_{\mathrm{sym}}\big(\{1\}\big) = \frac{1}{2}.
\]
In principle, $\mathcal{B}_{\mathrm{sym}}$ could be replaced by any distribution on $\mathbb{R}$ with mean zero. However, the symmetric Bernoulli distribution has the identifying property that, apart from having mean zero, $\int_{\mathbb{R}} |s|^\alpha\, \mathcal{B}_{\mathrm{sym}}(ds) = 1$ for all $\alpha \in (0,\infty)$; a property that is of central importance for the asymptotic parametrization in this framework.

Apart from this, the symmetric Bernoulli distribution is fundamentally connected to the Brownian motion via Donsker’s theorem. Having in mind that, in a Brownian filtration, all martingales can be represented as stochastic integrals with respect to the Brownian motion, our parametrization via measures of the form (1.4) can be seen as a microscopic version of a martingale representation theorem in a completely different and model-free setting. Moreover, the measure $\mu^{\mathrm{Mart}}_\theta$ can be represented via the explicit formula
\[
\int_H f(z)\, \mu^{\mathrm{Mart}}_\theta(dz) = \int_H \frac{f\big(y + \theta(y)\big) + f\big(y - \theta(y)\big)}{2}\, \mu(dy), \tag{1.5}
\]
which, after subtracting $\int_H f(y)\, \mu(dy)$, leads to a finite difference approximation of the second derivative of $f$ in the direction $\theta$ using central differences. This links $\mu^{\mathrm{Mart}}_\theta$ also from an analytic perspective to a Brownian motion through its infinitesimal generator.
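The central-difference structure of (1.5) can be checked numerically. The sketch below is an illustration under assumptions, not part of the paper's numerical scheme: it takes $H = \mathbb{R}$, replaces $\mu$ by an empirical sample, and uses hypothetical function names. It evaluates the right-hand side of (1.5) and the difference to the baseline expectation.

```python
import numpy as np

def martingale_shifted_expectation(f, theta, samples):
    """Monte Carlo version of (1.5): average of (f(y+theta(y)) + f(y-theta(y))) / 2."""
    shift = theta(samples)
    return np.mean(0.5 * (f(samples + shift) + f(samples - shift)))

def central_difference_gain(f, theta, samples):
    """Right-hand side of (1.5) minus the baseline expectation of f under mu."""
    return martingale_shifted_expectation(f, theta, samples) - np.mean(f(samples))

# Hypothetical usage with a constant direction of size t on a Gaussian sample.
rng = np.random.default_rng(0)
samples = rng.normal(size=5_000)
t = 0.3
constant_direction = lambda y: t * np.ones_like(y)
print(central_difference_gain(lambda y: y**2, constant_direction, samples))
```

For $f(y) = y^2$ and a constant direction $\theta \equiv t$, the integrand satisfies $\tfrac{1}{2}\big((y+t)^2 + (y-t)^2\big) - y^2 = t^2$, so the gain equals $t^2$ up to floating-point error. For $f(y) = y$, the gain vanishes, which reflects that the symmetric perturbation leaves the mean unchanged.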

In view of numerical approximations, our construction of parametric versions of convex risk functionals of the form (1.2) leads to a description of asymptotically relevant models for the optimization in terms of Monge transports ($p$-integrable vector fields), which, in turn, suggests a numerical investigation of the risk functional in the spirit of [16], see also [17]. In Section 3.2, we develop a numerical scheme based on neural networks. Previous works on this topic provide approximations from above based on duality results, whereas our approach leads to an approximation from below, based on the restriction to a parametric family of models. As a byproduct, we also approximate the optimizing measure through the Monge transport generating it. Observing that, for small values of uncertainty, the optimal transport plan depends on the reference measure only through a multiplicative factor, we draw a connection to transfer learning and discuss the robustness of our approach with respect to deviations in the reference measure.

In machine learning, transfer learning usually consists of taking a neural network trained on a previous dataset and training only its last layer on a new dataset, which is often significantly smaller than the first one, see [13] for references to seminal works on this topic. The idea is that if the two datasets share some common features, the part of the network that is inherited from the first training is able to extract most of these features, while the training of the last layer learns from specific characteristics of the new dataset. In our framework, we can explicitly distinguish between the common feature that can be extracted from the first training, namely the optimizing vector field, and the feature that is specific to each reference measure, i.e., the rescaling factor. In particular, once the approximation is performed on one measure, we can transfer it to a different measure at the price of a one-dimensional optimization, see Section 4.4 for further details.


The paper is organized as follows. In Section 2, we introduce the setup and state the main results on parametric versions of risk functionals of the form (1.2). Section 3 provides numerical methods for the approximation of the parametric risk functional $\mathcal{I}_\Theta(h)$. We distinguish between two situations leading to different numerical schemes. One is based on a reduction to a finite-dimensional optimization, cf. Section 3.1, while the other uses an approximation via neural networks, cf. Section 3.2. In Section 4, we discuss applications of both our theoretical and numerical findings by means of examples from finance and insurance. Section 5 contains the proofs of the main theorems and, in Appendix A, we provide a simple approximation result that helps to reduce the set of parameters $\Theta$.

2. Setup and main results

In this section, we introduce the class of convex risk functionals that (together with additional constraints) form the center of our study, and we state our main results concerning their parametric estimation.

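Throughout, let $p \in (1,\infty)$, let $q := \frac{p}{p-1} \in (1,\infty)$ be the conjugate exponent of $p$, and let $(H, \langle\cdot,\cdot\rangle)$ be a separable Hilbert space. As usual, we endow $H$ with its canonical norm $\|\cdot\| := \sqrt{\langle\cdot,\cdot\rangle}$ and identify the topological dual space $H'$ of $H$ with $H$ itself. We denote the set of all probability measures $\nu$ on the Borel $\sigma$-algebra $\mathcal{B}(H)$ of $H$ with
\[
|\nu|_p := \bigg( \int_H \|y\|^p\, \nu(dy) \bigg)^{1/p} < \infty
\]
by $\mathcal{P}_p = \mathcal{P}_p(H)$.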

In the following, we consider a fixed probability measure $\mu \in \mathcal{P}_p$, which we will refer to as the reference measure or baseline model, and a penalty function $\varphi\colon [0,\infty) \to [0,\infty]$, which is assumed to be nondecreasing with $\varphi(0) = 0$.

For $\nu \in \mathcal{P}_p$, we denote the $p$-Wasserstein distance between the reference measure $\mu$ and $\nu$ by
\[
\mathcal{W}_p(\mu,\nu) = \inf_{\pi \in \mathrm{Cpl}(\mu,\nu)} \bigg( \int_{H \times H} \|y - z\|^p\, \pi(dy, dz) \bigg)^{1/p}, \tag{2.1}
\]
where $\mathrm{Cpl}(\mu,\nu)$ is the set of all couplings between $\mu$ and $\nu$, i.e., the set of probability measures on the Borel $\sigma$-algebra $\mathcal{B}(H \times H)$ of the product space $H \times H$ with first and second marginal $\mu$ and $\nu$, respectively. We refer to [1] and [32] for a detailed discussion on Wasserstein distances and, more generally, the topic of optimal transport. Here, we only recall that, for all $\nu \in \mathcal{P}_p$, there exists an optimal coupling $\pi^* \in \mathrm{Cpl}(\mu,\nu)$ that attains the infimum in (2.1), i.e.,
\[
\mathcal{W}_p(\mu,\nu) = \bigg( \int_{H \times H} \|y - z\|^p\, \pi^*(dy, dz) \bigg)^{1/p}.
\]

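For $\varrho \in [1,\infty)$, we denote the space of all ($\mu$-equivalence classes of) measurable functions $f\colon H \to \mathbb{R}$ with
\[
\|f\|_{L^\varrho(\mu)} := \bigg( \int_H |f(y)|^\varrho\, \mu(dy) \bigg)^{1/\varrho} < \infty
\]
by $L^\varrho(\mu)$. In a similar fashion, $L^\varrho(\mu; H)$ denotes the space of all ($\mu$-equivalence classes of) measurable functions $g\colon H \to H$ with
\[
\|g\|_{L^\varrho(\mu;H)} := \bigg( \int_H \|g(y)\|^\varrho\, \mu(dy) \bigg)^{1/\varrho} < \infty.
\]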

Recall that, by assumption, $H$ is a separable Hilbert space, so that the image $g(H) \subset H$ of $g$ is automatically separable.
