Prediction interval for neural network models using weighted asymmetric loss functions
Milo Grillo^1 (milo.grillo@hu-berlin.de), Yunpeng Han^2 (yunpeng@savvie.io), and Agnieszka Werpachowska^{2,3} (a.m.werpachowska@gmail.com)
^1 Humboldt University of Berlin, Germany
^2 Savvie AS, Oslo, Norway
^3 Centre for Microsimulation and Policy Analysis, University of Essex, UK
July 20, 2023
arXiv:2210.04318v5 [stat.ML] 18 Jul 2023
Abstract
We propose a simple and efficient approach to generate prediction intervals (PIs) for approximated and forecasted trends. Our method leverages a weighted asymmetric loss function to estimate the lower and upper bounds of the PI, with the weights determined by its coverage probability. We provide a concise mathematical proof of the method, show how it can be extended to derive PIs for parametrised functions and discuss its effectiveness when training deep neural networks. The presented tests of the method on a real-world forecasting task using a neural network-based model show that it can produce reliable PIs in complex machine learning scenarios.
1 Introduction
Neural network models are increasingly often used in prediction tasks, for example in weather [1], water level [2], price [3], electricity grid load [4], ecology [5], demographics [6] or sales forecasting. However, their often-cited weakness is that, in their vanilla form, they provide only point predictions. Meanwhile, many of their users are also interested in prediction intervals (PIs), that is, ranges $[l, u]$ containing forecasted values with a given probability (e.g. 95%).
Several approaches have been proposed to facilitate the estimation of PIs (see [1, 2, 4, 5, 7–14] and references therein):
1. the delta method, which assumes that prediction errors are homogeneous and normally distributed [10, 12];
2. Bayesian inference [8], which requires a detailed model of sources of uncertainty, and is extremely expensive computationally for realistic forecasting scenarios [14];
3. Generalized Likelihood Uncertainty Estimation (GLUE) [7], which requires multiple runs of the model with parameters sampled from a distribution specified by the modeller;
4. bootstrap [11], which generates multiple training datasets, leading to high computational cost for large datasets [14];
5. Mean-Variance Estimation (MVE) [9], which is less computationally demanding than the methods mentioned above but also assumes a normal distribution of errors and gives poor results [14];
6. Lower Upper Bound Estimation (LUBE), which trains the neural network model to directly generate estimations of the lower and upper bounds of the prediction interval using a specially designed training procedure with tunable parameters [1, 2, 4, 14].
The existing methods are either overly restrictive (the delta method, MVE) or too computationally expensive. We propose a method which is closest in spirit to LUBE (we train a model to predict either a lower or an upper bound for the PI), but simpler and less computationally expensive, because it does not require any parameter tuning.
2 Problem statement
We consider a prediction problem $x \mapsto y$, where $x \in \mathcal{X}$ are features (e.g. $x \in \mathbb{R}^d$) and $y \in \mathbb{R}$ is the predicted variable. We assume that the observed data $\mathcal{D} := \{(x_i, y_i)\}_{i=1}^N \subset \mathcal{X} \times \mathbb{R}$ are statistically independent $N$ realisations of a pair of random variables $(X, Y)$ with an unknown joint distribution $P$. We also consider a model $g_\theta$ which, given $x \in \mathcal{X}$, produces a prediction $g_\theta(x)$, where $\theta$ are model parameters in the parameter space $\Theta = \mathbb{R}^m$. When forecasting, the prediction is also a function of an independent "time" variable $t$, which is simply included in $X$.
The standard model training procedure aims to find such $\theta$ that, given $x \in \mathcal{X}$, $g_\theta(\cdot)$ is a good point estimate of $Y|X$, e.g. $g_\theta(x) \approx \mathrm{E}[Y|X=x]$. This is achieved by minimising a loss function of the form (by abuse of notation) $l(y, y') = l(y - y')$, with a minimum at $y = y'$ and increasing sufficiently fast as $|y - y'| \to \infty$, where $y$ is the observed target value and $y'$ is the model prediction. More precisely, we minimise the sample average of the loss function $l$ over the parameters $\theta$:
$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N} l(y_i, g_\theta(x_i)) \,.$$
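As an illustration, here is a minimal sketch of this minimisation in Python; the architecture, data shapes, optimiser and learning rate are hypothetical placeholders rather than the setup used in the paper, and the loss is the MAE discussed below.

    import torch

    # Hypothetical toy model g_theta with X = R^4 and y in R.
    g_theta = torch.nn.Sequential(
        torch.nn.Linear(4, 16),
        torch.nn.ReLU(),
        torch.nn.Linear(16, 1),
    )
    x = torch.randn(256, 4)   # N = 256 illustrative feature vectors
    y = torch.randn(256, 1)   # observed target values

    optimiser = torch.optim.Adam(g_theta.parameters(), lr=1e-2)
    loss_fn = torch.nn.L1Loss()  # MAE: l(y, y') = |y - y'|

    # Minimise the sample-average loss over the parameters theta.
    for _ in range(500):
        optimiser.zero_grad()
        loss_fn(g_theta(x), y).backward()
        optimiser.step()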
The above procedure can be given a simple probabilistic interpretation by assuming that the target value $y$ is a realisation of a random variable $Y$ with the distribution $\mu + Z$, where $\mu$ is an unknown "true value" and $Z$ is an i.i.d. error term with a probability density function $\rho(z) \propto \exp(-l(z))$. Two well-known loss functions, Mean Squared Error (MSE) and Mean Absolute Error (MAE), correspond to assuming a Gaussian or Laplace distribution for $Z$, respectively. The value which minimises the loss function $l$ then corresponds to the maximum log-likelihood estimation of the unknown parameter $\mu$, since $\ln P(y|\mu) \sim -l(y - \mu)$.
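For example, writing the two densities out explicitly (up to normalisation and scale constants, which merely rescale the loss) makes the correspondence concrete:
$$\rho(z) \propto e^{-z^2} \;\Rightarrow\; l(z) \propto z^2 \ \text{(MSE)}, \qquad \rho(z) \propto e^{-|z|} \;\Rightarrow\; l(z) \propto |z| \ \text{(MAE)},$$
so minimising the MSE (respectively, the MAE) over $\mu$ is maximum-likelihood estimation of $\mu$ under Gaussian (respectively, Laplace) noise.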
In this paper we focus on the MAE, in which case the average loss function $l$ (i.e. the negative log-likelihood of data $\mathcal{D}$) is
$$l(y, y') = |y - y'| \,.$$
Given an i.i.d. sample $\{y_i\}_{i=1}^N$, we thus try to minimise
$$\frac{1}{N} \sum_{i=1}^{N} |y_i - y'| \,,$$
which for $N \to \infty$ equals $\mathrm{E}[|Y - y'|]$. The optimal value of $y'$, i.e. the value which minimises the loss, denoted $\hat{y}$, equals
$$\hat{y} = \arg\min_{y' \in \mathbb{R}} \mathrm{E}[|Y - y'|] \quad \text{for } Y = \mu + L \,,$$
where $L$ has the Laplace distribution with density $\rho_L(z) = e^{-|z|}/2$. The minimum fulfils the condition $\partial \mathrm{E}[|Y - y'|] / \partial y' = 0$. Since
$$\mathrm{E}[|Y - y'|] = \mathrm{E}[|\mu + L - y'|] = \frac{1}{2} \int_{y'-\mu}^{\infty} e^{-|z|} (z + \mu - y') \, dz - \frac{1}{2} \int_{-\infty}^{y'-\mu} e^{-|z|} (z + \mu - y') \, dz \,,$$
we have
$$\frac{\partial \mathrm{E}[|Y - y'|]}{\partial y'} = \frac{1}{2} \int_{-\infty}^{y'-\mu} e^{-|z|} \, dz - \frac{1}{2} \int_{y'-\mu}^{\infty} e^{-|z|} \, dz \,,$$
which is zero iff $y' - \mu = 0$, hence $\hat{y} = \mu$. For a finite sample $\{y_i\}_{i=1}^N$, $\hat{y}$ is the sample median, which approaches $\mu$ as $N \to \infty$.
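This is easy to verify numerically. The following Python check (sample size, seed and grid resolution are arbitrary choices) confirms that the grid minimiser of the empirical MAE coincides with the sample median:

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.laplace(loc=3.0, scale=1.0, size=1_001)  # Y = mu + L, mu = 3

    # Empirical MAE as a function of the candidate estimate y'.
    grid = np.linspace(y.min(), y.max(), 10_001)
    mae = np.abs(y[None, :] - grid[:, None]).mean(axis=1)

    print(grid[np.argmin(mae)])  # grid minimiser of the empirical MAE
    print(np.median(y))          # sample median: the same value, -> mu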
In prediction, we work with an independent variable $X$ and a dependent variable $Y$, under the assumption that there exists some mapping $g$ such that $g(X) = Y + \epsilon$ with some error $\epsilon$. We aim to find a prediction interval such that, given an $x$, the predicted value $y$ lies within this interval with probability $\beta$. Note that this problem is equivalent to finding the $\alpha_l$-th and $\alpha_u$-th percentiles of the distribution $Y|X$, such that $0 \le \alpha_l \le \alpha_u \le 1$ and $\alpha_u - \alpha_l = \beta$. These percentiles then correspond to the lower or upper bound of the PI, while $\beta$ is the coverage probability of the interval, indicating the level of confidence associated with it. To this end, we are going to generalise the above result and train the model to predict a desired percentile of the distribution $Y|X$.
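To make this target concrete, below is a minimal Python sketch of a weighted asymmetric loss in the standard pinball (quantile) form, whose population minimiser is the $\alpha$-th percentile of $Y$; the specific weighting used by our method is derived in what follows, so the form here should be read as an illustrative assumption rather than the final loss.

    import numpy as np

    def weighted_asymmetric_loss(y, y_pred, alpha):
        # Weight alpha on under-predictions (y > y_pred) and 1 - alpha
        # on over-predictions; minimised by the alpha-th percentile of Y.
        err = y - y_pred
        return np.mean(np.maximum(alpha * err, (alpha - 1.0) * err))

    # Sanity check: the grid minimiser approaches the empirical percentile.
    rng = np.random.default_rng(1)
    y = rng.normal(size=5_000)
    grid = np.linspace(-4.0, 4.0, 2_001)
    losses = [weighted_asymmetric_loss(y, g, 0.95) for g in grid]
    print(grid[np.argmin(losses)], np.quantile(y, 0.95))  # both near 1.64

A PI with coverage $\beta = 0.95$ would then combine two such predictors, e.g. with $\alpha_l = 0.025$ and $\alpha_u = 0.975$.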