Prediction interval for neural network models using weighted asymmetric loss functions
Milo Grillo^1 (milo.grillo@hu-berlin.de), Yunpeng Han^2 (yunpeng@savvie.io), and Agnieszka Werpachowska^{2,3} (a.m.werpachowska@gmail.com)
^1 Humboldt University of Berlin, Germany
^2 Savvie AS, Oslo, Norway
^3 Centre for Microsimulation and Policy Analysis, University of Essex, UK
July 20, 2023
arXiv:2210.04318v5 [stat.ML] 18 Jul 2023
Abstract
We propose a simple and efficient approach to generate prediction intervals (PIs) for approximated and forecasted trends. Our method leverages a weighted asymmetric loss function to estimate the lower and upper bounds of the PI, with the weights determined by its coverage probability. We provide a concise mathematical proof of the method, show how it can be extended to derive PIs for parametrised functions and discuss its effectiveness when training deep neural networks. The presented tests of the method on a real-world forecasting task using a neural network-based model show that it can produce reliable PIs in complex machine learning scenarios.
1 Introduction
Neural network models are increasingly often used in prediction tasks, for example in weather [1], water level [2], price [3], electricity grid load [4], ecology [5], demographics [6] or sales forecasting. However, their often-cited weakness is that, in their vanilla form, they provide only point predictions. Meanwhile, many of their users are also interested in prediction intervals (PIs), that is, ranges $[l, u]$ containing forecasted values with a given probability (e.g. 95%).
Several approaches have been proposed to facilitate the estimation of PIs (see [1, 2, 4, 5, 7–14] and references therein):
1. the delta method, which assumes that prediction errors are homogeneous and normally distributed [10, 12];
2. Bayesian inference [8], which requires a detailed model of sources of uncertainty, and is extremely expensive computationally for realistic forecasting scenarios [14];
3. Generalized Likelihood Uncertainty Estimation (GLUE) [7], which requires multiple runs of the model with parameters sampled from a distribution specified by the modeller;
4. bootstrap [11], which generates multiple training datasets, leading to high computational cost for large datasets [14];
5. Mean-Variance Estimation (MVE) [9], which is less computationally demanding than the methods mentioned above but also assumes a normal distribution of errors and gives poor results [14];
6. Lower Upper Bound Estimation (LUBE), which trains the neural network model to directly generate estimations of the lower and upper bounds of the prediction interval using a specially designed training procedure with tunable parameters [1, 2, 4, 14].
The existing methods are either overly restrictive (the delta method, MVE) or too computationally expensive. We propose a method which is closest in spirit to LUBE (we train a model to predict either a lower or an upper bound for the PI), but simpler and less computationally expensive, because it does not require any parameter tuning.
2 Problem statement
We consider a prediction problem $x \mapsto y$, where $x \in \mathcal{X}$ are features (e.g. $x \in \mathbb{R}^d$) and $y \in \mathbb{R}$ is the predicted variable. We assume that the observed data $\mathcal{D} := \{(x_i, y_i)\}_{i=1}^N \subset \mathcal{X} \times \mathbb{R}$ are statistically independent $N$ realisations of a pair of random variables $(X, Y)$ with an unknown joint distribution $P$. We also consider a model $g_\theta$ which, given $x \in \mathcal{X}$, produces a prediction $g_\theta(x)$, where $\theta$ are model parameters in the parameter space $\Theta = \mathbb{R}^m$. When forecasting, the prediction is also a function of an independent "time" variable $t$, which is simply included in $X$.
The standard model training procedure aims to find such $\theta$ that, given $x \in \mathcal{X}$, $g_\theta(\cdot)$ is a good point estimate of $Y|X$, e.g. $g_\theta(x) \approx \mathrm{E}[Y|X=x]$. This is achieved by minimising a loss function of the form (by abuse of notation) $l(y, y') = l(y - y')$, with a minimum at $y = y'$ and increasing sufficiently fast as $|y - y'| \to \infty$, where $y$ is the observed target value and $y'$ is the model prediction. More precisely, we minimise the sample average of the loss function $l$ over the parameters $\theta$:
$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N} l(y_i, g_\theta(x_i)) \,.$$
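As an illustration, here is a minimal sketch of this minimisation in Python; the architecture, data shapes, optimiser and learning rate are hypothetical placeholders rather than the setup used in the paper, and the loss is the MAE discussed below.

    import torch

    # Hypothetical toy model g_theta with X = R^4 and y in R.
    g_theta = torch.nn.Sequential(
        torch.nn.Linear(4, 16),
        torch.nn.ReLU(),
        torch.nn.Linear(16, 1),
    )
    x = torch.randn(256, 4)   # N = 256 illustrative feature vectors
    y = torch.randn(256, 1)   # observed target values

    optimiser = torch.optim.Adam(g_theta.parameters(), lr=1e-2)
    loss_fn = torch.nn.L1Loss()  # MAE: l(y, y') = |y - y'|

    # Minimise the sample-average loss over the parameters theta.
    for _ in range(500):
        optimiser.zero_grad()
        loss_fn(g_theta(x), y).backward()
        optimiser.step()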
The above procedure can be given a simple probabilistic interpretation by assuming that the target value $y$ is a realisation of a random variable $Y$ with the distribution $\mu + Z$, where $\mu$ is an unknown "true value" and $Z$ is an i.i.d. error term with a probability density function $\rho(z) \propto \exp(-l(z))$. Two well-known loss functions, Mean Squared Error (MSE) and Mean Absolute Error (MAE), correspond to assuming a Gaussian or Laplace distribution for $Z$, respectively. The value which minimises the loss function $l$ then corresponds to the maximum log-likelihood estimation of the unknown parameter $\mu$, since $\ln P(y|\mu) \sim -l(y - \mu)$.
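For example, writing the two densities out explicitly (up to normalisation and scale constants, which merely rescale the loss) makes the correspondence concrete:
$$\rho(z) \propto e^{-z^2} \;\Rightarrow\; l(z) \propto z^2 \ \text{(MSE)}, \qquad \rho(z) \propto e^{-|z|} \;\Rightarrow\; l(z) \propto |z| \ \text{(MAE)},$$
so minimising the MSE (respectively, the MAE) over $\mu$ is maximum-likelihood estimation of $\mu$ under Gaussian (respectively, Laplace) noise.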
In this paper we focus on the MAE, in which case the average loss function $l$ (i.e. the negative log-likelihood of data $\mathcal{D}$) is
$$l(y, y') = |y - y'| \,.$$
Given an i.i.d. sample $\{y_i\}_{i=1}^N$, we thus try to minimise
$$\frac{1}{N} \sum_{i=1}^{N} |y_i - y'| \,,$$
which for $N \to \infty$ equals $\mathrm{E}[|Y - y'|]$. The optimal value of $y'$, i.e. the value which minimises the loss, denoted $\hat{y}$, equals
$$\hat{y} = \arg\min_{y' \in \mathbb{R}} \mathrm{E}[|Y - y'|] \quad \text{for } Y = \mu + L \,,$$
where $L$ has the Laplace distribution with density $\rho_L(z) = e^{-|z|}/2$. The minimum fulfils the condition $\partial \mathrm{E}[|Y - y'|] / \partial y' = 0$. Since
$$\mathrm{E}[|Y - y'|] = \mathrm{E}[|\mu + L - y'|] = \frac{1}{2} \int_{y'-\mu}^{\infty} e^{-|z|} (z + \mu - y') \, dz - \frac{1}{2} \int_{-\infty}^{y'-\mu} e^{-|z|} (z + \mu - y') \, dz \,,$$
we have
$$\frac{\partial \mathrm{E}[|Y - y'|]}{\partial y'} = \frac{1}{2} \int_{-\infty}^{y'-\mu} e^{-|z|} \, dz - \frac{1}{2} \int_{y'-\mu}^{\infty} e^{-|z|} \, dz \,,$$
which is zero iff $y' - \mu = 0$, hence $\hat{y} = \mu$. For a finite sample $\{y_i\}_{i=1}^N$, $\hat{y}$ is the sample median, which approaches $\mu$ as $N \to \infty$.
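This is easy to verify numerically. The following Python check (sample size, seed and grid resolution are arbitrary choices) confirms that the grid minimiser of the empirical MAE coincides with the sample median:

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.laplace(loc=3.0, scale=1.0, size=1_001)  # Y = mu + L, mu = 3

    # Empirical MAE as a function of the candidate estimate y'.
    grid = np.linspace(y.min(), y.max(), 10_001)
    mae = np.abs(y[None, :] - grid[:, None]).mean(axis=1)

    print(grid[np.argmin(mae)])  # grid minimiser of the empirical MAE
    print(np.median(y))          # sample median: the same value, -> mu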
In prediction, we work with an independent variable $X$ and a dependent variable $Y$, under the assumption that there exists some mapping $g$ such that $g(X) = Y + \epsilon$ with some error $\epsilon$. We aim to find a prediction interval such that, given an $x$, the predicted value $y$ lies within this interval with probability $\beta$. Note that this problem is equivalent to finding the $\alpha_l$-th and $\alpha_u$-th percentiles of the distribution $Y|X$, such that $0 \le \alpha_l \le \alpha_u \le 1$ and $\alpha_u - \alpha_l = \beta$. These percentiles then correspond to the lower or upper bound of the PI, while $\beta$ is the coverage probability of the interval, indicating the level of confidence associated with it. To this end, we are going to generalise the above result and train the model to predict a desired percentile of the distribution $Y|X$.
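To make this target concrete, below is a minimal Python sketch of a weighted asymmetric loss in the standard pinball (quantile) form, whose population minimiser is the $\alpha$-th percentile of $Y$; the specific weighting used by our method is derived in what follows, so the form here should be read as an illustrative assumption rather than the final loss.

    import numpy as np

    def weighted_asymmetric_loss(y, y_pred, alpha):
        # Weight alpha on under-predictions (y > y_pred) and 1 - alpha
        # on over-predictions; minimised by the alpha-th percentile of Y.
        err = y - y_pred
        return np.mean(np.maximum(alpha * err, (alpha - 1.0) * err))

    # Sanity check: the grid minimiser approaches the empirical percentile.
    rng = np.random.default_rng(1)
    y = rng.normal(size=5_000)
    grid = np.linspace(-4.0, 4.0, 2_001)
    losses = [weighted_asymmetric_loss(y, g, 0.95) for g in grid]
    print(grid[np.argmin(losses)], np.quantile(y, 0.95))  # both near 1.64

A PI with coverage $\beta = 0.95$ would then combine two such predictors, e.g. with $\alpha_l = 0.025$ and $\alpha_u = 0.975$.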