Distribution-Free Finite-Sample Guarantees and Split Conformal Prediction

University of Oxford
by
Roel Hulsman
St. Anne’s College
A dissertation submitted in partial fulfilment of the degree of Master of
Science in Statistical Science.
Department of Statistics, 24–29 St Giles,
Oxford, OX1 3LB
September 2022
arXiv:2210.14735v1 [stat.ML] 26 Oct 2022
Abstract
Modern black-box predictive models are often accompanied by weak performance
guarantees that only hold asymptotically in the size of the dataset or require
strong parametric assumptions. In response to this, split conformal prediction represents
a promising avenue to obtain finite-sample guarantees under minimal distribution-free
assumptions. Although prediction set validity most often concerns marginal
coverage, we explore the related but different guarantee of tolerance regions, reformulating
known results in the language of nested prediction sets and extending on the
duality between marginal coverage and tolerance regions. Furthermore, we highlight
the connection between split conformal prediction and classical tolerance predictors
developed in the 1940s, as well as recent developments in distribution-free risk control.
One result that transfers from classical tolerance predictors is that the coverage
of a prediction set based on order statistics, conditional on the calibration set, is a
random variable stochastically dominating the Beta distribution. We demonstrate
the empirical effectiveness of our findings on synthetic and real datasets using a popular
split conformal prediction procedure called conformalized quantile regression
(CQR).
Contents
1 Introduction
1.1 Summary and Outline
2 Distribution-Free Finite-Sample Guarantees
2.1 Distribution-Free Predictive Inference
2.2 Marginal Coverage
2.3 Tolerance Regions
2.4 Risk Control
3 Conformal Prediction
3.1 Split Conformal Prediction
3.2 Marginal Coverage and Split Conformal Prediction
3.3 Tolerance Regions and Split Conformal Prediction
3.4 Conformalized Quantile Regression
4 Distribution of Coverage
4.1 Classical Tolerance Predictors
4.2 Simple Tolerance Predictors of Order Statistics
4.3 Distribution of Coverage for Split Conformal Prediction
4.4 Distribution of Empirical Coverage
5 Distribution-Free Risk Control
5.1 Conformal Risk Control
5.2 Upper Confidence Bound Calibration
5.3 Learn Then Test
6 Experiments
6.1 Conformalized Quantile Random Forests
6.2 Synthetic Example
6.3 Calibrating a Given Base Predictor
7 Conclusion
References
A Proofs
A.1 Proposition 1
A.2 Proposition 2
A.3 Proposition 3
A.4 Proposition 4
A.5 Proposition 5
A.6 Proposition 6
B Computational Tables
1 Introduction
Black-box predictive models have become popular tools with the advent of large datasets
and cheap computing resources. However, the predictive performance of these models is
usually subject to weak statistical guarantees that only hold asymptotically in the size of
the dataset or require strong parametric assumptions on the data generating process. Both
might be unrealistic in practice, so deploying black-box predictive models is challenging
in contexts where safety is key, such as medicine.
A particular line of research aiming to improve this situation is called post-hoc calibration.
Consider an arbitrary black-box predictor fitted on a proper training set, where the
base predictor is black-box in the sense that we do not seek to understand or modify its
behaviour, but instead wrap it into a larger post-hoc calibration algorithm. The purpose
of such an algorithm is to calibrate the base predictor so that it satisfies some rigorous
statistical guarantees under minimal assumptions. We specifically consider finite-sample
guarantees while making no assumptions on the distribution of the underlying data,
entering the field of distribution-free predictive inference.
Conformal prediction is a general framework to construct prediction sets that satisfy some
distribution-free finite-sample guarantee under the assumption of iid data [1,2]. The
widely studied adaptation focused on in this thesis is called split conformal prediction
[3,4]. For a gentle introduction to conformal prediction we refer to [5] and for a more
technical tutorial to [6]. Conformal prediction has been applied in various contexts, such
as drug discovery [7], image classification [8], natural language processing [9–11] and
voting during the 2020 US presidential election [12]. Although traditionally conformal
prediction starts with the definition of a non-conformity score, we follow an alternative
but equivalent interpretation leveraging nested prediction sets [13].
Given the base predictor and a calibration set $\{(X_i, Y_i)\}_{i=1}^{n}$, we are interested
in predicting the label $Y_{n+1} \in \mathcal{Y}$ corresponding to a new feature
$X_{n+1} \in \mathcal{X}$, while quantifying the corresponding prediction uncertainty. Simply
put, the split conformal prediction algorithm inputs the base predictor and the calibration
set and outputs a prediction set $S_{\hat{\lambda}}(X_{n+1})$ that contains $Y_{n+1}$ with
some distribution-free finite-sample guarantee of certainty. Note $S_{\hat{\lambda}}(X_{n+1})$
is indexed by a random variable $\hat{\lambda} \in \Lambda$ that determines the size of the set,
where $\Lambda \subseteq \mathbb{R} \cup \{\pm\infty\}$ is some closed set.
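The input/output description above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure studied in the thesis (which uses CQR): the nested sets are taken to be symmetric intervals $[f(x) - \lambda, f(x) + \lambda]$ around a hypothetical base predictor `base_predictor`, the data are synthetic, and $\hat{\lambda}$ is chosen as the $\lceil (n+1)(1-\alpha) \rceil$-th smallest absolute calibration residual for a miscoverage level `alpha`, which is the usual choice for marginal coverage. All function names are invented for this example.

```python
import math
import random


def calibrate_lambda(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial set is valid.
        return math.inf
    return sorted(scores)[k - 1]


def prediction_set(predictor, x, lam):
    """Nested set S_lambda(x) = [f(x) - lambda, f(x) + lambda]; it grows with lambda."""
    centre = predictor(x)
    return (centre - lam, centre + lam)


def base_predictor(x):
    """Hypothetical base predictor, treated as a fixed black box."""
    return 2.0 * x


rng = random.Random(0)


def sample_point():
    """Draw one synthetic iid pair (x, y); an assumption made for illustration."""
    x = rng.uniform(0.0, 1.0)
    return x, 2.0 * x + rng.gauss(0.0, 0.5)


# Calibration step: score each calibration point by its absolute residual.
calibration = [sample_point() for _ in range(500)]
scores = [abs(y - base_predictor(x)) for x, y in calibration]
lam_hat = calibrate_lambda(scores, alpha=0.1)

# Empirical coverage on fresh points, expected to be close to 1 - alpha = 0.9.
test_points = [sample_point() for _ in range(5000)]
covered = 0
for x, y in test_points:
    lower, upper = prediction_set(base_predictor, x, lam_hat)
    covered += lower <= y <= upper
coverage = covered / len(test_points)
print(f"lambda_hat = {lam_hat:.3f}, empirical coverage = {coverage:.3f}")
```

The base predictor is never modified: calibration only picks one scalar from the calibration scores, which is what makes the wrapper post hoc.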
The most commonly used finite-sample guarantee in distribution-free predictive inference
is that of marginal coverage, guaranteeing that a prediction set $S_{\hat{\lambda}}(X_{n+1})$
contains $Y_{n+1}$ with a pre-specified confidence level $1 - \alpha \in (0,1)$ on average over
the iid sample $\{(X_i, Y_i)\}_{i=1}^{n+1}$. The value of $\hat{\lambda}$ that yields marginal
coverage in split conformal prediction is well known [3,4]. The guarantee of a tolerance
region¹ slightly differs from marginal coverage, taking explicitly into account that the
calibration set is random and thus that the coverage of $S_{\hat{\lambda}}(X_{n+1})$, conditional
on the calibration set, is a random variable. $S_{\hat{\lambda}}(X_{n+1})$ is an
$(\epsilon, \delta)$-tolerance region if it contains at least a pre-specified proportion
$1 - \epsilon \in (0,1)$ of the label population $\mathcal{Y}$ with at least a pre-specified
probability $1 - \delta \in (0,1)$ over the calibration data, see Section 2.3 for a formal
definition. The connection between marginal coverage and tolerance regions in the context
of split conformal prediction is considered in [14].
¹ Also known as training conditional validity [14] or probably approximately correct (PAC)
coverage [15–17], the latter notion arising from statistical learning theory [18].
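To make the difference between the two guarantees concrete, they can be written side by side. The formal definitions belong to Sections 2.2 and 2.3, which are not reproduced here, so the display below is only a sketch derived from the verbal description above and may differ in detail from the thesis's own statement.

```latex
% Marginal coverage: one probability, taken jointly over the calibration set
% and the new point (X_{n+1}, Y_{n+1}).
\mathbb{P}\bigl( Y_{n+1} \in S_{\hat{\lambda}}(X_{n+1}) \bigr) \ge 1 - \alpha

% (\epsilon, \delta)-tolerance region: the inner probability is over the new point only,
% conditional on the calibration set; the outer probability is over the calibration set.
\mathbb{P}\Bigl(
  \mathbb{P}\bigl( Y_{n+1} \in S_{\hat{\lambda}}(X_{n+1}) \,\big|\, \{(X_i, Y_i)\}_{i=1}^{n} \bigr)
  \ge 1 - \epsilon
\Bigr) \ge 1 - \delta
```

Read this way, marginal coverage bounds the average of the conditional coverage, whereas a tolerance region bounds the probability that the conditional coverage falls below $1 - \epsilon$.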
While conformal prediction has been pioneered in roughly the last two decades, procedures
to construct tolerance regions based on order statistics have been studied extensively
since the 1940s [19–27]. For a concise treatment we refer to [28] and for a more recent
review to [29]. [14] first studied the connection between these ‘classical tolerance predictors’
and split conformal prediction, stating the property of prediction set validity in the
traditional statistical language of classical tolerance predictors and interpreting split
conformal prediction as a ‘conditional’ version of the classical tolerance predictors proposed
in [19,21].
Exploring the more recent past, [30–32] introduced procedures to obtain finite-sample
guarantees using a more general notion of statistical error than the probability of
miscoverage, motivated by examples where miscoverage is not the natural notion of error.
These ‘distribution-free risk control’ algorithms open new avenues for post-hoc calibration,
most notably by calibrating a base predictor leveraging upper confidence bounds on
the unknown underlying risk function [31] or through multiple hypothesis testing [32], in
the latter case moving into the direction of decision-making. Although the procedures are
similar to split conformal prediction, they rely on entirely different proof techniques and
the relation to split conformal prediction is not obvious.
1.1 Summary and Outline
This thesis reviews split conformal prediction, classical tolerance predictors and
distribution-free risk control within the language of nested prediction sets, and explores
the connections between these procedures. Although no new methodology is proposed, we present
several novel insights.
First, we reformulate known results regarding tolerance regions in the context of split
conformal prediction in the language of nested prediction sets, specifying the value of
$\hat{\lambda}$ that results in an $(\epsilon, \delta)$-tolerance region. Furthermore, we
expand on the duality between marginal coverage and tolerance regions, showing that a split
conformal prediction set that satisfies marginal coverage is an $(\epsilon, \delta)$-tolerance
region for certain $\epsilon, \delta$, and conversely, that a split conformal
$(\epsilon, \delta)$-tolerance region automatically satisfies marginal coverage for
certain $\alpha$. This is an extension of [14, Proposition 2a-2b], which only covers the former
relation.
Second, we elaborate on the crucial role of order statistics in the connection between
classical tolerance predictors and split conformal prediction. In particular, we prove that
the coverage of a split conformal prediction set, conditional on the calibration set, is
a random variable stochastically dominating the Beta distribution. To the best of our
knowledge, this result has only been hinted at in [14] and mentioned without proof in
[5] for label populations following a continuous distribution on $\mathcal{Y}$. Interestingly, this
analytical distribution provides an alternative proof technique to obtain marginal coverage
and tolerance regions in split conformal prediction.
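As a rough numerical illustration of how a Beta law ties the two guarantees together, the sketch below assumes continuous scores, in which case the conditional coverage of a set built from the k-th smallest of n calibration scores follows Beta(k, n + 1 - k) exactly. That parameterisation is the standard order-statistics result and is an assumption here, since this excerpt does not state the parameters. The function names are hypothetical, and the tail probability is computed through the identity P(Beta(k, n + 1 - k) >= t) = P(Binomial(n, t) <= k - 1) so that only the standard library is needed.

```python
import math


def beta_mean(n, k):
    """Mean of Beta(k, n + 1 - k): the marginal coverage in the continuous case."""
    return k / (n + 1)


def prob_coverage_at_least(n, k, level):
    """P(Beta(k, n + 1 - k) >= level) for 0 < level < 1.

    Uses the order-statistic identity with the binomial distribution,
    summing P(Binomial(n, level) = j) for j = 0, ..., k - 1 in log space.
    """
    log_p, log_q = math.log(level), math.log1p(-level)
    total = 0.0
    for j in range(k):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * log_p + (n - j) * log_q
        )
        total += math.exp(log_term)
    return min(total, 1.0)


n, alpha = 1000, 0.1
k = math.ceil((n + 1) * (1 - alpha))  # rank commonly used for marginal coverage

# Marginal coverage is the mean of the conditional coverage distribution.
print(f"k = {k}, expected coverage = {beta_mean(n, k):.4f}")

# The same set is an (epsilon, delta)-tolerance region for each pair printed below.
for epsilon in (0.10, 0.11, 0.12):
    delta = 1 - prob_coverage_at_least(n, k, 1 - epsilon)
    print(f"epsilon = {epsilon:.2f}: delta = {delta:.4f}")
```

When the scores are not continuous, the text only claims stochastic dominance over the Beta distribution, so the values of delta computed this way should be read as conservative upper bounds rather than exact probabilities.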
Third, our focus with regard to distribution-free risk control lies in getting a better
understanding of its relation to split conformal prediction. We show that conformal risk control