
1 Introduction
Black-box predictive models have become popular tools with the advent of large datasets
and cheap computing resources. However, the predictive performance of these models is
usually subject to weak statistical guarantees that only hold asymptotically in the size of
the dataset or require strong parametric assumptions on the data-generating process. Both
may be unrealistic in practice, which makes the deployment of black-box predictive models
challenging in contexts where safety is key, such as medicine.
A particular line of research aiming to improve this situation is called post-hoc calibration.
Consider an arbitrary black-box predictor fitted on a proper training set, where the
base predictor is black-box in the sense that we do not seek to understand or modify its
behaviour, but instead wrap it into a larger post-hoc calibration algorithm. The purpose
of such an algorithm is to calibrate the base predictor so that it satisfies some rigorous
statistical guarantees under minimal assumptions. We specifically consider finite-sample
guarantees while making no assumptions on the distribution of the underlying data,
entering the field of distribution-free predictive inference.
Conformal prediction is a general framework to construct prediction sets that satisfy some
distribution-free finite-sample guarantee under the assumption of iid data [1,2]. The
widely studied variant that this thesis focuses on is called split conformal prediction
[3,4]. For a gentle introduction to conformal prediction we refer to [5], and for a more
technical tutorial to [6]. Conformal prediction has been applied in various contexts, such
as drug discovery [7], image classification [8], natural language processing [9–11] and
voting during the 2020 US presidential election [12]. Although traditionally conformal
prediction starts with the definition of a non-conformity score, we follow an alternative
but equivalent interpretation leveraging nested prediction sets [13].
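In this interpretation one works, in the notation introduced in the next paragraph, with a family of prediction sets $\{S_\lambda(x)\}_{\lambda \in \Lambda}$ that is nested, for instance in the sense that
\[
\lambda_1 \le \lambda_2 \implies S_{\lambda_1}(x) \subseteq S_{\lambda_2}(x) \quad \text{for all } x \in \mathcal{X},
\]
so that calibrating the base predictor reduces to selecting a data-driven index $\hat{\lambda}$ that makes the sets just large enough.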
Given the base predictor and a calibration set $\{(X_i, Y_i)\}_{i=1}^{n}$, we are interested in predicting
the label $Y_{n+1} \in \mathcal{Y}$ corresponding to a new feature $X_{n+1} \in \mathcal{X}$, while quantifying the
corresponding prediction uncertainty. Simply put, the split conformal prediction algorithm
inputs the base predictor and the calibration set and outputs a prediction set $S_{\hat{\lambda}}(X_{n+1})$
that contains $Y_{n+1}$ with some distribution-free finite-sample guarantee of certainty. Note
that $S_{\hat{\lambda}}(X_{n+1})$ is indexed by a random variable $\hat{\lambda} \in \Lambda$ that determines the size of the set,
where $\Lambda \subset \mathbb{R} \cup \{\pm\infty\}$ is some closed set.
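To make this concrete, the following is a minimal sketch of the split conformal recipe in a regression setting, assuming absolute residuals as non-conformity scores and a generic fitted base predictor with a scikit-learn-style predict method; the interval half-width plays the role of $\hat{\lambda}$ here, and names such as base_predictor and alpha are illustrative rather than taken from later chapters.

```python
import numpy as np

def split_conformal_interval(base_predictor, X_calib, y_calib, X_new, alpha=0.1):
    """Split conformal prediction intervals around a fitted base predictor.

    Non-conformity scores are absolute residuals |y - f(x)| on the calibration
    set; intervals are f(x_new) +/- q_hat, with q_hat the empirical quantile of
    the scores at level ceil((n + 1) * (1 - alpha)) / n.
    """
    scores = np.abs(np.asarray(y_calib) - base_predictor.predict(X_calib))
    n = scores.shape[0]
    level = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample correction
    if level > 1:
        # Too few calibration points for this alpha: the guarantee can only be
        # met by the trivial interval, matching the +infinity element of Lambda.
        q_hat = np.inf
    else:
        q_hat = np.quantile(scores, level, method="higher")
    preds = base_predictor.predict(X_new)
    return preds - q_hat, preds + q_hat
```

With this choice of $\hat{\lambda}$, the marginal coverage guarantee discussed next holds whenever the calibration and test points are exchangeable.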
The most commonly used finite-sample guarantee in distribution-free predictive inference
is that of marginal coverage, guaranteeing that a prediction set $S_{\hat{\lambda}}(X_{n+1})$ contains
$Y_{n+1}$ with a pre-specified confidence level $1-\alpha \in (0,1)$ on average over the iid sample
$\{(X_i, Y_i)\}_{i=1}^{n+1}$. It is a well-known result which value of $\hat{\lambda}$ yields marginal coverage in
split conformal prediction [3,4].
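Informally, the marginal coverage requirement can be written as
\[
\mathbb{P}\big(Y_{n+1} \in S_{\hat{\lambda}}(X_{n+1})\big) \ge 1 - \alpha,
\]
where the probability is taken over the entire iid sample $\{(X_i, Y_i)\}_{i=1}^{n+1}$, that is, jointly over the calibration data and the test point.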
The guarantee of a tolerance region\footnote{Also known as training conditional validity [14] or probably approximately correct (PAC) coverage [15–17], the latter notion arising from statistical learning theory [18].} slightly differs
from marginal coverage, taking explicitly into account that the calibration set is random
and thus that the coverage of $S_{\hat{\lambda}}(X_{n+1})$, conditional on the calibration set, is a random
variable. $S_{\hat{\lambda}}(X_{n+1})$ is an $(\epsilon, \delta)$-tolerance region if it contains at least a pre-specified
proportion $1-\epsilon \in (0,1)$ of the label population $\mathcal{Y}$ with at least a pre-specified probability
$1-\delta \in (0,1)$ over the calibration data; see Section 2.3 for a formal definition.
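A rough symbolic rendering of this requirement, with the precise statement deferred to Section 2.3, is
\[
\mathbb{P}\Big(\, \mathbb{P}\big(Y_{n+1} \in S_{\hat{\lambda}}(X_{n+1}) \,\big|\, \{(X_i, Y_i)\}_{i=1}^{n}\big) \ge 1 - \epsilon \,\Big) \ge 1 - \delta,
\]
where the inner probability is over the test point $(X_{n+1}, Y_{n+1})$ and the outer probability is over the calibration data.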
The con-