
1 Introduction
Black-box predictive models have become popular tools with the advent of large datasets
and cheap computing resources. However, the predictive performance of these models is
usually subject to weak statistical guarantees that only hold asymptotically in the size of
the dataset or require strong parametric assumptions on the data-generating process. Both
may be unrealistic in practice, which makes the deployment of black-box predictive models
challenging in contexts where safety is key, such as medicine.
A particular line of research aiming to improve this situation is called post-hoc calibration.
Consider an arbitrary black-box predictor fitted on a proper training set, where the
base predictor is black-box in the sense that we do not seek to understand or modify its
behaviour, but instead wrap it into a larger post-hoc calibration algorithm. The purpose
of such an algorithm is to calibrate the base predictor so that it satisfies some rigorous
statistical guarantees under minimal assumptions. We specifically consider finite-sample
guarantees while making no assumptions on the distribution of the underlying data,
entering the field of distribution-free predictive inference.
Conformal prediction is a general framework to construct prediction sets that satisfy some
distribution-free finite-sample guarantee under the assumption of iid data [1,2]. The
widely studied variant that this thesis focuses on is called split conformal prediction
[3,4]. For a gentle introduction to conformal prediction we refer to [5], and for a more
technical tutorial to [6]. Conformal prediction has been applied in various contexts, such
as drug discovery [7], image classification [8], natural language processing [9–11] and
voting during the 2020 US presidential election [12]. Although traditionally conformal
prediction starts with the definition of a non-conformity score, we follow an alternative
but equivalent interpretation leveraging nested prediction sets [13].
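In this interpretation one works, in the notation introduced in the next paragraph, with a family of prediction sets $\{S_\lambda(x)\}_{\lambda \in \Lambda}$ that is nested, for instance in the sense that
\[
\lambda_1 \le \lambda_2 \implies S_{\lambda_1}(x) \subseteq S_{\lambda_2}(x) \quad \text{for all } x \in \mathcal{X},
\]
so that calibrating the base predictor reduces to selecting a data-driven index $\hat{\lambda}$ that makes the sets just large enough.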
Given the base predictor and a calibration set $\{(X_i, Y_i)\}_{i=1}^{n}$, we are interested in predicting
the label $Y_{n+1} \in \mathcal{Y}$ corresponding to a new feature $X_{n+1} \in \mathcal{X}$, while quantifying the
corresponding prediction uncertainty. Simply put, the split conformal prediction algorithm
inputs the base predictor and the calibration set and outputs a prediction set $S_{\hat{\lambda}}(X_{n+1})$
that contains $Y_{n+1}$ with some distribution-free finite-sample guarantee of certainty. Note
that $S_{\hat{\lambda}}(X_{n+1})$ is indexed by a random variable $\hat{\lambda} \in \Lambda$ that determines the size of the set,
where $\Lambda \subset \mathbb{R} \cup \{\pm\infty\}$ is some closed set.
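To make this concrete, the following is a minimal sketch of the split conformal recipe in a regression setting, assuming absolute residuals as non-conformity scores and a generic fitted base predictor with a scikit-learn-style predict method; the interval half-width plays the role of $\hat{\lambda}$ here, and names such as base_predictor and alpha are illustrative rather than taken from later chapters.

```python
import numpy as np

def split_conformal_interval(base_predictor, X_calib, y_calib, X_new, alpha=0.1):
    """Split conformal prediction intervals around a fitted base predictor.

    Non-conformity scores are absolute residuals |y - f(x)| on the calibration
    set; intervals are f(x_new) +/- q_hat, with q_hat the empirical quantile of
    the scores at level ceil((n + 1) * (1 - alpha)) / n.
    """
    scores = np.abs(np.asarray(y_calib) - base_predictor.predict(X_calib))
    n = scores.shape[0]
    level = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample correction
    if level > 1:
        # Too few calibration points for this alpha: the guarantee can only be
        # met by the trivial interval, matching the +infinity element of Lambda.
        q_hat = np.inf
    else:
        q_hat = np.quantile(scores, level, method="higher")
    preds = base_predictor.predict(X_new)
    return preds - q_hat, preds + q_hat
```

With this choice of $\hat{\lambda}$, the marginal coverage guarantee discussed next holds whenever the calibration and test points are exchangeable.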
The most commonly used finite-sample guarantee in distribution-free predictive inference
is that of marginal coverage, guaranteeing that a prediction set $S_{\hat{\lambda}}(X_{n+1})$ contains
$Y_{n+1}$ with a pre-specified confidence level $1-\alpha \in (0,1)$ on average over the iid sample
$\{(X_i, Y_i)\}_{i=1}^{n+1}$. It is a well-known result which value of $\hat{\lambda}$ yields marginal coverage in
split conformal prediction [3,4].
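Informally, the marginal coverage requirement can be written as
\[
\mathbb{P}\big(Y_{n+1} \in S_{\hat{\lambda}}(X_{n+1})\big) \ge 1 - \alpha,
\]
where the probability is taken over the entire iid sample $\{(X_i, Y_i)\}_{i=1}^{n+1}$, that is, jointly over the calibration data and the test point.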
The guarantee of a tolerance region\footnote{Also known as training conditional validity [14] or probably approximately correct (PAC) coverage [15–17], the latter notion arising from statistical learning theory [18].} slightly differs
from marginal coverage, taking explicitly into account that the calibration set is random
and thus that the coverage of $S_{\hat{\lambda}}(X_{n+1})$, conditional on the calibration set, is a random
variable. $S_{\hat{\lambda}}(X_{n+1})$ is an $(\epsilon, \delta)$-tolerance region if it contains at least a pre-specified
proportion $1-\epsilon \in (0,1)$ of the label population $\mathcal{Y}$ with at least a pre-specified probability
$1-\delta \in (0,1)$ over the calibration data; see Section 2.3 for a formal definition.
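A rough symbolic rendering of this requirement, with the precise statement deferred to Section 2.3, is
\[
\mathbb{P}\Big(\, \mathbb{P}\big(Y_{n+1} \in S_{\hat{\lambda}}(X_{n+1}) \,\big|\, \{(X_i, Y_i)\}_{i=1}^{n}\big) \ge 1 - \epsilon \,\Big) \ge 1 - \delta,
\]
where the inner probability is over the test point $(X_{n+1}, Y_{n+1})$ and the outer probability is over the calibration data.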
The con-