Few-Shot Calibration of Set Predictors via
Meta-Learned Cross-Validation-Based
Conformal Prediction
Sangwoo Park, Kfir M. Cohen, Osvaldo Simeone
Abstract
Conventional frequentist learning is known to yield poorly calibrated models that fail to reliably quantify the
uncertainty of their decisions. Bayesian learning can improve calibration, but formal guarantees apply only under
restrictive assumptions about correct model specification. Conformal prediction (CP) offers a general framework
for the design of set predictors with calibration guarantees that hold regardless of the underlying data generation
mechanism. However, when training data are limited, CP tends to produce large, and hence uninformative, predicted
sets. This paper introduces a novel meta-learning solution that aims at reducing the set prediction size. Unlike prior
work, the proposed meta-learning scheme, referred to as meta-XB, (i) builds on cross-validation-based CP, rather
than the less efficient validation-based CP; and (ii) preserves formal per-task calibration guarantees, rather than
less stringent task-marginal guarantees. Finally, meta-XB is extended to adaptive nonconformity (NC) scores, which are
shown empirically to further enhance marginal per-input calibration.
Index Terms
Conformal prediction, meta-learning, cross-validation-based conformal prediction, set prediction, calibration.
I. INTRODUCTION
A. Context and Motivation
In modern applications of artificial intelligence (AI), calibration is often deemed as important as the
standard criterion of (average) accuracy [1]. A well-calibrated model is one that can reliably quantify
the uncertainty of its decisions [2], [3]. Information about uncertainty is critical when access to data is
limited and AI decisions are to be acted on by human operators, machines, or other algorithms.
Sangwoo Park, Kfir M. Cohen, and Osvaldo Simeone are with King’s Communication, Learning, & Information Processing (KCLIP) lab,
Department of Engineering, King’s College London, London WC2R 2LS, U.K.
E-mail: sangwoo.park@kcl.ac.uk
Code is available at https://github.com/kclip/meta-XB.
arXiv:2210.03067v1 [stat.ML] 6 Oct 2022
Fig. 1. Illustration of the proposed meta-learned cross-validation-based CP (XB-CP) scheme, referred to as meta-XB. The example refers to the problem of classifying received radio signals depending on the modulation scheme used to generate them, e.g., QPSK or FM [11], [12]. Based on data from multiple tasks, meta-XB optimizes a hyperparameter vector $\xi$ by minimizing the average set prediction size. As compared to conventional XB, shown in the top-right part of the figure, which uses a fixed hyperparameter vector $\xi$, meta-XB can achieve a reduced set prediction size, while maintaining the per-task validity property (1).
Recent work on calibration for AI has focused on Bayesian learning, or related ensembling methods, as means
to quantify epistemic uncertainty [4]–[7]. However, recent studies have shown the limitations of Bayesian
learning when the assumed model likelihood or prior distribution is misspecified [8]. Furthermore, exact
Bayesian learning is computationally infeasible, calling for approximations such as Monte Carlo (MC)
sampling [9] and variational inference (VI) [10]. Overall, under practical conditions, Bayesian learning
does not provide formal guarantees of calibration.
Conformal prediction (CP) [13] provides a general framework for the calibration of (frequentist or
Bayesian) probabilistic models. The formal calibration guarantees provided by CP hold irrespective of the
(unknown) data distribution, as long as the available data samples and the test samples are exchangeable
– a weaker requirement than the standard i.i.d. assumption. As illustrated in Fig. 1, CP produces set
predictors that output a subset of the output space $\mathcal{Y}$ for each input $x$, with the property that the set
contains the true output value with probability no smaller than a desired value $1-\alpha$ for $\alpha \in [0,1]$.
Mathematically, for a given learning task $\tau$, assume that we are given a data set $\mathcal{D}_\tau$ with $N_\tau$ samples, i.e., $\mathcal{D}_\tau = \{z_\tau[i]\}_{i=1}^{N_\tau}$, where the $i$th sample $z_\tau[i] = (x_\tau[i], y_\tau[i])$ contains input $x_\tau[i] \in \mathcal{X}_\tau$ and target $y_\tau[i] \in \mathcal{Y}_\tau$. CP provides a set predictor $\Gamma(\cdot\,|\,\mathcal{D}_\tau, \xi) : \mathcal{X}_\tau \to 2^{\mathcal{Y}_\tau}$, specified by a hyperparameter vector $\xi$, that maps an input $x_\tau \in \mathcal{X}_\tau$ to a subset of the output domain $\mathcal{Y}_\tau$ based on the data set $\mathcal{D}_\tau$. Calibration amounts to the per-task validity condition

$\mathrm{P}\big(\mathbf{y}_\tau \in \Gamma(\mathbf{x}_\tau \,|\, \mathbf{D}_\tau, \xi)\big) \geq 1 - \alpha, \qquad (1)$

which indicates that the set predictor $\Gamma(\mathbf{x}_\tau|\mathbf{D}_\tau, \xi)$ contains the true target $\mathbf{y}_\tau$ with probability at least $1-\alpha$.
In (1), the probability $\mathrm{P}(\cdot)$ is taken over the ground-truth, exchangeable, joint distribution $p(\mathcal{D}_\tau, z_\tau)$, and bold letters represent random variables.
The most common form of CP, referred to as validation-based CP (VB-CP), splits the data set into training and validation subsets [13]. The validation subset is used to calibrate the set prediction $\Gamma^{\mathrm{VB}}_\alpha(x_\tau|\mathcal{D}_\tau, \xi)$ on a test example $x_\tau$ for a given desired miscoverage level $\alpha$ in (1). The drawback of this approach is that validation data is not used for training, resulting in inefficient set predictors $\Gamma^{\mathrm{VB}}_\alpha(x_\tau|\mathcal{D}_\tau, \xi)$ in the presence of a limited number $N_\tau$ of data samples. The average size of a set predictor $\Gamma(x_\tau|\mathcal{D}_\tau, \xi)$, referred to as inefficiency, is defined as

$\mathcal{L}_\tau(\xi) = \mathbb{E}\big[\,|\Gamma(\mathbf{x}_\tau|\mathbf{D}_\tau, \xi)|\,\big], \qquad (2)$

where the average is taken with respect to the ground-truth joint distribution $p(\mathcal{D}_\tau, z_\tau)$.
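To make the validity condition (1) and the inefficiency (2) concrete, the following minimal sketch estimates empirical coverage and average set size by Monte Carlo sampling; `sample_task_data` and `set_predictor` are hypothetical placeholders standing in for the unknown distribution $p(\mathcal{D}_\tau, z_\tau)$ and for any CP set predictor.

```python
import numpy as np

def estimate_coverage_and_inefficiency(sample_task_data, set_predictor, xi, num_trials=1000):
    """Monte Carlo estimates of per-task coverage, cf. (1), and inefficiency, cf. (2).

    sample_task_data(): hypothetical sampler returning (D_tau, (x_test, y_test)),
        i.e., a draw from the (unknown) joint distribution p(D_tau, z_tau).
    set_predictor(x, D, xi): hypothetical set predictor returning a set of candidate labels.
    """
    covered, sizes = [], []
    for _ in range(num_trials):
        D_tau, (x_test, y_test) = sample_task_data()
        pred_set = set_predictor(x_test, D_tau, xi)
        covered.append(y_test in pred_set)   # event y in Gamma(x | D, xi)
        sizes.append(len(pred_set))          # |Gamma(x | D, xi)|
    coverage = np.mean(covered)      # should be at least 1 - alpha if (1) holds
    inefficiency = np.mean(sizes)    # empirical estimate of (2)
    return coverage, inefficiency
```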
A more efficient CP set predictor was introduced by [14] based on cross-validation. The cross-validation-based CP (XB-CP) set predictor $\Gamma^{K\text{-XB}}_\alpha(x_\tau|\mathcal{D}_\tau, \xi)$ splits the data set $\mathcal{D}_\tau$ into $K$ folds to effectively use the available data for both training and calibration. XB-CP can also satisfy the per-task validity condition (1).¹
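As a rough illustration of the cross-validation idea, the sketch below implements a basic $K$-fold cross-conformal set predictor for classification. It uses Vovk-style cross-conformal p-values rather than the specific jackknife-mm aggregation of [14] on which the per-task guarantee (1) rests; `train_model` and `nc_score` are hypothetical placeholders.

```python
import numpy as np

def cross_conformal_set(D, x_test, labels, alpha, xi, train_model, nc_score, K=5):
    """Basic K-fold cross-conformal set predictor (sketch; not the exact jackknife-mm of [14]).

    D: list of (x, y) pairs; labels: candidate output space Y_tau.
    train_model(data, xi) -> model   (hypothetical training routine)
    nc_score(model, x, y) -> float   (higher = (x, y) conforms less with the training folds)
    """
    N = len(D)
    folds = np.array_split(np.arange(N), K)
    fold_models, fold_cal_scores = [], []
    for fold in folds:
        held_in = [D[i] for i in range(N) if i not in set(fold)]   # D_{-k}: all folds but one
        model = train_model(held_in, xi)
        fold_models.append(model)
        # calibration scores A_i = NC(z_i | D_{-k(i)}, xi) for the points i in the held-out fold
        fold_cal_scores.append([nc_score(model, *D[i]) for i in fold])
    pred_set = []
    for y in labels:
        # count calibration points that conform no better than the candidate pair (x_test, y)
        count = 0
        for k in range(K):
            test_score = nc_score(fold_models[k], x_test, y)
            count += sum(a >= test_score for a in fold_cal_scores[k])
        if (1 + count) / (N + 1) > alpha:   # cross-conformal p-value exceeds alpha
            pred_set.append(y)
    return pred_set
```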
Further improvements in efficiency can be obtained via meta-learning [15]. Meta-learning jointly processes data from multiple learning tasks, say $\tau_1, \ldots, \tau_T$, which are assumed to be drawn i.i.d. from a task distribution $p(\tau)$. These data are used to optimize the hyperparameter vector $\xi$ of the set predictor $\Gamma(x_\tau|\mathcal{D}_\tau, \xi)$ to be used on a new task $\tau \sim p(\tau)$. Specifically, reference [16] introduced a meta-learning-based method that modifies VB-CP. The resulting meta-VB algorithm satisfies a looser validity condition than the per-task inequality (1), in which the probability in (1) is no smaller than $1-\alpha$ only on average with respect to the task distribution $p(\tau)$.
B. Main Contributions
In this paper, we introduce a novel meta-learning approach, termed meta-XB, with the aim of reducing
the inefficiency (2) of XB-CP, while preserving, unlike [16], the per-task validity condition (1) for every
task $\tau$. Furthermore, we incorporate in the design of meta-XB the adaptive nonconformity (NC) scores introduced in [18]. As argued in [18] for conventional CP, adaptive NC scores are empirically known to improve the per-task conditional validity condition

$\mathrm{P}\big(\mathbf{y}_\tau \in \Gamma(\mathbf{x}_\tau|\mathbf{D}_\tau, \xi) \,\big|\, \mathbf{x}_\tau = x_\tau\big) \geq 1 - \alpha. \qquad (3)$

This condition is significantly stronger than (1), as it holds for any test input $x_\tau$. A summary of the
considered CP schemes can be found in Fig. 2.
¹We refer here in particular to the jackknife-mm scheme presented in Section 2.2 of [14].
Fig. 2. Conformal prediction (CP)-based set predictors in the presence of limited data samples: Validation-based CP (VB-CP) [13] and the more efficient cross-validation-based CP (XB-CP) [14] provide set predictors that satisfy the per-task validity condition (1), while previous works on meta-learning for VB-CP [16], [17], which aim at improving efficiency, do not offer validity guarantees when conditioning on a given task $\tau$. In contrast, the proposed meta-XB algorithm outputs efficient set predictors with guaranteed per-task validity. By incorporating adaptive NC scores [18], meta-XB can also empirically improve per-input conditional validity (see (3)). The last column illustrates efficiency, per-task validity, and per-task conditional validity for a simple example with possible outputs $y$ given by black dots, where the ground-truth outputs are given by the colored crosses and the corresponding set predictions by circles. Per-task validity (see (1)) holds if the set prediction includes the ground-truth output with high probability for each task $\tau$; per-task conditional validity (see (3)) holds when the set predictor is valid for each input. Conditional validity typically results in prediction sets of different sizes depending on the input [19]–[22]. Inefficiency (see (2)) measures the average size of the prediction set.
Overall, the contributions of this work can be summarized as follows:
• We introduce meta-XB, a meta-learning algorithm for XB-CP that can reduce the average prediction set size (2) as compared to XB-CP, while satisfying the per-task validity condition (1), unlike existing meta-learning algorithms for CP;
• We incorporate adaptive NC scores [18] in the design of meta-XB, demonstrating via experiments that adaptive NC scores can enhance conditional validity as defined by condition (3).
II. DEFINITIONS AND PRELIMINARIES
In this section, we describe necessary background material on CP [13], [23], VB-CP [13], XB-CP [14],
and adaptive NC scores [18].
A. Nonconformity (NC) Scores
At a high level, given an input $x_\tau$ for some learning task $\tau$, CP outputs a prediction set $\Gamma(x_\tau|\mathcal{D}_\tau, \xi)$ that includes all outputs $y \in \mathcal{Y}_\tau$ such that the pair $(x_\tau, y)$ conforms well with the examples in the available data set $\mathcal{D}_\tau = \{z_\tau[i] = (x_\tau[i], y_\tau[i])\}_{i=1}^{N_\tau}$. We recall from Section I that $\xi$ represents a vector of hyperparameters. The key underlying assumption is that the data set $\mathcal{D}_\tau$ and the test pair $z_\tau = (x_\tau, y_\tau)$ are realizations of exchangeable random variables $\mathbf{D}_\tau$ and $\mathbf{z}_\tau$.
Assumption 1: For any learning task $\tau$, the data set $\mathcal{D}_\tau$ and a test data point $z_\tau$ are exchangeable random variables, i.e., the joint distribution $p(\mathcal{D}_\tau, z_\tau) = p(z_\tau[1], \ldots, z_\tau[N_\tau], z_\tau)$ is invariant to any permutation of the variables $\{\mathbf{z}_\tau[1], \ldots, \mathbf{z}_\tau[N_\tau], \mathbf{z}_\tau\}$. Mathematically, we have the equality $p(z_\tau[1], \ldots, z_\tau[N_\tau+1]) = p(z_\tau[\pi(1)], \ldots, z_\tau[\pi(N_\tau+1)])$ with $z_\tau = z_\tau[N_\tau+1]$, for any permutation operator $\pi(\cdot)$. Note that the standard assumption of i.i.d. random variables satisfies exchangeability.
CP measures conformity via NC scores, which are generally functions of the hyperparameter vector $\xi$, and are defined as follows.

Definition 1 (NC score): For a given learning task $\tau$, given a data set $\tilde{\mathcal{D}}_\tau = \{\tilde{z}_\tau[i] = (\tilde{x}_\tau[i], \tilde{y}_\tau[i])\}_{i=1}^{\tilde{N}_\tau} \subseteq \mathcal{D}_\tau$ with $\tilde{N}_\tau \leq N_\tau$ samples, a nonconformity (NC) score is a function $\mathrm{NC}(z|\tilde{\mathcal{D}}_\tau, \xi)$ that maps the data set $\tilde{\mathcal{D}}_\tau$ and any input-output pair $z = (x, y)$, with $x \in \mathcal{X}_\tau$ and $y \in \mathcal{Y}_\tau$, to a real number, while satisfying the permutation-invariance property $\mathrm{NC}(z|\{\tilde{z}_\tau[1], \ldots, \tilde{z}_\tau[\tilde{N}]\}, \xi) = \mathrm{NC}(z|\{\tilde{z}_\tau[\pi(1)], \ldots, \tilde{z}_\tau[\pi(\tilde{N})]\}, \xi)$ for any permutation operator $\pi(\cdot)$.
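As a simple non-parametric illustration (not a score used in this paper), the toy NC score below measures how far $(x, y)$ lies from the nearest same-label example in $\tilde{\mathcal{D}}_\tau$; since it accesses $\tilde{\mathcal{D}}_\tau$ only through a minimum over its elements, it satisfies the permutation-invariance requirement of Definition 1.

```python
import numpy as np

def nearest_neighbor_nc_score(z, D_tilde):
    """Toy NC score: distance from x to the closest training input that shares the label y.

    z = (x, y); D_tilde = list of (x_i, y_i). Permutation-invariant because only a
    minimum over the elements of D_tilde is used (no dependence on their ordering).
    """
    x, y = z
    same_label = [np.linalg.norm(np.asarray(x) - np.asarray(x_i))
                  for (x_i, y_i) in D_tilde if y_i == y]
    # If no training point shares the label, (x, y) conforms as poorly as possible.
    return min(same_label) if same_label else np.inf
```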
A good NC score should express how poorly the point $(x_\tau, y)$ "conforms" to the data set $\tilde{\mathcal{D}}_\tau$. The most common way to obtain an NC score is via a parametric two-step approach. This involves a training algorithm defined by a conditional distribution $p(\phi|\tilde{\mathcal{D}}_\tau, \xi)$, which describes the output $\boldsymbol{\phi}$ of the algorithm as a function of the training data set $\tilde{\mathcal{D}}_\tau \subseteq \mathcal{D}_\tau$ and of the hyperparameter vector $\xi$. This distribution may describe the output of a stochastic optimization algorithm, such as stochastic gradient descent (SGD), for frequentist learning, or of a Monte Carlo method for Bayesian learning [24]–[26]. The hyperparameter vector $\xi$ may determine, e.g., the learning rate schedule or the initialization.
Definition 2 (Conventional two-step NC score): For a learning task $\tau$, let $\ell_\tau(z|\phi)$ represent the loss of a machine learning model parametrized by vector $\phi$ on an input-output pair $z = (x, y)$ with $x \in \mathcal{X}_\tau$ and $y \in \mathcal{Y}_\tau$. Given a training algorithm $p(\phi|\tilde{\mathcal{D}}_\tau, \xi)$ that is invariant to permutations of the training set $\tilde{\mathcal{D}}_\tau$, a conventional two-step NC score for an input-output pair $z$ given data set $\tilde{\mathcal{D}}_\tau$ is defined as

$\mathrm{NC}(z|\tilde{\mathcal{D}}_\tau, \xi) := \mathbb{E}_{\boldsymbol{\phi} \sim p(\phi|\tilde{\mathcal{D}}_\tau, \xi)}\big[\ell_\tau(z|\boldsymbol{\phi})\big]. \qquad (4)$

Due to the permutation-invariance of the training algorithm, it can be readily checked that (4) is a valid NC score as per Definition 1.
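As one possible instance of Definition 2, the sketch below approximates the expectation in (4) by averaging the log-loss over a few independent SGD runs, which play the role of samples from $p(\phi|\tilde{\mathcal{D}}_\tau, \xi)$; the multinomial logistic regression model and the hyperparameters exposed through $\xi$ (seed, learning rate, number of epochs) are illustrative assumptions rather than the setup used in this paper.

```python
import numpy as np

def two_step_nc_score(z, D_tilde, xi, num_classes, R=5):
    """Two-step NC score (4), sketched with multinomial logistic regression trained by SGD.

    z = (x, y): candidate input-output pair; D_tilde: list of (x_i, y_i) training pairs.
    xi: hyperparameter dict, here assumed to carry a seed, a learning rate, and an epoch count.
    The expectation over p(phi | D_tilde, xi) is approximated by R independent SGD runs.
    """
    x, y = np.asarray(z[0], dtype=float), int(z[1])
    losses = []
    for r in range(R):
        rng = np.random.default_rng(xi["seed"] + r)
        W = rng.normal(scale=0.01, size=(num_classes, x.size))   # random initialization
        # Shuffling with the algorithm's own RNG (rather than relying on the given order)
        # keeps the procedure insensitive to how D_tilde is presented, as Definition 2 requires.
        order = rng.permutation(len(D_tilde))
        for _ in range(xi["epochs"]):
            for i in order:
                x_i = np.asarray(D_tilde[i][0], dtype=float)
                y_i = int(D_tilde[i][1])
                logits = W @ x_i
                p = np.exp(logits - logits.max()); p /= p.sum()
                grad = np.outer(p - np.eye(num_classes)[y_i], x_i)   # softmax cross-entropy gradient
                W -= xi["lr"] * grad
        logits = W @ x
        p = np.exp(logits - logits.max()); p /= p.sum()
        losses.append(-np.log(p[y] + 1e-12))   # log-loss of the candidate pair (x, y)
    return float(np.mean(losses))              # Monte Carlo estimate of (4)
```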
B. Validation-Based Conformal Prediction (VB-CP)
VB-CP [13] divides the data set $\mathcal{D}_\tau$ into a training data set $\mathcal{D}^{\mathrm{tr}}_\tau$ of $N^{\mathrm{tr}}_\tau$ samples and a validation data set $\mathcal{D}^{\mathrm{val}}_\tau$ of $N^{\mathrm{val}}_\tau$ samples, with $N^{\mathrm{tr}}_\tau + N^{\mathrm{val}}_\tau = N_\tau$. It uses the training data set $\mathcal{D}^{\mathrm{tr}}_\tau$ to evaluate the NC scores of the validation samples.
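To make the split explicit, the sketch below follows the standard split-conformal recipe underlying VB-CP: NC scores of the validation samples are computed under a model trained only on $\mathcal{D}^{\mathrm{tr}}_\tau$, and a candidate label enters the prediction set if its score does not exceed the $\lceil(1-\alpha)(N^{\mathrm{val}}_\tau+1)\rceil$-th smallest validation score; `train_model` and `nc_score` are the same hypothetical placeholders as above.

```python
import numpy as np

def vb_cp_prediction_set(D_tr, D_val, x_test, labels, alpha, xi, train_model, nc_score):
    """Standard split-conformal (VB-CP) set predictor sketch.

    The model is fit once on the training split D_tr; D_val is used only for calibration.
    """
    model = train_model(D_tr, xi)
    # NC scores of the validation samples under the training-set model
    val_scores = np.array([nc_score(model, x_i, y_i) for (x_i, y_i) in D_val])
    n_val = len(val_scores)
    # (1 - alpha) empirical quantile with the usual finite-sample correction
    k = int(np.ceil((1 - alpha) * (n_val + 1)))
    threshold = np.sort(val_scores)[k - 1] if k <= n_val else np.inf
    # Keep every candidate label whose NC score does not exceed the threshold
    return [y for y in labels if nc_score(model, x_test, y) <= threshold]
```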