Few-Shot Calibration of Set Predictors via
Meta-Learned Cross-Validation-Based
Conformal Prediction
Sangwoo Park, Kfir M. Cohen, Osvaldo Simeone
Abstract
Conventional frequentist learning is known to yield poorly calibrated models that fail to reliably quantify the
uncertainty of their decisions. Bayesian learning can improve calibration, but formal guarantees apply only under
restrictive assumptions about correct model specification. Conformal prediction (CP) offers a general framework
for the design of set predictors with calibration guarantees that hold regardless of the underlying data generation
mechanism. However, when training data are limited, CP tends to produce large, and hence uninformative, predicted
sets. This paper introduces a novel meta-learning solution that aims at reducing the set prediction size. Unlike prior
work, the proposed meta-learning scheme, referred to as meta-XB, (i) builds on cross-validation-based CP, rather
than the less efficient validation-based CP; and (ii) preserves formal per-task calibration guarantees, rather than
less stringent task-marginal guarantees. Finally, meta-XB is extended to adaptive nonconformity (NC) scores, which are
shown empirically to further enhance marginal per-input calibration.
Index Terms
Conformal prediction, meta-learning, cross-validation-based conformal prediction, set prediction, calibration.
I. INTRODUCTION
A. Context and Motivation
In modern applications of artificial intelligence (AI), calibration is often deemed as important as the
standard criterion of (average) accuracy [1]. A well-calibrated model is one that can reliably quantify
the uncertainty of its decisions [2], [3]. Information about uncertainty is critical when access to data is
limited and AI decisions are to be acted on by human operators, machines, or other algorithms.
Sangwoo Park, Kfir M. Cohen, and Osvaldo Simeone are with King’s Communication, Learning, & Information Processing (KCLIP) lab,
Department of Engineering, King’s College London, London WC2R 2LS, U.K.
E-mail: sangwoo.park@kcl.ac.uk
Code is available at https://github.com/kclip/meta-XB.
arXiv:2210.03067v1 [stat.ML] 6 Oct 2022
Fig. 1. Illustration of the proposed meta-learned cross-validation-based CP (XB-CP) scheme, referred to as meta-XB. The example refers to the problem of classifying received radio signals depending on the modulation scheme used to generate them, e.g., QPSK or FM [11], [12]. Based on data from multiple tasks, meta-XB optimizes a hyperparameter vector $\xi$ by minimizing the average set prediction size. As compared to conventional XB, shown in the top-right part of the figure, which uses a fixed hyperparameter vector $\xi$, meta-XB can achieve a reduced set prediction size, while maintaining the per-task validity property (1).
Recent work on calibration for AI has focused on Bayesian learning, or related ensembling methods, as means
to quantify epistemic uncertainty [4]–[7]. However, recent studies have shown the limitations of Bayesian
learning when the assumed model likelihood or prior distribution is misspecified [8]. Furthermore, exact
Bayesian learning is computationally infeasible, calling for approximations such as Monte Carlo (MC)
sampling [9] and variational inference (VI) [10]. Overall, under practical conditions, Bayesian learning
does not provide formal guarantees of calibration.
Conformal prediction (CP) [13] provides a general framework for the calibration of (frequentist or
Bayesian) probabilistic models. The formal calibration guarantees provided by CP hold irrespective of the
(unknown) data distribution, as long as the available data samples and the test samples are exchangeable
– a weaker requirement than the standard i.i.d. assumption. As illustrated in Fig. 1, CP produces set
predictors that output a subset of the output space $\mathcal{Y}$ for each input $x$, with the property that the set
contains the true output value with probability no smaller than a desired value $1-\alpha$ for $\alpha \in [0,1]$.
Mathematically, for a given learning task $\tau$, assume that we are given a data set $\mathcal{D}_\tau$ with $N_\tau$ samples, i.e., $\mathcal{D}_\tau = \{z_\tau[i]\}_{i=1}^{N_\tau}$, where the $i$th sample $z_\tau[i] = (x_\tau[i], y_\tau[i])$ contains input $x_\tau[i] \in \mathcal{X}_\tau$ and target $y_\tau[i] \in \mathcal{Y}_\tau$. CP provides a set predictor $\Gamma(\cdot\,|\,\mathcal{D}_\tau, \xi) : \mathcal{X}_\tau \to 2^{\mathcal{Y}_\tau}$, specified by a hyperparameter vector $\xi$, that maps an input $x_\tau \in \mathcal{X}_\tau$ to a subset of the output domain $\mathcal{Y}_\tau$ based on the data set $\mathcal{D}_\tau$. Calibration amounts to the per-task validity condition

$\mathrm{P}\big(\mathbf{y}_\tau \in \Gamma(\mathbf{x}_\tau \,|\, \mathbf{D}_\tau, \xi)\big) \geq 1 - \alpha, \qquad (1)$

which indicates that the set predictor $\Gamma(\mathbf{x}_\tau|\mathbf{D}_\tau, \xi)$ contains the true target $\mathbf{y}_\tau$ with probability at least $1-\alpha$.
In (1), the probability $\mathrm{P}(\cdot)$ is taken over the ground-truth, exchangeable, joint distribution $p(\mathcal{D}_\tau, z_\tau)$, and bold letters represent random variables.
The most common form of CP, referred to as validation-based CP (VB-CP), splits the data set into training and validation subsets [13]. The validation subset is used to calibrate the set prediction $\Gamma^{\mathrm{VB}}_\alpha(x_\tau|\mathcal{D}_\tau, \xi)$ on a test example $x_\tau$ for a given desired miscoverage level $\alpha$ in (1). The drawback of this approach is that validation data is not used for training, resulting in inefficient set predictors $\Gamma^{\mathrm{VB}}_\alpha(x_\tau|\mathcal{D}_\tau, \xi)$ in the presence of a limited number $N_\tau$ of data samples. The average size of a set predictor $\Gamma(x_\tau|\mathcal{D}_\tau, \xi)$, referred to as inefficiency, is defined as

$\mathcal{L}_\tau(\xi) = \mathbb{E}\big[\,|\Gamma(\mathbf{x}_\tau|\mathbf{D}_\tau, \xi)|\,\big], \qquad (2)$

where the average is taken with respect to the ground-truth joint distribution $p(\mathcal{D}_\tau, z_\tau)$.
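To make the validity condition (1) and the inefficiency (2) concrete, the following minimal sketch estimates empirical coverage and average set size by Monte Carlo sampling; `sample_task_data` and `set_predictor` are hypothetical placeholders standing in for the unknown distribution $p(\mathcal{D}_\tau, z_\tau)$ and for any CP set predictor.

```python
import numpy as np

def estimate_coverage_and_inefficiency(sample_task_data, set_predictor, xi, num_trials=1000):
    """Monte Carlo estimates of per-task coverage, cf. (1), and inefficiency, cf. (2).

    sample_task_data(): hypothetical sampler returning (D_tau, (x_test, y_test)),
        i.e., a draw from the (unknown) joint distribution p(D_tau, z_tau).
    set_predictor(x, D, xi): hypothetical set predictor returning a set of candidate labels.
    """
    covered, sizes = [], []
    for _ in range(num_trials):
        D_tau, (x_test, y_test) = sample_task_data()
        pred_set = set_predictor(x_test, D_tau, xi)
        covered.append(y_test in pred_set)   # event y in Gamma(x | D, xi)
        sizes.append(len(pred_set))          # |Gamma(x | D, xi)|
    coverage = np.mean(covered)      # should be at least 1 - alpha if (1) holds
    inefficiency = np.mean(sizes)    # empirical estimate of (2)
    return coverage, inefficiency
```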
A more efficient CP set predictor was introduced by [14] based on cross-validation. The cross-validation-based CP (XB-CP) set predictor $\Gamma^{K\text{-XB}}_\alpha(x_\tau|\mathcal{D}_\tau, \xi)$ splits the data set $\mathcal{D}_\tau$ into $K$ folds to effectively use the available data for both training and calibration. XB-CP can also satisfy the per-task validity condition (1).¹
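As a rough illustration of the cross-validation idea, the sketch below implements a basic $K$-fold cross-conformal set predictor for classification. It uses Vovk-style cross-conformal p-values rather than the specific jackknife-mm aggregation of [14] on which the per-task guarantee (1) rests; `train_model` and `nc_score` are hypothetical placeholders.

```python
import numpy as np

def cross_conformal_set(D, x_test, labels, alpha, xi, train_model, nc_score, K=5):
    """Basic K-fold cross-conformal set predictor (sketch; not the exact jackknife-mm of [14]).

    D: list of (x, y) pairs; labels: candidate output space Y_tau.
    train_model(data, xi) -> model   (hypothetical training routine)
    nc_score(model, x, y) -> float   (higher = (x, y) conforms less with the training folds)
    """
    N = len(D)
    folds = np.array_split(np.arange(N), K)
    fold_models, fold_cal_scores = [], []
    for fold in folds:
        held_in = [D[i] for i in range(N) if i not in set(fold)]   # D_{-k}: all folds but one
        model = train_model(held_in, xi)
        fold_models.append(model)
        # calibration scores A_i = NC(z_i | D_{-k(i)}, xi) for the points i in the held-out fold
        fold_cal_scores.append([nc_score(model, *D[i]) for i in fold])
    pred_set = []
    for y in labels:
        # count calibration points that conform no better than the candidate pair (x_test, y)
        count = 0
        for k in range(K):
            test_score = nc_score(fold_models[k], x_test, y)
            count += sum(a >= test_score for a in fold_cal_scores[k])
        if (1 + count) / (N + 1) > alpha:   # cross-conformal p-value exceeds alpha
            pred_set.append(y)
    return pred_set
```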
Further improvements in efficiency can be obtained via meta-learning [15]. Meta-learning jointly processes data from multiple learning tasks, say $\tau_1, \ldots, \tau_T$, which are assumed to be drawn i.i.d. from a task distribution $p(\tau)$. These data are used to optimize the hyperparameter vector $\xi$ of the set predictor $\Gamma(x_\tau|\mathcal{D}_\tau, \xi)$ to be used on a new task $\tau \sim p(\tau)$. Specifically, reference [16] introduced a meta-learning-based method that modifies VB-CP. The resulting meta-VB algorithm satisfies a looser validity condition than the per-task inequality (1), in which the probability in (1) is no smaller than $1-\alpha$ only on average with respect to the task distribution $p(\tau)$.
B. Main Contributions
In this paper, we introduce a novel meta-learning approach, termed meta-XB, with the aim of reducing
the inefficiency (2) of XB-CP, while preserving, unlike [16], the per-task validity condition (1) for every
task $\tau$. Furthermore, we incorporate in the design of meta-XB the adaptive nonconformity (NC) scores introduced in [18]. As argued in [18] for conventional CP, adaptive NC scores are empirically known to improve the per-task conditional validity condition

$\mathrm{P}\big(\mathbf{y}_\tau \in \Gamma(\mathbf{x}_\tau|\mathbf{D}_\tau, \xi) \,\big|\, \mathbf{x}_\tau = x_\tau\big) \geq 1 - \alpha. \qquad (3)$

This condition is significantly stronger than (1), as it holds for any test input $x_\tau$. A summary of the
considered CP schemes can be found in Fig. 2.
¹We refer here in particular to the jackknife-mm scheme presented in Section 2.2 of [14].
Fig. 2. Conformal prediction (CP)-based set predictors in the presence of limited data samples: Validation-based CP (VB-CP) [13] and the more efficient cross-validation-based CP (XB-CP) [14] provide set predictors that satisfy the per-task validity condition (1), while previous works on meta-learning for VB-CP [16], [17], which aim at improving efficiency, do not offer validity guarantees when conditioning on a given task $\tau$. In contrast, the proposed meta-XB algorithm outputs efficient set predictors with guaranteed per-task validity. By incorporating adaptive NC scores [18], meta-XB can also empirically improve per-input conditional validity (see (3)). The last column illustrates efficiency, per-task validity, and per-task conditional validity for a simple example with possible outputs $y$ given by black dots, where the ground-truth outputs are given by the colored crosses and the corresponding set predictions by circles. Per-task validity (see (1)) holds if the set prediction includes the ground-truth output with high probability for each task $\tau$; per-task conditional validity (see (3)) holds when the set predictor is valid for each input. Conditional validity typically results in prediction sets of different sizes depending on the input [19]–[22]. Inefficiency (see (2)) measures the average size of the prediction set.
Overall, the contributions of this work can be summarized as follows:
• We introduce meta-XB, a meta-learning algorithm for XB-CP that can reduce the average prediction set size (2) as compared to XB-CP, while satisfying the per-task validity condition (1), unlike existing meta-learning algorithms for CP;
• We incorporate adaptive NC scores [18] in the design of meta-XB, demonstrating via experiments that adaptive NC scores can enhance conditional validity as defined by condition (3).
II. DEFINITIONS AND PRELIMINARIES
In this section, we describe necessary background material on CP [13], [23], VB-CP [13], XB-CP [14],
and adaptive NC scores [18].
A. Nonconformity (NC) Scores
At a high level, given an input $x_\tau$ for some learning task $\tau$, CP outputs a prediction set $\Gamma(x_\tau|\mathcal{D}_\tau, \xi)$ that includes all outputs $y \in \mathcal{Y}_\tau$ such that the pair $(x_\tau, y)$ conforms well with the examples in the available data set $\mathcal{D}_\tau = \{z_\tau[i] = (x_\tau[i], y_\tau[i])\}_{i=1}^{N_\tau}$. We recall from Section I that $\xi$ represents a vector of hyperparameters. The key underlying assumption is that the data set $\mathcal{D}_\tau$ and the test pair $z_\tau = (x_\tau, y_\tau)$ are realizations of exchangeable random variables $\mathbf{D}_\tau$ and $\mathbf{z}_\tau$.
Assumption 1: For any learning task $\tau$, the data set $\mathcal{D}_\tau$ and a test data point $z_\tau$ are exchangeable random variables, i.e., the joint distribution $p(\mathcal{D}_\tau, z_\tau) = p(z_\tau[1], \ldots, z_\tau[N_\tau], z_\tau)$ is invariant to any permutation of the variables $\{\mathbf{z}_\tau[1], \ldots, \mathbf{z}_\tau[N_\tau], \mathbf{z}_\tau\}$. Mathematically, we have the equality $p(z_\tau[1], \ldots, z_\tau[N_\tau+1]) = p(z_\tau[\pi(1)], \ldots, z_\tau[\pi(N_\tau+1)])$ with $z_\tau = z_\tau[N_\tau+1]$, for any permutation operator $\pi(\cdot)$. Note that the standard assumption of i.i.d. random variables satisfies exchangeability.
CP measures conformity via NC scores, which are generally functions of the hyperparameter vector $\xi$, and are defined as follows.

Definition 1 (NC score): For a given learning task $\tau$, given a data set $\tilde{\mathcal{D}}_\tau = \{\tilde{z}_\tau[i] = (\tilde{x}_\tau[i], \tilde{y}_\tau[i])\}_{i=1}^{\tilde{N}_\tau} \subseteq \mathcal{D}_\tau$ with $\tilde{N}_\tau \leq N_\tau$ samples, a nonconformity (NC) score is a function $\mathrm{NC}(z|\tilde{\mathcal{D}}_\tau, \xi)$ that maps the data set $\tilde{\mathcal{D}}_\tau$ and any input-output pair $z = (x, y)$, with $x \in \mathcal{X}_\tau$ and $y \in \mathcal{Y}_\tau$, to a real number, while satisfying the permutation-invariance property $\mathrm{NC}(z|\{\tilde{z}_\tau[1], \ldots, \tilde{z}_\tau[\tilde{N}]\}, \xi) = \mathrm{NC}(z|\{\tilde{z}_\tau[\pi(1)], \ldots, \tilde{z}_\tau[\pi(\tilde{N})]\}, \xi)$ for any permutation operator $\pi(\cdot)$.
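As a simple non-parametric illustration (not a score used in this paper), the toy NC score below measures how far $(x, y)$ lies from the nearest same-label example in $\tilde{\mathcal{D}}_\tau$; since it accesses $\tilde{\mathcal{D}}_\tau$ only through a minimum over its elements, it satisfies the permutation-invariance requirement of Definition 1.

```python
import numpy as np

def nearest_neighbor_nc_score(z, D_tilde):
    """Toy NC score: distance from x to the closest training input that shares the label y.

    z = (x, y); D_tilde = list of (x_i, y_i). Permutation-invariant because only a
    minimum over the elements of D_tilde is used (no dependence on their ordering).
    """
    x, y = z
    same_label = [np.linalg.norm(np.asarray(x) - np.asarray(x_i))
                  for (x_i, y_i) in D_tilde if y_i == y]
    # If no training point shares the label, (x, y) conforms as poorly as possible.
    return min(same_label) if same_label else np.inf
```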
A good NC score should express how poorly the point $(x_\tau, y)$ "conforms" to the data set $\tilde{\mathcal{D}}_\tau$. The most common way to obtain an NC score is via a parametric two-step approach. This involves a training algorithm defined by a conditional distribution $p(\phi|\tilde{\mathcal{D}}_\tau, \xi)$, which describes the output $\boldsymbol{\phi}$ of the algorithm as a function of the training data set $\tilde{\mathcal{D}}_\tau \subseteq \mathcal{D}_\tau$ and of the hyperparameter vector $\xi$. This distribution may describe the output of a stochastic optimization algorithm, such as stochastic gradient descent (SGD), for frequentist learning, or of a Monte Carlo method for Bayesian learning [24]–[26]. The hyperparameter vector $\xi$ may determine, e.g., the learning rate schedule or the initialization.
Definition 2 (Conventional two-step NC score): For a learning task $\tau$, let $\ell_\tau(z|\phi)$ represent the loss of a machine learning model parametrized by vector $\phi$ on an input-output pair $z = (x, y)$ with $x \in \mathcal{X}_\tau$ and $y \in \mathcal{Y}_\tau$. Given a training algorithm $p(\phi|\tilde{\mathcal{D}}_\tau, \xi)$ that is invariant to permutations of the training set $\tilde{\mathcal{D}}_\tau$, a conventional two-step NC score for an input-output pair $z$ given data set $\tilde{\mathcal{D}}_\tau$ is defined as

$\mathrm{NC}(z|\tilde{\mathcal{D}}_\tau, \xi) := \mathbb{E}_{\boldsymbol{\phi} \sim p(\phi|\tilde{\mathcal{D}}_\tau, \xi)}\big[\ell_\tau(z|\boldsymbol{\phi})\big]. \qquad (4)$

Due to the permutation-invariance of the training algorithm, it can be readily checked that (4) is a valid NC score as per Definition 1.
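As one possible instance of Definition 2, the sketch below approximates the expectation in (4) by averaging the log-loss over a few independent SGD runs, which play the role of samples from $p(\phi|\tilde{\mathcal{D}}_\tau, \xi)$; the multinomial logistic regression model and the hyperparameters exposed through $\xi$ (seed, learning rate, number of epochs) are illustrative assumptions rather than the setup used in this paper.

```python
import numpy as np

def two_step_nc_score(z, D_tilde, xi, num_classes, R=5):
    """Two-step NC score (4), sketched with multinomial logistic regression trained by SGD.

    z = (x, y): candidate input-output pair; D_tilde: list of (x_i, y_i) training pairs.
    xi: hyperparameter dict, here assumed to carry a seed, a learning rate, and an epoch count.
    The expectation over p(phi | D_tilde, xi) is approximated by R independent SGD runs.
    """
    x, y = np.asarray(z[0], dtype=float), int(z[1])
    losses = []
    for r in range(R):
        rng = np.random.default_rng(xi["seed"] + r)
        W = rng.normal(scale=0.01, size=(num_classes, x.size))   # random initialization
        # Shuffling with the algorithm's own RNG (rather than relying on the given order)
        # keeps the procedure insensitive to how D_tilde is presented, as Definition 2 requires.
        order = rng.permutation(len(D_tilde))
        for _ in range(xi["epochs"]):
            for i in order:
                x_i = np.asarray(D_tilde[i][0], dtype=float)
                y_i = int(D_tilde[i][1])
                logits = W @ x_i
                p = np.exp(logits - logits.max()); p /= p.sum()
                grad = np.outer(p - np.eye(num_classes)[y_i], x_i)   # softmax cross-entropy gradient
                W -= xi["lr"] * grad
        logits = W @ x
        p = np.exp(logits - logits.max()); p /= p.sum()
        losses.append(-np.log(p[y] + 1e-12))   # log-loss of the candidate pair (x, y)
    return float(np.mean(losses))              # Monte Carlo estimate of (4)
```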
B. Validation-Based Conformal Prediction (VB-CP)
VB-CP [13] divides the data set $\mathcal{D}_\tau$ into a training data set $\mathcal{D}^{\mathrm{tr}}_\tau$ of $N^{\mathrm{tr}}_\tau$ samples and a validation data set $\mathcal{D}^{\mathrm{val}}_\tau$ of $N^{\mathrm{val}}_\tau$ samples, with $N^{\mathrm{tr}}_\tau + N^{\mathrm{val}}_\tau = N_\tau$. It uses the training data set $\mathcal{D}^{\mathrm{tr}}_\tau$ to evaluate the NC scores of the validation samples.
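To make the split explicit, the sketch below follows the standard split-conformal recipe underlying VB-CP: NC scores of the validation samples are computed under a model trained only on $\mathcal{D}^{\mathrm{tr}}_\tau$, and a candidate label enters the prediction set if its score does not exceed the $\lceil(1-\alpha)(N^{\mathrm{val}}_\tau+1)\rceil$-th smallest validation score; `train_model` and `nc_score` are the same hypothetical placeholders as above.

```python
import numpy as np

def vb_cp_prediction_set(D_tr, D_val, x_test, labels, alpha, xi, train_model, nc_score):
    """Standard split-conformal (VB-CP) set predictor sketch.

    The model is fit once on the training split D_tr; D_val is used only for calibration.
    """
    model = train_model(D_tr, xi)
    # NC scores of the validation samples under the training-set model
    val_scores = np.array([nc_score(model, x_i, y_i) for (x_i, y_i) in D_val])
    n_val = len(val_scores)
    # (1 - alpha) empirical quantile with the usual finite-sample correction
    k = int(np.ceil((1 - alpha) * (n_val + 1)))
    threshold = np.sort(val_scores)[k - 1] if k <= n_val else np.inf
    # Keep every candidate label whose NC score does not exceed the threshold
    return [y for y in labels if nc_score(model, x_test, y) <= threshold]
```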