
on a large-scale dataset including labeled instances sampled from an initial set of classes, commonly
referred to as the base classes. Subsequently, for novel classes, unobserved during the base training,
supervision is restricted to a limited number of labeled instances per class. During the test phase,
few-shot methods are evaluated over individual tasks, each including a small batch of unlabeled test
samples (the query set) and a few labeled instances per novel class (the support set).
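For concreteness, a single task in this evaluation protocol can be sketched as follows (a purely illustrative structure; the sizes, shot counts, and random features are assumptions, not tied to any particular benchmark):

import numpy as np

# Illustrative few-shot task: N novel classes ("ways"), S labeled shots per class
# in the support set, and Q unlabeled query samples to be classified jointly.
# All names and sizes below are assumptions for illustration only.
N, S, Q, D = 5, 1, 75, 640            # ways, shots, query size, feature dimension

rng = np.random.default_rng(0)
support_features = rng.normal(size=(N * S, D))   # features of the labeled support set
support_labels = np.repeat(np.arange(N), S)      # S labels for each of the N novel classes
query_features = rng.normal(size=(Q, D))         # unlabeled query set, predicted jointly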
The transduction approach has become increasingly popular in few-shot learning, and a large body of recent methods has focused on this setting, including, for instance, those based on graph regularization [4, 5], optimal transport [6, 7], feature transformations [8, 9], information maximization [10, 11, 12] and transductive batch normalization [13, 2], among other works [14, 15, 9, 16, 17]. Transductive
few-shot classifiers make joint predictions for the batch of query samples of each few-shot task,
independently of the other tasks. Unlike inductive inference, in which prediction is made for one
testing sample at a time, transduction accounts for the statistics of the query set of a task⁴, typically yielding substantial improvements in performance. On standard benchmarks, the gap in classification accuracy between transductive and inductive few-shot methods may reach 10%; see [11], for example.
This connects with a well-known fact in classical literature on transductive inference [19, 20, 21],
which prescribes transduction as an effective way to mitigate the difficulty inherent to learning from
limited labels. It is worth mentioning that transductive methods inherently depend on the statistical
properties of the query sets. For instance, recent studies [22, 10] showed that variations in the class balance within the query set may affect the performance of transductive methods.
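To make this distinction concrete, the toy sketch below contrasts per-sample inductive prediction with a simple transductive refinement that re-weights posteriors by class proportions estimated on the query batch. It only illustrates how query-set statistics (here, class balance) enter the prediction; it is not the method of any cited work:

import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def inductive_predict(logits):
    # Each query sample is classified independently from its own logits.
    return logits.argmax(axis=1)

def transductive_predict(logits, n_iters=10):
    # Toy transduction: alternately estimate the class proportions of the query
    # batch and re-weight every posterior by them, so that the prediction for
    # one sample depends on the statistics of the whole query set.
    probs = softmax(logits)
    for _ in range(n_iters):
        marginals = probs.mean(axis=0)                       # query-set statistic
        probs = softmax(logits + np.log(marginals + 1e-12))  # prior re-weighting
    return probs.argmax(axis=1)

With a severely imbalanced query set, the estimated marginals shift the decisions for every sample, which illustrates why the class balance of the query set matters for transductive methods.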
Transductive few-shot classification occurs in a variety of practical scenarios, in which we naturally
have access to a batch of unlabeled samples at test time. In commonly used few-shot benchmarks,
the query set of each task is small (less than 100 examples), and is sampled from a limited number
of classes (typically 5). Those choices are relevant in practice: During test time, one typically has
access to small unlabeled batches of potentially correlated (non-i.i.d.) samples, e.g., smart-device
photos taken at a given time, video-stream sequences, or in pixel prediction tasks such as semantic
segmentation [12], where only a handful of classes appear in the testing batch. However, the standard
few-shot benchmarks are built upon further assumptions that may not always hold in practice: the
few classes effectively present in the unlabeled query set are assumed both to be known beforehand
and to match exactly the set of classes of the labeled support set. We relax these assumptions, and
extend current benchmarks so that the query-set classes are unknown and do not match exactly the
support-set classes, but just belong to a much larger set of possible classes. Specifically, we allow the
total number of classes represented in the support set to be higher than typical values, while keeping the number of classes effectively present in the query set much smaller.
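The sketch below illustrates one way such relaxed tasks could be sampled, with a support set covering K candidate classes but a query set drawn from a much smaller, undisclosed subset of them; the specific counts and the uniform query split are assumptions, not the exact protocol of our benchmarks:

import numpy as np

def sample_relaxed_task(features_by_class, K=20, shots=5, k_eff=5, q_size=75, rng=None):
    # Support set: 'shots' labeled samples for each of K candidate classes.
    # Query set: samples from only k_eff << K of those classes (unknown at test time).
    rng = rng or np.random.default_rng()
    classes = rng.choice(list(features_by_class), size=K, replace=False)

    support_x, support_y = [], []
    for y, c in enumerate(classes):
        idx = rng.choice(len(features_by_class[c]), size=shots, replace=False)
        support_x.append(features_by_class[c][idx])
        support_y += [y] * shots

    effective = rng.choice(K, size=k_eff, replace=False)      # hidden from the model
    counts = rng.multinomial(q_size, np.ones(k_eff) / k_eff)  # per-class query counts
    query_x, query_y = [], []
    for y, n in zip(effective, counts):
        idx = rng.choice(len(features_by_class[classes[y]]), size=n, replace=False)
        query_x.append(features_by_class[classes[y]][idx])
        query_y += [int(y)] * int(n)

    return (np.concatenate(support_x), np.array(support_y),
            np.concatenate(query_x), np.array(query_y))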
Our challenging yet practical setting raises several difficulties for state-of-the-art transductive few-
shot classifiers: (i) it is an instance of the problem of highly imbalanced classification; (ii) the labeled
support set includes “distraction” classes that may not actually be present in the test query samples;
(iii) we consider K-way classification tasks with K much larger than the typical values used in the current benchmarks. We evaluated 7 of the best-performing state-of-the-art transductive few-shot methods and, as expected, observed drops in their performance in this challenging setting.
Motivated by these experimental observations, we introduce a Minimum Description Length (MDL)
inference, which balances data-fitting accuracy and model complexity for a given few-shot task,
subject to supervision constraints from the support set. The model-complexity term can be viewed as
a continuous relaxation of a discrete label cost, which penalizes the number of non-empty clusters
in the solution, fitting the data of a given task with as few unique labels as necessary. It encourages
competition among a large set of possible classes, preserving only those that best fit the task.
Our formulation is hyper-parameter free, and could be applied on top of any base-class training.
Furthermore, we derive a fast primal-dual block coordinate descent algorithm for optimizing our
objective, with a convergence guarantee and linear computational complexity at each iteration
thanks to closed-form updates of the variables. We report comprehensive experiments and ablation
studies on mini-ImageNet, tiered-ImageNet, and the more realistic and challenging iNat dataset [23] for fine-grained classification, with 908 classes and 227 ways at test time. Our method yields competitive performance in comparison to the state of the art, with gaps increasing substantially with the number of possible classes in the tasks.
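To give a flavour of the MDL inference introduced above, a schematic objective over soft assignments u_{qk} of query samples to the K candidate classes might read as follows; this is a sketch written from the description in this section (with the discrete label cost that our formulation relaxes into a continuous term), not the exact objective of our method:

\[
\min_{u}\;\; -\sum_{q \in \mathcal{Q}} \sum_{k=1}^{K} u_{qk}\,\log p_k(\mathbf{x}_q)
\;+\; \Bigl|\bigl\{\, k \;:\; \textstyle\sum_{q \in \mathcal{Q}} u_{qk} > 0 \,\bigr\}\Bigr|
\quad \text{s.t.}\quad u_q \in \Delta_K \;\;\forall q \in \mathcal{Q}, \qquad u_s = y_s \;\;\forall s \in \mathcal{S},
\]

where the first term is the data-fitting cost under class models p_k, the second is a discrete label cost counting the non-empty classes, Delta_K denotes the probability simplex, S and Q the support and query sets, and y_s the one-hot label of support sample s. In an MDL interpretation, both terms can be read as code lengths, in line with the hyper-parameter-free nature of the formulation.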
⁴ Note that the transductive setting is different from semi-supervised few-shot learning [18], which uses extra data. The only difference between inductive and transductive inference is that predictions are made jointly for the query set of a task, rather than one sample at a time.