Controller-Guided Partial Label Consistency Regularization with Unlabeled Data
Qian-Wei Wang1,2, Bowen Zhao1, Mingyan Zhu1, Tianxiang Li1, Zimo Liu2, Shu-Tao Xia1, 2*
1Tsinghua Shenzhen International Graduate School, Tsinghua University
2Research Center of Artificial Intelligence, Peng Cheng Laboratory
{wanggw21, zbm18, zmy20, litx21}@mails.tsinghua.edu.cn, liuzm@pcl.ac.cn, xiast@sz.tsinghua.edu.cn
Abstract
Partial label learning (PLL) learns from training examples
each associated with multiple candidate labels, among which
only one is valid. In recent years, benefiting from the strong
capability of dealing with ambiguous supervision and the
impetus of modern data augmentation methods, consistency
regularization-based PLL methods have achieved a series of
successes and become mainstream. However, as the partial
annotation becomes insufficient, their performances drop sig-
nificantly. In this paper, we leverage easily accessible un-
labeled examples to facilitate the partial label consistency
regularization. In addition to a partial supervised loss, our
method performs a controller-guided consistency regulariza-
tion at both the label-level and representation-level with the
help of unlabeled data. To minimize the disadvantages of in-
sufficient capabilities of the initial supervised model, we use
the controller to estimate the confidence of each current pre-
diction to guide the subsequent consistency regularization.
Furthermore, we dynamically adjust the confidence thresh-
olds so that the number of samples of each class participating
in consistency regularization remains roughly equal to alle-
viate the problem of class-imbalance. Experiments show that
our method achieves satisfactory performances in more prac-
tical situations, and its modules can be applied to existing
PLL methods to enhance their capabilities.
Introduction
In real-world applications, data with a unique and correct label is often too costly to obtain (Zhou 2018; Li, Guo, and Zhou
2019; Wang, Yang, and Li 2020; Liu et al. 2023; Chen et al.
2023). Instead, users with varying knowledge and cultural
backgrounds tend to annotate the same image with differ-
ent labels. The traditional supervised learning framework, based on the "one instance, one label" assumption, cannot cope with such ambiguous examples, while partial label learning (PLL) provides an effective solution. Conceptually speaking, PLL (Hüllermeier and Beringer 2006; Nguyen and Caruana 2008; Cour, Sapp, and Taskar 2011; Zhang and Yu 2015; Yu and Zhang 2017; Feng et al. 2020; Lyu et al. 2021; Lv et al. 2020; Wen et al. 2021; Wang et al.
2023; Shi et al. 2023) learns from ambiguous labeling information where each training example is associated with a set of candidate labels, among which only one is assumed to be valid. The key to accomplishing this task is to find the correct correspondence between each training example and its ground-truth label within the ambiguous candidate collections, i.e., disambiguation. In recent years, partial label learning has manifested its capability in a wide range of applications such as multimedia content analysis (Zeng et al. 2013; Chen, Patel, and Chellappa 2018; Xie and Huang 2018), web mining (Jie and Orabona 2010), ecoinformatics (Liu and Dietterich 2012), etc.

*Corresponding author.
Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: The performances of ConCont (with additional unlabeled examples), DPLL, and PiCO with 100%, 50%, 20%, and 10% partially annotated examples on the CIFAR-10 dataset.
With the explosive growth of PLL research in the deep learning paradigm, consistency-regularized disambiguation-based
methods (Wang et al. 2022b; Wu, Wang, and Zhang 2022;
Wang et al. 2022c; Li et al. 2023; Xia et al. 2023) have
achieved significantly better results than other solutions, and
have gradually become the mainstream. Such methods usu-
ally perturb the samples in the feature space without chang-
ing their label semantics, and then use various methods in
the label space or representation space to make the outputs
of different variants consistent. Modern data augmentation
methods (Cubuk et al. 2019a,b; DeVries and Taylor 2017)
further improve their performances.
However, their performances are achieved when the num-
ber of partially labeled examples is sufficient. In real-world
applications, PLL is usually in a scenario where the label-
ing resources are constrained, and the adequacy of partial
annotation is not guaranteed. Under this circumstance, existing consistency regularization-based methods often fail to achieve satisfactory performances. As shown in Figure 1, DPLL (Wu, Wang, and Zhang 2022) and PiCO (Wang et al. 2022b) achieve state-of-the-art performances when using the complete partial training set, but as the proportion of partial examples decreases, their accuracies drop significantly.
The reason behind this phenomenon is that when partial labels are scarce and inherently ambiguous, there is not enough supervision to guide the initial supervised learning of the model, which causes the consistency regularization to converge in the wrong direction and gives rise to problems such as overfitting and class imbalance.
Witnessing the enormous power of unlabeled examples
via consistency regularization (Sohn et al. 2020; Berthelot
et al. 2020; Xie et al. 2020), we hope to facilitate par-
tial label consistency regularization through these readily
available data. To this end, an effective framework needs to
be designed to maximize the potential of partial and unla-
beled examples, as well as reasonable mechanisms to guide
the model when supervision information is scarce and am-
biguous. In this paper, we propose consistency regulariza-
tion with controller (abbreviated as ConCont). Our method
learns from the supervised information in the training targets
(i.e., candidate label sets) via a supervised loss, while per-
forming controller-guided consistency regularization at both
label- and representation-levels with the help of unlabeled
data. To avoid negative regularization, the controller divides
the examples as confident or unconfident according to the
prior information and the learning state of the model, and
applies different label- and representation-level consistency
regularization strategies, respectively. Furthermore, we dy-
namically adjust the confidence thresholds so that the num-
ber of samples of each class participating in consistency reg-
ularization remains roughly equal to alleviate the problem of
class-imbalance.
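To make this dynamic thresholding concrete, a possible quantile-based realization is sketched below; the function name, the keep_ratio parameter, and the quantile rule are illustrative assumptions rather than the exact mechanism specified later in the paper.

```python
import torch

def class_balanced_thresholds(pscores, pseudo_labels, num_classes, keep_ratio=0.5):
    """Per-class confidence thresholds chosen so that roughly the same number of
    examples of every class passes into consistency regularization
    (an assumed quantile-based realization, not the paper's exact rule)."""
    thresholds = torch.zeros(num_classes)
    for c in range(num_classes):
        scores_c = pscores[pseudo_labels == c]
        if scores_c.numel() == 0:
            thresholds[c] = float("inf")   # no predictions for this class yet
        else:
            # keep the most confident `keep_ratio` fraction of this class's examples
            thresholds[c] = torch.quantile(scores_c, 1.0 - keep_ratio)
    return thresholds
```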
Related Work
Traditional PLL methods can be divided into two categories:
averaging-based (Hüllermeier and Beringer 2006; Cour,
Sapp, and Taskar 2011; Zhang, Zhou, and Liu 2016) and
identification-based (Jin and Ghahramani 2002; Liu and Di-
etterich 2012; Feng and An 2019; Ni et al. 2021). Averaging-
based methods treat all the candidate labels equally, while
identification-based methods aim at identifying the ground-
truth label directly from candidate label set. With the pop-
ularity of deep neural networks, PLL has been increasingly
studied in the deep learning paradigm. Yao et al. (2020) made the first attempt with an entropy-based regularizer that enhances discrimination. Lv et al. (2020) proposed a classifier-consistent risk estimator for partial examples that theoretically converges to the optimal point learned from its fully supervised counterpart under mild conditions, as well as an effective method that progressively identifies ground-truth labels from the candidate sets. Wen et al. (2021) proposed a family of loss functions, named leveraged weighted loss, that takes the trade-off between losses on partial labels and non-partial labels into consideration, advancing the former method to a more generalized case. Xu et al. (2021) considered the learning case where the candidate labels are generated in an instance-dependent manner.
Recently, consistency regularization-based PLL methods
have achieved impressive results, among which two repre-
sentatives are: PiCO (Wang et al. 2022b) and DPLL (Wu,
Wang, and Zhang 2022). They can be seen as perform-
ing consistency regularization at the representation-level and
label-level, respectively. To be specific, PiCO aligns the representations of the augmented variants of samples belonging to the same class, calculates a representation prototype for each class, and then disambiguates the label distribution of each sample according to the distance between the sample representation and each class prototype, forming an iterative EM-like optimization. DPLL, in contrast, aligns the output label distributions of multiple augmented variants with a conformal distribution that summarizes the label distributions of all augmentations. Despite achieving state-of-the-art performances on fully partially-annotated datasets, their consistency regularization relies heavily on the sufficiency of partial annotations, which greatly limits their applications.
Our work is also related to semi-supervised PLL (Wang, Li, and Zhou 2019; Wang and Zhang 2020). Despite the similar learning scenario, previous semi-supervised PLL methods are all based on nearest-neighbor or linear classifiers and have not been integrated with modern consistency-regularized deep learning, so they differ greatly from our method in terms of algorithm implementation.
Methodology
Notations
Let $\mathcal{X} \subseteq \mathbb{R}^d$ be the input feature space and $\mathcal{Y} = \{1, 2, \ldots, C\}$ denote the label space. We attempt to induce a multi-class classifier $f: \mathcal{X} \mapsto [0,1]^C$ from a partial label training set $\mathcal{D}_p = \{(x^i, \mathbf{y}^i) \mid 1 \le i \le p\}$ and an additional unlabeled set $\mathcal{D}_u = \{x^i \mid p+1 \le i \le p+u\}$. Here, $\mathbf{y}^i = (y^i_1, y^i_2, \ldots, y^i_C) \in \{0,1\}^C$ is the partial label, in which $y^i_j = 1$ indicates that the $j$-th label belongs to the candidate set of sample $x^i$, and vice versa. Following the basic assumption of PLL, the ground-truth label $\ell^i \in \mathcal{Y}$ is inaccessible to the model while satisfying $y^i_{\ell^i} = 1$. In order to facilitate the unified operation of partial and unlabeled examples, the target vector of an unlabeled example is represented as $(1, 1, \ldots, 1)$, i.e., its candidate set equals $\mathcal{Y}$, containing no label information. For the classifier $f$, we use $f_j(x)$ to denote the output of classifier $f$ on label $j$ given input $x$.
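For concreteness, a minimal sketch of this unified target encoding in PyTorch; the helper name encode_targets and the example values are illustrative, not taken from the paper (class indices are 0-based in code).

```python
import torch

def encode_targets(candidate_sets, num_classes, num_unlabeled):
    """Build the unified C-dimensional target vectors described above."""
    # Partial examples: y_j = 1 iff label j is in the candidate set.
    partial = torch.zeros(len(candidate_sets), num_classes)
    for i, candidates in enumerate(candidate_sets):
        partial[i, list(candidates)] = 1.0
    # Unlabeled examples: all-ones target, i.e. the candidate set equals Y.
    unlabeled = torch.ones(num_unlabeled, num_classes)
    return torch.cat([partial, unlabeled], dim=0)

# Example with C = 5: two partial examples and one unlabeled example.
targets = encode_targets([{0, 2}, {1, 3, 4}], num_classes=5, num_unlabeled=1)
# tensor([[1., 0., 1., 0., 0.],
#         [0., 1., 0., 1., 1.],
#         [1., 1., 1., 1., 1.]])
```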
Consistency Regularization with Controller
Briefly, our method learns from the supervised information
in the training targets (i.e., candidate label sets) via a super-
vised loss, while performing controller-guided consistency
regularization at both label- and representation-level with
the help of unlabeled data. The consistency regularization
here works by aligning the outputs of different augmented
variants of each example. To prevent the model from falling into poor convergence due to the ambiguity and sparsity of supervision, we design a controller that divides the examples according to the prior information and the learning state of the model, in order to guide the subsequent consistency regularization.

Figure 2: Partial cross-entropy loss transforms the multi-class predicted distribution into a binary one.
Due to the inaccessibility of the unique ground-truth label, it is infeasible to train the neural network by minimizing the cross-entropy loss between the model prediction and the training target as usual. To overcome this difficulty, our method transforms the original multi-class classification into a binary classification according to the implication of the partial label, i.e., the input should belong to one of the candidate classes. In detail, our method aggregates the predicted probabilities over candidate labels and non-candidate labels respectively (see Figure 2), forming a binary distribution over class candidate and class non-candidate. Then, the instance is classified into class candidate with a simple binary cross-entropy loss.
Given an instance $x$ and its training target $\mathbf{y}$, the partial cross-entropy loss can be expressed as:
$$\mathcal{L}_{\mathrm{part}} = -\log \sum_{j=1}^{C} y_j f_j(x). \qquad (1)$$
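A minimal sketch of Eq. (1), assuming `probs` holds the softmax outputs $f(x)$ and `targets` the binary candidate vectors defined above; the epsilon guard and function name are illustrative.

```python
import torch

def partial_cross_entropy(probs, targets, eps=1e-12):
    """L_part = -log(predicted probability mass on the candidate labels).

    probs:   (B, C) softmax outputs f(x)
    targets: (B, C) binary candidate-set vectors y
    """
    candidate_mass = (probs * targets).sum(dim=1)        # sum_j y_j * f_j(x)
    return -torch.log(candidate_mass.clamp_min(eps)).mean()
```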
We encourage the model’s output to remain invariant to
perturbations applied to input images that do not change
the label semantics. With the guidance of the controller, our method minimizes the divergence of the outputs from a pair of different data augmentations: $(\mathrm{Aug}_w(\cdot), \mathrm{Aug}_s(\cdot))$ at the label-level and $(\mathrm{Aug}_s(\cdot), \mathrm{Aug}'_s(\cdot))$ at the representation-level (shown in Figure 5), where $\mathrm{Aug}_w(\cdot)$ denotes the weak augmentation, and $\mathrm{Aug}_s(\cdot)$ and $\mathrm{Aug}'_s(\cdot)$ denote two different strong augmentations. The reason for this design is that the predicted label distribution from the weakly-augmented variant is more reliable and is therefore used to generate the pseudo-label of the input example for subsequent consistency regularization, while previous studies (He et al. 2020; Chen et al. 2020) have shown that aligning the representations of more diverse augmented variants is more conducive to improving the model's ability. Moreover, we also explored the "two-branch" framework, i.e., performing label- and representation-level consistency regularization on the same pair of augmentations $(\mathrm{Aug}_w(\cdot), \mathrm{Aug}_s(\cdot))$. However, the representation learning of this approach is less effective; meanwhile, sharing the same pair of augmentations may lead to overfitting. Experiments also confirmed that its performance is not as good as that of the adopted "three-branch" version.
Figure 3: The overall framework of ConCont. Dotted lines
represent sharing backbone.
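As an illustration of the three-branch layout (one weak and two strong views per image), a hedged sketch with common augmentation choices (flip-and-crop for the weak branch, RandAugment for the strong branch); these concrete transforms are assumptions and may differ from the paper's configuration.

```python
from torchvision import transforms

# Illustrative augmentations for 32x32 images (e.g. CIFAR-10).
weak_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])
strong_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.RandAugment(),   # assumed strong augmentation
    transforms.ToTensor(),
])

def three_views(pil_image):
    """Return (weak, strong, strong') views of one image.

    (weak, strong)    -> label-level consistency (pseudo-label from the weak view)
    (strong, strong') -> representation-level consistency
    """
    return weak_aug(pil_image), strong_aug(pil_image), strong_aug(pil_image)
```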
Let $\tilde{\mathbf{y}}^i = f(\mathrm{Aug}_w(x^i))$ be the predicted label distribution of the weakly-augmented variant. We predict the pseudo-label of example $(x^i, \mathbf{y}^i)$ by selecting the candidate class with the maximum predicted probability: $\hat{y}^i = \arg\max(\tilde{\mathbf{y}}^i \odot \mathbf{y}^i)$, where $\odot$ denotes vector point-wise multiplication. To estimate how confident the model is about the example, the controller computes a partial confidence score (p-score for short) $s^i$ and, based on it, classifies the example as confident or not yet confident. Given the partial label $\mathbf{y}$ and the predicted label distribution $\tilde{\mathbf{y}}$, the p-score for example $(x, \mathbf{y})$ consists of the following three terms (for the sake of simplicity, we omit the superscript $i$):
• Label information: the prior label information from the provided training target. The fewer the candidate classes, the greater the amount of information, and vice versa. This term is calculated as $p_1 = \frac{1}{\sum_{j=1}^{C} y_j}$.

• Candidate margin: measures the prominence in probability of the pseudo-label over the other candidate classes. We think that the probability margin is more suitable for measuring confidence than simply the predicted probability of the class. For example, for an instance with 5 candidate classes, the model is obviously more confident in $\tilde{\mathbf{y}} = (0.5, 0.2, 0.1, 0.1, 0.1)$ than in $\tilde{\mathbf{y}} = (0.5, 0.49, 0.01, 0.0, 0.0)$. This term is calculated as $p_2 = \frac{\max(\tilde{\mathbf{y}} \odot \mathbf{y}, 1) - \max(\tilde{\mathbf{y}} \odot \mathbf{y}, 2)}{\sum_{j=1}^{C} \tilde{y}_j y_j}$, where $\max(\cdot, k)$ returns the $k$-th largest value of the input vector.

• Supervised learning state: the average predicted probability over non-candidate classes, measuring the current learning state of the model under partial supervision, written as $p_3 = \frac{1 - \sum_{j=1}^{C} \tilde{y}_j y_j}{C - \sum_{j=1}^{C} y_j}$.
The overall p-score is computed as the direct sum of the three terms:
$$p = p_1 + p_2 + p_3. \qquad (2)$$
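Putting the pseudo-label rule and Eq. (2) together, a minimal sketch; tensor shapes, epsilon guards, and the function name are illustrative assumptions.

```python
import torch

def pseudo_labels_and_pscores(probs, targets, eps=1e-12):
    """Pseudo-labels and partial confidence scores (p-scores).

    probs:   (B, C) predictions on the weakly-augmented views, f(Aug_w(x))
    targets: (B, C) binary candidate-set vectors (all ones for unlabeled data)
    """
    masked = probs * targets                      # keep probability mass on candidates only
    pseudo = masked.argmax(dim=1)                 # pseudo-label = most probable candidate class

    num_cand = targets.sum(dim=1)                 # |candidate set|
    p1 = 1.0 / num_cand                           # label information

    top2 = masked.topk(2, dim=1).values           # two largest candidate probabilities
    p2 = (top2[:, 0] - top2[:, 1]) / masked.sum(dim=1).clamp_min(eps)   # candidate margin

    # supervised learning state: average predicted probability on non-candidate classes
    p3 = (1.0 - masked.sum(dim=1)).clamp_min(0.0) / (targets.size(1) - num_cand).clamp_min(eps)

    return pseudo, p1 + p2 + p3
```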
The Precision-Recall (P-R) curves of classifying confi-
dent or unconfident examples using p-scores and traditional
maximum probabilities are shown in Figure 4. It can be
seen that although the curve of p-score has some fluctua-
tions when recall is small, when recall is greater than 60%