Controller-Guided Partial Label Consistency Regularization with Unlabeled Data
Qian-Wei Wang1,2, Bowen Zhao1, Mingyan Zhu1, Tianxiang Li1, Zimo Liu2, Shu-Tao Xia1, 2*
1Tsinghua Shenzhen International Graduate School, Tsinghua University
2Research Center of Artificial Intelligence, Peng Cheng Laboratory
{wanggw21, zbm18, zmy20, litx21}@mails.tsinghua.edu.cn, liuzm@pcl.ac.cn, xiast@sz.tsinghua.edu.cn
Abstract
Partial label learning (PLL) learns from training examples
each associated with multiple candidate labels, among which
only one is valid. In recent years, benefiting from the strong
capability of dealing with ambiguous supervision and the
impetus of modern data augmentation methods, consistency
regularization-based PLL methods have achieved a series of
successes and become mainstream. However, as the partial
annotation becomes insufficient, their performances drop sig-
nificantly. In this paper, we leverage easily accessible un-
labeled examples to facilitate the partial label consistency
regularization. In addition to a partial supervised loss, our
method performs a controller-guided consistency regulariza-
tion at both the label-level and representation-level with the
help of unlabeled data. To minimize the disadvantages of in-
sufficient capabilities of the initial supervised model, we use
the controller to estimate the confidence of each current pre-
diction to guide the subsequent consistency regularization.
Furthermore, we dynamically adjust the confidence thresh-
olds so that the number of samples of each class participating
in consistency regularization remains roughly equal to alle-
viate the problem of class-imbalance. Experiments show that
our method achieves satisfactory performances in more prac-
tical situations, and its modules can be applied to existing
PLL methods to enhance their capabilities.
Introduction
In real-world applications, data with a unique and correct label is often too costly to obtain (Zhou 2018; Li, Guo, and Zhou
2019; Wang, Yang, and Li 2020; Liu et al. 2023; Chen et al.
2023). Instead, users with varying knowledge and cultural
backgrounds tend to annotate the same image with differ-
ent labels. The traditional supervised learning framework, based on the "one instance, one label" assumption, cannot cope with such ambiguous examples, while partial label learning (PLL) provides an effective solution. Conceptually speaking, PLL (Hüllermeier and Beringer 2006; Nguyen and Caruana 2008; Cour, Sapp, and Taskar 2011; Zhang and Yu 2015; Yu and Zhang 2017; Feng et al. 2020; Lyu et al. 2021; Lv et al. 2020; Wen et al. 2021; Wang et al.
2023; Shi et al. 2023) learns from ambiguous labeling information where each training example is associated with a set of candidate labels, among which only one is assumed to be valid. The key to accomplishing this task is to find the correct correspondence between each training example and its ground-truth label within the ambiguous candidate collections, i.e., disambiguation. In recent years, partial label learning has manifested its capability in a wide range of applications such as multimedia content analysis (Zeng et al. 2013; Chen, Patel, and Chellappa 2018; Xie and Huang 2018), web mining (Jie and Orabona 2010), ecoinformatics (Liu and Dietterich 2012), etc.

*Corresponding author.
Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: The performances of ConCont (with additional unlabeled examples), DPLL, and PiCO with 100%, 50%, 20%, and 10% partially annotated examples on the CIFAR-10 dataset.
With the explosive growth of PLL research in the deep learning paradigm, consistency-regularized disambiguation-based
methods (Wang et al. 2022b; Wu, Wang, and Zhang 2022;
Wang et al. 2022c; Li et al. 2023; Xia et al. 2023) have
achieved significantly better results than other solutions, and
have gradually become the mainstream. Such methods usu-
ally perturb the samples in the feature space without chang-
ing their label semantics, and then use various methods in
the label space or representation space to make the outputs
of different variants consistent. Modern data augmentation
methods (Cubuk et al. 2019a,b; DeVries and Taylor 2017)
further improve their performances.
However, their performances are achieved when the num-
ber of partially labeled examples is sufficient. In real-world
applications, PLL is usually in a scenario where the label-
ing resources are constrained, and the adequacy of partial
annotation is not guaranteed. Under this circumstance, existing consistency regularization-based methods often fail to achieve satisfactory performances. As shown in Figure 1, DPLL (Wu, Wang, and Zhang 2022) and PiCO (Wang et al. 2022b) achieve state-of-the-art performances when using the complete partial training set, but as the proportion of partial examples decreases, their accuracies drop significantly.
The reason behind this phenomenon is that when partial labels are scarce and inherently ambiguous, there is not enough supervision to guide the initial supervised learning of the model, which causes the consistency regularization to converge in the wrong direction and gives rise to problems such as overfitting and class imbalance.
Witnessing the enormous power of unlabeled examples
via consistency regularization (Sohn et al. 2020; Berthelot
et al. 2020; Xie et al. 2020), we hope to facilitate par-
tial label consistency regularization through these readily
available data. To this end, an effective framework needs to
be designed to maximize the potential of partial and unla-
beled examples, as well as reasonable mechanisms to guide
the model when supervision information is scarce and am-
biguous. In this paper, we propose consistency regulariza-
tion with controller (abbreviated as ConCont). Our method
learns from the supervised information in the training targets
(i.e., candidate label sets) via a supervised loss, while per-
forming controller-guided consistency regularization at both
label- and representation-levels with the help of unlabeled
data. To avoid negative regularization, the controller divides
the examples as confident or unconfident according to the
prior information and the learning state of the model, and
applies different label- and representation-level consistency
regularization strategies, respectively. Furthermore, we dy-
namically adjust the confidence thresholds so that the num-
ber of samples of each class participating in consistency reg-
ularization remains roughly equal to alleviate the problem of
class-imbalance.
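To make this dynamic thresholding concrete, a possible quantile-based realization is sketched below; the function name, the keep_ratio parameter, and the quantile rule are illustrative assumptions rather than the exact mechanism specified later in the paper.

```python
import torch

def class_balanced_thresholds(pscores, pseudo_labels, num_classes, keep_ratio=0.5):
    """Per-class confidence thresholds chosen so that roughly the same number of
    examples of every class passes into consistency regularization
    (an assumed quantile-based realization, not the paper's exact rule)."""
    thresholds = torch.zeros(num_classes)
    for c in range(num_classes):
        scores_c = pscores[pseudo_labels == c]
        if scores_c.numel() == 0:
            thresholds[c] = float("inf")   # no predictions for this class yet
        else:
            # keep the most confident `keep_ratio` fraction of this class's examples
            thresholds[c] = torch.quantile(scores_c, 1.0 - keep_ratio)
    return thresholds
```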
Related Work
Traditional PLL methods can be divided into two categories:
averaging-based (Hüllermeier and Beringer 2006; Cour,
Sapp, and Taskar 2011; Zhang, Zhou, and Liu 2016) and
identification-based (Jin and Ghahramani 2002; Liu and Di-
etterich 2012; Feng and An 2019; Ni et al. 2021). Averaging-
based methods treat all the candidate labels equally, while
identification-based methods aim at identifying the ground-
truth label directly from candidate label set. With the pop-
ularity of deep neural networks, PLL has been increasingly
studied in the deep learning paradigm. Yao et al. (2020) made the first attempt with an entropy-based regularizer that enhances discrimination. Lv et al. (2020) proposed a classifier-consistent risk estimator for partial examples that theoretically converges to the optimal point learned from its fully supervised counterpart under mild conditions, as well as an effective method that progressively identifies ground-truth labels from the candidate sets. Wen et al. (2021) proposed a family of loss functions, named leveraged weighted loss, that takes the trade-off between losses on partial labels and non-partial labels into consideration, advancing the former method to a more generalized case. Xu et al. (2021) considered the learning case where the candidate labels are generated in an instance-dependent manner.
Recently, consistency regularization-based PLL methods
have achieved impressive results, among which two repre-
sentatives are: PiCO (Wang et al. 2022b) and DPLL (Wu,
Wang, and Zhang 2022). They can be seen as perform-
ing consistency regularization at the representation-level and
label-level, respectively. To be specific, PiCO aligns the representations of the augmented variants of samples belonging to the same class, calculates a representation prototype for each class, and then disambiguates the label distribution of each sample according to the distance between the sample representation and each class prototype, forming an iterative EM-like optimization. DPLL, in contrast, aligns the output label distributions of multiple augmented variants with a conformal distribution that summarizes the label distributions of all augmentations. Despite achieving state-of-the-art performances on fully partially-annotated datasets, their consistency regularization relies heavily on the sufficiency of partial annotations, which greatly limits their applications.
Our work is also related to semi-supervised PLL (Wang, Li, and Zhou 2019; Wang and Zhang 2020). Despite the similar learning scenario, previous semi-supervised PLL methods are all based on nearest-neighbor or linear classifiers and have not been integrated with modern consistency-regularized deep learning, so they differ greatly from our method in terms of algorithm implementation.
Methodology
Notations
Let $\mathcal{X} \subseteq \mathbb{R}^d$ be the input feature space and $\mathcal{Y} = \{1, 2, \ldots, C\}$ denote the label space. We attempt to induce a multi-class classifier $f: \mathcal{X} \mapsto [0,1]^C$ from a partial label training set $\mathcal{D}_p = \{(x^i, \mathbf{y}^i) \mid 1 \le i \le p\}$ and an additional unlabeled set $\mathcal{D}_u = \{x^i \mid p+1 \le i \le p+u\}$. Here, $\mathbf{y}^i = (y^i_1, y^i_2, \ldots, y^i_C) \in \{0,1\}^C$ is the partial label, in which $y^i_j = 1$ indicates that the $j$-th label belongs to the candidate set of sample $x^i$, and vice versa. Following the basic assumption of PLL, the ground-truth label $\ell^i \in \mathcal{Y}$ is inaccessible to the model while satisfying $y^i_{\ell^i} = 1$. In order to facilitate the unified operation of partial and unlabeled examples, the target vector of an unlabeled example is represented as $(1, 1, \ldots, 1)$, i.e., its candidate set equals $\mathcal{Y}$, containing no label information. For the classifier $f$, we use $f_j(x)$ to denote the output of classifier $f$ on label $j$ given input $x$.
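For concreteness, a minimal sketch of this unified target encoding in PyTorch; the helper name encode_targets and the example values are illustrative, not taken from the paper (class indices are 0-based in code).

```python
import torch

def encode_targets(candidate_sets, num_classes, num_unlabeled):
    """Build the unified C-dimensional target vectors described above."""
    # Partial examples: y_j = 1 iff label j is in the candidate set.
    partial = torch.zeros(len(candidate_sets), num_classes)
    for i, candidates in enumerate(candidate_sets):
        partial[i, list(candidates)] = 1.0
    # Unlabeled examples: all-ones target, i.e. the candidate set equals Y.
    unlabeled = torch.ones(num_unlabeled, num_classes)
    return torch.cat([partial, unlabeled], dim=0)

# Example with C = 5: two partial examples and one unlabeled example.
targets = encode_targets([{0, 2}, {1, 3, 4}], num_classes=5, num_unlabeled=1)
# tensor([[1., 0., 1., 0., 0.],
#         [0., 1., 0., 1., 1.],
#         [1., 1., 1., 1., 1.]])
```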
Consistency Regularization with Controller
Briefly, our method learns from the supervised information
in the training targets (i.e., candidate label sets) via a super-
vised loss, while performing controller-guided consistency
regularization at both label- and representation-level with
the help of unlabeled data. The consistency regularization
here works by aligning the outputs of different augmented
variants of each example. To prevent the model from falling into poor convergence due to the ambiguity and sparsity of supervision, we design a controller that divides the examples according to the prior information and the learning state of the model, in order to guide the subsequent consistency regularization.

Figure 2: Partial cross-entropy loss transforms the multi-class predicted distribution into a binary one.
Due to the inaccessibility of the unique ground-truth label, it is infeasible to train the neural network by minimizing the cross-entropy loss between the model prediction and the training target as usual. To overcome this difficulty, our method transforms the original multi-class classification into a binary classification according to the implication of the partial label, i.e., the input should belong to one of the candidate classes. In detail, our method aggregates the predicted probabilities over candidate labels and non-candidate labels respectively (see Figure 2), forming a binary distribution over class candidate and class non-candidate. Then, the instance is classified into class candidate with a simple binary cross-entropy loss.
Given an instance $x$ and its training target $\mathbf{y}$, the partial cross-entropy loss can be expressed as:
$$\mathcal{L}_{\mathrm{part}} = -\log \sum_{j=1}^{C} y_j f_j(x). \qquad (1)$$
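A minimal sketch of Eq. (1), assuming `probs` holds the softmax outputs $f(x)$ and `targets` the binary candidate vectors defined above; the epsilon guard and function name are illustrative.

```python
import torch

def partial_cross_entropy(probs, targets, eps=1e-12):
    """L_part = -log(predicted probability mass on the candidate labels).

    probs:   (B, C) softmax outputs f(x)
    targets: (B, C) binary candidate-set vectors y
    """
    candidate_mass = (probs * targets).sum(dim=1)        # sum_j y_j * f_j(x)
    return -torch.log(candidate_mass.clamp_min(eps)).mean()
```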
We encourage the model’s output to remain invariant to
perturbations applied to input images that do not change
the label semantics. With the guidance of the controller, our method minimizes the divergence of the outputs from a pair of different data augmentations: $(\mathrm{Aug}_w(\cdot), \mathrm{Aug}_s(\cdot))$ at the label-level and $(\mathrm{Aug}_s(\cdot), \mathrm{Aug}'_s(\cdot))$ at the representation-level (shown in Figure 5), where $\mathrm{Aug}_w(\cdot)$ denotes the weak augmentation, and $\mathrm{Aug}_s(\cdot)$ and $\mathrm{Aug}'_s(\cdot)$ denote two different strong augmentations. The reason for this design is that the predicted label distribution from the weakly-augmented variant is more reliable and is therefore used to generate the pseudo-label of the input example for subsequent consistency regularization, while previous studies (He et al. 2020; Chen et al. 2020) have shown that aligning the representations of more diverse augmented variants is more conducive to improving the model's ability. Moreover, we also explored the "two-branch" framework, i.e., performing label- and representation-level consistency regularization on the same pair of augmentations $(\mathrm{Aug}_w(\cdot), \mathrm{Aug}_s(\cdot))$. However, the representation learning of this approach is less effective; meanwhile, sharing the same pair of augmentations may lead to overfitting. Experiments also confirmed that its performance is not as good as that of the adopted "three-branch" version.
Figure 3: The overall framework of ConCont. Dotted lines
represent sharing backbone.
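As an illustration of the three-branch layout (one weak and two strong views per image), a hedged sketch with common augmentation choices (flip-and-crop for the weak branch, RandAugment for the strong branch); these concrete transforms are assumptions and may differ from the paper's configuration.

```python
from torchvision import transforms

# Illustrative augmentations for 32x32 images (e.g. CIFAR-10).
weak_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])
strong_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.RandAugment(),   # assumed strong augmentation
    transforms.ToTensor(),
])

def three_views(pil_image):
    """Return (weak, strong, strong') views of one image.

    (weak, strong)    -> label-level consistency (pseudo-label from the weak view)
    (strong, strong') -> representation-level consistency
    """
    return weak_aug(pil_image), strong_aug(pil_image), strong_aug(pil_image)
```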
Let $\tilde{\mathbf{y}}^i = f(\mathrm{Aug}_w(x^i))$ be the predicted label distribution of the weakly-augmented variant. We predict the pseudo-label of example $(x^i, \mathbf{y}^i)$ by selecting the candidate class with the maximum predicted probability: $\hat{y}^i = \arg\max(\tilde{\mathbf{y}}^i \odot \mathbf{y}^i)$, where $\odot$ denotes vector point-wise multiplication. To estimate how confident the model is about the example, the controller computes a partial confidence score (p-score for short) $s^i$ and, based on it, classifies the example as confident or not yet confident. Given the partial label $\mathbf{y}$ and the predicted label distribution $\tilde{\mathbf{y}}$, the p-score for example $(x, \mathbf{y})$ consists of the following three terms (for the sake of simplicity, we omit the superscript $i$):
• Label information: the prior label information from the provided training target. The fewer the candidate classes, the greater the amount of information, and vice versa. This term is calculated as $p_1 = \frac{1}{\sum_{j=1}^{C} y_j}$.

• Candidate margin: measures the prominence in probability of the pseudo-label over the other candidate classes. We think that the probability margin is more suitable for measuring confidence than simply the predicted probability of the class. For example, for an instance with 5 candidate classes, the model is obviously more confident in $\tilde{\mathbf{y}} = (0.5, 0.2, 0.1, 0.1, 0.1)$ than in $\tilde{\mathbf{y}} = (0.5, 0.49, 0.01, 0.0, 0.0)$. This term is calculated as $p_2 = \frac{\max(\tilde{\mathbf{y}} \odot \mathbf{y}, 1) - \max(\tilde{\mathbf{y}} \odot \mathbf{y}, 2)}{\sum_{j=1}^{C} \tilde{y}_j y_j}$, where $\max(\cdot, k)$ returns the $k$-th largest value of the input vector.

• Supervised learning state: the average predicted probability over non-candidate classes, measuring the current learning state of the model under partial supervision, written as $p_3 = \frac{1 - \sum_{j=1}^{C} \tilde{y}_j y_j}{C - \sum_{j=1}^{C} y_j}$.
The overall p-score is computed as the direct sum of the three terms:
$$p = p_1 + p_2 + p_3. \qquad (2)$$
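Putting the pseudo-label rule and Eq. (2) together, a minimal sketch; tensor shapes, epsilon guards, and the function name are illustrative assumptions.

```python
import torch

def pseudo_labels_and_pscores(probs, targets, eps=1e-12):
    """Pseudo-labels and partial confidence scores (p-scores).

    probs:   (B, C) predictions on the weakly-augmented views, f(Aug_w(x))
    targets: (B, C) binary candidate-set vectors (all ones for unlabeled data)
    """
    masked = probs * targets                      # keep probability mass on candidates only
    pseudo = masked.argmax(dim=1)                 # pseudo-label = most probable candidate class

    num_cand = targets.sum(dim=1)                 # |candidate set|
    p1 = 1.0 / num_cand                           # label information

    top2 = masked.topk(2, dim=1).values           # two largest candidate probabilities
    p2 = (top2[:, 0] - top2[:, 1]) / masked.sum(dim=1).clamp_min(eps)   # candidate margin

    # supervised learning state: average predicted probability on non-candidate classes
    p3 = (1.0 - masked.sum(dim=1)).clamp_min(0.0) / (targets.size(1) - num_cand).clamp_min(eps)

    return pseudo, p1 + p2 + p3
```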
The Precision-Recall (P-R) curves of classifying confi-
dent or unconfident examples using p-scores and traditional
maximum probabilities are shown in Figure 4. It can be
seen that although the curve of p-score has some fluctua-
tions when recall is small, when recall is greater than 60%