Brief Introduction to Contrastive Learning Pretext
Tasks for Visual Representation
Zhenyuan Lu
Northeastern University, Jan. 2021
Abstract
To improve performance in visual feature representation from photos or videos for
practical applications, training deep neural networks generally requires large-scale
human-annotated data. However, gathering and annotating such data is expensive.
Given the abundance of unlabeled data in the real world, it is possible to introduce
self-defined pseudo labels as supervision to avoid this cost. Self-supervised learning,
and specifically contrastive learning, is a subset of unsupervised learning methods
that has grown popular in computer vision, natural language processing, and other
domains. The purpose of contrastive learning is to embed augmented views of the
same sample close to each other while pushing away embeddings of different samples.
In the following sections, we introduce the common formulations of different learning
paradigms. Furthermore, we present several recently published contrastive learning
strategies focused on pretext tasks for visual representation.
1 Introduction
Large-scale dataset collection and annotation are time-consuming and costly. To avoid this
expense, a number of self-supervised learning methods have recently been developed to learn
visual representations from massive collections of unlabeled photos or videos, without any
human annotation. One frequent way of learning such visual representations is to propose a
pretext task for the neural network to solve. Here, we focus on pretext tasks built on
contrastive learning.
Consider Robert Epstein’s experiment, in which participants were asked to draw a detailed
representation of a one-dollar bill (Figure 1). The left panel shows a dollar bill sketched
from memory; the right panel shows a drawing made while a dollar bill was present, which is
far more precise. The drawing produced from memory thus differs significantly from the
drawing produced with the target present (Epstein 2016). Yet however dissimilar these two
pictures are, they share common features such as George Washington’s portrait, the one-dollar
inscription, and others. Humans can readily recognize that both drawings depict the same
target: one dollar. But what if we ask a machine to guess whether they come from the same
image? Doing so requires a representation learned from positive sample pairs (a drawing and
a dollar bill) and negative sample pairs (a random other drawing and a dollar bill). This is
the concept of contrastive learning, which has lately been extended into a variety of
algorithms.
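The positive/negative pair comparison above can be sketched as a minimal InfoNCE-style contrastive loss. The toy embeddings, the temperature value, and the cosine-similarity scoring below are illustrative assumptions, not details from any particular published method:

```python
import numpy as np

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: pull the positive toward the anchor and push
    the negatives away, via softmax over cosine similarities."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # similarity of the anchor to the positive (index 0) and each negative
    sims = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits = sims / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])  # cross-entropy with the positive as the target

rng = np.random.default_rng(0)
dollar = rng.normal(size=8)                      # "the dollar bill"
drawing = dollar + 0.1 * rng.normal(size=8)      # a noisy view of the same target
others = [rng.normal(size=8) for _ in range(5)]  # random negative drawings
loss = contrastive_loss(drawing, dollar, others)
```

Because the drawing is a slightly perturbed view of the dollar, its loss is small; swapping in a random vector as the "positive" makes the loss jump, which is exactly the signal contrastive learning trains on.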
2 Formulations Among Different Learning Paradigms
The distinction between learning paradigms is primarily determined by the training labels.
There are four types of visual feature learning methods: (1) supervised learning, (2) semi-
supervised learning, (3) weakly supervised learning, and (4) unsupervised learning (e.g.
contrastive learning).
arXiv:2210.03163v1 [cs.CV] 6 Oct 2022
Figure 1: Left: drawing of a dollar bill from memory. Right: drawing subsequently made with
a dollar bill present. Image source: Epstein, 2016.
2.1 Supervised Learning
For supervised learning, the model is given a dataset $X = (x_1, x_2, \ldots, x_N)$
associated with manually annotated labels $Y_i$. The training loss function is defined as
follows:
\[
\text{loss}(D) = \min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \text{loss}(X_i, Y_i)
\]
where $D = \{X_i\}_{i=1}^{N}$ is the set of $N$ labeled training samples. The advantage of
training models with human-annotated labels is that they produce strong results in a variety
of computer vision applications (A. Krizhevsky 2012, R. Girshick 2014, D. Tran 2015,
J. Long 2015). However, label annotation is frequently extremely expensive, demanding
advanced professional skills and domain expertise. As a result, the other three learning
paradigms are now often preferred over supervised learning for lowering labeling costs.
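The averaged loss above can be written out directly. This is a minimal sketch: the linear model and squared-error loss are illustrative choices, not part of the formulation itself:

```python
import numpy as np

def supervised_loss(X, Y, predict, loss_fn):
    """(1/N) * sum_i loss(X_i, Y_i): average per-sample loss over labeled pairs."""
    return np.mean([loss_fn(predict(x), y) for x, y in zip(X, Y)])

# toy data: labels generated by y = 2x + 1
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = 2.0 * X + 1.0

predict = lambda x: 2.0 * x + 1.0          # a model that fits the labels exactly
squared_error = lambda p, y: (p - y) ** 2  # per-sample loss
```

Minimizing over the model parameters $\theta$ (here, the slope and intercept) is what the outer $\min_\theta$ denotes; the perfect model attains zero loss.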
2.2 Semi-supervised Learning
For semi-supervised learning, the model is given a small labeled dataset $X$ and a large
unlabeled dataset $Z$. The labeled dataset is associated with manually annotated labels
$Y_i$. The training loss function is defined as follows:
\[
\text{loss}(D_1, D_2) = \min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \text{loss}(X_i, Y_i)
+ \frac{1}{M} \sum_{i=1}^{M} \text{loss}(Z_i, R(Z_i, X))
\]
where $D_1 = \{X_i\}_{i=1}^{N}$ is the labeled training dataset of size $N$,
$D_2 = \{Z_i\}_{i=1}^{M}$ is the unlabeled training dataset of size $M$, and $R(Z_i, X)$
is a function that represents the relationship between an unlabeled sample and the labeled
training dataset.
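The two-term loss can be sketched as follows. The formulation leaves $R$ abstract; the nearest-neighbor pseudo-labeling below is one simple assumed choice of $R$, and the linear model and squared error are again illustrative:

```python
import numpy as np

def semi_supervised_loss(X, Y, Z, predict, loss_fn):
    """Labeled term (1/N)*sum loss(X_i, Y_i) plus unlabeled term
    (1/M)*sum loss(Z_i, R(Z_i, X))."""
    labeled = np.mean([loss_fn(predict(x), y) for x, y in zip(X, Y)])

    def R(z):
        # one possible R: pseudo-label z with its nearest labeled neighbor's label
        return Y[np.argmin(np.abs(X - z))]

    unlabeled = np.mean([loss_fn(predict(z), R(z)) for z in Z])
    return labeled + unlabeled

# toy data: labels from y = 2x, plus unlabeled points near the labeled ones
X = np.array([0.0, 1.0, 2.0])
Y = np.array([0.0, 2.0, 4.0])
Z = np.array([0.1, 1.9])
predict = lambda x: 2.0 * x
sq = lambda p, y: (p - y) ** 2
total = semi_supervised_loss(X, Y, Z, predict, sq)
```

The unlabeled term is small but nonzero here because the pseudo labels inherited from neighbors are only approximate, which is the usual trade-off of semi-supervised training.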
2.3 Weakly Supervised Learning
For weakly supervised learning, a dataset $X = (x_1, x_2, \ldots, x_N)$ is associated with
a collection of coarse-grained labels $C_i$. The training loss function is defined as
follows:
\[
\text{loss}(D) = \min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \text{loss}(X_i, C_i)
\]
where $D = \{X_i\}_{i=1}^{N}$ denotes the training dataset. In a weakly supervised setting,
a fine-grained label is substantially more expensive to obtain than a coarse-grained one, so
the advantage of weak supervision is that large-scale datasets are relatively easy to
gather. For example, image features learned from web images using hashtags as coarse-grained
labels were recently introduced (W. Li 2017, D. Mahajan and Y. Li 2018).
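Structurally the loss is the same average as in supervised learning, only with coarse targets. The hashtag vocabulary, the fine-to-coarse mapping, and the 0/1 loss below are invented for illustration and not taken from the cited works:

```python
import numpy as np

def weak_loss(X, C, predict_coarse, loss_fn):
    """Same averaged form as supervised learning, but the targets C_i are
    coarse-grained (e.g. hashtags) rather than fine-grained class labels."""
    return np.mean([loss_fn(predict_coarse(x), c) for x, c in zip(X, C)])

# illustrative: fine-grained breeds collapse to coarse hashtag labels
FINE_TO_COARSE = {"poodle": "#dog", "beagle": "#dog", "tabby": "#cat"}

zero_one = lambda pred, c: 0.0 if pred == c else 1.0  # 0/1 per-sample loss
X = ["img1", "img2", "img3"]
C = ["#dog", "#dog", "#cat"]
predict_coarse = lambda x: "#dog"  # a trivial model that always predicts #dog
```

The coarse labels cost little to collect (they come free with the uploaded images), which is why this averaged loss can be driven over web-scale datasets.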