Rethinking Rotation in Self-Supervised Contrastive Learning:
Adaptive Positive or Negative Data Augmentation
Atsuyuki Miyai1, Qing Yu1, Daiki Ikami2, Go Irie2, Kiyoharu Aizawa1
1 The University of Tokyo  2 NTT Corporation, Japan
{miyai,yu,aizawa}@hal.t.u-tokyo.ac.jp daiki-ikami@go.tuat.ac.jp goirie@ieee.org
Abstract
Rotation is frequently listed as a candidate for data augmentation in contrastive learning but seldom provides satisfactory improvements. We argue that this is because the rotated image is always treated as either positive or negative. The semantics of an image can be rotation-invariant or rotation-variant, so whether the rotated image is treated as positive or negative should be determined based on the content of the image. Therefore, we propose a novel augmentation strategy, adaptive Positive or Negative Data Augmentation (PNDA), in which an original image and its rotated image are a positive pair if they are semantically close and a negative pair if they are semantically different. To achieve PNDA, we first determine whether rotation is positive or negative on an image-by-image basis in an unsupervised way. Then, we apply PNDA to contrastive learning frameworks. Our experiments showed that PNDA improves the performance of contrastive learning. The code is available at https://github.com/AtsuMiyai/rethinking_rotation.
1. Introduction
Recently, self-supervised learning [28, 15, 19, 5, 18] has shown remarkable results in representation learning. The gap between self-supervised and supervised learning has been bridged by contrastive learning [19, 5, 17, 7, 2, 3]. For self-supervised contrastive learning, data augmentation is one of the most important techniques [33]. A common approach for contrastive learning creates positives with some augmentations and encourages them to be pulled closer. Since this augmentation strategy creates positive samples, we refer to it as positive data augmentation (PDA). In addition, some methods [31, 32, 14] use augmentation to create negatives and encourage them to be pushed away. This augmentation strategy is called negative data augmentation (NDA).
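To make the PDA/NDA distinction concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss in PyTorch; it is our own illustration, not the implementation of any of the cited works. Under PDA, an augmented (e.g. rotated) view enters as the positive; under NDA, it is appended to the negatives.

    import torch
    import torch.nn.functional as F

    def info_nce(anchor, positive, negatives, temperature=0.2):
        # anchor:    (d,)   embedding of the original view
        # positive:  (d,)   embedding of the view pulled closer (PDA)
        # negatives: (n, d) embeddings pushed away; NDA appends embeddings
        #            of transformed views of the anchor image to this set
        anchor = F.normalize(anchor, dim=0)
        positive = F.normalize(positive, dim=0)
        negatives = F.normalize(negatives, dim=1)
        pos = (anchor * positive).sum() / temperature        # similarity to the positive
        neg = negatives @ anchor / temperature               # similarities to negatives, (n,)
        logits = torch.cat([pos.view(1), neg]).unsqueeze(0)  # (1, n+1), target index 0
        return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))

Minimizing this loss pulls the anchor toward the positive and pushes it away from every negative, which is exactly the attract/repel behavior depicted in Fig. 1.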
Figure 1: Comparison of the previous and the proposed augmentation strategies. Upper: PDA treats all rotated images as positives and encourages them to be pulled closer. Middle: NDA treats all rotated images as negatives and encourages them to be pushed away. Lower: Our proposed PNDA considers the semantics of the images and treats rotation as either positive or negative for each image.
Rotation has been tried for these augmentations, but with little improvement. Although rotation is useful in various fields, Chen et al. [5] reported that rotation PDA degrades the representation ability in self-supervised contrastive learning because rotation largely affects image semantics. Since then, rotation has been treated as harmful for self-supervised contrastive learning. We consider that this is because previous approaches treated rotation as either always positive or always negative, without considering the semantics of each image.
To solve this problem and make full use of rotation, it is important to consider whether rotation affects the semantics of each image. Natural images can be divided into two classes: rotation-agnostic images (RAIs), which have an ambiguous orientation, and non-rotation-agnostic images (non-RAIs), which have a clear orientation. In an RAI, an object can have various orientations. By applying rotation PDA to RAIs and encouraging them to be pulled closer, the model obtains embedding features that are robust to rotation. In a non-RAI, on the other hand, the orientation of an object is limited. Applying rotation PDA to non-RAIs and pulling them closer makes the images lose orientation information and may yield undesirable features. For non-RAIs, it is preferable to treat rotation as negative in order to maintain orientation information.
Based on this observation, in this study, we introduce a novel augmentation strategy called adaptive Positive or Negative Data Augmentation (PNDA). Fig. 1 shows an overview of PDA, NDA, and PNDA. While PDA and NDA ignore the semantics of each image, our proposed PNDA takes them into account: it treats rotation as positive if the original and rotated images have the same semantics and as negative if their semantics are different. To achieve PNDA, we must extract the RAIs, for which rotation is treated as positive. However, no existing method determines whether an image is an RAI or a non-RAI. Thus, we also tackle the novel task of sampling RAIs and propose an entropy-based method. This sampling method exploits the difference in the difficulty of rotation prediction between RAIs and non-RAIs and extracts RAIs based on the entropy of a rotation predictor's output; a sketch of this idea follows below.
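As a rough illustration of this entropy-based sampling (a sketch under our own assumptions, not the released code), suppose `rotation_predictor` is a hypothetical network trained to classify the rotation of an image into {0°, 90°, 180°, 270°}, and `tau` is a hypothetical entropy threshold:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def sample_rai(images, rotation_predictor, tau=1.0):
        # images: (B, C, H, W); rotation_predictor outputs 4-way logits
        # over rotations {0, 90, 180, 270} degrees
        probs = F.softmax(rotation_predictor(images), dim=1)     # (B, 4)
        entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1)  # (B,)
        # High entropy: the rotation is hard to predict, i.e. the
        # orientation is ambiguous, so the image is flagged as an RAI
        # and its rotated views are treated as positives.
        return entropy > tau

The intuition is that a rotation predictor is confident (low entropy) on images with a clear canonical orientation (non-RAIs) and uncertain (high entropy) on orientation-ambiguous ones (RAIs).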
We evaluate rotation PNDA with contrastive learning frameworks such as MoCo v2 and SimCLR. Across several experiments, we show that the proposed rotation PNDA improves the performance of contrastive learning, whereas rotation PDA and NDA can degrade it.
The contributions of our paper are summarized as follows:
• We propose a novel augmentation strategy called PNDA that considers the semantics of the images and treats rotation as whichever of positive or negative is better for each image.
• We propose a new task of sampling rotation-agnostic images, for which rotation is treated as positive.
• We apply rotation PNDA to contrastive learning frameworks and find that it improves the performance of contrastive learning.
2. Related Work
2.1. Contrastive Learning
Contrastive learning has become one of the most successful approaches in self-supervised learning [19, 5, 17, 7, 3]. A popular formulation, used by MoCo [19] and SimCLR [5], creates two views of the same image and attracts them while repelling different images. Many studies have explored the choice of positives and negatives for MoCo and SimCLR [11, 40, 22]. Some methods, such as BYOL [17] and SimSiam [7], use only positives, but recent studies [14, 34] have shown that better representations can be learned by incorporating negatives into these methods. For contrastive learning, the use of positives and negatives is important for learning better representations.
2.2. Data Augmentation for Contrastive Learning
There are two main types of augmentation strategies for contrastive learning: positive data augmentation (PDA) and negative data augmentation (NDA).
2.2.1 Positive Data Augmentation (PDA)
Contrastive learning methods create positives with augmentations and pull them closer. For example, Chen et al. [5] proposed a composition of data augmentations, e.g., grayscale, random resized cropping, color jittering, and Gaussian blur, to make the model robust to these augmentations. On the other hand, they reported that adding rotation to these augmentations degrades performance. However, they used rotation PDA without considering the difference in semantic content between RAIs and non-RAIs. Some work has dealt with rotation in contrastive learning via residual relaxation [35] or in combination with rotation prediction [1, 9]. Our work instead focuses on the semantics of each rotated image. A sketch of such an augmentation pipeline is shown below.
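For concreteness, a SimCLR-style positive-view pipeline can be composed in torchvision as follows; the parameter values here are illustrative, not the exact ones from [5].

    from torchvision import transforms

    simclr_like = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
        transforms.RandomGrayscale(p=0.2),
        transforms.GaussianBlur(kernel_size=23),
        transforms.ToTensor(),
        # Naively adding rotation here, e.g. transforms.RandomRotation(90),
        # is the rotation PDA that [5] reported to degrade performance.
    ])

Each call to this pipeline yields one randomized view; applying it twice to the same image produces the positive pair used in SimCLR-style training.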
2.2.2 Negative Data Augmentation (NDA)
Several methods have been proposed to create negative samples by applying specific transformations to images [4, 32, 31]. Sinha et al. [31] investigated whether several augmentations, including CutMix [38] and Mixup [39], which are typically used as positives in supervised learning, can serve as NDA for representation learning; however, they did not argue that rotation NDA is effective. Tack et al. [32] stated that rotation NDA is effective for unsupervised out-of-distribution detection, but they, too, did not state that rotation NDA is effective for representation learning. These methods [4, 32, 31] treat the transformed images as negatives without considering the semantics of each image.
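As a sketch of how rotation NDA plugs into a contrastive objective (our own illustration, assuming a hypothetical `encoder` network and the `info_nce` helper sketched in Sec. 1), the rotated views of a batch are simply encoded and appended to the negative set:

    import torch

    def rotation_negatives(encoder, images):
        # images: (B, C, H, W); torch.rot90 rotates the spatial dimensions.
        # Each of the three non-identity rotations yields a batch of negatives.
        rotated = [torch.rot90(images, k, dims=(2, 3)) for k in (1, 2, 3)]
        return torch.cat([encoder(r) for r in rotated])  # (3B, d)

Under PNDA, this routing would be gated by the per-image RAI flag rather than applied uniformly: rotated views of RAIs go to the positive side of the loss, and rotated views of non-RAIs to the negative side.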
2.3. Rotation Invariance
Rotation invariance is one of the well-studied desirable properties of visual representations, and many existing methods incorporate rotation-invariant features into feature learning frameworks. For supervised learning, G-CNNs [8] and Warped Convolutions [21] showed excellent results in learning rotation-invariant features. For self-supervised learning, Feng et al. [13] worked on rotation feature learning, which learns a representation that decouples rotation-related and rotation-unrelated parts. However, previous works separated the rotation-related and unrelated parts implicitly, as internal information of the network, and did not