encouraging them to be pulled closer, the image obtains
embedding features that are robust to rotation. In contrast,
for non-RAI, the orientation of an object is constrained. Applying
rotation PDA to non-RAI and pulling the rotated views
closer causes the images to lose orientation information
and may yield undesirable features. For non-RAI, it is
therefore preferable to treat rotation as negative in order to
preserve orientation information.
Based on this observation, in this study we introduce
a novel augmentation strategy called adaptive Positive or
Negative Data Augmentation (PNDA). Fig. 1 shows an
overview of PDA, NDA, and PNDA. While PDA and
NDA do not consider the semantics of each image, our proposed
PNDA does: it treats rotation as positive if the original
and rotated images have the same semantics, and as negative
if their semantics differ. To achieve PNDA, we must extract
the RAI for which rotation is treated as positive. However,
no existing method determines whether an image is RAI or
non-RAI. We therefore also tackle a novel task of sampling
RAI and propose an entropy-based method. This sampling
method exploits the difference in the difficulty of rotation
prediction between RAI and non-RAI, and extracts RAI based
on the entropy of the rotation predictor’s output.
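The entropy-based criterion described above can be sketched as follows. This is a minimal NumPy illustration, assuming a rotation predictor that outputs a probability distribution over the four rotation angles; the function names and threshold value are hypothetical, not the paper's actual implementation:

```python
import numpy as np

def rotation_entropy(probs):
    """Entropy of a rotation predictor's output distribution
    over the four rotation angles (0, 90, 180, 270 degrees)."""
    probs = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=-1)

def sample_rai(probs, threshold):
    """Flag an image as rotation-agnostic (RAI) when rotation
    prediction is hard, i.e., the output entropy is high."""
    return rotation_entropy(probs) >= threshold

# A near-uniform prediction (rotation is hard to guess -> likely RAI)
# versus a confident one (rotation is easy to guess -> likely non-RAI).
uniform = np.array([0.25, 0.25, 0.25, 0.25])
confident = np.array([0.97, 0.01, 0.01, 0.01])
```

The maximum attainable entropy here is log 4 ≈ 1.386, reached by the uniform prediction, so the threshold is chosen somewhere below that value.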
We evaluate rotation PNDA with contrastive learning
frameworks such as MoCo v2 and SimCLR. Across several
experiments, the proposed rotation PNDA improves the
performance of contrastive learning, whereas rotation PDA
and NDA can degrade it.
The contributions of our paper are summarized as fol-
lows:
• We propose a novel augmentation strategy called
PNDA that considers the semantics of each image and
treats rotation as whichever of positive or negative is
better for that image.
• We propose a new task of sampling rotation-agnostic
images for which rotation is treated as positive.
• We apply rotation PNDA to contrastive learning
frameworks and find that it improves the performance
of contrastive learning.
2. Related work
2.1. Contrastive Learning
Contrastive learning has become one of the most suc-
cessful methods in self-supervised learning [19, 5, 17, 7, 3].
One popular approach for contrastive learning, such as
MoCo [19] and SimCLR [5], is to create two views of the
same image and attract them while repulsing different im-
ages. Many studies have explored the choice of positives and
negatives in MoCo and SimCLR [11, 40, 22]. Some methods, such as
BYOL [17] or SimSiam [7], use only positives, but recent
studies [14, 34] have shown that better representation can
be learned by incorporating negatives into these methods.
In contrastive learning, the appropriate use of positives and
negatives is thus important for learning better representations.
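The attract/repulse objective described above is commonly instantiated as an InfoNCE-style loss. The following is a minimal NumPy sketch of that loss (hypothetical names; not tied to the exact formulation of MoCo or SimCLR):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE loss for one batch: each row of z1 is attracted to the
    matching row of z2 (its positive) and repulsed from all other
    rows of z2 (its in-batch negatives)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positive pairs lie on the diagonal
    return -np.mean(np.diag(log_prob))
```

When the two views of each image map to similar embeddings and different images map to dissimilar ones, this loss is small; shuffling the pairing increases it.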
2.2. Data Augmentation for Contrastive Learning
There are two main types of augmentation strategies for
contrastive learning: positive data augmentation (PDA) and
negative data augmentation (NDA).
2.2.1 Positive Data Augmentation (PDA)
Contrastive learning methods create positives with augmentations
and pull them closer. For example, Chen et al. [5]
proposed a composition of data augmentations, e.g.,
Grayscale, Random Resized Cropping, Color Jittering, and
Gaussian Blur, to make the model robust to these augmentations.
On the other hand, they reported that adding rotation
to these augmentations degrades performance. However,
they used rotation PDA without considering the difference
in semantic content between RAI and non-RAI.
Some works have addressed rotation in contrastive learning via
residual relaxation [35] or by combining it with rotation
prediction [1, 9]. In contrast, our work focuses on the
semantics of each rotated image.
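The PDA pipeline above amounts to composing image transforms into a single positive-view generator, with rotation optionally added as one more transform. A minimal NumPy sketch, with simplified stand-in transforms and hypothetical names:

```python
import numpy as np

def compose(*transforms):
    """Chain augmentations into a single positive-view generator."""
    def apply(img, rng):
        for t in transforms:
            img = t(img, rng)
        return img
    return apply

def random_crop(img, rng, size=24):
    """Square random crop (a stand-in for random resized cropping)."""
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return img[y:y + size, x:x + size]

def random_rotation(img, rng):
    """Rotation PDA: rotate by a random multiple of 90 degrees."""
    return np.rot90(img, k=rng.integers(0, 4))

rng = np.random.default_rng(0)
augment = compose(random_crop, random_rotation)
view1 = augment(np.zeros((32, 32, 3)), rng)  # two independent
view2 = augment(np.zeros((32, 32, 3)), rng)  # positive views
```

Per the observation above, including `random_rotation` in this composition is only appropriate for RAI; for non-RAI it discards orientation information.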
2.2.2 Negative Data Augmentation (NDA)
Several methods have been proposed to create negative sam-
ples by applying specific transformations to images [4, 32,
31]. Sinha et al. [31] investigated whether several augmentations,
including CutMix [38] and Mixup [39], which are
typically used as positives in supervised learning, can be
used as NDA for representation learning. However, they
did not argue that rotation NDA is effective. Tack et al. [32]
showed that rotation NDA is effective for unsupervised out-of-distribution
detection, but they did not claim that it is effective
for representation learning. These
methods [4, 32, 31] treat the transformed images as nega-
tives without considering the semantics of each image.
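To make the NDA idea concrete: the transformed images are simply appended to the negative set of the contrastive loss. A minimal NumPy sketch for a single anchor (hypothetical names; not the exact formulation of any cited method):

```python
import numpy as np

def nce_with_nda(anchor, positive, negatives, nda_negatives, t=0.5):
    """Contrastive loss for one anchor where augmented images
    (e.g., rotated copies) are appended to the negative set (NDA)."""
    def norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    anchor = norm(anchor)
    # candidate set: positive first, then in-batch and NDA negatives
    cands = norm(np.vstack([positive[None], negatives, nda_negatives]))
    logits = cands @ anchor / t  # similarity of anchor to each candidate
    logits -= logits.max()       # numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))
```

An NDA negative that stays close to the anchor in embedding space acts as a hard negative and raises the loss, pushing the model to separate the transformed image from the original.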
2.3. Rotation Invariance
Rotation invariance is one of many good and well-
studied properties of visual representation, and many ex-
isting methods incorporate rotational invariant features into
feature learning frameworks. For supervised learning, G-
CNNs [8] and Warped Convolutions [21] showed excellent
results in learning rotational invariant features. For self-
supervised learning, Feng et al. [13] worked on rotation
feature learning, which learns a representation that decou-
ples rotation related and unrelated parts. However, previous
works separated the rotation related and unrelated parts im-
plicitly as internal information of the network and did not