encouraging them to be pulled closer, the image obtains
embedding features that are robust to rotation. In contrast,
for non-RAI, the orientation of an object is constrained. Applying
rotation PDA to non-RAI and pulling the rotated views
closer causes the images to lose orientation information
and may yield undesirable features. For non-RAI, it is
therefore preferable to treat rotation as negative in order to
preserve orientation information.
Based on this observation, in this study we introduce
a novel augmentation strategy called adaptive Positive or
Negative Data Augmentation (PNDA). Fig. 1 shows an
overview of PDA, NDA, and PNDA. While PDA and
NDA do not consider the semantics of each image, our proposed
PNDA does: it treats rotation as positive if the original
and rotated images have the same semantics, and as negative
if their semantics differ. To achieve PNDA, we must extract
the RAI for which rotation is treated as positive. However,
no existing method determines whether an image is RAI or
non-RAI. We therefore also tackle a novel task of sampling
RAI and propose an entropy-based method. This sampling
method exploits the difference in the difficulty of rotation
prediction between RAI and non-RAI, and extracts RAI based
on the entropy of the rotation predictor’s output.
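The entropy-based criterion described above can be sketched as follows. This is a minimal NumPy illustration, assuming a rotation predictor that outputs a probability distribution over the four rotation angles; the function names and threshold value are hypothetical, not the paper's actual implementation:

```python
import numpy as np

def rotation_entropy(probs):
    """Entropy of a rotation predictor's output distribution
    over the four rotation angles (0, 90, 180, 270 degrees)."""
    probs = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=-1)

def sample_rai(probs, threshold):
    """Flag an image as rotation-agnostic (RAI) when rotation
    prediction is hard, i.e., the output entropy is high."""
    return rotation_entropy(probs) >= threshold

# A near-uniform prediction (rotation is hard to guess -> likely RAI)
# versus a confident one (rotation is easy to guess -> likely non-RAI).
uniform = np.array([0.25, 0.25, 0.25, 0.25])
confident = np.array([0.97, 0.01, 0.01, 0.01])
```

The maximum attainable entropy here is log 4 ≈ 1.386, reached by the uniform prediction, so the threshold is chosen somewhere below that value.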
We evaluate rotation PNDA with contrastive learning
frameworks such as MoCo v2 and SimCLR. Across several
experiments, the proposed rotation PNDA improves the
performance of contrastive learning, whereas rotation PDA
and NDA can degrade it.
The contributions of our paper are summarized as fol-
lows:
• We propose a novel augmentation strategy called
PNDA that considers the semantics of each image and
treats rotation as whichever of positive or negative is
better for that image.
• We propose a new task of sampling rotation-agnostic
images for which rotation is treated as positive.
• We apply rotation PNDA to contrastive learning
frameworks and find that it improves the performance
of contrastive learning.
2. Related work
2.1. Contrastive Learning
Contrastive learning has become one of the most suc-
cessful methods in self-supervised learning [19, 5, 17, 7, 3].
One popular approach for contrastive learning, such as
MoCo [19] and SimCLR [5], is to create two views of the
same image and attract them while repulsing different im-
ages. Many studies have explored the choice of positives and
negatives in MoCo and SimCLR [11, 40, 22]. Some methods, such as
BYOL [17] or SimSiam [7], use only positives, but recent
studies [14, 34] have shown that better representation can
be learned by incorporating negatives into these methods.
In contrastive learning, the appropriate use of positives and
negatives is thus important for learning better representations.
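The attract/repulse objective described above is commonly instantiated as an InfoNCE-style loss. The following is a minimal NumPy sketch of that loss (hypothetical names; not tied to the exact formulation of MoCo or SimCLR):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE loss for one batch: each row of z1 is attracted to the
    matching row of z2 (its positive) and repulsed from all other
    rows of z2 (its in-batch negatives)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positive pairs lie on the diagonal
    return -np.mean(np.diag(log_prob))
```

When the two views of each image map to similar embeddings and different images map to dissimilar ones, this loss is small; shuffling the pairing increases it.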
2.2. Data Augmentation for Contrastive Learning
There are two main types of augmentation strategies for
contrastive learning: positive data augmentation (PDA) and
negative data augmentation (NDA).
2.2.1 Positive Data Augmentation (PDA)
Contrastive learning methods create positives with augmentations
and pull them closer. For example, Chen et al. [5]
proposed a composition of data augmentations, e.g.,
Grayscale, Random Resized Cropping, Color Jittering, and
Gaussian Blur, to make the model robust to these augmentations.
On the other hand, they reported that adding rotation
to these augmentations degrades performance. However,
they used rotation PDA without considering the difference
in semantic content between RAI and non-RAI.
Some works have addressed rotation in contrastive learning via
residual relaxation [35] or by combining it with rotation
prediction [1, 9]. In contrast, our work focuses on the
semantics of each rotated image.
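The PDA pipeline above amounts to composing image transforms into a single positive-view generator, with rotation optionally added as one more transform. A minimal NumPy sketch, with simplified stand-in transforms and hypothetical names:

```python
import numpy as np

def compose(*transforms):
    """Chain augmentations into a single positive-view generator."""
    def apply(img, rng):
        for t in transforms:
            img = t(img, rng)
        return img
    return apply

def random_crop(img, rng, size=24):
    """Square random crop (a stand-in for random resized cropping)."""
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return img[y:y + size, x:x + size]

def random_rotation(img, rng):
    """Rotation PDA: rotate by a random multiple of 90 degrees."""
    return np.rot90(img, k=rng.integers(0, 4))

rng = np.random.default_rng(0)
augment = compose(random_crop, random_rotation)
view1 = augment(np.zeros((32, 32, 3)), rng)  # two independent
view2 = augment(np.zeros((32, 32, 3)), rng)  # positive views
```

Per the observation above, including `random_rotation` in this composition is only appropriate for RAI; for non-RAI it discards orientation information.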
2.2.2 Negative Data Augmentation (NDA)
Several methods have been proposed to create negative sam-
ples by applying specific transformations to images [4, 32,
31]. Sinha et al. [31] investigated whether several augmentations,
including CutMix [38] and Mixup [39], which are
typically used as positives in supervised learning, can be
used as NDA for representation learning. However, they
did not argue that rotation NDA is effective. Tack et al. [32]
showed that rotation NDA is effective for unsupervised out-of-distribution
detection, but they did not claim that it is effective
for representation learning. These
methods [4, 32, 31] treat the transformed images as nega-
tives without considering the semantics of each image.
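To make the NDA idea concrete: the transformed images are simply appended to the negative set of the contrastive loss. A minimal NumPy sketch for a single anchor (hypothetical names; not the exact formulation of any cited method):

```python
import numpy as np

def nce_with_nda(anchor, positive, negatives, nda_negatives, t=0.5):
    """Contrastive loss for one anchor where augmented images
    (e.g., rotated copies) are appended to the negative set (NDA)."""
    def norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    anchor = norm(anchor)
    # candidate set: positive first, then in-batch and NDA negatives
    cands = norm(np.vstack([positive[None], negatives, nda_negatives]))
    logits = cands @ anchor / t  # similarity of anchor to each candidate
    logits -= logits.max()       # numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))
```

An NDA negative that stays close to the anchor in embedding space acts as a hard negative and raises the loss, pushing the model to separate the transformed image from the original.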
2.3. Rotation Invariance
Rotation invariance is one of many good and well-
studied properties of visual representation, and many ex-
isting methods incorporate rotational invariant features into
feature learning frameworks. For supervised learning, G-
CNNs [8] and Warped Convolutions [21] showed excellent
results in learning rotational invariant features. For self-
supervised learning, Feng et al. [13] worked on rotation
feature learning, which learns a representation that decou-
ples rotation related and unrelated parts. However, previous
works separated the rotation related and unrelated parts im-
plicitly as internal information of the network and did not